Data Scraping

What is Data Scraping?

Data scraping, also known as web scraping, is the process of importing information from a website into a spreadsheet or local file saved on your computer. The data can be static or dynamic; what matters most is that the URL you specify is correct. (Pages that load their content with JavaScript may need extra tools, because a plain download only returns the page's initial HTML.)

Some domains of web scraping

      
Web scraping can be done either by writing code or by using ready-made software tools. Many of those tools are paid, so if you want to scrape without being charged, programming is the way to go. The programming language most widely used for scraping is Python.

Let us look at the code below:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  

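from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import csv

myurl = 'https://www.flipkart.com/search?q=iphone&otracker=start&as-show=on&as=off'

# Download the page and parse its HTML
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

# Each product card is a div with this class
containers = page_soup.find_all("div", {"class": "bhgxx2 col-12-12"})
print(len(containers))

filename = "products.csv"
# The csv module quotes fields that contain commas (e.g. prices like 1,29,900)
with open(filename, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Product_name", "Pricing", "Rating"])

    for container in containers:
        # Find the given classes inside each container
        name_container = container.find_all("div", {"class": "col col-7-12"})
        price_container = container.find_all("div", {"class": "col col-5-12 _2o7WAb"})
        rating_container = container.find_all("div", {"class": "niH0FQ"})

        # Skip cards that are missing any of the three fields
        if not (name_container and price_container and rating_container):
            continue

        name = name_container[0].text.strip()
        price = price_container[0].text.strip()
        rating = rating_container[0].text.strip()

        print("product_name:" + name)
        print("price:" + price)
        print("ratings:" + rating)
        writer.writerow([name, price, rating])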
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
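The code above scrapes product data from the Flipkart search page given in the URL. Flipkart changes its HTML class names from time to time, so the class names used here may need updating.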

Explanation:

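uReq (an alias for urlopen) - Downloads the webpage's HTML without opening a browser.
soup() (an alias for BeautifulSoup) - Parses the downloaded HTML into a tree of tags that can be searched.
findAll() / find_all() - Finds all tags matching the given tag name and attributes (find_all() is the preferred name in current BeautifulSoup versions).
Note: You need a basic knowledge of HTML tags before you can do web scraping.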
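To see soup() and find_all() at work without downloading anything, here is a minimal sketch that parses a small, made-up HTML fragment standing in for a real page. The class names "product", "name" and "price" are hypothetical; on a real site you would read them from the page's HTML first.

```python
from bs4 import BeautifulSoup as soup

# A made-up HTML fragment standing in for a downloaded page
page_html = """
<div class="product">
  <div class="name">Phone A</div>
  <div class="price">₹49,999</div>
</div>
<div class="product">
  <div class="name">Phone B</div>
  <div class="price">₹29,999</div>
</div>
"""

page_soup = soup(page_html, "html.parser")  # parse into a tree of tags
containers = page_soup.find_all("div", {"class": "product"})  # every product card
print(len(containers))  # 2

for container in containers:
    # find() returns the first matching tag inside this container only
    name = container.find("div", {"class": "name"}).text
    price = container.find("div", {"class": "price"}).text
    print(name, price)
```

Searching inside each container (rather than the whole page) keeps every name paired with its own price.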


Can we scrape data from any website?

No, my friend... not every website can be scraped. Some popular websites run bot-detection code that checks who is visiting and tries to block automated scraping. One common workaround is to change the User Agent, especially when you request the same website many times or scrape heavily protected websites.
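Changing the User Agent (explained in the next paragraph) between requests can be sketched with itertools.cycle, which hands out the strings in turn. This is a minimal sketch under assumptions: rotating_requests is a hypothetical helper, the strings in USER_AGENTS are placeholders rather than real browser strings, and the example.com URLs are illustrative. Nothing is downloaded; each Request could be passed to urlopen (uReq in the code above). This only helps against simple checks.

```python
from itertools import cycle
from urllib.request import Request

# Placeholder User Agent strings (hypothetical); real ones name actual browsers
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]

def rotating_requests(urls, user_agents=USER_AGENTS):
    """Build one Request per URL, giving each the next User Agent in turn."""
    agents = cycle(user_agents)
    return [Request(url, headers={"User-Agent": next(agents)}) for url in urls]

# Three pages of the same placeholder site
pages = [f"https://example.com/search?page={n}" for n in range(1, 4)]
for req in rotating_requests(pages):
    # urllib stores header names capitalized, hence "User-agent"
    print(req.full_url, "->", req.get_header("User-agent"))
```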

Now... what is a User Agent?
A User Agent is a string that tells the server what kind of browser, which version and which device a request is coming from. So we can fool the server/robot by sending a different User Agent with our request (it travels in the request headers, not in the URL).
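In Python's urllib, the User Agent header can be set by wrapping the URL in a urllib.request.Request object before opening it. A minimal sketch: build_request and fetch_html are hypothetical helper names, and the URL and User Agent string are placeholders. Only build_request runs below; fetch_html needs network access.

```python
from urllib.request import Request, urlopen

def build_request(url, user_agent):
    """Wrap the URL in a Request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, user_agent):
    """Download a page using the custom User-Agent (requires network access)."""
    with urlopen(build_request(url, user_agent)) as response:
        return response.read()

req = build_request("https://example.com", "Mozilla/5.0 (X11; Linux x86_64)")
print(req.full_url)                  # https://example.com (the URL is unchanged)
print(req.get_header("User-agent"))  # Mozilla/5.0 (X11; Linux x86_64)
```

Without a custom header, urllib identifies itself as "Python-urllib/3.x", which is easy for a server to spot.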



Where is the data stored?

The scraped data can be stored in a relational database or in ordinary files such as CSV files or Excel spreadsheets.
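Both kinds of storage can be sketched with Python's standard csv and sqlite3 modules. This is a minimal sketch under assumptions: the rows are made-up sample data, save_to_csv and save_to_sqlite are hypothetical helpers, and SQLite stands in for whichever relational database you use. A CSV file written this way can also be opened in Excel.

```python
import csv
import sqlite3

# Made-up scraped rows: (product name, price, rating)
rows = [
    ("Phone A", "₹49,999", "4.5"),
    ("Phone B", "₹29,999", "4.3"),
]

def save_to_csv(path, rows):
    """Write a header and the rows to a CSV file; fields with commas get quoted."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Product_name", "Pricing", "Rating"])
        writer.writerows(rows)

def save_to_sqlite(conn, rows):
    """Insert the rows into a products table in a SQLite database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (name TEXT, price TEXT, rating TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    conn.commit()

save_to_csv("products.csv", rows)

# ":memory:" keeps the database in RAM; use a file name like "products.db" to keep it
conn = sqlite3.connect(":memory:")
save_to_sqlite(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 2
```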



For people who are not interested in programming, there are ready-made tools available, listed below.

Best Data Scraping Tools Available

1. Scraper API
2. 80legs
3. Octoparse
4. ParseHub
5. Scrapy
6. Diffbot
7. Import.io
8. Dexi Intelligent
9. Webhose.io
10. Spinn3r

