
Posts

Showing posts from 2020

rrr

In this post, we’ll learn to scrape web pages using browser automation with JavaScript. We’ll be using Puppeteer for this. Puppeteer is a Node library that provides an API for controlling headless Chrome. Headless Chrome is a way to run the Chrome browser without opening a visible browser window.

How to proceed

Generally, web scraping is divided into two parts:
Fetching data by making an HTTP request
Extracting important data by parsing the HTML DOM

Libraries & Tools
Puppeteer
Node.js

What we are going to scrape

We are going to scrape the price and title of each book from this website, which is a fake bookstore set up specifically to help people practice scraping.

Setup

Our setup is pretty simple: create a folder and install Puppeteer. I am assuming that you have already installed Node.js. To create the folder and install the library, run the commands below.

mkdir scraper
cd scraper
npm i puppeteer --save

Now, create a file inside that folder by any name you li

Web Scraping Python Tutorial

In this post, we are going to learn web scraping with Python by scraping Yahoo Finance, which is a great source of stock-market data. We will code a scraper that lets you scrape stock data for any company from Yahoo Finance. As you know, I like to keep things simple, so I will also be using a web scraper, which will increase your scraping efficiency.

Why this tool?

This tool will help us scrape dynamic websites using millions of rotating proxies so that we don’t get blocked. It also provides a CAPTCHA-clearing facility, and it uses headless Chrome to scrape dynamic websites.

Requirements

Generally, web scraping is divided into two parts:
Fetching data by making an HTTP request
Extracting important data by parsing the HTML DOM

Libraries & Tools
Beautiful Soup is a Python library for pulling data out of HTML and XML files.
Requests allows you to send HTTP requests very easily.
web scrapin
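The two parts described above (fetch the page, then parse the HTML) can be sketched roughly as follows. This is only an illustration, not the code from the post: it swaps in Python's standard library (urllib and html.parser) in place of Requests and Beautiful Soup so that it runs without installing anything. The names fetch_html, HeadingParser, extract_headings, and the SAMPLE_HTML markup are all hypothetical, and the sample markup is not Yahoo Finance's actual page structure.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Part 1: fetch the page by making an HTTP GET request."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class HeadingParser(HTMLParser):
    """Part 2: walk the HTML and collect the text of every <h1> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears while we are inside an <h1>.
        if self._in_h1:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self.headings.append("".join(self._buffer).strip())
            self._in_h1 = False


def extract_headings(html):
    """Return the stripped text of each <h1> found in the given HTML string."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return parser.headings


# Hypothetical sample markup, used so the example runs without network access.
SAMPLE_HTML = "<html><body><h1>Example Corp (EXM)</h1><p>Price: 12.34</p></body></html>"
headings = extract_headings(SAMPLE_HTML)
print(headings)
```

To try it against a live page, pass the result of fetch_html(url) to extract_headings instead of SAMPLE_HTML. Which elements hold the data you want depends on the page, so the <h1> choice here is purely for demonstration.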