


In this post, we’ll learn to scrape web pages using browser automation with JavaScript. We’ll use Puppeteer, a Node library that provides an API for controlling headless Chrome. Headless Chrome is the Chrome browser running without a visible window, so it can be driven entirely from code.

How to proceed

Libraries & Tools

What we are going to scrape

Setup

mkdir scraper
cd scraper
npm i puppeteer --save

Preparing the Food

const puppeteer = require('puppeteer');

let scrape = async () => {
  // Actual Scraping goes Here…
  // Return a value
};

scrape().then((value) => {
  console.log(value); // Success!
});
let scrape = async () => {
  return 'test';
};
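The scaffold above only handles the success path. As a minimal sketch, assuming the same stubbed `scrape` function, a rejection handler can be chained so that a failed scrape is reported instead of surfacing as an unhandled rejection. The `report` helper is a hypothetical name introduced for illustration, not part of Puppeteer.

```javascript
// Stub standing in for the real scraper, as in the snippet above.
const scrape = async () => {
  return 'test';
};

// Hypothetical helper: runs a scraper and turns the outcome into a plain object,
// so both success and failure can be handled in one place.
const report = (scraper) =>
  scraper()
    .then((value) => ({ ok: true, value }))
    .catch((error) => ({ ok: false, message: error.message }));

report(scrape).then((outcome) => {
  console.log(outcome);
});
```

Swapping the stub for a scraper that throws leaves the calling code unchanged; only the `ok` flag differs.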
let scrape = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html');
  await page.waitFor(1000);
  // Scrape
  browser.close();
  return result;
};
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto('http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html');
await page.waitFor(1000);
browser.close();
return result;
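If the scraping step throws, the `browser.close()` call in the steps above is never reached and the browser process keeps running. One possible way to guard against that is a `try`/`finally` wrapper. This is a sketch under assumptions: `withPage` and its parameters are hypothetical names, and the Puppeteer module is passed in as an argument rather than required at the top of the file.

```javascript
// Hypothetical helper: opens a page, runs `work(page)`, and always closes the browser.
// `launcher` is expected to be the puppeteer module (or anything with a compatible launch()).
const withPage = async (launcher, url, work) => {
  const browser = await launcher.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await work(page);
  } finally {
    // Runs whether `work` resolved or threw.
    await browser.close();
  }
};

// Example usage (requires puppeteer to be installed):
// withPage(require('puppeteer'), 'http://books.toscrape.com/', (page) => page.title())
//   .then((title) => console.log(title));
```

Passing the launcher in as a parameter is only a design choice that keeps the helper easy to try out with a stand-in object; requiring `puppeteer` directly inside the function would work just as well.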


const result = await page.evaluate(() => {
  // return something
});


let title = document.querySelector('h1');
let title = document.querySelector('h1').innerText;
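Note that `document.querySelector` returns `null` when nothing matches, so reading `.innerText` directly throws a TypeError on a page that lacks the element. A small guard avoids that. The `textOf` helper below is a hypothetical name, and the fake document object exists only so the sketch can run outside a browser.

```javascript
// Hypothetical helper: text of the first element matching `selector`, or null if none.
const textOf = (root, selector) => {
  const element = root.querySelector(selector);
  return element ? element.innerText.trim() : null;
};

// Stand-in for the browser's `document`, for demonstration only.
const fakeDocument = {
  querySelector: (selector) =>
    selector === 'h1' ? { innerText: ' A Light in the Attic ' } : null,
};

console.log(textOf(fakeDocument, 'h1'));       // prints: A Light in the Attic
console.log(textOf(fakeDocument, '.missing')); // prints: null
```

When used with Puppeteer, a helper like this has to be defined inside the function passed to `page.evaluate`, because that function is serialized and run in the page, where outer variables are not available.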


let price = document.querySelector('.price_color').innerText;
return {
  title,
  price
}
const result = await page.evaluate(() => {
  let title = document.querySelector('h1').innerText;
  let price = document.querySelector('.price_color').innerText;
  return {
    title,
    price
  };
});
return result;
const puppeteer = require('puppeteer');

let scrape = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html');
  // Note: page.waitFor was deprecated and later removed in newer Puppeteer releases.
  await page.waitFor(1000);

  const result = await page.evaluate(() => {
    let title = document.querySelector('h1').innerText;
    let price = document.querySelector('.price_color').innerText;
    return {title, price};
  });

  browser.close();
  return result;
};

scrape().then((value) => {
  console.log(value); // Success!
});
node scrape.js
// { title: 'A Light in the Attic', price: '£51.77' }
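The price comes back as a string that includes the currency symbol. If a number is needed, for sorting or arithmetic, it can be converted after scraping. A minimal sketch, assuming prices use a dot as the decimal separator as in the output above; `parsePrice` is a hypothetical helper name.

```javascript
// Hypothetical helper: pulls the first decimal number out of a price string.
// Returns null when the text contains no digits.
const parsePrice = (text) => {
  const match = text.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
};

console.log(parsePrice('£51.77')); // prints: 51.77
```

Prices written with thousands separators or a comma as the decimal mark would need a different pattern.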

Making it Perfect

const result = await page.evaluate(() => {
  let data = []; // Create an empty array
  let elements = document.querySelectorAll('xxx'); // Select all products
  // Loop through each product
    // Select the title
    // Select the price
    data.push({title, price}); // Push the data to our array
  return data; // Return our data array
});
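The skeleton above leaves the selector as a placeholder. The following is one possible way to fill in the loop, not necessarily the original solution. The selectors `article.product_pod`, `h3 a`, and `.price_color` are assumptions about the listing page's markup and should be checked in the browser's developer tools. `extractBooks` is a hypothetical name.

```javascript
// Sketch of the extraction step. When passed to page.evaluate(extractBooks) it runs
// inside the page, where `root` defaults to the page's document.
// All selectors below are assumptions about the page markup.
const extractBooks = (root = document) => {
  const data = []; // Collected {title, price} objects
  const elements = root.querySelectorAll('article.product_pod'); // One element per product
  for (const element of elements) {
    const link = element.querySelector('h3 a');
    const priceElement = element.querySelector('.price_color');
    if (!link || !priceElement) continue; // Skip cards missing either field
    // Prefer the title attribute, since the visible link text may be truncated
    const title = link.getAttribute('title') || link.innerText;
    const price = priceElement.innerText;
    data.push({ title, price });
  }
  return data;
};
```

Inside the scraper it would be called as `const result = await page.evaluate(extractBooks);`. The function must not reference variables from the surrounding Node script, because only its source text is sent to the page.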



Conclusion

Additional Resources
