Scraping web data using Node.js

Hello guys, welcome to phphelp.net. In this tutorial I will guide you through extracting data from a website using Node.js. This will be helpful for those who are learning data science. Node.js is an open-source JavaScript runtime.

Prerequisites:

  1. Node.js – You can install it on your PC easily from the official site
  2. cheerio – Basically jQuery for Node
  3. axios – Promise-based HTTP client for the browser and Node.js

Let’s get started

First we need to install cheerio and axios:

npm install cheerio axios

Copy and paste the above command into the terminal and you are done. After installing cheerio and axios in your project folder, it’s time to import the two modules using the following code:

const axios = require('axios');
const cheerio = require('cheerio');

After these two lines, the ‘cheerio’ module is loaded into the variable ‘cheerio’ and the ‘axios’ module is loaded into the variable ‘axios’. I assume that you have a basic understanding of Node.js as we continue the tutorial 🙂

How do cheerio and axios work?

First, you pass the URL to be scraped to axios, which returns the HTML for that page. Then you load that HTML into cheerio and run queries on it to get the elements or data you need.

As a first step, assign the URL you want to scrape to a variable:

const url = 'https://www.premierleague.com/stats/top/players/goals';

We now pass the above URL to axios, which gives the HTML as its response. That HTML is then passed to cheerio to load the elements, using the lines of code below:

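axios(url)
    .then(function (response) {
        const html = response.data;      // raw HTML of the page
        const $ = cheerio.load(html);    // parse it into a queryable document
        // ...query the document with $ here
    })
    .catch(console.error);               // log network or HTTP errors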

Now that the HTML content is loaded into cheerio, the fun part begins. If you have basic knowledge of jQuery, you can easily use cheerio to get the required data. As you can see above, I have put the Premier League site link in the url variable in my code. I will be extracting the player names and the goals scored by each player. First of all, have a look at the picture below:

You can see in the picture above that the data I want to extract is in the table ‘statsTableContainer’. I will apply some simple code to extract these elements:

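// Goes inside the .then() callback, where $ is available.
// Assumes each player is a <tr> row inside the container.
const statsTable = $('.statsTableContainer > tr');
statsTable.each(function () {
    const rank = $(this).find('.rank').text();
    const playerName = $(this).find('.playerName').text();
    console.log(rank, playerName);
});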

The above lines of code will help you get the required data. In this tutorial I have used axios for the HTTP requests; you can also use request or request-promise, though note that both are now deprecated. The only thing you need to do is read the documentation and apply your own logic 🙂

 

 
