Introduction -
"Web scraping is a technique for extracting useful information from websites. In other words, it means getting the HTML source code from a website, reading the DOM, making sense of the HTML content, extracting the information we are interested in, and moving that information to the storage of your choice (a .txt file, a database such as MySQL or a NoSQL store, etc.)."
Why Web Scraping -
- Web scraping is fast and reliable.
- A single crawler function can capture the data we need from every page of a website.
- Web scraping replaces the manual copy-paste method.
- Web scraping is completely automated.
- It reads the complete DOM structure and can easily capture the required information.
Steps for web scraping using Node.js and the Cheerio library -
1. Install Node.js on your system.
2. Install the Cheerio library for data scraping.
3. Install MongoDB for storing data.
4. Install Request and any other dependencies according to your requirements.
Once all the dependencies are installed, you can start scraping the website. Please follow the steps below -
1. Pass the URL with the required parameters to Request and check the response of the given URL.
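As a minimal sketch of this step, the snippet below builds a URL with query parameters and requests the page. It assumes Node.js 18 or newer and uses the built-in fetch in place of the Request package, which is now deprecated. The URL, the parameter names, and the helper names buildUrl and fetchHtml are hypothetical and only serve as an illustration.

```javascript
// Hypothetical target URL, used only for illustration.
const BASE_URL = 'https://example.com/products';

// Build the request URL from a base URL and an object of query parameters.
function buildUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Request the page and return its HTML, failing loudly on a non-2xx status.
async function fetchHtml(url) {
  const response = await fetch(url); // global fetch, available in Node.js 18+
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (performs a real network request, so it is left commented out):
// fetchHtml(buildUrl(BASE_URL, { page: 1, category: 'books' }))
//   .then((html) => console.log(html.length));
```

Checking response.ok before reading the body is what "check the response" amounts to here: a blocked or missing page is reported as an error instead of being parsed as if it were real data.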
2. Pass the HTML response to the Cheerio library.
Cheerio reads the complete DOM of our HTML response, so we can easily extract the useful information from it.
3. Read the required information
After getting the complete data, we can store it in a database or in a CSV file. In my crawler, I store the output in a MongoDB database and also generate a CSV file with the complete information.
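The storage step can be sketched as below. This is not the exact code of my crawler, only an illustration: the helper names (csvCell, toCsv, saveCsv, saveToMongo), the file path, and the database and collection names are assumptions. The MongoDB part expects a collection object from the official mongodb driver and calls its insertMany method.

```javascript
const fs = require('fs');

// Quote a value for CSV: wrap it in double quotes and double any embedded quotes.
function csvCell(value) {
  return `"${String(value ?? '').replace(/"/g, '""')}"`;
}

// Turn an array of flat objects into CSV text, using the given column order.
function toCsv(rows, columns) {
  const header = columns.map(csvCell).join(',');
  const lines = rows.map((row) => columns.map((col) => csvCell(row[col])).join(','));
  return [header, ...lines].join('\n');
}

// Write the rows to a CSV file on disk.
function saveCsv(filePath, rows, columns) {
  fs.writeFileSync(filePath, toCsv(rows, columns) + '\n', 'utf8');
}

// Insert the rows into a MongoDB collection and return how many were stored.
async function saveToMongo(collection, rows) {
  if (rows.length === 0) return 0; // nothing to insert
  const result = await collection.insertMany(rows);
  return result.insertedCount;
}

// Usage (needs the mongodb package and a running server, so it is commented out;
// the connection string, database name, and collection name are assumptions):
// const { MongoClient } = require('mongodb');
// const client = new MongoClient('mongodb://localhost:27017');
// await client.connect();
// await saveToMongo(client.db('scraper').collection('products'), rows);
// saveCsv('products.csv', rows, ['name', 'price', 'link']);
// await client.close();
```

Quoting every cell keeps the CSV valid even when a scraped value contains commas or quotation marks, which is common in product names and descriptions.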
Conclusion -
With the help of the Cheerio library, we can easily extract useful information from a website and store it in our database or in a .txt, XML, or JSON file.
Thanks