[Tut] Scrape a Bookstore in 5 Steps Python [Learn Project]


Story: This series of articles assumes you work in the IT Department of Mason Books. The owner asks you to scrape the website of a competitor. He would like this information to gain insight into his pricing structure.

Note: Before continuing, we recommend that you have at least a basic knowledge of HTML and CSS and have reviewed our articles on How to Scrape HTML tables.

What You’ll Build in This Project


Let’s navigate to Books to Scrape (http://books.toscrape.com/) and review its layout.


At first glance, you will notice:

  • Book categories display on the left-hand side.
  • The website lists 1,000 books in total.
  • Each page shows 20 books.
  • All prices are in pounds sterling (£).
  • Each book listing displays only minimal details.
  • To view the complete details for a book, click its image or its title hyperlink. This opens a page with additional details for the selected book.
  • The footer displays the total number of pages (Page 1 of 50).
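Because you will follow those title hyperlinks later, it helps to know that they are usually relative paths (an assumption based on the Books to Scrape markup), so each one must be combined with the URL of the page it came from. Below is a minimal sketch using the standard library's urllib.parse.urljoin; detail_urls and the sample-book_1 slug are hypothetical names for illustration only.

```python
from urllib.parse import urljoin


def detail_urls(page_url, hrefs):
    """Turn the relative book links found on one listing page into absolute URLs."""
    # urljoin resolves each href against the directory of page_url,
    # exactly as a browser would when you click the link.
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == "__main__":
    # Hypothetical relative link as it might appear on page 2 of the catalogue.
    print(detail_urls("http://books.toscrape.com/catalogue/page-2.html",
                      ["sample-book_1/index.html"]))
```

Resolving against the page URL, rather than gluing strings onto a fixed base, matters because the same book can be linked as catalogue/... from the home page but without the catalogue/ prefix from the catalogue pages.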

Step 1: Install and Import Libraries for Project


Before any data manipulation can occur, you need to install three (3) libraries.

  • The Pandas library provides the DataFrame structure for working with tabular data.
  • The Requests library lets Python send HTTP requests.
  • The Beautiful Soup library enables data extraction from HTML and XML files.

To install these libraries, open a terminal in your IDE and run the commands below at the command prompt. In this example, the prompt is a dollar sign ($); your terminal prompt may be different.

$ pip install pandas
$ pip install requests
$ pip install beautifulsoup4

Press the <Enter> key after each command to start the installation process.

If an installation succeeds, the terminal displays a message confirming it.
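To double-check from Python itself that all three libraries are importable, you could run a short script like the sketch below. It uses the standard library's importlib.util.find_spec, which returns None when a module cannot be found; the REQUIRED mapping and the check_installed function are hypothetical names, not part of the original tutorial.

```python
import importlib.util

# Package name used with pip -> module name used with import.
# Note that beautifulsoup4 is imported as "bs4".
REQUIRED = {"pandas": "pandas", "requests": "requests", "beautifulsoup4": "bs4"}


def check_installed(packages=REQUIRED):
    """Return {package name: True/False} depending on whether it can be imported."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in packages.items()
    }


if __name__ == "__main__":
    for package, ok in check_installed().items():
        print(f"{package:15} {'installed' if ok else 'MISSING'}")
```

Any package reported as MISSING can be installed with the matching pip command above.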


Feel free to view the PyCharm installation guides for the required libraries.


Add the following imports to the top of each code snippet so that the code in this article runs without errors.

import pandas as pd
import requests
from bs4 import BeautifulSoup
import time
import urllib.request
from csv import reader, writer

  • The time module is part of Python's standard library and does not require installation. It provides time.sleep(), which is used to set a delay between page scrapes.
  • The urllib package is part of Python's standard library and does not require installation. Its urllib.request module is used to save images.
  • The csv module is part of Python's standard library (not Pandas) and does not require installation. It provides the reader and writer functions used to read and save CSV data.

Step 2: Understand Basics and Scrape Your First Results



In this step, you’ll perform the following tasks:

  • Reviewing the website to scrape.
  • Understanding HTTP status codes.
  • Connecting to the Books to Scrape website using the requests library.
  • Retrieving the total number of pages to scrape.
  • Closing the open connection.

Learn More: Learn everything you need to know to reproduce this step in the in-depth Finxter blog tutorial.
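Two pieces of this step can be sketched without touching the network: translating a numeric status code into its name, and reading the total page count from the footer text (Page 1 of 50). This is a minimal sketch, assuming the footer text has that exact wording; it uses the standard library's http.HTTPStatus, and a regular expression stands in for Beautiful Soup so it runs without third-party packages. describe_status, total_pages and PAGE_PATTERN are hypothetical names.

```python
import re
from http import HTTPStatus

# Assumed footer wording on Books to Scrape, e.g. "Page 1 of 50".
PAGE_PATTERN = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")


def describe_status(code):
    """Turn a numeric HTTP status code into a readable label, e.g. 200 -> '200 OK'."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:  # not a standard HTTP status code
        return f"{code} Unknown status"


def total_pages(html):
    """Return the total page count from the pager text, or 1 if there is no pager."""
    match = PAGE_PATTERN.search(html)
    return int(match.group(2)) if match else 1


if __name__ == "__main__":
    print(describe_status(200), describe_status(404))
    print(total_pages('<li class="current">\n    Page 1 of 50\n</li>'))
```

With requests, the HTML would come from response = requests.get(url): response.status_code holds the code to pass to describe_status(), response.text holds the HTML to pass to total_pages(), and response.close() (or a requests.Session used in a with block) releases the connection.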

Step 3: Configure URL to Scrape and Avoid Spamming the Server


Rule: Don’t Spam the Server!

In this step, you’ll perform the following tasks:

  • Configuring a page URL for scraping.
  • Setting a delay with time.sleep() to pause between page scrapes.
  • Looping through two (2) pages for testing purposes.

Learn More: Learn everything you need to know to reproduce this step in the in-depth Finxter blog tutorial.
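The three tasks above can be sketched as follows: build each page URL from a template, loop over a small test range of two pages, and sleep between requests so you don't spam the server. The URL template (catalogue/page-N.html) is an assumption about the site's pagination, and page_urls, scrape_pages and the fetch callback are hypothetical; passing the download function in as fetch keeps the loop testable without a network connection.

```python
import time

# Assumed pagination pattern for Books to Scrape.
BASE_URL = "http://books.toscrape.com/catalogue/page-{}.html"


def page_urls(first, last):
    """Build the URL for every page from first to last (inclusive)."""
    return [BASE_URL.format(n) for n in range(first, last + 1)]


def scrape_pages(fetch, urls, delay=2.0):
    """Call fetch(url) for each URL, pausing `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause so requests don't hammer the server
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Test run over two pages; print stands in for a real download function.
    scrape_pages(print, page_urls(1, 2), delay=1.0)
```

Once the two-page test behaves, the same loop can run up to the page count retrieved in Step 2.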

Step 4: Save Book Details in a Python List



In this step, you’ll perform the following tasks:

  • Locating book details.
  • Writing code to retrieve this information for all books.
  • Saving book details to a list.

Learn More: Learn everything you need to know to reproduce this step in the in-depth Finxter blog tutorial.
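To make locating and saving book details concrete, here is a sketch that collects [title, price, availability] for every book on a listing page into a Python list. In the tutorial this is Beautiful Soup's job (for example, soup.find_all("article", class_="product_pod")); to keep the sketch dependency-free it swaps in the standard library's html.parser.HTMLParser instead. The tag and class names below are assumptions based on the Books to Scrape listing markup, and BookListParser and parse_books are hypothetical names.

```python
from html.parser import HTMLParser


class BookListParser(HTMLParser):
    """Collect [title, price, availability] for each <article class="product_pod">.

    Assumed markup inside each article:
      <h3><a title="Full Title">...</a></h3>
      <p class="price_color">£51.77</p>
      <p class="instock availability"> In stock </p>
    """

    def __init__(self):
        super().__init__()
        self.books = []       # one [title, price, availability] list per book
        self._current = None  # book being built while inside an <article>
        self._field = None    # which <p> we are inside: "price" or "stock"
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._current = {"title": "", "price": "", "stock": ""}
        elif self._current is None:
            return  # ignore everything outside a book listing
        elif tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            # The visible link text may be truncated; the title attribute is not.
            self._current["title"] = attrs.get("title", "")
        elif tag == "p" and "price_color" in classes:
            self._field = "price"
        elif tag == "p" and "availability" in classes:
            self._field = "stock"

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "p":
            self._field = None
        elif tag == "article" and self._current is not None:
            book = self._current
            self.books.append([book["title"], book["price"], book["stock"]])
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()


def parse_books(html):
    """Return a list of [title, price, availability] lists for one listing page."""
    parser = BookListParser()
    parser.feed(html)
    parser.close()
    return parser.books
```

Calling parse_books() on each page's HTML and extending one master list gives you every book on the site.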

Step 5: Clean and Save the Scraped Output



In this step, you’ll perform the following tasks:

  • Cleaning up the scraped code.
  • Saving the output to a CSV file.

Learn More: Learn everything you need to know to reproduce this step in the in-depth Finxter blog tutorial.
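One typical piece of cleanup is turning scraped strings such as "£51.77" into numbers and trimming stray whitespace before saving. The sketch below does that and then writes and re-reads the file with the reader and writer functions from the csv module imported at the top of the article. It assumes each row is [title, price, availability]; clean_price, clean_rows, save_csv, load_csv, the column names and the placeholder row are hypothetical.

```python
import os
import re
import tempfile
from csv import reader, writer


def clean_price(raw):
    """Turn a scraped price such as '£51.77' (or the mis-decoded 'Â£51.77') into a float."""
    # 'Â£' appears when UTF-8 text is decoded as Latin-1; matching only the
    # digits makes the cleanup work either way.
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    return float(match.group())


def clean_rows(rows):
    """Strip whitespace from each [title, price, availability] row and parse the price."""
    return [[title.strip(), clean_price(price), stock.strip()] for title, price, stock in rows]


def save_csv(path, header, rows):
    """Write the header and rows; newline='' avoids blank lines on Windows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        out = writer(f)
        out.writerow(header)
        out.writerows(rows)


def load_csv(path):
    """Read the file back as lists of strings (CSV stores no types)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(reader(f))


if __name__ == "__main__":
    scraped = [[" Sample Book ", "£12.34", " In stock "]]  # placeholder row
    path = os.path.join(tempfile.gettempdir(), "books.csv")
    save_csv(path, ["title", "price_gbp", "availability"], clean_rows(scraped))
    print(load_csv(path))
```

Storing the price as a bare number in a price_gbp column keeps the currency in the header and lets Pandas or a spreadsheet sort and average the values directly.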

Conclusion


This tutorial has guided you through the steps to create your first practical web scraping project: scraping the contents of a bookstore!

Now, go out and use your skills wisely and to the benefit of humanity, my friend!




https://www.sickgaming.net/blog/2022/06/...n-project/


