Beautiful Soup is a Python library for pulling data out of HTML and XML files. It's one of the most popular Python modules and is pretty easy to use.
To install Beautiful Soup 4 (BS4), you can just use pip:
pip install beautifulsoup4
In this post, I am also going to use the requests library to download pages. It can be installed with pip as well:
pip install requests
Before we get started, be sure to import the modules in your Python script:
import requests
from bs4 import BeautifulSoup
The first step is to retrieve the HTML from the website and pass it to Beautiful Soup:
# Download the page
response = requests.get('http://example.com/')
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(response.text, 'html.parser')
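If you want to experiment without making a network request, you can also pass an HTML string straight to BeautifulSoup. The small document below is made up for illustration; it is parsed with the same `html.parser` parser used above.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document used in place of a real download
html = """
<html>
  <head><title>Example page</title></head>
  <body><p id="intro">Hello, world!</p></body>
</html>
"""

# Same parser as above: Python's built-in html.parser
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # Example page
print(soup.p.get_text())  # Hello, world!
```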
soup now holds a BeautifulSoup object that you can search and navigate.
To find an element with a specific id:
element = soup.find(id='elementTagID')
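Keep in mind that `find()` returns `None` when nothing matches, so it is worth checking the result before using it. A minimal sketch, using a made-up HTML snippet that reuses the placeholder id `elementTagID`; `select_one()` with a CSS `#id` selector is shown as an equivalent alternative.

```python
from bs4 import BeautifulSoup

# Made-up HTML; "elementTagID" is the placeholder id from the line above
html = '<div><p id="elementTagID">First paragraph</p><p>Second</p></div>'
soup = BeautifulSoup(html, "html.parser")

element = soup.find(id="elementTagID")
if element is not None:
    print(element.name)        # p
    print(element.get_text())  # First paragraph

# find() returns None when no element has the requested id
missing = soup.find(id="does-not-exist")
print(missing)  # None

# The same lookup written as a CSS selector
same_element = soup.select_one("#elementTagID")
```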
One common task is extracting all the URLs found within a page's <a> tags:
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
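Real pages often mix absolute and relative links, and some `<a>` tags have no `href` at all. One possible approach, sketched here with made-up HTML and assuming the page was fetched from `http://example.com/`, is to match only tags that have an `href` (`href=True`) and resolve each one with the standard library's `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "http://example.com/"  # assumed: the URL the page was fetched from
html = """
<a href="http://example.com/elsie">Elsie</a>
<a href="/lacie">Lacie</a>
<a name="anchor-without-href">No link</a>
"""
soup = BeautifulSoup(html, "html.parser")

urls = []
# href=True only matches <a> tags that actually have an href attribute
for link in soup.find_all("a", href=True):
    # Turn relative links such as "/lacie" into absolute URLs
    urls.append(urljoin(base_url, link["href"]))

print(urls)  # ['http://example.com/elsie', 'http://example.com/lacie']
```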
To get the link text from the page, you can do:
for link in soup.find_all('a'):
    print(link.get_text())
# Click here
# another link text
# IDK what else to put
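Link text frequently carries extra whitespace or nested tags. `get_text()` accepts a separator and a `strip` flag; without a separator, the strings from nested tags are joined with nothing, so `"another <b>link</b> text"` would collapse into `"anotherlinktext"` once stripped. A small sketch with made-up HTML:

```python
from bs4 import BeautifulSoup

html = """
<a href="/one">  Click here  </a>
<a href="/two">another <b>link</b> text</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Join the pieces with a single space and strip whitespace around each piece
texts = [link.get_text(" ", strip=True) for link in soup.find_all("a")]
print(texts)  # ['Click here', 'another link text']
```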
To get all link elements with the CSS class
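A minimal sketch of filtering links by class, assuming a hypothetical class name `external` (not from the original post). Beautiful Soup uses the keyword `class_`, with a trailing underscore, because `class` is a reserved word in Python. Since `class` can hold several values, a tag with `class="nav external"` also matches. The `select()` call shows the same query written as a CSS selector.

```python
from bs4 import BeautifulSoup

# "external" is a hypothetical class name used only for this example
html = """
<a class="external" href="https://example.org/">Example.org</a>
<a class="nav external" href="https://example.net/">Example.net</a>
<a class="internal" href="/about">About</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches any <a> whose class list contains "external"
links = soup.find_all("a", class_="external")
print([a["href"] for a in links])  # ['https://example.org/', 'https://example.net/']

# The same query as a CSS selector
same_links = soup.select("a.external")
```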
© 2023 by Ryan Rickgauer