Beautiful Soup is a Python library for pulling data out of HTML and XML files. It's one of the most popular Python modules and is easy to use.

To install Beautiful Soup 4 (BS4), you can just use pip:

pip install beautifulsoup4

In this post, I am also going to use the requests library to download pages:

pip install requests

Before we get started, be sure to import both modules in your Python script:

import requests
from bs4 import BeautifulSoup


The first step is to retrieve the HTML from the website and hand it to Beautiful Soup.

response = requests.get('')
soup = BeautifulSoup(response.text, 'html.parser')

soup now holds a BeautifulSoup object representing the parsed page, which you can search and navigate.
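
Here is a slightly more defensive sketch of that first step. The helper names make_soup and fetch_soup, and the 10 second timeout, are my own assumptions for illustration and are not part of either library. The sample at the bottom parses a hard-coded string so you can try it without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(html):
    """Parse an HTML string with Python's built-in html.parser."""
    return BeautifulSoup(html, 'html.parser')


def fetch_soup(url, timeout=10):
    """Download a page and parse it (hypothetical helper).

    raise_for_status() raises requests.HTTPError on 4xx/5xx responses,
    so an error page is never parsed by accident.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return make_soup(response.text)


# Parsing works the same on any HTML string, so it can be tried offline.
soup = make_soup('<html><head><title>Demo</title></head><body></body></html>')
print(soup.title.string)  # Demo
```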

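There are a few different parsers: html.parser, lxml, lxml-xml, and html5lib. html.parser ships with Python, lxml is faster but has to be installed separately, lxml-xml is the one to use for XML documents, and html5lib parses pages the way a web browser does but is the slowest of the group.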
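
If you are not sure which parser is available on a machine, one option is to check at runtime. This is only a sketch: pick_parser is a hypothetical helper of mine, and it simply prefers lxml when that package is installed and falls back to the built-in parser otherwise.

```python
import importlib.util

from bs4 import BeautifulSoup


def pick_parser():
    """Return 'lxml' if it is installed, otherwise the built-in 'html.parser'."""
    if importlib.util.find_spec('lxml') is not None:
        return 'lxml'
    return 'html.parser'


# Deliberately sloppy markup: the <b> tag is never closed.
broken_html = '<p>Hello<b>world</p>'
soup = BeautifulSoup(broken_html, pick_parser())

# Parsers repair broken markup differently, so the tree may differ,
# but the text content is the same here.
print(soup.get_text())  # Helloworld
```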

Searching for elements

To find an element with a specific id:

element = soup.find(id='elementTagID')
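
One thing to watch for: find returns None when nothing matches, so it is worth checking the result before using it. A small example on made-up HTML (the ids below are placeholders of mine):

```python
from bs4 import BeautifulSoup

html = '<div id="main"><p id="intro">Welcome</p></div>'
soup = BeautifulSoup(html, 'html.parser')

intro = soup.find(id='intro')
missing = soup.find(id='does-not-exist')

# Guard against None before calling methods on the result.
if intro is not None:
    print(intro.get_text())  # Welcome
print(missing)  # None
```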

One common task is extracting all the URLs found within a page’s <a> tags:

for link in soup.find_all('a'):
    print(link.get('href'))

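Pages often contain relative links and anchors with no href at all. The sketch below skips the latter with href=True and resolves the former with urljoin from the standard library. The base URL and the sample HTML are placeholders I made up for the example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = 'https://example.com/blog/'  # hypothetical page address
html = '''
<a href="/about">About</a>
<a href="post-1.html">First post</a>
<a name="top">No href here</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# href=True keeps only <a> tags that actually have an href attribute.
urls = [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]
print(urls)
# ['https://example.com/about', 'https://example.com/blog/post-1.html']
```
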
To get the text of each link instead, you can do:

for link in soup.find_all('a'):
    print(link.get_text())
# Click here
# another link text
# IDK what else to put
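
Link text often comes with stray whitespace or nested tags. Passing a separator and strip=True to get_text cleans that up. The HTML here is made up for the example.

```python
from bs4 import BeautifulSoup

html = '<a href="/a">  Click here </a><a href="/b"><b>another</b> link text</a>'
soup = BeautifulSoup(html, 'html.parser')

# strip=True trims each piece of text; the ' ' separator keeps words from
# nested tags from running together.
texts = [a.get_text(' ', strip=True) for a in soup.find_all('a')]
print(texts)  # ['Click here', 'another link text']
```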

To get all link elements with the CSS class sister:

sister_links = soup.find_all("a", class_="sister")
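
The same search can also be written as a CSS selector with select. A quick comparison on sample HTML I made up, where one link carries a second class to show that class_ matches any one of a tag's classes:

```python
from bs4 import BeautifulSoup

html = (
    '<a class="sister" href="/elsie">Elsie</a>'
    '<a class="sister big" href="/lacie">Lacie</a>'
    '<a href="/other">Other</a>'
)
soup = BeautifulSoup(html, 'html.parser')

# Keyword form: class_ has a trailing underscore because class is reserved.
by_keyword = soup.find_all('a', class_='sister')

# CSS selector form.
by_selector = soup.select('a.sister')

names = [a.get_text() for a in by_keyword]
print(names)  # ['Elsie', 'Lacie']
print(by_keyword == by_selector)  # True
```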

