Python Selenium Tutorial 2021
Learn to make bots, test apps, and automate tasks with Python
Want to build bots that do your work for you? In this tutorial, I’ll show you how to use Selenium WebDriver to automate tasks and test applications, and I’ll also cover some lesser-known advanced features.
What is Selenium
Selenium is a tool for automating web browsers programmatically. Its primary use is automated software testing, but it is also commonly used for scraping content that requires JavaScript rendering and for any other browser automation, such as bots.
One of the main selling points of Selenium is that it supports nearly all popular browsers, such as Chrome, Firefox, Safari, Internet Explorer, and others. All of this is accessible through a uniform API that can be used from almost any programming language. In this tutorial, you’ll be using Python, but the same code can be adapted with small changes to JavaScript, C#, Java, PHP, and other languages.
Setup
To follow this tutorial you need 2 things:
- A basic Python development environment
- Google Chrome and Chromedriver downloaded
You can download the chrome driver at this link:
Make sure to get the version that matches your current browser version for Chrome. If you use Firefox you can also get geckodriver to use Selenium.
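Place the chromedriver.exe file in your project directory, or in any directory on your PATH, so that Selenium can find it. Selenium 4.6 and later can also download a matching driver automatically through Selenium Manager.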
Selenium Basics
The first thing you need to do is install the Selenium Python client library using pip:
pip install selenium
You can verify everything is working properly by running the following code
This code will import Selenium’s webdriver module, open up a Chrome instance, visit Google’s homepage, and then close.
Working with Web Elements in Selenium
You probably want to do more with Selenium than just open up websites. To do that you’ll need to interact with elements on web pages to automate whatever task you would do normally.
Luckily, Selenium provides several different ways to find elements on web pages and interact with them.
When automating any task with Selenium there are 2 things you need to do:
- Find the element using a locator tag
- Use a built-in method or property to interact with that element
Finding elements with Selenium Webdriver
There are many ways to accomplish the same goal with Selenium. In this section, I will give you an opinionated way based on my own experience of how you should locate web page elements with Selenium.
For example, you could grab the search bar element on Google’s home page using any of the following lines of code:
Order of priority for selectors
As you can see from the code above, it’s much easier to use something like a name or ID tag than to use Xpath or alternative search options. Whenever possible you’ll want to use those much simpler available options, but in some cases, you won’t have a choice.
My personal order for finding elements is:
- Name or ID tag
- CSS classes
- CSS selectors
- Xpath
- HTML tag names
Most websites will be easy to scrape or interact with and you’ll be able to use simple CSS classes or IDs. Other websites will actively try to prevent scraping and you’ll have to use more complicated selectors to find elements on the page.
Reddit is a good example of this. It’s one of the more challenging websites to scrape, so after this tutorial you’ll be able to handle most other websites very easily in comparison!
Returning multiple elements
The code above will only return the first element that matches the search. If you want to return an entire list of elements, for example a list of tweets or some other content, you can simply add an “s” to the method call, so find_element becomes find_elements.
That call returns every matching web element in a list that you can then iterate over.
Finding nested elements
You also have the option of using a parent element to search for more elements inside it. Instead of searching from the driver object, you can first locate an element on the page and then call find_element or find_elements on that element to locate inner elements (the find_element_by_X helpers in older Selenium versions work the same way).
Using dev tools to speed up Selenium development time
Here’s a tip from a Selenium veteran. When I first started working with Selenium, it took forever for me to find elements. I would change my Python selector code, run the entire program, and often hit an error because it couldn’t find an element. This was frustrating because, for long-running bots, testing each small change could take a few minutes.
It took me way too long to realize that I could use developer tools to speed up the iteration process. Any selector you use in Selenium can first be tested in your dev tools console to make sure you find the elements you want.
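You won’t need to do this for basic stuff like CSS class names or IDs, but when working with complicated XPath queries it’s a lifesaver that will save you hours. In the Chrome dev tools console, the $x command evaluates an XPath expression and returns the matching elements as an array. For example, $x('//input[@name="q"]') lists every input element whose name attribute is q, and you can store the result in a variable for further inspection.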
Interacting with web elements using Selenium
Now that you know how to locate elements with Selenium, let’s learn how to actually do something with them.
Each element has a number of built-in methods for interacting with it. The most common ones are click(), send_keys(), clear(), and submit().
In addition to actions you can take on web page elements, there are also numerous properties you can access for each element to get more information if you need it. Be sure to check out the documentation and API to see all the options available
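Here is a short sketch of the most common actions and properties on a web element. The two helper names are my own. Each one takes an element that was already located with find_element.

```python
def fill_and_submit(field, value):
    """Type `value` into a form field and submit the form it belongs to."""
    field.clear()           # remove any text that is already there
    field.send_keys(value)  # type the new text
    field.submit()          # submit the enclosing form


def describe(element):
    """Collect a few commonly used properties of an element."""
    return {
        "tag": element.tag_name,               # e.g. "a" or "input"
        "text": element.text,                  # visible text
        "href": element.get_attribute("href"), # None if the attribute is absent
        "visible": element.is_displayed(),
    }
```

Clicking works the same way: call element.click() on any element you have located.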
Making a Reddit Bot
Now let’s make use of this by making a simple bot that logs into Reddit. This basic tutorial will put into practice the following concepts
- creating a Selenium instance
- navigating to a web page
- Grabbing and clicking on an element
- Typing text into a form
- Submitting the form
Here is the code:
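Most of this is very simple, although the 2 lines that switch “frames” may be confusing. The first one is:
frame = driver.find_element_by_class_name('_25r3t_lrPF3M6zD2YkWvZU')
What’s happening here is that clicking the “login” button on Reddit actually opens up an iframe. We need to locate that by class name like a normal web element. Note that the find_element_by_class_name helper was removed in Selenium 4.3; in current versions the same call is written driver.find_element(By.CLASS_NAME, '_25r3t_lrPF3M6zD2YkWvZU').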
However, iFrames are treated differently in Selenium, so to actually interact with the elements inside the iFrame we have to switch our context using the following line of code:
driver.switch_to.frame(frame)
Once we are inside the iFrame we can again locate and interact with the internal elements. Once the code above is finished running you should be logged in if you provided actual login information instead of the dummy values I provided as input.
I also wrapped the code in a try/except block for error handling, which can obviously be useful for figuring out why your code breaks if you run into errors.
Create Selenium Automations without Code using Selenium IDE
Selenium also provides Selenium IDE, a browser extension that can be used to autogenerate code by recording actions you take in the browser.
It’s not perfect by any means but for simple automation tasks, it can be used to save time. I won’t do a full tutorial here but it might be worth checking out if it fits your use case.
Testing with Selenium
The actual intended purpose of Selenium is to make automated tests for web applications. In the past, these tests were commonly done by manual UI testers who had to check to make sure a website was working before new features were deployed. This was time-consuming and costly.
Using Selenium for testing follows the same concepts you learned above. You are still just locating elements and interacting with them, but Selenium also provides a few dedicated features for testing purposes that I’ll cover.
Selenium Grid
Selenium Grid allows you to run multiple Selenium tests across different browsers and operating systems in parallel. The tests can also be distributed across multiple servers to reduce the time it takes for all of them to run.
Selenium Grid has 2 main parts, the Hub and Nodes. The hub is in charge of controlling the nodes and runs on a single machine. Tests are loaded into the Hub and it spins up the actual browser instances across all available nodes to distribute the resource requirements.
Why would you want to do this? If you have a huge application, it can drastically cut the time it takes to run all your tests before deploying. In theory, if you spread your tests evenly across 10 servers, the suite finishes roughly 10X faster than on a single machine, assuming the tests are independent of each other.
Advanced Selenium Topics
The above will cover probably 90% of real-world use cases. But I figured I’d go the extra mile and also cover some more advanced features of Selenium in this section.
Waiting for dynamic Javascript elements
Many modern applications use frontend frameworks like ReactJS or VueJS that dynamically modify web pages. This can result in issues where Selenium tries to find an element a split second before it exists on the page, which leads to an error.
To get around this problem you can use the Wait feature of selenium to delay your code until the element has appeared on-page. These waits can be defined as a set number of seconds to delay, or until the element appears with a set timeout period in case the element never actually appears on the page.
Here’s an example from the Selenium docs:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
Selenium Options and Capabilities
The Options and Capabilities objects can be passed as arguments to Selenium when you create a browser instance. Some useful features include running in headless mode, loading browser extensions, preventing browser notifications, choosing the size of the browser window, and running Selenium from behind a proxy.
Here are some common use cases with comments explaining what the code is doing:
Use Browser Extensions
In some cases, it might be useful to load your Selenium browser instance with an extension such as an ad blocker. Doing this is fairly simple: download the extension itself and then pass the CRX file path to Selenium.
Action Builders and Action Chains
Action chains are used when you need more fine-grained controls over browser actions like mouse movement, mouse button actions, key presses or double clicks, and context menu interactions like testing custom right-click menu features.
Storing cookies
If you are working with a website that requires being logged in, storing session cookies can make things easier so you don’t have to log in every time you run Selenium in a new instance. To do this you simply have to export and save the cookies and then add them to Selenium the next time you run an instance.
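Here is one way to sketch this with a JSON file. The file path and function names are my own. Keep in mind that Selenium only accepts a cookie for the domain that is currently loaded, so open the site before loading the cookies.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the current session's cookies to `path` as JSON."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path):
    """Add the saved cookies to the browser and return how many were added."""
    cookies = json.loads(Path(path).read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


# Usage (with a real driver):
# save_cookies(driver, "cookies.json")      # after logging in once
# driver.get("https://example.com")         # next run: open the site first
# load_cookies(driver, "cookies.json")
# driver.refresh()                          # reload so the cookies take effect
```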
File Uploads
Selenium can’t interact with the native file explorer that opens when you click on a file upload button in the browser. Instead, we grab the file input element, type the full file path into it, and then click the submit button.
Selenium 4 Updates
Selenium 4 is the newest major version of Selenium and adds a ton of new features. It was still in pre-release when this tutorial was first written and reached a stable release in October 2021. Here are some of the highlights.
Relative locators: These can be used to find elements above, below, left of, right of, or simply “near” another element. This could be very useful for testing forms or just grabbing all the elements in a list. The problem I see is that it works somewhat “magically”, which could lead to problems maintaining an app.
Chrome DevTools Protocol support: With this feature, Selenium can simulate different geographic locations for features that need location-based testing. It also allows simulating slow network conditions to verify that your app still works for users on a bad connection.
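A sketch of both ideas for Chrome-based drivers. The function names, coordinates, and default values are placeholders of mine, and both functions expect a Chrome (or other Chromium) driver instance.

```python
def set_geolocation(driver, latitude, longitude, accuracy=100):
    """Make the browser report a fixed location to the page."""
    driver.execute_cdp_cmd(
        "Emulation.setGeolocationOverride",
        {"latitude": latitude, "longitude": longitude, "accuracy": accuracy},
    )


def simulate_slow_network(driver, latency_ms=200, throughput=50 * 1024):
    """Add latency and cap throughput for the current browser session."""
    driver.set_network_conditions(
        offline=False,
        latency=latency_ms,              # extra delay in milliseconds
        download_throughput=throughput,  # maximum download throughput
        upload_throughput=throughput,    # maximum upload throughput
    )


# Usage (with a real Chrome driver):
# set_geolocation(driver, 52.52, 13.405)
# simulate_slow_network(driver, latency_ms=500)
```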
Browser tabs and window switching: Switching between tabs in Selenium has always been a major struggle. The new syntax should make things significantly easier.
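A sketch of the Selenium 4 syntax: open a new tab, read a page title there, and return to the original tab. The function name is hypothetical.

```python
def title_from_new_tab(driver, url):
    """Load `url` in a new tab, return its title, and switch back."""
    original = driver.current_window_handle
    driver.switch_to.new_window("tab")  # opens a new tab and focuses it
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.close()                     # closes only the new tab
        driver.switch_to.window(original)  # hand focus back
```

Passing "window" instead of "tab" opens a separate browser window.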
Conclusion
I hope this tutorial gave you a nice introduction to Selenium WebDriver. If you have any issues, be sure to leave a comment and I’ll try to help you figure things out!