Python is particularly effective when you want to browse web pages or perform actions on them automatically. Libraries like Beautiful Soup are enough when you only need to scrape static pages, but some tasks require running JavaScript or even imitating human behavior. For these, you can fully emulate a web browser from Python.
Installation
We will use Selenium, which is easily installed with:
pip install selenium
Selenium controls a real browser through a driver program. We will use Firefox (whose rendering engine is Gecko), which must be installed, together with its driver, geckodriver.
Download geckodriver, decompress it, and place it in /usr/local/bin:
tar xvfz geckodriver-version-platform.tar.gz
mv geckodriver /usr/local/bin
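A missing driver only shows up as an error once webdriver.Firefox() is called. A small check like the following can confirm beforehand that geckodriver is reachable on the PATH; require_driver is a hypothetical helper name, not part of Selenium. Note that Selenium 4.6 and later can also locate or download a matching driver on its own through Selenium Manager, so this check mainly matters for the manual installation above.

```python
import shutil


def require_driver(name="geckodriver"):
    """Return the full path of the `name` executable, or raise if it is not on PATH."""
    path = shutil.which(name)  # None when no executable with that name is found
    if path is None:
        raise FileNotFoundError(
            f"{name} not found on PATH; is /usr/local/bin in your PATH?"
        )
    return path
```

Calling require_driver() at the top of a script makes the failure explicit before any browser is started.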
Usage in Python
To use Selenium, import it at the beginning of your script:
from selenium import webdriver
Simple usage
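Since geckodriver is in /usr/local/bin, which is normally on the PATH, Selenium finds it on its own. You can start Firefox with:
driver = webdriver.Firefox()
(The executable_path argument found in older tutorials is deprecated in Selenium 4 and has been removed in recent releases.)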
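Load a page with get(), then interact with its elements. For example, to click the first element with the class my_class:
from selenium.webdriver.common.by import By
driver.get('https://example.com')
driver.find_element(By.CLASS_NAME, 'my_class').click()
The find_element_by_* methods from Selenium 3 no longer exist in Selenium 4; use find_element with a By locator instead.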
When you are done, close the browser and free its resources with:
driver.quit()
Use without graphical interface
If you want to launch it without a graphical user interface, you can use:
options = webdriver.FirefoxOptions()
options.add_argument('-headless')
driver = webdriver.Firefox(options=options)
Using with a proxy or Tor
If you use a proxy, or Tor (which acts as a SOCKS proxy, listening by default on the local IP 127.0.0.1 and port 9050), you can connect Selenium to it with the following options:
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.socks", '127.0.0.1')
profile.set_preference("network.proxy.socks_port", 9050)
# Resolve host names through the proxy: avoids DNS leaks and is required for .onion addresses
profile.set_preference("network.proxy.socks_remote_dns", True)
You can then attach the profile to the browser options and start Firefox:
options = webdriver.FirefoxOptions()
options.profile = profile
driver = webdriver.Firefox(options=options)
Cache and cookies
Other preferences can be set on the same profile before the browser is started, for example to disable the cache:
profile.set_preference("browser.cache.disk.enable", False)
profile.set_preference("browser.cache.memory.enable", False)
profile.set_preference("browser.cache.offline.enable", False)
profile.set_preference("network.http.use-cache", False)
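Because FirefoxProfile and FirefoxOptions both expose set_preference, the cache preferences above can be kept in one dictionary and applied to either object. NO_CACHE_PREFS and disable_cache are illustrative names, not part of Selenium:

```python
# Preferences that turn off Firefox's disk, memory, offline and HTTP caches
NO_CACHE_PREFS = {
    "browser.cache.disk.enable": False,
    "browser.cache.memory.enable": False,
    "browser.cache.offline.enable": False,
    "network.http.use-cache": False,
}


def disable_cache(target):
    """Apply NO_CACHE_PREFS to any object with a set_preference(name, value)
    method (a FirefoxProfile or FirefoxOptions) and return it."""
    for name, value in NO_CACHE_PREFS.items():
        target.set_preference(name, value)
    return target
```

For instance, options = disable_cache(webdriver.FirefoxOptions()) before passing options to webdriver.Firefox.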
It is also possible to clear cookies with:
driver.delete_all_cookies()