Proxy with Python Requests: The Ultimate Guide
This article shows you how to integrate and use proxies with Python Requests.
Proxies are vital tools for protecting your privacy. If you’re writing a web scraper using Python’s Requests library, you may want to prevent your IP address from getting exposed to malicious actors. Using a proxy can help protect your privacy and circumvent IP address bans and geoblocking.
In this article, you’ll learn how to use proxies with the Requests library. You’ll also learn how to set proxies for a single request, how to set proxies with sessions, and how to implement advanced techniques like rotating proxies.
Proxies are servers that sit in the middle of a client and the destination server. Instead of connecting to the destination server directly, the client connects to the proxy server, which in turn connects to the destination server.
From the perspective of the destination server, the proxy server acts like the client, and the destination server doesn’t know about the existence of the actual client. The proxy server protects the privacy of the client, essentially, by cloaking it.
There are different types of proxy servers based on the type of requests they can handle. The most common ones are HTTP proxies, HTTPS proxies, and SOCKS proxies. HTTP and HTTPS proxies can handle HTTP and HTTPS requests, respectively. If you want to make HTTP(S) requests, you should use an HTTP(S) proxy. In contrast, SOCKS proxies are much more versatile as they use TCP connections to communicate. This means a SOCKS proxy can handle multiple protocols, such as HTTP, FTP, and SMTP. A SOCKS proxy can also use UDP to send data, which is faster and more efficient than TCP and is helpful for more general-purpose scenarios, such as content streaming and peer-to-peer (P2P) file sharing.
As you’ve probably already figured out, proxies are critical tools with many use cases, including protecting your privacy, circumventing IP address bans, bypassing georestrictions, and web scraping at scale.
Keep in mind that just because you can bypass bans and restrictions using proxies doesn’t mean that you should abuse this power. You should read the terms and conditions of the website you want to scrape and abide by them to avoid any legal issues. You should also respect common courtesy measures, such as abiding by the robots.txt file and being careful not to overload servers.
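For instance, you can check whether a page may be fetched according to a site's robots.txt using Python's built-in urllib.robotparser before you scrape it. This is a minimal sketch; the example.com URLs are just placeholders:

from urllib.robotparser import RobotFileParser

# check whether robots.txt allows fetching a given URL (placeholder site)
rp = RobotFileParser()
rp.set_url('https://example.com/robots.txt')
rp.read()

if rp.can_fetch('*', 'https://example.com/some/page'):
    print('Allowed to fetch')
else:
    print('Disallowed by robots.txt')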
To follow along with the rest of the tutorial, make sure you have the Requests library installed:
pip install requests
If you want to use SOCKS proxies, make sure you install the required dependencies:
pip install 'requests[socks]'
After you’ve installed your dependencies, create a file named proxy.py and start by importing the Requests library:
import requests
To add proxies to requests, you need to create a dictionary that holds the proxy URL. To connect to a proxy, you need its hostname or IP address, the port number, the proxy type, and, optionally, its authentication credentials. For this tutorial, you can grab some free proxies from the Free Proxy List.
You need to construct a proxy server URL in this format:
PROTOCOL://PROXY_HOST:PROXY_PORT
Here, PROTOCOL can be http, https, or socks5, depending on the type of proxy you pick. In this article, you’ll use only HTTP and HTTPS proxies.
Once you’ve obtained the details, create a dictionary as follows and replace PROXY_URL_1 and PROXY_URL_2 with your HTTP and HTTPS proxy URLs, respectively:
proxies = {
    'http': 'PROXY_URL_1',
    'https': 'PROXY_URL_2'
}
Note: If you want to use only an HTTP or an HTTPS proxy, remove the key you don't need from the dictionary above. Remember that if you decide not to use an HTTPS proxy, you need to use http://httpbin.org/ip as the target URL in the following code blocks.
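As an aside, if you installed the requests[socks] extra earlier and picked a SOCKS5 proxy instead, the dictionary follows the same pattern. The host and port below are placeholders, and this snippet isn't used in the rest of the tutorial:

# a hypothetical SOCKS5 configuration (not used in the rest of this tutorial)
socks_proxies = {
    'http': 'socks5://PROXY_HOST:PROXY_PORT',
    'https': 'socks5://PROXY_HOST:PROXY_PORT'
}
# use socks5h:// instead if you want DNS resolution to happen on the proxy side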
Now, you can make an HTTP request by passing the proxies dictionary. In this tutorial, you’ll make a request to https://httpbin.org/ip that returns the client’s IP address. This helps you identify whether the proxy worked or not:
response = requests.get('https://httpbin.org/ip', proxies=proxies)
Finally, print the response:
print(response.json())
Run the code with python proxy.py. You should then see the IP address of your proxy server:
{'origin': '116.98.220.11'}
It’s also possible to use proxies with requests.Session. A Session object reuses the same TCP connection for several requests and persists cookies across them. This helps in cases where a persistent session is required, for example, when scraping a site that requires you to log in.
To use proxies in Session, you need to set the proxies property like this:
import requests

session = requests.Session()

# set the proxies
session.proxies = {
    'http': 'PROXY_URL_1',
    'https': 'PROXY_URL_2'
}

# perform an HTTP GET request
response = session.get('https://httpbin.org/ip')

print(response.json())
If you’re using a premium proxy, chances are your proxy server requires authentication in the form of a username and password. In this case, you need to modify the proxy URLs to include the credentials like this:
PROTOCOL://USERNAME:PASSWORD@PROXY_HOST:PROXY_PORT
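For example, a session configured with authenticated proxy URLs might look like the following. The username, password, host, and port here are made-up placeholders for illustration only:

# hypothetical credentials and proxy address, for illustration only
session.proxies = {
    'http': 'http://myuser:mypassword@203.0.113.10:8080',
    'https': 'http://myuser:mypassword@203.0.113.10:8080'
}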
The rest of the code can be left as is:
import requests

session = requests.Session()

# set the proxies
session.proxies = {
    'http': 'PROXY_URL_1',
    'https': 'PROXY_URL_2'
}

# perform an HTTP GET request
response = session.get('https://httpbin.org/ip')
Free proxy servers are often unreliable, as they may be slow to respond or become unavailable. It’s recommended to use timeouts to ensure that the request is aborted if it takes too long to get a response. You can pass the timeout parameter a float value that indicates how many seconds to wait for a response before giving up:
requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5) # wait 5 seconds
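If you want separate limits for establishing the connection and waiting for data, the timeout parameter also accepts a (connect, read) tuple. The values below are arbitrary examples:

# 3 seconds to connect, 10 seconds to wait for a response
requests.get('https://httpbin.org/ip', proxies=proxies, timeout=(3, 10))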
While using a proxy server can help you circumvent IP bans, using one proxy server is often not enough because if your proxy server gets IP banned, you won’t be able to keep web scraping. That’s why it’s recommended to use multiple proxy servers and rotate them periodically.
How you rotate proxies depends on your use case. You can either randomly select a proxy from a list of proxies for each request or go through a list of proxies sequentially. You can also rotate a proxy when you encounter an error, which likely indicates that the current proxy has been IP-banned.
When you choose to randomly select a proxy, you prepare a list of proxies and pick a proxy at random whenever you make a request.
To do so, first import the necessary modules:
import random
import requests
Then, create a list of proxies:
HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
Define a function that picks a random proxy using random.choice() and makes an HTTP request:
def make_request(method, url):
    response = None
    try:
        http_proxy = random.choice(HTTP_PROXIES)
        https_proxy = random.choice(HTTPS_PROXIES)
        proxies = {
            'http': http_proxy,
            'https': https_proxy
        }
        print(f'Using proxy: {proxies}')
        response = requests.request(method, url, proxies=proxies, timeout=5)
    except Exception as e:
        print(e)
    return response
Now, you can use this function to make requests:
response = make_request("get", "https://httpbin.org/ip")
if response is not None:
    print(response.json())
The full code looks like this:
import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

def make_request(method, url):
    response = None
    try:
        http_proxy = random.choice(HTTP_PROXIES)
        https_proxy = random.choice(HTTPS_PROXIES)
        proxies = {
            'http': http_proxy,
            'https': https_proxy
        }
        print(f'Using proxy: {proxies}')
        response = requests.request(method, url, proxies=proxies, timeout=5)
    except Exception as e:
        print(e)
    return response

response = make_request("get", "https://httpbin.org/ip")
if response is not None:
    print(response.json())
You can also go through the list of proxies sequentially instead of choosing one randomly. As before, start with the list of proxies:
import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
Define a variable to hold the index of the current proxy:
current = 0
Then, define a function that picks the proxies at the index current and makes a request. Afterward, it increments the value of current:
def make_request(method, url):
    response = None
    global current
    try:
        http_proxy = HTTP_PROXIES[current]
        https_proxy = HTTPS_PROXIES[current]
        proxies = {
            'http': http_proxy,
            'https': https_proxy
        }
        print(f'Using proxy {current}')
        response = requests.request(method, url, proxies=proxies, timeout=5)
    except Exception as e:
        print(e)
    current = (current + 1) % len(HTTP_PROXIES)
    return response
Note that the line current = (current + 1) % len(HTTP_PROXIES) ensures that the value of current goes back to 0 after you have gone through the entire list. For simplicity, this code assumes that the HTTP_PROXIES and HTTPS_PROXIES lists contain the same number of proxies.
Now, call the function in a loop to make several requests:

for i in range(10):
    response = make_request("get", "https://httpbin.org/ip")
    if response is not None:
        print(response.json())
The complete code looks like this:
import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

current = 0

def make_request(method, url):
    response = None
    global current
    try:
        http_proxy = HTTP_PROXIES[current]
        https_proxy = HTTPS_PROXIES[current]
        proxies = {
            'http': http_proxy,
            'https': https_proxy
        }
        print(f'Using proxy {current}')
        response = requests.request(method, url, proxies=proxies, timeout=5)
    except Exception as e:
        print(e)
    current = (current + 1) % len(HTTP_PROXIES)
    return response

for i in range(10):
    response = make_request("get", "https://httpbin.org/ip")
    if response is not None:
        print(response.json())
Sometimes, using a different proxy for each request may be inefficient. In that case, you can keep using a proxy until you encounter an error, which might indicate that the proxy has stopped working. The make_request function in the following code combines this technique with randomly rotating proxies. This function picks a random proxy and keeps using it until the status code of the response is not 200. Then, it picks a new proxy:
import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

current = 0

def make_request(method, url):
    response = None
    global current
    try:
        http_proxy = HTTP_PROXIES[current]
        https_proxy = HTTPS_PROXIES[current]
        proxies = {
            'http': http_proxy,
            'https': https_proxy
        }
        print(f'Using proxy {current}')
        response = requests.request(method, url, proxies=proxies, timeout=5)
        if response.status_code != 200:
            current = random.randrange(len(HTTP_PROXIES))
            print(f'Request failed. Picked new proxy {current}')
    except Exception as e:
        print(e)
    return response

for i in range(10):
    response = make_request("get", "https://httpbin.org/ip")
    if response is not None:
        print(response.json())
Depending on your use case, you might consider other status codes as valid. For example, you might consider 404 as a valid status code. In that case, you can change the check to include multiple status codes:
if response.status_code not in [200, 404, 401, 403]:
    ...
Often, using free proxy services can be unreliable. The proxy server might be unavailable or slow to respond. While using a timeout takes care of this issue for a single request, you might need to use retries for a more robust solution. So far, the make_request function tries each request only once, but you can modify the code so that it retries each request a set number of times before giving up.
In the following code, the make_request function has been modified to take a max_attempts parameter. The code tries each request max_attempts times:
import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

current = 0

def make_request(method, url, max_attempts=3):
    response = None
    global current
    attempt = 1
    while attempt <= max_attempts:
        try:
            print(f'Attempt {attempt}')
            http_proxy = HTTP_PROXIES[current]
            https_proxy = HTTPS_PROXIES[current]
            proxies = {
                'http': http_proxy,
                'https': https_proxy
            }
            print(f'Using proxy {current}')
            response = requests.request(method, url, proxies=proxies, timeout=5)
            if response.status_code != 200:
                current = random.randrange(len(HTTP_PROXIES))
                print(f'Request failed. Picked new proxy {current}')
            else:
                break
            attempt += 1
        except Exception as e:
            print(e)
            current = random.randrange(len(HTTP_PROXIES))
            print(f'Request failed. Picked new proxy {current}')
            attempt += 1
    return response

for i in range(10):
    response = make_request("get", "https://httpbin.org/ip")
    if response is not None:
        print(response.json())
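As an aside, if you don't need to switch proxies between attempts, Requests can also retry failed requests at the transport level using urllib3's Retry together with an HTTPAdapter. This is a minimal sketch with example retry settings, not a drop-in replacement for the proxy-rotating logic above:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# retry up to 3 times with backoff on common server errors (example values)
retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retries)
session.mount('http://', adapter)
session.mount('https://', adapter)

# PROXY_URL_1 and PROXY_URL_2 are placeholders for your proxy URLs
session.proxies = {
    'http': 'PROXY_URL_1',
    'https': 'PROXY_URL_2'
}

response = session.get('https://httpbin.org/ip', timeout=5)
print(response.json())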
So far, you’ve used a fixed list of proxies that you curated manually. However, creating and maintaining a list of usable proxies can be time-consuming and error-prone. Instead, you can use an API such as ProxyScrape to download a list of free proxies.
The following get_proxies function fetches a list of proxies from the API:
def get_proxies():
    url = "https://api.proxyscrape.com/v2/?request=displayproxies&protocol=http&timeout=10000&country=all&ssl=all&anonymity=all"
    response = requests.request("GET", url)
    print(response.text)
    return list(map(lambda x: x.strip(), response.text.strip().split("\n")))
Now, you can use this list of proxies in the make_request function:
PROXIES = []
current = 0

def make_request(method, url, max_attempts=3):
    response = None
    global current, PROXIES
    attempt = 1
    while attempt <= max_attempts:
        if len(PROXIES) == 0:
            print("Fetching new proxies")
            PROXIES = get_proxies()
            print(PROXIES)
        try:
            print(f'Attempt {attempt}')
            http_proxy = PROXIES[current]
            proxies = {
                'http': 'http://' + http_proxy
            }
            print(f'Using proxy {current}')
            response = requests.request(method, url, proxies=proxies, timeout=5)
            if response.status_code != 200:
                PROXIES.pop(current)  # Remove the proxy
                current = random.randrange(len(PROXIES))
                print(f'Request failed. Picked new proxy {current}')
            else:
                break
            attempt += 1
        except Exception as e:
            print(e)
            PROXIES.pop(current)  # Remove the proxy
            current = random.randrange(len(PROXIES))
            print(f'Request failed. Picked new proxy {current}')
            attempt += 1
    return response
Note that for simplicity, this code deals with HTTP proxies only. You can make an HTTP request using this function:
for i in range(10):
    response = make_request("get", "http://httpbin.org/ip")
    if response is not None:
        print(response.json())
The complete code looks like this:

import random
import requests

PROXIES = []
current = 0

def get_proxies():
    url = "https://api.proxyscrape.com/v2/?request=displayproxies&protocol=http&timeout=10000&country=all&ssl=all&anonymity=all"
    response = requests.request("GET", url)
    print(response.text)
    return list(map(lambda x: x.strip(), response.text.strip().split("\n")))

def make_request(method, url, max_attempts=3):
    response = None
    global current, PROXIES
    attempt = 1
    while attempt <= max_attempts:
        if len(PROXIES) == 0:
            print("Fetching new proxies")
            PROXIES = get_proxies()
            print(PROXIES)
        try:
            print(f'Attempt {attempt}')
            http_proxy = PROXIES[current]
            proxies = {
                'http': 'http://' + http_proxy
            }
            print(f'Using proxy {current}')
            response = requests.request(method, url, proxies=proxies, timeout=5)
            if response.status_code != 200:
                PROXIES.pop(current)  # Remove the proxy
                current = random.randrange(len(PROXIES))
                print(f'Request failed. Picked new proxy {current}')
            else:
                break
            attempt += 1
        except Exception as e:
            print(e)
            PROXIES.pop(current)  # Remove the proxy
            current = random.randrange(len(PROXIES))
            print(f'Request failed. Picked new proxy {current}')
            attempt += 1
    return response

for i in range(10):
    response = make_request("get", "http://httpbin.org/ip")
    if response is not None:
        print(response.json())
The Requests library is designed to be synchronous. In other words, when you make an HTTP request with requests, the code execution is halted until the request finishes or an exception occurs. But in some situations, you might prefer processing the request asynchronously so that the code execution isn’t blocked. The requests-futures library is a small add-on for the Requests library that adds concurrency to requests.
You can install the library by running pip install requests-futures.
Let’s consider the following synchronous code that makes two requests:
import requests

session = requests.Session()

# perform an HTTP GET request
response = session.get('https://httpbin.org/ip')
print("Request 1")
print(response.json())

response = session.get('https://httpbin.org/ip')
print("Request 2")
print(response.json())
If you run the previous code, your output looks like this:
Request 1
{'origin': '100.20.101.111'}
Request 2
{'origin': '100.20.101.111'}
You can turn it into asynchronous code using requests-futures:
from requests_futures.sessions import FuturesSession

session = FuturesSession()

# perform an HTTP GET request in the background
future_one = session.get('https://httpbin.org/ip')

# the second request starts immediately
future_two = session.get('https://httpbin.org/ip')

# wait for the results
response = future_one.result()
print("Request 1")
print(response.json())

response = future_two.result()
print("Request 2")
print(response.json())
The output looks the same as before; however, in the second code snippet, the second request starts immediately, without waiting for the first one to finish.
Using proxies with requests-futures follows the same method as using proxies with requests. You simply need to set the session.proxies:
session.proxies = {
    'http': 'PROXY_URL_1',
    'https': 'PROXY_URL_2'
}
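Putting it together, a complete sketch that combines FuturesSession with proxies might look like this; PROXY_URL_1 and PROXY_URL_2 are placeholders for your proxy URLs:

from requests_futures.sessions import FuturesSession

session = FuturesSession()
session.proxies = {
    'http': 'PROXY_URL_1',
    'https': 'PROXY_URL_2'
}

# both requests start immediately and run in the background
future_one = session.get('https://httpbin.org/ip')
future_two = session.get('https://httpbin.org/ip')

print(future_one.result().json())
print(future_two.result().json())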
Whether you’re writing a web scraper or crawler in Python using the Requests library, you might need to use proxies. Proxies prevent malicious actors from snooping on your IP address. They can also help you bypass georestrictions and scraping bans.
In this article, you saw how you can use proxies with the Requests library. You also learned how to include proxy details in requests.get and requests.Session as well as implement advanced techniques, such as rotating proxies and handling errors.
Not sure what provider to choose? Go over our list of the best proxy providers.