Proxy with Python Requests: The Ultimate Guide

In this article, we'll teach you how to integrate and use proxies with Python Requests.


Proxies are vital tools for protecting your privacy. If you’re writing a web scraper using Python’s Requests library, you may want to prevent your IP address from getting exposed to malicious actors. Using a proxy can help protect your privacy and circumvent IP address bans and geoblocking.

In this article, you’ll learn how to use proxies with the Requests library. You’ll also learn how to set proxies for a single request, how to set proxies with sessions, and how to implement advanced techniques like rotating proxies.


What Are Proxies?

Proxies are servers that sit between a client and the destination server. Instead of connecting to the destination server directly, the client connects to the proxy server, which in turn connects to the destination server.

From the destination server's perspective, the proxy server is the client; the destination server doesn't know the actual client exists. In essence, the proxy protects the client's privacy by cloaking it.

There are different types of proxy servers based on the type of requests they can handle. The most common ones are HTTP proxies, HTTPS proxies, and SOCKS proxies. HTTP and HTTPS proxies can handle HTTP and HTTPS requests, respectively. If you want to make HTTP(S) requests, you should use an HTTP(S) proxy. In contrast, SOCKS proxies are more versatile because they operate at the TCP level and can therefore handle multiple protocols, such as HTTP, FTP, and SMTP. A SOCKS proxy can also send data over UDP, which trades reliability for speed and is useful for more general-purpose scenarios, such as content streaming and peer-to-peer (P2P) file sharing.
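
For example, here's a minimal sketch of routing a request through a SOCKS5 proxy (the host and port below are placeholders, not a working proxy). This requires the SOCKS extra, which is covered in the installation steps later:

import requests

proxies = {
   'http': 'socks5://proxy.example.com:1080',
   'https': 'socks5://proxy.example.com:1080'
}

# use socks5h:// instead if you want DNS resolution to happen on the proxy
response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5)
print(response.json())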

Use Cases for Proxies

As you’ve probably already figured out, proxies are critical tools with many use cases, including the following:

  • Protecting privacy: This is the most common use case of proxies because proxies can hide your IP address from websites.
  • Circumventing IP bans: If you’re writing a web scraper, your device’s IP address may get blocked if your scraper sends too many requests and exhibits bot-like behavior. A proxy server hides your IP address, and if the proxy server gets banned, you can switch to a different one and continue scraping.
  • Circumventing georestrictions: Often, content is not available in certain countries. In this scenario, you can connect to a proxy server in a different country and bypass the georestrictions.

Keep in mind that just because you can bypass bans and restrictions using proxies doesn’t mean that you should abuse this power. You should read the terms and conditions of the website you want to scrape and abide by them to avoid any legal issues. You should also respect common courtesy measures, such as abiding by the robots.txt file and being careful not to overload servers.
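
For instance, you can check a site's robots.txt rules programmatically with Python's built-in parser before scraping. Here's a minimal sketch (example.com is just a placeholder):

from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url('https://example.com/robots.txt')
robots.read()

# True if the site's rules allow a generic user agent to fetch this page
print(robots.can_fetch('*', 'https://example.com/some-page'))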


How to Use Proxies with Python Requests

To follow along with the rest of the tutorial, make sure you have the Requests library installed:

pip install requests

If you want to use SOCKS proxies, make sure you install the required dependencies:

pip install 'requests[socks]'

After you’ve installed your dependencies, create a file named proxy.py and start by importing the Requests library:

import requests

To add proxies to requests, you need to create a dictionary that maps each protocol to a proxy URL. To connect to a proxy, you need its hostname or IP address, the port number, the proxy type, and, optionally, its authentication credentials. For this tutorial, you can grab some free proxies from the Free Proxy List.

You need to construct a proxy server URL in this format:

PROTOCOL://PROXY_HOST:PROXY_PORT

Here, PROTOCOL can be http, https, or socks5, depending on the type of proxy you pick. In this article, you’ll use only HTTP and HTTPS proxies.
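
For example, these are valid proxy URL shapes (the hosts and ports are placeholders, not working proxies):

http://203.0.113.10:8080
https://203.0.113.11:443
socks5://203.0.113.12:1080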

Once you’ve obtained the details, create a dictionary as follows and replace PROXY_URL_1 and PROXY_URL_2 with your HTTP and HTTPS proxy URLs, respectively:

proxies = {
   'http': 'PROXY_URL_1',
   'https': 'PROXY_URL_2'
}

Note: If you want to use only an HTTP or only an HTTPS proxy, you can remove the other key from the dictionary above. Remember that if you decide not to use an HTTPS proxy, you need to use http://httpbin.org/ip as the target URL in the following code blocks.

Now, you can make an HTTP request by passing the proxies dictionary. In this tutorial, you’ll make a request to https://httpbin.org/ip, which returns the client’s IP address. This lets you verify whether the proxy worked:

response = requests.get('https://httpbin.org/ip', proxies=proxies)

Finally, print the response:

print(response.json())

Run the code with python proxy.py. You should then see the IP address of your proxy server:

{'origin': '116.98.220.11'}

It’s also possible to use proxies with requests.Session. A Session object reuses the same TCP connection for several requests and persists cookies. This helps in cases where a persistent session is required, such as scraping a site that requires you to log in.

To use proxies in Session, you need to set the proxies property like this:

import requests

session = requests.Session()

# set the proxies
session.proxies = {
   'http': 'PROXY_URL_1',
   'https': 'PROXY_URL_2'
}

# perform an HTTP GET request
response = session.get('https://httpbin.org/ip')

print(response.json())
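
Note that you can still pass a proxies argument to individual requests made through a session. Method-level parameters override the session-level settings, which is handy for sending a one-off request through a different proxy (PROXY_URL_3 is a placeholder):

# this request uses PROXY_URL_3 instead of the session's HTTPS proxy
response = session.get('https://httpbin.org/ip', proxies={'https': 'PROXY_URL_3'})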

Handling Proxy Authentication

If you’re using a premium proxy, chances are your proxy server requires authentication in the form of a username and password. In this case, you need to modify the proxy URLs to include the credentials like this:

PROTOCOL://USERNAME:PASSWORD@PROXY_HOST:PROXY_PORT

The rest of the code can be left as is:

import requests

session = requests.Session()

# set the proxies
session.proxies = {
   'http': 'PROXY_URL_1',
   'https': 'PROXY_URL_2'
}

# perform an HTTP GET request
response = session.get('https://httpbin.org/ip')

print(response.json())
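
One caveat: if your username or password contains special characters, such as @ or :, you need to percent-encode them so the proxy URL parses correctly. A minimal sketch using the standard library:

from urllib.parse import quote

username = quote('USERNAME', safe='')
password = quote('PASSWORD', safe='')
proxy_url = f'http://{username}:{password}@PROXY_HOST:PROXY_PORT'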

Using Timeouts

Free proxy servers are often unreliable: they may be slow to respond or become unavailable. It’s recommended to use timeouts to ensure that a request is aborted if it takes too long. You can pass the timeout parameter a float value indicating how many seconds to wait for a response before giving up:

requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5) # wait 5 seconds
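
The timeout parameter also accepts a (connect, read) tuple, which lets you fail fast when a proxy is unreachable while still allowing a slower response once connected:

# wait up to 3 seconds to connect and up to 10 seconds for a response
requests.get('https://httpbin.org/ip', proxies=proxies, timeout=(3, 10))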

How to Rotate Proxies

While using a proxy server can help you circumvent IP bans, using one proxy server is often not enough because if your proxy server gets IP banned, you won’t be able to keep web scraping. That’s why it’s recommended to use multiple proxy servers and rotate them periodically.

How you rotate proxies depends on your use case. You can either randomly select a proxy from a list of proxies for each request or go through a list of proxies sequentially. You can also rotate a proxy when you encounter an error, which likely indicates that the current proxy has been IP-banned.

Randomly Selecting a Proxy

When you choose random selection, you prepare a list of proxies and pick one at random each time you make a request.

To do so, first import the necessary modules:

import random
import requests

Then, create a list of proxies:

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

Define a function that picks a random proxy using random.choice() and makes an HTTP request:

def make_request(method, url):
   response = None

   try:
      http_proxy = random.choice(HTTP_PROXIES)
      https_proxy = random.choice(HTTPS_PROXIES)
      proxies = {
            'http': http_proxy,
            'https': https_proxy
      }
      print(f'Using proxy: {proxies}')
      response = requests.request(method, url, proxies=proxies, timeout=5)
   except Exception as e:
      print(e)

   return response

Now, you can use this function to make requests:

response = make_request("get", "https://httpbin.org/ip")
if response is not None:
   print(response.json())

The full code looks like this:

import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

def make_request(method, url):
   response = None

   try:
      http_proxy = random.choice(HTTP_PROXIES)
      https_proxy = random.choice(HTTPS_PROXIES)
      proxies = {
            'http': http_proxy,
            'https': https_proxy
      }
      print(f'Using proxy: {proxies}')
      response = requests.request(method, url, proxies=proxies, timeout=5)
   except Exception as e:
      print(e)

   return response

response = make_request("get", "https://httpbin.org/ip")
if response is not None:
   print(response.json())

Rotating Proxies Sequentially

You can also go through the list of proxies sequentially instead of choosing one randomly. As before, start with the list of proxies:

import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

Define a variable to hold the index of the current proxy:

current = 0

Then, define a function that picks the proxies at index current, makes a request, and then increments current:

def make_request(method, url):
   response = None
   global current

   try:
      http_proxy = HTTP_PROXIES[current]
      https_proxy = HTTPS_PROXIES[current]
      proxies = {
            'http': http_proxy,
            'https': https_proxy
      }
      print(f'Using proxy {current}')
      response = requests.request(method, url, proxies=proxies, timeout=5)
   except Exception as e:
      print(e)
   current = (current + 1) % len(HTTP_PROXIES)
   return response

Note that the line current = (current + 1) % len(HTTP_PROXIES) ensures that current wraps back to 0 after you have gone through the entire list. For simplicity, this code assumes that HTTP_PROXIES and HTTPS_PROXIES contain the same number of proxies.

Now, you can use this function to make requests:

for i in range(10):
   response = make_request("get", "https://httpbin.org/ip")
   if response is not None:
      print(response.json())

The complete code looks like this:

import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

current = 0

def make_request(method, url):
   response = None
   global current
   try:
      http_proxy = HTTP_PROXIES[current]
      https_proxy = HTTPS_PROXIES[current]
      proxies = {
            'http': http_proxy,
            'https': https_proxy
      }
      print(f'Using proxy {current}')
      response = requests.request(method, url, proxies=proxies, timeout=5)
   except Exception as e:
      print(e)
   current = (current + 1) % len(HTTP_PROXIES)
   return response

for i in range(10):
   response = make_request("get", "https://httpbin.org/ip")
   if response is not None:
      print(response.json())
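
If you'd rather avoid the global index variable, itertools.cycle from the standard library gives you the same round-robin behavior. Here's a minimal alternative sketch, assuming the same placeholder proxy lists as above:

from itertools import cycle

import requests

HTTP_PROXIES = ['PROXY_URL_1', 'PROXY_URL_2']
HTTPS_PROXIES = ['PROXY_URL_1', 'PROXY_URL_2']

# endlessly cycle over pairs of HTTP and HTTPS proxy URLs
proxy_pool = cycle(zip(HTTP_PROXIES, HTTPS_PROXIES))

def make_request(method, url):
   http_proxy, https_proxy = next(proxy_pool)
   proxies = {
      'http': http_proxy,
      'https': https_proxy
   }
   try:
      return requests.request(method, url, proxies=proxies, timeout=5)
   except Exception as e:
      print(e)
      return None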

Rotating Proxies Based on Response

Sometimes, using a different proxy for each request may be inefficient. In that case, you can keep using a proxy until you encounter an error, which might indicate that the proxy has stopped working. The make_request function in the following code combines this technique with random rotation: it keeps using the current proxy until a response comes back with a status code other than 200, at which point it picks a new proxy at random:

import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

current = 0

def make_request(method, url):
   response = None
   global current
   try:
      http_proxy = HTTP_PROXIES[current]
      https_proxy = HTTPS_PROXIES[current]
      proxies = {
            'http': http_proxy,
            'https': https_proxy
      }
      print(f'Using proxy {current}')
      response = requests.request(method, url, proxies=proxies, timeout=5)
      if response.status_code != 200:
         current = random.randrange(len(HTTP_PROXIES))
         print(f'Request failed. Picked new proxy {current}')
   except Exception as e:
      print(e)

   return response

for i in range(10):
   response = make_request("get", "https://httpbin.org/ip")
   if response is not None:
      print(response.json())

Depending on your use case, you might consider other status codes valid and not worth switching proxies over; for example, 404, 401, or 403. In that case, you can change the check to allow multiple status codes:

if response.status_code not in [200, 404, 401, 403]:
   ...

How to Deal with Error Handling and Retries

Free proxy services are often unreliable: the proxy server might be unavailable or slow to respond. While a timeout takes care of this issue for a single request, you might need retries for a more robust solution. So far, the make_request function tries each request only once, but you can modify the code so that it retries each request a set number of times before giving up.

In the following code, the make_request function has been modified to take a max_attempts parameter and tries each request up to max_attempts times:

import random
import requests

HTTP_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]
HTTPS_PROXIES = [
    'PROXY_URL_1',
    'PROXY_URL_2',
    # ...
    'PROXY_URL_N'
]

current = 0

def make_request(method, url, max_attempts = 3):
   response = None
   global current
   attempt = 1

   while attempt <= max_attempts:
      try:
         print(f'Attempt {attempt}')
         http_proxy = HTTP_PROXIES[current]
         https_proxy = HTTPS_PROXIES[current]
         proxies = {
               'http': http_proxy,
               'https': https_proxy
         }
         print(f'Using proxy {current}')
         response = requests.request(method, url, proxies=proxies, timeout=5)
         if response.status_code != 200:
            current = random.randrange(len(HTTP_PROXIES))
            print(f'Request failed. Picked new proxy {current}')
         else:
            break
         attempt += 1
      except Exception as e:
         print(e)
         current = random.randrange(len(HTTP_PROXIES))
         print(f'Request failed. Picked new proxy {current}')
         attempt += 1

   return response

for i in range(10):
   response = make_request("get", "https://httpbin.org/ip")
   if response is not None:
      print(response.json())
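
If you don't need to rotate proxies between attempts, Requests can also retry at the transport level through urllib3's Retry class. The following is a minimal sketch; note that it retries through the same proxy rather than picking a new one:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
session.proxies = {
   'http': 'PROXY_URL_1',
   'https': 'PROXY_URL_2'
}

# retry up to 3 times with exponential backoff on common transient errors
retry = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('https://httpbin.org/ip', timeout=5)
print(response.json())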

How to Dynamically Manage Proxies

So far, you’ve used a fixed list of proxies that you curated manually. However, creating and maintaining a list of usable proxies can be time-consuming and error-prone. Instead, you can use an API such as ProxyScrape to download a list of free proxies.

The following get_proxies function fetches a list of proxies from the API:

def get_proxies():
   url = "https://api.proxyscrape.com/v2/?request=displayproxies&protocol=http&timeout=10000&country=all&ssl=all&anonymity=all"

   response = requests.request("GET", url)

   print(response.text)
   return list(map(lambda x: x.strip(), response.text.strip().split("\n")))

Now, you can use this list of proxies in the make_request function:

PROXIES = []
current = 0

def make_request(method, url, max_attempts = 3):
   response = None
   global current, PROXIES
   attempt = 1

   while attempt <= max_attempts:
      if len(PROXIES) == 0:
         print("Fetching new proxies")
         PROXIES = get_proxies()
         print(PROXIES)
      try:
         print(f'Attempt {attempt}')
         http_proxy = PROXIES[current]
         proxies = {
               'http': 'http://' + http_proxy
         }
         print(f'Using proxy {current}')
         response = requests.request(method, url, proxies=proxies, timeout=5)
         if response.status_code != 200:
            PROXIES.pop(current) # remove the dead proxy
            current = random.randrange(len(PROXIES)) if PROXIES else 0
            print(f'Request failed. Picked new proxy {current}')
         else:
            break
         attempt += 1
      except Exception as e:
         print(e)
         PROXIES.pop(current) # remove the dead proxy
         current = random.randrange(len(PROXIES)) if PROXIES else 0
         print(f'Request failed. Picked new proxy {current}')
         attempt += 1

   return response

Note that for simplicity, this code deals with HTTP proxies only. You can make an HTTP request using this function:

for i in range(10):
   response = make_request("get", "http://httpbin.org/ip")
   if response is not None:
      print(response.json())

The full code looks like this:

import random
import requests

PROXIES = []
current = 0
def get_proxies():
   url = "https://api.proxyscrape.com/v2/?request=displayproxies&protocol=http&timeout=10000&country=all&ssl=all&anonymity=all"

   response = requests.request("GET", url)

   print(response.text)
   return list(map(lambda x: x.strip(), response.text.strip().split("\n")))

def make_request(method, url, max_attempts = 3):
   response = None
   global current, PROXIES
   attempt = 1

   while attempt <= max_attempts:
      if len(PROXIES) == 0:
         print("Fetching new proxies")
         PROXIES = get_proxies()
         print(PROXIES)
      try:
         print(f'Attempt {attempt}')
         http_proxy = PROXIES[current]
         proxies = {
               'http': 'http://' + http_proxy
         }
         print(f'Using proxy {current}')
         response = requests.request(method, url, proxies=proxies, timeout=5)
         if response.status_code != 200:
            PROXIES.pop(current) # remove the dead proxy
            current = random.randrange(len(PROXIES)) if PROXIES else 0
            print(f'Request failed. Picked new proxy {current}')
         else:
            break
         attempt += 1
      except Exception as e:
         print(e)
         PROXIES.pop(current) # remove the dead proxy
         current = random.randrange(len(PROXIES)) if PROXIES else 0
         print(f'Request failed. Picked new proxy {current}')
         attempt += 1

   return response

for i in range(10):
   response = make_request("get", "http://httpbin.org/ip")
   if response is not None:
      print(response.json())
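
Because free proxy lists usually contain many dead entries, you might want to filter the list before using it. The following is a minimal sketch that tests each proxy in parallel, reusing the get_proxies function from above; the worker count, test URL, and timeout are arbitrary choices:

from concurrent.futures import ThreadPoolExecutor

import requests

def is_alive(proxy):
   # treat any failure (timeout, connection error, bad status) as a dead proxy
   try:
      response = requests.get('http://httpbin.org/ip',
                              proxies={'http': 'http://' + proxy}, timeout=5)
      return response.status_code == 200
   except Exception:
      return False

def filter_proxies(proxies):
   # check proxies concurrently so a long list doesn't take minutes
   with ThreadPoolExecutor(max_workers=20) as executor:
      results = list(executor.map(is_alive, proxies))
   return [p for p, alive in zip(proxies, results) if alive]

PROXIES = filter_proxies(get_proxies())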

How to Use Proxies with requests-futures

The Requests library is designed to be synchronous: when you make an HTTP request with requests, code execution is blocked until the request finishes or an exception occurs. In some situations, you might prefer to process requests asynchronously so that execution isn’t blocked. The requests-futures library is a small add-on that brings concurrency to Requests.

You can install the library by running pip install requests-futures.

Let’s consider the following synchronous code that makes two requests:

import requests

session = requests.Session()

# perform an HTTP GET request
response = session.get('https://httpbin.org/ip')
print("Request 1")
print(response.json())

response = session.get('https://httpbin.org/ip')
print("Request 2")
print(response.json())

If you run the previous code, your output looks like this:

Request 1
{'origin': '100.20.101.111'}
Request 2
{'origin': '100.20.101.111'}

You can turn it into asynchronous code using requests-futures:

from requests_futures.sessions import FuturesSession

session = FuturesSession()


# perform an HTTP GET request in the background
future_one = session.get('https://httpbin.org/ip')

# the second request starts immediately
future_two = session.get('https://httpbin.org/ip')

# wait for the result
response = future_one.result()
print("Request 1")
print(response.json())

response = future_two.result()
print("Request 2")
print(response.json())

The output is as follows:

Request 1
{'origin': '100.20.101.111'}
Request 2
{'origin': '100.20.101.111'}

Although the output looks the same, in the second code snippet, the second request starts immediately, without waiting for the first one to finish.

Using proxies with requests-futures works the same way as with Requests; you simply set session.proxies:

session.proxies = {
   'http': 'PROXY_URL_1',
   'https': 'PROXY_URL_2'
}
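
Because FuturesSession returns standard Future objects, you can combine session-level proxies with several concurrent requests and handle the responses as they complete. A minimal sketch, using the same placeholder proxy URLs:

from concurrent.futures import as_completed

from requests_futures.sessions import FuturesSession

session = FuturesSession()
session.proxies = {
   'http': 'PROXY_URL_1',
   'https': 'PROXY_URL_2'
}

# start five requests through the proxy without waiting for each other
futures = [session.get('https://httpbin.org/ip') for _ in range(5)]

# handle each response as soon as it completes
for future in as_completed(futures):
   try:
      print(future.result().json())
   except Exception as e:
      print(e)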

Conclusion

Whether you’re writing a web scraper or crawler in Python using the Requests library, you might need to use proxies. Proxies prevent malicious actors from snooping on your IP address. They can also help you bypass georestrictions and scraping bans.

In this article, you saw how you can use proxies with the Requests library. You also learned how to include proxy details in requests.get and requests.Session, as well as how to implement advanced techniques, such as rotating proxies and handling errors.

Not sure what provider to choose? Go over our list of the best proxy providers.
