
How to Use a Proxy in Node Fetch

In this article, you'll learn how to integrate and use a proxy with Node Fetch.


Node Fetch is a widely used npm package with approximately 53 million weekly downloads. It's a lightweight module that brings the Fetch API to Node.js. With it, you can make GET, POST, DELETE, and other HTTP requests directly from your code.

Despite its popularity for tasks like web scraping, Node Fetch doesn't support proxies natively. You can pair it with the https-proxy-agent package to route requests through a proxy, which reduces the risk of your IP getting blocked.



Using Proxies with Node Fetch

Before you begin this tutorial, make sure you have Node.js 14 or newer.

Configure Your Codebase

To start, navigate to the folder where you want your project to live and initialize a Node.js project with npm init. Fill in the requested information (e.g., license, author name). When you're done, your package.json should look something like this:

{
  "name": "proxies",
  "version": "1.0.0",
  "description": "A simple Node script to scrape using proxies",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "KealanP",
  "license": "ISC"
}

Install Your Dependencies and Create a Node Script

For this tutorial, you need two dependencies:

  • https-proxy-agent: The package you use to configure and utilize a proxy.
  • Node Fetch: The package through which you make your network requests.

Run the following commands to install the dependencies:

npm i https-proxy-agent
npm i node-fetch

When you run npm init, the "main" field in package.json defaults to index.js. Create this file in the root of your project. It's where you make your network requests and configure your proxies.

After you’ve created your index.js file, it’s time to make a network request (without a proxy).

Start by creating the base structure of the app: an async function named scrape that holds the scraper logic. Paste the following code into index.js:

import fetch from 'node-fetch';

const scrape = async () => {

};

scrape();

Here, you’re importing node-fetch and creating the scrape function, which you then call. Right now, the function body is empty, but you’ll fill it with the scraper code next.

node-fetch v3 is an ESM-only package, so it has to be loaded with import rather than require. To make this work, add "type": "module" at the top level of your package.json. This tells Node.js to treat your .js files as ES modules.

Your package.json should look something like this:

{
  "name": "proxies",
  "type": "module",
  "version": "1.0.0",
  "description": "A simple Node script to scrape using proxies",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "KealanP",
  "license": "ISC",
  "dependencies": {
    "https-proxy-agent": "^7.0.4",
    "node-fetch": "^3.3.2"
  }
}

Next, decide which URL to send an HTTP request to. This article uses ident.me, a service that echoes back the client's IP address. Because it shows which IP the request came from, you can tell whether the proxy is working. ident.me can be used without prior permission as long as you don't exceed 5,000 requests per second.

Modify index.js to look like this:

import fetch from 'node-fetch';

const scrape = async () => {
    const url = 'https://ident.me/';
    const response = await fetch(url);
    const text = await response.text();
    console.log(text);
};

scrape();

Run the script with node index.js. You should see your device's public IP address logged:

161.102.11.6

To confirm that this is your real public IP, open https://www.whatismyip.com/ in a browser. Both should show the same address.


Retrieve Your Proxy from a Proxy Aggregator

There are many ways to obtain a proxy, but one of the easiest is a public proxy list. This tutorial uses GeoNode's free proxy list, which is updated every five minutes and has numerous proxies available. Open the list to see the available proxies:

A table of available proxies

It's best to pick a proxy near the top of the list because it's more likely to be fresh. The two columns to pay attention to are IP ADDRESS and PORT, since you use these to configure your proxy.

Copy the host (i.e., the IP address) and the port, and keep them handy. You need them to configure your proxy in the next step.
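You may want to copy several proxies from the list at once. In that case, it can be convenient to keep them as host:port strings and convert them into the { host, port } objects that this tutorial uses later for rotation. The parseProxy helper below is a hypothetical sketch and is not part of node-fetch or https-proxy-agent. It assumes IPv4 hosts like the ones in the GeoNode table.

```javascript
// Hypothetical helper (not part of node-fetch or https-proxy-agent):
// converts a "host:port" string copied from a proxy list into { host, port }.
// Assumes an IPv4 address or hostname; IPv6 hosts would need brackets.
function parseProxy(entry) {
  const trimmed = entry.trim();
  const separator = trimmed.lastIndexOf(':');
  if (separator <= 0) {
    throw new Error(`Expected "host:port", got "${entry}"`);
  }

  const host = trimmed.slice(0, separator);
  const port = Number(trimmed.slice(separator + 1));

  // Valid TCP ports are integers from 1 to 65535.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port in "${entry}"`);
  }
  return { host, port };
}

const proxies = ['43.134.68.153:3128', '221.140.235.236:5002', '8.219.97.248:80']
  .map(parseProxy);
console.log(proxies);
```

Rejecting malformed entries up front means a typo surfaces as a clear error here, not later as a confusing network failure.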


Configure Your Proxy

Once you’ve obtained your proxy details, it’s time to add this information to your script.

You pass the proxy details to your request as an agent. In Node.js, an agent (an http.Agent or a subclass of it) manages the connections behind your HTTP requests, so a custom agent can route those requests through a proxy.

To configure your proxy, build a proxy URL from the host and port you copied in the previous step, in the format http://<PROXY_HOST>:<PROXY_PORT>.

Paste the following code into index.js, replacing the proxy host and port with your own:

import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

const scrape = async () => {
    // Configuring our proxy details
    const proxyHost = '43.134.32.184';
    const proxyPort = 3128;
    const proxyConfig = `http://${proxyHost}:${proxyPort}`;
    const proxyAgent = new HttpsProxyAgent(proxyConfig);

    const url = 'https://ident.me/';

    const response = await fetch(url, { agent: proxyAgent });
    const text = await response.text();
    console.log(text);
};

scrape();

This code creates an HttpsProxyAgent from the proxy URL and passes it to fetch through the agent option. The agent connects to the proxy and issues an HTTP CONNECT request, so the HTTPS request to ident.me is tunneled through the proxy rather than sent directly.
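The snippet above builds proxyConfig with a template literal. An alternative is Node.js's built-in URL class, which rejects unparseable hosts early. The buildProxyUrl helper below is hypothetical. Its optional credentials argument assumes a paid proxy that requires a username and password; the free proxies in this tutorial don't need one. https-proxy-agent reads such credentials from the user:password@ part of the proxy URL.

```javascript
// Hypothetical helper: builds http://<PROXY_HOST>:<PROXY_PORT> with the
// WHATWG URL class, which is available globally in Node.js.
function buildProxyUrl(host, port, credentials) {
  // Throws a TypeError if the host can't be parsed.
  const proxyUrl = new URL(`http://${host}`);
  proxyUrl.port = String(port);

  if (credentials) {
    // The username/password setters percent-encode characters such as "@".
    proxyUrl.username = credentials.username;
    proxyUrl.password = credentials.password;
  }
  // href ends with "/", the URL's empty path.
  return proxyUrl.href;
}

console.log(buildProxyUrl('43.134.32.184', 3128));
// http://43.134.32.184:3128/
console.log(buildProxyUrl('43.134.32.184', 3128, { username: 'user', password: 'p@ss' }));
// http://user:p%40ss@43.134.32.184:3128/
```

You would then pass the returned string to new HttpsProxyAgent(...) exactly as proxyConfig is passed in the snippet above.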

When you run this code, you should see the proxy's IP address logged instead of your own:

43.134.32.184
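If you'd rather check this in code than by eye, one option is to request ident.me twice, once without the proxy and once through it, and compare the two addresses. Comparing against your direct IP instead of against proxyHost also covers proxies whose outgoing address differs from the address you connect to. isProxyHidingIp is a hypothetical helper that only compares the two response bodies; you fetch them exactly as in the earlier snippets.

```javascript
// Hypothetical helper: compares the body ident.me returned without a proxy
// (directIp) with the body it returned through the proxy (proxiedIp).
// Returns true only if the proxied request reported a different, non-empty IP.
function isProxyHidingIp(directIp, proxiedIp) {
  const direct = directIp.trim();
  const proxied = proxiedIp.trim();
  return proxied.length > 0 && proxied !== direct;
}

console.log(isProxyHidingIp('161.102.11.6\n', '43.134.32.184')); // true
console.log(isProxyHidingIp('161.102.11.6', '161.102.11.6\n')); // false
```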

Rotate Your Proxies

The benefit of using a proxy is that it adds a layer of indirection between you and the sites you scrape. Because the proxy acts as a middleman, the target site sees the proxy's IP address instead of yours, so any block lands on the proxy rather than your real IP.

However, if you make too many requests using your proxy server, its IP address can get blocked. That’s why you need to rotate the proxies you use.

Spreading your requests across several proxies makes your scraping harder to identify and block, and it reduces the impact of losing any single proxy server. Adding multiple proxies is relatively simple. Store the host and port of each proxy in an array called proxies. Then iterate over the array, build a proxy URL for each entry, and create a proxyAgent from that URL.

Pass each proxy agent to your node-fetch request and request the URL you configured earlier. The code looks something like this:

import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

const proxies = [
  { host: '43.134.68.153',   port: 3128 },
  { host: '221.140.235.236', port: 5002 },
  { host: '8.219.97.248',    port: 80   },
];

async function scrapeWithRotatingProxies(proxies, url) {
  for (const proxy of proxies) {
    try {
      const proxyConfig = `http://${proxy.host}:${proxy.port}`;
      const proxyAgent = new HttpsProxyAgent(proxyConfig);

      const response = await fetch(url, { agent: proxyAgent });
      const text = await response.text();

      console.log(text);
    } catch (err) {
      console.error(err);
    }
  }
}

const url = 'https://ident.me/ip';
await scrapeWithRotatingProxies(proxies, url);

Each request in this loop goes out through a different proxy, which makes it much harder for the target site to block your scraper.
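The loop above sends one request through each proxy. If you have more URLs to scrape than proxies, a common approach is round-robin rotation: each request takes the next proxy in the list, and the list wraps around at the end. createProxyRotator below is a hypothetical sketch of that idea. In index.js, you would create the agent per request with new HttpsProxyAgent(nextProxyUrl()) and pass it to fetch as before.

```javascript
// Hypothetical round-robin rotator: every call returns the next proxy URL,
// starting over from the first proxy after the last one has been used.
function createProxyRotator(proxyList) {
  if (proxyList.length === 0) {
    throw new Error('At least one proxy is required');
  }
  let index = 0;
  return function nextProxyUrl() {
    const { host, port } = proxyList[index];
    index = (index + 1) % proxyList.length;
    return `http://${host}:${port}`;
  };
}

const nextProxyUrl = createProxyRotator([
  { host: '43.134.68.153', port: 3128 },
  { host: '221.140.235.236', port: 5002 },
]);

console.log(nextProxyUrl()); // http://43.134.68.153:3128
console.log(nextProxyUrl()); // http://221.140.235.236:5002
console.log(nextProxyUrl()); // http://43.134.68.153:3128 (wrapped around)
```

Round-robin spreads requests evenly across proxies. Picking a random proxy per request is an equally simple variant if you'd rather make the order less predictable.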

All the code for this tutorial can be found in this GitHub repo.
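Free proxies often go offline or time out. The rotation loop logs each failure and carries on, but sometimes you only need the first proxy that works. The fetchWithFailover helper below is a hypothetical sketch of that pattern. It takes an attempt function that performs one request through a given proxy URL, which lets the demo run offline with a fake attempt. In index.js, attempt could wrap fetch(url, { agent: new HttpsProxyAgent(proxyUrl) }) and then read response.text().

```javascript
// Hypothetical failover helper: tries each proxy in order and resolves with the
// result of the first successful attempt. `attempt` is any async function that
// performs one request through the given proxy URL.
async function fetchWithFailover(proxies, attempt) {
  const errors = [];
  for (const { host, port } of proxies) {
    const proxyUrl = `http://${host}:${port}`;
    try {
      return await attempt(proxyUrl);
    } catch (err) {
      // Remember why this proxy failed, then move on to the next one.
      errors.push(`${proxyUrl}: ${err.message}`);
    }
  }
  throw new Error(`All proxies failed:\n${errors.join('\n')}`);
}

// Offline demo with a fake `attempt`: the first proxy "times out"
// and the second one succeeds.
const demoProxies = [
  { host: '43.134.68.153', port: 3128 },
  { host: '221.140.235.236', port: 5002 },
];
fetchWithFailover(demoProxies, async (proxyUrl) => {
  if (proxyUrl.includes('3128')) throw new Error('timed out');
  return `response via ${proxyUrl}`;
}).then(console.log); // response via http://221.140.235.236:5002
```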


Conclusion

Proxies are vital in web scraping. They protect your IP address and let you circumvent geoblocking and other restrictions.

In this article, you learned how to get a proxy, build a proxy URL from its host and port, and plug the proxy into your Node Fetch requests to scrape effectively. With this knowledge, you can build web scrapers that are far less likely to get blocked.
