n8n and 429s – Dealing with API Rate Limits
You’re running an n8n workflow that processes 500 items through an external API. Everything runs smoothly until item 347 hits a 429 “Too Many Requests” error. Existential dread sets in… it’s over.

Here are the most common outcomes I see when throttling issues happen:
- Stop on Error – this is the default for n8n
- Retry on Fail – if you configure this, the node will try a few more times and then give up
- Continue on Error – execution chugs along, but the failed items carry error data that hard-fails something downstream
I have a microcap trading bot built on n8n (no, I do not recommend this), and it frequently runs into 429s while pulling pricing data. Load every ticker from the NYSE and NASDAQ, then try to fetch prices at a fairly high frequency – you're bound to run into 429s. I thought I would share a pattern I use to handle rate limiting.
Why Built-in n8n Error Handling Falls Short
n8n provides solid error handling features out of the box. The “Retry on Fail” setting handles transient errors well, and the Error Trigger workflow catches catastrophic failures. However, there’s a gap between “retry a few times” and “the whole workflow crashed.” We need to fail gracefully, keep the workflow moving, and still get the data eventually.
Consider these scenarios:
- An API has sporadic rate limiting that clears after 60 seconds – longer than the max 5-retry window
- A batch of 1000 items has 3 malformed records that will never succeed
- An external service goes down for 10 minutes during your 2-hour batch job
In each case, you need a way to:
- Capture the failed items with full context
- Continue processing the remaining items
- Retry the failures later, automatically
- Escalate items that keep failing after multiple attempts
For this, I use the Dead Letter Queue pattern (DLQ).
What is a Dead Letter Queue?
A Dead Letter Queue (DLQ) is a holding area for messages or items that couldn’t be processed successfully. The term comes from message queue systems like RabbitMQ and AWS SQS, but the pattern works anywhere you’re processing batches of data.
Main Workflow:
├─ Success → Normal processing
└─ Failure → Send to DLQ table
Retry Workflow (runs hourly):
├─ Pull items from DLQ
├─ Attempt processing
├─ Success → Mark resolved in DLQ
└─ Failure → Increment attempt counter
      └─ If attempts ≥ max_attempts → Flag for manual review
Instead of losing failed items or blocking your entire workflow, failures get captured in a database table (I already run Postgres as the backend for n8n itself and for the trading data, so that’s where mine lives). A separate workflow periodically retries them. Items that fail repeatedly stay in the queue for manual investigation.
Lab Setup
I like a clean lab that clearly demonstrates the concept.
- A Postgres database (it’s what I already have – feel free to use whatever queuing mechanism makes sense in your environment)
- If your n8n environment is running on SQLite and you don’t have Postgres available, switch over now before you run into SQLite’s concurrency limits
- Two n8n workflows: the main processing workflow and a retry workflow
- A chaos-testing endpoint to simulate real-world API failures – for the lab, this is a Cloudflare Worker that returns 429s at random, with periodic bursts
Step 1: Create a DLQ Table
If you already have a Postgres database outside of n8n’s internal database where you’re storing application data – great! Put the new table there. If not, create one:
CREATE DATABASE workflow_data;
Then, connect to the new database and create the table:
CREATE TABLE workflow_dlq (
    id SERIAL PRIMARY KEY,
    workflow_name VARCHAR(100) NOT NULL,
    item_payload JSONB NOT NULL,
    error_message TEXT,
    error_code VARCHAR(10),
    attempts INTEGER DEFAULT 1,
    max_attempts INTEGER DEFAULT 5,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_attempt_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    resolved_at TIMESTAMP,
    status VARCHAR(20) DEFAULT 'pending'
);
CREATE INDEX idx_dlq_status ON workflow_dlq(status);
CREATE INDEX idx_dlq_workflow ON workflow_dlq(workflow_name);
Key fields:
- item_payload: The complete original item as JSON, so you have everything needed to retry
- attempts: How many times we’ve tried to process this item
- status: “pending”, “retrying”, “resolved”, or “failed” (exceeded max attempts)
Now, all we need is a credential in n8n that lets a workflow connect to the new table.

And then the actual creds. In my lab, n8n and Postgres run in Docker containers. The Postgres container is literally named “postgres”, so that’s the host name I use. Make sure you have a network route you understand from n8n to your database instance.

Step 2: Configure Your Main Workflow for Error Routing
The critical change: enable the error output on nodes that call external APIs. Some of you may already have this – if not, enable it.
- Open your HTTP Request or API node
- Go to Settings
- Optionally set Retry on Fail (recommended)
- Set On Error to “Continue (using error output)”

This gives the node two outputs:
- Output 1: Items that succeeded
- Output 2: Items that failed (with error details attached)
When you process 100 items and 3 fail, 97 items flow out of Output 1 and 3 items flow out of Output 2. Each failed item carries its original data plus the error information. The “Retry on Fail” in combination with the DLQ path (On Error – Continue) gives us the best chance for success in getting the data from the target APIs.
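n8n’s built-in retry waits a fixed interval between tries, but a well-behaved client honors the Retry-After header that 429 responses often carry. If you ever roll your own retry loop in a Code node, a delay calculator might look like this – a minimal sketch, and `backoffDelayMs` is a hypothetical helper, not an n8n built-in:

```javascript
// Hypothetical helper: pick the delay before the next retry attempt.
// Honors a Retry-After value (in seconds) when the server provides one,
// otherwise falls back to capped exponential backoff.
function backoffDelayMs(attempt, retryAfterSeconds = null) {
  if (retryAfterSeconds !== null) {
    return retryAfterSeconds * 1000; // trust the server's hint
  }
  // 1s, 2s, 4s, 8s, ... capped at 60s
  return Math.min(1000 * 2 ** attempt, 60000);
}
```

With no header, attempt 0 waits 1 second and attempt 3 waits 8 seconds; a `Retry-After: 3` always wins and yields 3 seconds.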
Now, from the Error output for the HTTP Request node, we can attach a Postgres Insert node:

Now, when a fail happens it should get inserted into the database:
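If you prefer the Postgres node’s Execute Query mode over the column mapper, the insert can be written out explicitly. This is a sketch against the schema from Step 1 – the workflow name is made up, and passing values via the node’s query parameters (rather than interpolating JSON into the SQL string) is my assumption about the cleanest way to avoid quoting problems:

```sql
INSERT INTO workflow_dlq (workflow_name, item_payload, error_message, error_code)
VALUES (
    'price_fetch',   -- hypothetical workflow name
    $1::jsonb,       -- the original item, passed as a query parameter
    $2,              -- error message from the node's error output
    $3               -- HTTP status code, e.g. '429'
);
```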


Success!
Step 3: Build the Retry Workflow
Here’s the layout for the DLQ retry workflow. It’s a little more cautious than the original flow: we process one item at a time, and we kick it off only about once per hour (or at least at a slower cadence than the original flow). The goal here is to drain the queue successfully – we’re already delayed, so there’s no reason to make it worse.

- Schedule Trigger – at a slower cadence than the original workflow
- GET DLQ – a query to our workflow table to get the current DLQ
SELECT * FROM workflow_dlq
WHERE status = 'pending'
AND attempts < max_attempts
ORDER BY created_at ASC
LIMIT 50;
- Loop – we’re going to process each item in the DLQ 1 at a time – no concurrency
- HTTP Request – identical to the original request, configured with the same Retry on Fail and On Error – Continue settings
- Resolve on Success – flag the item in the DLQ from pending -> resolved if the HTTP request is successful
UPDATE workflow_dlq
SET status = 'resolved', resolved_at = CURRENT_TIMESTAMP
WHERE id = {{ $('Loop Over Items').item.json.id }}
- Increment Error Attempts – updates the DLQ attempts column if we have another fail
UPDATE workflow_dlq
SET attempts = attempts + 1,
last_attempt_at = CURRENT_TIMESTAMP,
status = CASE WHEN attempts >= max_attempts - 1 THEN 'failed' ELSE 'pending' END
WHERE id = {{ $('Loop Over Items').item.json.id }}
- Get Attempts – queries the DB for attempts for that specific id/record
SELECT id, attempts, max_attempts FROM workflow_dlq WHERE id = {{ $('Loop Over Items').item.json.id }};
- IF – checks if attempts >= max_attempts
- If the IF is true, we send a failure notification
Nothing too magical here – we’re just going to process each item in the DLQ one at a time and give it the highest chance for success. Hopefully, we’re able to get our data and the DLQ will move to resolved.
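Once items start exceeding max_attempts, a quick query surfaces what needs human eyes (this assumes the schema from Step 1):

```sql
SELECT id, workflow_name, error_code, error_message, attempts, last_attempt_at
FROM workflow_dlq
WHERE status = 'failed'
ORDER BY last_attempt_at DESC;
```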
You can download the sample workflows here: https://github.com/scomurr/blog_post_assets/raw/refs/heads/main/DLQ.zip
Additionally, here is the Cloudflare worker code I used for the chaos endpoint:
export default {
  async fetch(request) {
    const now = Math.floor(Date.now() / 1000);
    const cyclePosition = now % 20;
    const inFlurry = cyclePosition >= 12 && cyclePosition <= 17;
    const random = Math.random();
    const should429 = inFlurry ? random < 0.8 : random < 0.05;
    if (should429) {
      return new Response(JSON.stringify({
        error: "Too Many Requests",
        retry_after: Math.floor(Math.random() * 5) + 1
      }), {
        status: 429,
        headers: {
          "Content-Type": "application/json",
          "Retry-After": "3"
        }
      });
    }
    const url = new URL(request.url);
    const id = url.searchParams.get("id") || "unknown";
    return new Response(JSON.stringify({
      id: id,
      processed: true,
      timestamp: new Date().toISOString()
    }), {
      status: 200,
      headers: { "Content-Type": "application/json" }
    });
  }
};
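If you want to sanity-check the flurry timing without deploying the worker, the 429 decision can be lifted into a plain function and exercised locally:

```javascript
// Same throttling logic as the worker, extracted for local testing.
// Each 20-second cycle has a "flurry" window (seconds 12-17) where
// 80% of requests get a 429; outside the window, only 5% do.
function should429(nowSeconds, random) {
  const cyclePosition = nowSeconds % 20;
  const inFlurry = cyclePosition >= 12 && cyclePosition <= 17;
  return inFlurry ? random < 0.8 : random < 0.05;
}
```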
Real-World Example: Batch LLM Processing
Here’s how this pattern applies to a real workflow for me: scoring 500 stock tickers using multiple LLM providers.
Each ticker is one item. The workflow sends all 500 items through the same nodes – there’s no branching per ticker. But when one item fails (rate limit, timeout, bad data), I need to capture it without stopping the other 499.
The Challenge:
- OpenAI, Gemini, and local Llama models all have different rate limits
- Processing takes about an hour per run – I can’t afford to lose progress
- Some payloads have malformed data that will always fail
The Solution:
- Main workflow sends each ticker to the LLM with error routing enabled
- Successful scores go directly to the results table
- Failed items (429s, timeouts, malformed data) go to the DLQ
- Retry workflow runs every 30 minutes
- Items failing 5+ times get flagged for manual review
Result: A batch that previously required babysitting now runs overnight. Failed items get captured and retried automatically. The handful of truly broken records get flagged instead of silently disappearing.
When to Use the DLQ Pattern
Good candidates:
- Batch processing workflows (100+ items)
- External API calls with rate limits
- Long-running workflows where partial failure is acceptable
- Workflows where data loss is unacceptable
I obviously wouldn’t use this pattern if the workflow is all-or-nothing – if the data is too time-sensitive, going back for it later might not work. And if the workflow will never process more than a handful of items (say 10) at a time, there’s no reason to implement something like this either.
Summary
The Dead Letter Queue pattern fills a gap in n8n’s built-in error handling: what to do with items that fail after retries but shouldn’t be lost.
By routing failed items to a database table and processing them with a separate retry workflow, you get:
- Zero data loss: Every failed item is captured with full context
- Automatic recovery: Transient failures get retried without manual intervention
- Visibility: You know exactly what failed, when, and why
- Graceful degradation: Main workflow keeps processing instead of blocking on failures
This is a pattern that scales. Whether you’re processing 100 items or 100,000, the DLQ keeps your workflows resilient without adding complexity to your main processing logic.