Alchemy Recipe · Advanced · workflow

Pinecone-backed customer support bot in n8n, with the gotchas nobody mentions

A complete RAG customer support bot in n8n that reads from a Pinecone vector index of your documentation. The happy path is 45 minutes of work. The four things that will bite you in production take another two weeks to learn. Here are all five.

  • Time saved: 20-40 hrs/week at scale
  • Monthly cost: ~£85 / $107 (n8n Cloud + Pinecone Starter + OpenAI API)
  • Published:

A customer types "how do I cancel my subscription" into the chat widget on your site. A support bot responds, ideally by finding the correct cancellation instructions from your documentation, writing a short natural-language answer, and linking the relevant page. If the bot gets it right, the ticket never reaches a human. If it gets it wrong, you get a complaint instead of a cancellation, and the human still has to deal with it.

This workflow builds that bot in n8n using Pinecone as the vector index and the OpenAI API for embeddings and generation. The happy path is 45 minutes of work. Getting it production-ready is another two weeks of hitting the failure modes below. I am writing this post because every RAG tutorial I have read covers the happy path and then disappears, and the failure modes are the whole reason this kind of system is hard.

What you'll build

  • A Pinecone index containing embeddings of every page in your documentation
  • An n8n workflow that ingests new or updated docs and keeps the index fresh
  • A second n8n workflow exposed as a webhook, which a chat widget can POST user questions to and receive a grounded answer back
  • A rotation mechanism so the index stays accurate as documentation changes
  • Logging so you can audit bad answers after the fact

Prerequisites

  • An n8n account. Cloud starts at £20/mo for the Starter tier, or you can self-host on a VPS for free (about £5/mo on Hetzner or similar).
  • A Pinecone account. The Starter tier is free and handles around 100k vectors, which is enough for roughly 500 pages of documentation with typical chunking. Production use cases typically need the Standard tier at £50/mo.
  • An OpenAI API key with access to text-embedding-3-small (for embeddings) and gpt-4o-mini or gpt-4o (for generation).
  • Your documentation in a format you can read programmatically. Markdown files in a Git repo, a Notion database, or a CMS API all work. This post uses markdown files.
  • About 2 hours for the initial build. Longer if you have to clean up messy source documentation first.

How to build it

Step 1: Create the Pinecone index

Log into Pinecone, create a new index with:

  • Name: support-docs
  • Dimension: 1536 (this is the dimension of text-embedding-3-small)
  • Metric: cosine
  • Pod type: Starter (or Serverless on newer accounts)

Copy the environment name (or, on serverless accounts, the index host URL) and the API key. You will need both in n8n.
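
If you would rather script the index creation than click through the dashboard, the same settings can be sent to Pinecone's REST create-index endpoint. A minimal sketch, assuming the current serverless API at api.pinecone.io (pod-based accounts use a different host and spec shape):

```javascript
// Build the request for Pinecone's create-index endpoint.
// Assumes the serverless API; adjust cloud/region to your account.
function buildCreateIndexRequest(apiKey) {
  return {
    url: "https://api.pinecone.io/indexes",
    method: "POST",
    headers: {
      "Api-Key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "support-docs",
      dimension: 1536, // must match text-embedding-3-small
      metric: "cosine",
      spec: { serverless: { cloud: "aws", region: "us-east-1" } },
    }),
  };
}

const req = buildCreateIndexRequest("YOUR_API_KEY");
// fire it with: fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

The dimension and metric here must match what the ingest workflow produces; a mismatch fails at upsert time, not at creation time.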

Step 2: Ingest your documentation

Create an n8n workflow called "Ingest Support Docs" with these nodes:

  1. Manual Trigger (you will swap this for a Git webhook or a schedule later)
  2. Read Files from Disk or HTTP Request to fetch your docs
  3. Code node to chunk the documents
  4. OpenAI Embeddings to embed each chunk
  5. Pinecone (upsert) to write the vectors

The chunking step is where most tutorials get handwavy. Here is the actual Code node I use, which splits documents at natural boundaries with some overlap so sentences that span chunks do not lose their context:

// n8n Code node, JavaScript, runs once for all items
const CHUNK_SIZE = 600;        // characters, roughly 150 tokens
const CHUNK_OVERLAP = 100;     // characters of overlap between chunks

const outputs = [];

for (const item of items) {
  const doc = item.json;
  const text = doc.content;
  const slug = doc.slug;

  // Split on paragraph boundaries first
  const paragraphs = text.split(/\n\s*\n/);

  // Greedy-pack paragraphs into chunks up to CHUNK_SIZE
  let currentChunk = "";
  let chunkIndex = 0;

  for (const para of paragraphs) {
    if ((currentChunk + "\n\n" + para).length > CHUNK_SIZE) {
      if (currentChunk.length > 0) {
        outputs.push({
          json: {
            id: `${slug}#${chunkIndex}`,
            text: currentChunk.trim(),
            metadata: {
              slug,
              title: doc.title,
              url: doc.url,
              chunk_index: chunkIndex,
            },
          },
        });
        chunkIndex += 1;

        // Start new chunk with overlap from the end of the previous
        const overlapStart = Math.max(0, currentChunk.length - CHUNK_OVERLAP);
        currentChunk = currentChunk.substring(overlapStart) + "\n\n" + para;
      } else {
        // Paragraph itself is larger than CHUNK_SIZE, hard split
        for (let i = 0; i < para.length; i += CHUNK_SIZE - CHUNK_OVERLAP) {
          outputs.push({
            json: {
              id: `${slug}#${chunkIndex}`,
              text: para.substring(i, i + CHUNK_SIZE),
              metadata: { slug, title: doc.title, url: doc.url, chunk_index: chunkIndex },
            },
          });
          chunkIndex += 1;
        }
        currentChunk = "";
      }
    } else {
      currentChunk += (currentChunk ? "\n\n" : "") + para;
    }
  }

  // Don't forget the last chunk
  if (currentChunk.length > 0) {
    outputs.push({
      json: {
        id: `${slug}#${chunkIndex}`,
        text: currentChunk.trim(),
        metadata: { slug, title: doc.title, url: doc.url, chunk_index: chunkIndex },
      },
    });
  }
}

return outputs;

Wire the output of this node into an OpenAI → Embeddings node, which takes {{$json.text}} as the input and produces a 1536-dimension vector. Then wire that into the Pinecone → Upsert node with:

  • Index: support-docs
  • Vector: the embedding output
  • ID: {{$json.id}}
  • Metadata: {{$json.metadata}} (and also store the chunk text itself in metadata as text so you can read it back at query time)

Run the workflow once manually with your full docs corpus. For roughly 500 pages of documentation, this takes 5-10 minutes and costs about £0.20 in OpenAI embedding fees.
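
For reference, each record the upsert writes has this shape. The n8n Pinecone node assembles it for you from the field mappings above; `toPineconeRecord` is a hypothetical helper shown only to make the shape explicit:

```javascript
// Assemble one Pinecone upsert record from a chunk and its embedding.
// Note the chunk text is duplicated into metadata so the query workflow
// can read it back without a second datastore.
function toPineconeRecord(chunk, embedding) {
  return {
    id: chunk.id,          // e.g. "cancel-subscription#0"
    values: embedding,     // 1536 floats from text-embedding-3-small
    metadata: {
      ...chunk.metadata,
      text: chunk.text,    // stored so it can be read back at query time
    },
  };
}

const record = toPineconeRecord(
  {
    id: "cancel-subscription#0",
    text: "To cancel your subscription, go to Settings...",
    metadata: {
      slug: "cancel-subscription",
      title: "Cancelling your subscription",
      url: "https://docs.acme.com/cancel-subscription",
      chunk_index: 0,
    },
  },
  new Array(1536).fill(0) // placeholder embedding for illustration
);
```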

Step 3: Build the query workflow

Create a second n8n workflow called "Support Bot". Nodes:

  1. Webhook trigger (POST, with question in the body)
  2. OpenAI → Embeddings to embed the user question
  3. Pinecone → Query to retrieve the top 5 chunks
  4. Code node to build the RAG prompt
  5. OpenAI → Chat to generate the answer
  6. Respond to Webhook with the answer and the source URLs

The prompt-building Code node:

// The question lives on the Webhook trigger's output, not on the Pinecone
// items feeding this node (adjust 'Webhook' if you renamed the trigger)
const question = $('Webhook').first().json.body.question;
const matches = items[0].json.matches || [];

const context = matches
  .map((m, i) => `[${i + 1}] ${m.metadata.title} (${m.metadata.url})\n${m.metadata.text}`)
  .join("\n\n---\n\n");

const sources = matches.map((m) => ({
  title: m.metadata.title,
  url: m.metadata.url,
  score: m.score,
}));

return [{
  json: {
    prompt: `You are a customer support bot for Acme Ltd.

Answer the customer's question using ONLY the documentation excerpts below.
If the documentation does not contain the answer, say "I do not have that
information, please contact support@acme.com". Do not guess. Cite the
source number in square brackets like [1] when you reference information.

Documentation:
${context}

Customer question: ${question}`,
    sources,
    question,
  }
}];

The Chat node uses GPT-4o-mini with:

  • Model: gpt-4o-mini
  • Temperature: 0.2 (keep answers consistent, don't get creative)
  • Message: {{$json.prompt}}

The final Respond to Webhook node returns:

{
  "answer": {{ JSON.stringify($json.message.content) }},
  "sources": {{ JSON.stringify($('Prompt Builder').item.json.sources) }}
}

Leave both expressions unquoted. Quoting them turns the sources array into a literal string, and any quote mark inside the answer breaks the JSON; JSON.stringify does the escaping for you.

Activate the workflow and hit the webhook URL with a test question. You should get back a grounded answer with source citations. If you do, congratulations, you have built the happy path that every RAG tutorial shows.
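
If you want to test from a script instead of the n8n UI, the request and response shapes look like this. The URL is a placeholder; use the one your Webhook node shows:

```javascript
// Shape of the request a chat widget sends to the webhook.
// The URL is a placeholder, not a real endpoint.
const request = {
  url: "https://your-n8n-instance.example/webhook/support-bot",
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ question: "How do I cancel my subscription?" }),
};

// A happy-path response looks like this (values are illustrative):
const exampleResponse = JSON.parse(`{
  "answer": "You can cancel from Settings > Billing [1].",
  "sources": [
    { "title": "Cancelling your subscription",
      "url": "https://docs.acme.com/cancel-subscription",
      "score": 0.89 }
  ]
}`);
```

On Node 18+ you can fire the request with `fetch(request.url, { method: request.method, headers: request.headers, body: request.body })`.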

Now the real work starts.

The four gotchas

1. Your chunks are bleeding into each other

Every chunk has 100 characters of overlap with its neighbour for context. Pinecone's cosine similarity loves this because the overlaps mean neighbouring chunks both score high on the same queries. The retrieval returns the top 5 matches, which turn out to be five adjacent chunks of the same page, and the model's answer uses the same information five times.

Fix: After retrieving the top 5 chunks, deduplicate by page slug in the Code node before building the context:

const seen = new Set();
const unique = [];
for (const m of matches) {
  if (!seen.has(m.metadata.slug)) {
    seen.add(m.metadata.slug);
    unique.push(m);
    if (unique.length >= 3) break;
  }
}

You end up with three chunks from three different pages, which gives the model actual breadth to work with.
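
Wrapped as a function, the dedupe behaves like this on a typical "five matches, three from the same page" result (slugs and scores are made up for illustration):

```javascript
// Keep at most `limit` matches, one per page slug, preserving score order.
function dedupeBySlug(matches, limit = 3) {
  const seen = new Set();
  const unique = [];
  for (const m of matches) {
    if (!seen.has(m.metadata.slug)) {
      seen.add(m.metadata.slug);
      unique.push(m);
      if (unique.length >= limit) break;
    }
  }
  return unique;
}

// Five matches, but the top three are adjacent chunks of the same page.
const matches = [
  { score: 0.91, metadata: { slug: "cancel-subscription" } },
  { score: 0.90, metadata: { slug: "cancel-subscription" } },
  { score: 0.89, metadata: { slug: "cancel-subscription" } },
  { score: 0.84, metadata: { slug: "billing-cycle" } },
  { score: 0.80, metadata: { slug: "refund-policy" } },
];

const unique = dedupeBySlug(matches);
// unique now holds one chunk each from cancel-subscription,
// billing-cycle, and refund-policy
```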

2. The index goes stale and nobody notices

A customer asks about your pricing. The bot confidently gives the old pricing from 9 months ago because nobody re-ingested the pricing page when you updated it. The customer is now angry at you for giving them outdated information, and the fault is that your ingest workflow ran exactly once, the day you set it up.

Fix: Re-run the ingest workflow automatically on every documentation change. The cleanest way is a webhook from your Git provider (GitHub, GitLab, or Bitbucket) that triggers n8n whenever a docs file changes on main. If your docs live in Notion, use Notion's webhook support. If your docs live in a CMS like Contentful or Sanity, they have webhooks too. The specific mechanism matters less than the rule: every documentation change triggers a re-embed.

As a belt-and-braces measure, also run the ingest on a weekly schedule to catch anything the webhook missed. Cost is negligible (under £1/week for a typical docs corpus).

3. Deletions are silent

You deleted a deprecated page from your docs. The vectors for that page are still in Pinecone. A customer asks about the deprecated feature, the bot retrieves the old chunks, the answer references a URL that is now a 404.

Fix: Track deletions explicitly. When your ingest workflow runs, compare the current list of document IDs against what is in Pinecone, and call the Pinecone delete endpoint for any IDs that no longer exist in the source. This requires maintaining a manifest file somewhere (a simple docs-manifest.json in the same repo as your docs, updated by the ingest workflow, works fine).
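
The diff itself is small. A sketch, assuming the manifest is just an array of the slugs present at the last ingest:

```javascript
// Compare the previous manifest against the current docs and return
// the slugs whose vectors should be deleted from Pinecone.
function findStaleSlugs(previousManifest, currentSlugs) {
  const current = new Set(currentSlugs);
  return previousManifest.filter((slug) => !current.has(slug));
}

const previousManifest = ["getting-started", "cancel-subscription", "legacy-importer"];
const currentSlugs = ["getting-started", "cancel-subscription"];

const stale = findStaleSlugs(previousManifest, currentSlugs);
// stale is ["legacy-importer"]: delete its vectors next. Since chunk IDs
// are "slug#index", that means listing IDs by the "legacy-importer#"
// prefix (or filtering on metadata.slug) before calling Pinecone's
// delete endpoint.
```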

A simpler alternative: fully rebuild the index on a weekly schedule. Drop the whole index, re-embed everything, upsert. This is simple and bulletproof, but it costs more in embeddings (about £2-5 per full rebuild for a 500-page corpus), and for the 5-10 minutes a rebuild takes, the bot is answering from a partial index.

4. The bot gives confident wrong answers when the retrieval is bad

Sometimes the user asks a question that legitimately is not in the docs. Pinecone still returns five "top" matches, because cosine similarity always returns something. The model sees context that is vaguely topic-adjacent but not actually useful, and writes a confident answer that is subtly wrong. This is the worst failure mode because there is no error, just a wrong answer that looks authoritative.

Fix: Check the similarity scores on the retrieved chunks before handing them to the model. If the best match is below a threshold (I typically use 0.75 for text-embedding-3-small), do not call the model at all. Instead, return a pre-written "I do not have that information" response directly.

const THRESHOLD = 0.75;
const bestScore = Math.max(...matches.map((m) => m.score));

if (bestScore < THRESHOLD) {
  return [{
    json: {
      answer: "I don't have information about that in our documentation. " +
              "Please email support@acme.com and a human will get back to you.",
      sources: [],
    }
  }];
}

This will occasionally refuse to answer questions the bot could reasonably have handled. That's OK. A refusal is recoverable, a confident wrong answer is not.

One thing worth naming before you ship this: RAG of this kind is only as good as the documentation it reads from. If your docs are sparse, written inconsistently, or contradict themselves across pages, the bot will reflect all of that exactly. No amount of prompt tuning fixes bad source material. Before you build this, spend half a day reading your own documentation as if you were a new customer and fix the worst pages first. You will get more accuracy improvement from an hour of documentation editing than from a week of vector search tuning.

Cost breakdown

  • n8n Cloud Starter: £20/mo (or £5/mo self-hosted on a small VPS)
  • Pinecone: Starter tier free, or Standard £50/mo if you need more vectors
  • OpenAI API (embeddings and GPT-4o-mini for queries): depends on volume, typically £10-30/mo for small teams
  • Total: £30 to £100/mo depending on scale

Per-query cost: roughly 0.05 pence using GPT-4o-mini. A bot handling 1000 queries a month costs about 50p in inference, and maybe £2-5 in embeddings for ingest.
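
The arithmetic, using the per-query estimate above:

```javascript
// Rough monthly inference cost in pence, using the article's
// ~0.05p/query estimate for GPT-4o-mini (not a quoted OpenAI price).
function monthlyInferencePence(queriesPerMonth, pencePerQuery = 0.05) {
  return queriesPerMonth * pencePerQuery;
}

const costFor1000 = monthlyInferencePence(1000); // about 50p
```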

The other thing to build in from day one is routing. The bot is not going to replace a human for anything emotionally complex. Refunds, billing disputes, angry escalations: wire those to a human immediately via a keyword or sentiment check at the top of the webhook workflow. Use RAG for the 80 percent of questions that are genuinely looking for information, not for the 20 percent that need judgement. Your support team will notice the difference within a week, and so will your customers.
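
A minimal version of that check, as it might sit in a Code node directly after the webhook trigger. The keyword list is illustrative, not exhaustive; a production version would layer a sentiment check on top:

```javascript
// Route financially or emotionally sensitive queries straight to a human
// before any retrieval happens. Keyword list is illustrative only.
const ESCALATION_KEYWORDS = [
  "refund", "chargeback", "dispute", "complaint", "legal", "compensation",
];

function shouldEscalate(question) {
  const q = question.toLowerCase();
  return ESCALATION_KEYWORDS.some((kw) => q.includes(kw));
}

shouldEscalate("I want a refund now");          // true: hand off to a human
shouldEscalate("How do I export my invoices?"); // false: RAG path
```

In n8n this pairs naturally with an IF node: escalations branch to your ticketing integration, everything else continues to the embedding step.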
