Product specification extraction and bill of materials generation

Product specification extraction from technical documents is a task that kills productivity. You receive a PDF datasheet, a product manual, or technical specification, and someone needs to manually read through it, identify key specifications, cross-reference component details, and build a bill of materials. This takes hours. It's error-prone. It's boring work that distracts from actual engineering.

The real frustration emerges when you need to do this repeatedly. Ten datasheets become fifty. Fifty becomes two hundred. By that point, you're either hiring someone to do manual data entry (expensive and slow) or building a custom script that breaks whenever document formats change slightly. Neither option is good.

This workflow solves that problem by chaining three AI tools together with zero manual handoff. You upload a document, and out the other end comes structured specifications and a complete bill of materials, ready to import into your inventory system or manufacturing planning software.

The Automated Workflow

This workflow uses four distinct steps: document ingestion, specification extraction, component identification, and bill of materials generation. We'll walk through using n8n as the orchestration layer, since it offers excellent support for file handling and provides good visibility into data flow.

How the workflow works

Here's what happens at each stage:

1. A user uploads a PDF (product datasheet, technical manual, or specification sheet) to your workflow
2. Chat with PDF by Copilotus extracts the text and key information from the document
3. ParSpec AI parses the extracted specifications and identifies technical parameters
4. PDnob Image Translator AI processes any embedded schematics or diagrams to extract component references
5. The orchestration tool (n8n in this example) combines all outputs and formats them into a structured bill of materials

The critical insight here is that each tool handles what it does best. Chat with PDF is excellent at understanding document structure and answering questions about content. ParSpec specialises in parsing technical specifications into structured data. PDnob handles visual components. None of them require manual intervention between steps.
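The hand-off between stages can be pictured as plain function composition. The sketch below is illustrative only: the four stage functions, and the idea that the extraction stage also yields a list of images, are hypothetical stand-ins for the HTTP calls configured later, not real client libraries.

```javascript
// Hypothetical stage functions; each one stands in for an API call made by n8n.
function runPipeline(pdf, stages) {
  const extracted = stages.extractText(pdf);                  // Chat with PDF
  const parsed = stages.parseSpecs(extracted.text);           // ParSpec AI
  const visual = stages.identifyComponents(extracted.images); // PDnob Image Translator AI
  return stages.buildBom(parsed, visual);                     // formatting step
}

// Stub stages so the data flow can be exercised without any network access.
const stubStages = {
  extractText: (pdf) => ({ text: `text of ${pdf.name}`, images: ['schematic-1'] }),
  parseSpecs: () => ({
    product_name: 'Demo',
    components: [{ part_number: 'NTC-103F', quantity: 1 }],
  }),
  identifyComponents: (images) =>
    images.map((_, i) => ({ reference_designator: `R${i + 1}` })),
  buildBom: (parsed, visual) => ({
    product_name: parsed.product_name,
    line_items: parsed.components.length + visual.length,
  }),
};

const pipelineResult = runPipeline({ name: 'sensor.pdf' }, stubStages);
console.log(pipelineResult); // → { product_name: 'Demo', line_items: 2 }
```

Because every stage only sees the previous stage's output, any one of them can be swapped for a different tool without touching the others.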

Setting up n8n for this workflow

Start a new workflow in n8n. You'll need the following nodes:

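Webhook node to receive file uploads
HTTP Request node named Chat with PDF
HTTP Request node named ParSpec AI
HTTP Request node named PDnob Image Translator
Function node named Format BOM (recent n8n versions call this the Code node) to format the bill of materials
Respond to Webhook node to return the results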

Create the trigger first. Use a Webhook node set to accept POST requests:


POST /workflows/product-specs
Content-Type: multipart/form-data

This endpoint becomes the API consumers call to submit documents.
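From the caller's side, submitting a document might look like the sketch below. It assumes Node.js 18 or later (for the global FormData and Blob), and both the base URL and the form field name "file" are placeholders chosen for illustration; use whatever your Webhook node actually exposes.

```javascript
// Assumed values: the host, the /webhook prefix and the field name "file" are placeholders.
const WEBHOOK_URL = 'https://n8n.example.com/webhook/workflows/product-specs';

function buildUploadRequest(pdfBytes, filename) {
  const form = new FormData();
  form.append('file', new Blob([pdfBytes], { type: 'application/pdf' }), filename);
  // No explicit Content-Type header: fetch adds the multipart boundary itself.
  return { url: WEBHOOK_URL, options: { method: 'POST', body: form } };
}

const uploadRequest = buildUploadRequest(Buffer.from('%PDF-1.7'), 'sensor-datasheet.pdf');

// To send it from an async context:
//   const response = await fetch(uploadRequest.url, uploadRequest.options);
console.log(uploadRequest.options.method, uploadRequest.options.body.get('file').name);
```

Building the request separately from sending it keeps the example testable without a running n8n instance.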

Step 1: Upload and extract via Chat with PDF

The Chat with PDF by Copilotus API accepts a PDF file and a query string. Your first n8n node should parse the incoming webhook and prepare the file.

Add an HTTP Request node configured like this:


Method: POST
URL: https://api.copilotus.com/v1/chat-with-pdf
Headers:
  Authorization: Bearer YOUR_COPILOTUS_API_KEY
  Content-Type: multipart/form-data

Body (form data):
  file: [incoming file from webhook]
  query: Extract all technical specifications, product name, version, and key features from this document
  response_format: json

The response includes extracted text and identified key sections. n8n keeps each node's output available to later nodes, so name this node Chat with PDF and reference its output in the next step.

Step 2: Parse specifications with ParSpec AI

ParSpec AI takes unstructured text and returns structured specification data. It understands electrical parameters, mechanical dimensions, operating conditions, and component details.

Add a second HTTP Request node:


Method: POST
URL: https://api.parspec.ai/v1/parse
Headers:
  Authorization: Bearer YOUR_PARSPEC_API_KEY
  Content-Type: application/json

Body:
{
  "text": {{ JSON.stringify($node['Chat with PDF'].data.extracted_content) }},
  "document_type": "technical_specification",
  "include_metadata": true,
  "extract_components": true
}

ParSpec returns a JSON object containing structured specifications:

{
  "product_name": "Industrial Temperature Sensor",
  "product_version": "v2.1",
  "specifications": {
    "operating_temperature": {
      "min": -40,
      "max": 125,
      "unit": "celsius"
    },
    "accuracy": {
      "value": 0.5,
      "unit": "celsius"
    },
    "interface": "RS-485"
  },
  "components": [
    {
      "part_number": "NTC-103F",
      "description": "Thermistor",
      "quantity": 1
    }
  ]
}

Store this output for use in the next steps.
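Before passing the parsed result downstream, it can help to check that it has the shape shown above. The following is a minimal sketch; the function name and the specific rules are assumptions for illustration and are not part of ParSpec's API.

```javascript
// Returns a list of human-readable problems; an empty list means the shape looks usable.
function validateParsedSpec(response) {
  const problems = [];
  if (typeof response.product_name !== 'string' || response.product_name.trim() === '') {
    problems.push('product_name is missing');
  }
  if (typeof response.specifications !== 'object' || response.specifications === null) {
    problems.push('specifications is missing');
  }
  if (!Array.isArray(response.components)) {
    problems.push('components is not an array');
  } else {
    response.components.forEach((comp, i) => {
      if (!comp.part_number) problems.push(`components[${i}] has no part_number`);
      if (!Number.isInteger(comp.quantity) || comp.quantity < 1) {
        problems.push(`components[${i}] has an invalid quantity`);
      }
    });
  }
  return problems;
}

// Trimmed copy of the sample response shown above.
const sampleResponse = {
  product_name: 'Industrial Temperature Sensor',
  product_version: 'v2.1',
  specifications: { interface: 'RS-485' },
  components: [{ part_number: 'NTC-103F', description: 'Thermistor', quantity: 1 }],
};

console.log(validateParsedSpec(sampleResponse)); // → []
```

A non-empty result is a reasonable signal to stop the workflow or route the document to manual review instead of building a bill of materials from incomplete data.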

Step 3: Extract visual component data

Some documents contain schematics or diagrams showing component placement and connections. PDnob Image Translator AI extracts component references from images embedded in the PDF.

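Add another HTTP Request node to process images from the original document. The earlier steps return text only, so this assumes a preceding node has already extracted the schematic pages or embedded images from the PDF as image files: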


Method: POST
URL: https://api.pdnob.com/v1/translate-image
Headers:
  Authorization: Bearer YOUR_PDNOB_API_KEY
  Content-Type: multipart/form-data

Body (form data):
  image: [image extracted from PDF]
  analysis_type: component_identification
  output_format: json

The response identifies components visible in diagrams:

{
  "detected_components": [
    {
      "reference_designator": "R1",
      "component_type": "resistor",
      "value": "10k",
      "notes": "pull-up resistor"
    },
    {
      "reference_designator": "C1",
      "component_type": "capacitor",
      "value": "100uF",
      "voltage_rating": "16V"
    }
  ]
}
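A schematic usually repeats the same part under several reference designators, and a bill of materials normally lists such parts once with a quantity. The sketch below groups the detected components by type, value and voltage rating; the function name and the grouping key are assumptions made for illustration, and a production key would likely include more attributes (package, tolerance).

```javascript
// Groups schematic components that share type, value and voltage rating into one BOM line.
function groupDetectedComponents(detected) {
  const lines = new Map();
  for (const comp of detected) {
    const key = `${comp.component_type}|${comp.value}|${comp.voltage_rating ?? ''}`;
    if (!lines.has(key)) {
      lines.set(key, {
        component_type: comp.component_type,
        value: comp.value,
        voltage_rating: comp.voltage_rating ?? null,
        reference_designators: [],
        quantity: 0,
      });
    }
    const line = lines.get(key);
    line.reference_designators.push(comp.reference_designator);
    line.quantity += 1;
  }
  return [...lines.values()];
}

const groupedLines = groupDetectedComponents([
  { reference_designator: 'R1', component_type: 'resistor', value: '10k' },
  { reference_designator: 'R2', component_type: 'resistor', value: '10k' },
  { reference_designator: 'C1', component_type: 'capacitor', value: '100uF', voltage_rating: '16V' },
]);

console.log(groupedLines.map((l) => `${l.quantity} x ${l.component_type} ${l.value}`));
// → [ '2 x resistor 10k', '1 x capacitor 100uF' ]
```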

Step 4: Combine and format the bill of materials

This is where you use an n8n Function node to combine all extracted data into a structured bill of materials. The function receives the outputs from ParSpec and PDnob and merges them intelligently.

// Combine ParSpec structured specs with PDnob visual components
const parspec = $node['ParSpec AI'].data; // full ParSpec response
const parsedSpecs = parspec.specifications || {};
const parspecComponents = parspec.components || [];
const visualComponents = $node['PDnob Image Translator'].data.detected_components || [];

// Merge component lists, keyed by part number or reference designator
const allComponents = {};

// Add components from ParSpec
parspecComponents.forEach(comp => {
  const key = comp.part_number;
  allComponents[key] = {
    part_number: comp.part_number,
    description: comp.description,
    quantity: comp.quantity,
    source: 'specification_text'
  };
});

// Add components from visual analysis
visualComponents.forEach(comp => {
  const key = comp.reference_designator;
  if (!allComponents[key]) {
    allComponents[key] = {
      reference_designator: comp.reference_designator,
      component_type: comp.component_type,
      value: comp.value,
      quantity: 1,
      source: 'schematic_diagram'
    };
  }
});

// Format final bill of materials
const bomItems = Object.values(allComponents);
const billOfMaterials = {
  // product_name and product_version sit at the top level of the ParSpec response
  product_name: parspec.product_name,
  product_version: parspec.product_version,
  generated_timestamp: new Date().toISOString(),
  specifications: parsedSpecs,
  bill_of_materials: bomItems,
  component_summary: {
    total_line_items: bomItems.length,
    total_parts: bomItems.reduce((sum, item) => sum + (item.quantity || 1), 0)
  }
};

// A Function node must return an array of items, each with a json property
return [{ json: billOfMaterials }];

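This function collapses repeated entries within each source: a part number or reference designator that appears twice produces a single line item. It does not reconcile a component that appears in both sources under different identifiers (a part number in the text and a reference designator in the schematic); the fuzzy matching in Pro Tip 3 is one way to handle that case.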
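To try the merge logic outside n8n, it can be written as a pure function and fed the sample responses from Steps 2 and 3. This is a sketch under the assumption that the inputs have exactly the shapes shown earlier; the function name is hypothetical.

```javascript
// Pure version of the merge: no n8n globals, so it runs under plain Node.js.
function buildBillOfMaterials(parspec, visualComponents) {
  const byKey = new Map();
  for (const comp of parspec.components ?? []) {
    byKey.set(comp.part_number, { ...comp, source: 'specification_text' });
  }
  for (const comp of visualComponents ?? []) {
    if (!byKey.has(comp.reference_designator)) {
      byKey.set(comp.reference_designator, { ...comp, quantity: 1, source: 'schematic_diagram' });
    }
  }
  const items = [...byKey.values()];
  return {
    product_name: parspec.product_name,
    product_version: parspec.product_version,
    bill_of_materials: items,
    component_summary: {
      total_line_items: items.length,
      total_parts: items.reduce((sum, item) => sum + (item.quantity || 1), 0),
    },
  };
}

const sampleBom = buildBillOfMaterials(
  {
    product_name: 'Industrial Temperature Sensor',
    product_version: 'v2.1',
    components: [{ part_number: 'NTC-103F', description: 'Thermistor', quantity: 1 }],
  },
  [
    { reference_designator: 'R1', component_type: 'resistor', value: '10k' },
    { reference_designator: 'C1', component_type: 'capacitor', value: '100uF' },
  ]
);

console.log(sampleBom.component_summary); // → { total_line_items: 3, total_parts: 3 }
```

Part numbers and reference designators share one key space here, so the thermistor from the text and the two schematic parts end up as three separate line items.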

Step 5: Output and delivery

Add a final node to return the structured bill of materials. You can either:

Return it directly as a JSON response to the webhook caller
Store it in a database (MongoDB, PostgreSQL) for later retrieval
Send it to cloud storage (AWS S3, Google Cloud Storage)
Forward it to your CAD system or inventory management software via API

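For direct return, use a Respond to Webhook node. Serialise the bill of materials with JSON.stringify rather than wrapping the expression in quotes, which would insert the text "[object Object]":


Status Code: 200
Body:
{
  "success": true,
  "data": {{ JSON.stringify($node['Format BOM'].data) }}
}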

For storage in a database, use an n8n MongoDB node:


Collection: product_specifications
Operation: Insert
Data:
{
  "product_name": "{{ $node['Format BOM'].data.product_name }}",
  "bom": "{{ $node['Format BOM'].data.bill_of_materials }}",
  "created_at": "{{ $now.toISO() }}",
  "source_document": "{{ $node['Webhook'].data.filename }}"
}

The Manual Alternative

If you prefer more control over the extraction process, you can build a semi-automated workflow using Claude Code. This approach extracts the document and shows you a preview of the identified specifications before committing them to your system.

Upload your PDF to a Claude Code environment, then use the following prompt:


I have a technical specification PDF. Please:
1. Extract all product specifications and parameters
2. Identify all component references and part numbers
3. List any operating conditions or environmental requirements
4. Format the results as JSON
5. Highlight any fields where you're uncertain about the extraction

Claude Code parses the document interactively, and you can ask follow-up questions if anything looks incorrect. Once satisfied, you export the JSON and import it into your system manually. This takes longer but gives you confidence in the data before it enters your database.

This approach works well if you process documents infrequently or if your documents have highly variable formats that require human judgment.

Pro Tips

1. Handle rate limiting gracefully. ParSpec AI and PDnob both have rate limits. If you're processing many documents, add a Wait node between requests and set it to pause 2-3 seconds between API calls:


n8n Wait Node Configuration:
Resume: After Time Interval
Wait Amount: 3
Wait Unit: Seconds

This reduces the chance of hitting rate limits and getting errors. You can also implement exponential backoff in a custom Function node if you expect occasional overages.
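Exponential backoff can be sketched as follows. The assumption here, not confirmed by either vendor, is that a rate-limited call throws an error carrying status 429; the helper names and default timings are illustrative.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries `call` when it throws an error whose `status` is 429 (assumed convention);
// any other error, or running out of retries, propagates to the caller.
async function withBackoff(call, maxRetries = 4, baseMs = 1000) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await call();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}

console.log([0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n)));
// → [ 1000, 2000, 4000, 8000, 16000, 30000 ]
```

Usage would be along the lines of `await withBackoff(() => callParSpec(text))`, where `callParSpec` is whatever function performs the request. The cap keeps a long outage from producing multi-minute waits inside a single workflow execution.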

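2. Validate extraction confidence scores. Many extraction APIs return confidence scores, so check whether yours does. The sample ParSpec response above does not show one; the code below assumes each specification object may carry a confidence field between 0 and 1. If a specification comes back with confidence below 75%, flag it for manual review rather than importing it directly:

const specs = $node['ParSpec AI'].data.specifications || {};
const flaggedForReview = [];

Object.entries(specs).forEach(([key, value]) => {
  // Plain values such as "RS-485" carry no score; a score of 0 must still be flagged
  const confidence = value && typeof value === 'object' ? value.confidence : undefined;
  if (typeof confidence === 'number' && confidence < 0.75) {
    flaggedForReview.push({ field: key, confidence });
  }
});

// Always return an item so a downstream IF node can branch on review_required
// (for example, to email an engineer for manual review)
return [{
  json: {
    review_required: flaggedForReview.length > 0,
    flagged_items: flaggedForReview
  }
}];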

This prevents bad data from silently entering your system.

3. Deduplicate components intelligently. When you merge outputs from multiple tools, you'll sometimes get the same component referenced differently. Use fuzzy matching to catch these:

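// Requires the levenshtein npm package; in a self-hosted n8n Function node,
// external modules must be allowed via NODE_FUNCTION_ALLOW_EXTERNAL=levenshtein
const Levenshtein = require('levenshtein');

function isSameComponent(comp1, comp2) {
  // Check exact match first, but only when a part number is actually present
  if (comp1.part_number && comp1.part_number === comp2.part_number) return true;

  // Without two descriptions there is nothing to compare
  if (!comp1.description || !comp2.description) return false;

  // Check fuzzy match on description
  const distance = new Levenshtein(
    comp1.description.toLowerCase(),
    comp2.description.toLowerCase()
  ).distance;

  return distance < 3; // Allow up to 2 character differences
}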

This prevents duplicate line items in your bill of materials.
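If installing an external module is not an option, the edit distance itself is short enough to write by hand. The sketch below is a standard two-row dynamic-programming Levenshtein implementation plus a length-relative similarity score; the function names and the idea of using a relative score are additions for illustration. A relative score avoids treating very short strings such as "R1" and "R2" as near-duplicates just because they differ by one character.

```javascript
// Classic dynamic-programming edit distance, kept to two rows of the table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical after trimming and lower-casing.
function descriptionSimilarity(a, b) {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - editDistance(x, y) / longest;
}

console.log(editDistance('kitten', 'sitting')); // → 3
console.log(descriptionSimilarity('R1', 'R2')); // → 0.5
```

A threshold on the similarity score (for example 0.85) then plays the role of the fixed character limit; the right value depends on how noisy your descriptions are.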

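4. Cache documents for faster re-processing. If you need to extract different data from the same document later, store the extracted content from Chat with PDF in your database, keyed by a hash of the file. Reusing the stored extraction is much faster than re-uploading and re-parsing the file:

const crypto = require('crypto');

// db is assumed to be a connected Db instance from the official mongodb driver
async function getCachedExtraction(db, documentContent) {
  // Hash the raw file bytes so identical uploads map to the same cache entry
  const sourceHash = crypto.createHash('md5').update(documentContent).digest('hex');

  // Check if we've already extracted this document
  const cachedExtraction = await db
    .collection('extracted_documents')
    .findOne({ source_hash: sourceHash });

  // Return null on a cache miss so the caller knows to run the extraction
  return cachedExtraction ? cachedExtraction.data : null;
}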

This saves API costs and processing time, especially valuable if you maintain a reference library of datasheets.

5. Set up error notifications. When a document fails to process, you need to know about it. Add a conditional node that sends an alert if any step fails:


Conditions:
If Chat with PDF status !== 200
OR ParSpec AI status !== 200
OR PDnob Image Translator status !== 200

Then: Send Slack message to #engineering-alerts
Message: "BOM extraction failed for {{ filename }}. Error: {{ error_message }}"

This prevents failed extractions from silently disappearing into the ether.
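The condition above can be expressed as a small function that turns per-step HTTP statuses into an alert payload. The input shape (step name mapped to status code) and the function name are assumptions for illustration; the channel and text fields loosely follow the arguments of Slack's chat.postMessage.

```javascript
// `results` maps a step name to the HTTP status that step returned (assumed shape).
function buildFailureAlert(filename, results) {
  const failed = Object.entries(results)
    .filter(([, status]) => status !== 200)
    .map(([step, status]) => `${step} (HTTP ${status})`);

  if (failed.length === 0) return null; // every step succeeded, nothing to report

  return {
    channel: '#engineering-alerts',
    text: `BOM extraction failed for ${filename}. Failed steps: ${failed.join(', ')}`,
  };
}

const alertPayload = buildFailureAlert('sensor-datasheet.pdf', {
  'Chat with PDF': 200,
  'ParSpec AI': 500,
  'PDnob Image Translator': 200,
});

console.log(alertPayload.text);
// → BOM extraction failed for sensor-datasheet.pdf. Failed steps: ParSpec AI (HTTP 500)
```

Naming the failed step in the message saves whoever picks up the alert from opening the execution log just to find out which API misbehaved.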

Cost Breakdown

Tool	Plan Needed	Monthly Cost	Notes
Chat with PDF by Copilotus	Professional	£49	Includes 5,000 page extractions; excess pages at £0.01 each
ParSpec AI	Standard	£79	Covers 10,000 specifications; additional at £0.008 per spec
PDnob Image Translator AI	Business	£99	Includes 2,000 image analyses; overage at £0.05 per image
n8n	Self-hosted (free) or Cloud Pro	£0–£100	Self-hosted is free; Cloud Pro is £100/month for higher execution limits
Make (Integromat)	Alternative orchestration	£9.99–£299	Pay-as-you-go if you prefer; generally cheaper for low volume
Zapier	Alternative orchestration	£19.99–£599	More expensive but easier setup; good if you lack technical resources

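For a typical workflow processing 50 documents per month with an average of 10 specifications and 5 images per document, usage comes to 500 specifications and 250 image analyses, well inside the included quotas. The bill is therefore just the three plan fees, £49 + £79 + £99 = £227 per month if you self-host n8n, provided the documents total fewer than 5,000 pages. If you opt for cloud orchestration or need higher processing volumes, costs scale accordingly.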
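The arithmetic can be checked, and rerun for other volumes, with a short estimator. Plan fees, quotas and overage rates are copied from the table above; the pages-per-document figure is an assumption, since the table does not say how long a typical datasheet is. Orchestration costs are left out, which corresponds to self-hosting n8n.

```javascript
// Plan figures copied from the table above; quotas are per month, prices in GBP.
const PLANS = {
  chatWithPdf: { fee: 49, included: 5000, overage: 0.01 },   // pages
  parSpec: { fee: 79, included: 10000, overage: 0.008 },     // specifications
  pdnob: { fee: 99, included: 2000, overage: 0.05 },         // image analyses
};

// Flat fee plus a per-unit charge for anything beyond the included quota.
function planCost(plan, usage) {
  const excess = Math.max(0, usage - plan.included);
  return plan.fee + excess * plan.overage;
}

function monthlyCost({ documents, pagesPerDoc, specsPerDoc, imagesPerDoc }) {
  return (
    planCost(PLANS.chatWithPdf, documents * pagesPerDoc) +
    planCost(PLANS.parSpec, documents * specsPerDoc) +
    planCost(PLANS.pdnob, documents * imagesPerDoc)
  );
}

// 20 pages per document is an assumed figure, not taken from the table.
const typicalMonth = monthlyCost({ documents: 50, pagesPerDoc: 20, specsPerDoc: 10, imagesPerDoc: 5 });
console.log(typicalMonth); // → 227
```

At ten times the volume (500 documents), page and image overages start to apply, which is where comparing plans becomes worthwhile.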

The real saving comes from not hiring someone to do this manually. A technical writer or engineer spending 2-3 hours per document extraction costs far more than these tools combined.