Introduction
You've just received a product specification document as a PDF, and you need to extract technical details, identify every component, and generate a bill of materials for procurement. If you're doing this manually, you're looking at hours spent copying text, cross-referencing datasheets, and building spreadsheets. Product teams, manufacturing engineers, and supply chain managers face this task repeatedly, yet most still treat it as a manual process.
The good news: this workflow is entirely automatable. By combining Chat with PDF, Parspec AI, and PDnob Image Translator with a workflow orchestrator, you can go from PDF to structured bill of materials in minutes. The PDF arrives, the specifications are extracted, component identifications are automated, and the BOM is generated without anyone touching a keyboard in between.
This approach works whether your specs are text-based, include images, or mix both formats. The workflow adapts to whatever your suppliers send you, making it practical for real manufacturing environments where documentation standards vary wildly.
The Automated Workflow
Which Orchestration Tool to Use
For this workflow, n8n is the strongest choice. Unlike Zapier, which charges per task, n8n can be self-hosted with no per-execution fees, and its cloud plans work from a monthly execution allowance rather than a per-task charge. Because a single BOM can involve multiple files and many iterative API calls, that pricing model suits this workflow. Make (Integromat) would also work well; Zapier is viable but becomes expensive at scale. Claude Code offers the most control if you want to build custom logic, but you'll need to manage API calls yourself.
We'll walk through the n8n implementation first, then show the variations for other tools.
Step 1: Trigger on PDF Upload
The workflow starts when a PDF lands in your system. This could be via email, cloud storage, or a direct upload endpoint.
Webhook trigger configuration in n8n:
{
  "trigger_type": "webhook",
  "method": "POST",
  "auth": "bearer_token",
  "expected_data": {
    "file_url": "string",
    "file_name": "string",
    "metadata": {
      "supplier": "string",
      "product_line": "string"
    }
  }
}
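Before the rest of the workflow runs, it helps to reject malformed webhook payloads early. The sketch below checks an incoming payload against the expected_data shape shown above; the function name and the specific rules (http(s) URL, .pdf extension) are illustrative assumptions, not part of n8n.

```python
from urllib.parse import urlparse

REQUIRED_TOP_LEVEL = ("file_url", "file_name", "metadata")
REQUIRED_METADATA = ("supplier", "product_line")


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in payload:
            problems.append(f"missing field: {key}")

    metadata = payload.get("metadata")
    if isinstance(metadata, dict):
        for key in REQUIRED_METADATA:
            if not isinstance(metadata.get(key), str):
                problems.append(f"metadata.{key} must be a string")
    elif "metadata" in payload:
        problems.append("metadata must be an object")

    url = payload.get("file_url")
    if isinstance(url, str):
        parsed = urlparse(url)
        # Assumption: only http(s) links are accepted as PDF sources.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("file_url must be an http(s) URL")
    elif "file_url" in payload:
        problems.append("file_url must be a string")

    name = payload.get("file_name")
    if isinstance(name, str) and not name.lower().endswith(".pdf"):
        problems.append("file_name must end in .pdf")
    return problems
```

If the list is non-empty, the workflow can respond with an error and stop before any paid API is called.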
If you use cloud storage, set up a file-watch trigger instead. In n8n, use the Google Drive Trigger or Microsoft OneDrive Trigger node to monitor a specific folder. When a new PDF appears, the workflow starts automatically.
Step 2: Extract Text and Metadata with Chat with PDF
Chat with PDF (by Copilotus) accepts a file URL and returns structured text. The API is straightforward; you send the PDF URL and a prompt asking for specific extraction.
POST https://api.copilotus.com/v1/chat-pdf
{
  "pdf_url": "{{ $json.file_url }}",
  "messages": [
    {
      "role": "user",
      "content": "Extract all product specifications, technical parameters, and component names from this document. Format as JSON with keys: product_name, specifications (array), components (array), images_present (boolean)"
    }
  ],
  "response_format": "json"
}
In n8n, add an HTTP Request node and configure it like this:
Node type: HTTP Request
Method: POST
URL: https://api.copilotus.com/v1/chat-pdf
Authentication: Bearer token
Headers:
- Authorization: Bearer YOUR_API_KEY
- Content-Type: application/json
Body (JSON):
{
  "pdf_url": "{{ $json.file_url }}",
  "messages": [
    {
      "role": "user",
      "content": "Extract product specifications and component list. Return JSON with keys: product_name, specifications (array), components (array), images_present (boolean), document_notes"
    }
  ],
  "response_format": "json"
}
Save the response. This gives you structured text, but PDFs with diagrams and schematics often contain images. That's where the next step matters.
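Model-generated JSON is not guaranteed to contain every key you asked for, so it is worth normalising the answer before later steps rely on it. This is a minimal sketch that assumes the extraction arrives as a JSON string using the keys from the first prompt above; the function name and the fallback defaults are illustrative.

```python
import json

# Keys requested in the extraction prompt, with the type each should have.
EXPECTED_TYPES = {
    "product_name": str,
    "specifications": list,
    "components": list,
    "images_present": bool,
}


def parse_extraction(raw_answer: str) -> dict:
    """Parse the extraction JSON and fill in safe defaults for bad or missing keys."""
    try:
        data = json.loads(raw_answer)
    except json.JSONDecodeError as exc:
        raise ValueError(f"extraction was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("extraction must be a JSON object")

    result = {}
    for key, expected_type in EXPECTED_TYPES.items():
        value = data.get(key)
        # Wrong type or missing key falls back to an empty value of the right type.
        result[key] = value if isinstance(value, expected_type) else expected_type()
    return result
```

With this in place, the "images present?" check in the next step always sees a real boolean, even when the model omits the field.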
Step 3: Handle Images with PDnob Image Translator
If the PDF contains circuit diagrams, mechanical drawings, or component photographs, you need to extract and interpret those images. PDnob Image Translator converts image content into descriptive text and structured data.
First, check whether images were detected in Step 2. If so, extract the images from the PDF using an additional node.
Node type: HTTP Request
Method: POST
URL: https://api.pdnob.com/v1/pdf-extract-images
{
  "pdf_url": "{{ $json.file_url }}",
  "include_metadata": true
}
This returns image URLs and metadata. For each image, send it to PDnob Image Translator:
POST https://api.pdnob.com/v1/image-translator
{
  "image_url": "{{ $json.images[0].url }}",
  "context": "Extract component identifiers, values, and specifications from technical diagrams or schematics",
  "output_format": "structured_json"
}
In n8n, add a Loop Over Items node to iterate over all images, then call PDnob for each:
Node type: Loop Over Items (Split in Batches)
Input: Images array from previous step
For each image:
- HTTP Request to PDnob Image Translator (reference the current item's URL, not the fixed images[0] index)
- Parse response into components array
- Merge with text-based components from Step 2
The PDnob response might look like:
{
  "detected_elements": [
    {
      "type": "component",
      "identifier": "U1",
      "part_name": "LM7805",
      "description": "Linear voltage regulator",
      "values": {"input_voltage": "12V", "output_voltage": "5V"},
      "confidence": 0.94
    },
    {
      "type": "component",
      "identifier": "C1",
      "part_name": "Capacitor",
      "value": "10uF",
      "voltage_rating": "16V",
      "confidence": 0.87
    }
  ],
  "diagram_summary": "Power supply schematic with input conditioning and output regulation"
}
Merge this with data from Step 2 to create a comprehensive component list.
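One way to do the merge is to keep every component detected in a diagram (each has its own reference designator) and add text-only names that no diagram element already covers. The sketch below assumes the text extraction yields plain component names and the image step yields detected_elements as in the sample response; the "source" field is an illustrative addition.

```python
def merge_components(text_names: list[str], detected_elements: list[dict]) -> list[dict]:
    """Combine diagram components with component names found only in the text."""
    merged = []
    seen_names = set()

    # Keep every diagram component; C1 and C2 may share a part name but are distinct.
    for element in detected_elements:
        if element.get("type") != "component":
            continue
        merged.append({**element, "source": "image"})
        seen_names.add(element.get("part_name", "").strip().lower())

    # Add names from the text only when no diagram component already covers them.
    for name in text_names:
        key = name.strip().lower()
        if key and key not in seen_names:
            merged.append({"part_name": name.strip(), "source": "text"})
            seen_names.add(key)
    return merged
```

Matching on lower-cased part names is deliberately crude; a production workflow would likely need fuzzier matching between text and diagram labels.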
Step 4: Identify Components and Find Specifications with Parspec AI
Now you have a list of component names and identifiers. Parspec AI specialises in finding component specifications and datasheets. For each component, query Parspec to get exact part numbers, alternative suppliers, and electrical characteristics.
POST https://api.parspec.ai/v1/component-search
{
  "query": "{{ $json.component_name }}",
  "additional_context": {
    "value": "{{ $json.component_value }}",
    "voltage_rating": "{{ $json.voltage_rating }}"
  },
  "return_fields": ["part_number", "manufacturer", "datasheet_url", "suppliers", "available_stock", "unit_cost"]
}
In n8n, use a second Loop Over Items node to iterate over all components:
Node type: Loop Over Items (Split in Batches)
Input: Merged components array
For each component:
- HTTP Request to Parspec AI
- Extract part_number, manufacturer, suppliers
- Store in structured format
The Parspec response includes supplier options:
{
  "part_number": "LM7805CT",
  "manufacturer": "Texas Instruments",
  "description": "Positive Fixed Output Linear Voltage Regulator, 5V",
  "datasheet": "https://www.ti.com/lit/ds/symlink/lm7805.pdf",
  "suppliers": [
    {
      "name": "Digi-Key",
      "sku": "LM7805CT-ND",
      "price_usd": 0.45,
      "in_stock": 1250,
      "lead_time_days": 0
    },
    {
      "name": "Mouser",
      "sku": "511-LM7805CT",
      "price_usd": 0.48,
      "in_stock": 890,
      "lead_time_days": 1
    }
  ]
}
Store these results in a database or spreadsheet for the next step.
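The BOM in the next step needs one preferred supplier and a list of alternatives per part. A simple rule, sketched here as an assumption rather than anything Parspec prescribes, is to keep only suppliers with enough stock and sort them by price, then lead time. Field names follow the sample response above.

```python
def rank_suppliers(suppliers: list[dict], quantity: int = 1) -> list[dict]:
    """Suppliers that can cover `quantity`, cheapest first, ties broken by lead time."""
    candidates = [s for s in suppliers if s.get("in_stock", 0) >= quantity]
    return sorted(candidates, key=lambda s: (s["price_usd"], s.get("lead_time_days", 0)))


suppliers = [
    {"name": "Mouser", "price_usd": 0.48, "in_stock": 890, "lead_time_days": 1},
    {"name": "Digi-Key", "price_usd": 0.45, "in_stock": 1250, "lead_time_days": 0},
]
ranked = rank_suppliers(suppliers, quantity=100)
preferred, alternatives = ranked[0], ranked[1:]
```

An empty result means no supplier can cover the quantity, which is worth flagging for procurement rather than silently picking the cheapest.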
Step 5: Generate Bill of Materials
Combine all the extracted and identified data into a structured BOM. Use an n8n node to format the data, then save it to a spreadsheet or database, or generate a PDF report.
Node type: Set (named Edit Fields (Set) in recent n8n versions)
Create BOM structure:
{
  "bom": {
    "product_name": "{{ $json.product_name }}",
    "document_source": "{{ $json.file_name }}",
    "generated_date": "{{ $now.toISO() }}",
    "line_items": [
      {
        "reference_designator": "U1",
        "description": "Linear voltage regulator",
        "part_number": "LM7805CT",
        "manufacturer": "Texas Instruments",
        "quantity": 1,
        "unit_cost_usd": 0.45,
        "total_cost_usd": 0.45,
        "preferred_supplier": "Digi-Key",
        "alternative_suppliers": ["Mouser"],
        "lead_time_days": 0
      }
    ],
    "summary": {
      "total_line_items": 24,
      "total_cost_usd": 127.50,
      "longest_lead_time_days": 7
    }
  }
}
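The summary block should be computed from the line items rather than typed in by hand. The following sketch derives per-line totals and the summary fields used in the structure above; the function name is illustrative, and costs are rounded to two decimals for simplicity.

```python
from datetime import datetime, timezone


def build_bom(product_name: str, file_name: str, line_items: list[dict]) -> dict:
    """Add per-line totals and a summary to a list of BOM line items."""
    items = []
    for item in line_items:
        total = round(item["quantity"] * item["unit_cost_usd"], 2)
        items.append({**item, "total_cost_usd": total})

    summary = {
        "total_line_items": len(items),
        "total_cost_usd": round(sum(i["total_cost_usd"] for i in items), 2),
        # default=0 keeps an empty BOM from raising ValueError
        "longest_lead_time_days": max((i["lead_time_days"] for i in items), default=0),
    }
    return {
        "bom": {
            "product_name": product_name,
            "document_source": file_name,
            "generated_date": datetime.now(timezone.utc).isoformat(),
            "line_items": items,
            "summary": summary,
        }
    }
```

For real procurement figures you may prefer decimal arithmetic over floats; rounding at each step is a shortcut that is adequate for a sketch.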
For output, you have multiple options:
- Google Sheets: Add a Google Sheets node to create or update a spreadsheet with one row per component.
- Database: Use PostgreSQL or another database node to store the BOM in a relational schema for future queries.
- PDF Report: Render an HTML template to PDF with a headless-browser tool such as Puppeteer, or use Make's PDF builder to generate a formatted bill of materials document.
Here's an example using Google Sheets:
Node type: Google Sheets
Operation: Append
Spreadsheet ID: YOUR_SPREADSHEET_ID
Sheet Name: BOM
Values:
[
["Reference Designator", "Description", "Part Number", "Manufacturer", "Quantity", "Unit Cost", "Total Cost", "Supplier"],
["{{ $json.line_items[0].reference_designator }}", "{{ $json.line_items[0].description }}", ...]
]
To create multiple rows at once, use:
Node type: Google Sheets
Operation: Append or Update
Data mapping:
For each item in BOM line_items array:
Row values: [reference_designator, description, part_number, manufacturer, quantity, unit_cost, total_cost, supplier]
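The row mapping above amounts to flattening each line item into a list in a fixed column order. A small sketch, assuming the line-item keys from Step 5 and the header row shown earlier (the pairing of "Supplier" with preferred_supplier is an assumption):

```python
# (column title, line-item key) pairs in spreadsheet order.
COLUMNS = [
    ("Reference Designator", "reference_designator"),
    ("Description", "description"),
    ("Part Number", "part_number"),
    ("Manufacturer", "manufacturer"),
    ("Quantity", "quantity"),
    ("Unit Cost", "unit_cost_usd"),
    ("Total Cost", "total_cost_usd"),
    ("Supplier", "preferred_supplier"),
]


def bom_to_rows(line_items: list[dict]) -> list[list]:
    """Return a header row followed by one row per line item."""
    header = [title for title, _ in COLUMNS]
    rows = [[item.get(key, "") for _, key in COLUMNS] for item in line_items]
    return [header, *rows]
```

Missing fields become empty cells, so a partially identified component still produces a complete row.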
Step 6: Send Notifications and Archive
Once the BOM is generated, notify the relevant team and store the PDF for audit purposes.
Node type: Email
To: supply-chain@company.com
Subject: BOM Generated - {{ $json.product_name }}
Body:
Bill of materials for {{ $json.product_name }} has been generated.
Total cost: ${{ $json.summary.total_cost_usd }}
Longest lead time: {{ $json.summary.longest_lead_time_days }} days
View here: [Link to Sheets or PDF]
Node type: Cloud Storage (Google Drive, OneDrive, S3)
Action: Upload
File: BOM_{{ $json.product_name }}_{{ $now.toISO() }}.json
Folder: /BOMs/Archives/
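Product names and ISO timestamps can contain characters such as "/" and ":" that some storage back ends reject in file names. A hedged sketch of a safer naming scheme follows; the function name and the compact timestamp format are illustrative choices, not an n8n feature.

```python
import re
from datetime import datetime, timezone


def archive_name(product_name: str, when: datetime) -> str:
    """Build a BOM archive file name using only letters, digits, '.', '_' and '-'."""
    # Collapse every run of other characters into a single underscore.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", product_name).strip("_") or "unnamed"
    # `when` should be timezone-aware; it is converted to UTC and written without colons.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"BOM_{slug}_{stamp}.json"


name = archive_name("PSU 5V/2A", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```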
Complete n8n Workflow JSON Structure
Here's a simplified overview of the complete workflow:
Webhook Trigger
↓
HTTP Request (Chat with PDF)
↓
Conditional: Images present?
├─ Yes → HTTP Request (Extract images from PDF)
│ ↓
│ Loop (For each image)
│ ↓
│ HTTP Request (PDnob Image Translator)
│ ↓
│ Merge with text components
├─ No → Skip image extraction
↓
Loop (For each component)
↓
HTTP Request (Parspec AI component search)
↓
Set (Store component details)
↓
Set (Build BOM JSON structure)
↓
Google Sheets (Create BOM rows)
↓
Email (Send notification)
↓
Cloud Storage (Archive BOM and metadata)
The Manual Alternative
If you prefer human review at specific points, the workflow adapts easily. After Step 2 (Chat with PDF extraction), add a human approval node. The extracted specifications and component list appear in an n8n form, and a team member confirms or corrects the data before proceeding to Parspec AI. This adds 10-15 minutes per BOM but catches errors early if your PDFs are inconsistently formatted or contain unclear diagrams.
Alternatively, run the full automation but send the generated BOM to Slack with a "Review and Approve" button. Clicking approve uploads it to your system; clicking "Needs Changes" triggers a notification to revisit the PDF manually. This balances speed with control.
For organisations with strict supplier approval processes, add a final approval step where the BOM is reviewed against your approved supplier list. If Parspec recommends an unapproved supplier, flag it for procurement review before finalising the BOM.
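The approved-supplier check is simple to automate ahead of the human review. This sketch assumes each line item carries a preferred_supplier field as in Step 5; the function name is hypothetical.

```python
def unapproved_lines(line_items: list[dict], approved_suppliers: list[str]) -> list[str]:
    """Return reference designators whose preferred supplier is not on the approved list."""
    approved = {name.casefold() for name in approved_suppliers}
    return [
        item["reference_designator"]
        for item in line_items
        # A missing supplier counts as unapproved so it is never waved through.
        if item.get("preferred_supplier", "").casefold() not in approved
    ]
```

Anything the function returns can be routed to procurement, while a fully approved BOM skips the extra step.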
Pro Tips
Error Handling and Retries
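PDFs vary wildly in quality and format. n8n has no dedicated retry node; for transient failures such as timeouts, enable the Retry On Fail setting on the HTTP Request node. If Chat with PDF returns incomplete data, route the result through an IF node that re-calls the API with a more specific prompt. Set a maximum of 2 retries to avoid infinite loops.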
HTTP Request (Chat with PDF)
↓
Retry Logic:
- If response.components.length < 2: Retry with refined prompt
- If error == "timeout": Retry after 5 seconds
- If error == "rate_limit": Wait 60 seconds and retry
- Max 2 retries, then proceed with partial data
Log failures to a database or Slack channel so you know which PDFs need manual review.
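The retry rules above can be expressed as a small loop. This is a sketch under assumptions: `call` is a hypothetical function that sends one prompt and returns a dict with a "components" list, raising TimeoutError or the illustrative RateLimited exception on failure. The sleep function is injectable so the logic can be tested without waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def extract_with_retries(call, prompts, max_retries=2, sleep=time.sleep):
    """Try `call(prompt)` up to max_retries + 1 times, returning the best result seen."""
    best = {"components": []}
    prompt_index = 0
    for attempt in range(max_retries + 1):
        wait = 0
        try:
            result = call(prompts[prompt_index])
        except TimeoutError:
            wait = 5   # retry after 5 seconds
        except RateLimited:
            wait = 60  # wait 60 seconds and retry
        else:
            found = result.get("components", [])
            if len(found) >= 2:
                return result
            if len(found) >= len(best["components"]):
                best = result
            # Too few components: move on to a more specific prompt if one exists.
            prompt_index = min(prompt_index + 1, len(prompts) - 1)
        if wait and attempt < max_retries:
            sleep(wait)
    return best  # retries exhausted: proceed with partial data
```

Returning the best partial result, rather than raising, matches the "proceed with partial data" rule and leaves the decision to flag the PDF to the caller.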
Rate Limiting
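Parspec AI has rate limits (typically 100 requests per minute for standard plans). A single 50-component BOM stays under that, but a larger BOM, or several BOMs processed in parallel, can exceed it. Add a 1-second delay between Parspec requests using n8n's Wait node, which caps each loop at roughly 60 requests per minute: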
Loop (components)
↓
HTTP Request (Parspec)
↓
Wait 1 second
↓
Next iteration
Chat with PDF is generally more forgiving, but if you process multiple PDFs simultaneously, queue them with 5-second delays between calls.
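The fixed delay follows from simple arithmetic: a limit of N requests per minute means requests must be at least 60/N seconds apart (0.6 s for 100 per minute, so a 1-second wait leaves headroom). Outside n8n, the same idea can be sketched as a throttle that sleeps only as long as needed; the class is illustrative, with an injectable clock and sleep for testing.

```python
import time


class Throttle:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```

Call `wait()` immediately before each Parspec request; unlike a fixed delay, it adds no pause when the previous request already took longer than the interval.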
Cost Savings Through Caching
If the same components appear across multiple BOMs, cache Parspec results. Store component data in a local database with a 7-day TTL. Before calling Parspec, check the cache first:
For each component:
- Query local database for part_number
- If found and recent: Use cached data
- If not found or outdated: Call Parspec, store result
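Where the same parts recur across products, this can cut Parspec API calls by an estimated 40-60%, with a corresponding drop in cost; the actual saving depends on how much your BOMs overlap.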
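The cache-first lookup can be sketched as follows. An in-memory dict stands in for the local database here, and `fetch` is a hypothetical function that performs the actual Parspec call; both are assumptions for illustration.

```python
import time

SEVEN_DAYS = 7 * 24 * 3600


class TTLCache:
    """Cache lookups by part number, refetching entries older than the TTL."""

    def __init__(self, ttl_seconds=SEVEN_DAYS, clock=time.time):
        self.ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # part_number -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, part_number, fetch):
        now = self._clock()
        entry = self._store.get(part_number)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        # Not found or outdated: call the API and store the fresh result.
        self.misses += 1
        value = fetch(part_number)
        self._store[part_number] = (now, value)
        return value
```

The hit and miss counters make it easy to measure the reduction in API calls on your own data rather than relying on an estimate.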
Handling Missing or Ambiguous Components
If Parspec can't identify a component (confidence score below 0.7), flag it in the BOM with a status "Requires Manual Verification". Include a link to the datasheet or schematic so procurement can research it manually. Don't leave these blank; they're critical for supply chain planning.
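A minimal sketch of that flagging rule, assuming each identified component carries a confidence score and a part_number field (the sample Parspec response above does not show a confidence value, so treat that field and the "Verified" label as assumptions):

```python
MANUAL = "Requires Manual Verification"


def verification_status(component: dict, threshold: float = 0.7) -> str:
    """Label a component as verified only when it has a part number and enough confidence."""
    confidence = component.get("confidence")
    has_part = bool(component.get("part_number"))
    if has_part and confidence is not None and confidence >= threshold:
        return "Verified"
    # Missing part number, missing score, or low score all need a human look.
    return MANUAL
```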
Operator-Specific Customisation
Different organisations have different BOM requirements. Add a configuration object to your workflow:
{
  "bom_config": {
    "include_supplier_alternatives": true,
    "max_suppliers_per_component": 3,
    "preferred_suppliers": ["Digi-Key", "Mouser", "Arrow"],
    "currency": "GBP",
    "include_lead_time": true,
    "include_rohs_compliance": true,
    "include_eco_alternatives": false
  }
}
Pass this to Parspec and your BOM builder so different teams get exactly the fields they need. This prevents bloat in spreadsheets and keeps your BOM format consistent across the organisation.
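As an illustration of how the BOM builder might honour this configuration, the sketch below orders a component's suppliers by the preferred_suppliers list and trims them to max_suppliers_per_component. The function name is hypothetical and only the supplier-related keys are handled.

```python
def select_suppliers(suppliers: list[dict], config: dict) -> list[dict]:
    """Order suppliers by configured preference and trim to the configured maximum."""
    preferred = config.get("preferred_suppliers", [])
    rank = {name: position for position, name in enumerate(preferred)}
    # Suppliers not on the preferred list sort after all preferred ones (stable sort).
    ordered = sorted(suppliers, key=lambda s: rank.get(s["name"], len(rank)))
    if not config.get("include_supplier_alternatives", True):
        return ordered[:1]
    return ordered[: config.get("max_suppliers_per_component", len(ordered))]
```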
Cost Breakdown
| Tool | Plan Needed | Monthly Cost | Notes |
|---|---|---|---|
| Chat with PDF (Copilotus) | Standard API | £15-40 | Per 1,000 API calls. Most BOMs use 1-2 calls per PDF. |
| Parspec AI | Professional | £60-150 | 100+ component searches per month. Cache hits reduce usage. |
| PDnob Image Translator | Standard | £20-50 | Only used if PDFs contain diagrams. Many documents are text-only. |
| n8n Self-Hosted | Cloud Team Plan | £25/month | Unlimited executions. Alternative: self-host for zero cost (DevOps overhead). |
| n8n Cloud | Standard | £20/month | 2M execution budget per month, usually sufficient. |
| Google Sheets API | Free (with Google Workspace account) | Included | No separate cost if using an existing Google Workspace (formerly G Suite) subscription. |
| Email service | Gmail API | Free | Included with Google Workspace, or use SendGrid (free tier: 100/day). |
| Total (typical small team) | | £140-250 | Covers 100+ BOMs per month. Cost per BOM: £1.40-£2.50. |
| Total (with high volume, 500+ BOMs) | | £200-400 | Caching and optimisation reduce per-BOM cost to £0.40-£0.80. |
The ROI is substantial. A manual BOM typically takes 1-2 hours, costing £25-50 in labour (UK average £20-30/hour). Automating 50 BOMs per month saves £1,250-2,500 in labour alone, easily justifying the tool costs.
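The same arithmetic can be checked in a few lines, using only the figures quoted above (50 BOMs per month, £25-50 labour per BOM, £140-250 monthly tool cost for a small team). The function name is illustrative.

```python
from math import ceil


def net_monthly_saving(boms_per_month, labour_cost_per_bom, tool_cost_per_month):
    """Labour saved by automation minus what the tools cost, per month."""
    return boms_per_month * labour_cost_per_bom - tool_cost_per_month


# Pessimistic case: cheapest labour, most expensive tooling.
low = net_monthly_saving(50, 25, 250)
# Optimistic case: most expensive labour, cheapest tooling.
high = net_monthly_saving(50, 50, 140)
# BOMs per month needed to cover the tooling in the pessimistic case.
break_even_boms = ceil(250 / 25)
```

On these figures the net saving ranges from £1,000 to £2,360 per month, and the tooling pays for itself at around 10 BOMs per month even in the pessimistic case.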