Manufacturing quality inspection report generation from shop floor photos

Factory floors generate hundreds of quality inspection photos each day, yet supervisors still spend their afternoons manually writing up findings into compliance reports.

A supervisor takes a photo of a faulty component, notes some observations aloud, then sits at a desk typing out standardised forms that repeat the same categories every time. The whole process wastes skilled labour on documentation that should be generated automatically.

The gap exists because quality inspection data lives in photos and voice notes, whilst compliance systems expect structured text. Getting from one to the other requires several steps: image analysis to spot defects, transcription of verbal notes, data extraction into a consistent format, and finally, report generation. Most factories either skip automation entirely or build brittle custom solutions that break when someone changes the form template.

This workflow solves that problem by connecting computer vision, transcription, and document generation in a single automated chain. A supervisor photographs a faulty part, records a quick voice memo describing what went wrong, and a completed compliance report appears in their system within seconds. No typing required.

The Automated Workflow

Choosing an orchestration tool

For this workflow, n8n is the best choice.

It runs on your own infrastructure (or cloud hosting), handles image and audio files as binary data, and calls the OpenAI and Anthropic APIs through HTTP request nodes. Zapier would work but charges per task, and Make's file handling is slower for large batches of inspection photos. Self-hosted n8n has no per-execution fee (you pay only for the server it runs on), so it scales to hundreds of daily inspections without additional platform charges.

Data flow overview

The workflow runs in this sequence:

1. A supervisor uploads a photo and voice memo to a shared folder (Google Drive, Dropbox, or local storage)
2. n8n detects the new files and downloads them
3. GPT-4o analyses the photo and extracts defect details
4. Whisper API transcribes the voice memo
5. Claude Opus 4.6 combines both the image analysis and transcription into a structured inspection record
6. Deepnote generates a formatted compliance report using that structured data
7. The finished report is saved to the inspection archive and a Slack notification alerts the supervisor

Setting up n8n

Install n8n on a server or use their cloud hosting. Create a new workflow with these nodes in order.

Node 1: File trigger

Start with a Google Drive trigger to watch for new files in your quality inspection folder.

Trigger Type: Google Drive
Event: File created
Folder: /Quality Inspections
File types: Images (.jpg, .png), Audio (.m4a, .wav)
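
Because the trigger fires once per file, the workflow needs some way to match each photo with its voice memo before analysis. A minimal sketch, assuming a naming convention that is not stated above: supervisors give both files the same base name, such as part-1042.jpg and part-1042.m4a. The function name `pair_inspection_files` is hypothetical.

```python
from pathlib import PurePath

IMAGE_EXTS = {".jpg", ".png"}   # extensions from the trigger configuration
AUDIO_EXTS = {".m4a", ".wav"}

def pair_inspection_files(filenames):
    """Group files by base name; return complete (photo, memo) pairs and leftovers."""
    groups = {}
    for name in filenames:
        path = PurePath(name)
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS:
            kind = "photo"
        elif ext in AUDIO_EXTS:
            kind = "memo"
        else:
            continue  # ignore unrelated files
        groups.setdefault(path.stem, {})[kind] = name

    # A group is complete only when it has both a photo and a memo
    pairs = {stem: (g["photo"], g["memo"])
             for stem, g in groups.items() if len(g) == 2}
    incomplete = sorted(stem for stem, g in groups.items() if len(g) < 2)
    return pairs, incomplete
```

Inspections listed in `incomplete` can simply wait until the missing file arrives on a later trigger run.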

Node 2: Download files

Use the Google Drive download node to fetch each file locally for processing.

Input: File ID from trigger
Output: Binary file data

Node 3: Image analysis with GPT-4o

Send the photo to OpenAI's vision API. This extracts defect type, severity, location, and other structured observations.

POST https://api.openai.com/v1/chat/completions

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Analyse this factory floor quality inspection photo. Extract: defect type, severity (1-5 scale), location on component, root cause hypothesis, and recommended action. Return as JSON only."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,{{ $node.GoogleDrive.binary.data }}"
          }
        }
      ]
    }
  ],
  "temperature": 0.3,
  "max_tokens": 500
}
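
The `image_url` field expects the photo as a base64 data URL. As a rough illustration of how that request body is assembled outside n8n, the sketch below builds the same payload from raw image bytes. The function name `build_vision_payload` is hypothetical, and the three bytes passed in at the end are a placeholder rather than a real photo.

```python
import base64
import json

PROMPT = (
    "Analyse this factory floor quality inspection photo. Extract: defect type, "
    "severity (1-5 scale), location on component, root cause hypothesis, and "
    "recommended action. Return as JSON only."
)

def build_vision_payload(image_bytes, mime_type="image/jpeg"):
    """Return the chat completions request body for one inspection photo."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
            ],
        }],
        "temperature": 0.3,
        "max_tokens": 500,
    }

payload = build_vision_payload(b"\xff\xd8\xff")  # placeholder bytes
body = json.dumps(payload)  # string to send as the POST body
```

Passing `mime_type="image/png"` keeps the data URL accurate for PNG uploads, which the trigger also accepts.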

A low temperature such as 0.3 reduces run-to-run variation in the analysis, although it does not guarantee identical output for the same photo. The response should be structured JSON like this:

json
{
  "defect_type": "Surface scratch",
  "severity": 3,
  "location": "Top left corner of housing",
  "root_cause": "Abrasive contact during packaging",
  "recommended_action": "Adjust padding in packaging materials"
}
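
In the raw API response this JSON arrives as a string at `choices[0].message.content`, and models sometimes wrap it in a Markdown code fence even when asked for JSON only. A minimal sketch of a parsing step that strips such a fence and checks the severity range; the function name `parse_defect_analysis` is hypothetical.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker

def parse_defect_analysis(response):
    """Extract and sanity-check the defect JSON from a chat completions response."""
    text = response["choices"][0]["message"]["content"].strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on
        text = text.split("\n", 1)[-1].rsplit(FENCE, 1)[0]
    analysis = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    severity = analysis.get("severity")
    if not isinstance(severity, int) or not 1 <= severity <= 5:
        raise ValueError(f"severity must be an integer from 1 to 5, got {severity!r}")
    return analysis
```

Raising an error here, rather than passing a malformed result along, lets the workflow's error branch handle the inspection.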

Node 4: Transcribe audio with Whisper API

Convert the voice memo to text using OpenAI's audio transcription endpoint with the whisper-1 model.

POST https://api.openai.com/v1/audio/transcriptions

Headers:
Authorization: Bearer YOUR_OPENAI_API_KEY

Body (form-data):
file: [audio file from Node 2]
model: whisper-1
language: en
temperature: 0

The response includes the transcribed text:

json
{
  "text": "This is a surface defect on the housing. Looks like it happened during the moulding process. We should check the tooling for wear."
}

Node 5: Combine analysis with Claude Opus 4.6

Use Claude to merge the GPT-4o image analysis and Whisper transcription into a single structured inspection record. Its job in this workflow is to reconcile the two sources, which may disagree (for example on root cause), into one consistent format.

POST https://api.anthropic.com/v1/messages

Headers:
x-api-key: YOUR_ANTHROPIC_API_KEY
anthropic-version: 2023-06-01
content-type: application/json

{
  "model": "claude-opus-4.6",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "You have an image analysis result and a voice memo transcription from a factory quality inspection. Combine them into a single structured inspection record with these fields: summary, severity, location, root_cause, supervisor_notes, recommended_corrective_action. Use the voice memo to add context and the image analysis for technical details.\n\nImage Analysis:\nDefect type: {{ $node.GPT4o.json.defect_type }}\nSeverity: {{ $node.GPT4o.json.severity }}\nLocation: {{ $node.GPT4o.json.location }}\nRoot cause hypothesis: {{ $node.GPT4o.json.root_cause }}\nRecommended action: {{ $node.GPT4o.json.recommended_action }}\n\nVoice Memo Transcription:\n{{ $node.Whisper.json.text }}\n\nReturn as JSON only."
    }
  ]
}
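
To make the templating explicit, here is a sketch of how the user message could be assembled from the two upstream results. `build_synthesis_prompt` is a hypothetical helper. It passes the whole image analysis as JSON rather than selected fields, which is a design choice of this sketch, and its field list includes `location` because the report payload in Node 6 reads that field.

```python
import json

RECORD_FIELDS = [
    "summary", "severity", "location", "root_cause",
    "supervisor_notes", "recommended_corrective_action",
]

def build_synthesis_prompt(image_analysis, transcript):
    """Build the user message that asks for one structured inspection record."""
    return (
        "You have an image analysis result and a voice memo transcription from a "
        "factory quality inspection. Combine them into a single structured "
        f"inspection record with these fields: {', '.join(RECORD_FIELDS)}. "
        "Use the voice memo to add context and the image analysis for technical "
        "details.\n\n"
        f"Image Analysis:\n{json.dumps(image_analysis, indent=2)}\n\n"
        f"Voice Memo Transcription:\n{transcript.strip()}\n\n"
        "Return as JSON only."
    )

prompt = build_synthesis_prompt(
    {"defect_type": "Surface scratch", "severity": 3,
     "location": "Top left corner of housing"},
    "This is a surface defect on the housing.",
)
```

Keeping the field list in one constant means the prompt and any downstream validation can share the same definition.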

Node 6: Generate report in Deepnote

Deepnote notebooks can accept HTTP requests via webhooks. Create a notebook with a report template that accepts JSON from n8n, populates a formatted compliance document, and saves it.

Webhook URL: https://your-deepnote-workspace.deepnote.com/webhooks/inspection-report

Payload:
{
  "inspection_id": "{{ $node.GoogleDrive.context.fileName }}",
  "timestamp": "{{ $now }}",
  "defect_summary": "{{ $node.Claude.json.summary }}",
  "severity": "{{ $node.Claude.json.severity }}",
  "location": "{{ $node.Claude.json.location }}",
  "root_cause": "{{ $node.Claude.json.root_cause }}",
  "supervisor_notes": "{{ $node.Claude.json.supervisor_notes }}",
  "corrective_action": "{{ $node.Claude.json.recommended_corrective_action }}"
}

In your Deepnote notebook, accept this payload and render it using Python:

python
import os
from jinja2 import Template

# Assumed to be provided by the Deepnote webhook integration
inspection_data = webhook_payload

report_template = """
QUALITY INSPECTION REPORT
Date: {{ timestamp }}
Inspection ID: {{ inspection_id }}

DEFECT SUMMARY
{{ defect_summary }}

SEVERITY LEVEL: {{ severity }}/5

LOCATION
{{ location }}

ROOT CAUSE ANALYSIS
{{ root_cause }}

SUPERVISOR NOTES
{{ supervisor_notes }}

CORRECTIVE ACTION REQUIRED
{{ corrective_action }}

Report generated automatically via Deepnote and n8n.
"""

template = Template(report_template)
report_html = template.render(inspection_data)

# Save to file system or send to storage
os.makedirs("reports", exist_ok=True)  # create the folder on first run
with open(f"reports/inspection_{inspection_data['inspection_id']}.html", "w") as f:
    f.write(report_html)

Node 7: Notify supervisor

Send a Slack message with a link to the completed report.

POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL

{
  "text": "Quality inspection report ready",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*New Inspection Report Generated*\n\nDefect: {{ $node.Claude.json.summary }}\nSeverity: {{ $node.Claude.json.severity }}/5\n\n<https://your-deepnote-link/reports/{{ $node.GoogleDrive.context.fileName }}.html|View Report>"
      }
    }
  ]
}
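
The same message can be assembled in code, which makes it easier to truncate long model output before it reaches Slack. A sketch with a hypothetical `build_slack_message` helper; the 150-character limit on the summary is an arbitrary choice of this example, and the URL is a placeholder.

```python
import json

def build_slack_message(summary, severity, report_url, max_summary=150):
    """Return an incoming-webhook payload announcing a finished report."""
    if len(summary) > max_summary:
        summary = summary[: max_summary - 3] + "..."
    text = (
        "*New Inspection Report Generated*\n\n"
        f"Defect: {summary}\n"
        f"Severity: {severity}/5\n\n"
        f"<{report_url}|View Report>"
    )
    return {
        "text": "Quality inspection report ready",
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": text}}
        ],
    }

message = build_slack_message(
    "Surface scratch on housing", 3,
    "https://your-deepnote-link/reports/example.html",  # placeholder URL
)
print(json.dumps(message, indent=2))
```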

Error handling

Add error catch nodes after GPT-4o and Claude steps. If image analysis fails (e.g. blurry photo), send an alert to the supervisor asking them to retake the photo rather than generating a report from incomplete data.

If GPT-4o response contains "unable to identify" or "image unclear":
  → Send Slack notification: "Photo quality issue. Please retake the inspection photo."
  → Mark inspection as failed and pause workflow
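
The rule above amounts to a case-insensitive phrase match on the model's reply. A minimal sketch, in which `needs_retake`, `route_analysis`, and the shape of the returned dictionary are illustrative; the phrase list would need tuning against real responses.

```python
UNCLEAR_PHRASES = ("unable to identify", "image unclear")
RETAKE_MESSAGE = "Photo quality issue. Please retake the inspection photo."

def needs_retake(response_text):
    """Return True when the image analysis reply signals an unusable photo."""
    lowered = response_text.lower()
    return any(phrase in lowered for phrase in UNCLEAR_PHRASES)

def route_analysis(response_text):
    """Decide the next workflow step for one analysis reply."""
    if needs_retake(response_text):
        return {"status": "failed", "notify": RETAKE_MESSAGE}
    return {"status": "ok", "notify": None}
```

Phrase matching is fragile, so asking the model to return an explicit field such as a boolean for image usability may be more reliable; that would be a change to the prompt shown earlier.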

The Manual Alternative

If you prefer not to set up n8n, you can process inspections semi-manually using Claude Code.

After taking a photo and recording a voice memo, open Claude Code, give it the photo and a text transcript of your voice notes, and ask it to analyse the defect and generate a compliance report combining both. This takes 2-3 minutes per inspection, compared with roughly 30 seconds for the automated workflow, but requires no server setup. Use this approach if you're running fewer than 5 inspections per day or want to maintain tighter control over each report before it's filed.

Pro Tips

Rate limit strategy for high-volume inspections

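If your factory processes 50+ inspections daily, GPT-4o's rate limits can become a bottleneck, particularly when photos arrive in bursts.

Configure n8n to process photos in groups using the Loop Over Items (Split in Batches) node, sending 5 photos to GPT-4o per batch rather than one at a time. Requests in a batch still count against the same account-level rate limit, so add a short Wait node between batches. Handled this way, batching can cut total runtime by as much as 60 percent.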
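
Outside n8n, the same batching idea can be sketched as follows. The code splits a list of photos into groups of five and pauses between groups. `analyse_photo` stands in for the GPT-4o call, and the pause length is an assumption to adjust to your account's limits. For simplicity the photos within a batch are processed one after another; a thread pool could run them concurrently.

```python
import time

def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def process_in_batches(photos, analyse_photo, batch_size=5, pause_seconds=1.0):
    """Run analyse_photo over photos batch by batch, pausing between batches."""
    results = []
    batches = list(chunked(photos, batch_size))
    for index, batch in enumerate(batches):
        results.extend(analyse_photo(photo) for photo in batch)
        if index < len(batches) - 1:
            time.sleep(pause_seconds)  # leave headroom under the rate limit
    return results
```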

Cost control: use GPT-4o mini for images

At the rates used in this article, GPT-4o costs 0.015 USD per image and GPT-4o mini costs 0.00075 USD per image, a twentyfold difference. For clearly visible defects the smaller model often performs comparably, but verify this on a sample of your own photos before switching. Use GPT-4o mini for all image analysis unless you're dealing with complex defects requiring expert-level reasoning. The savings add up quickly at scale.

Voice memo pre-processing

If supervisors record voice memos in noisy factory environments, add an audio enhancement step. Use Deepnote to apply noise reduction before sending to Whisper. This can improve transcription accuracy, by an estimated 15-20 percent on noisy recordings, and adds no API cost.

Validate Claude's output format

Before feeding Claude's structured JSON into Deepnote, use an IF or Code node in n8n to check that all required fields are present. If a field is missing or malformed, trigger a second Claude pass to fix it rather than letting broken reports reach the archive.
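
A sketch of that validation step, written as it might appear in a Code node or a small service. The required field names follow the Deepnote payload in Node 6; `validate_record` and the shape of its return value are hypothetical.

```python
REQUIRED_FIELDS = (
    "summary", "severity", "location", "root_cause",
    "supervisor_notes", "recommended_corrective_action",
)

def validate_record(record):
    """Return a list of problems; an empty list means the record is usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing or empty field: {field}")
    severity = record.get("severity")
    if severity is not None:
        try:
            if not 1 <= int(severity) <= 5:
                problems.append("severity outside the 1-5 scale")
        except (TypeError, ValueError):
            problems.append("severity is not a number")
    return problems
```

A non-empty list can be included in the second Claude request as the description of what needs fixing.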

Version compliance templates

Store your compliance report template as a version-controlled document in Git. Update Deepnote's template logic from Git on a schedule, ensuring all supervisors use the latest regulatory template without manual coordination.

Cost Breakdown

Tool	Plan Needed	Monthly Cost	Notes
n8n	Self-hosted or Pro Cloud	£0–£25	Self-hosted is free; cloud Pro is £25/month for 200K executions
GPT-4o	OpenAI API pay-as-you-go	£0.75–£3	0.015 USD per image; range shown corresponds to roughly 50–200 images per month (50 inspections per day would be about 22.50 USD/month)
GPT-4o mini	OpenAI API pay-as-you-go	£0.04–£0.15	0.00075 USD per image; recommended for high volume
Whisper API	OpenAI API pay-as-you-go	£0.50–£2	0.02 USD per minute of audio; typical memo is 30 seconds
Claude Opus 4.6	Anthropic API pay-as-you-go	£0.50–£1.50	0.015 USD per 1K input tokens; used once per inspection for synthesis
Deepnote	Free or Pro	£0–£12	Free tier includes webhook access and basic notebook execution
Slack	Free or Pro	£0–£8	Free plan sufficient for notifications; Pro for advanced features
Total monthly		£2–£52	Scales linearly with inspection volume; 50 inspections/day = ~£30/month
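
The per-unit prices in the table can be turned into a quick estimate for any volume. The sketch below uses the table's own figures, in USD before currency conversion, plus one assumption not stated above: roughly 1,000 input tokens per Claude call. It ignores output tokens and platform subscriptions, so treat the result as a lower bound on API spend.

```python
PRICES_USD = {
    "gpt4o_per_image": 0.015,
    "gpt4o_mini_per_image": 0.00075,
    "whisper_per_minute": 0.02,
    "claude_per_1k_input_tokens": 0.015,
}

def monthly_api_cost(inspections_per_day, use_mini=False, memo_minutes=0.5,
                     claude_input_tokens=1000, days=30):
    """Estimate monthly API spend in USD from the table's per-unit prices."""
    image_key = "gpt4o_mini_per_image" if use_mini else "gpt4o_per_image"
    per_inspection = (
        PRICES_USD[image_key]
        + PRICES_USD["whisper_per_minute"] * memo_minutes
        # claude_input_tokens is an assumed average, not a measured value
        + PRICES_USD["claude_per_1k_input_tokens"] * claude_input_tokens / 1000
    )
    return round(per_inspection * inspections_per_day * days, 2)

print(monthly_api_cost(50))                 # GPT-4o for images
print(monthly_api_cost(50, use_mini=True))  # GPT-4o mini for images
```

Changing `claude_input_tokens` or `memo_minutes` shows how sensitive the total is to prompt length and memo duration.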