Introduction
Medical researchers face a genuine problem: keeping pace with published literature whilst extracting actionable clinical insights. A single research group might need to digest dozens of papers weekly, yet manually reading, summarizing, and cross-referencing findings is labour-intensive and error-prone. You end up with PDFs scattered across folders, highlights that fade from memory, and insights that never make it into your clinical practice or grant applications.
What if you could ingest a medical research paper, automatically extract its key findings, identify clinical implications, and have structured data ready for your EHR or research database, all without opening a single PDF yourself? This workflow combines three specialised AI tools with an orchestration layer to create a fully automated pipeline. A paper lands in your inbox or a shared folder; minutes later, you have a structured summary, concept analysis, and clinical recommendations.
The challenge here is not finding tools that can do each step individually. The challenge is connecting them so data flows smoothly from summarisation to interpretation to storage. That is where this Alchemy workflow excels. You will need some familiarity with API calls and webhook configuration, but the result justifies the technical lift: a medical research intelligence system that runs itself.
The Automated Workflow
We will use n8n as our orchestration engine because it offers visual workflow building with strong support for HTTP requests, file handling, and conditional logic. Make and Zapier could work here too, but n8n gives you more granular control over data transformation between tools without hitting API call limits as quickly.
Architecture Overview
The workflow follows this sequence:
1. Trigger: A PDF lands in a folder (or email attachment arrives).
2. Summarisation: ai-pdf-summarizer-by-pdf-guru-ai extracts a high-level summary and key points.
3. Deep Analysis: ExplainPaper breaks down methodology, results, and clinical relevance.
4. Clinical Extraction: Terrakotta AI identifies specific clinical insights, contraindications, and patient populations affected.
5. Storage: Structured data flows into a database or clinical notes system.
The critical design choice: we do not manually copy data between tools. Each tool's output becomes the next tool's input through API calls orchestrated by n8n.
Setting Up n8n
First, install n8n locally or use their cloud service. You will need API keys for each tool.
Step 1: Create your n8n workflow
Log in to n8n and create a new workflow. Name it something like "Medical Paper Intelligence Pipeline". You will see a canvas where you can add nodes.
Step 2: Trigger node
Add a Webhook node as your trigger. This allows PDFs or metadata to initiate the workflow:
POST /webhook/medical-papers
Configure it to accept file URLs or base64-encoded PDFs. The payload should look like this:
{
"paper_title": "Novel approach to NAFLD treatment",
"pdf_url": "https://example.com/papers/nafld-2024.pdf",
"source": "PubMed",
"received_date": "2024-01-15"
}
If you are using email triggers, use n8n's Gmail node to watch for new attachments with specific labels (e.g., "Medical Papers"). Extract the attachment and convert it to base64.
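If you want to test the trigger before wiring up any upstream source, you can post a sample payload to the webhook yourself. A minimal Python sketch using only the standard library; the helper names (`build_payload`, `send_paper`) and the localhost URL are illustrative assumptions, not part of n8n itself:

```python
import json
from urllib import request

# Assumption: n8n running locally on its default port, with the webhook path above
N8N_WEBHOOK_URL = "http://localhost:5678/webhook/medical-papers"

def build_payload(paper_title, pdf_url, source, received_date):
    """Assemble the JSON body the webhook trigger expects."""
    return {
        "paper_title": paper_title,
        "pdf_url": pdf_url,
        "source": source,
        "received_date": received_date,
    }

def send_paper(payload):
    """POST the payload to the webhook (needs a running n8n instance)."""
    req = request.Request(
        N8N_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status

payload = build_payload(
    "Novel approach to NAFLD treatment",
    "https://example.com/papers/nafld-2024.pdf",
    "PubMed",
    "2024-01-15",
)
# send_paper(payload)  # uncomment once the workflow is live
```

Posting this payload from the command line is a quick way to confirm the webhook node fires before you connect a folder watcher or Gmail trigger.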
Node 1: PDF Summarisation
Add an HTTP Request node to call the ai-pdf-summarizer-by-pdf-guru-ai API. This tool requires your PDF as input and returns bullet-point summaries and key findings.
API Endpoint:
POST https://api.pdfguru.ai/v1/summarize
Headers:
Authorization: Bearer YOUR_PDF_GURU_API_KEY
Content-Type: application/json
Request Body:
{
"pdf_url": "{{ $json.pdf_url }}",
"summary_length": "medium",
"focus": "clinical"
}
The variable {{ $json.pdf_url }} pulls the URL from your webhook trigger. Set the summary length to medium; this gives you detail without overwhelming the next step.
Response Handling:
The API returns something like this:
{
"summary": "This randomised controlled trial of 342 participants compared...",
"key_findings": [
"Primary outcome met with 68% response rate (p < 0.001)",
"Adverse events comparable to placebo",
"Effect sustained at 12-month follow-up"
],
"study_type": "RCT",
"sample_size": 342,
"duration": "12 months"
}
Store this output in a variable called summaryResult. In n8n, use the Set node to preserve this data for downstream steps.
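Downstream nodes break noisily if an optional field is missing from the response, so it is worth normalising the output first. A small defensive-parsing sketch (the function name `parse_summary_response` is my own; field names come from the example response above, and which fields are optional is an assumption):

```python
def parse_summary_response(resp):
    """Normalise the summariser output, tolerating optional fields
    that may be absent on some papers."""
    return {
        "summary": resp.get("summary", ""),
        "key_findings": resp.get("key_findings", []),
        "study_type": resp.get("study_type", "unknown"),
        "sample_size": resp.get("sample_size"),   # None if not reported
        "duration": resp.get("duration"),
    }
```

In n8n you would express the same defaults in the Set node's expressions; the point is that every downstream reference gets a predictable shape.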
Node 2: Methodology and Clinical Context Analysis
Now feed the summary into ExplainPaper's API. This tool is particularly good at parsing research methodology and translating statistical findings into clinical language.
API Endpoint:
POST https://api.explainpaper.com/v1/analyse
Headers:
Authorization: Bearer YOUR_EXPLAINPAPER_API_KEY
Content-Type: application/json
Request Body:
{
"paper_title": "{{ $json.paper_title }}",
"summary": "{{ $node.HTTPRequest1.json.summary }}",
"key_findings": "{{ $node.HTTPRequest1.json.key_findings }}",
"analysis_type": "clinical_interpretation"
}
Note that we are referencing the output from the first HTTP node. ExplainPaper will return:
{
"methodology_explanation": "The authors used a double-blind design with stratified randomisation...",
"statistical_significance": "The p-value of 0.001 indicates high confidence in the result.",
"clinical_implications": [
"May be suitable for patients with moderate-to-severe disease",
"Consider drug interactions with common comorbidity treatments",
"Cost-benefit analysis suggests broader adoption is warranted"
],
"limitations": [
"Single-centre study limits generalisability",
"Follow-up period of 12 months may be insufficient for long-term safety",
"Exclusion criteria may limit applicability to real-world populations"
]
}
Again, store this as analysisResult.
Node 3: Clinical Insight and Patient Population Extraction
This is where Terrakotta AI shines. It identifies specific patient phenotypes, contraindications, and actionable clinical recommendations.
API Endpoint:
POST https://api.terrakotta.ai/v1/clinical-extract
Headers:
Authorization: Bearer YOUR_TERRAKOTTA_API_KEY
Content-Type: application/json
Accept: application/json
Request Body:
{
"paper_summary": "{{ $node.HTTPRequest1.json.summary }}",
"clinical_implications": "{{ $node.HTTPRequest2.json.clinical_implications }}",
"limitations": "{{ $node.HTTPRequest2.json.limitations }}",
"extract_criteria": [
"patient_populations",
"contraindications",
"drug_interactions",
"monitoring_parameters",
"alternative_treatments"
]
}
Terrakotta's response structure looks like this:
{
"target_populations": [
{
"population": "Adults 18-75 with BMI > 25 and elevated transaminases",
"suitability": "high",
"confidence": 0.92
}
],
"contraindications": [
{
"condition": "Pregnancy",
"severity": "absolute"
},
{
"condition": "Severe renal impairment (eGFR < 30)",
"severity": "absolute"
}
],
"drug_interactions": [
{
"drug": "Warfarin",
"interaction": "May increase INR; monitor closely",
"management": "Check INR at baseline and 1 week"
}
],
"monitoring": [
"Liver function tests at weeks 4, 12, 24",
"Full blood count at baseline and 12 weeks"
],
"recommendations": "Consider for second-line therapy in non-responders to standard treatment"
}
Node 4: Data Consolidation and Formatting
Add a Set node to combine all three outputs into a single structured document. This is crucial; without it, you have isolated data rather than an integrated report.
{
"paper_metadata": {
"title": "{{ $json.paper_title }}",
"source": "{{ $json.source }}",
"received_date": "{{ $json.received_date }}"
},
"summary": "{{ $node.HTTPRequest1.json.summary }}",
"key_findings": "{{ $node.HTTPRequest1.json.key_findings }}",
"clinical_context": {
"methodology": "{{ $node.HTTPRequest2.json.methodology_explanation }}",
"implications": "{{ $node.HTTPRequest2.json.clinical_implications }}",
"limitations": "{{ $node.HTTPRequest2.json.limitations }}"
},
"clinical_extraction": {
"target_populations": "{{ $node.HTTPRequest3.json.target_populations }}",
"contraindications": "{{ $node.HTTPRequest3.json.contraindications }}",
"drug_interactions": "{{ $node.HTTPRequest3.json.drug_interactions }}",
"monitoring": "{{ $node.HTTPRequest3.json.monitoring }}",
"recommendations": "{{ $node.HTTPRequest3.json.recommendations }}"
},
"processing_timestamp": "{{ new Date().toISOString() }}"
}
Node 5: Storage and Distribution
Finally, save the consolidated result. You have several options depending on your infrastructure:
Option A: Save to Database
Add a SQL node to insert the structured JSON into a PostgreSQL database:
INSERT INTO medical_papers (
paper_title,
paper_source,
summary_json,
clinical_extraction_json,
created_at
) VALUES (
:title,
:source,
:summary_data,
:clinical_data,
NOW()
)
RETURNING id;
Map the n8n variables:
:title = {{ $json.paper_metadata.title }}
:source = {{ $json.paper_metadata.source }}
:summary_data = {{ JSON.stringify($json.summary) }}
:clinical_data = {{ JSON.stringify($json.clinical_extraction) }}
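If you want to test the insert logic locally before pointing n8n at a real database, a minimal sketch using Python's stdlib sqlite3 as a stand-in for PostgreSQL (column names from the query above; note SQLite uses `?` placeholders where a Postgres driver such as psycopg2 uses `%s`):

```python
import json
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL table
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE medical_papers (
        id INTEGER PRIMARY KEY,
        paper_title TEXT,
        paper_source TEXT,
        summary_json TEXT,
        clinical_extraction_json TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

doc = {
    "paper_metadata": {"title": "Novel approach to NAFLD treatment", "source": "PubMed"},
    "summary": "This randomised controlled trial of 342 participants compared...",
    "clinical_extraction": {"recommendations": "Consider for second-line therapy"},
}

cur = conn.execute(
    "INSERT INTO medical_papers "
    "(paper_title, paper_source, summary_json, clinical_extraction_json) "
    "VALUES (?, ?, ?, ?)",
    (
        doc["paper_metadata"]["title"],
        doc["paper_metadata"]["source"],
        json.dumps(doc["summary"]),            # serialise nested JSON for storage
        json.dumps(doc["clinical_extraction"]),
    ),
)
new_id = cur.lastrowid  # Postgres would use RETURNING id instead
```

Storing the nested structures as JSON columns keeps the schema stable even as the upstream tools add fields.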
Option B: Send to Webhook
If your EHR or research platform has an API, send the consolidated data via POST:
POST https://your-ehr-system.com/api/literature-review
With headers:
Authorization: Bearer YOUR_EHR_API_KEY
Content-Type: application/json
And body containing your consolidated JSON.
Option C: Store as Document
Use n8n's file write node to save as JSON, then push to cloud storage:
Path: /medical-papers/{{ $json.paper_metadata.title }}-{{ $json.processing_timestamp }}.json
Then add a Google Drive or S3 node to upload it to your preferred storage.
Error Handling and Conditional Logic
Add an IF node after each API call to check for failures:
// Illustrative condition for the IF node (using n8n's $node expression syntax)
if ($node["HTTPRequest1"].json.error === undefined) {
  // Continue to next step
} else {
  // Route to the failure branch and alert
  return {
    error: true,
    message: 'PDF summarisation failed',
    paper: $json.paper_title,
    api_error: $node["HTTPRequest1"].json.error
  };
}
When failures occur, route to a Notification node (Slack, email, or PagerDuty) so your team knows immediately.
The Manual Alternative
If you need finer control over outputs before they flow to the next tool, you can use n8n's built-in pause nodes. After the summarisation step, insert a "Wait for Webhook" node that sends you an email with the summary and asks for approval before proceeding to analysis.
This is slower, obviously, but useful when papers are particularly novel or touch on sensitive clinical areas where you want human review before the system draws conclusions. You could also manually edit the summarisation result before it feeds into the Terrakotta extraction step.
Alternatively, if you want to skip the orchestration layer entirely, you could run each tool in sequence yourself: paste the PDF into ai-pdf-summarizer, copy the summary into ExplainPaper, then feed those results into Terrakotta. This works, but you will spend several minutes per paper switching contexts and copying text. For a steady stream of papers, that friction multiplies quickly.
Pro Tips
1. Rate Limiting and Cost Control
Each API call costs money. The ai-pdf-summarizer typically charges per page (roughly £0.02-0.05 per page depending on plan). If you are processing many papers, batch them. Create a "collect papers" workflow that waits until 10 papers arrive, then processes them all in parallel using n8n's loop functionality. This reduces overhead API calls and saves around 15-20% on total processing cost.
// In a Code node, accumulate papers in workflow static data
// (Set nodes cannot run JavaScript; static data persists between executions)
const staticData = $getWorkflowStaticData('global');
staticData.pendingPapers = staticData.pendingPapers || [];
staticData.pendingPapers.push($json);

if (staticData.pendingPapers.length >= 10) {
  const batch = staticData.pendingPapers;
  staticData.pendingPapers = [];
  return batch.map(paper => ({ json: paper }));  // release the batch downstream
}
return [];  // fewer than 10 collected; keep waiting
2. Caching Summaries
If the same paper arrives from multiple sources (common with popular studies), cache the summary output in n8n's internal variable store. Before calling the PDF summariser, check if you have already processed this paper by matching DOI or title hash. Skip the API call if found.
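The deduplication key logic is simple enough to sketch in a few lines. Assuming (my assumption, not n8n's) a DOI-first key with a normalised title hash as fallback:

```python
import hashlib

_summary_cache = {}  # in n8n this would live in workflow static data or a DB table

def paper_key(doi, title):
    """Prefer the DOI; otherwise hash a case- and whitespace-normalised title."""
    if doi:
        return doi.strip().lower()
    normalised = " ".join(title.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def summarise_with_cache(doi, title, call_summariser):
    """call_summariser is whatever function actually hits the paid API;
    it only runs on a cache miss."""
    key = paper_key(doi, title)
    if key not in _summary_cache:
        _summary_cache[key] = call_summariser()
    return _summary_cache[key]
```

Normalising the title before hashing matters: the same study arriving from PubMed and a journal alert often differs only in casing or spacing.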
3. Monitor Token Usage
ExplainPaper and Terrakotta use language models internally. Watch your monthly token usage; it can creep up if you have nested prompts. Request longer analyses sparingly, and use the "concise" mode for preliminary screening.
4. Handle Malformed PDFs Gracefully
Some papers from older archives or OCR sources may fail summarisation. Wrap your first API call in a try-catch block. If summarisation fails, check the PDF's integrity using a separate file validation service before alerting your team.
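A cheap pre-flight check catches many bad files before you spend an API call. This heuristic (my own sketch, not a substitute for a full validation service) relies on two facts from the PDF file format: files start with the `%PDF-` magic bytes and well-formed files end with an `%%EOF` marker:

```python
def looks_like_pdf(data):
    """Heuristic integrity check on raw file bytes.
    Truncated downloads and mis-labelled files usually fail one of the two tests."""
    has_header = data[:1024].lstrip().startswith(b"%PDF-")
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer
```

Run this on the downloaded bytes in a Code node; files that fail go straight to the alert branch instead of the summariser.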
5. Version Your Workflow
Before making changes, export your n8n workflow as JSON and commit it to version control (GitHub, GitLab, etc.). If an API change breaks something, you can revert quickly. This is especially important if multiple team members are relying on the pipeline.
Cost Breakdown
| Tool | Plan Needed | Monthly Cost | Notes |
|---|---|---|---|
| ai-pdf-summarizer-by-pdf-guru-ai | Pay-as-you-go or Pro (500 pages/month) | £15-50 | £0.02-0.05 per page; Pro plan recommended for steady use |
| ExplainPaper | Standard API | £20-40 | Usage-based; 1000 analyses/month typically £25 |
| Terrakotta AI | Clinical Extraction Plan | £30-60 | Per-analysis pricing; approximately £0.05-0.10 per extraction |
| n8n | Self-hosted (free) or Cloud Pro | £0-50 | Self-hosted is free; Cloud starts at £25/month for small workflows |
| Database (PostgreSQL) | Cloud-hosted or local | £10-100 | AWS RDS starts at £10/month; local setup is free |
| Total | — | £75-300 | Scales with paper volume; 50-100 papers/month at lower end |
The cost-benefit trade-off heavily favours automation. A researcher spending 30 minutes per paper reviewing, summarising, and extracting clinical points costs roughly £15-25 in labour (at typical research salaries). This workflow handles the same task in under 2 minutes of elapsed time, paying for itself after processing 10-15 papers.
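To make the payback arithmetic explicit, here is a quick calculation over the ranges quoted above; the 10-15 paper figure corresponds to mid-range assumptions, with the extremes bracketing it:

```python
# Break-even: monthly tooling cost divided by labour saved per paper
monthly_cost_low, monthly_cost_high = 75, 300   # £, total from the cost table
saving_low, saving_high = 15, 25                # £, labour saved per paper

best_case = monthly_cost_low / saving_high      # cheapest stack, £25 saved/paper
worst_case = monthly_cost_high / saving_low     # full stack, £15 saved/paper
# best_case = 3.0 papers, worst_case = 20.0 papers per month to break even
```

Even at the pessimistic end, a group digesting dozens of papers weekly clears the break-even point in the first few days of the month.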
If you process papers sporadically, use the pay-as-you-go tier. If your institution processes 100+ papers monthly across multiple researchers, the subscription plans offer better value.
This workflow scales well. Once configured, it runs identically whether you process 5 papers or 500. Your infrastructure cost stays roughly constant whilst labour cost drops toward zero.