Academic research paper summarisation and citation extraction
Reading through academic papers is genuinely time-consuming. You might spend an hour with a dense research paper only to find the key insights buried in section 4.2, and extracting citations for your reference list becomes a manual copy-paste exercise. If you're a researcher, student, or knowledge worker processing multiple papers weekly, this friction adds up quickly.
The good news is you don't need to read papers linearly anymore. Three specialised AI tools can work together to summarise papers, explain complex concepts, and extract citations automatically. Combined with an orchestration platform, you can process a PDF in under a minute with zero manual handoff between steps. Feed in the PDF, get back a summary, key concepts explained, and a structured list of citations.
This workflow is surprisingly simple to set up, even if you've never built an automation before. We'll walk through how to connect these tools so they talk to each other and why this approach saves researchers hours every month.
The Automated Workflow
The workflow follows a logical sequence: upload a paper, extract its content, summarise it, explain key concepts, and finally pull out citations. We'll use n8n for orchestration because it handles file uploads cleanly and connects to multiple APIs without requiring paid connectors.
Architecture Overview
Here's how data flows through the system:
- You trigger the workflow by uploading a PDF (via webhook or direct upload).
- Chat with PDF by Copilotus extracts and reads the paper content.
- Resoomer AI generates a concise summary.
- ExplainPaper processes complex sections for plain-language explanations.
- A final step extracts citations from the paper and formats them as structured data.
- Results are saved to a JSON file or sent to your preferred destination (email, Slack, spreadsheet).
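Conceptually, the whole chain reduces to one function call per stage. The sketch below uses stub helpers standing in for the HTTP calls the n8n nodes make; the helper names and return shapes are illustrative assumptions, not real APIs.

```javascript
// Stub helpers stand in for the real API calls configured in the n8n nodes below.
const downloadPdf = async (url) => Buffer.from('%PDF-1.7 stub');
const extractContent = async (pdf) => ({ abstract: 'stub abstract', body: 'stub body' });
const summarise = async (content) => 'stub summary';
const explainSections = async (content) => ['stub explanation'];
const extractCitations = (content) => [{ citationNumber: '1', citation: 'stub' }];

// End-to-end pipeline: one stage per workflow step.
async function processPaper(pdfUrl) {
  const pdf = await downloadPdf(pdfUrl);               // Step 1: fetch and validate
  const content = await extractContent(pdf);           // Step 2: Chat with PDF by Copilotus
  const summary = await summarise(content);            // Step 3: Resoomer AI
  const explanations = await explainSections(content); // Step 4: ExplainPaper
  const citations = extractCitations(content);         // Step 5: local citation parsing
  return { summary, explanations, citations };         // Step 6: save or deliver
}
```

Each stage consumes the previous stage's output, which is exactly how the n8n nodes will be wired together.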
Setting Up with n8n
n8n is the best choice here because it handles file uploads natively, offers flexible HTTP requests, and can run self-hosted or in the cloud without connector paywalls. You'll need an n8n account (the free tier works fine for this workflow) and API keys for each of the three AI services.
First, gather your credentials:
- Chat with PDF by Copilotus: API key from their dashboard.
- Resoomer AI: API key from account settings.
- ExplainPaper: API key (required for programmatic access).
Create a new workflow in n8n. Start with a Webhook trigger that accepts POST requests with a file URL or base64-encoded PDF data.
Step 1: Receive and Validate the PDF
{
"method": "POST",
"path": "workflow/process-paper",
"expectedBody": {
"pdfUrl": "https://example.com/paper.pdf",
"paperTitle": "Optional paper title",
"userEmail": "researcher@example.com"
}
}
In n8n, add a Webhook node set to POST and configure it to accept the structure above. This is your entry point. The workflow will wait for a request with a PDF URL.
Next, add an HTTP Request node to download and validate the PDF. This ensures the file exists and is readable before passing it to the AI tools.
{
"method": "GET",
"url": "{{ $json.pdfUrl }}",
"responseFormat": "arraybuffer",
"headers": {
"User-Agent": "ResearchAutomation/1.0"
}
}
Configure the node to return the file as an array buffer. This gives you the raw PDF data you'll pass to the next step.
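Before spending API calls on the file, it's worth confirming the download really is a PDF. A minimal check in an n8n Function node, assuming the previous HTTP Request node returned the raw bytes as a Buffer, is to look for the `%PDF-` magic bytes every valid PDF starts with:

```javascript
// Sketch: validate the downloaded bytes before passing them to any paid API.
function isLikelyPdf(buffer) {
  // Every valid PDF file begins with the ASCII magic bytes "%PDF-".
  return Buffer.isBuffer(buffer) && buffer.slice(0, 5).toString('ascii') === '%PDF-';
}

// Placeholder for the binary data from the previous node.
const pdfData = Buffer.from('%PDF-1.7 example');
if (!isLikelyPdf(pdfData)) {
  throw new Error('Downloaded file is not a valid PDF');
}
```

Throwing here stops the workflow early with a clear error instead of a confusing failure three nodes later.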
Step 2: Extract Content with Chat with PDF by Copilotus
Chat with PDF by Copilotus allows you to upload a PDF and query it via API. The workflow extracts the full text and key sections.
Add an HTTP Request node for uploading to Chat with PDF:
{
"method": "POST",
"url": "https://api.copilotus.com/v1/documents/upload",
"headers": {
"Authorization": "Bearer {{ $env.COPILOTUS_API_KEY }}",
"Content-Type": "application/pdf"
},
"body": "{{ $json.pdfData }}",
"responseFormat": "json"
}
The API returns a document ID. Store this for the next query:
{
"method": "POST",
"url": "https://api.copilotus.com/v1/documents/{{ $json.documentId }}/query",
"headers": {
"Authorization": "Bearer {{ $env.COPILOTUS_API_KEY }}"
},
"body": {
"query": "Extract the abstract, methodology, and main findings as structured text",
"format": "json"
}
}
This returns the paper's core content in structured form. Save the response as paperContent for downstream steps.
Step 3: Generate Summary with Resoomer AI
Resoomer AI specialises in summarising documents. Pass the extracted content to generate a concise summary.
Add an HTTP Request node:
{
"method": "POST",
"url": "https://api.resoomer.com/v2/summarize",
"headers": {
"Authorization": "Bearer {{ $env.RESOOMER_API_KEY }}",
"Content-Type": "application/json"
},
"body": {
"document": "{{ $json.paperContent }}",
"summaryLength": "medium",
"language": "en"
}
}
Configure the response to capture the summary text. Resoomer returns a shortened version highlighting key points and results.
Step 4: Explain Complex Concepts with ExplainPaper
ExplainPaper converts dense academic language into accessible explanations. Use it on specific sections or abstract concepts the paper introduces.
Add another HTTP Request node:
{
"method": "POST",
"url": "https://api.explainpaper.com/v1/explain",
"headers": {
"Authorization": "Bearer {{ $env.EXPLAINPAPER_API_KEY }}",
"Content-Type": "application/json"
},
"body": {
"text": "{{ $json.abstract }}",
"depth": "intermediate",
"includeExamples": true
}
}
The service returns an explanation breakdown. You can run this multiple times for different sections (abstract, methodology, results) by chaining additional nodes.
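One way to drive that chaining is a small Function node that builds one request body per section, so a single downstream HTTP Request node can run once per item. This assumes the extracted content is an object keyed by section name, mirroring the (assumed) request shape above:

```javascript
// Sketch: produce one ExplainPaper request body per available paper section.
function buildExplainRequests(paperContent, depth = 'intermediate') {
  const sections = ['abstract', 'methodology', 'results'];
  return sections
    .filter((name) => typeof paperContent[name] === 'string' && paperContent[name].length > 0)
    .map((name) => ({
      section: name,
      body: { text: paperContent[name], depth, includeExamples: true },
    }));
}
```

Missing sections are simply skipped, so the same workflow handles papers with and without a formal methodology section.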
Step 5: Extract and Format Citations
This is where you add a custom function node to parse citations from the paper content. Most academic papers follow standard citation formats (IEEE, APA, Chicago).
Add a Function node in n8n:
// Pull the extracted text from the previous node.
const paperText = $json.paperContent;

// Match bracketed citations such as "[12] Author, Title ..." up to the next
// bracketed entry, a blank line, or the end of the text.
const citationPattern = /\[(\d+)\]\s*(.+?)(?=\n\[|\n\n|$)/gs;

const citations = [];
let match;
while ((match = citationPattern.exec(paperText)) !== null) {
  citations.push({
    citationNumber: match[1],
    citation: match[2].trim(),
    format: "bracketed"
  });
}

// n8n Function nodes must return an array of items, each wrapped in { json }.
return [{
  json: {
    citations,
    totalCitations: citations.length,
    extractedAt: new Date().toISOString()
  }
}];
This regex captures bracketed citations. Adjust the pattern if your papers use different formats. For more sophisticated extraction, consider using a regex library or calling a dedicated citation API.
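For author-year reference lists, a looser heuristic pattern can stand in for the bracketed one. The sketch below matches reference-list lines like "Smith, J. (2020). Title of the paper." and is a heuristic, not a full APA parser:

```javascript
// Sketch: extract author-year (APA-style) entries from a reference list,
// one entry per line, keyed off the "(YYYY)." year marker.
function extractApaCitations(referenceText) {
  const pattern = /^(.+?\(\d{4}[a-z]?\)\..+?)$/gm;
  const citations = [];
  let match;
  while ((match = pattern.exec(referenceText)) !== null) {
    citations.push({ citation: match[1].trim(), format: 'author-year' });
  }
  return citations;
}
```

Entries that wrap across multiple lines would need to be re-joined first, which is a common preprocessing step for text extracted from PDFs.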
Step 6: Save and Deliver Results
Add a final node to save the complete output. Use the File node to write JSON, or the Email node to send results:
{
"filename": "{{ $json.paperTitle }}_analysis.json",
"data": {
"title": "{{ $json.paperTitle }}",
"summary": "{{ $json.summary }}",
"explanation": "{{ $json.explanation }}",
"citations": "{{ $json.citations }}",
"processedAt": "{{ $now }}"
},
"format": "json"
}
Alternatively, send results via email:
{
"method": "POST",
"url": "https://api.sendgrid.com/v3/mail/send",
"headers": {
"Authorization": "Bearer {{ $env.SENDGRID_API_KEY }}"
},
"body": {
"personalizations": [{
"to": [{"email": "{{ $json.userEmail }}"}]
}],
"from": {"email": "automation@yourdomain.com"},
"subject": "Research Paper Analysis: {{ $json.paperTitle }}",
"content": [{
"type": "text/html",
"value": "<h2>Summary</h2><p>{{ $json.summary }}</p><h3>Citations</h3><ul>{{ $json.citations }}</ul>"
}],
"attachments": [{
"filename": "{{ $json.paperTitle }}_full_analysis.json",
"content": "{{ $json.analysisData }}"
}]
}
}
Deploy the workflow and test it with a sample PDF. Monitor execution logs for any API errors or timeouts.
The Manual Alternative
If you prefer more hands-on control, you can use these tools individually without automation. Open Chat with PDF by Copilotus in a browser, upload your paper, and ask specific questions about content. Use Resoomer's web interface to paste text and get summaries. Run ExplainPaper on sections you find confusing. Extract citations manually from the PDF viewer.
This approach gives you more flexibility to refine queries and cherry-pick sections, but it takes 15-20 minutes per paper instead of 2-3 minutes. It's better suited for single papers requiring deep analysis rather than processing large batches.
Pro Tips
1. Handle Large PDFs Carefully
Academic paper PDFs range from under 1 MB to 50 MB or more, and some APIs enforce size limits. If your workflow fails on large files, split the PDF into sections using a PDF library such as pypdf (formerly PyPDF2) before uploading. In n8n, add a preprocessing step that checks file size and splits if necessary.
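That preprocessing check can be a one-liner in a Function node. The 25 MB limit below is an assumed placeholder; substitute whatever limit your API actually enforces:

```javascript
// Sketch: flag PDFs above an assumed upload limit before sending them anywhere.
const MAX_BYTES = 25 * 1024 * 1024; // assumed 25 MB API limit

function checkPdfSize(buffer) {
  return {
    sizeBytes: buffer.length,
    needsSplitting: buffer.length > MAX_BYTES,
  };
}
```

An IF node can then branch on `needsSplitting` and route oversized files to a splitting step.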
2. Rate Limiting and Quotas
All three services enforce rate limits, and free-tier quotas change often, so check each dashboard for current numbers: Chat with PDF by Copilotus free plans allow on the order of 100 API calls per month, Resoomer AI around 50 summaries, and ExplainPaper around 200 requests. If you're processing many papers, either batch them strategically or upgrade to paid plans. Add error handling in n8n to catch rate-limit errors (HTTP 429) and implement exponential backoff.
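Exponential backoff means doubling the wait between retries. A minimal sketch, assuming the wrapped function throws errors carrying an HTTP `status` property:

```javascript
// Sketch: retry an async call with exponential backoff on HTTP 429 responses.
async function withBackoff(fn, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors, and only up to maxRetries times.
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      // Wait 1s, 2s, 4s, ... before the next attempt.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Non-429 errors are rethrown immediately, since retrying a bad request or an auth failure only burns quota.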
3. Use Claude Code for Complex Citation Parsing
For papers with non-standard citation formats, consider running the citation extraction through Claude Code within n8n. Claude is excellent at understanding context and extracting structured data from messy text. Add a node that sends the paper text to Claude with a prompt asking for APA-formatted citations.
4. Cache Extracted Content
If you need to re-analyse the same paper with different questions, store the extracted content from Step 2 in a database. n8n integrates with PostgreSQL, MongoDB, and others. This saves API calls and speeds up subsequent analyses.
5. Set Realistic Timeouts
The API calls in this workflow can take 10-30 seconds each, especially if the PDF is large. Configure n8n HTTP nodes with a 60-second timeout to avoid premature failures.
Cost Breakdown
| Tool | Plan Needed | Monthly Cost | Notes |
|---|---|---|---|
| Chat with PDF by Copilotus | Free or Pro | £0-20 | Free tier: 100 calls/month; Pro: £20/month |
| Resoomer AI | Free or Premium | £0-15 | Free tier: 50 summaries/month; Premium: £15/month |
| ExplainPaper | Free or Standard | £0-10 | Free tier: 200 requests/month; Standard: £10/month |
| n8n | Cloud free or self-hosted | £0-50 | Cloud free tier sufficient; self-hosted is one-time setup |
| Total (minimal processing) | All free tiers | £0 | Supports up to ~50 papers/month across free plans |
| Total (heavy use) | All paid tiers | £45-85 | Covers 500+ papers/month with API headroom |
If you're a student or researcher processing 5-10 papers per month, the free tiers cover your needs entirely. Running costs stay under £0.50 per paper when using paid plans and processing in bulk.
The real value isn't just cost savings; it's time savings. You recover 10-15 hours monthly that would have been spent on manual summarisation and citation extraction. Build this workflow once, and it pays for itself in the first week.