AI Data Preparation · PDF Annotation

Annotate PDFs. Train Better AI.

Makroly is a precision PDF annotation platform built for AI teams. Define custom entity schemas, highlight structured information from any document, and export clean JSON datasets — ready to fine-tune your language models and document AI pipelines.

[Screenshot: Makroly PDF annotation workspace]
Faster than manual labeling
100% locally processed and private
Zero setup required: open in browser
Features

Everything your AI team needs

Purpose-built for the specific challenges of creating structured training data from real-world PDF documents.

🗂️

Schema-Driven Annotation

Define named entity types with typed properties (text, number, date, enum). Every annotation is validated against your schema — no free-form noise.
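As a rough illustration of what validating an annotation against a schema can look like, here is a minimal Python sketch. The schema layout (`SCHEMA`, the `type`/`required`/`values` keys) and the `validate_entity` helper are hypothetical and chosen for this example only; Makroly's actual schema file format is not documented on this page.

```python
from datetime import date

# Hypothetical schema layout: entity type -> property name -> spec.
# This is an assumption for illustration, not Makroly's real schema format.
SCHEMA = {
    "LineItem": {
        "description": {"type": "text", "required": True},
        "amount": {"type": "number", "required": True},
        "currency": {"type": "enum", "values": ["EUR", "USD"], "required": False},
        "due": {"type": "date", "required": False},
    }
}


def _matches(spec, value):
    """Return True if value fits the property's declared type."""
    kind = spec["type"]
    if kind == "text":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)  # expects ISO dates such as 2024-01-31
        except ValueError:
            return False
        return True
    if kind == "enum":
        return value in spec["values"]
    return False


def validate_entity(entity, schema=SCHEMA):
    """Collect readable problems; an empty list means the entity is valid."""
    props_spec = schema.get(entity.get("type"))
    if props_spec is None:
        return [f"unknown entity type: {entity.get('type')!r}"]
    errors = []
    props = entity.get("properties", {})
    for name, spec in props_spec.items():
        if name not in props:
            if spec.get("required"):
                errors.append(f"missing required property: {name}")
        elif not _matches(spec, props[name]):
            errors.append(f"{name}: expected {spec['type']}")
    for name in props:
        if name not in props_spec:
            errors.append(f"unexpected property: {name}")
    return errors
```

Returning a list of problems rather than raising on the first one lets an annotator see everything that is wrong with an entity at once.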

🖊️

Precise Text Span Selection

Select exact text fragments from rendered PDFs with pixel-perfect accuracy. Supports multi-span annotations for discontinuous evidence.
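A multi-span annotation can be thought of as a list of offset pairs into the document text. The Python sketch below assumes end-exclusive character offsets, which this page does not confirm; `resolve_spans` and `spans_overlap` are hypothetical helper names used only for illustration.

```python
def resolve_spans(text, spans):
    """Return the text fragment for each (start, end) span, in document order.

    Assumes end-exclusive character offsets into the extracted text.
    """
    fragments = []
    for start, end in sorted(spans):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span out of range: {(start, end)}")
        fragments.append(text[start:end])
    return fragments


def spans_overlap(spans):
    """True if any two spans share at least one character."""
    ordered = sorted(spans)
    # After sorting by start offset, any overlap shows up between neighbours.
    return any(prev[1] > cur[0] for prev, cur in zip(ordered, ordered[1:]))
```

Checking for overlaps before export is one way to catch accidental double selections in discontinuous evidence.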

🏗️

Implicit Object Support

Annotate entities implied by context but not explicitly mentioned — critical for comprehensive document understanding model training.

📤

Structured JSON Export

Export annotations as clean, schema-aligned JSON ready to feed directly into your fine-tuning or RAG pipelines. No post-processing needed.

🔄

Schema Import / Export

Save and reuse annotation schemas across documents and projects. Share schemas with your team for consistent multi-annotator datasets.

🔒

100% Browser-Based & Private

No server uploads. Your PDFs and annotations never leave your device. All processing happens locally, which suits teams working under sensitive-data policies.

Workflow

From PDF to training data in minutes

A streamlined four-step workflow that replaces error-prone spreadsheet annotation with a structured, repeatable process.

1

Upload your PDF

Open any PDF document directly in the browser. Pages render with full fidelity — tables, figures, and mixed layouts all supported.

2

Define your schema

Create entity types matching your domain — invoices, contracts, medical reports, patents. Add typed properties and mark which are required.

3

Annotate with precision

Select text ranges on the PDF to link them to entities. Fill in property values. Add implicit objects for inferred information.

4

Export structured JSON

One click exports your complete annotation set as structured JSON, perfectly aligned with your schema. LLM-ready, immediately.

makroly — annotation export
// Exported annotation (JSON)
{
  "schema": "invoice_extraction_v2",
  "entities": [
    {
      "type": "LineItem",
      "spans": [[142, 198]],
      "properties": {
        "description": "Cloud Hosting — Q1",
        "amount": 4800,
        "currency": "EUR"
      }
    }
  ]
}
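One way an export like the one above could feed a fine-tuning pipeline is sketched below in Python. It relies only on the fields visible in the sample (`entities`, `type`, `spans`, `properties`). The prompt/completion record shape, the helper names, and the reading of spans as end-exclusive character offsets are assumptions for illustration, not something Makroly prescribes.

```python
import json


def to_training_records(document_text, export):
    """Turn one export into prompt/completion pairs, one per entity.

    The prompt/completion field names are a common fine-tuning convention
    chosen for this sketch, not a Makroly output format.
    """
    records = []
    for entity in export["entities"]:
        # Assumption: spans are end-exclusive character offsets into the text.
        evidence = " ".join(
            document_text[start:end] for start, end in entity.get("spans", [])
        )
        records.append({
            "prompt": f"Extract a {entity['type']} from: {evidence}",
            "completion": json.dumps(
                entity["properties"], ensure_ascii=False, sort_keys=True
            ),
        })
    return records


def write_jsonl(records, path):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

An entity with no spans, such as an implicit object, simply yields an empty evidence string here; a real pipeline would probably pass the surrounding page text instead.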
Why Makroly

Built for AI, not for manual review

Traditional tools were designed for human review workflows — not for producing machine-readable training data for generative AI.

Capability | Traditional Tools | Makroly
Schema-driven entity types | ❌ Free-form | ✅ Structured schemas
Typed properties on entities | ❌ Not supported | ✅ Text, number, date, enum
Implicit / inferred entities | ❌ Span-only | ✅ Full implicit object support
JSON export for AI pipelines | ⚠️ Requires conversion | ✅ Native structured JSON
Schema reuse across documents | ❌ Manual duplication | ✅ Import / export schemas
Data privacy | ⚠️ Cloud upload required | ✅ 100% local, no uploads
Setup complexity | ⚠️ Account + configuration | ✅ Open browser and go
Use Cases

What teams use Makroly for

📄

Invoice & Receipt Processing

Annotate line items, totals, vendors, and dates for document extraction model training.

⚖️

Legal Contract Analysis

Tag clauses, parties, obligations, and dates to build contract review AI.

🏥

Medical Report Extraction

Label diagnoses, medications, and patient data from clinical PDFs for healthcare AI.

🔬

Scientific Literature Mining

Extract methods, findings, and citations from research papers for knowledge graphs.

Makroly

Ready to build your AI training dataset?

No sign-up. No cloud uploads. Open Makroly in your browser right now and start annotating in seconds.

Open the Annotator — it's free →