AI Data Preparation · PDF Annotation

Annotate PDFs. Train Better AI.

Makroly is a precision PDF annotation platform built for AI teams. Define custom entity schemas, highlight structured information from any document, and export clean JSON datasets — ready to fine-tune your language models and document AI pipelines.

Start Annotating Free →

See how it works

Makroly PDF annotation workspace screenshot

0×Faster than manual labeling

100% Locally processed, private

0 Setup required: open in browser

Features

Everything your AI team needs

Purpose-built for the specific challenges of creating structured training data from real-world PDF documents.

🗂️

Schema-Driven Annotation

Define named entity types with typed properties (text, number, date, enum). Every annotation is validated against your schema — no free-form noise.
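As a rough illustration of what schema validation can look like, here is a minimal Python sketch. The schema layout, the `required` and `values` keys, and the `validate` helper are all assumptions made for this example; Makroly's actual schema format is not shown on this page.

```python
from datetime import date

# Hypothetical schema layout: the key names are illustrative only,
# not Makroly's documented format.
SCHEMA = {
    "LineItem": {
        "description": {"type": "text", "required": True},
        "amount": {"type": "number", "required": True},
        "currency": {"type": "enum", "values": ["EUR", "USD"], "required": True},
        "due": {"type": "date", "required": False},
    }
}


def check_value(spec, value):
    """Return True if value matches the property's declared type."""
    kind = spec["type"]
    if kind == "text":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)  # expects YYYY-MM-DD
            return True
        except ValueError:
            return False
    if kind == "enum":
        return value in spec["values"]
    return False


def validate(entity, schema=SCHEMA):
    """Return a list of problems; an empty list means the entity is valid."""
    props_spec = schema.get(entity["type"])
    if props_spec is None:
        return [f"unknown entity type: {entity['type']}"]
    errors = []
    props = entity.get("properties", {})
    for name, spec in props_spec.items():
        if name not in props:
            if spec["required"]:
                errors.append(f"missing required property: {name}")
        elif not check_value(spec, props[name]):
            errors.append(f"{name}: expected {spec['type']}")
    for name in props:
        if name not in props_spec:
            errors.append(f"unexpected property: {name}")
    return errors
```

Returning a list of problems instead of raising on the first one lets an annotation UI show every issue on an entity at once.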

🖊️

Precise Text Span Selection

Select exact text fragments from rendered PDFs with pixel-perfect accuracy. Supports multi-span annotations for discontinuous evidence.
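To make multi-span annotations concrete, the sketch below resolves a list of spans back to text. It assumes each span is a `[start, end)` pair of character offsets into the document's extracted text; that interpretation, and the helper names, are assumptions for illustration and not a documented Makroly convention.

```python
def span_text(document_text, spans, joiner=" ... "):
    """Join the text covered by each [start, end) span, in document order."""
    pieces = []
    for start, end in sorted(spans):
        if not 0 <= start < end <= len(document_text):
            raise ValueError(f"span out of range: [{start}, {end})")
        pieces.append(document_text[start:end])
    return joiner.join(pieces)


def spans_overlap(spans):
    """Return True if any two spans share at least one character."""
    ordered = sorted(spans)
    return any(prev[1] > nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


text = "Payment is due within 30 days, unless otherwise agreed, of the invoice date."
first = text.index("Payment")
second = text.index("of the")
spans = [
    [second, second + len("of the invoice date")],
    [first, first + len("Payment is due within 30 days")],
]
print(span_text(text, spans))
# prints "Payment is due within 30 days ... of the invoice date"
```

Sorting the spans first means the evidence reads in document order even when the annotator selected the fragments out of order.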

🏗️

Implicit Object Support

Annotate entities that are implied by context but never explicitly mentioned, which is critical for training comprehensive document-understanding models.
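One plausible way to represent an implicit object in exported data is an entity with no text spans. The sketch below assumes that convention; it is a guess for illustration, and the actual export may mark implicit entities differently.

```python
def is_implicit(entity):
    """Assumed convention: an implicit entity carries no text spans."""
    return len(entity.get("spans", [])) == 0


def split_by_evidence(entities):
    """Partition entities into (explicit, implicit) lists, preserving order."""
    explicit = [e for e in entities if not is_implicit(e)]
    implicit = [e for e in entities if is_implicit(e)]
    return explicit, implicit
```

Keeping the two groups separate can be useful downstream, for example to evaluate a model on span-grounded and inferred entities independently.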

📤

Structured JSON Export

Export annotations as clean, schema-aligned JSON ready to feed directly into your fine-tuning or RAG pipelines. No post-processing needed.

🔄

Schema Import / Export

Save and reuse annotation schemas across documents and projects. Share schemas with your team for consistent multi-annotator datasets.
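When several annotators share a schema file, it helps to confirm that everyone loaded the same version. The sketch below saves and loads a schema as JSON and computes a short fingerprint that ignores key order. The file layout and the function names are hypothetical, not part of Makroly.

```python
import hashlib
import json
from pathlib import Path


def canonical(schema):
    """Serialize with sorted keys so equal schemas produce identical text."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def fingerprint(schema):
    """Short SHA-256 digest of the canonical form, for quick comparison."""
    return hashlib.sha256(canonical(schema).encode("utf-8")).hexdigest()[:12]


def save_schema(schema, path):
    """Write the schema as human-readable JSON."""
    text = json.dumps(schema, indent=2, sort_keys=True, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")


def load_schema(path):
    """Read a schema back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Two annotators can compare `fingerprint(load_schema(path))` values before merging their datasets; a mismatch means the schemas have diverged.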

🔒

100% Browser-Based & Private

No server uploads. Your PDFs and annotations never leave your device. All processing happens locally in the browser, which makes it easier to meet sensitive-data policies.

Workflow

From PDF to training data in minutes

A streamlined four-step workflow that replaces error-prone spreadsheet annotation with a structured, repeatable process.

Upload your PDF

Open any PDF document directly in the browser. Pages render with full fidelity — tables, figures, and mixed layouts all supported.

Define your schema

Create entity types matching your domain — invoices, contracts, medical reports, patents. Add typed properties and mark which are required.

Annotate with precision

Select text ranges on the PDF to link them to entities. Fill in property values. Add implicit objects for inferred information.

Export structured JSON

One click exports your complete annotation set as structured JSON, perfectly aligned with your schema. LLM-ready, immediately.

makroly — annotation export

// Exported annotation (JSON)
{
  "schema": "invoice_extraction_v2",
  "entities": [
    {
      "type": "LineItem",
      "spans": [[142, 198]],
      "properties": {
        "description": "Cloud Hosting — Q1",
        "amount": 4800,
        "currency": "EUR"
      }
    }
  ]
}
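As a sketch of how an export like the sample above might feed a fine-tuning pipeline, the Python below parses a copy of it and builds prompt/completion records in JSONL. The prompt wording, the `prompt` and `completion` field names, and both helper functions are assumptions chosen for illustration; adapt them to whatever format your training framework expects.

```python
import json

# A copy of the sample export, embedded as a string so the example is self-contained.
EXPORT = """
{
  "schema": "invoice_extraction_v2",
  "entities": [
    {
      "type": "LineItem",
      "spans": [[142, 198]],
      "properties": {
        "description": "Cloud Hosting — Q1",
        "amount": 4800,
        "currency": "EUR"
      }
    }
  ]
}
"""


def to_training_record(document_text, export):
    """Build one prompt/completion pair (a hypothetical record format)."""
    target = [
        {"type": e["type"], "properties": e["properties"]}
        for e in export["entities"]
    ]
    return {
        "prompt": f"Extract {export['schema']} entities as JSON.\n\n{document_text}",
        "completion": json.dumps(target, ensure_ascii=False),
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


export = json.loads(EXPORT)
record = to_training_record("...full text of the invoice PDF...", export)
print(to_jsonl([record]))
```

The spans are left out of the completion here because the model is asked only for typed properties; keep them in if you train a model to point back at its evidence.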

Why Makroly

Built for AI, not for manual review

Traditional tools were designed for human review workflows — not for producing machine-readable training data for generative AI.

Capability	Traditional Tools	Makroly
Schema-driven entity types	❌ Free-form	✅ Structured schemas
Typed properties on entities	❌ Not supported	✅ Text, number, date, enum
Implicit / inferred entities	❌ Span-only	✅ Full implicit object support
JSON export for AI pipelines	⚠️ Requires conversion	✅ Native structured JSON
Schema reuse across documents	❌ Manual duplication	✅ Import / export schemas
Data privacy	⚠️ Cloud upload required	✅ 100% local — no uploads
Setup complexity	⚠️ Account + configuration	✅ Open browser and go

Use Cases

What teams use Makroly for

📄

Invoice & Receipt Processing

Annotate line items, totals, vendors, and dates to train document extraction models.

⚖️

Legal Contract Analysis

Tag clauses, parties, obligations, and dates to build contract review AI.

🏥

Medical Report Extraction

Label diagnoses, medications, and patient data from clinical PDFs for healthcare AI.

🔬

Scientific Literature Mining

Extract methods, findings, and citations from research papers for knowledge graphs.

Ready to build your AI training dataset?

No sign-up. No cloud uploads. Open Makroly in your browser right now and start annotating in seconds.

Open the Annotator — it's free →