From OCR to AI Agents: How We Reinvented Digitization at haddock
After four years, five OCR providers, and thousands of experiments, we built an AI agent that took accuracy to 95%, cut costs in half, and turned messy invoices into insights in under a minute
When I joined haddock six months ago, there was a clear bottleneck: our digitization process.
Since then, much of our team's focus has gone into building an AI agent with vision that could outperform the digitization process we had before.
After all this time and thousands of experiments, we can finally say we've achieved it. The results have gone beyond what we expected when we started this journey.
Just a couple of numbers:
We went from 68% to 95% accuracy.
Cut OCR costs in half.
Reduced digitization latency at the 99th percentile from 22 hours down to under 2 minutes.
Gained the flexibility to launch new features around digitization in days instead of months.
But First, a Bit More Context on Digitization
Reading these numbers, you might be thinking: 68% accuracy sounds very low.
So why was it so low in the first place?
Because when we talk about digitization, we’re not talking about simple documents with just a supplier name and a total. We’re talking about digitizing restaurant invoices, delivery notes, and tickets. And these are far more complex: irregular patterns, inconsistent formats, and the general lack of digitization in the sector.
The documents we digitize often arrive as a photo taken by the user, and we need to capture every detail: each item a restaurant buys, along with its code, quantity, unit price, total price, taxes, discounts, and unit of measure. All of this data must be consistent, because it's what powers the rest of our platform, from orders and inventories to price analytics and more.
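To give a sense of what "every detail" means, here's a minimal sketch of the kind of structured record a single invoice line has to be mapped into. The field names are illustrative, not our actual schema:

```python
# Illustrative sketch only: field names are hypothetical,
# not haddock's actual schema.
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

@dataclass
class InvoiceLine:
    item_code: str              # the supplier's own code for the item
    description: str
    quantity: Decimal
    unit: str                   # e.g. "kg", "unit", "box"
    unit_price: Decimal
    total_price: Decimal
    tax_rate: Decimal           # e.g. Decimal("0.10") for 10% VAT
    discount: Optional[Decimal] = None
```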
Take a simple example:
The details of an item don’t always appear next to each other.
Tax information may not even appear in the item details; it could be just a reference, with the actual value hidden pages later.
And a large percentage of delivery notes are still handwritten, which adds another layer of complexity.
That’s why, for us, the rule is simple: if a human can read and interpret it, haddock should too.
And here’s the best part: many invoices are so complex that our AI agent doesn’t just digitize them, it also helps interpret missing or ambiguous information by reasoning through the document and applying calculations.
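As a toy illustration (not our production code), this is the kind of arithmetic the agent applies when a field is missing: if a line shows a quantity and a total but no unit price, the value can be derived instead of lost:

```python
# Toy example: recover a missing unit price from fields that
# are readable elsewhere on the line.
from decimal import Decimal

def derive_unit_price(quantity: Decimal, total_price: Decimal) -> Decimal:
    """If the unit price is absent but quantity and line total are
    readable, compute it instead of failing the line."""
    if quantity == 0:
        raise ValueError("cannot derive a unit price for zero quantity")
    return (total_price / quantity).quantize(Decimal("0.0001"))

print(derive_unit_price(Decimal("12"), Decimal("30.60")))  # 2.5500
```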
Two examples of the kinds of documents we process. We handle over 350,000 of them every month.
And here’s the key point: for us, an automatic digitization only counts as successful if every single field is captured perfectly and remains consistent with the rest of the document. That’s why digitizing a full invoice or delivery note is such a complex challenge.
To put it into perspective, haddock processes more than €1B in GMV through this digitization system. That’s why getting every single item right is not just nice to have, it’s essential.
Now you can understand why, after four years, testing with more than five different OCR providers, and countless efforts, the best automatic accuracy we could achieve was 68%.
How Did We Achieve Such Results So Quickly?
You might be asking yourself: how did you manage to achieve such good results in such a short time?
The answer is clear: a lot of hard work and constant experimentation. But the real key has been our internal data, which allowed us to build the strongest possible model.
It's this dataset, more than 3 million documents from over 80,000 different suppliers, that made it possible to create such a robust model at scale.
So let's talk about a few key topics:
1. You Can’t Improve What You Don’t Measure
But how do you measure something as irregular and complex as this, at scale?
The key at haddock has been our validator systems, which check whether digitized fields are consistent with each other.
We’ve built dozens of these domain-specific validators across financial, tax, and item-level data. They don’t just confirm whether a digitization was successful, they also highlight where inconsistencies or errors are most likely to occur.
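To make this concrete, here's a minimal sketch of what one such consistency check can look like. It's a simplification with hypothetical names, not our actual validator code:

```python
# Minimal sketch of a domain-specific consistency validator
# (names and tolerance are illustrative).
from decimal import Decimal

TOLERANCE = Decimal("0.02")  # allow small rounding differences

def validate_line_total(quantity: Decimal, unit_price: Decimal,
                        total_price: Decimal) -> bool:
    """Flag lines where quantity x unit price drifts from the stated total."""
    return abs(quantity * unit_price - total_price) <= TOLERANCE

def validate_invoice_total(line_totals: list[Decimal],
                           stated_total: Decimal) -> bool:
    """Check that the item lines, summed, match the invoice's stated total."""
    return abs(sum(line_totals, Decimal("0")) - stated_total) <= TOLERANCE
```

When a check like this fails, it doesn't just mark the document as wrong; it tells us which line, and which field, to look at first.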
2. Agent Observability and Evaluations
Going from that first version of the agent in mid-June, which barely reached 70% accuracy, to today's 95% had no magic trick behind it. The key was building a strong system of observability and evaluations, combining historical data with production tests and user feedback.
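As a rough sketch of what "evaluations" means here: replay labeled historical documents through the agent and score it against the all-or-nothing criterion described earlier. This is a simplified illustration, not our actual harness:

```python
# Simplified illustration of an evaluation loop (not our real harness).
def document_accuracy(predictions: list[dict],
                      ground_truth: list[dict]) -> float:
    """A document only counts as correct if every extracted field matches
    the labeled ground truth, mirroring the all-or-nothing criterion."""
    assert len(predictions) == len(ground_truth)
    correct = sum(1 for pred, truth in zip(predictions, ground_truth)
                  if pred == truth)
    return correct / len(ground_truth)
```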
A turning point was building our observability architecture. It has since become the core for the AI team at haddock. But this topic is so important that it deserves a dedicated post of its own.
3. Finetuning vs. Adding More Context
A common question we get is whether we've fine-tuned our own model. The answer is no, simply because we haven't needed to.
We’ve been able to move much faster and achieve the same or even better results just by adding more and more context, leveraging all the knowledge we already had in our internal data.
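What does "adding more context" look like in practice? A rough, hypothetical sketch: before calling the model, enrich the prompt with what our historical data already tells us about this supplier. The structure and names below are illustrative:

```python
# Hypothetical sketch: enrich the extraction prompt with supplier
# knowledge we already hold, instead of fine-tuning a model.
def build_prompt(document_text: str, supplier_profile: dict) -> str:
    known_items = "\n".join(
        f"- {code}: {name}"
        for code, name in supplier_profile["items"].items()
    )
    return (
        "Extract every line item from the document below.\n"
        f"Known item codes for this supplier:\n{known_items}\n"
        f"Typical VAT rates: {supplier_profile['vat_rates']}\n\n"
        f"Document:\n{document_text}"
    )
```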
And there’s another advantage: by not relying on finetuning, it becomes much easier to switch between LLM providers whenever we need to.
4. Staying Agnostic on LLM Providers and Betting on Open Source
You might also be wondering: so, which LLM are they using?
The truth is, that’s not where the key lies. At the end of the day, what really matters is building the agent and the context around it.
Of course, the LLM itself is important. But the real strategy is being prepared to switch providers at any moment and to benefit from the improvements that will inevitably come: lower latency, higher intelligence, reduced costs, and, what we're all waiting for, open-source breakthroughs.
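In code terms, staying agnostic can be as simple as making the agent depend on an interface rather than on any vendor's SDK. A minimal sketch, with names that are ours and not any provider's API:

```python
# Minimal sketch of provider-agnosticism: the agent depends on this
# Protocol, not on a specific LLM SDK. Names are illustrative.
from typing import Protocol

class VisionLLM(Protocol):
    def extract(self, image: bytes, prompt: str) -> str:
        """Return the model's raw response for a document image."""
        ...

def digitize(document_image: bytes, prompt: str, llm: VisionLLM) -> str:
    # Switching providers means passing a different `llm` object here,
    # not rewriting the agent.
    return llm.extract(document_image, prompt)
```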
Why Don’t OCR Providers Work Out?
A question I’ve often reflected on is: why don’t third-party OCR systems perform well, if digitization is supposed to be their core business?
There are a few reasons:
Our use case is hyper-specific.
We’re dealing with restaurant data, where the level of granularity is extremely fine. Most digitization processes that OCR providers cover don’t require capturing details at this depth.
Interpretation, not just extraction.
A large share of documents require interpreting data and making calculations. That interpretation depends on very specific restaurant industry know-how. While you can try to transfer that knowledge to an external system, details are easily lost along the way. By having the team that already holds the domain knowledge build the model, we remove bureaucracy, reduce friction, and move much faster.
AI-native vs. AI-layered.
Some of the latest OCR solutions we tried did include Generative AI in their pipelines. But we didn’t consider them AI-native. Adding GenAI on top of an OCR system is not the same as designing an agent-first approach. An AI-native system doesn’t just read text; it reasons, validates, and adapts. That’s the shift we embraced at haddock.
The Complexity Behind It
Building an agent like this doesn’t happen overnight. It has required a significant investment of time and effort, built on years of accumulated knowledge and millions of processed documents.
That complexity is precisely what makes reaching today’s results such an important milestone for us.
Key Takeaways
From all of this, we’ve drawn a few conclusions:
The key is to understand the problem better than anyone else and to transfer that knowledge into the agent through context. Everything in between is the craft, and the magic, that AI engineers must deliver.
Four years ago, when haddock started, the technology for this kind of digitization simply wasn't ready yet. But what was ready was the vision of Arnau and Pol, our founders, and their conviction that one day we would get here.
Low latency makes the user experience 100x better. Today, users can snap a photo of messy data and watch it turn into structured insights in under a minute, and that creates a wow effect that's hard to beat.
There’s still plenty of room to improve. On our roadmap: reducing costs by 4x through planned experiments, pushing accuracy from 95% to 99.9%, and driving latency down even further.
This is just the first post. If you're interested in the world of AI, product, and startups, and want to follow how we're building AI at haddock, I'd love for you to subscribe for free to my newsletter.


