Original article excerpt
Server-side extracted preview paragraphs from the original source.
A Blog post by Dharma-AI on Hugging Face
In April, we released DharmaOCR, our specialized structured OCR model (available on Hugging Face) along with a paper detailing the methodology behind it and a benchmark demonstrating its superior quality and cost efficiency. The paper benchmarked leading vision-language model families - both open-source and commercial - on a structured document extraction task: OCR on Brazilian Portuguese text. Among the reported metrics was text degeneration rate: the frequency with which a model produces a repetition loop instead of a transcription.