IBM Unveils Granite-Docling-258M: A Pioneering Vision-Language Model for Precise Document Conversion
IBM has released Granite-Docling-258M, a Vision-Language Model (VLM) built for precise document conversion. The open-source model is available under the Apache 2.0 license, and IBM plans to extend its language support to Swahili, Tagalog, and Bengali, building on its existing capabilities in Arabic, Chinese, and Japanese.
At only 258 million parameters, Granite-Docling-258M is a compact model for document understanding. Unlike conventional OCR models, it captures mathematical formulas, code blocks, and table structures, and it preserves the original layout. It does so through DocTags, a universal markup format developed by IBM Research that sits at the core of Granite-Docling-258M.
IBM Research plans to extend Granite-Docling with larger models of roughly 512 million and 900 million parameters, keeping every variant under one billion parameters. These larger models aim to improve the precision and versatility of document conversion. Granite-Docling-258M complements the existing Docling library by converting a document in a single step, which reduces the error accumulation that can build up across multi-stage pipelines.
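The error-accumulation argument can be illustrated with a short back-of-the-envelope sketch. It assumes, purely for illustration, that a conversion is correct only if every stage succeeds and that stage errors are independent; the function name `pipeline_accuracy` and all accuracy figures are hypothetical and are not measurements of Granite-Docling or of any real pipeline.

```python
import math


def pipeline_accuracy(stage_accuracies):
    """Return end-to-end accuracy when every stage must succeed.

    Assumes stage errors are independent, so the per-stage
    accuracies simply multiply (an illustrative simplification).
    """
    return math.prod(stage_accuracies)


# Hypothetical per-stage accuracies for a multi-stage pipeline
# (layout detection, OCR, table parsing, formula recognition).
multi_stage = pipeline_accuracy([0.98, 0.97, 0.96, 0.95])

# Hypothetical accuracy for a model that converts in one step.
single_pass = pipeline_accuracy([0.95])

print(f"four-stage pipeline: {multi_stage:.3f}")
print(f"single-pass model:   {single_pass:.3f}")
```

Under these assumed numbers, the four-stage pipeline ends up less accurate than its weakest stage, which is the effect a one-step conversion is meant to avoid.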
The planned expansion to Swahili, Tagalog, and Bengali, alongside existing support for Arabic, Chinese, and Japanese, signals IBM's commitment to broadening global use of its document conversion technology. With its ability to preserve tables, formulas, and layouts, and with larger models planned, Granite-Docling is positioned to make document conversion markedly more faithful to the source.