After co-creating XML, Jean Paoli wants to reinvent business data using AI

How can we put to work the millions of complex documents that still govern almost all business processes? Insurance contracts, pharmaceutical dossiers, industrial specifications, clinical analyses, technical specifications: this mass of information remains largely opaque to current models. This is precisely the territory that Docugami wants to structure. The startup was founded in Seattle by Jean Paoli, who co-created the XML language while at Microsoft.

In this new episode of TRENDS, our series dedicated to innovation, we welcome Jean Paoli, CEO of Docugami.

Thirty years after helping to build the backbone of the modern web, Jean Paoli returns with the conviction that documents are the largest source of untapped data in large organizations. Current AI models can generate and summarize, but they still struggle to extract, structure and cross-reference the information contained in the long, heterogeneous documents that drive insurance, pharmaceuticals and industry. Docugami addresses this problem by automatically transforming these documents into semantic XML trees that agentic workflows can consume, making it possible to automate tasks previously considered irreducibly human.

This technology opens up a range of direct applications: faster settlement of insurance claims, standardized pharmaceutical regulatory documentation, automated industrial manufacturing, structured R&D, and more. In all of these cases, document processing remains a major bottleneck.

The choice of Paris as the home of Docugami Europe is part of a broader dynamic. For Jean Paoli, France is today one of the nerve centers of open source AI, driven by public research, the rise of world-class laboratories and the emergence of open models that are becoming international standards. The new entity will bring together a dedicated R&D team, cooperate with French research institutes and develop partnerships in Europe's regulated sectors, from insurance to healthcare.

This move also responds to growing expectations around sovereignty: organizations want solutions that guarantee total data confidentiality, no reuse of their data to train third-party models, and end-to-end control of document pipelines. Docugami embraces this approach, relying exclusively on open source LLMs and an architecture that lets customers keep complete control of their documents.

With this new European base, Docugami positions itself as a key player in sovereign document AI, in a context where the automated exploitation of documents could become the real catalyst for the transformation of European companies.