Document Manager: Using Azure AI Foundry or OpenAI for my paperless office

Let’s be honest: most document management systems feel like they’re stuck in 2005. I decide to move to an paperless office and scan all my documents. I tried multiple ways to realize this but I don`t find the tooling where I say I feel comfortable with and it saves me a lot of work in storing and retriving documents.

I wouldn’t be a developer or an AI MVP if I didn’t get stuck in and write my own tool. With Doc Manager, I demonstrate how to use Azure Open AI / AI Foundary to build a best-in-class document management system.

Enter AI: The Game Changer

What if your document system actually understood your documents? Not just their titles or tags, but their actual content and meaning? That’s exactly what I set out to build with DocumentManager.

The document manager has an vector database behind to have an advanced search to find all your documents based on the content understanding. Instead of matching keywords, the system understands concepts. Search for “payment terms” and it’ll find documents discussing invoicing, contracts with payment clauses, and financial agreements – even if they never use those exact words. During the initialization of all documents there will be generate a embedding of them to use it for an vector search.

![AI-Powered Search in Action](images/ai-search.jpeg)

Key Features That Actually Matter

1. Smart OCR with Multi-Language Support

Upload a scanned PDF in German? No problem. A photo of a whiteboard in Japanese? Got it covered. The system automatically extracts text from images and PDFs in over 50 languages using Tesseract OCR.

2. Natural Language Queries

Stop thinking like a database. Just ask questions naturally. There is an AI chat integrated which finds all the relevant documents and you can chat with them like:

  • “Show me all contracts expiring this year”
  • “Find documents about the Berlin office renovation”
  • “What were our Q4 marketing expenses?”

The AI understands context and intent, not just keywords.

3. Automatic Categorization

Upload a document and watch the AI automatically tag it based on content. Financial reports get tagged as “finance”, contracts as “legal”, technical specs as “engineering”. No manual tagging required (though you can always override if needed). Also the Title, Summary, Correspondent, Document Type, Document Date, Tags and Tax Relevance is detected or auto generated. The AI generated summary makes it is very easy also for large documents to understand the content

The Technical Stack

For the tech-curious among you, here’s what’s under the hood:

  • **Backend**: FastAPI
  • **Vector Database**: ChromaDB for lightning-fast semantic search
  • **AI Models**: OpenAI or Azure OpenAI (your choice)
  • **Frontend**: Vanilla JavaScript (keeping it simple and fast)
  • **Deployment**: Docker-ready with a handy setup script

Getting Started in 3 Minutes

The beauty of open source? You can have this running on your machine right now:

Make sure docker is running on your device

git clone https://github.com/your-username/DocumentManager.git
cd DocumentManager
./setup.sh prod

That’s it. Visit `http://localhost:8000` and start uploading documents.

Security First

I know what you’re thinking – “AI and sensitive documents?” Don’t worry, I’ve got you covered:

  • Role-based access control (RBAC)
  • Complete audit trails
  • Option to use azure open ai to keep the models in your own tenant
  • All data stays on your infrastructure / storage

How does it work

The Open Source Advantage

Why open source? Because your document management system shouldn’t be a black box. You can:

  • Audit the code
  • Customize for your needs
  • Self-host everything
  • Contribute improvements

Plus, no vendor lock-in. Your documents, your rules. If you want to support the project you can buy an coffee. You can see in on the sitebar of my blog.

What’s Next?

The foundation is solid, but I’m just getting started. The roadmap includes:

  • Support self hosted models
  • Mobile apps for on-the-go access or for scanning
  • Workflow automation (imagine documents routing themselves)
  • Advanced analytics dashboards
  • Plugin system for custom integrations

Try It Yourself

Ready to transform your document chaos into organized intelligence? The entire project is available on GitHub.

Star the repo ⭐ if you find it useful – it really helps with motivation

DocumentManager is open-source and available under the MIT license. Built with ❤️ by the Jannik Reinhard and Fabian Peschke

Need AI in your processes? Need an agent or a tool like this? Message me through the contact form—happy to help.