Building a Knowledge Flow Optimizer: Part I - From Idea to Working MVP
Information silos are a common challenge in organizations of all sizes. Documents live in different systems, insights get buried in emails, and valuable knowledge often remains confined to individual team members. Even high-performing teams struggle with this problem. That’s why I decided to explore building a Knowledge Flow Optimizer that uses AI to connect related information across documents.
The Vision
I wanted to create something practical yet innovative - a tool that addresses the real problem of disconnected knowledge without unnecessary complexity, and that could demonstrate value quickly.
What I envisioned was a system that could:
- Identify meaningful connections between documents based on semantic understanding
- Surface relevant information that might otherwise be overlooked
- Reveal relationships between concepts across different documents
Many existing solutions in this space require significant investment in infrastructure and development time. I was interested in creating something lightweight that could be built and deployed rapidly to test the concept.
The Tech Stack Debate
The tech industry offers powerful solutions for complex problems. Enterprise-grade technologies like microservices, Kubernetes clusters, and distributed systems have their place in large-scale applications. But sometimes we reach for these tools out of habit rather than necessity.
I initially considered the full enterprise stack:
- FastAPI for the backend
- React with TypeScript for the frontend
- Vector databases like Pinecone or Weaviate
- Multiple microservices for different components
For an established product with thousands of users, this might be appropriate. But for an MVP? Simplicity is key. Spoiler alert: I landed on FastHTML, SQLite via Fastlite for data storage (yes, even for vector embeddings!), and Sentence-Transformers. A lightweight stack that let me focus on proving the concept before scaling up.
The choice to use SQLite for storing vector embeddings might raise some eyebrows. A specialized vector database would be necessary at production scale, but SQLite was more than enough to validate the concept. By storing the embeddings as JSON strings in SQLite, I avoided an entire category of infrastructure complexity while still getting good-enough performance for initial testing. This approach won’t scale to thousands of documents, but it perfectly serves the MVP purpose of demonstrating value before investing in more complex infrastructure.
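To make that concrete, here is a minimal sketch of the storage layer using Python’s standard sqlite3 module rather than Fastlite’s wrapper; the table name and columns are illustrative assumptions, not the app’s exact schema.

```python
import json
import sqlite3

# Illustrative schema: one row per text chunk, embedding stored as a JSON string.
conn = sqlite3.connect("knowledge.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS chunks (
        id INTEGER PRIMARY KEY,
        doc_id INTEGER,
        text TEXT,
        embedding TEXT  -- JSON-encoded list of 384 floats
    )"""
)

def save_chunk(doc_id: int, text: str, embedding: list[float]) -> None:
    """Serialize the embedding to JSON and insert it alongside the chunk text."""
    conn.execute(
        "INSERT INTO chunks (doc_id, text, embedding) VALUES (?, ?, ?)",
        (doc_id, text, json.dumps(embedding)),
    )
    conn.commit()

def load_chunks() -> list[tuple[int, str, list[float]]]:
    """Read every chunk back, decoding the JSON embedding into a Python list."""
    rows = conn.execute("SELECT doc_id, text, embedding FROM chunks").fetchall()
    return [(doc_id, text, json.loads(emb)) for doc_id, text, emb in rows]
```

With only a few hundred chunks, reading every row back and comparing embeddings in memory is fast enough that no index or vector extension is needed.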
For the embedding model itself, Sentence-Transformers proved to be another excellent choice. Instead of using OpenAI’s embedding API and adding both a dependency and ongoing costs, I opted for the open-source all-MiniLM-L6-v2 model. At only 90MB in size, it runs efficiently even without a GPU, yet still produces high-quality 384-dimensional embeddings that capture semantic relationships between text chunks remarkably well. The model strikes an ideal balance between performance and resource requirements - exactly what you want in an MVP where you’re trying to validate an idea without overinvesting in infrastructure.
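The encoding step itself is only a few lines with the sentence-transformers library; the sample sentences below are placeholders.

```python
from sentence_transformers import SentenceTransformer

# Downloads roughly 90MB of weights on first use; runs fine on CPU.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Team communication guidelines for async standups.",
    "Project management best practices for distributed teams.",
]

# encode() returns one 384-dimensional vector per input string.
embeddings = model.encode(chunks)
print(embeddings.shape)  # (2, 384)
```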
The Core Components
The system operates through four complementary processes that work together to create connections between documents. Document upload serves as the entry point, allowing users to add content to the system. Text chunking divides longer documents into manageable segments for more precise analysis. Embedding generation converts these text chunks into vector representations using AI. Finally, similarity search identifies connections between documents based on semantic proximity.
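The chunking step is the least obvious of the four, so here is a simple sketch of the kind of approach involved: fixed-size word windows with a small overlap. The window and overlap sizes are assumptions you would tune against your own documents, not values from the actual app.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split a document into overlapping word-count windows.

    chunk_size and overlap are in words; the overlap keeps sentences that
    straddle a boundary from losing their surrounding context entirely.
    """
    words = text.split()
    if len(words) <= chunk_size:
        return [text] if words else []

    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```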
The embedding approach allows for nuanced understanding that goes beyond simple keyword matching. For instance, content about “team communication guidelines” might be semantically connected to material about “project management best practices” even when they use different terminology to express related concepts. This demonstrates how modern AI techniques can enhance knowledge discovery.
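Under the hood, that connection is just cosine similarity between embedding vectors. Here is a small sketch using the same example topics; the exact values depend on the texts, but the two work-process snippets typically score far closer to each other than either does to an unrelated one.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = {
    "guidelines": "Team communication guidelines: keep updates short and async.",
    "practices": "Project management best practices for coordinating distributed work.",
    "recipe": "Slow-cooker chili recipe with beans and smoked paprika.",
}
vectors = {name: model.encode(text) for name, text in docs.items()}

# The two work-process texts share almost no keywords, yet their embeddings
# usually land much closer together than either does to the recipe.
print(cosine_similarity(vectors["guidelines"], vectors["practices"]))
print(cosine_similarity(vectors["guidelines"], vectors["recipe"]))
```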
The Development Journey
Building this prototype came with its share of learning opportunities:
- Working through response handling in the FastHTML framework (a sketch of the route-handler pattern follows this list)
- Refining the document chunking process for reliability
- Implementing embedding generation efficiently
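On the FastHTML point, the relevant pattern is that route handlers return component trees, which the framework then renders to HTML. A rough sketch of that pattern follows, with a hypothetical route path and hard-coded placeholder data rather than the app’s actual code.

```python
from fasthtml.common import *

app, rt = fast_app()

@rt("/documents/{doc_id}")
def get(doc_id: int):
    # In the real app the document and its related matches come from SQLite;
    # placeholders here keep the sketch self-contained.
    doc = {"title": f"Document {doc_id}", "content": "Uploaded text goes here."}
    related = [("Team communication guidelines", 0.82)]

    # FastHTML routes return component trees, rendered to HTML by the framework.
    return Titled(
        doc["title"],
        P(doc["content"]),
        H2("Related documents"),
        Ul(*[Li(f"{title} (similarity: {score:.2f})") for title, score in related]),
    )

serve()
```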
By taking an iterative approach and testing each component individually, I was able to create a functional MVP. Starting with core functionality and adding features incrementally proved to be an effective development strategy for this project.
The Result
The current MVP focuses on functionality rather than aesthetics. While the interface is straightforward, it successfully demonstrates the core concept: facilitating knowledge flow between related documents.
Users can now:
- Upload text documents to the system
- View document content through a simple interface
- Discover semantically related documents with similarity scores
- Navigate between connected documents easily
What’s Next?
This initial version lays the groundwork for further development. The prototype shows promise, but there are many opportunities for enhancement.
In Part II, I plan to focus on improving the user experience and adding features that build on the existing foundation. An interesting question to consider is how technology and human factors can work together to address information silos effectively.
What’s your perspective? Would a tool that helps you discover connections between your documents provide value in your context? I’m interested in hearing different viewpoints as this project evolves.
Stay tuned for Part II.