Building a Private Central Intelligence Unit (CIU): A "Secure by Design" Architecture with Private LLMs

CLIENT

AI Platform Provider

INDUSTRY

Enterprise AI / SaaS

YEAR

2023 - 2024

Key Insights

THE CHALLENGE

How do you build and launch a competitive, commercial AI platform for the enterprise market? The primary challenge was not just functionality, but trust. Our client needed to engineer the core intelligence engine for their new platform, which had to be "Secure by Design" and fully GDPR-compliant from day one. This meant ensuring that sensitive enterprise data from tools like Jira, Confluence, or Zendesk would never be exposed to third-party APIs.

THE SOLUTION

We architected and delivered the Central Intelligence Unit (CIU), the production-ready, scalable, and secure core engine for their commercial platform. This modular AI co-pilot integrates seamlessly with existing enterprise tools and operates within a 100% private, secure environment. By engineering a sophisticated RAG pipeline with privately hosted LLMs and embedding models, we provided a "Secure by Design" foundation that guarantees zero data exposure and full data sovereignty for their enterprise customers.

CIU Architecture Diagram

Part 1: Executive Summary

The Challenge: Building a "Secure by Design" Commercial AI Platform

At the end of 2023, our client, an emerging AI platform provider, had a critical business objective: to architect and develop the core engine for their new, commercial AI platform. The key challenge was not just to innovate, but to build a product that was "Secure by Design" from the ground up, guaranteeing enterprise-grade data privacy. This was essential for their go-to-market strategy, as the solution needed to be fully compliant with stringent regulations like GDPR to attract enterprise clients in the EU.

Our Approach: From Strategic Consulting to a Production-Ready Commercial Engine

Our team began the project with in-depth consulting. Instead of offering a one-size-fits-all product, we dedicated ourselves to understanding the client's commercial objectives, technological constraints, and strategic opportunities. Based on this analysis, we designed, engineered, and deployed the Central Intelligence Unit (CIU) – the scalable and secure foundation for their new commercial product.

The CIU functions as a central "brain" or AI co-pilot that integrates with existing enterprise tools (like Confluence, Jira, Slack, and Zendesk). Users can ask a question or upload a document, and the system autonomously finds, processes, and summarizes the most relevant information to provide an accurate answer. A key advantage is that the entire system operates within the client's private and secure environment, safeguarding confidential data.

The Impact: A Secure, Market-Ready Platform

By late 2024, the CIU was a production-ready, scalable, and secure engine, powering the client's successful platform launch. It delivers measurable value to their end-users:

  • Time Savings: Research tasks that once took hours are completed in minutes.
  • Better Decisions: Teams can make faster, more informed decisions thanks to the clear and accurate insights provided by the system.
  • Enhanced Data Security & GDPR Compliance: By processing all data within a private, controlled environment, the system provides a solution that is compliant by design with regulations like GDPR, eliminating the risks associated with third-party data processors.
  • Future-Ready Adaptability: The system's modular architecture allows for the easy addition of new "connectors" for different enterprise systems, ensuring the platform remains cutting-edge.

This project showcases our ability to guide clients through the complex world of AI, from initial concept to the deployment of an advanced, production-grade commercial engine, giving our client a powerful, secure, and commercially viable product.

Part 2: Technical Deep Dive

Objective: Engineering a Production-Grade, "Secure by Design" RAG Ecosystem

The project's technical objective, initiated in late 2023, was to create the centralized, "AI-native" core engine for a commercial platform, capable of autonomous data gathering, interpretation, and action. Key challenges included handling sensitive data securely, building a scalable architecture for Large Language Models (LLMs), reducing reliance on external APIs, and ensuring the entire ecosystem was "Secure by Design" and GDPR-compliant from the ground up.

Core Architecture: A Modular & Scalable Foundation

From its inception, the system was designed as a modular, containerized ecosystem.

Core Stack: The system's backbone is Python, with FastAPI serving lightweight LangChain- and LangGraph-powered endpoints for specific tasks such as classification and summarization. The entire infrastructure is containerized using Docker.
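For illustration, a minimal sketch of such an endpoint (the internal service URL and endpoint path are hypothetical; the model name is the privately hosted LLM described later in this section):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

app = FastAPI()

# Chat model served from the private vLLM cluster via its
# OpenAI-compatible API (internal URL is hypothetical).
llm = ChatOpenAI(
    base_url="http://llm-server:8000/v1",
    api_key="EMPTY",
    model="neuralmagic/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-dynamic",
)

summarize_chain = (
    ChatPromptTemplate.from_template("Summarize the following text:\n\n{text}")
    | llm
    | StrOutputParser()
)

class SummarizeRequest(BaseModel):
    text: str

@app.post("/summarize")
async def summarize(req: SummarizeRequest) -> dict:
    summary = await summarize_chain.ainvoke({"text": req.text})
    return {"summary": summary}
```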

Gateway and Routing: We implemented Traefik as a reverse proxy to manage routing and security under a single gateway. This involved configuring Host rules for each new service, securing endpoints with HTTPS via Let's Encrypt certificates, and implementing basic authentication where necessary.

Asynchronous Processing: To coordinate tasks between microservices and ensure scalability, we implemented RabbitMQ as a robust message broker. This was essential for efficiently managing long-running, resource-intensive processes like web scraping or complex document analysis.
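A minimal sketch of this producer/worker pattern using the pika client (the queue name and payload are hypothetical; the broker hostname assumes a "rabbitmq" Docker service):

```python
import json

import pika

# Connect to the message broker.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()

# A durable queue so pending document-analysis jobs survive broker restarts.
channel.queue_declare(queue="document_analysis", durable=True)

# Producer side: the API enqueues a long-running job and returns immediately.
channel.basic_publish(
    exchange="",
    routing_key="document_analysis",
    body=json.dumps({"document_id": "doc-123", "action": "summarize"}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)

# Consumer side, normally running in a separate worker container.
def on_message(ch, method, properties, body):
    task = json.loads(body)
    # ... perform web scraping or document analysis here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="document_analysis", on_message_callback=on_message)
channel.start_consuming()
```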

RAG Pipeline Architecture

The System's Core: An Advanced RAG (Retrieval-Augmented Generation) Pipeline

The CIU rapidly evolved from simple API calls to external LLMs (like OpenAI's GPT-3.5 and GPT-4 for early chains such as chat_on_report) to a sophisticated RAG pattern that delivers contextually relevant and factually grounded answers.

Vector Database: We chose Qdrant as our vector database for storing and querying text embeddings. Its advanced filtering capabilities were leveraged to enhance retrieval precision across different RAG strategies.
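For example, payload filtering lets a query target a single connector's documents. A minimal sketch (the collection name, internal URL, and payload schema are hypothetical):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

def search_connector(query_embedding: list[float], source: str) -> list:
    """Retrieve the top chunks for one connector via payload filtering."""
    client = QdrantClient(url="http://qdrant:6333")
    return client.search(
        collection_name="knowledge_base",
        query_vector=query_embedding,
        query_filter=Filter(
            must=[FieldCondition(key="source", match=MatchValue(value=source))]
        ),
        limit=5,
    )
```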

Private LLM & Embedding Servers: To ensure data privacy and optimize costs, we built a dedicated Embedding Server, initially using Hugging Face models and LangChain's HuggingFaceInferenceAPIEmbeddings integration. We also hosted advanced LLMs, such as neuralmagic/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-dynamic, on a private GPU cluster (4x L40s) using vLLM for high-throughput serving. This move was critical not only for flexibility and security; it was a foundational requirement for the product's "Secure by Design" architecture and its GDPR compliance promise to customers. By hosting both the embedding and generation models in-house, we ensured that no sensitive user data, from query contents to entire uploaded documents, ever left the client's secure infrastructure.
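A minimal sketch of the embedding client (the token and model name are placeholders, not the production values):

```python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

# Client for the dedicated Embedding Server; credentials are placeholders.
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key="<internal-token>",
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

query_vector = embeddings.embed_query("How do I configure SSO in Zendesk?")
chunk_vectors = embeddings.embed_documents(["First chunk ...", "Second chunk ..."])
```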

Advanced RAG Techniques: To maximize retrieval accuracy, we implemented a suite of sophisticated techniques, including hybrid search algorithms, reranking mechanisms, Contextual Chunk Headers, Semantic Chunking, and Document Augmentation.
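To give a flavor of one of these techniques, Contextual Chunk Headers prepend document- and section-level context to each chunk before embedding, so retrieval can match information the raw chunk text alone would lack. A minimal sketch (the example document is hypothetical):

```python
def add_contextual_header(chunk: str, doc_title: str, section: str) -> str:
    """Prepend document/section context so the chunk is embedded together
    with information the raw text alone would not carry."""
    return f"Document: {doc_title}\nSection: {section}\n---\n{chunk}"

chunk = "Tickets can be escalated by assigning the 'urgent' tag."
enriched = add_contextual_header(
    chunk,
    doc_title="Zendesk Support Handbook",
    section="Escalation Policy",
)
# The enriched text, not the bare chunk, is embedded and stored in Qdrant.
```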

Diverse Data Sources: The RAG system was built to retrieve information from multiple sources, forming the basis for the platform's commercial "connectors." This included an internal knowledge base in Qdrant, live web search results, and enterprise tools like Zendesk support documents.

Orchestration & Automation: The Power of LangGraph

For managing complex, multi-step workflows, we integrated LangGraph, an advanced orchestration framework for agent-level flow control. LangGraph functions as a dynamic router, evaluating each user request and directing it to the most appropriate processing chain. Its stateful graph structure managed the entire query lifecycle (a minimal routing sketch follows the list), including:

  • Query understanding and conversation summarization.
  • Conditional source selection based on query intent.
  • Parallel retrieval from multiple RAG sources (Qdrant, Zendesk, etc.).
  • Generation of a final answer using the most relevant context and selected LLM.
  • Fallback logic to an LLM-only response when retrieval failed. This orchestration was critical for balancing latency, relevance, and response quality when querying across multiple enterprise "connectors" (up to four source types).
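The following sketch shows the routing pattern in miniature; the node logic is stubbed out, and all names are illustrative rather than taken from the production graph:

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class QueryState(TypedDict):
    question: str
    intent: str
    context: str
    answer: str

def classify(state: QueryState) -> dict:
    # In production this calls the LLM; here, a trivial stand-in.
    intent = "rag" if "how" in state["question"].lower() else "llm_only"
    return {"intent": intent}

def retrieve(state: QueryState) -> dict:
    return {"context": "...chunks retrieved from Qdrant / Zendesk..."}

def generate(state: QueryState) -> dict:
    return {"answer": f"Answer grounded in: {state.get('context', '(no context)')}"}

builder = StateGraph(QueryState)
builder.add_node("classify", classify)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.set_entry_point("classify")

# Conditional edge: route to retrieval, or fall back to an LLM-only answer.
builder.add_conditional_edges(
    "classify",
    lambda state: state["intent"],
    {"rag": "retrieve", "llm_only": "generate"},
)
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

graph = builder.compile()
result = graph.invoke({"question": "How do I escalate a ticket?"})
```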

LangGraph's modular structure of nodes and conditional edges provided superior scalability and traceability compared to basic LangChain chains, making it our default choice for easily scaling the platform with new enterprise connectors and agent-based tasks.

Ensuring Quality & Trust: Evaluation and Observability

Trust in an AI system is paramount. We implemented a multi-layered approach to evaluation and monitoring.

Hallucination Detection: We deployed the vectara/hallucination_evaluation_model on our private GPU cluster to assess the factual accuracy of generated outputs, thereby boosting user trust.
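A minimal sketch of scoring a generated claim against its source with this model (the predict() helper is the interface documented on the model card; verify it against the model version you deploy):

```python
from transformers import AutoModelForSequenceClassification

# HHEM exposes a predict() helper when loaded with trust_remote_code.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

pairs = [
    (
        "The CIU retrieves Zendesk articles before answering.",  # source text
        "Answers are grounded in retrieved Zendesk articles.",   # generated claim
    )
]
scores = model.predict(pairs)  # ~1.0 = consistent, ~0.0 = likely hallucination
```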

System Evaluation: We used the RAGAS framework for thorough system evaluation, testing our RAG pipeline with both real-world and synthetic datasets to validate its effectiveness and ensure production-ready performance.
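For illustration, a small RAGAS run over a single hand-written example (the API and column names are those of ragas 0.1.x and may differ in other versions; RAGAS also needs an evaluator LLM configured, OpenAI by default or a privately hosted model):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# Each row pairs a question with its retrieved contexts, the generated
# answer, and a reference answer.
eval_data = Dataset.from_dict({
    "question": ["How do I escalate a Zendesk ticket?"],
    "contexts": [["Tickets can be escalated by assigning the 'urgent' tag."]],
    "answer": ["Assign the 'urgent' tag to escalate the ticket."],
    "ground_truth": ["Escalate by tagging the ticket as 'urgent'."],
})

result = evaluate(
    eval_data, metrics=[faithfulness, answer_relevancy, context_precision]
)
print(result)  # per-metric scores between 0 and 1
```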

Observability: We established comprehensive system monitoring. Langfuse was integrated for detailed tracing of all LangServe chains and LangGraph endpoints. This involved an upgrade from Langfuse v2 to v3, which required adding new Docker services such as MinIO (S3-compatible object storage) and integrating them securely via Traefik. Tracing was attached using callback handlers at the top of the chain, so the entire process is monitored (see the sketch below). Finally, Grafana and Graylog provided logging and analytics dashboards for fine-grained performance tracking.
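A minimal sketch of attaching the handler at the top of an invocation, reusing the compiled graph from the LangGraph sketch above (the import path shown is the v2 Python SDK's; later SDK versions moved it):

```python
from langfuse.callback import CallbackHandler  # v2 SDK; v3 uses langfuse.langchain

# Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the env.
langfuse_handler = CallbackHandler()

# Passing the handler at the top level traces every nested chain and node.
result = graph.invoke(
    {"question": "How do I escalate a ticket?"},
    config={"callbacks": [langfuse_handler]},
)
```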

Conclusion

By late 2024, the CIU had matured into a fully autonomous, scalable, and secure core engine for a commercial AI platform. The project demonstrates our studio's expertise in architecting and delivering complex, AI-First systems that are not only powerful and scalable but also secure, private, and compliant with regulations like GDPR. By successfully integrating advanced RAG, private LLM hosting, dynamic workflow orchestration with LangGraph, and a robust evaluation and observability pipeline, we delivered a production-grade solution that provides the client with a significant competitive advantage for their commercial product.

Technology Stack

python fastapi langchain langgraph vllm llama-3.1 qdrant hugging-face docker traefik rabbitmq langfuse ragas grafana graylog minio letsencrypt RAG GDPR-compliant

