Architecting Spatial Intelligence

A Computer Vision POC for Operational Analytics

CLIENT

Strategic Partner in Retail & Logistics

INDUSTRY

Retail / Smart Spaces

YEAR

2024

Key Insights

THE CHALLENGE

How do you truly understand human movement in a complex physical space? Traditional methods—manual counts, sensors, or even CCTV review—are slow, fragmented, and fail to provide a complete, actionable picture for optimizing layout, staffing, and flow in environments like busy retail floors or transport hubs.

THE SOLUTION

We architected and delivered a multi-camera computer vision Proof of Concept (POC). The system detects, tracks, and transforms human movement from multiple 2D camera feeds onto a single unified 2D floor plan, generating dynamic heatmaps that provide immediate, intuitive visual intelligence.

Cross-Camera Person Re-ID, Built End-to-End

Part 1: Executive Summary

The Challenge (Expanded)

Our client's strategic goal was to unlock measurable ROI by optimizing their physical space. The critical blocker was a lack of unified spatial intelligence. They needed to know not just if people were present, but where they clustered, which paths they took, and which zones were underutilized, all while ensuring any future solution could be deployed responsibly.

Our Approach

We initiated a rapid Proof of Concept (POC) to validate technical feasibility and pinpoint commercial value. We began by designing a camera layout for complete floor coverage, understanding that spatial analytics is as much about environmental design as it is about algorithms. This "Secure by Design" approach ensured no blind spots and provided the foundational data integrity.

We then engineered a system that functions as a central "brain." It ingests feeds from all cameras, detects and tracks individuals, and—most critically—aligns all detections onto a single, shared top-down floor layout using coordinate transformation. This process turned fragmented, low-value video footage into a unified, high-value source of spatial intelligence.

The Impact

The POC successfully validated the approach, turning raw data into intuitive, visual insights. The heatmaps provided a breakthrough for stakeholders, offering immediate understanding without technical explanation. This laid the foundation for a production-grade system to drive trustworthy, data-backed decisions.

  • Operational Insight: Replaced fragmented manual counts with a unified, dynamic view of foot traffic, pathing, and dwell zones.
  • Business Outcome: Enabled stakeholders to visually identify high-traffic paths, bottlenecks, and underused areas, forming a direct line from data to decisions on layout, staffing, and product placement.
  • Technical Validation: Confirmed that a sophisticated stack of multi-camera tracking and homography transformation is an achievable and effective core for a production-scale analytics platform.
  • Future-Proofing: Created a scalable architecture poised to integrate with business data (e.g., sales systems) for advanced correlational analysis, such as identifying how traffic hotspots align with sales uplift.

The POC successfully reduced uncertainty and proved the system's foundational value, paving the way for a full-scale deployment to optimize safety, flow, and commercial performance.

Part 2: Technical Deep Dive

Objective

The technical objective was to architect a multi-camera heatmap generation system for tracking and visualizing human presence across a floor layout using deep learning and computer vision. The system was composed of four key components: (1) Camera & Floor Layout Design, (2) Person Detection & Tracking, (3) Coordinate Transformation, and (4) Heatmap Generation.

Core Architecture: Detection & Tracking

The pipeline's foundation is a robust, two-stage engine for detection and tracking.

Person Detection: We used YOLOv4 for real-time person detection, identifying and localizing every individual in each camera feed. A primary focus was tuning detection thresholds to maintain high accuracy, even in complex and crowded scenes.
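The source does not show the threshold logic itself; as an illustration, here is a minimal pure-NumPy sketch of the confidence filter and non-maximum suppression (NMS) applied to raw person detections. Function names and the default thresholds (0.5 confidence, 0.45 IoU) are illustrative, not the POC's actual values:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def nms_person_boxes(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence detections, then suppress overlapping duplicates,
    keeping the highest-scoring box in each overlapping group."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= conf_thresh]
    kept = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept
```

In a production pipeline these two thresholds trade missed detections against duplicate boxes, which is why tuning them per scene mattered.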

Resilient Tracking (DeepSort + OSNet): For tracking, we implemented a hybrid solution. DeepSort (Deep Simple Online and Realtime Tracking) uses a Kalman Filter to predict each tracked object's next position and velocity. Crucially, we augmented DeepSort with OSNet to add a deep appearance descriptor (ReID feature): a compact 128-D or 256-D embedding that encodes what a person looks like.
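How such an appearance descriptor is used can be sketched as follows. The `reidentify` helper below is hypothetical (not the POC's code): it compares a query embedding against a gallery of stored track embeddings by cosine distance, and the 0.3 match threshold is an assumed value:

```python
import numpy as np

def cosine_distance(a, b):
    """Appearance (ReID) distance between two embeddings: 0 = identical look."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

def reidentify(query, gallery, max_dist=0.3):
    """Return the index of the stored track embedding closest to `query`,
    or None if nothing is similar enough (i.e. a genuinely new person)."""
    dists = [cosine_distance(query, g) for g in gallery]
    best = int(np.argmin(dists))
    return best if dists[best] <= max_dist else None
```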

Matching & Re-Identification: The Hungarian Algorithm then matches new detections to existing tracks using a combination of bounding-box overlap (IoU) and appearance distance (cosine distance between deep ReID embeddings). This architecture is critical: if two people cross paths, or a person is briefly occluded, the tracker may lose the ID; the appearance descriptor lets us re-identify the person and recover the correct ID.
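A minimal sketch of this association step, assuming unit-normalized embeddings and scipy's Hungarian solver (`linear_sum_assignment`); the cost weighting `lam` and the gating value are illustrative, not the POC's tuned parameters:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def associate(tracks, dets, lam=0.5, gate=0.7):
    """Blend overlap (1 - IoU) and appearance (cosine) costs, then solve the
    optimal track-to-detection assignment with the Hungarian algorithm.
    Pairs whose cost exceeds `gate` are rejected as implausible."""
    cost = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            app = 1.0 - float(np.dot(t["feat"], d["feat"]))  # unit embeddings
            cost[i, j] = lam * (1.0 - box_iou(t["box"], d["box"])) + (1 - lam) * app
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= gate]
```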

Orchestration & Performance Optimization

Running two heavy neural steps (YOLOv4 and OSNet) sequentially on every frame is computationally inefficient and slow. We engineered an optimized workflow: the full detection and ReID pipeline runs every N frames. To fill the gaps between these heavy frames, we integrated a KLT (Kanade-Lucas-Tomasi) tracker. As a lightweight optical flow–based tracker, KLT estimates how pixels move between consecutive frames. It follows specific feature points over time by comparing how their local image patches change, ensuring smooth tracking while intelligently managing the computational load.
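The scheduling described above can be sketched as a simple loop. Here `heavy_step` and `light_step` are hypothetical stand-ins for the YOLOv4 + OSNet + DeepSort pass and the KLT optical-flow propagation, and `every_n=5` is an assumed interval, not the POC's actual setting:

```python
def hybrid_tracking_loop(frames, heavy_step, light_step, every_n=5):
    """Interleave the expensive detection + ReID pass with cheap optical-flow
    propagation: run the full pipeline every `every_n` frames and let the
    lightweight tracker carry the state through the frames in between."""
    state, outputs = None, []
    for idx, frame in enumerate(frames):
        if idx % every_n == 0:
            state = heavy_step(frame, state)   # full detect + embed + associate
        else:
            state = light_step(frame, state)   # propagate boxes via KLT flow
        outputs.append(state)
    return outputs
```

The design choice here is latency smoothing: the heavy pass anchors identities periodically, while the optical-flow updates keep per-frame cost low and motion continuous.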

Ensuring Quality: Transformation & Identified Challenges

To unify positioning from all camera angles, we applied homography transformation. This projects the 2D camera detections onto a single, shared top-down floor plane. This process is calibrated by having a user select at least four corresponding points between the camera frame and the floor plan, which generates a homography transformation matrix. This matrix is then used to transform every detected person's position onto the unified map.
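For illustration, the calibration math can be reproduced with a direct linear solve in NumPy (in practice OpenCV's `cv2.findHomography` performs this step, optionally with RANSAC for robustness); the point values in the usage test are synthetic:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 3x3 homography H (with h33 fixed to 1) that maps four or
    more user-selected camera-frame points onto floor-plan points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pt):
    """Map one detected image point to floor-plan coordinates."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[0] / v[2], v[1] / v[2]
```

In pipelines like this, the point that gets projected is typically the bottom-center of each person's bounding box, since that approximates their foot position on the floor plane.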

During the POC, the hardest problem we identified was maintaining consistent tracking IDs for the same individual across multiple overlapping camera views. We prototyped a solution that clusters tracks with similar appearance (small embedding distance), so that one cluster represents one person. This improved cross-camera consistency but was not perfect, and it was marked as a key area for refinement in a production-grade system.
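One way to prototype such clustering is agglomerative linkage over pairwise cosine distances between track embeddings; this is a sketch of that idea, with an illustrative distance threshold rather than the POC's actual configuration:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_tracks(feats, dist_thresh=0.3):
    """Group per-camera track embeddings so that one cluster ~ one person.
    `feats` is an (n_tracks, dim) array of appearance embeddings."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    d = pdist(feats, metric="cosine")          # pairwise appearance distances
    Z = linkage(d, method="average")           # agglomerative clustering
    return fcluster(Z, t=dist_thresh, criterion="distance")  # cluster labels
```

As the text notes, appearance alone is imperfect (similar clothing, lighting shifts across cameras), which is why this remained a refinement target.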

Visualization: Heatmap Generation

The final component translates the unified positional data into actionable intelligence. We used a combination of a Gaussian Kernel and Kernel Density Estimation (KDE) to generate dynamic heatmaps. This visualization layer effectively reflects areas of high and low foot traffic over time, providing the clear, intuitive insight our client required.
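A minimal sketch of this step using scipy's `gaussian_kde`; the grid resolution and the bandwidth (scipy's default Scott rule here) are assumptions, not the POC's exact parameters:

```python
import numpy as np
from scipy.stats import gaussian_kde

def floor_heatmap(points, width, height, grid=50):
    """Estimate a smooth occupancy density over the floor plan from the
    projected (x, y) positions, normalized to [0, 1] for colormap rendering."""
    kde = gaussian_kde(np.array(points).T)     # Gaussian-kernel density estimate
    xs = np.linspace(0, width, grid)
    ys = np.linspace(0, height, grid)
    X, Y = np.meshgrid(xs, ys)
    Z = kde(np.vstack([X.ravel(), Y.ravel()])).reshape(grid, grid)
    return Z / Z.max()
```

Accumulating positions over a time window before evaluating the density is what turns instantaneous detections into the dwell-zone view described above.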

Technology Stack

Python · Deep Learning · Computer Vision · YOLOv4 · DeepSort · OSNet · ReID · Kalman Filter · Hungarian Algorithm · KLT Tracker · Optical Flow · Homography · OpenCV · Gaussian Kernel · KDE
