Strategic Partner in Retail & Logistics
Retail / Smart Spaces
2024
How do you truly understand human movement in a complex physical space? Traditional methods—manual counts, sensors, or even CCTV review—are slow, fragmented, and fail to provide a complete, actionable picture for optimizing layout, staffing, and flow in environments like busy retail floors or transport hubs.
We architected and delivered a multi-camera computer vision Proof of Concept (POC). The system detects, tracks, and transforms human movement from multiple 2D camera feeds onto a single unified 2D floor plan, generating dynamic heatmaps that provide immediate, intuitive visual intelligence.
Cross-Camera Person Re-ID, Built End-to-End
Our client's strategic goal was to unlock measurable ROI by optimizing their physical space. The critical blocker was a lack of unified spatial intelligence. They needed to know not just if people were present, but where they clustered, which paths they took, and which zones were underutilized, all while ensuring any future solution could be deployed responsibly.
We initiated a rapid POC to validate technical feasibility and pinpoint commercial value. We began by designing a camera layout for complete floor coverage, understanding that spatial analytics is as much about environmental design as it is about algorithms. This "Secure by Design" approach ensured no blind spots and provided the foundational data integrity.
We then engineered a system that functions as a central "brain." It ingests feeds from all cameras, detects and tracks individuals, and—most critically—aligns all detections onto a single, shared top-down floor layout using coordinate transformation. This process turned fragmented, low-value video footage into a unified, high-value source of spatial intelligence.
The POC successfully validated the approach, turning raw data into intuitive, visual insights. The heatmaps provided a breakthrough for stakeholders, offering immediate understanding without technical explanation. This laid the foundation for a production-grade system to drive trustworthy, data-backed decisions.
The POC successfully reduced uncertainty and proved the system's foundational value, paving the way for a full-scale deployment to optimize safety, flow, and commercial performance.
The technical objective was to architect a multi-camera heatmap generation system for tracking and visualizing human presence across a floor layout using deep learning and computer vision. The system was composed of four key components: (1) Camera & Floor Layout Design, (2) Person Detection & Tracking, (3) Coordinate Transformation, and (4) Heatmap Generation.
The pipeline's foundation is a robust, two-stage engine for detection and tracking.
Person Detection: We utilized YOLOv4 for real-time object detection to identify and localize humans in each camera feed. A primary focus was tuning detection thresholds to maintain high accuracy, even in complex and crowded scenes.
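The threshold tuning described above can be sketched as a post-processing step on raw detector output: a confidence cutoff followed by non-maximum suppression (NMS) to collapse duplicate boxes. This is a minimal pure-Python illustration, not the actual YOLOv4 inference code; the box format and threshold values are assumptions for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def filter_detections(dets, conf_thresh=0.5, nms_thresh=0.45):
    """Drop low-confidence boxes, then suppress overlapping duplicates.

    `dets` is a list of (box, score) pairs for the 'person' class.
    """
    kept = []
    for box, score in sorted(
        (d for d in dets if d[1] >= conf_thresh),
        key=lambda d: d[1], reverse=True,
    ):
        # Keep a box only if it does not heavily overlap a stronger one.
        if all(iou(box, k[0]) < nms_thresh for k in kept):
            kept.append((box, score))
    return kept
```

In crowded scenes, raising `conf_thresh` trades missed detections for fewer false positives; the right balance is scene-specific.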
Resilient Tracking (DeepSort + OSNet): For tracking, we implemented a hybrid solution. We used DeepSort (Deep Simple Online and Realtime Tracking), which employs a Kalman Filter to predict the next position and velocity of each tracked object. Crucially, DeepSort was augmented with OSNet to add a deep appearance descriptor (ReID feature). This extracts a compact 128- or 256-dimensional feature vector that encodes a person's visual appearance.
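The motion-prediction side can be illustrated with a deliberately simplified sketch. DeepSort's actual Kalman filter maintains an 8-dimensional state (box centre, aspect ratio, height, and their velocities) with full covariance matrices; the alpha-beta filter below keeps only (x, y, vx, vy) to show the predict/update cycle, and the gain values are illustrative assumptions.

```python
class ConstantVelocityTrack:
    """Simplified per-track motion model: predict the next position
    from velocity, then blend in the matched detection."""

    def __init__(self, x, y, alpha=0.85, beta=0.005):
        self.x, self.y = x, y
        self.vx = self.vy = 0.0
        self.alpha, self.beta = alpha, beta  # position / velocity gains

    def predict(self, dt=1.0):
        """Project the state forward one frame (constant velocity)."""
        self.x += self.vx * dt
        self.y += self.vy * dt
        return self.x, self.y

    def update(self, mx, my, dt=1.0):
        """Fold a matched detection (mx, my) back into the state."""
        rx, ry = mx - self.x, my - self.y  # measurement residuals
        self.x += self.alpha * rx
        self.y += self.alpha * ry
        self.vx += self.beta * rx / dt
        self.vy += self.beta * ry / dt
```

The predicted position is what gets compared against new detections in the matching stage; the appearance embedding handles the cases where motion prediction alone is ambiguous.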
Matching & Re-Identification: The Hungarian Algorithm was then used to match new detections to existing tracks based on a combination of bounding box overlap (IoU) and appearance distance (cosine distance between deep ReID embeddings). This architecture is critical: if two people cross paths, or if a person is briefly occluded, the tracker may lose the ID. The appearance embedding lets us re-identify the person and recover the correct ID.
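The matching step above reduces to a minimum-cost assignment over a track-by-detection cost matrix. As a sketch, the snippet below computes cosine distance between embeddings and brute-forces the optimal assignment over permutations for clarity; a real implementation would use the Hungarian algorithm proper (e.g. `scipy.optimize.linear_sum_assignment`), which solves the same problem in polynomial time.

```python
from itertools import permutations

def cosine_distance(u, v):
    """1 - cosine similarity between two appearance embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (nu * nv)

def assign(cost):
    """Minimum-cost one-to-one assignment of tracks to detections.

    cost[i][j] blends appearance distance and (1 - IoU) for track i
    and detection j. Brute force here purely for illustration.
    """
    n = len(cost)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best, best_cost = perm, c
    return list(best)
```

In practice the cost matrix is also gated: pairs whose appearance distance exceeds a threshold are marked infeasible so a track is never matched to a clearly different-looking person.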
Running two heavy neural steps (YOLOv4 and OSNet) sequentially on every frame is computationally inefficient and slow. We engineered an optimized workflow: the full detection and ReID pipeline runs every N frames. To fill the gaps between these heavy frames, we integrated a KLT (Kanade-Lucas-Tomasi) tracker. As a lightweight optical flow–based tracker, KLT estimates how pixels move between consecutive frames. It follows specific feature points over time by comparing how their local image patches change, ensuring smooth tracking while intelligently managing the computational load.
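The scheduling logic behind this optimization is simple to express. The sketch below uses stub callables in place of the real detector and the KLT propagation step (both are assumptions for the example); the point is the every-N-frames cadence, not the vision code itself.

```python
def run_pipeline(frames, n=5, detect=None, propagate=None):
    """Run the heavy detect + ReID stage every `n` frames; in between,
    carry the last tracks forward with a lightweight tracker (KLT in
    the real system, the `propagate` stub here)."""
    tracks, out, heavy_calls = [], [], 0
    for i, frame in enumerate(frames):
        if i % n == 0:
            tracks = detect(frame)             # full YOLOv4 + OSNet pass
            heavy_calls += 1
        else:
            tracks = propagate(frame, tracks)  # optical-flow update only
        out.append(list(tracks))
    return out, heavy_calls
```

Tuning `n` trades compute for drift: larger gaps mean fewer expensive passes but more reliance on optical flow staying locked onto its feature points.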
To unify positioning from all camera angles, we applied homography transformation. This projects the 2D camera detections onto a single, shared top-down floor plane. This process is calibrated by having a user select at least four corresponding points between the camera frame and the floor plan, which generates a homography transformation matrix. This matrix is then used to transform every detected person's position onto the unified map.
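The four-point calibration described above can be made concrete with a pure-Python direct linear transform (DLT): build a linear system from the point correspondences, solve for the eight unknown entries of H (fixing h33 = 1), then project points through it. This is an illustrative stand-in for what `cv2.findHomography` does in a real pipeline, and the coordinates in the test are invented.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (pure Python)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def find_homography(src, dst):
    """Estimate H (with h33 = 1) from four (x, y) -> (u, v) pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def project(H, x, y):
    """Map an image point onto the floor plan through H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Typically the bottom-centre of each person's bounding box (the foot point) is the coordinate projected, since it is the point that actually lies on the floor plane.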
During the POC, we identified that maintaining consistent tracking IDs for the same individual across multiple overlapping camera views was particularly difficult. We prototyped a solution that clusters tracks with similar appearance distances, so that one cluster represents one person. This improved ID consistency but was not perfect, and we marked it as a key area for refinement in a production-grade system.
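The clustering idea can be sketched as a greedy pass over track embeddings: each track joins the first cluster whose centroid is within a cosine-distance threshold, otherwise it starts a new cluster. This is an illustrative simplification, not the exact prototype; the threshold and the running-mean centroid update are assumptions.

```python
def cluster_tracks(embeddings, thresh=0.3):
    """Greedy appearance clustering: one cluster ~ one person
    across overlapping camera views."""
    def cos_dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = sum(a * a for a in u) ** 0.5
        nv = sum(b * b for b in v) ** 0.5
        return 1.0 - dot / (nu * nv)

    clusters, labels = [], []
    for i, emb in enumerate(embeddings):
        for label, cl in enumerate(clusters):
            if cos_dist(emb, cl["centroid"]) < thresh:
                cl["members"].append(i)
                m = len(cl["members"])
                # Running mean keeps the centroid representative.
                cl["centroid"] = [(c * (m - 1) + e) / m
                                  for c, e in zip(cl["centroid"], emb)]
                labels.append(label)
                break
        else:
            clusters.append({"centroid": list(emb), "members": [i]})
            labels.append(len(clusters) - 1)
    return labels
```

The known failure mode, and the reason this needs refinement for production, is people with similar clothing: their embeddings land inside the same threshold and two individuals collapse into one cluster.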
The final component translates the unified positional data into actionable intelligence. We used a combination of a Gaussian Kernel and Kernel Density Estimation (KDE) to generate dynamic heatmaps. This visualization layer effectively reflects areas of high and low foot traffic over time, providing the clear, intuitive insight our client required.
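Heatmap generation reduces to accumulating a Gaussian kernel around every projected floor-plan position, which is kernel density estimation on a grid. The naive pure-Python sketch below shows the idea; grid size and bandwidth are illustrative assumptions, and a production version would vectorize this (or use a library KDE).

```python
import math

def heatmap(points, width, height, bandwidth=2.0):
    """Accumulate a Gaussian kernel around each floor-plan point (KDE).

    Returns a height x width grid; higher cell values mean more
    foot traffic at that location over the aggregation window.
    """
    grid = [[0.0] * width for _ in range(height)]
    two_bw2 = 2.0 * bandwidth * bandwidth
    for px, py in points:
        for gy in range(height):
            for gx in range(width):
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                grid[gy][gx] += math.exp(-d2 / two_bw2)
    return grid
```

The bandwidth controls how much each sighting "spreads": small values produce sharp per-person dots, larger values produce the smooth traffic-density surfaces stakeholders read at a glance.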