Beyond GenAI: Computer Vision & ML to Optimize Flow, Staffing & Space—for Measurable ROI
When it comes to managing crowded spaces—whether a busy retail floor, an airport terminal, or an urban plaza—understanding how people move is critical. Where do people move? Where do they pause? Which zones are overused, and which are overlooked? Yet traditional methods—manual counts, sensors, or even CCTV review—are slow, fragmented, and often fail to capture the full picture.
So, we ran a proof of concept to test whether we could automatically detect, track, and visualize human movement across multiple camera feeds. The goal wasn't to build a polished product on day one. It was to validate feasibility, identify constraints, and learn where the real value lies.
Starting With Coverage, Not Just Cameras
The first step was designing the camera layout. To extract meaningful insights, you need full floor coverage: no blind spots, no zones where people disappear.
This seems obvious—but it reminded us that spatial analytics is as much about environment design as it is about algorithms.
Detecting and Following People Through the Space
We used a deep learning model to detect individuals in each camera feed and then a tracking approach to follow their movement over time.
Even in a POC, one insight became clear quickly: Detecting people is easy. Following them reliably is the real challenge.
People cross paths, objects get in the way, and camera angles differ. To handle this, our tracking approach didn't just rely on position — it also used appearance cues to keep identity consistent when movement got complicated. This helped significantly, though tracking across multiple cameras is still an area that would need refinement in a production environment.
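To make the "position plus appearance" idea concrete, here is a minimal sketch of how the two cues could be blended into a single matching cost. It is not the POC's actual tracker. The names `match_cost` and `assign`, the weights, the thresholds, and the two-element appearance vectors are all hypothetical. A real system would take the appearance vector from a re-identification model and would likely use an optimal assignment method rather than the greedy one shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two appearance vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_cost(track, detection, max_dist=100.0, w_pos=0.5):
    """Blend position distance and appearance dissimilarity into a cost in [0, 1].

    max_dist (pixels) and w_pos are illustrative values, not tuned settings.
    """
    dist = math.dist(track["pos"], detection["pos"])
    pos_cost = min(dist / max_dist, 1.0)
    app_cost = 1.0 - max(cosine_similarity(track["feat"], detection["feat"]), 0.0)
    return w_pos * pos_cost + (1.0 - w_pos) * app_cost

def assign(tracks, detections, max_cost=0.6):
    """Greedy matching: accept the cheapest pairs first.

    Returns {track_id: detection_index}; pairs costing more than max_cost stay unmatched.
    """
    pairs = sorted(
        (match_cost(t, d), t["id"], j)
        for t in tracks
        for j, d in enumerate(detections)
    )
    assigned, used = {}, set()
    for cost, track_id, j in pairs:
        if cost > max_cost:
            break
        if track_id in assigned or j in used:
            continue
        assigned[track_id] = j
        used.add(j)
    return assigned

# Two people who have just crossed paths: each detection now sits closer to the
# other person's last known position, but the appearance vectors still agree.
tracks = [
    {"id": 1, "pos": (100.0, 100.0), "feat": [1.0, 0.0]},
    {"id": 2, "pos": (110.0, 100.0), "feat": [0.0, 1.0]},
]
detections = [
    {"pos": (108.0, 100.0), "feat": [1.0, 0.0]},
    {"pos": (102.0, 100.0), "feat": [0.0, 1.0]},
]
print(assign(tracks, detections))
```

In this toy case, matching on position alone would swap the two identities. Adding the appearance term keeps track 1 on detection 0 and track 2 on detection 1.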
Bringing All Views Into One Single Map
Each camera sees the world from its own angle. So we aligned detections from all cameras onto a shared top-down floor layout. With a simple calibration step, the system could translate "where someone is in a camera view" into "where they are in the real space."
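One common way to do this kind of camera-to-floor mapping is a planar homography: a 3x3 matrix per camera that converts pixel coordinates into floor coordinates. The sketch below assumes that approach and is not necessarily what the POC used. The matrix values, the function names, and the choice of the bounding box's bottom-center as the person's "foot point" are illustrative assumptions. In practice the matrix would be estimated from four or more matching points marked in the camera image and on the floor plan, for example with OpenCV's `cv2.findHomography`.

```python
def foot_point(box):
    """Bottom-center of a bounding box (x1, y1, x2, y2), used as the floor contact point."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)

def apply_homography(H, point):
    """Map an image pixel (u, v) to floor coordinates (x, y) with a 3x3 matrix H."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by w is what models perspective: pixels further up the image
    # cover more floor distance than pixels near the camera.
    return (x / w, y / w)

# Hypothetical calibration result for one camera (made-up values).
H_CAMERA_1 = [
    [0.5, 0.0, 1.0],
    [0.0, 0.5, 2.0],
    [0.0, 0.0, 1.0],
]

box = (5.0, 0.0, 15.0, 20.0)      # a detected person in pixel coordinates
pixel = foot_point(box)           # where the person touches the floor in the image
print(apply_homography(H_CAMERA_1, pixel))
```

Each camera gets its own matrix, and every detection from every camera ends up in the same floor coordinate system.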
This is what turned footage into usable spatial intelligence.
Turning Movement Into Insight: Heatmaps
Once positions were unified, we generated dynamic heatmaps. These clearly showed:
- Where people cluster
- Which paths are most used
- Which areas receive little or no engagement
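As a rough illustration of how unified floor positions can become a heatmap, the sketch below counts positions per grid cell and lists the cells that saw little or no activity. The grid size, cell size, and function names are assumptions made for the example. A dynamic heatmap could be built by running the same counting over successive time windows.

```python
import math

def build_heatmap(positions, floor_w, floor_h, cell=1.0):
    """Count floor positions per grid cell; returns a list of rows of counts.

    floor_w, floor_h and cell share one unit (for example meters).
    Positions outside the floor area are ignored.
    """
    cols = max(1, math.ceil(floor_w / cell))
    rows = max(1, math.ceil(floor_h / cell))
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        if not (0 <= x < floor_w and 0 <= y < floor_h):
            continue
        col = min(int(x // cell), cols - 1)
        row = min(int(y // cell), rows - 1)
        grid[row][col] += 1
    return grid

def quiet_cells(grid, max_count=0):
    """Return (row, col) of cells with at most max_count observations."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, n in enumerate(row)
        if n <= max_count
    ]

# Made-up floor positions; the last one falls outside the 4 x 3 floor and is dropped.
positions = [(0.5, 0.5), (0.7, 0.2), (3.2, 2.9), (10.0, 10.0)]
grid = build_heatmap(positions, floor_w=4.0, floor_h=3.0)
for row in grid:
    print(row)
print(len(quiet_cells(grid)), "cells with no activity")
```

The counts can then be colored and overlaid on the floor plan, which is the view stakeholders actually look at.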
For stakeholders, this was the breakthrough moment. No technical explanation needed — the insight is visual and intuitive.
A Cost-Effective and Scalable Approach
One of the biggest learnings from this POC was that you don't need expensive new hardware to unlock these insights. The system can run on existing camera infrastructure, which means organizations don't have to invest in specialized sensors or proprietary counting devices.
We also intentionally built on proven, widely used computer vision models, rather than jumping directly to heavy generative AI architectures. This keeps compute requirements reasonable and makes the solution easier to maintain, deploy, and scale.
Generative AI can be layered on top later, for example to answer more complex, forward-looking questions:
To Predict
Instead of just seeing past patterns, you can forecast future ones.
- "Based on the last three Fridays, predict checkout queue lengths for this Friday's 5:00 PM rush, allowing us to open new registers 20 minutes before the rush begins."
- "Forecast security checkpoint wait times for Monday morning based on flight schedules, triggering automated alerts for passengers."
To Recommend
It can move from observation to active suggestion.
- "Recommend an optimized staffing plan by suggesting we move one employee from the quiet 'Zone A' to the busy 'Zone C' for the next 45 minutes."
- "Ask, 'Where is the optimal placement for our new product display?' and get three data-backed layout suggestions to maximize engagement."
To Simulate
You can test changes virtually before committing real-world resources.
- "Simulate 'what-if we close this main aisle for a 2-hour cleaning?' to see the ripple effect on crowd flow and find the least disruptive time to do it."
- "Run a virtual simulation of an emergency evacuation with the current layout to identify and fix potential bottlenecks before a real event occurs."
But the foundational value comes from accurate detection, reliable tracking, and intuitive visualization. Starting here keeps the system cost-effective today and leaves room to add generative AI forecasting later if it becomes relevant.
What This POC Confirmed
- Technical Feasibility — Multi-camera tracking is absolutely achievable with modern computer vision models.
- Value of Visualization — Heatmaps turned out to be the most intuitive bridge between raw data and operational decisions.
- The Real Value is in Context, Not Just Detection — Detecting people is easy. Understanding how they move through space is where the strategic insight emerges.
What We Would Explore Next
If we continue to production-scale development, we'd focus on:
- Real-time streaming — Moving from sampled frames to continuous live insights.
- Faster camera calibration workflows — Reducing setup time so the system can adapt to new layouts or reconfigurations quickly.
- Operational alerts and triggers — For example: congestion warnings, queue-length thresholds, or automated staff allocation prompts.
- Combining movement data with business data — Integrating heatmap activity with data from cash registers, sales systems, or customer flow metrics would unlock higher-level insight — not just where people move, but why. This would enable correlations such as which product placements drive more traffic, or which checkout configurations minimize wait times.
- Data governance and privacy layering — Ensuring the system can be deployed responsibly (edge, hybrid, or cloud) while protecting identity and complying with organizational policies.
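The operational alerts item in the list above could start from something as simple as counting people per zone on the shared floor map and comparing against a limit. The sketch below is hypothetical: the zone rectangles, thresholds, and function names are invented for illustration, and a production version would also need smoothing over time so that a single crowded frame does not trigger an alert.

```python
def zone_counts(positions, zones):
    """Count floor positions inside each rectangular zone (x0, y0, x1, y1)."""
    counts = {name: 0 for name in zones}
    for x, y in positions:
        for name, (x0, y0, x1, y1) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                counts[name] += 1
    return counts

def congestion_alerts(counts, thresholds):
    """Return a message for every zone whose count exceeds its threshold."""
    return [
        f"{zone}: {n} people (limit {thresholds[zone]})"
        for zone, n in counts.items()
        if zone in thresholds and n > thresholds[zone]
    ]

# Made-up zones and limits on a 10 x 5 floor.
zones = {"checkout": (0, 0, 5, 5), "entrance": (5, 0, 10, 5)}
thresholds = {"checkout": 2, "entrance": 3}
positions = [(1, 1), (2, 2), (3, 3), (6, 1)]

counts = zone_counts(positions, zones)
for message in congestion_alerts(counts, thresholds):
    print(message)
```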
"This wasn't about launching a finished product—it was about reducing uncertainty."
Why It Matters
From retail layout optimization to urban planning to event operations, seeing how people actually move gives teams evidence they can trust when making decisions, which supports:
- Smarter resource allocation
- More intuitive environments
- Improved safety and flow
- Increased commercial performance
And those decisions only improve when teams across design, operations, and strategy can collaborate around shared insights.