Beyond GenAI: Computer Vision & ML to Optimize Flow, Staffing & Space—for Measurable ROI

· 10 min read · By Next Halo Team

When it comes to managing crowded spaces—whether a busy retail floor, an airport terminal, or an urban plaza—understanding how people move is critical. Where do people move? Where do they pause? Which zones are overused, and which are overlooked? Yet traditional methods—manual counts, sensors, or even CCTV review—are slow, fragmented, and often fail to capture the full picture.

So, we ran a proof of concept to test whether we could automatically detect, track, and visualize human movement across multiple camera feeds. The goal wasn't to build a polished product on day one. It was to validate feasibility, identify constraints, and learn where the real value lies.

Starting With Coverage, Not Just Cameras

The first step was designing the camera layout. To extract meaningful insights, you need full floor coverage: no blind spots, no zones where people disappear.

This seems obvious—but it reminded us that spatial analytics is as much about environment design as it is about algorithms.

Detecting and Following People Through the Space

We used a deep learning model to detect individuals in each camera feed and then a tracking approach to follow their movement over time.

Even in a POC, one insight became clear quickly: Detecting people is easy. Following them reliably is the real challenge.

People cross paths, objects get in the way, and camera angles differ. To handle this, our tracking approach didn't just rely on position — it also used appearance cues to keep identity consistent when movement got complicated. This helped significantly, though tracking across multiple cameras is still an area that would need refinement in a production environment.
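The idea of combining position with appearance cues can be sketched as a matching step between existing tracks and new detections. This is a minimal illustration, not our production tracker: the weights, the greedy matching, and the embedding vectors are all assumptions (real systems typically add Kalman-filter motion prediction and Hungarian assignment).

```python
import numpy as np

def match_detections(tracks, detections, w_pos=0.5, w_app=0.5, max_cost=0.6):
    """Greedily match tracks to detections using a combined
    position + appearance cost. tracks: {id: (xy, embedding)};
    detections: [(xy, embedding), ...]. Returns {track_id: det_index}."""
    assignments, used = {}, set()
    for t_id, (t_pos, t_feat) in tracks.items():
        best, best_cost = None, max_cost
        for d_idx, (d_pos, d_feat) in enumerate(detections):
            if d_idx in used:
                continue
            # Position term: pixel distance, normalised by an assumed 1000 px scale.
            pos_cost = np.linalg.norm(np.array(t_pos) - np.array(d_pos)) / 1000.0
            # Appearance term: cosine distance between embeddings, so a person
            # who crosses paths with another keeps their identity.
            app_cost = 1.0 - float(np.dot(t_feat, d_feat) /
                                   (np.linalg.norm(t_feat) * np.linalg.norm(d_feat)))
            cost = w_pos * pos_cost + w_app * app_cost
            if cost < best_cost:
                best, best_cost = d_idx, cost
        if best is not None:
            assignments[t_id] = best
            used.add(best)
    return assignments
```

When two people are close together, the position term alone is ambiguous; the appearance term breaks the tie, which is exactly the behaviour that kept identities consistent in the POC.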

Bringing All Views Into a Single Map

Each camera sees the world from its own angle. So we aligned detections from all cameras onto a shared top-down floor layout. With a simple calibration step, the system could translate "where someone is in a camera view" into "where they are in the real space."

This is what turned footage into usable spatial intelligence.
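The calibration step described above amounts to estimating a homography per camera: a 3×3 matrix that maps a point in the camera image to a point on the floor plan. The sketch below uses a plain NumPy direct linear transform from four clicked point pairs; in practice a library routine such as OpenCV's `findHomography` would be used, and the point pairs here are illustrative.

```python
import numpy as np

def fit_homography(src_pts, dst_pts):
    """Estimate a 3x3 homography from 4+ correspondences via DLT.
    src_pts: points in the camera image; dst_pts: the matching
    points on the top-down floor plan."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, found via SVD.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_floor(H, point):
    """Project an image point (e.g. a detected person's foot position)
    onto the floor plan, dividing out the homogeneous coordinate."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return (u / w, v / w)
```

Once each camera has its matrix, every detection from every feed lands in the same floor-plan coordinate system, which is what makes cross-camera aggregation possible.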

Turning Movement Into Insight: Heatmaps

Once positions were unified, we generated dynamic heatmaps. These clearly showed:

  • Where people cluster
  • Which paths are most used
  • Which areas receive little or no engagement

For stakeholders, this was the breakthrough moment. No technical explanation needed — the insight is visual and intuitive.
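Once detections live in floor-plan coordinates, a heatmap is essentially an occupancy grid: bin every observed position into cells and render the counts as colour. A minimal sketch, with an assumed 1-metre cell size (a real pipeline would typically add temporal weighting and Gaussian smoothing before rendering):

```python
import numpy as np

def build_heatmap(positions, floor_w, floor_h, cell=1.0):
    """Accumulate floor-plan positions (in metres) into an occupancy
    grid. Each observed position increments its cell; the grid is then
    colour-mapped and overlaid on the floor layout."""
    nx, ny = int(floor_w / cell), int(floor_h / cell)
    grid = np.zeros((ny, nx))
    for x, y in positions:
        i, j = int(y / cell), int(x / cell)
        if 0 <= i < ny and 0 <= j < nx:  # ignore points outside the floor
            grid[i, j] += 1
    return grid
```

Hot cells reveal clusters and well-worn paths; persistently cold cells flag the zones receiving little or no engagement.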


A Cost-Effective and Scalable Approach

One of the biggest learnings from this POC was that you don't need expensive new hardware to unlock these insights. The system can run on existing camera infrastructure, which means organizations don't have to invest in specialized sensors or proprietary counting devices.

We also intentionally built on proven, widely used computer vision models rather than jumping directly to heavy generative AI architectures. This keeps compute requirements reasonable and makes the solution easier to maintain, deploy, and scale.

Generative AI can layer on top later — for example, to answer complex, predictive questions:

To Predict

Instead of just seeing past patterns, you can forecast future ones.

  • "Based on the last three Fridays, predict checkout queue lengths for this Friday's 5:00 PM rush, allowing us to open new registers 20 minutes before the rush begins."
  • "Forecast security checkpoint wait times for Monday morning based on flight schedules, triggering automated alerts for passengers."

To Recommend

It can move from observation to active suggestion.

  • "Recommend an optimized staffing plan by suggesting we move one employee from the quiet 'Zone A' to the busy 'Zone C' for the next 45 minutes."
  • "Ask, 'Where is the optimal placement for our new product display?' and get three data-backed layout suggestions to maximize engagement."

To Simulate

You can test changes virtually before committing real-world resources.

  • "Simulate 'what-if we close this main aisle for a 2-hour cleaning?' to see the ripple effect on crowd flow and find the least disruptive time to do it."
  • "Run a virtual simulation of an emergency evacuation with the current layout to identify and fix potential bottlenecks before a real event occurs."

But the foundational value comes from accurate detection + reliable tracking + intuitive visualization. Starting here ensures the system is cost-effective today and future-ready if AI-powered forecasting becomes relevant.

What This POC Confirmed

  • Technical Feasibility — Multi-camera tracking is achievable with modern computer vision models.
  • Value of Visualization — Heatmaps turned out to be the most intuitive bridge between raw data and operational decisions.
  • The Real Value is in Context, Not Just Detection — Detecting people is easy. Understanding how they move through space is where the strategic insight emerges.

What We Would Explore Next

If we continue to production-scale development, we'd focus on:

  • Real-time streaming — Moving from sampled frames to continuous live insights.
  • Faster camera calibration workflows — Reducing setup time so the system can adapt to new layouts or reconfigurations quickly.
  • Operational alerts and triggers — For example: congestion warnings, queue-length thresholds, or automated staff allocation prompts.
  • Combining movement data with business data — Integrating heatmap activity with data from cash registers, sales systems, or customer flow metrics would unlock higher-level insight — not just where people move, but why. This would enable correlations such as which product placements drive more traffic, or which checkout configurations minimize wait times.
  • Data governance and privacy layering — Ensuring the system can be deployed responsibly (edge, hybrid, or cloud) while protecting identity and complying with organizational policies.
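The operational alerts mentioned above are straightforward to layer on top of zone occupancy counts. A hedged sketch, with hypothetical thresholds: the debounce over consecutive frames keeps momentary tracking noise from triggering false alarms.

```python
from collections import deque

class CongestionAlert:
    """Fire when a zone's occupancy stays above a threshold for N
    consecutive frames. Threshold and window are illustrative values
    that would be tuned per zone in a real deployment."""

    def __init__(self, threshold=8, consecutive=5):
        self.threshold = threshold
        self.history = deque(maxlen=consecutive)

    def update(self, occupancy):
        """Feed the latest occupancy count; returns True when the
        congestion condition has held for the full window."""
        self.history.append(occupancy)
        return (len(self.history) == self.history.maxlen
                and all(c > self.threshold for c in self.history))
```

The same pattern extends naturally to queue-length thresholds or staff-allocation prompts: swap the occupancy count for whichever metric the zone is monitored on.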

"This wasn't about launching a finished product—it was about reducing uncertainty."

Why It Matters

From retail layout optimization to urban planning to event operations, the ability to see how people naturally move unlocks better, evidence-based decisions:

  • Smarter resource allocation
  • More intuitive environments
  • Improved safety and flow
  • Increased commercial performance

And those decisions only improve when teams across design, operations, and strategy can collaborate around shared insights.
