Real-time vehicle classification
An OpenCV capture pipeline feeds a ResNet50 inference loop on a Raspberry Pi 4, with sub-second end-to-end latency and 97% accuracy. The pipeline is the project — the model is just a component inside it.
When I first profiled this system, the model was not the bottleneck. The capture loop was. OpenCV was happily handing me 1080p frames that the pre-processing step immediately downscaled to the model's much smaller input size. I was paying for resolution I was throwing away. Dropping the capture resolution at the source cut more latency than any model change I ever made.
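To put rough numbers on "paying for resolution I was throwing away", the sketch below computes how much decoded pixel data each capture resolution pushes through the loop. It assumes 8-bit, 3-channel BGR frames (OpenCV's default layout) and a hypothetical 30 fps camera. The 640×480 target is illustrative, not the resolution this project settled on. In OpenCV the lower resolution is requested with `cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)` and `cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)`. Drivers may fall back to the nearest supported mode, so it is worth reading the values back with `cap.get`.

```python
# Rough cost of decoded frames at different capture resolutions.
# Assumptions (illustrative, not measured on this project): uint8 BGR frames,
# a hypothetical 30 fps camera. USB cameras often send MJPEG or YUYV on the
# wire, so these are decoded-frame sizes, not USB bus bandwidth.

BYTES_PER_CHANNEL = 1  # uint8 pixels
CHANNELS = 3           # B, G, R


def frame_bytes(width: int, height: int) -> int:
    """Size in bytes of one decoded uint8 BGR frame."""
    return width * height * CHANNELS * BYTES_PER_CHANNEL


def bytes_per_second(width: int, height: int, fps: int) -> int:
    """Decoded pixel data the capture loop has to move every second."""
    return frame_bytes(width, height) * fps


if __name__ == "__main__":
    fps = 30  # hypothetical camera frame rate
    modes = {"1080p": (1920, 1080), "640x480": (640, 480)}
    for name, (w, h) in modes.items():
        mb_per_s = bytes_per_second(w, h, fps) / 1e6
        print(f"{name}: {frame_bytes(w, h):,} bytes/frame, {mb_per_s:.1f} MB/s at {fps} fps")
```

A 1080p frame carries 6.75× the pixel data of a 640×480 frame, and the resize to the model's input size discards most of it either way.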
That is the lesson I keep relearning in CV: look at the whole pipeline, and optimize the part that is actually slow, not the part you assume is slow.
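One low-tech way to find the part that is actually slow is to time every stage of the loop, not just the model call. The sketch below is a minimal per-stage profiler built on `time.perf_counter`. The class and stage names are hypothetical, and the clock is injectable only so the logic can be tested without real timing.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class StageProfiler:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        # The clock is injectable for testing; the real loop uses perf_counter.
        self._clock = clock
        self._samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the body of a `with profiler.stage(name):` block."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self._samples[name].append(self._clock() - start)

    def mean_ms(self) -> Dict[str, float]:
        """Mean time per stage, in milliseconds."""
        return {name: 1000 * sum(s) / len(s) for name, s in self._samples.items()}

    def slowest(self) -> str:
        """Name of the stage with the highest mean time (needs at least one sample)."""
        means = self.mean_ms()
        return max(means, key=means.get)
```

In the capture loop, each step gets wrapped in its own block, for example `with profiler.stage("capture"): ok, frame = cap.read()`, and `profiler.slowest()` names the stage worth attacking first. Means hide tail latency, so on a Pi it can also pay to keep the raw samples and look at the worst frames.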
Vision at the edge
Optimized pre-processing, frame skipping, and a separate capture thread keep the vision loop responsive on constrained edge hardware. Most of the wins came from getting out of the model's way, not from changing the model.
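Frame skipping is the simplest of the three to show in isolation: run the expensive model on every Nth frame and reuse the last prediction in between, so the rest of the loop keeps pace with the camera. This is a minimal sketch. `FrameSkipper`, the `infer` callable, and the default `every=3` are hypothetical, not values taken from this project.

```python
from typing import Callable, Generic, Optional, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


class FrameSkipper(Generic[Frame, Result]):
    """Runs `infer` on every `every`-th frame and reuses the last result otherwise."""

    def __init__(self, infer: Callable[[Frame], Result], every: int = 3) -> None:
        if every < 1:
            raise ValueError("every must be >= 1")
        self._infer = infer          # e.g. a wrapper around model.predict (assumed)
        self._every = every
        self._count = 0
        self._last: Optional[Result] = None

    def process(self, frame: Frame) -> Optional[Result]:
        # Frames 0, every, 2*every, ... go to the model; the rest reuse the cache.
        if self._count % self._every == 0:
            self._last = self._infer(frame)
        self._count += 1
        return self._last
```

The trade-off is staleness: with `every=3` and a hypothetical 30 fps camera, a reused label can be up to two frames (about 67 ms) older than the frame it is shown with, on top of the inference time itself.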
The single change that bought me the most was a producer-consumer setup: one thread reads frames into a small ring buffer, and another pulls the freshest frame for inference. Old frames are dropped rather than queued, so the model always works on the most recent frame instead of falling further behind a growing backlog. The user-visible effect is that the system feels 'alive' even when load spikes.
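A minimal sketch of that producer-consumer arrangement, using only the standard library. It assumes a `collections.deque` with `maxlen` as the ring buffer and a `threading.Condition` for signalling. The class, its methods, and the buffer size of 2 are hypothetical choices. In the real pipeline the producer would be a thread calling `cv2.VideoCapture.read()` rather than counting integers.

```python
import threading
import time
from collections import deque
from typing import Any, Deque, Optional


class LatestFrameBuffer:
    """Ring buffer shared by a capture thread and an inference thread (hypothetical)."""

    def __init__(self, size: int = 2) -> None:
        # deque(maxlen=...) evicts the oldest frame when a new one arrives and it is full.
        self._frames: Deque[Any] = deque(maxlen=size)
        self._cond = threading.Condition()
        self._closed = False

    def put(self, frame: Any) -> None:
        """Producer side: store a frame and wake a waiting consumer."""
        with self._cond:
            self._frames.append(frame)
            self._cond.notify()

    def get_latest(self, timeout: Optional[float] = None) -> Optional[Any]:
        """Consumer side: return the newest frame and drop the rest.

        Returns None on timeout, or once closed with nothing left buffered.
        """
        with self._cond:
            self._cond.wait_for(lambda: self._frames or self._closed, timeout)
            if not self._frames:
                return None
            frame = self._frames.pop()  # newest frame
            self._frames.clear()        # stale frames are dropped, not queued
            return frame

    def close(self) -> None:
        """Signal that no more frames will arrive."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()


if __name__ == "__main__":
    buf = LatestFrameBuffer(size=2)

    def producer() -> None:
        # Stand-in for a capture loop: frame ids at roughly 100 fps.
        for frame_id in range(50):
            buf.put(frame_id)
            time.sleep(0.01)
        buf.close()

    threading.Thread(target=producer, daemon=True).start()
    while (frame := buf.get_latest(timeout=1.0)) is not None:
        time.sleep(0.05)  # stand-in for slow inference
        print("inferred on frame", frame)
```

Taking the newest frame and clearing the rest is the design choice that matters here: a plain FIFO such as `queue.Queue` would hand the model frames in arrival order, so any slowdown would turn into a backlog and the lag would keep growing instead of staying bounded.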
For edge CV, this kind of structural plumbing matters more than the cool things you can do with the model. The cool things only show up if the plumbing holds.
Stack
Python, OpenCV, TensorFlow, and NumPy on a Raspberry Pi 4: the same hardware the demos run on, not a stand-in. Developing on the target is one of those things I resisted at first and now insist on. The Pi has its own quirks (clock throttling under load, USB camera bandwidth limits, SD card latency), and you only learn to respect them by living with them.
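Clock throttling is easy to miss because nothing fails; inference just gets slower. On Raspberry Pi OS the firmware reports its throttling state through `vcgencmd get_throttled`, which prints a bitmask such as `throttled=0x50005`. The sketch below decodes it using the bit meanings from the Raspberry Pi documentation. The helper names are hypothetical, and logging this next to per-frame latency is a suggestion, not something the original setup is known to do.

```python
import subprocess
from typing import List

# Bit meanings of `vcgencmd get_throttled`, per the Raspberry Pi documentation.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def parse_throttled(output: str) -> int:
    """Parse a line like 'throttled=0x50005' into an integer bitmask."""
    _, _, value = output.strip().partition("=")
    return int(value, 16)  # int() accepts the '0x' prefix when base=16


def decode_throttled(mask: int) -> List[str]:
    """List the human-readable flags set in the bitmask, lowest bit first."""
    return [desc for bit, desc in sorted(THROTTLE_FLAGS.items()) if mask & (1 << bit)]


def read_throttled() -> List[str]:
    """Query the firmware; only works on a Pi with vcgencmd on PATH."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return decode_throttled(parse_throttled(result.stdout))
```

For example, `0x50005` decodes to under-voltage and throttling happening right now, plus both having occurred since boot.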
Why CV at the edge is worth the trouble
Cloud CV is easier. But the moment you need a closed-loop reaction (a barrier that lifts, a light that changes, a robot that swerves), the network round-trip to a cloud GPU lands inside the control loop and stops making sense. The decision has to live next to the camera.
That is the niche I keep coming back to. Vision systems that have to act, not just observe. They are harder to build, harder to debug, and far more interesting to ship.
