Smart Focus Streaming is a hackathon PoC: an Android device streams two video tracks (low-res context + high-res / ROI source), a FastAPI backend proxies signalling and stores camera state, and a React dashboard composites the feeds in the browser and drives focus over a WebRTC DataChannel.
For the hackathon story and the 2nd-place finish at MWC 2026 / Talent Arena, see the blog post: Open Gateway Hackathon 2026 – 2nd Place.
TL;DR
| Piece | Role |
|---|---|
| Android | CameraX + TFLite, WebRTC (2 video tracks, recvonly on the browser side), commands DataChannel, HTTP POST signalling on :8888 |
| Backend | MongoDB cameras, focus/mode REST, SDP POST proxy + URL normalisation (8080 → 8888) |
| React | WebRTCPlayer: offer/answer via proxy, canvas compositor (OpenCV-style mask + blur), getStats() → kbps |
| NaC | QoD / Network as Code exercised in separate backend test scripts |
Problem & design goals
Problem: Full-frame HD everywhere burns bandwidth; often only a sub-region needs detail.
Design:
- LOW – show context only (cheap).
- HIGH – full-frame high-res (baseline “best quality”).
- VISION – high-res non-black regions from the source track blended over context (ML-driven ROI on device).
- HYBRID – same blend, but the ROI is also constrained by a user rectangle (0–100% of frame) sent over the DataChannel as hybrid_rect.
The dashboard must:
- Negotiate WebRTC without the browser calling the phone’s HTTP server directly (CORS).
- Show measured inbound video bitrate, not mocks, when using the real API.
Architecture
┌─────────────────┐    WebRTC (2 video + DataChannel)    ┌──────────────────┐
│ Android device  │ ◄──────────────────────────────────► │ React dashboard  │
│ Signaling :8888 │    SDP: browser → FastAPI → device   │ WebRTCPlayer     │
└────────┬────────┘                                      └────────┬─────────┘
         │                                                        │
         │                  ┌──────────────────┐                  │
         └─────────────────►│ FastAPI + Mongo  │◄─────────────────┘
                            │ REST + SDP proxy │
                            └──────────────────┘
Why a proxy? The phone exposes signalling as plain HTTP POST (SDP body). Browsers on another origin cannot safely POST to http://<device-ip>:8888 without CORS headers on the device. The backend forwards the offer and returns the answer, so the dashboard only talks to same-origin /api/....
Signalling & URL normalisation
Registration stores signalingUrl. The proxy normalises host/port before httpx POST:
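from urllib.parse import urlparse, urlunparse

def normalize_signaling_url(raw: str) -> str:
    """Return raw with an explicit scheme and the device signalling port."""
    url_str = (raw or "").strip()
    if "://" not in url_str:
        url_str = f"http://{url_str}"  # bare host[:port] defaults to plain HTTP
    parsed = urlparse(url_str)
    if not parsed.hostname:
        return raw  # nothing usable to normalise; hand the input back unchanged
    port = parsed.port
    if port is None or port == 8080:
        port = 8888  # missing or legacy port: use the device signalling port
    netloc = f"{parsed.hostname}:{port}"
    return urlunparse((parsed.scheme or "http", netloc, parsed.path, parsed.params, parsed.query, parsed.fragment))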
On ConnectError, the backend can retry with port 8888 when older entries still pointed at legacy ports.
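The retry described above could look roughly like the following. This is a minimal sketch, not the project's code: `with_port` and `post_with_fallback` are hypothetical names, the `send` callable stands in for the real httpx POST, and the built-in `ConnectionError` stands in for `httpx.ConnectError` so the example runs without third-party packages.

```python
from typing import Callable
from urllib.parse import urlparse, urlunparse

FALLBACK_PORT = 8888  # device signalling port named in the text


def with_port(url: str, port: int) -> str:
    """Return url with its port replaced (hypothetical helper)."""
    parsed = urlparse(url)
    return urlunparse(parsed._replace(netloc=f"{parsed.hostname}:{port}"))


def post_with_fallback(url: str, sdp: str, send: Callable[[str, str], str]) -> str:
    """POST the SDP offer; on a connection failure retry once on FALLBACK_PORT.

    `send(url, sdp)` is an assumed wrapper around the real HTTP call and is
    expected to raise ConnectionError when the device cannot be reached.
    """
    try:
        return send(url, sdp)
    except ConnectionError:
        fallback = with_port(url, FALLBACK_PORT)
        if fallback == url:
            raise  # already on the signalling port; retrying would not help
        return send(fallback, sdp)
```

Passing the transport in as a callable keeps the fallback rule testable without a device on the network.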
Frontend: WebRTC setup (offer via proxy)
The player creates two recvonly video transceivers, a commands DataChannel, gathers ICE, then POSTs the SDP as text/plain:
const dataChannel = pc.createDataChannel("commands");
// One recvonly transceiver per incoming track (context and source)
pc.addTransceiver("video", { direction: "recvonly" });
pc.addTransceiver("video", { direction: "recvonly" });
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// … wait for ICE gathering …
const res = await fetch(`/api/dashboard/cameras/${cameraId}/sdp-offer`, {
  method: "POST",
  body: pc.localDescription?.sdp,
  headers: { "Content-Type": "text/plain" },
});
// Fail early rather than feeding an error page to setRemoteDescription
if (!res.ok) throw new Error(`SDP proxy returned ${res.status}`);
const sdpAnswer = await res.text();
await pc.setRemoteDescription(new RTCSessionDescription({ type: "answer", sdp: sdpAnswer }));
Tracks are routed to two hidden <video> elements. They set playsInline and are hidden without display:none, because some browsers defer decoding when an element is fully hidden. When track metadata is available, track IDs containing "context" versus "source"/"high" tell the two streams apart.
DataChannel: mode + hybrid rectangle
Android expects JSON with mode and optional hybrid_rect in percentage space (0–100):
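const cmd = {
  mode: streamingMode,
  // Only HYBRID carries a rectangle; JSON.stringify drops the field when it is undefined
  hybrid_rect:
    streamingMode === "HYBRID" && focusArea
      ? {
          x1: focusArea.x,
          y1: focusArea.y,
          x2: focusArea.x + focusArea.width,
          y2: focusArea.y + focusArea.height,
        }
      : undefined,
};
// send() throws if the channel is not open yet
if (dataChannel.readyState === "open") {
  dataChannel.send(JSON.stringify(cmd));
}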
The same rectangle is persisted via REST for the dashboard; the DataChannel is the low-latency path to the device encoder/compositor.
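Whichever side consumes the command (the device, or the backend route that persists the rectangle) will probably want to validate it first. The sketch below shows one way to do that in Python; `parse_command` and `clamp_pct` are hypothetical names, and the rules (clamp to 0–100, reorder swapped corners, reject zero-area rectangles) are assumptions rather than documented device behaviour.

```python
import json
from typing import Optional, Tuple

MODES = {"LOW", "HIGH", "VISION", "HYBRID"}  # the four modes listed in the design
Rect = Tuple[float, float, float, float]     # (x1, y1, x2, y2) in percent of frame


def clamp_pct(value: float) -> float:
    """Clamp a coordinate into the 0-100 percentage space."""
    return max(0.0, min(100.0, float(value)))


def parse_command(raw: str) -> Tuple[str, Optional[Rect]]:
    """Parse a DataChannel command; return (mode, rect or None)."""
    cmd = json.loads(raw)
    mode = cmd.get("mode")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    rect = cmd.get("hybrid_rect")
    if mode != "HYBRID" or rect is None:
        return mode, None  # the rectangle is only meaningful in HYBRID
    # Clamp each corner, then order the pairs so x1 <= x2 and y1 <= y2.
    x1, x2 = sorted((clamp_pct(rect["x1"]), clamp_pct(rect["x2"])))
    y1, y2 = sorted((clamp_pct(rect["y1"]), clamp_pct(rect["y2"])))
    if x1 == x2 or y1 == y2:
        raise ValueError("hybrid_rect has zero area")
    return mode, (x1, y1, x2, y2)
```

Clamping matters because the dashboard computes x2 and y2 as x + width and y + height, which can exceed 100 when the selection touches the frame edge.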
Compositor (browser ↔ OpenCV reference)
For VISION / HYBRID, the browser matches the Python/OpenCV pipeline in webrtc_receiver.py: threshold on luma (~18), binary mask, ~31px feather (via canvas filter: blur(31px)), cut bleed with the hard mask, then destination-in composite so high-res only shows through the feathered alpha. HIGH replaces the full canvas with the source; LOW keeps context only.
High-level loop (simplified):
// 1. Draw low-res context full frame
ctx.drawImage(contextFeed, 0, 0, canvas.width, canvas.height);
if (streamingMode === "HIGH") {
  // 2a. HIGH: the source replaces the whole frame
  ctx.drawImage(sourceFeed, 0, 0, canvas.width, canvas.height);
} else if (streamingMode !== "LOW") {
  // 2b. VISION / HYBRID: build mask from source luma > ~18, blur, bitwise clamp, then:
  tCtx.globalCompositeOperation = "source-over"; // reset: the previous frame left "destination-in" set
  tCtx.drawImage(sourceFeed, 0, 0, canvas.width, canvas.height);
  tCtx.globalCompositeOperation = "destination-in"; // keep source pixels only where the mask has alpha
  tCtx.drawImage(blurCanvas, 0, 0, canvas.width, canvas.height);
  ctx.drawImage(tempCanvas, 0, 0);
}
This keeps lab demos and field viewers visually aligned without shipping native OpenCV in the browser.
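To make the mask arithmetic concrete, here is a dependency-free Python sketch of the same steps on a tiny single-channel image: threshold, feather, clamp with the hard mask, then alpha-blend. It is not taken from webrtc_receiver.py. The function names are hypothetical, a box (mean) filter stands in for the Gaussian blur, and plain lists replace OpenCV arrays so the example runs anywhere.

```python
from typing import List

Gray = List[List[float]]  # single-channel image, values 0..255


def hard_mask(source: Gray, threshold: float = 18.0) -> Gray:
    """1.0 where the source pixel is brighter than the threshold, else 0.0."""
    return [[1.0 if px > threshold else 0.0 for px in row] for row in source]


def box_blur(mask: Gray, radius: int) -> Gray:
    """Mean over a (2*radius+1)^2 window; pixels outside the frame count as 0."""
    h, w = len(mask), len(mask[0])
    area = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += mask[yy][xx]
            out[y][x] = total / area
    return out


def composite(context: Gray, source: Gray, radius: int) -> Gray:
    """Blend source over context through a feathered, clamped mask."""
    mask = hard_mask(source)
    soft = box_blur(mask, radius)
    # Clamp: multiplying by the hard mask stops the feather bleeding
    # into pixels where the source is black.
    alpha = [[s * m for s, m in zip(srow, mrow)] for srow, mrow in zip(soft, mask)]
    return [
        [a * s + (1.0 - a) * c for a, s, c in zip(arow, srow, crow)]
        for arow, srow, crow in zip(alpha, source, context)
    ]
```

With an all-black source the output equals the context, which corresponds to what LOW mode shows; the feather only fades the high-res region inward from its own edge.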
Bandwidth: getStats() → kbps
Roughly every 2 s, a timer walks RTCPeerConnection.getStats(), sums bytesReceived across inbound-rtp video reports, and converts the delta between samples to kbps. A zero byte count forces the value to 0, so the UI shows 0.0 Mbps when the stream is down.
const stats = await pc.getStats();
let bytes = 0;
// Each entry is a single stats object, not the RTCStatsReport map itself
stats.forEach((report: any) => {
  // isRemote is a legacy field; where it is absent the check passes
  if (report.type === "inbound-rtp" && report.kind === "video" && !report.isRemote) {
    if (typeof report.bytesReceived === "number") bytes += report.bytesReceived;
  }
});
// … delta bytes / delta time → kbps, callback to dashboard state …
The useCameras hook exposes updateBandwidth(cameraId, kbps); mock fluctuation runs only when dataSource === "mock".
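The delta step elided in the snippet above is plain arithmetic, sketched here in Python for consistency with the backend examples. `Sample` and `kbps_between` are hypothetical names, and treating a counter reset or a non-positive interval as 0 is an assumption about how the player handles those cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    bytes_received: int   # summed inbound-rtp video bytesReceived
    timestamp_ms: float   # when the sample was taken, in milliseconds


def kbps_between(prev: Optional[Sample], curr: Sample) -> float:
    """Convert two cumulative byte samples into an average bitrate in kbps."""
    if prev is None or curr.bytes_received == 0:
        return 0.0  # first sample, or stream down: report zero
    delta_ms = curr.timestamp_ms - prev.timestamp_ms
    delta_bytes = curr.bytes_received - prev.bytes_received
    if delta_ms <= 0 or delta_bytes < 0:
        return 0.0  # clock glitch or counter reset (e.g. after reconnect)
    # bytes -> bits, and bits per millisecond equals kilobits per second
    return delta_bytes * 8 / delta_ms
```

Dividing the result by 1000 gives the Mbps figure for display; for example, 500 000 bytes received over 2 s works out to 2000 kbps, or 2.0 Mbps.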
Stack
| Layer | Tech |
|---|---|
| Device | Kotlin, CameraX, TFLite, WebRTC, HTTP signalling |
| Backend | Python 3.10+, FastAPI, Motor/Mongo, httpx, python-socketio |
| Frontend | React 18, Vite, TypeScript, Tailwind, Canvas 2D |
| Network | Nokia Network as Code (QoD tests in backend/testing/) |
Repository layout (monorepo)
- backend/ – FastAPI app, SDP proxy, camera CRUD, focus/mode updates.
- front-dashboard/ – WebRTCPlayer, FocusAreaSelector, CameraCard, CameraDetail, useCameras.
- apps_kotline/SurveillanceModel_app/ – Android streamer + optional receiver.py / OpenCV-style viewers for parity checks.
What’s next (technical)
- One RTCPeerConnection per camera shared across grid, drawer, and detail (or a recvonly transceiver reuse strategy) to cut load on the device.
- Encoder-side ROI or SVC layers to substantiate bitrate savings beyond client-side compositing.
- NaC hooks in the hot path (QoD on ICE failure or high-motion ROI).