Smart Focus Streaming -- Hackathon

Smart Focus Streaming is a hackathon PoC: an Android device streams two video tracks (low-res context + high-res / ROI source), a FastAPI backend proxies signalling and stores camera state, and a React dashboard composites the feeds in the browser and drives focus over a WebRTC DataChannel.

For the hackathon story and 2nd place at MWC 2026 / Talent Arena, see the blog post:
Open Gateway Hackathon 2026 – 2nd Place.

TL;DR

Piece	Role
Android	CameraX + TFLite, WebRTC (2× video tracks, which the browser receives as `recvonly`), `commands` DataChannel, HTTP POST signalling on `:8888`
Backend	MongoDB cameras, focus/mode REST, SDP POST proxy + URL normalisation (`8080` → `8888`)
React	`WebRTCPlayer`: offer/answer via proxy, canvas compositor (OpenCV-style mask + blur), `getStats()` → kbps
NaC	QoD / Network as Code exercised in separate backend test scripts

Problem & design goals

Problem: Full-frame HD everywhere burns bandwidth; often only a sub-region needs detail.

Design:

LOW – show context only (cheap).
HIGH – full-frame high-res (baseline “best quality”).
VISION – high-res non-black regions from the source track blended over context (ML-driven ROI on device).
HYBRID – same blend, but the ROI is also constrained by a user rectangle (0–100% of frame) sent over the DataChannel as hybrid_rect.

The dashboard must:

Negotiate WebRTC without the browser calling the phone’s HTTP server directly (CORS).
Show measured inbound video bitrate, not mocks, when using the real API.

Architecture

┌─────────────────┐     WebRTC (2 video + DataChannel)   ┌──────────────────┐
│  Android device │ ◄──────────────────────────────────► │  React dashboard │
│ Signalling :8888│     SDP: browser → FastAPI → device  │  WebRTCPlayer    │
└────────┬────────┘                                      └────────┬─────────┘
         │                                                        │
         │                 ┌────────────────────┐                 │
         └────────────────►│  FastAPI + Mongo   │◄────────────────┘
                           │  REST + SDP proxy  │
                           └────────────────────┘

Why a proxy? The phone exposes signalling as a plain HTTP POST with the SDP in the body. A page served from another origin cannot read the response of a POST to http://<device-ip>:8888 unless the device sends CORS headers. The backend forwards the offer and returns the answer, so the dashboard only talks to same-origin /api/....

Signalling & URL normalisation

Registration stores signalingUrl. The proxy normalises host/port before httpx POST:

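from urllib.parse import urlparse, urlunparse

def normalize_signaling_url(raw: str) -> str:
    """Return the signalling URL with a scheme and the current device port."""
    url_str = (raw or "").strip()
    if "://" not in url_str:
        url_str = f"http://{url_str}"  # bare "host:port" entries default to HTTP
    parsed = urlparse(url_str)
    if not parsed.hostname:
        return raw  # nothing usable to normalise; hand the input back unchanged
    # A missing port and the legacy 8080 both map to the signalling port 8888.
    port = 8888 if parsed.port in (None, 8080) else parsed.port
    netloc = f"{parsed.hostname}:{port}"
    return urlunparse((parsed.scheme or "http", netloc, parsed.path, parsed.params, parsed.query, parsed.fragment))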

On ConnectError, the backend can retry on port 8888, which covers older entries that still point at legacy ports.

Frontend: WebRTC setup (offer via proxy)

The player creates a commands DataChannel and two recvonly video transceivers, waits for ICE gathering to finish, then POSTs the offer SDP as text/plain:

// Assumes `pc` is an already constructed RTCPeerConnection.
const dataChannel = pc.createDataChannel("commands");
// Two receive-only video slots: one for context, one for the high-res source.
pc.addTransceiver("video", { direction: "recvonly" });
pc.addTransceiver("video", { direction: "recvonly" });

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// … wait for ICE gathering …

// POST the offer to the same-origin proxy, which forwards it to the device.
const res = await fetch(`/api/dashboard/cameras/${cameraId}/sdp-offer`, {
  method: "POST",
  body: pc.localDescription?.sdp,
  headers: { "Content-Type": "text/plain" },
});
if (!res.ok) throw new Error(`SDP proxy returned ${res.status}`);
const sdpAnswer = await res.text();
await pc.setRemoteDescription({ type: "answer", sdp: sdpAnswer });

Tracks are routed to two visually hidden <video> elements (playsInline, kept out of view rather than set to display:none, because some browsers defer decoding for fully hidden elements). When track metadata is available, track IDs containing context versus source/high tell the two streams apart.
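The ID-based routing could be sketched roughly as follows. `pickFeed` is a hypothetical helper, not a name from the repository, and the arrival-order fallback for IDs that carry no hint is an assumption rather than documented device behaviour.

```typescript
// Which hidden <video> element a remote track should feed.
type Feed = "context" | "source";

// Hypothetical helper: classify a remote video track by its ID.
// `arrivalIndex` is the 0-based order in which the track arrived; using it as
// a fallback is an assumption, not something the device is known to guarantee.
function pickFeed(trackId: string, arrivalIndex: number): Feed {
  const id = trackId.toLowerCase();
  if (id.includes("context")) return "context";
  if (id.includes("source") || id.includes("high")) return "source";
  return arrivalIndex === 0 ? "context" : "source";
}
```

In an `ontrack` handler this might be called with `event.track.id` and a running counter, with the result deciding which element's `srcObject` receives the track.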

DataChannel: mode + hybrid rectangle

Android expects JSON with mode and optional hybrid_rect in percentage space (0–100):

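// hybrid_rect is only set in HYBRID mode; JSON.stringify drops undefined keys.
const cmd = {
  mode: streamingMode,
  hybrid_rect:
    streamingMode === "HYBRID" && focusArea
      ? {
          x1: focusArea.x,
          y1: focusArea.y,
          x2: focusArea.x + focusArea.width,
          y2: focusArea.y + focusArea.height,
        }
      : undefined,
};
// send() throws unless the channel is open, so guard on readyState.
if (dataChannel.readyState === "open") {
  dataChannel.send(JSON.stringify(cmd));
}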

The same rectangle is persisted via REST for the dashboard; the DataChannel is the low-latency path to the device encoder/compositor.
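Since the rectangle lives in 0–100 percentage space, a dragged selection near the frame edge can overshoot once width and height are added. The sketch below shows one way to build the command with clamping; `buildModeCommand`, `clampPct`, and the type names are hypothetical, and the clamping itself is an assumption that the snippet above does not perform.

```typescript
type StreamingMode = "LOW" | "HIGH" | "VISION" | "HYBRID";

// All values are percentages of the frame (0–100).
interface FocusArea { x: number; y: number; width: number; height: number; }
interface HybridRect { x1: number; y1: number; x2: number; y2: number; }
interface ModeCommand { mode: StreamingMode; hybrid_rect?: HybridRect; }

// Keep a coordinate inside the 0–100 range.
const clampPct = (v: number): number => Math.min(100, Math.max(0, v));

// Hypothetical helper: only HYBRID commands carry a rectangle.
function buildModeCommand(mode: StreamingMode, focusArea: FocusArea | null): ModeCommand {
  if (mode !== "HYBRID" || focusArea === null) return { mode };
  return {
    mode,
    hybrid_rect: {
      x1: clampPct(focusArea.x),
      y1: clampPct(focusArea.y),
      x2: clampPct(focusArea.x + focusArea.width),
      y2: clampPct(focusArea.y + focusArea.height),
    },
  };
}
```

The result can be passed straight to `JSON.stringify` before sending; non-HYBRID commands serialise without a `hybrid_rect` key at all.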

Compositor (browser ↔ OpenCV reference)

For VISION / HYBRID, the browser mirrors the Python/OpenCV pipeline in webrtc_receiver.py: threshold the source luma (~18) into a binary mask, feather it by ~31 px (canvas filter: blur(31px)), clamp the blurred mask with the hard mask to cut bleed, then composite with destination-in so high-res shows only through the feathered alpha. HIGH replaces the full canvas with the source; LOW keeps context only.

High-level loop (simplified):

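// 1. Draw low-res context full frame
ctx.drawImage(contextFeed, 0, 0, canvas.width, canvas.height);

if (streamingMode === "HIGH") {
  // 2a. HIGH: the source replaces the whole canvas
  ctx.drawImage(sourceFeed, 0, 0, canvas.width, canvas.height);
} else if (streamingMode !== "LOW") {
  // 2b. VISION / HYBRID: build mask from source luma > ~18, blur, bitwise clamp, then:
  tCtx.globalCompositeOperation = "source-over"; // reset: the mode set below persists across frames
  tCtx.clearRect(0, 0, canvas.width, canvas.height);
  tCtx.drawImage(sourceFeed, 0, 0, canvas.width, canvas.height);
  tCtx.globalCompositeOperation = "destination-in"; // keep source only where the mask is opaque
  tCtx.drawImage(blurCanvas, 0, 0, canvas.width, canvas.height);
  ctx.drawImage(tempCanvas, 0, 0);
}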

This keeps lab demos and field viewers visually aligned without shipping native OpenCV in the browser.
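The loop above only summarises the mask step in a comment. A minimal sketch of the thresholding part, operating on a raw RGBA buffer such as the one `getImageData().data` returns, might look like this. `buildLumaMask` is a hypothetical name, and the Rec. 601 luma weights are an assumption about how the reference pipeline converts to grayscale.

```typescript
// Hypothetical helper: turn RGBA pixels into an RGBA mask that is opaque
// white where luma exceeds the threshold and fully transparent elsewhere.
// The Rec. 601 weights are an assumption, not confirmed by the source.
function buildLumaMask(rgba: Uint8ClampedArray, threshold = 18): Uint8ClampedArray {
  const mask = new Uint8ClampedArray(rgba.length); // zero-filled: transparent
  for (let i = 0; i < rgba.length; i += 4) {
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    if (luma > threshold) {
      mask[i] = 255;     // R
      mask[i + 1] = 255; // G
      mask[i + 2] = 255; // B
      mask[i + 3] = 255; // A: this pixel lets the high-res source through
    }
  }
  return mask;
}
```

The hard mask produced this way would then be drawn to the blur canvas and feathered before the destination-in composite; that wiring is left out here because it depends on canvas setup not shown in this post.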

Bandwidth: `getStats()` → kbps

Every ~2 s a timer calls RTCPeerConnection.getStats(), sums bytesReceived across inbound-rtp video reports, and converts the delta between samples to kbps. A total of zero bytes forces the value to 0, so the UI shows 0.0 Mbps when the stream is down.

const stats = await pc.getStats();
let bytes = 0;
// Each entry is a single stats dictionary; RTCStatsReport is the map that holds them.
stats.forEach((report) => {
  // isRemote is a legacy field; where it is undefined this check is a no-op.
  if (report.type === "inbound-rtp" && report.kind === "video" && !report.isRemote) {
    if (typeof report.bytesReceived === "number") bytes += report.bytesReceived;
  }
});
// … delta bytes / delta time → kbps, callback to dashboard state …

The useCameras hook exposes updateBandwidth(cameraId, kbps); mock fluctuation runs only when dataSource === "mock".
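The delta-to-kbps conversion is simple arithmetic: bits divided by milliseconds is already kilobits per second. A minimal sketch follows, with hypothetical names (`ByteSample`, `kbpsBetween`, `formatMbps`) and the assumption that a counter reset should read as 0 rather than a negative rate.

```typescript
// One cumulative reading of bytesReceived and when it was taken.
interface ByteSample { bytes: number; timestampMs: number; }

// Hypothetical helper: convert two cumulative samples to kbps.
function kbpsBetween(prev: ByteSample, curr: ByteSample): number {
  const deltaMs = curr.timestampMs - prev.timestampMs;
  const deltaBytes = curr.bytes - prev.bytes;
  // Zero total bytes, a counter reset, or a non-positive interval all read as 0.
  if (curr.bytes === 0 || deltaBytes <= 0 || deltaMs <= 0) return 0;
  return (deltaBytes * 8) / deltaMs; // bits per millisecond = kilobits per second
}

// Format for the dashboard with one decimal place in Mbps.
function formatMbps(kbps: number): string {
  return `${(kbps / 1000).toFixed(1)} Mbps`;
}
```

For example, 250,000 bytes received over a 2,000 ms interval gives 1000 kbps, which `formatMbps` renders as "1.0 Mbps".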

Stack

Layer	Tech
Device	Kotlin, CameraX, TFLite, WebRTC, HTTP signalling
Backend	Python 3.10+, FastAPI, Motor/Mongo, httpx, python-socketio
Frontend	React 18, Vite, TypeScript, Tailwind, Canvas 2D
Network	Nokia Network as Code (QoD tests in `backend/testing/`)

Repository layout (monorepo)

backend/ – FastAPI app, SDP proxy, camera CRUD, focus/mode updates.
front-dashboard/ – WebRTCPlayer, FocusAreaSelector, CameraCard, CameraDetail, useCameras.
apps_kotline/SurveillanceModel_app/ – Android streamer + optional receiver.py / OpenCV-style viewers for parity checks.

What’s next (technical)

One RTCPeerConnection per camera, shared across the grid, drawer, and detail views (or a recvonly transceiver reuse strategy), to cut load on the device.
Encoder-side ROI or SVC layers to substantiate bitrate savings beyond client-side compositing.
NaC hooks in the hot path (QoD on ICE failure or high-motion ROI).
