LocateAnything-3B — fully in-browser, WebGPU, INT4

Open-vocabulary object detection that runs entirely client-side via onnxruntime-web on WebGPU. Model: Reza2kn/LocateAnything-3B-ONNX-WebGPU-INT4 (source: nvidia/LocateAnything-3B). The language tower is quantized to INT4, with a custom 4-bit embedding gather and a KV cache. No inference runs on a server.
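
To illustrate what a 4-bit embedding gather does in general, here is a minimal TypeScript sketch. It is not this demo's actual kernel. The packing layout (two codes per byte, low nibble first), the per-row scale and zero point, and the names QuantizedEmbedding and gatherInt4 are all assumptions made for illustration; the real model may use a different block size or layout.

```typescript
// Hypothetical layout (an assumption, not the demo's real format):
// each embedding row stores `dim` 4-bit codes, two per byte, low nibble first,
// with one float scale and one integer zero point per row.
interface QuantizedEmbedding {
  packed: Uint8Array;      // rows * dim / 2 bytes
  scales: Float32Array;    // one scale per row
  zeroPoints: Uint8Array;  // one zero point per row, each in 0..15
  dim: number;             // embedding width, must be even
}

// Gather the rows for `tokenIds` and dequantize them to float32.
// Only the requested rows are unpacked, so the full table stays 4-bit in memory.
function gatherInt4(table: QuantizedEmbedding, tokenIds: number[]): Float32Array {
  const { packed, scales, zeroPoints, dim } = table;
  const bytesPerRow = dim / 2;
  const out = new Float32Array(tokenIds.length * dim);

  tokenIds.forEach((id, t) => {
    const rowStart = id * bytesPerRow;
    const scale = scales[id];
    const zero = zeroPoints[id];
    for (let b = 0; b < bytesPerRow; b++) {
      const byte = packed[rowStart + b];
      const lo = byte & 0x0f; // first code in the byte
      const hi = byte >> 4;   // second code in the byte
      out[t * dim + 2 * b] = (lo - zero) * scale;
      out[t * dim + 2 * b + 1] = (hi - zero) * scale;
    }
  });

  return out;
}
```

The point of gathering before dequantizing is that only the rows for the current tokens are ever expanded to float32, which keeps the embedding table at roughly a quarter of its FP16 size in browser memory.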

Sample images

Or upload your own

Category prompt

Max new tokens: 96

Decoded output