NVIDIA, together with Hong Kong Polytechnic University and Nanjing University, has released LocateAnything, an open-source visual grounding model: give it one image and one sentence, and it draws a bounding box around whatever the sentence describes. Using a new parallel-decoding method that "draws the whole box at once," it locates nearly 13 targets per second on a single top-end GPU, outpacing comparable models on both speed and accuracy. The model is free to download.
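
To give a rough sense of why emitting a whole box in one step can be faster than emitting it piece by piece, here is a toy Python sketch that only counts model calls. It is not LocateAnything's code or API. The function names, the four-number (x1, y1, x2, y2) box format, and the stand-in "model" functions are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical box format: normalized (x1, y1, x2, y2). An assumption, not the model's spec.
Box = Tuple[float, float, float, float]


def decode_sequential(predict_next: Callable[[List[float]], float]) -> Tuple[Box, int]:
    """Produce one coordinate per model call, each conditioned on the previous ones."""
    coords: List[float] = []
    passes = 0
    for _ in range(4):
        coords.append(predict_next(coords))  # one forward pass per coordinate
        passes += 1
    return (coords[0], coords[1], coords[2], coords[3]), passes


def decode_parallel(predict_box: Callable[[], Box]) -> Tuple[Box, int]:
    """Produce all four coordinates from a single model call."""
    return predict_box(), 1


# Stand-in "model" that always answers with the same box, so only the call count differs.
TOY_BOX: Box = (0.10, 0.20, 0.55, 0.80)


def toy_predict_next(prefix: List[float]) -> float:
    return TOY_BOX[len(prefix)]


def toy_predict_box() -> Box:
    return TOY_BOX


seq_box, seq_passes = decode_sequential(toy_predict_next)
par_box, par_passes = decode_parallel(toy_predict_box)
print(f"sequential: {seq_passes} calls, parallel: {par_passes} call")
```

In this toy setup both paths return the same box, but the sequential path needs four model calls per target and the parallel path needs one. Real speedups depend on the actual architecture and hardware, which the sketch does not model.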