ROVR: Releasing the Open Dataset
- Existing models underperform when tested on ROVR’s data, demonstrating that current approaches don’t generalize to its challenging scenarios.
- This data is vital for training and deploying advanced systems like autonomous vehicles, robotics, and spatial artificial intelligence solutions.
- Data collection within the ROVR ecosystem relies on two specialized devices: TarantulaX (TX) and LightCone (LC).
- TarantulaX is a compact hardware device that mounts on the vehicle roof and links to a driver’s smartphone over Bluetooth.
What Happened
ROVR is launching a set of open challenges: one on predicting distance from a single camera image, and another on finding and labeling objects using a camera plus LiDAR. The aim is to accelerate spatial AI development through open collaboration.
ROVR’s Open Dataset is built to move past the limits of existing public depth datasets by focusing on diverse geographies and conditions. The initial release includes ~200,000 frames (synchronized snapshots combining a camera image, LiDAR-based depth, and the car’s location and direction), with a path to expand to 1 million frames using the same acquisition and processing pipeline. Each sequence, a continuous stretch of a drive recorded as a short clip, comes with clean calibration and per-frame references. This first release focuses on monocular depth, while laying a direct path to object detection and scene labeling. Announced at the ADAS & Autonomous Vehicle Technology Summit, the Open Dataset draws on ROVR’s broader network, spanning 50+ countries and more than 20 million kilometers of real-world driving, to seed a resource for autonomous driving and spatial-AI research.
Files are packaged in a researcher-friendly layout with images at 1920×1080, point clouds in PCD, LiDAR-projected depth maps in single-channel PNG, and navigation tracks (GNSS/INS poses plus 100 Hz IMU logs). All files carry nanosecond timestamps to keep modalities strictly aligned.
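Because every file carries a nanosecond timestamp, a higher-rate stream such as the 100 Hz IMU log can be matched to each frame by nearest timestamp. The sketch below is illustrative only: the function name and the sample timestamps are hypothetical, and it assumes the IMU timestamps are already sorted in ascending order.

```python
from bisect import bisect_left


def nearest_index(sorted_ts_ns, query_ns):
    """Return the index of the timestamp in sorted_ts_ns closest to query_ns."""
    if not sorted_ts_ns:
        raise ValueError("timestamp list is empty")
    i = bisect_left(sorted_ts_ns, query_ns)
    if i == 0:
        return 0
    if i == len(sorted_ts_ns):
        return len(sorted_ts_ns) - 1
    before, after = sorted_ts_ns[i - 1], sorted_ts_ns[i]
    # On a tie, prefer the earlier sample.
    return i - 1 if query_ns - before <= after - query_ns else i


# Hypothetical data: one second of 100 Hz IMU samples (10 ms spacing)
# and two frame timestamps, all in nanoseconds.
imu_ts_ns = [k * 10_000_000 for k in range(100)]
frame_ts_ns = [203_000_000, 407_000_000]

matches = [nearest_index(imu_ts_ns, t) for t in frame_ts_ns]
print(matches)
```

Binary search keeps each lookup logarithmic in the length of the IMU log, which matters when a log holds thousands of samples per clip.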
Market Context
Data collection within the ROVR ecosystem relies on two specialized devices: TarantulaX (TX) and LightCone (LC). TarantulaX is a compact hardware device that mounts on the vehicle roof and links to a driver’s smartphone over Bluetooth. By feeding centimeter-level corrections from GEODNET, it turns everyday mobile video into accurate geospatial data. Meanwhile, LightCone is a roof-mounted sensor that pairs an automotive LiDAR, ADAS-grade camera, tri-band RTK satellite antenna, and high-precision IMU to capture centimeter-accurate 3D data. Users who contribute quality data are rewarded with ROVR tokens, incentivized based on factors including the amount of data collected (measured in mapping mileage), data quality, and frequency of road revisits.
ROVR Open Dataset is captured as time-linked video and laser measurements, so models learn depth from motion, not frozen frames. Each sequence bundles synchronized RGB images, LiDAR point clouds, and per-frame vehicle pose, making depth supervision consistent across time rather than inferred from single snapshots.
To capture this geospatial data, vehicles use a 126-beam solid-state LiDAR with a 200-meter range and a vertical field of view of ±12.5 degrees, paired with a forward-facing HD RGB camera at 1920×1080. Although the camera can operate at 30 Hz, both the camera and LiDAR are logged at 5 Hz in the dataset to provide matched clips for depth learning. A triple-frequency RTK GNSS receiver, aided by GEODNET corrections, supplies centimeter-level positioning, and an automotive-grade IMU stabilizes orientation and motion estimates. Sensors are hardware-synchronized with GPS-disciplined clocks, so the time offset between modalities stays below 2 ms.
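The rate and synchronization figures above can be illustrated with a small pairing sketch. It is not ROVR's pipeline: it assumes, purely for illustration, that 5 Hz camera frames are obtained by keeping every sixth frame of a 30 Hz stream, and it uses the stated 2 ms bound as a pairing tolerance. All function names and timestamps are hypothetical.

```python
SYNC_TOLERANCE_NS = 2_000_000  # 2 ms, the offset bound stated in the text


def subsample(timestamps_ns, source_hz=30, target_hz=5):
    """Keep every (source_hz // target_hz)-th timestamp, e.g. every 6th for 30 -> 5 Hz."""
    if source_hz % target_hz != 0:
        raise ValueError("source rate must be an integer multiple of target rate")
    return timestamps_ns[:: source_hz // target_hz]


def pair_frames(camera_ts_ns, lidar_ts_ns, tolerance_ns=SYNC_TOLERANCE_NS):
    """Pair each LiDAR sweep with the closest camera frame; drop pairs outside tolerance."""
    pairs = []
    for lidar_ts in lidar_ts_ns:
        cam_ts = min(camera_ts_ns, key=lambda t: abs(t - lidar_ts))
        if abs(cam_ts - lidar_ts) < tolerance_ns:
            pairs.append((cam_ts, lidar_ts))
    return pairs


# Hypothetical one-second capture: 30 Hz camera timestamps in nanoseconds.
camera_30hz = [k * 1_000_000_000 // 30 for k in range(30)]
camera_5hz = subsample(camera_30hz)

# Hypothetical LiDAR sweeps; the fourth is 3 ms off and should be rejected.
lidar_5hz = [500_000, 200_300_000, 400_900_000, 603_000_000, 800_100_000]

pairs = pair_frames(camera_5hz, lidar_5hz)
print(len(pairs), "of", len(lidar_5hz), "sweeps paired")
```

With hardware synchronization holding offsets below 2 ms, a check like this should reject nothing in practice; it is mainly useful as a sanity test when loading the data.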
Why It Matters
Training spatial AI solutions, like self-driving cars and humanoid robots, requires accurate and scalable geospatial data adaptable to edge cases. Depth shows how far a part of the image is from the camera. It’s a go-to metric because LiDAR measures those distances accurately and describes the 3D shape needed for navigation. Depth estimation involves measuring distances to roads, cars, people, and buildings across many places, times of day, and weather conditions. Depth is the basic signal these systems use to decide where they can and cannot go.
Details
Key Insights
ROVR is releasing ~200,000 frames (synchronized snapshots combining imaging and LiDAR) through the ROVR Open Dataset, democratizing access to real-world 3D data.
Existing models underperform when tested on ROVR’s data, demonstrating that current approaches don’t generalize to its challenging scenarios.
Training on ROVR reduced absolute relative error (AbsRel) across models: VA-DepthNet, DCDepth, and IEBins scored 69.3%, 69.6%, and 64.7% lower, respectively, than with KITTI training.
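For readers unfamiliar with the metric, absolute relative error is commonly defined as the mean of |predicted depth - reference depth| / reference depth over pixels that have a reference value. The sketch below follows that common definition; it assumes pixels without a LiDAR return are stored as 0 and skipped, which is a convention borrowed from other depth benchmarks rather than something confirmed for this dataset. The depth values and error scores in the example are made up.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth (gt > 0)."""
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)


def relative_reduction(baseline, improved):
    """Percentage drop from a baseline score to an improved score."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical depths in meters; the third pixel has no LiDAR return.
gt = [10.0, 20.0, 0.0, 40.0]
pred = [11.0, 18.0, 5.0, 40.0]
score = abs_rel(pred, gt)
print(f"AbsRel = {score:.4f}")

# Hypothetical scores, only to show how a percentage reduction is computed.
print(f"reduction = {relative_reduction(0.20, 0.06):.1f}%")
```

Because the error is divided by the reference depth, a one-meter miss at 10 m counts as much as a four-meter miss at 40 m, so the metric does not let distant pixels dominate.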
Primer
ROVR Network (ROVR) is a decentralized physical infrastructure network (DePIN) dedicated to constructing a comprehensive geospatial data platform through specialized hardware and software solutions. Its mission is to collect and produce large-scale, highly accurate 3D geospatial and 4D spatiotemporal data from real-world environments, addressing the critical bottleneck in quality 3D data availability. This data is vital for training and deploying advanced systems like autonomous vehicles, robotics, and spatial artificial intelligence solutions. By democratizing access to critical resources traditionally monopolized by large corporations, ROVR empowers individual contributors to participate directly in the economic benefits of the AI-driven economy.
The data gathered is subsequently transformed into high-definition (HD) maps that deliver centimeter-level precision and detailed environmental context, critical for applications such as autonomous vehicle navigation. ROVR's 3D data generation tools support the training of advanced AI models, enabling precise scene editing and the creation of synthetic data based on actual real-world conditions.
The Geospatial Bottleneck
Today, most high-quality geospatial depth data is held by a few companies. Public datasets exist, but they are either small, narrow, or expensive to reproduce. This creates a data bottleneck: models learn a handful of routes very well, then falter when the scene shifts.
Depth benchmarks come from recorded drives with synchronized camera and LiDAR rigs. They differ in scene coverage, label fidelity, and rig complexity, and each tradeoff brings its own limitation. KITTI is compact and filmed in a few calm towns, so models tend to memorize those settings. nuScenes spans more locations and weather, but its reference distances can be noisy when sensors are not perfectly aligned. DDAD reaches farther with denser references, but the complex hardware makes large-scale collection costly.
Enter the ROVR Open Dataset
The dataset is free and open for non-commercial use with attribution. A commercial license is available for companies to use it in products and services, also with attribution.
Data Quality & Collection
The first release consists of 1,363 clips of roughly thirty seconds each, split into 1,296 training clips and 67 test clips. At five frames per second for both camera and LiDAR, a thirty-second clip yields about 150 paired frames; the training and test splits contain 193,648 and 10,002 paired frames, respectively. For comprehensive details on data acquisition, refer to Guo et al., ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving.
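The counts in this paragraph can be checked with quick arithmetic. The sketch below uses only the figures stated above; the reported totals of 193,648 and 10,002 are per-frame counts, and they fall slightly under the nominal 150 frames per clip, which is consistent with clips being "roughly" thirty seconds long.

```python
CLIP_SECONDS = 30
FRAME_RATE_HZ = 5
TRAIN_CLIPS, TEST_CLIPS = 1_296, 67
REPORTED_TRAIN_FRAMES, REPORTED_TEST_FRAMES = 193_648, 10_002

frames_per_clip = CLIP_SECONDS * FRAME_RATE_HZ   # 150 paired frames
nominal_train = TRAIN_CLIPS * frames_per_clip    # 194,400 if every clip were exactly 30 s
nominal_test = TEST_CLIPS * frames_per_clip      # 10,050 if every clip were exactly 30 s

total_clips = TRAIN_CLIPS + TEST_CLIPS                            # 1,363
total_reported = REPORTED_TRAIN_FRAMES + REPORTED_TEST_FRAMES     # 203,650

avg_train = REPORTED_TRAIN_FRAMES / TRAIN_CLIPS  # about 149.4 frames per clip
avg_test = REPORTED_TEST_FRAMES / TEST_CLIPS     # about 149.3 frames per clip

print(f"nominal: {nominal_train:,} train / {nominal_test:,} test")
print(f"reported total: {total_reported:,} frames over {total_clips:,} clips")
print(f"average frames per clip: {avg_train:.1f} train, {avg_test:.1f} test")
```

The reported total of 203,650 frames matches the "~200,000" headline figure for the initial release.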
Collection spans North America, Europe, and Asia, with 10,000+ hours across urban, suburban, and highway routes in day, night, and rain. The diverse settings reflect that breadth: highway, urban, and rural scenes appear under normal illumination, low-light, and precipitation.
Model Performance