The World's Leading Managed Robot Data Collection Service

Expert operators. Professional teleoperation hardware. Your dataset format of choice. From 20-episode pilots to 10,000+ episode production campaigns — we collect the training data your robot policies need.

Leader-Follower Teleoperation · VR & Glove Collection · HDF5 / RLDS / LeRobot Delivery · Research-Grade QA

Why Robot Training Data Quality Matters

Most robot learning failures are data failures. Here are the three problems every ML team hits — and how SVRC solves each one.

Hard to collect at scale

Building a data collection station takes weeks. Recruiting and training operators takes longer. Most teams spend 60% of their project timeline on infrastructure instead of research. SVRC operates multi-station facilities with trained operators ready to start collecting within days of project kickoff.

Operator quality varies wildly

An untrained operator produces demonstrations with inconsistent strategies, failed grasps, and jerky trajectories. These demonstrations actively harm policy training. SVRC operators complete a qualification program covering approach consistency, grasp precision, and temporal smoothness before they touch your data.

Format inconsistency wastes months

Different labs store data in incompatible schemas. Timestamp conventions differ. Camera naming is inconsistent. Converting between formats introduces subtle bugs. SVRC delivers datasets in your exact target format — HDF5, RLDS, or LeRobot — validated against your training pipeline before handoff.

How a Data Campaign Actually Works

Six phases from first conversation to training-ready dataset. Every campaign follows this process.

1. Kickoff Call & Task Design

We work with your team to define the task specification: success criteria, observation space (which cameras, what resolution, what frame rate), action space (joint positions vs. end-effector velocity vs. delta actions), and scene diversity requirements (object variations, lighting, initial positions). You receive a detailed collection protocol document for review before any data is collected. Typical duration: 1-3 days.
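As a sketch of what the kickoff produces, the task specification items above (success criteria, observation space, action space, scene diversity) might be captured in a config like the following. All field names and values are illustrative assumptions, not SVRC's actual protocol schema:

```python
# Hypothetical task specification sketch; field names are illustrative,
# not SVRC's protocol format.
task_spec = {
    "task_name": "mug_on_rack",
    "success_criteria": "mug hanging on rack hook, gripper retracted",
    "observation_space": {
        "cameras": [
            {"name": "wrist", "resolution": (640, 480), "fps": 60},
            {"name": "overhead", "resolution": (1280, 720), "fps": 60},
        ],
        "joint_states_hz": 30,
    },
    "action_space": "joint_position",  # vs. "ee_velocity" or "delta_action"
    "scene_diversity": {
        "object_variants": 5,
        "lighting_conditions": ["bright", "dim"],
        "initial_position_jitter_cm": 3.0,
    },
}

def validate_spec(spec):
    """Minimal sanity check: all required top-level sections are present."""
    required = {"task_name", "success_criteria", "observation_space",
                "action_space", "scene_diversity"}
    return sorted(required - spec.keys())  # empty list means complete

print(validate_spec(task_spec))  # → []
```

A machine-readable spec like this is what lets later QA stages (schema checks, diversity audits) be automated rather than eyeballed.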

2. Hardware Configuration

We configure the robot arm, teleoperation interface, cameras (Intel RealSense D435/D455, ZED 2i, or your cameras), and workspace fixtures for your task. Camera extrinsics are calibrated, and time synchronization is verified to <5 ms across all streams using hardware-triggered cameras and shared clock sources. A test episode validates the full pipeline end-to-end. Typical duration: 1-2 days.

3. Operator Training & Qualification

Operators are trained on your specific task before collection begins. Each operator must pass a proficiency test demonstrating consistent approach strategy, precise grasps, smooth trajectories, and correct scene resets. Operators who fail the qualification test are re-trained or replaced. We track per-operator quality metrics throughout the campaign. Typical duration: 0.5-1 day.

4. Collection Sprints

Qualified operators collect demonstrations in structured sprints. Real-time quality monitoring flags failed episodes, inconsistent strategies, synchronization errors, and frame drops. Scene diversity is managed according to the protocol — object positions, lighting, and distractors are varied systematically. You receive daily progress reports with throughput metrics, quality statistics, and example episodes.

5. Quality Control & Validation

Every episode passes our 10-point QA checklist (see below). Failed episodes are flagged, excluded from the primary dataset, and made available separately on request. Dataset-level validation checks format consistency, schema compliance, and statistical properties (action distributions, episode length distributions, success rates). A QA report accompanies every delivery.

6. Delivery & Handoff

Validated data is exported to your target format (HDF5, RLDS, LeRobot, or custom) and delivered via secure transfer, Hugging Face Hub, or directly into your Fearless Platform workspace. We include the collection protocol, QA report, camera calibration files, and a README with dataset schema documentation. Post-delivery support: 2 weeks of format/pipeline troubleshooting included.

Data Specifications

Technical specs for the data streams we capture during collection.

Data Stream | Frequency | Resolution | Notes
Joint states | 30 Hz (configurable to 50 Hz) | float64 | Position, velocity, effort for each joint
RGB cameras | 60 fps (configurable) | 640x480 or 1280x720 | 1-4 views typical (wrist, overhead, side, ego)
Depth cameras | 30 fps | 640x480 | Intel RealSense D435/D455, aligned to RGB
End-effector pose | 30 Hz | 6-DOF (xyz + rpy) | Forward kinematics from joint states
Gripper state | 30 Hz | float (aperture) | Continuous aperture + binary open/close
Force/torque | 100 Hz (when available) | 6-axis | Wrist-mounted F/T sensor (UR, Franka)
Annotations | Per-episode | Structured JSON | Task phase labels, language instructions, keyframes, success/failure

All streams are synchronized to <5 ms using hardware-triggered cameras and a shared clock source. Timestamp format: Unix epoch (float64, seconds).
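The <5 ms tolerance can be verified directly from the delivered timestamps. A minimal sketch, assuming index-aligned timestamp arrays per stream (stream names are illustrative):

```python
# Sketch of a synchronization check against the <5 ms tolerance.
# Assumes streams are index-aligned; names are illustrative.
def max_sync_error_s(streams):
    """For index-aligned Unix-epoch timestamp lists (float seconds),
    return the worst spread across streams at any shared index."""
    n = min(len(ts) for ts in streams.values())
    worst = 0.0
    for i in range(n):
        stamps = [ts[i] for ts in streams.values()]
        worst = max(worst, max(stamps) - min(stamps))
    return worst

streams = {
    "wrist_rgb": [0.000, 0.0167, 0.0333],
    "overhead":  [0.001, 0.0165, 0.0340],
    "joints":    [0.000, 0.0170, 0.0330],
}
print(max_sync_error_s(streams) < 0.005)  # → True
```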

Data Collection Methods

We select the teleoperation method that matches your task requirements. Here is how they compare.

Method | How It Works | Precision | Scale (demos/hr) | Cost/Episode | Best For
Leader-Follower Arms | Operator moves lightweight leader arm; follower replicates joint positions at 3-8 ms latency | Highest | 20-35 | $$ | Contact-rich manipulation, insertion, bimanual tasks
VR / Quest 3 | Hand controller positions mapped to end-effector via inverse kinematics | Good | 15-25 | $ | Pick-and-place, sorting, packing, gross manipulation
SpaceMouse / Keyboard | 6-DOF joystick controls end-effector velocity; keyboard triggers discrete actions | Moderate | 5-12 | $ | Prototyping, navigation, low-precision tasks
Haptic Gloves | Finger joint tracking drives dexterous robot hands with force feedback | High (dexterous) | 8-15 | $$$ | In-hand manipulation, assembly, tool use
Kinesthetic Teaching | Operator physically guides the robot arm through the task in gravity-compensation mode | High | 10-18 | $$ | Simple tasks with compliant arms, quick data
Scripted Demos | Programmatic waypoint trajectories with randomized perturbations | Exact (deterministic) | 60-200+ | $ | Structured tasks, data augmentation, baseline generation

Not sure which method fits your task? Talk to our team — we will recommend the right approach based on your task requirements and budget.

Operator Quality Assurance Process

The quality of demonstrations directly determines the quality of trained policies. Here is how we ensure operator quality.

Qualification Testing

Before touching production data, every operator must pass a task-specific proficiency test. We evaluate approach consistency (does the operator use the same general strategy each time?), grasp precision (does the gripper close at the correct position and angle?), trajectory smoothness (are motions fluid or jerky?), and scene reset accuracy (are objects returned to valid initial positions?).
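Trajectory smoothness, one of the qualification criteria above, can be scored quantitatively. One common approach is mean absolute jerk (the third derivative of position); this is a sketch of that idea with finite differences, and the example values are illustrative, not SVRC's actual scoring thresholds:

```python
# One way to score trajectory smoothness: mean absolute jerk over a
# 1-D joint-position trace. Values and sampling rate are illustrative.
def mean_abs_jerk(positions, dt):
    """Finite-difference jerk (third derivative) averaged over the trace."""
    vel = [(positions[i + 1] - positions[i]) / dt for i in range(len(positions) - 1)]
    acc = [(vel[i + 1] - vel[i]) / dt for i in range(len(vel) - 1)]
    jerk = [(acc[i + 1] - acc[i]) / dt for i in range(len(acc) - 1)]
    return sum(abs(j) for j in jerk) / len(jerk)

smooth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # constant-velocity motion
jerky = [0.0, 0.3, 0.1, 0.5, 0.2, 0.6]   # erratic motion
dt = 1 / 30                               # 30 Hz joint states
print(mean_abs_jerk(smooth, dt) < mean_abs_jerk(jerky, dt))  # → True
```

A smooth, constant-velocity trace scores near zero jerk, while erratic operator motion scores high, which is what makes the metric usable as a pass/fail gate.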

Real-Time Monitoring

During collection, automated monitoring flags potential issues: episodes where joint velocities exceed normal bounds, where the gripper state does not match visual evidence, where frame drops exceed 2%, or where episode duration falls outside the expected range. Flagged episodes are reviewed by a senior operator before inclusion in the dataset.
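The monitoring rules above are simple threshold checks. A minimal sketch, where the 2% frame-drop limit comes from the QA spec and the velocity limit and duration window are illustrative per-task parameters:

```python
# Sketch of the automated episode flags described above. The 2% frame-drop
# threshold is from the QA spec; other thresholds are illustrative.
def flag_episode(ep):
    flags = []
    # Joint velocities exceed normal bounds
    if max(abs(v) for v in ep["joint_velocities"]) > ep["velocity_limit"]:
        flags.append("velocity_out_of_bounds")
    # Frame drops exceed 2%
    if 1 - ep["frames_received"] / ep["frames_expected"] > 0.02:
        flags.append("frame_drops")
    # Episode duration outside the expected range
    lo, hi = ep["expected_duration_s"]
    if not lo <= ep["duration_s"] <= hi:
        flags.append("duration_out_of_range")
    return flags

episode = {
    "joint_velocities": [0.2, -0.5, 1.1],
    "velocity_limit": 1.0,
    "frames_received": 1780,
    "frames_expected": 1800,
    "duration_s": 31.0,
    "expected_duration_s": (20.0, 45.0),
}
print(flag_episode(episode))  # → ['velocity_out_of_bounds']
```

Anything this function flags goes to a senior operator for review rather than straight into the dataset.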

Per-Operator Metrics

We track success rate, average episode duration, trajectory smoothness score, and QA pass rate for each operator. Operators whose metrics drift below thresholds are re-trained or reassigned. Campaign-level quality reports break down these metrics so you can see exactly who collected which data and at what quality level.

Output Formats

We deliver datasets in the format your training pipeline needs — no conversion headaches on your end.

HDF5

The gold standard for robot data. Native to ACT, ALOHA, and Diffusion Policy. Hierarchical episode structure with efficient random access and mature Python tooling via h5py.
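As an illustration of the hierarchical episode structure and random access, here is a small h5py sketch. The group layout (`observations/qpos`, `observations/images/...`, `action`) follows the common ALOHA-style convention, but the exact keys, shapes, and dtypes vary by pipeline, so treat them as assumptions; image resolution is kept tiny for the sketch:

```python
import h5py
import numpy as np

# ALOHA-style episode layout (illustrative keys, tiny image resolution).
# driver="core" keeps the file in memory so the sketch leaves no file behind.
with h5py.File("episode_0.h5", "w", driver="core", backing_store=False) as f:
    obs = f.create_group("observations")
    obs.create_dataset("qpos", data=np.zeros((100, 7), dtype=np.float64))
    img = obs.create_group("images")
    img.create_dataset("wrist", data=np.zeros((100, 60, 80, 3), dtype=np.uint8))
    f.create_dataset("action", data=np.zeros((100, 7), dtype=np.float64))

    # Random access: read a single timestep without loading the full episode.
    step_50 = f["observations/qpos"][50]
    print(step_50.shape)  # → (7,)
```

The ability to slice one timestep out of a multi-gigabyte episode file is the main reason HDF5 remains the default for ACT and Diffusion Policy training loops.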

RLDS / TFRecord

The format behind Open X-Embodiment and Octo. TensorFlow Datasets schema for cross-embodiment training. Streamable from cloud storage with efficient tf.data pipelines.

LeRobot / Parquet

Hugging Face ecosystem native. One-command upload to HF Hub with built-in visualization. Compact MP4 video storage. Growing community with 300+ public datasets.

Custom Formats

Need ROS bag, CSV, JSON-lines, or a proprietary schema? We write custom export adapters. You provide the target schema; we handle conversion and validation.

Read our detailed HDF5 vs RLDS vs LeRobot format comparison guide for technical deep dives on each format.

How Many Demonstrations Do You Need?

Dataset size depends on task complexity, policy architecture, and target success rate. Here are benchmarks from real campaigns.

Task Complexity | Example Tasks | Demos for 80%+ | Demos for 90%+ | Notes
Simple single-arm | Pick-and-place, push, open drawer | 20-50 | 50-100 | ACT/Diffusion Policy, fixed objects
Moderate single-arm | Insertion, stacking, tool use | 50-150 | 150-300 | Contact-rich, position-sensitive
Bimanual | Folding, handover, coordinated assembly | 100-300 | 300-600 | Two-arm coordination required
High diversity | Multi-object sorting, variable geometry | 200-500 | 500-1,000 | Many object/scene variations
VLA / generalist | Language-conditioned multi-task | 500-2,000 | 2,000-10,000+ | Large-scale, multi-embodiment

These are guidelines based on published research and SVRC campaign data. Actual requirements depend on your specific policy architecture, training regime, and generalization targets. We help scope the right dataset size during the kickoff call.

Pricing

Transparent pricing aligned to your project stage. Every tier includes task design, hardware setup, expert collection, QA, and delivery.

Pilot

20 demonstrations

$2,500
  • Task design & protocol document
  • Single collection station
  • 1 qualified operator
  • 10-point QA on every episode
  • Delivery in 1 format (HDF5, RLDS, or LeRobot)
  • 1-2 week turnaround
  • 2 weeks post-delivery support
Campaign (Most Popular)

100 demonstrations

$8,000
  • Everything in Pilot, plus:
  • Multi-station parallel collection
  • 2-4 dedicated operators
  • Weekly batch deliveries
  • Scene diversity management
  • Delivery in up to 2 formats
  • 2-6 week turnaround

Enterprise

Custom scale / ongoing

Custom
  • Everything in Campaign, plus:
  • Dedicated collection infrastructure
  • On-site or co-located deployment
  • SLA with uptime guarantees
  • Custom robot integration
  • All formats + Fearless Platform access
  • Ongoing support & iteration

Compatible Hardware

We operate and integrate with a wide range of robot arms. If your platform is ROS2-compatible, we can collect data on it.

OpenArm 101: Open-source, SVRC-designed
DK1 Bimanual: Dual-arm kit with leader-follower
Franka FR3: Research-grade torque control
UR3e / UR5e: Industrial collaborative arms
Unitree G1: Humanoid full-body
xArm 6/7: Cost-effective 6/7-DOF
Kinova Gen3: Lightweight research arm
Custom: Ship us your robot

See our full hardware catalog for specifications and availability. Leasing rates available for all platforms.

10-Point Data Quality Checklist

Every episode we deliver passes this checklist. No exceptions.

  1. Synchronized timestamps — All sensor streams (cameras, joints, actions) aligned to <5 ms tolerance using hardware-triggered cameras and shared clock sources.
  2. Consistent episode structure — Every episode follows the same observation/action schema with identical array dimensions, data types, and key names.
  3. Operator qualification — Operators pass a proficiency test on the specific task before their episodes enter the production dataset.
  4. Task success verification — Each episode is reviewed for full task completion. Failed episodes are flagged and excluded from the primary dataset (available separately on request).
  5. Scene reset consistency — Object positions, lighting, and workspace state are reset to defined initial conditions between episodes. Randomization ranges are documented.
  6. Frame drop monitoring — Camera streams are checked for dropped frames. Episodes with >2% frame loss are re-collected.
  7. Gripper state consistency — Gripper open/close signals are validated against camera evidence. Phantom gripper events are corrected or flagged.
  8. Joint limit compliance — No episode contains joint positions outside the robot's safe operating range or near singularity configurations.
  9. Metadata completeness — Every episode includes task name, operator ID, timestamp, robot serial, camera config, and success label as structured metadata.
  10. Annotation standards — Language instructions, task phase labels, and keyframe annotations (when requested) follow the agreed annotation schema.
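Several of the checklist items above are machine-checkable. A minimal sketch automating checks 2 (consistent episode structure), 6 (frame drops), and 9 (metadata completeness); field names are illustrative, not SVRC's internal QA schema:

```python
# Sketch automating checklist items 2, 6, and 9. Field names are
# illustrative assumptions about the episode record format.
REQUIRED_METADATA = {"task_name", "operator_id", "timestamp",
                     "robot_serial", "camera_config", "success"}

def run_checks(ep, reference_schema):
    failures = []
    # Check 2: every array matches the reference per-step shape
    per_step = {k: v["shape"][1:] for k, v in ep["arrays"].items()}
    if per_step != reference_schema:
        failures.append("schema_mismatch")
    # Check 6: frame loss above 2% fails (episode is re-collected)
    if 1 - ep["frames_received"] / ep["frames_expected"] > 0.02:
        failures.append("frame_drops")
    # Check 9: all required metadata fields present
    if not REQUIRED_METADATA <= ep["metadata"].keys():
        failures.append("missing_metadata")
    return failures

reference = {"qpos": (7,), "action": (7,)}
episode = {
    "arrays": {"qpos": {"shape": (250, 7)}, "action": {"shape": (250, 7)}},
    "frames_received": 248, "frames_expected": 250,
    "metadata": {"task_name": "sort", "operator_id": "op_07",
                 "timestamp": 1718000000.0, "robot_serial": "FR3-0042",
                 "camera_config": "3cam", "success": True},
}
print(run_checks(episode, reference))  # → []
```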

Campaign Examples

Anonymized examples from real data collection campaigns.

University Lab

2,400 bimanual demos in 6 weeks

A CMU robotics lab needed bimanual manipulation data for a Diffusion Policy paper. Two OpenArm 101 leader-follower stations, 3 camera views each, 2 trained operators. HDF5 delivery validated against their ACT training pipeline. Paper published at a top venue 3 months later.

Robotics Startup

First policy in 4 weeks

Series A manipulation startup. Campaign tier: 100 demos of a kitting task on UR5e. LeRobot format delivery. Their ML team trained an ACT policy that achieved 87% success rate on first deployment. Total data cost: $8,000.

Enterprise

Ongoing pipeline, 72% to 94% success

Logistics company with mobile manipulators across 3 warehouses. Monthly campaigns of 500+ demos covering new SKU types and failure edge cases. Data flows into Fearless Platform for failure mining and retraining. Policy success rate improved from 72% to 94% over 6 months.

Research Benchmark

5,000-episode benchmark dataset

Academic group creating a standardized manipulation benchmark. 10 task categories, 500 demos each, 4 operators. Delivered in all three formats (HDF5, RLDS, LeRobot). Dataset published on Hugging Face Hub with 200+ downloads in first month.

Who Uses SVRC Data Services

Research Labs

University and corporate research groups who need high-quality demonstration data for policy learning papers. We handle the tedious collection work so your researchers can focus on algorithms and experiments.

Startup Policy Training

Early-stage robotics companies building their first manipulation policies. Get from zero to a working policy in weeks instead of months by outsourcing data collection to operators who already know how to produce training-grade demonstrations.

Enterprise Deployment

Companies deploying robots in production who need ongoing data collection to improve policy performance, handle edge cases, and expand to new task variants. Our campaign and enterprise tiers support continuous data pipelines. Enterprise programs.

Academic Benchmarks

Research groups creating standardized benchmark datasets for the community. We provide the collection infrastructure and operator consistency needed for reproducible, high-quality benchmark datasets that other labs can build on.

Trusted by Leading Research Institutions

SVRC works with researchers and engineering teams at top universities and robotics companies to collect the demonstration data that powers state-of-the-art manipulation policies.

Stanford · UC Berkeley · MIT · CMU · Toyota Research

Frequently Asked Questions

What teleoperation hardware do you use?

We operate leader-follower arms (ALOHA-style WidowX/ViperX and OpenArm setups), Meta Quest 3 VR systems, 6-DOF SpaceMouse interfaces, and SenseGlove Nova 2 haptic gloves. We select the interface that best matches your task requirements for precision, throughput, and data quality. For bimanual tasks, we run dual leader-follower or dual VR configurations. See our bimanual teleoperation guide for details.

What formats do you deliver?

We deliver datasets in HDF5 (ACT/ALOHA compatible), RLDS/TFRecord (for Open X-Embodiment and Octo), LeRobot Parquet (Hugging Face Hub ready), or custom formats. You specify the format in your project brief, and we handle all conversion and validation. Read our format comparison guide for details on each.

How long does a data collection campaign take?

A pilot program (20 demos) typically takes 1-2 weeks from kickoff to delivery, including task design and hardware setup. A standard campaign (100 demos) takes 2-6 weeks depending on task complexity and scene diversity requirements. Enterprise-scale projects are scoped individually. Rush delivery is available for pilots at additional cost.

Can you collect data on my robot?

Yes. We work with OpenArm, DK1, Franka FR3, UR3e, UR5e, xArm, Kinova Gen3, Unitree G1, and most ROS2-compatible robot arms. If you ship us your robot or we can procure one, we integrate it into our collection infrastructure. Custom integrations typically take 3-5 business days. We also support mobile manipulators and bimanual configurations.

What is a typical cost per episode?

Cost per episode ranges from $8-$35 depending on task complexity, number of camera views, teleoperation method, and QA requirements. Simple tabletop pick-and-place tasks are at the lower end; contact-rich bimanual tasks with dexterous hands are at the higher end. Volume discounts apply for campaigns over 500 episodes. Contact us for a detailed quote based on your specific requirements.

Do you sign NDAs?

Yes. We sign mutual NDAs before any project discussion that involves proprietary tasks, robot configurations, or research goals. All data collected under contract is owned by the client. We do not retain copies, use client data for any other purpose, or include client data in public datasets. We also support custom data governance and security requirements for enterprise clients.

Can collected data go directly into the Fearless Platform?

Yes. Enterprise data collection campaigns include Fearless Platform access. Data collected by SVRC operators flows directly into your Fearless workspace with full metadata, QA reports, and lineage information. This creates a seamless path from collection to replay, annotation, evaluation, and retraining.

What annotation types are available?

We support timestamped annotations (task phase labels at specific time points), segmented annotations (start/end boundaries for subtask phases), language instructions (natural language descriptions for VLA training), keyframe annotations (critical manipulation moments), and success/failure labels. Custom annotation schemas are supported for enterprise campaigns.
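To make the annotation types concrete, here is a hypothetical per-episode record combining them, in line with the "Structured JSON" delivery noted in the data specifications. All field names are illustrative, not a fixed SVRC schema:

```python
import json

# Hypothetical per-episode annotation record; field names are illustrative.
annotation = {
    "episode_id": "ep_000142",
    "success": True,  # success/failure label
    "language_instruction": "pick up the red mug and place it on the rack",
    "phases": [  # segmented annotations: start/end boundaries per subtask
        {"label": "approach",  "start_s": 0.0, "end_s": 2.4},
        {"label": "grasp",     "start_s": 2.4, "end_s": 3.1},
        {"label": "transport", "start_s": 3.1, "end_s": 6.8},
        {"label": "place",     "start_s": 6.8, "end_s": 8.2},
    ],
    "keyframes": [  # timestamped annotations at critical moments
        {"label": "contact", "t_s": 2.6},
        {"label": "release", "t_s": 8.0},
    ],
}

# Round-trips cleanly as JSON alongside the episode data.
assert json.loads(json.dumps(annotation)) == annotation
print(len(annotation["phases"]))  # → 4
```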

Ready to Start Your Data Collection Campaign?

Tell us about your task, robot, and timeline. We will scope a collection program and send you a detailed proposal within 48 hours.

Email us directly: contact@roboticscenter.ai

Teleop Dataset Program

Build your dataset scope

Tell us the robot setup, modalities, volume, and license intent. We will return a structured lead plus a starter schema, capability matrix, and rough pricing band.