Chapter 12: US Challengers: Figure AI and Agility Robotics (with Tesla Optimus Outlook)
12.1 Two opposite bets
If Boston Dynamics (Chapter 11) is the hybrid-MPC+RL incumbent, Figure AI and Agility Robotics are the two US challengers betting in opposite directions. Figure commits to end-to-end learned policies with heavy vertical integration and consumer-facing scale-up. Agility commits to safety-first whole-body control with narrow industrial vertical deployment. Both are US-based, both have raised commercial-scale funding, and both are shipping in 2026, yet they disagree on nearly every architectural question Part III catalogued. The disagreement is a gift to this chapter: it lets the comparison foreground trade-offs that single-company analyses cannot expose.
Tesla's Optimus sits uncomfortably alongside these two. Tesla has published demos but essentially no peer-reviewed engineering detail, and Gen 3 mass production announced in January 2026 is both the biggest humanoid production-scale claim in the field and the least externally verifiable. §12.7 handles Tesla carefully: report what is publicly disclosed, flag what is not, and mark the boundary at which further analysis becomes speculation. Two smaller US-aligned entrants — 1X (Norwegian/US, home-robot focus) and Sanctuary AI (Canadian, hydraulic-hand specialist) — get briefer treatment in §12.8 because their public disclosures are substantially thinner than Figure, Agility, or Tesla.
The chapter proceeds: Figure's hardware (§12.2) and software stack (§12.3), Agility's Digit lineage (§12.4) and Motor Cortex architecture (§12.5), the Figure-vs-Agility comparison (§12.6), Tesla Optimus outlook (§12.7), 1X + Sanctuary short profiles (§12.8), and Part IV intermediate open questions (§12.9).
12.2 Figure AI — the hardware pivot to Figure 03
Figure AI was founded in 2022 by Brett Adcock and has iterated through three humanoid generations in under four years: Figure 01 (2023 prototype), Figure 02 (August 2024) [2], and Figure 03 (October 2025). The cadence is the fastest among frontier-company humanoid hardware programs, reflecting Figure's vertically-integrated BotQ manufacturing line, which combines hardware iteration with software deployment in a way reminiscent of early Tesla's vehicle pipeline.
Figure 02 (August 2024) was the stand-in platform for the Helix software launch [2]. Paired with an OpenAI language-partnership (2023–2024), Figure 02 added dialogue capability ahead of Helix's full VLA release. It was piloted at BMW's Spartanburg, South Carolina plant — the first auto-manufacturing humanoid pilot in the United States.
Figure 03 (October 2025) is the full-commercial-scale humanoid and the hardware that Helix 02 runs on [10]. Key specifications:
- Body: 30 degrees of freedom.
- Hands: 5-finger, 20-DoF per hand. Fingertip tactile sensors with 3-gram resolution.
- Vision: 6 main cameras plus 2 palm cameras for visual proprioception.
- Power: 2 kW inductive foot charging (floor-plate contact, no manual plugging).
- Communication: 10 Gbps millimeter-wave uplink to fleet compute.
- Manufacturing: BotQ vertically-integrated factory.
The 3-gram fingertip tactile resolution is the sharpest hardware differentiator. Most production humanoids of the 2024–2026 era have coarse force feedback at the wrist; 3-gram fingertip sensing puts Figure into the tactile regime that Chapter 4 §4.7 flagged as the dexterous-manipulation open frontier. The 2 kW inductive foot charging is the second signal: it is specifically a thermal-and-uptime statement about fleet deployment where cable-plug charging would be the reliability bottleneck. The 10 Gbps mmWave is a fleet-learning premise — Figure expects to move bulk data off-robot continuously, which informs its software architecture.
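For a rough sense of scale, the two headline numbers above can be converted into more familiar units. The sketch below assumes standard gravity for the gram-to-force conversion and a fully saturated link with no protocol overhead for the data ceiling; neither assumption comes from Figure's disclosures, and the function names are illustrative only.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2, assumed for the gram-force conversion


def grams_to_newtons(grams: float) -> float:
    """Convert a mass-equivalent tactile resolution to a force in newtons."""
    return grams / 1000.0 * STANDARD_GRAVITY


def link_terabytes_per_hour(gigabits_per_second: float) -> float:
    """Upper bound on data moved per hour, assuming a saturated link
    and no protocol overhead (an idealization)."""
    bytes_per_second = gigabits_per_second * 1e9 / 8
    return bytes_per_second * 3600 / 1e12


print(f"3 g resolution   ~ {grams_to_newtons(3.0):.4f} N")
print(f"10 Gbps ceiling  ~ {link_terabytes_per_hour(10.0):.1f} TB/h")
```

Under these assumptions the fingertip resolution is on the order of a few hundredths of a newton, and the link ceiling is a few terabytes per hour per robot. Sustained real-world throughput would be lower.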
12.3 Helix and Helix 02 — the System 0/1/2 exemplar
Figure's software stack is the reference architecture Chapter 9 pulled from. Helix (February 2025) [7] introduced the "System 1 + System 2" naming and delivered the first VLA reported to run fully onboard a low-power embedded GPU: System 2 is an onboard internet-pretrained VLM running at 7–9 Hz for scene understanding and language; System 1 is a 200 Hz reactive visuomotor policy mapping all sensors in to all joints out. The "all sensors in, all joints out" framing is Figure's public slogan for its architectural commitment to no intermediate abstractions.
Helix 02 (January/February 2026) [10] added System 0: a 10-million-parameter whole-body controller at 1 kHz, underneath Helix 02's retained 7B VLM S2 and visuomotor S1. The complete stack demonstrates:
- 4-minute continuous task execution (dishwasher loading documented).
- 61 loco-manipulation primitives chained without reset.
- Approximately 4 orders of magnitude dynamic range — millimeter fingertip motion to room-scale locomotion.
The dynamic-range claim is architecturally important: it implies that a single weight set handles the full operating envelope of the System 0/1/2 stack, from micro-fingertip adjustment to cross-room locomotion, without task-specific fine-tuning. If the claim holds at commercial deployment scale, Figure's end-to-end thesis is substantially validated. If it holds only on curated demos, the thesis is substantially weakened. No independent third party has reproduced the claim as of 2026Q1.
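The three update rates and the dynamic-range figure can be made concrete with a small sketch. It is not Figure's scheduler: it simply counts how often each layer would update on a shared 1 kHz tick, taking 8 Hz as an assumed midpoint of the reported 7–9 Hz System 2 rate, and it checks the "4 orders of magnitude" figure using assumed endpoints of 1 mm and 10 m.

```python
import math
from collections import Counter

BASE_HZ = 1000  # System 0 rate from the text, used as the base tick
# S2 is reported as 7-9 Hz; 8 Hz is an assumption chosen for clean division.
RATES_HZ = {"S0": 1000, "S1": 200, "S2": 8}


def count_updates(duration_s: float) -> Counter:
    """Count updates per layer over a window, one base tick at a time."""
    counts = Counter()
    for tick in range(int(duration_s * BASE_HZ)):
        for name, hz in RATES_HZ.items():
            period_ticks = BASE_HZ // hz  # base ticks between updates
            if tick % period_ticks == 0:
                counts[name] += 1
    return counts


def orders_of_magnitude(smallest_m: float, largest_m: float) -> float:
    """Dynamic range between two length scales, in decades."""
    return math.log10(largest_m / smallest_m)


counts = count_updates(1.0)
print(dict(counts))
# Assumed endpoints: 1 mm fingertip motion, 10 m room-scale travel.
print(round(orders_of_magnitude(1e-3, 10.0), 2))
```

In this idealized picture System 0 runs five times per System 1 update and 125 times per System 2 update, so the lower layers must act on the most recent higher-level output between refreshes.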
Logistics expansion is Figure's 2025 commercial push. A Helix tech blog in 2025 [8] documents Figure 02/03 fleet deployment on tote-sorting tasks, explicitly positioning against Agility's GXO deployment (§12.5). Figure's claims emphasize fleet-learning compatibility: shared weights across the fleet, OTA software updates, and continuous improvement from deployed data. Quantitative tote-throughput benchmarks comparable to Agility's GXO numbers have not been published.
Training data economics: Figure's Helix was reportedly trained on approximately 500 hours of supervised teleoperation data — a small number by VLA training standards (AgiBot World is 1M+ trajectories, Open X-Embodiment is 970k episodes). Figure's implicit bet is that data consistency (same hardware, same teleoperator pool, same tasks) matters more than raw scale. Whether this scales to new task classes is an open question that commercial deployment will answer.
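Hours and episode counts are not directly comparable without an episode length, which the sources quoted here do not give. The sketch below shows how the comparison changes under a purely hypothetical mean episode length of 30 seconds; that value is an illustrative assumption, not a reported figure.

```python
def hours_to_episodes(hours: float, mean_episode_s: float) -> float:
    """Convert a duration of demonstration data to an episode count."""
    return hours * 3600.0 / mean_episode_s


HELIX_HOURS = 500.0        # approximate figure reported in the text
OXE_EPISODES = 970_000     # Open X-Embodiment episode count from the text
ASSUMED_EPISODE_S = 30.0   # hypothetical; not disclosed by Figure

helix_episodes = hours_to_episodes(HELIX_HOURS, ASSUMED_EPISODE_S)
ratio = helix_episodes / OXE_EPISODES

print(f"~{helix_episodes:,.0f} episodes at {ASSUMED_EPISODE_S:.0f} s each")
print(f"~{ratio:.1%} of the Open X-Embodiment episode count")
```

Changing the assumed episode length scales the estimate proportionally, so the result should be read as an order-of-magnitude indication only.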
12.4 Agility Robotics — Cassie to Digit lineage
Agility Robotics descends from Jonathan Hurst's Oregon State University lab, which built the Cassie research biped (Chapter 4 §4.5; Chapter 6 §6.5) [1]. Cassie's design philosophy — SLIP-based spring-mass dynamics, centralized torso mass, minimized leg inertia, and leaf springs at the shin-pitch and tarsus-pitch joints — produced a platform that efficiently stores and returns energy at human-class walking speeds. Agility commercialized this lineage into Digit, adapting the research platform for warehouse logistics.
Digit's hardware is notably different from Figure 03 and Electric Atlas:
- Leg morphology: passive springs at shin and tarsus (not ankle, despite shorthand in popular writing). Actuated toe at the seventh joint per leg.
- Torso: reduced-mass design with manipulation-focused arms and grippers (not 5-finger hands).
- Operating envelope: warehouse-scale locomotion + reliable pick-place, not dexterous manipulation.
The architectural commitment is clear: Agility bets that workforce substitution in bounded-task industrial environments is the near-term commercial humanoid opportunity, and that a platform optimized for reliability at walking-plus-reaching beats a general-purpose humanoid at that economic target. The Digit-GXO deployment (§12.5) is the evidence that the bet is paying off.
12.5 Motor Cortex — safety-first whole-body control
Agility's Motor Cortex [9] is the foundation-model-class whole-body controller for Digit, announced in August 2025. Three architectural claims distinguish it from Figure's Helix:
Compact model: under 1 million parameters per Agility's disclosure. This is less than HOVER's 1.5M (Chapter 6 §6.10) and more than three orders of magnitude smaller than the 7B VLM in Figure's Helix stack (7B / 1M ≈ 7,000×). Motor Cortex runs comfortably on edge hardware without the power/thermal envelope Figure 03's 7B VLM demands.
Sim-only training: trained purely in NVIDIA Isaac Sim for "decades of simulated time" over 3–4 days. No real-robot data in the training mix. Zero-shot transfer to hardware. The sim-only claim is architectural discipline — Motor Cortex bets that carefully-constructed simulation and domain randomization (Chapter 5 §5.4, Chapter 7 §7.2) covers the deployment distribution sufficiently that real-robot teleoperation is not a training requirement. Motor Cortex emphasizes manipulation-workspace trajectory coverage specifically — not just AMASS-style mocap but the force-exertion trajectories that warehouse tasks require (Chapter 6 §6.10 discussed this).
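The "decades in days" phrasing implies a large aggregate speedup over real time. The sketch below works that out under assumed values: 20 simulated years, 3.5 wall-clock days, and a hypothetical 4,096 parallel environments. Agility states none of these exact numbers; only "decades" and "3–4 days" come from the disclosure.

```python
DAYS_PER_YEAR = 365.25


def aggregate_speedup(sim_years: float, wall_days: float) -> float:
    """Simulated time divided by wall-clock time, summed over all environments."""
    return sim_years * DAYS_PER_YEAR / wall_days


def per_env_speedup(sim_years: float, wall_days: float, num_envs: int) -> float:
    """Real-time factor each environment needs if the work is split evenly."""
    return aggregate_speedup(sim_years, wall_days) / num_envs


# Assumed values for illustration only.
SIM_YEARS, WALL_DAYS, NUM_ENVS = 20.0, 3.5, 4096

print(f"aggregate: ~{aggregate_speedup(SIM_YEARS, WALL_DAYS):,.0f}x real time")
print(f"per env:   ~{per_env_speedup(SIM_YEARS, WALL_DAYS, NUM_ENVS):.2f}x real time")
```

The point of the split is that a speedup in the thousands does not require any single simulation to run faster than real time; massive parallelism across environments is sufficient.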
Always-on safety layer: Motor Cortex includes explicit safety-monitor integration, described as "always-on." The safety layer is the pattern Chapter 2 §2.3 named as QP-based safety filter: a classical constraint solver that arbitrates on top of the learned whole-body policy. This is what makes Digit qualify for industrial-facility deployment under ISO 10218 / TS 15066 regimes — a certification gate Figure has not yet publicly cleared at scale.
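Agility has not published the formulation of its safety layer, so the following is only a textbook illustration of the QP-safety-filter pattern, not Motor Cortex's implementation. It handles the simplest case: one linear constraint a·u ≤ b, for which the minimum-deviation QP has a closed-form solution (projection onto a half-space). The names `safety_filter`, `u_nominal`, `a`, and `b` are illustrative.

```python
def safety_filter(u_nominal, a, b):
    """Return the command closest to u_nominal (in Euclidean norm)
    that satisfies the single linear constraint a . u <= b.

    Solves  min ||u - u_nominal||^2  s.t.  a . u <= b  in closed form.
    """
    violation = sum(ai * ui for ai, ui in zip(a, u_nominal)) - b
    if violation <= 0.0:
        # The learned policy's command is already safe: pass it through.
        return list(u_nominal)
    # Otherwise move the command the minimum distance back onto the boundary.
    scale = violation / sum(ai * ai for ai in a)
    return [ui - scale * ai for ui, ai in zip(u_nominal, a)]


# A nominal command that violates u0 + u1 <= 2 is pulled back to the boundary.
print(safety_filter([1.0, 2.0], [1.0, 1.0], 2.0))
# A command already inside the constraint is returned unchanged.
print(safety_filter([0.2, 0.3], [1.0, 1.0], 2.0))
```

A deployed filter would carry many simultaneous constraints (joint limits, contact forces, keep-out zones) and would need a numerical QP solver rather than a closed form, but the arbitration logic is the same: the learned command is altered only when it would leave the safe set.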
The operational evidence: 100,000+ totes moved at GXO's Flowery Branch (Georgia) facility by November 2025 [3, 9]. The multi-year Robots-as-a-Service (RaaS) agreement signed in June 2024 became the first industry-verifiable multi-year humanoid commercial deployment with over 12 months of continuous operation. The 100k-tote number is a deployment metric, not a learning metric (Chapter 3 §3.7 Gap 3 noted this), but it is the strongest public evidence that RaaS humanoid economics can sustain industrial operations. Figure's 2025 logistics demos (§12.3) position against this evidence but have not published comparable production-scale numbers.
Motor Cortex additionally enables a 4-hour battery (with autonomous swap) and integration with Agility's "Arc" cloud platform for fleet management. Arc is less publicly documented than Motor Cortex itself but is the commercial orchestration layer that lets a multi-robot fleet coordinate task allocation, telemetry, and software updates.
12.6 Figure vs Agility — the architectural contrast
A single table frames the trade-off:
| Dimension | Figure (Helix 02 / Figure 03) | Agility (Motor Cortex / Digit) |
|---|---|---|
| Body DoF | 30 body + 20 per hand (~70 total) | fewer, manipulation-focused |
| Actuation | QDD-class | QDD + passive springs (leg) |
| Hands | 5-finger 20-DoF, 3-g tactile | grippers, coarser tactile |
| System 0 | 10M learned, 1 kHz | classical safety-filter QP + learned |
| System 1 | visuomotor, 200 Hz | Motor Cortex, <1M params |
| System 2 | 7B VLM onboard | external task-level, less integrated |
| Training data | 500 h teleoperation | sim-only (Isaac Sim, 3–4 days) |
| Deployment (2026Q1) | BMW Spartanburg pilot, logistics demos | GXO 100k+ totes, >12 months continuous |
| Commercial metric | qualitative demos | quantitative production numbers |
| Architectural bet | end-to-end learned, dexterous | safety-first, industrial-narrow |
Four observations:
Hardware-DoF contrast. Figure 03's 30-DoF body plus 20-DoF-per-hand totals ~70 DoF. Digit has fewer. The difference is a design choice, not a capability gap: Figure is buying envelope (what the robot can physically do) at the cost of training complexity; Agility is bounding the envelope to what industrial tasks actually need.
System 0 architectural contrast. Figure's learned 10M-param S0 is a bet that end-to-end learning delivers tighter joint-level coordination than a classical QP can. Agility's classical QP-based safety filter is a bet that provable correctness at the lowest layer is worth the expressivity sacrifice. Chapter 11 noted Boston Dynamics occupies the hybrid middle; Figure sits at the all-learned end, Agility at the safety-first-classical end.
Training-data philosophy contrast. Figure's 500 h of high-quality teleoperation is a consistency-over-scale bet. Agility's pure-sim training is a coverage-over-collection bet. Both are defensible given current evidence; they answer different questions about how to spend engineering budget.
Commercial metric contrast. Agility has 100k+ totes in production; Figure has logistics demos and a BMW pilot. The GXO deployment is the most significant commercial evidence any humanoid company has presented in 2024–2026. Figure's deployment evidence is thinner but its architectural claims are more ambitious.
Which bet wins is not yet settled. The 2026–2028 commercial data will differentiate: if Figure 03's dexterous manipulation translates to tasks Digit cannot do economically, Figure's thesis wins; if Digit's industrial-narrow deployment keeps scaling while Figure's BMW/logistics pilots remain pilots, Agility's thesis wins; if both thrive at different parts of the market, the architectural debate is resolved into coexistence rather than dominance.
12.7 Tesla Optimus — what is public, what is not
Tesla announced Optimus at its 2021 AI Day, followed by Gen 1 demos in 2022, Gen 2 in December 2023 [4], and the Gen 3 mass-production announcement in January 2026. The public engineering disclosure is substantially thinner than that of Figure, Agility, or AgiBot (Chapter 13). The record requires careful separation of what is documented from what is inferred.
Optimus Gen 2 (December 2023 / 2024 updates) [4]:
- Height approximately 180 cm, mass approximately 55 kg (10 kg lighter than Gen 1).
- Walking speed ~8 km/h (claimed ~30% improvement over Gen 1).
- 28 body actuators (14 rotary + 14 linear planetary roller-screw, per Tesla disclosures). Planetary roller-screw linear actuators are Tesla-developed, distinct from QDD-class rotational actuators.
- 11-DoF tactile hands.
- 2.3 kWh battery.
Optimus Gen 3 (2026-01-21 mass-production announcement):
- 22-DoF hands with approximately 25 actuators per side relocated into the forearm using a tendon-driven architecture (4.5× hand-actuator count increase over Gen 2).
- 5G + WiFi fleet learning claimed.
- Mass production at Fremont, California starting January 21, 2026.
The specifications above are what is documented. What is not documented:
- No peer-reviewed engineering paper on actuator design, control architecture, or software stack.
- No public benchmarks on task success rates, manipulation accuracy, or deployment uptime at anything approximating the AgiBot World / GXO-Digit level.
- Software stack is undisclosed. Whether Optimus runs a learned VLA, a hybrid MPC+RL stack, a teleoperation-heavy architecture, or something else is not stated in any public Tesla disclosure.
- Planetary roller-screw linear actuators are a genuinely distinct hardware choice (linear rather than rotational), but no failure-mode analysis, efficiency numbers, or backdrivability metrics are public.
- Production-scale claims (mass production from January 2026) are large but not independently verified. News outlets including Reuters and Bloomberg have covered them, but direct factory-floor validation is not available.
The responsible framing: Tesla's Optimus is a plausible near-term entrant at production scale based on its manufacturing capacity and linear-actuator choice. The capabilities at deployment time are inferable by analogy with other frontier humanoids (Figure, Agility, AgiBot) but not verified by Tesla's public disclosures. Chapter 15 and Chapter 16 return to Optimus as an information-asymmetric case that shapes public discourse disproportionately to its measurable technical specification.
A note on hand design: Optimus Gen 3's 22-DoF hands, with approximately 25 actuators per side relocated into a tendon-driven forearm, are among the more ambitious hand specifications on any frontier humanoid. Chapter 15 discusses hand design as one of the 2026 differentiation frontiers. Whether Tesla's hand architecture translates to grasp-success rates that match or exceed Figure 03's 20-DoF 3-gram-tactile design is a specific question that awaits deployment data.
12.8 Shorter profiles — 1X and Sanctuary AI
1X Technologies (Norwegian-US, founded 2014 as Halodi Robotics) [5] is the home-robot-focused entrant. Neo Gamma (2024) uses a 3D-knitted exterior and tendon-driven actuation to prioritize quiet, soft, human-compatible operation. NVIDIA's public GR00T N1 announcements name 1X as a partner humanoid platform alongside Fourier GR-1 and Unitree H1, although GR00T N1's published real-world evaluations were carried out on the Fourier GR-1 specifically (Chapter 10 §10.2). 1X's consumer-beta program was announced in 2025; public signals of scale and commercial validation remain limited. OpenAI is an investor. 1X's architectural bet is that home-environment data accumulated through consumer pilots will compound into a humanoid VLA that other companies cannot replicate from industrial-only data.
Sanctuary AI (Canadian, founded 2018) [6] specializes in hydraulic hand design — Phoenix Gen 7 (2024) reports 20-DoF hands with >70 total degrees of freedom, targeting dexterous manipulation through teleoperation-heavy pipelines. Sanctuary's "Carbon" foundation model for cognitive control claims 1000+ tasks. Public benchmark numbers are limited. Sanctuary's architectural bet is the opposite of Motor Cortex's compact-model thesis: invest heavily in hand hardware and VLM-class cognitive models, and let dexterous-manipulation data pay back the complexity cost.
Both 1X and Sanctuary are worth watching but currently lack the deployment evidence or architectural disclosure depth of Figure, Agility, and Tesla. Their role in the Part IV narrative is as option value: entrants whose architectural choices could matter if either proves out in 2026–2028.
12.9 Open questions
Part IV's first two chapters (Ch11 BD, Ch12 Figure/Agility/Tesla) frame three chapter-level open questions:
First, which architectural philosophy produces the first commercially sustainable humanoid fleet at 10k+ units? Boston Dynamics's hybrid approach, Figure's end-to-end learning, Agility's safety-first narrow-industrial, and Tesla's mass-production-linear-actuator approach each make different bets on what "first" means. The 2026–2028 commercial numbers will differentiate.
Second, does hand dexterity actually differentiate commercial outcomes? Figure 03's 3-g tactile fingertips, Optimus Gen 3's 22-DoF hands, and Sanctuary's 20-DoF hydraulic hands are all hand-intensive bets. Agility's gripper-based approach is the deliberate non-hand-bet. If dexterous hand tasks command a price premium that sustains the additional complexity, hand-intensive bets win; if industrial work pays the same with simpler end-effectors, gripper simplicity wins.
Third, is Tesla Optimus the "iPhone 2007" of humanoids or a well-marketed Nokia? The analogy is fraught, but the question is substantive: Tesla's manufacturing scale is unmatched among humanoid entrants, and Tesla's deployment to its own factories at Gen 3 scale will produce the largest production-data flywheel of any humanoid company. Whether the underlying software stack competes technically with Helix/Motor Cortex/GR00T is entirely unclear.
Chapter 13 turns to China's humanoid leaders, Unitree and AgiBot, whose architectural choices and commercial positions differ sharply from the US trio Chapters 11 and 12 have analyzed.
References
[1] Hurst, J. W. (2019). Cassie bipedal robot and the ATRIAS lineage. Agility Robotics / Oregon State University.
[2] Figure AI. (2024). Figure 02 humanoid with OpenAI partnership. Figure AI announcement, August 2024.
[3] Agility Robotics & GXO. (2024). Digit × GXO: First commercial humanoid Robots-as-a-Service deployment. Agility/GXO press releases, June 2024 onward.
[4] Tesla. (2024). Optimus Gen 2 / Gen 3 engineering disclosures. Tesla press demos, December 2023 and subsequent AI Day events.
[5] 1X Technologies. (2024). Neo Gamma humanoid and NVIDIA GR00T integration. 1X public demos, 2024–2025.
[6] Sanctuary AI. (2024). Phoenix Gen 7 humanoid with hydraulic hands and Carbon AI. Sanctuary AI product release, 2024.
[7] Figure AI. (2025). Helix: A vision-language-action model for generalist humanoid control. Figure AI tech blog, February 2025.
[8] Figure AI. (2025). Helix accelerating real-world logistics. Figure AI tech blog, 2025.
[9] Agility Robotics. (2025). Motor Cortex: Whole-body control foundation model for Digit. Agility Robotics announcement, August 2025.
[10] Figure AI. (2026). Figure 03 + Helix 02: General-purpose humanoid system. Figure AI announcement, January/February 2026.