Part II: The Four Catalysts

Chapter 4: Hardware: QDD Actuators

Written: 2026-04-24 Last updated: 2026-04-24

4.1 Why hardware first

An MIT Cheetah doing a backflip is a hardware statement, not a control statement. The policy that commanded the backflip was a hand-tuned model-predictive-control trajectory, not a reinforcement-learning triumph. What the demo proved is that a small quadruped with the right actuators could execute commanded torques accurately enough, quickly enough, and honestly enough that a model-based controller could realize an aerial flip without the robot destroying itself on landing. The hardware cashed the control check.

This chapter argues that the Quasi-Direct-Drive (QDD) actuator is the substrate underneath every other catalyst of Parts II and III. Without QDD, a policy trained in simulation lands on hardware that does not execute its commanded joint torques; the PD-tracking assumption inside every modern System 0 fails, and the sim-to-real gap is structurally uncloseable. GPU simulation, teacher-student RL, and sim-to-real correction are all necessary. None of them can compensate for an actuator that refuses to be driven backwards.

The chapter proceeds through five claims. First (§4.2), it defines QDD relative to its alternatives (harmonic-drive servos, series-elastic actuators, and direct drive). Second (§4.3), it traces the MIT Cheetah lineage from Seok et al. 2013 to Wensing et al. 2017, the canonical references that establish the design principles. Third (§4.4), it explains the Impact Mitigation Factor (IMF) as the metric that made QDD's virtues quantifiable. Fourth (§4.5), it surveys the 2019–2026 diffusion of the QDD template through Mini Cheetah, Unitree, Berkeley Humanoid, ToddlerBot, Fourier GR-1, and Boston Dynamics's Electric Atlas, with a compare-and-contrast on the Cassie/Digit SEA-plus-springs lineage that chose a different path. Fifth (§4.6), it connects QDD to the RL substrate: what exactly the policy-hardware compatibility relationship is, and why RL on legacy humanoids was unproductive before QDD commoditized. The chapter closes with the Catalyst 1 verdict (§4.7): QDD is solved and commoditized; what remains open is thermal headroom at industrial duty cycle, tactile integration at the hand, and custom integration at industrial scale.

4.2 What QDD is, and what it replaced

Three actuator families dominated legged-robot design before 2017: harmonic-drive servos, series-elastic actuators (SEAs), and direct-drive electric motors.

Harmonic-drive servos — a high-gear-ratio cycloidal or harmonic drive stacked on a high-torque-density BLDC motor — are the workhorse of industrial robotics. Gear ratios of 100:1 to 200:1 are typical. The benefits are high torque density at the output, holding torque without current, and mechanical stiffness. The costs are low backdrivability (external force must overcome the gearbox's high reflected inertia before moving the motor) and low output-side torque-sensing bandwidth (a load-side torque sensor or strain gauge is needed; sensor noise and sensor bandwidth bound the force-control loop). For a humanoid operating in open contact, these costs are fatal: impact transients the motor cannot absorb are converted into gearbox stress, and the force-control bandwidth is insufficient to implement high-fidelity whole-body control.

Series-elastic actuators (SEAs) were Gill Pratt's and Matthew Williamson's 1995 response to the force-control problem. An elastic element — typically a torsion spring — is inserted between the motor output and the load. The spring's deflection, measured by an encoder on each side, becomes a direct torque reading ^[2]. SEAs offer excellent passive compliance (the spring absorbs impact), moderate force-control bandwidth (typically 20–60 Hz, bounded by the spring's natural frequency), and reliable operation. The canonical SEA humanoid deployment is the Cassie / Digit lineage ^[7], where Jonathan Hurst's Oregon State lab and then Agility Robotics built on passive springs at the ankle together with motor-driven knees and hips. The SEA bet is to accept the force-control bandwidth ceiling and gain passive-dynamic efficiency from the spring.
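The SEA sensing principle and its bandwidth ceiling can be sketched in a few lines. This is a minimal illustration, assuming an ideal linear torsion spring and a single lumped inertia; the stiffness, deflection, and inertia values are hypothetical and are not taken from any platform discussed here.

```python
import math

def sea_torque(k_spring, theta_motor, theta_load):
    """Torque carried by the series spring (N*m), from Hooke's law.

    theta_motor and theta_load are the two encoder readings (rad) on
    either side of the spring; k_spring is in N*m/rad.
    """
    return k_spring * (theta_motor - theta_load)

def spring_natural_frequency_hz(k_spring, inertia):
    """Undamped natural frequency (Hz) of a lumped inertia on the spring."""
    return math.sqrt(k_spring / inertia) / (2.0 * math.pi)

# Hypothetical values chosen only for illustration.
K = 300.0        # N*m/rad
INERTIA = 0.005  # kg*m^2

print(sea_torque(K, theta_motor=0.10, theta_load=0.05))  # spring torque, N*m
print(spring_natural_frequency_hz(K, INERTIA))           # resonance, Hz
```

With these illustrative numbers the resonance lands inside the 20–60 Hz band quoted above. A softer spring gives a cleaner torque reading per unit deflection but pulls the resonance, and with it the usable force-control bandwidth, lower.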

Direct-drive motors — no gearbox, motor output coupled straight to the load — offer perfect backdrivability and the highest possible force-control bandwidth (limited only by motor electromagnetics, typically 300+ Hz). The cost is torque: without gear amplification, even the most torque-dense BLDC cannot produce the torques needed to support a humanoid's weight at reasonable current draw. Direct-drive is the right choice for drone propellers and desktop haptic devices; it is structurally wrong for legs.

Quasi-direct-drive (QDD) threads between these three extremes. A custom, high-gap-radius, outer-rotor BLDC motor is paired with a low-ratio planetary gear (typically 5:1 to 10:1; the MIT Cheetah's ratio is 5.8:1 via a single-stage planetary ^[3]). The low gear ratio keeps backdrivability near direct-drive levels: the rotor inertia reflected to the joint scales with the square of the gear ratio, so an external force has to overcome only a ~35× multiplier (5.8² ≈ 34) rather than a harmonic drive's ~10,000× (100²). The high-gap-radius motor provides the peak torque that direct drive cannot. The resulting actuator is backdrivable, has force-control bandwidth above 100 Hz, does not need a load-side torque sensor (motor current, measured on the stator side, is a clean torque proxy when the gear ratio is low), and achieves torque density roughly 2.8× typical industrial servos ^[3].
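The ~35× versus ~10,000× comparison follows from the standard result that rotor inertia seen at the joint grows with the square of the gear ratio. A minimal sketch, assuming an ideal lossless gearbox; the rotor inertia value is hypothetical and serves only to put the two ratios side by side.

```python
def reflected_inertia(rotor_inertia, gear_ratio):
    """Rotor inertia as felt at the joint output (kg*m^2).

    For an ideal gearbox the output-side inertia is the rotor inertia
    multiplied by the square of the gear ratio.
    """
    return rotor_inertia * gear_ratio ** 2

ROTOR_INERTIA = 1e-4  # kg*m^2, hypothetical value for illustration

for name, ratio in (("QDD, 5.8:1", 5.8), ("harmonic drive, 100:1", 100.0)):
    multiplier = ratio ** 2
    print(f"{name}: x{multiplier:.1f} -> "
          f"{reflected_inertia(ROTOR_INERTIA, ratio):.4f} kg*m^2")
```

The same rotor feels roughly 300 times heavier to the environment behind a 100:1 reduction than behind a 5.8:1 reduction, which is the quantitative content of "refuses to be driven backwards".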

The simplest way to understand QDD is as the actuator family that preserves direct-drive's honesty — joint torque commands faithfully realized in joint motion — while buying back enough torque to support a legged robot. That honesty is what every downstream catalyst of Parts II–III depends on.

4.3 The MIT Cheetah lineage

The QDD design principle traces to Sangbae Kim's MIT Biomimetic Robotics Lab. Seok et al.'s 2013 ICRA paper ^[1] is the origin: it documents a custom outer-rotor BLDC paired with a coaxial planetary gear, analyzes the low-impedance actuation as a cost-of-transport (COT) design target, and reports a COT of 0.5 on the first-generation MIT Cheetah, close to the biological cheetah and roughly an order of magnitude better than contemporary legged robots of similar scale. The 2013 paper establishes the thesis; Wensing et al.'s 2017 paper in IEEE Transactions on Robotics ^[3] formalizes it.

Wensing 2017 is the canonical reference for QDD design. Its three formal contributions: (1) a design framework for proprioceptive actuators that relates gap radius, gear ratio, and torque-bandwidth product; (2) a demonstration that motor current, at low gear ratios, is a torque estimator faithful enough to replace a load-side torque sensor; and (3) the introduction of the Impact Mitigation Factor (IMF) as a normalized metric for how well an actuator absorbs collision transients. The validation on MIT Cheetah reports contact durations around 85 ms, peak foot forces above 450 N, and torque density approximately 2.8× typical industrial servos ^[3]. The paper's design principles have since been reproduced across more than a dozen commercial and academic platforms.

The lineage from Cheetah 1 (2013) through Cheetah 2, Cheetah 3, and ultimately Mini Cheetah (2019) is worth tracing because it maps onto a scaling-down exercise. Cheetah 1 was approximately 30 kg, targeted outdoor efficiency, and used the first-generation QDD actuators. Cheetah 3 pushed the envelope with improved MPC (Di Carlo et al. 2018) but retained the basic actuator template. Mini Cheetah ^[6] was the pedagogically decisive platform: 9 kg, 12 modular QDD actuators at approximately 17 Nm peak torque, 250 Hz control bandwidth, MPC at 30 Hz with 1 kHz joint-level PD tracking, speeds up to 2.5 m/s, and, famously, the first aerial backflip at this size class. The modular, low-cost template Mini Cheetah established is the one that Unitree's A1, Go1, Go2, and eventually G1 mirrored; it is also the template that every subsequent QDD humanoid platform inherited.

4.4 IMF and the vocabulary of backdrivability

Impact Mitigation Factor (IMF) is the metric that made QDD's design advantages quantifiable. Roughly, IMF is the fraction of an impact's impulse that the actuator's backdrivable dynamics remove, measured against the same leg with its joints locked rigid: an IMF near 1 means the leg yields and almost none of the collision reaches the drivetrain, while an IMF near 0 means the leg behaves like a rigid body and the collision is transmitted into the gearbox and motor ^[5]. The MIT Cheetah's QDD design achieves IMF approximately 0.75; high-geared industrial servos are typically below 0.3 ^[5]. The difference is not cosmetic. An IMF of 0.75 means the gearbox sees roughly a quarter of the locked-joint impact; an IMF of 0.3 means it sees about 70% of it, the full brunt minus a modest reflected-inertia absorption.
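The arithmetic behind the 0.75 and 0.3 figures can be made explicit. The sketch below is a scalar simplification and not the full formulation in ^[3]: it assumes IMF along one impact direction can be read as one minus the ratio of the leg's effective mass with joints free to its effective mass with joints locked, so that the un-mitigated share of the impact is simply 1 − IMF. The function names and the effective-mass numbers are hypothetical.

```python
def imf_scalar(effective_mass_free, effective_mass_locked):
    """Scalar IMF along one impact direction (assumed simplification).

    0 -> joints behave as if locked (no mitigation);
    1 -> perfectly backdrivable (impact fully mitigated).
    """
    return 1.0 - effective_mass_free / effective_mass_locked

def residual_impact_fraction(imf):
    """Share of the locked-joint impact that still reaches the drivetrain."""
    return 1.0 - imf

# Reference values quoted in the text.
for label, imf in (("MIT Cheetah QDD", 0.75), ("high-geared servo", 0.30)):
    print(f"{label}: IMF={imf:.2f} -> "
          f"{residual_impact_fraction(imf):.0%} of the impact remains")

# Hypothetical effective masses (kg) that would yield an IMF of 0.75.
print(imf_scalar(effective_mass_free=0.5, effective_mass_locked=2.0))
```

Read this way, moving from 0.3 to 0.75 cuts the load the gearbox must survive on every footfall by nearly a factor of three.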

Related metrics include Cost of Bandwidth per Ampere (CBA), the force-control bandwidth achieved per unit electrical power drawn, and Cost of Transport (COT), the energy expended per unit weight per unit distance travelled (a dimensionless number, which is why the Cheetah's 0.5 carries no units). QDD's design space pushes both: the low gear ratio keeps CBA high (force-control bandwidth is not bottlenecked by high reflected inertia); the low impedance and high efficiency keep COT low (less energy is wasted overcoming the actuator).
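Cost of Transport is easiest to read as a worked number. The sketch below uses the common dimensionless form, power divided by weight times speed. The 30 kg mass echoes the approximate Cheetah 1 figure from §4.3; the 2 m/s speed and the resulting power are hypothetical inputs chosen for illustration, not measurements from ^[1].

```python
G = 9.81  # gravitational acceleration, m/s^2

def cost_of_transport(power_w, mass_kg, speed_m_s):
    """Dimensionless COT = P / (m * g * v)."""
    return power_w / (mass_kg * G * speed_m_s)

def power_for_cot(cot, mass_kg, speed_m_s):
    """Average power (W) implied by a given COT at a given mass and speed."""
    return cot * mass_kg * G * speed_m_s

# Hypothetical operating point: 30 kg robot moving at 2 m/s.
print(power_for_cot(0.5, mass_kg=30.0, speed_m_s=2.0))   # watts at COT 0.5
print(power_for_cot(5.0, mass_kg=30.0, speed_m_s=2.0))   # watts at 10x the COT
print(cost_of_transport(294.3, mass_kg=30.0, speed_m_s=2.0))
```

An order-of-magnitude difference in COT is an order-of-magnitude difference in battery drain at the same speed, which is why the metric doubles as a range and thermal budget.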

The vocabulary matters because it provides the cross-platform comparison basis that the 2019–2026 QDD diffusion rests on. When a new platform — Unitree G1, Berkeley Humanoid, ToddlerBot — publishes IMF, peak torque, and torque-bandwidth numbers, it is publishing numbers that can be compared directly with MIT Cheetah's reference values. The shared vocabulary is what made the 2019–2026 period a convergence rather than a free-for-all of competing actuator designs.

Figure 4.1: QDD actuator design space. Torque-control bandwidth versus gear ratio (log scale), with three labeled category regions and three representative data points: MIT Cheetah / QDD family ^[3] at 5.8:1 and > 100 Hz (IMF ≈ 0.75); pre-QDD SEA (Cassie/Digit leaf-spring, Pratt–Williamson lineage) at ≈ 30:1 and ≈ 40 Hz; industrial harmonic drive at ≈ 160:1 and ≈ 15 Hz (IMF < 0.3). Illustration by author (Gemini-assisted reconstruction), adapted from the design-framework discussion in ^[3] and the numerical specs in §4.3–4.4.

4.5 The 2019–2026 diffusion

After Mini Cheetah 2019, the QDD template diffused rapidly through both commercial and academic legged-robot platforms. Five case studies illustrate the convergence; two counter-paths follow them.

Unitree. Unitree Robotics commercialized the Mini Cheetah lineage for the quadruped market (Go1, Go2) and extended it to bipedal humanoids with H1 (2023) and G1 (2024) ^[9]. The G1 is the decisive platform of this era. At 127 cm tall, approximately 35 kg, with 23–43 DoF (base versus EDU variants), 2 m/s walking speed, knee torque 90 Nm (120 Nm in the EDU variant), and a 2 h battery, the G1 ships at US$16,000, roughly 5–10× cheaper than comparable humanoids from other vendors. This price is possible because Unitree controls its quadruped-derived QDD manufacturing, runs it at scale, and declines to vertically integrate the upper layers of the stack, where the margin collapses. Unitree's companion unitree_rl_gym repository ^[9] codifies a Train → Play → Sim2Sim → Sim2Real workflow that became the de facto reference for RL on G1, H1, H1-2, and Go2. Within twelve months of G1's release, more than one hundred academic papers cited G1 as their hardware platform, including the ASAP sim-to-real work (Chapter 7) and much of the humanoid whole-body RL research discussed in Chapter 6.

Berkeley Humanoid. Liao et al.'s 2024 Berkeley Humanoid ^[12] is the open-hardware academic counterpart to Unitree G1. A 0.85 m / 16 kg platform with 6 DoF per leg and custom QDD modules (hip 62.6 Nm, knee 81.1 Nm) in the MIT Cheetah lineage, it demonstrates zero-shot sim-to-real walking and outdoor traversal over hundreds of meters. The paper reports reaching 1 m/s from rest within 1 s. Berkeley Humanoid is a research platform rather than a commercial product; cost is not documented in the paper, and its role is to provide full hardware control to labs that cannot rely on Unitree firmware.

ToddlerBot. Shi et al.'s 2025 ToddlerBot ^[14] pushes the low-cost envelope further: a 3D-printable humanoid platform with a bill of materials under US$6,000. The paper demonstrates full-body RL walking plus pick-and-place, zero-shot from simulation, and a two-robot collaborative toy-cleanup scenario. ToddlerBot's architectural contribution is less the actuator (its QDD modules are conservative commodity choices) and more the infrastructure: plug-and-play zero-point calibration, transferable motor system identification, and a high-fidelity digital twin that allows sim-to-real without per-unit tuning.

Fourier GR-1. Fourier Intelligence's GR-1 (2023) and GR-2 (2024) ^[13] are a Chinese academic-research platform at the 1.65 m full-size scale, with 40+ DoF and (in GR-2) bimanual dexterous hands. GR-1 is notable for its adoption as one of three reference embodiments for NVIDIA's GR00T N1 training (Chapter 10): GR-1, Unitree H1, and 1X Neo together form the training set for cross-embodiment VLA work.

Electric Atlas. Boston Dynamics's Electric Atlas ^[11], unveiled the day after the hydraulic Atlas retirement on April 16, 2024, is the frontier-scale instantiation of QDD-style design. Electric Atlas reports 56 DoF (hyperarticulation beyond human limits at hips, waist, and neck), 1.5 m height, 89 kg mass, 2.3 m reach, 50 kg payload, a 4 h battery with autonomous swap, 85–90% electrical-to-mechanical efficiency, and foot-placement precision under 10 cm. The actuators are custom; Hyundai Mobis is Boston Dynamics's supplier. Electric Atlas is not a QDD platform in the narrow Mini Cheetah sense, since its actuators are more industrial-grade, with higher gear ratios than the classic QDD definition, but it inherits the design philosophy: high-bandwidth proprioceptive torque control as the substrate for learned policies. Chapter 11 analyzes Electric Atlas's strategic position in detail.

The SEA counter-path: Cassie / Digit. Agility Robotics's Digit — the commercial descendant of Oregon State's ATRIAS and Cassie research bipeds ^[7] — chose a different actuator architecture. Cassie and Digit use a combination of motor-driven hips and knees with passive springs at the ankle. The intellectual parent is spring-loaded inverted pendulum (SLIP) theory: minimize leg inertia, centralize mass in the torso, and let the ankle's passive spring do the work of impact absorption and energy storage. The Cassie/Digit bet is that passive dynamics, correctly exploited, are cheaper and more efficient than trying to make QDD actuators absorb impact actively. The bet has shipped — Digit's GXO deployment is the largest commercial humanoid fleet of 2025–2026 (Chapter 12) — but it also has limits: the passive springs constrain the robot's dynamic range (you cannot arbitrarily command ankle torques that the spring cannot produce), and the embodiment is specialized for locomotion in ways that make loco-manipulation integration more architecturally constrained.

The Korean variant: AMBIDEX. A less-covered counter-path deserves explicit mention. NAVER LABS's AMBIDEX ^[4] is a cable-driven dual-arm manipulator developed in collaboration with Korea Tech. The cable-drive architecture relocates motors from the joints to the torso, dramatically reducing arm inertia, and uses cables for force transmission. AMBIDEX is a compliance-first design philosophy — human-safe force transfer and low inertia are prioritized over backdrivability-via-low-gear. It is neither QDD nor SEA; it is a third architectural path that Chapter 14 returns to in detail. Its inclusion here is a reminder that QDD's dominance in 2019–2026 is an empirical outcome, not a theoretical necessity, and that alternative actuator philosophies remain viable in specific niches.

4.6 Why QDD is the RL substrate

The claim that QDD is the substrate for learned-policy humanoids deserves specific articulation. The mechanism is the following. A policy trained in simulation to emit desired joint positions (or joint torques) makes a specific implicit assumption: the real hardware will track those commands with a fidelity close to the simulator's idealization. Every modern System 0 — whether a classical PD controller or a learned 10M-parameter neural network — is a function from desired joint state to realized joint state, and the gap between "desired" and "realized" is where sim-to-real pays out or doesn't.
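The "desired to realized" mapping can be made concrete with the joint-level PD law that sits under most of these policies. This is a minimal sketch, assuming a single joint modeled as a pure inertia driven by an ideal torque source; the gains, inertia, and 1 kHz step are hypothetical illustration values, not parameters of any platform named in this chapter.

```python
def pd_torque(q_des, q, qd_des, qd, kp, kd):
    """Joint-level PD law: torque from position and velocity error."""
    return kp * (q_des - q) + kd * (qd_des - qd)

def simulate_joint(q_des, kp=20.0, kd=0.5, inertia=0.01,
                   dt=0.001, steps=1000):
    """Track a constant position target for steps * dt seconds.

    The actuator is assumed ideal: commanded torque equals realized
    torque. Integration is semi-implicit Euler at a 1 kHz loop rate.
    """
    q, qd = 0.0, 0.0
    for _ in range(steps):
        tau = pd_torque(q_des, q, 0.0, qd, kp, kd)
        qdd = tau / inertia   # ideal torque source acting on a pure inertia
        qd += qdd * dt
        q += qd * dt
    return q

print(simulate_joint(q_des=0.5))  # settles close to the 0.5 rad target
```

Every term the simulator omits here (friction, torque saturation, gearbox compliance, latency) is a term the real actuator adds back. The smaller those terms are on hardware, the closer the deployed joint behaves to the one the policy was trained against.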

Three QDD properties close this gap.

Backdrivability means that when the environment applies a force the policy did not predict, the motor moves physically rather than destroying itself or producing a large unmodeled torque spike. Learned policies, by construction, have seen only a finite distribution of perturbations during training; deployment will include perturbations outside that distribution. A backdrivable actuator degrades gracefully — it gives, lets the external force dissipate, and lets the policy's state estimator catch up. A high-geared actuator cannot give, and the mismatch between commanded and realized torque grows until either the policy fails or the hardware breaks.

High force-control bandwidth means that the commanded torque changes propagate to the joint in milliseconds rather than tens of milliseconds. Modern RL policies run at 100–200 Hz (Chapter 9's System 1) and emit desired joint positions at that rate; a force-control loop with 30 Hz bandwidth cannot track those commands faithfully, and the policy will observe a lag between its commands and the joint's response — which the history encoder either must learn to compensate for or which will corrupt the policy's adaptation.
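A rough feel for "milliseconds rather than tens of milliseconds" comes from treating the torque loop as a first-order low-pass filter. That model is an assumption made here for illustration (real torque loops are higher order), and the three-time-constant settling rule is a conventional approximation for reaching about 95% of a step.

```python
import math

def settling_time_s(bandwidth_hz, n_time_constants=3.0):
    """Approximate ~95% step settling time of a first-order loop.

    The time constant of a first-order low-pass with -3 dB bandwidth
    f_b is 1 / (2 * pi * f_b).
    """
    tau = 1.0 / (2.0 * math.pi * bandwidth_hz)
    return n_time_constants * tau

POLICY_RATE_HZ = 200.0               # upper end of the range in the text
policy_period_s = 1.0 / POLICY_RATE_HZ

for bandwidth_hz in (30.0, 100.0):
    settle = settling_time_s(bandwidth_hz)
    print(f"{bandwidth_hz:5.0f} Hz loop: settles in {settle * 1e3:.1f} ms "
          f"= {settle / policy_period_s:.1f} policy steps")
```

Under these assumptions a 100 Hz loop settles within one 200 Hz policy step, while a 30 Hz loop is still responding about three steps later, so the policy is always acting on torques it commanded several observations ago.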

Proprioceptive force sensing via motor current means the force-control loop is closed without load-side sensors. Sensors add mass, cost, failure modes, and latency. A 200 Hz policy does not want a torque sensor limited to roughly 50 Hz sitting inside its inner loop; it wants the current signal, which is available at motor-driver rates (often >10 kHz) with essentially no latency.
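The current-as-torque-proxy idea reduces to one multiplication. The sketch below assumes the usual linear motor model (torque proportional to torque-producing current) and lumps all drivetrain losses into a single efficiency factor; the torque constant and current are hypothetical, and only the 5.8:1 ratio comes from the text.

```python
def joint_torque_from_current(i_q, kt, gear_ratio, efficiency=1.0):
    """Estimate output joint torque (N*m) from motor current.

    i_q        : torque-producing motor current (A)
    kt         : motor torque constant (N*m/A)
    gear_ratio : reduction ratio between rotor and joint
    efficiency : lumped drivetrain efficiency, 1.0 = lossless (assumed)
    """
    return i_q * kt * gear_ratio * efficiency

# Hypothetical motor: kt = 0.1 N*m/A, behind the Cheetah's 5.8:1 reduction.
print(joint_torque_from_current(i_q=10.0, kt=0.1, gear_ratio=5.8))
print(joint_torque_from_current(i_q=10.0, kt=0.1, gear_ratio=5.8,
                                efficiency=0.9))
```

The estimate is only as good as the terms it ignores. Gearbox friction and reflected inertia grow with the gear ratio, which is why the same formula is a clean proxy at 5.8:1 and a poor one at 100:1.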

The conjunction of these three properties is what makes the sim-to-real gap (Chapter 7) closable for locomotion. A learned policy trained against a randomized model of a QDD actuator deployed on actual QDD hardware sees a residual dynamics gap of order a few percent — a gap that domain randomization and (for the last few percent) ASAP-style delta-action correction can close. The same policy deployed on a harmonic-drive servo sees a dynamics gap of order tens of percent, and no amount of DR covers the tail.
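Domain randomization over actuator parameters can be sketched as follows. This is an illustration, not the procedure of any cited paper: the parameter set, the nominal values, and the ±5% spread (standing in for "a few percent") are all hypothetical choices.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class ActuatorParams:
    kp: float            # position gain, N*m/rad
    kd: float            # velocity gain, N*m*s/rad
    torque_scale: float  # realized torque / commanded torque
    delay_s: float       # command-to-torque latency, s

# Hypothetical nominal model of a QDD joint.
NOMINAL = ActuatorParams(kp=20.0, kd=0.5, torque_scale=1.0, delay_s=0.002)

def sample_actuator_params(rng, nominal=NOMINAL, spread=0.05):
    """Draw one randomized actuator model, each field within +/- spread."""
    def jitter(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)
    return ActuatorParams(
        kp=jitter(nominal.kp),
        kd=jitter(nominal.kd),
        torque_scale=jitter(nominal.torque_scale),
        delay_s=jitter(nominal.delay_s),
    )

rng = random.Random(0)  # seeded so the draws are reproducible
for episode in range(3):
    print(episode, sample_actuator_params(rng))
```

Each training episode runs against a different draw, so the policy has to succeed across the whole band rather than at one nominal point. The scheme only works when the real actuator falls inside the band; a gap of tens of percent would need a spread so wide that the policy becomes overly conservative.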

Hwangbo et al.'s 2019 Science Robotics result ^[8] is the first rigorous demonstration of this closure at deployable scale: a quadruped (ANYmal, whose ANYdrive joints are torque-controllable series-elastic units rather than QDD in the narrow sense) trained with an actuator network that captures the actuators' real dynamics, achieving 1.6 m/s on rough terrain. Chapter 6 develops this. For the purposes of Chapter 4, the point is that Hwangbo 2019's success is not just an RL-algorithm achievement; it is an RL-algorithm achievement running on a substrate whose torque response could be modeled and trusted. Change the substrate, and the algorithm fails.

4.7 Verdict and open questions

Catalyst 1 verdict: solved (commodity). The MIT Cheetah design principles ^[3] (outer-rotor BLDC, low-ratio planetary, motor-current torque estimation, Impact Mitigation Factor as metric) are reproduced across Unitree G1 (US$16,000 launch MSRP), Berkeley Humanoid (open-hardware research platform), ToddlerBot (academic, US$6,000 BOM), Fourier GR-1 / GR-2, and at frontier scale in Boston Dynamics's Electric Atlas. The hardware primitive is commoditized. Further innovation happens at the scale-up (Atlas) and scale-down (ToddlerBot) extremes and in specialized integrations (Figure 03's 3-gram fingertip tactile sensor). No open architectural question remains for legged locomotion.

What remains open are three specific frontiers that Chapters 15 and 16 will revisit.

Thermal and current calibration at industrial duty cycle. Research platforms operate for minutes to hours in controlled environments; factory deployment demands sustained duty cycles over shifts, with temperature swings and motor-winding thermal limits that current QDD designs have not been empirically validated against. Figure's wireless inductive foot charging (Chapter 12) is a thermal-management statement disguised as a charging statement.

Tactile integration at the hand. QDD's success at leg scale has not yet transferred to dexterous hand scale. Figure 03's 3-gram fingertip load cell and AgiBot G2's 7-DOF torque-sensing arms (Chapter 13) are the frontier of this integration; neither is commoditized, and the hand-actuator design space in 2026 is roughly where the leg-actuator design space was in 2013–2015 — exploratory, multi-path, yet to converge. Chapter 15's discussion of manipulation data as the first differentiation axis returns to this gap.

Custom integration at industrial scale. Unitree, Berkeley Humanoid, and ToddlerBot represent academically reproducible QDD. Frontier-company hardware — Figure, Agility, Boston Dynamics — involves custom supply chains (Hyundai Mobis for Atlas, in-house for Figure's BotQ) that are not academically reproducible. Chapter 14's discussion of Korean ecosystem positioning notes that this vertical-integration gap is exactly the seam where Korean manufacturing strength could produce a QDD-manufacturing advantage.

4.8 Bridge to Chapter 5

QDD is the substrate. The question Chapter 5 takes up is: given the honest substrate, how do we train policies fast enough to exploit it? The answer is GPU massively parallel simulation, whose 2021 inflection point — Isaac Gym ^[15] and Rudin et al.'s "learning to walk in minutes" ^[16] — converted per-experiment training time from days to minutes and made the domain-randomization programs that QDD hardware deserves actually tractable.

References

[1] Seok, S., Wang, A., & Otten, D. (2013). Design principles for highly efficient quadrupeds and implementation on the MIT Cheetah robot. Proc. IEEE ICRA. doi:10.1109/ICRA.2013.6631038.
[2] Paine, N., Oh, S., & Sentis, L. (2014). Design and control considerations for high-performance series elastic actuators. IEEE/ASME Transactions on Mechatronics. doi:10.1109/TMECH.2013.2264338.
[3] Wensing, P. M., Wang, A., Seok, S., Otten, D., Lang, J., & Kim, S. (2017). Proprioceptive actuator design in the MIT Cheetah: Impact mitigation and high-bandwidth physical interaction for dynamic legged robots. IEEE Transactions on Robotics. doi:10.1109/TRO.2016.2640183.
[4] Kim, H.-S., Kim, Y.-J., & NAVER LABS. (2017). AMBIDEX: Cable-driven dual-arm manipulator from NAVER LABS. NAVER LABS product / Korea Tech research.
[5] Kim Lab, MIT Biomimetic Robotics. (2017). Impact Mitigation Factor (IMF) and MIT Cost-of-Bandwidth-per-Ampere (CBA) for legged design.
[6] Katz, B., Di Carlo, J., & Kim, S. (2019). Mini Cheetah: A platform for pushing the limits of dynamic quadruped control. Proc. IEEE ICRA. doi:10.1109/ICRA.2019.8793865.
[7] Hurst, J. W. (2019). Cassie bipedal robot and the ATRIAS lineage. Agility Robotics / Oregon State.
[8] Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., & Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics. (Discussed in detail in Chapter 6.)
[9] Unitree Robotics. (2024). Unitree G1 humanoid platform and `unitree_rl_gym`. Unitree product release.
[10] Unitree Robotics. (2023–2024). Unitree H1 humanoid platform specifications.
[11] Boston Dynamics. (2024). Electric Atlas: 56-DoF all-electric humanoid. Boston Dynamics announcement, April 2024.
[12] Liao, Q., Zhang, B., & Huang, X. (2024). Berkeley Humanoid: A research platform for learning-based control. arXiv preprint 2407.21781.
[13] Fourier Intelligence. (2024). Fourier GR-1 / GR-2 humanoid platform.
[14] Shi, H., Wang, W., & Song, S. (2025). ToddlerBot: Open-source ML-compatible humanoid platform for loco-manipulation. arXiv preprint 2502.00893.
[15] Makoviychuk, V., et al. (2021). Isaac Gym: High performance GPU-based physics simulation for robot learning. NeurIPS Datasets and Benchmarks. arXiv:2108.10470.
[16] Rudin, N., Hoeller, D., Reist, P., & Hutter, M. (2021). Learning to walk in minutes using massively parallel deep reinforcement learning. Proc. CoRL. arXiv:2109.11978.