Glossary

Key terms in A-Z order. `(Ch N)` marks the chapter where the term is introduced or discussed in depth.

Definitions are kept in sync with the monorepo master at `glossary/master_en.md`.

> **Adding a new term**: grep `glossary/master_en.md` first — if present, copy the definition verbatim
> and append `(Ch N)`; if absent, add to master first, then copy here. See `glossary/README.md` for details.

ADR (Automatic Domain Randomization) — An adaptive extension of domain randomization in which the distribution of simulation parameters is automatically widened as the policy succeeds, and narrowed as it fails. Introduced by OpenAI for the Shadow Hand Rubik's-cube task (2019). (Ch 7)
AMASS — A large archive of motion-capture sequences retargeted to a canonical human skeleton, widely used as a reference motion library for humanoid imitation and motion-prior training. (Ch 6, Ch 8)
ASAP (Aligning Simulation and Real-World Physics) — He et al. 2025 residual delta-action method that reduces sim-to-real tracking error by ~53% on agile whole-body motions from 20 minutes of real-robot data. (Ch 7)
BFM (Behavior Foundation Model) — Yuan et al. 2025 framing of pretrained whole-body humanoid controllers as the robotics analog of language-model foundations. Complementary to the System 0/1/2 architecture. (Ch 9)
Capture point (CP) — The ground-plane point where, if the robot steps, the inverted-pendulum CoM comes to rest over the new foothold. A simple yet robust reactive-stepping target; generalizes to N-step capture regions. (Ch 1, Ch 2)
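For the linear inverted pendulum the instantaneous capture point has a closed form, x_cp = x + ẋ/ω with ω = √(g/z). A minimal sketch (the 0.9 m CoM height is illustrative, not robot-specific):

```python
import math

def capture_point(x, xdot, z=0.9, g=9.81):
    """Instantaneous capture point of a linear inverted pendulum.

    x, xdot : CoM position and velocity along one ground-plane axis (m, m/s)
    z       : constant CoM height (m)
    """
    omega = math.sqrt(g / z)   # LIPM natural frequency
    return x + xdot / omega    # step here to bring the CoM to rest

# CoM over the origin, moving forward at 0.5 m/s, 0.9 m pendulum height:
# the robot must step ~0.15 m ahead of the CoM to stop.
print(round(capture_point(0.0, 0.5), 3))   # -> 0.151
```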
Centroidal dynamics — The reduction of full-body dynamics to the motion of the center of mass and centroidal momentum; the bridge between task-space goals and whole-body torque distribution. (Ch 1, Ch 2)
DDP (Differential Dynamic Programming) — A Newton-style trajectory-optimization solver widely used inside whole-body MPC (e.g., Koenemann 2015 HRP-2). (Ch 2)
DeepMimic — Peng et al. 2018 physics-based character-animation method that tracks reference motions via RL, an ancestor of humanoid motion-imitation pipelines. (Ch 6, Ch 8)
Delta-action (residual action) — A sim-to-real correction technique where a small learned or optimized correction is added on top of a sim-trained base policy; central to ASAP. (Ch 7)
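The composition is simply additive on the base policy's output. A minimal sketch with hypothetical stand-in policies (the gains and the 5% trim are invented for illustration):

```python
import numpy as np

def base_policy(obs):
    # stand-in for a frozen sim-trained base policy (hypothetical linear gains)
    return -1.5 * obs

def delta_policy(obs, base_action):
    # stand-in for the learned residual; here a fixed 5% trim of the base action
    return -0.05 * base_action

obs = np.array([0.1, -0.2])             # e.g. joint-position tracking errors
a_base = base_policy(obs)
a_deployed = a_base + delta_policy(obs, a_base)   # deployed action = base + delta
```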
Diffusion policy — Chi et al. 2023 formulation of visuomotor policies as denoising-diffusion models, now a standard action head for manipulation. (Ch 8, Ch 10)
Domain randomization (DR) — Training a policy across randomized simulation parameters so that reality falls inside the training distribution; the dominant sim-to-real strategy. (Ch 7)
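Operationally this means resampling physics parameters every training episode. A sketch with illustrative ranges (real ranges are tuned per robot and simulator, and these names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical randomization ranges for illustration only.
DR_RANGES = {
    "friction":    (0.4, 1.2),    # ground friction coefficient
    "mass_scale":  (0.8, 1.2),    # multiplier on link masses
    "motor_delay": (0.00, 0.02),  # actuation latency (s)
    "push_force":  (0.0, 50.0),   # magnitude of random external pushes (N)
}

def sample_domain():
    """Draw one simulator configuration; each episode gets a fresh draw."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in DR_RANGES.items()}

episode_params = sample_domain()   # apply to the simulator before the rollout
```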
Embodiment — The specific physical configuration (kinematics, actuation, sensors) of a robot; cross-embodiment transfer is the open frontier of whole-body VLAs. (Ch 10)
ExBody — "Expressive whole-body control" line of work on human-motion imitation for humanoids; part of the 2024–2025 canon. (Ch 6, Ch 10)
FALCON — Force-adaptive humanoid loco-manipulation policy (Zhang et al. 2025). (Ch 6, Ch 10)
FastTD3 — Seo et al. 2025 fast TD3 variant tuned for humanoid control; part of the 2025 algorithmic frontier. (Ch 6, Ch 8)
Flow matching — Lipman et al. 2022 continuous-time generative-modeling framework; used as the action head in π0 VLAs. (Ch 8, Ch 10)
H2O / OmniH2O — Human-to-humanoid whole-body teleoperation and imitation learning systems (He et al. 2024). (Ch 6, Ch 10)
History encoder — The component of a humanoid RL policy that summarizes a window of past observations and actions; TCN → LSTM → Transformer marks the 2020–2024 evolution. (Ch 6)
HOVER — He et al. 2024 versatile neural whole-body controller; a ~1.5M-parameter System 1 with 15+ control modes. (Ch 6, Ch 9)
Humanoid-Gym — Gu et al. 2024 open-source RL framework for humanoids; popularized the Isaac Gym → MuJoCo → hardware sim-to-sim pattern. (Ch 5, Ch 7)
HumanPlus — Fu et al. 2024 humanoid-shadowing-and-imitation pipeline from human video. (Ch 6, Ch 10)
Hybrid zero dynamics (HZD) — A classical underactuated-biped control framework that reduces full dynamics to low-dimensional virtual-constraint dynamics; its second life is as reward structure for RL. (Ch 2)
IMF (Impact Mitigation Factor) — Wensing et al. 2017 metric quantifying how well a legged actuator absorbs impulsive contact forces; the headline spec for QDD. (Ch 4)
Isaac Gym — Makoviychuk et al. 2021 GPU-native physics simulator that co-locates PhysX rigid-body simulation and PyTorch neural-network inference on a single GPU; the 2021 inflection point. (Ch 5)
Isaac Lab / Orbit — NVIDIA's production-grade RL platform on top of Isaac Sim; Isaac Lab is the successor to the earlier Orbit project and the default for frontier-company pipelines. (Ch 5)
LAFAN1 — Harvey et al. 2020 motion-capture dataset with robust motion-in-betweening labels, used alongside AMASS for motion priors. (Ch 6)
LIPM (Linear Inverted Pendulum Model) — Kajita's classical low-dimensional reduced model for bipedal walking; still used as a reward-shaping template and a safety-region reference. (Ch 1, Ch 2)
Loco-manipulation — The joint problem of walking and manipulating simultaneously; the 2024–2026 integration frontier. (Ch 6, Ch 10)
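As a worked sketch of the LIPM defined above: a point-mass CoM at constant height z obeys ẍ = (g/z)(x − p) about the foot position p. A minimal Euler rollout (step size and height are illustrative):

```python
def lipm_step(x, xdot, p, z=0.9, g=9.81, dt=0.002):
    """One Euler step of the linear inverted pendulum: xdd = (g / z) * (x - p).

    x, xdot : CoM position/velocity in the sagittal plane (m, m/s)
    p       : current foot (pivot) position (m)
    """
    xdd = (g / z) * (x - p)
    return x + xdot * dt, xdot + xdd * dt

# A CoM starting 5 cm ahead of the stance foot accelerates away from it:
x, xdot = 0.05, 0.0
for _ in range(100):               # simulate 0.2 s at 500 Hz
    x, xdot = lipm_step(x, xdot, p=0.0)
```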
MPC (Model Predictive Control) — Receding-horizon optimization that repeatedly resolves a short-horizon control problem; whole-body MPC is Boston Dynamics's hybrid-stack backbone. (Ch 2, Ch 11)
MuJoCo — Rigid-body simulator maintained by Google DeepMind; its JAX-native MJX port and the MuJoCo Playground framework are the JAX-first alternative to Isaac Lab. (Ch 5)
OmniRetarget — Yang et al. 2025 interaction-preserving retargeting pipeline that generates feasible humanoid trajectories from human motion when object meshes are available. (Ch 6, Ch 7)
PHC / PULSE — Physics-based humanoid controller and universal-latent-space variants (Luo et al. 2024); motion-prior scaffolds for RL policies. (Ch 6, Ch 8)
PhysX — NVIDIA's rigid-body physics engine underlying Isaac Gym / Isaac Sim / Isaac Lab. (Ch 5)
PPO (Proximal Policy Optimization) — Schulman et al. 2017 on-policy actor-critic algorithm; the community default for humanoid locomotion RL. (Ch 8)
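PPO's core is the clipped surrogate objective L = E[min(rA, clip(r, 1−ε, 1+ε)A)], where r is the new-to-old policy probability ratio and A the advantage. A minimal NumPy sketch of just that objective:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized; negate for SGD)."""
    ratio = np.exp(logp_new - logp_old)            # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

# A ratio of 2 with positive advantage is clipped at 1 + eps:
print(ppo_clip_loss(np.log([2.0]), np.log([1.0]), np.array([1.0])))   # -> 1.2
```

The clip caps how far a single update can push the policy, which is what makes PPO stable enough to be the locomotion default.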
Privileged information / privileged teacher — A teacher policy trained with access to oracle state (true dynamics, disturbances) that a student policy later distills; the core of teacher-student sim-to-real RL. (Ch 6, Ch 7)
QDD (Quasi-Direct Drive) — Low-gear-ratio outer-rotor motor with motor-current torque sensing; the MIT-Cheetah-lineage actuator primitive that the 2019–2026 humanoid boom is built on. (Ch 4)
QP (Quadratic Program) — Convex optimization primitive used for whole-body torque distribution under friction and joint-limit constraints; the System 0 of classical stacks and still a safety filter in modern ones. (Ch 1, Ch 2)
Retargeting — Mapping motion-capture trajectories from a source skeleton to a target humanoid's kinematics; a precondition for motion-imitation pipelines. (Ch 6)
SAC (Soft Actor-Critic) — Haarnoja et al. 2018 off-policy maximum-entropy RL algorithm; common alternative to PPO for sample-efficient humanoid RL. (Ch 8)
Sim-to-real — The end-to-end problem of transferring a policy trained in simulation to physical hardware; solved for locomotion, partially solved for dexterous manipulation. (Ch 7)
Sim-to-sim — A cheap bug filter where a policy is validated across two simulators (e.g., Isaac Gym → MuJoCo) before deployment; policies that fail sim-to-sim almost always fail on hardware. (Ch 5, Ch 7)
SLIP (Spring-Loaded Inverted Pendulum) — Energy-conserving reduced model underlying Cassie/ATRIAS leg design; related to LIPM but with elastic energy storage. (Ch 4)
System 0 / System 1 / System 2 — The three-layer humanoid architecture where System 0 (~1 kHz whole-body control) realizes commands from System 1 (~100–200 Hz visuomotor policy), which in turn is conditioned by System 2 (~7–10 Hz VLM). Crystallized by Figure's Helix on the Figure 02 robot. (Ch 9)
TCN (Temporal Convolutional Network) — A causal-convolution history encoder used in early teacher-student policies; later superseded by LSTM and Transformer. (Ch 6)
TD3 (Twin Delayed Deep Deterministic Policy Gradient) — Fujimoto et al. 2018 off-policy RL algorithm; base of FastTD3. (Ch 8)
Teacher-student RL — Two-stage training regime: a privileged teacher with oracle information is distilled into a student that uses only deployable observations. Introduced for legged robots by Lee et al. 2020; extended by Kumar 2021 RMA, Siekmann 2021 Cassie, Radosavovic 2024. (Ch 6)
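The distillation stage reduces to regressing student actions onto teacher actions over visited states. A toy sketch with invented stand-in networks (a scalar tanh "policy"; the hidden offset plays the role of privileged dynamics information):

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(obs, privileged):
    """Teacher sees a hidden dynamics parameter (e.g. a true friction offset)."""
    return np.tanh(0.1 * obs.sum() + privileged)

def student(W, feats):
    """Student sees only deployable observations (here: obs window + bias)."""
    return np.tanh(W @ feats)

# Distill: regress the student onto teacher actions over random states.
W, lr, privileged = np.zeros(9), 0.3, 0.25
for _ in range(500):
    obs = rng.standard_normal(8)    # stand-in for an observation-history window
    feats = np.append(obs, 1.0)     # bias lets the student absorb the hidden offset
    err = student(W, feats) - teacher(obs, privileged)
    W -= lr * err * (1 - student(W, feats) ** 2) * feats   # MSE gradient step
```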
Transformer (causal) — The history-encoder architecture adopted by Radosavovic 2024 for humanoid locomotion; enables long-context implicit adaptation. (Ch 6, Ch 8)
TWIST — Ze et al. 2025 teleoperated whole-body imitation system for humanoids. (Ch 6, Ch 10)
VLA (Vision-Language-Action) — A class of models that condition action generation on both visual observations and language instructions (RT-2, OpenVLA, π0, GR00T N1, Helix). (Ch 8, Ch 10)
VLM (Vision-Language Model) — A large pretrained model jointly reasoning over images and text; System 2 of the three-layer architecture. (Ch 8, Ch 9, Ch 10)
ZMP (Zero Moment Point) — The contact-surface point where net ground-reaction moments are zero; the classical criterion of bipedal balance. (Ch 1, Ch 2)
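For a point mass at constant height on flat ground (angular-momentum terms dropped), the ZMP reduces to p = x − (z/g)ẍ. A minimal sketch with an illustrative CoM height:

```python
def zmp(x, xdd, z=0.9, g=9.81):
    """Sagittal ZMP of a point-mass model on flat ground,
    ignoring angular momentum: p = x - (z / g) * xdd.
    """
    return x - (z / g) * xdd

# A CoM directly above the ankle, decelerating at 1 m/s^2,
# shifts the ZMP ~9 cm forward:
print(round(zmp(0.0, -1.0), 4))   # -> 0.0917
```

Classical stacks require this point to stay inside the support polygon; learned controllers often use the same condition as a reward term or safety check.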