Glossary

Key terms in A-Z order. `(Ch N)` marks the chapter where the term is introduced or discussed in depth.

ADR (Automatic Domain Randomization) — An adaptive extension of domain randomization in which the distribution of simulation parameters is automatically widened as the policy succeeds, and narrowed as it fails. Introduced by OpenAI on the Rubik's-cube Shadow Hand (2019). (Ch 7)
AMASS — A large archive of motion-capture sequences retargeted to a canonical human skeleton, widely used as a reference motion library for humanoid imitation and motion-prior training. (Ch 6, Ch 8)
ASAP (Aligning Simulation And real-world Physics) — He et al. 2025 residual delta-action method that reduces sim-to-real tracking error by ~53% on agile whole-body motions via 20 minutes of real-robot data. (Ch 7)
BFM (Behavior Foundation Model) — Yuan et al. 2025 framing of pretrained whole-body humanoid controllers as the robotics analog of language-model foundations. Complementary to the System 0/1/2 architecture. (Ch 9)
Capture point (CP) — The ground-plane location where placing the swing foot brings the inverted-pendulum CoM to rest directly above it. A simple yet robust reactive-stepping target; generalizes to N-step capture regions. (Ch 1, Ch 2)
Centroidal dynamics — The reduction of full-body dynamics to the motion of the center of mass and centroidal momentum; the bridge between task-space goals and whole-body torque distribution. (Ch 1, Ch 2)
DDP (Differential Dynamic Programming) — A Newton-style trajectory-optimization solver widely used inside whole-body MPC (e.g., Koenemann 2015 HRP-2). (Ch 2)
DeepMimic — Peng et al. 2018 physics-based character-animation method that tracks reference motions via RL, an ancestor of humanoid motion-imitation pipelines. (Ch 6, Ch 8)
Delta-action (residual action) — A sim-to-real correction technique where a small learned or optimized correction is added on top of a sim-trained base policy; central to ASAP. (Ch 7)
Diffusion policy — Chi et al. 2023 formulation of visuomotor policies as denoising-diffusion models, now a standard action head for manipulation. (Ch 8, Ch 10)
Domain randomization (DR) — Training a policy across randomized simulation parameters so that reality falls inside the training distribution; the dominant sim-to-real strategy. (Ch 7)
Embodiment — The specific physical configuration (kinematics, actuation, sensors) of a robot; cross-embodiment transfer is the open frontier of whole-body VLAs. (Ch 10)
ExBody — "Expressive whole-body control" line of work on human-motion imitation for humanoids; part of the 2024–2025 canon. (Ch 6, Ch 10)
FALCON — Force-adaptive humanoid loco-manipulation policy (Zhang et al. 2025). (Ch 6, Ch 10)
FastTD3 — Seo et al. 2025 fast TD3 variant tuned for humanoid control; part of the 2025 algorithmic frontier. (Ch 6, Ch 8)
Flow matching — Lipman et al. 2022 continuous-time generative-modeling framework; used as the action head in π0 VLAs. (Ch 8, Ch 10)
H2O / OmniH2O — Human-to-humanoid whole-body teleoperation and imitation learning systems (He et al. 2024). (Ch 6, Ch 10)
History encoder — The component of a humanoid RL policy that summarizes a window of past observations and actions; TCN → LSTM → Transformer marks the 2020–2024 evolution. (Ch 6)
HOVER — He et al. 2024 versatile neural whole-body controller; a ~1.5M-parameter System 1 with 15+ control modes. (Ch 6, Ch 9)
Humanoid-Gym — Gu et al. 2024 open-source RL framework for humanoids; popularized the Isaac Gym → MuJoCo → hardware sim-to-sim pattern. (Ch 5, Ch 7)
HumanPlus — Fu et al. 2024 humanoid-shadowing-and-imitation pipeline from human video. (Ch 6, Ch 10)
Hybrid zero dynamics (HZD) — A classical underactuated-biped control framework that reduces full dynamics to low-dimensional virtual-constraint dynamics; its second life is as reward structure for RL. (Ch 2)
IMF (Impact Mitigation Factor) — Wensing et al. 2017 metric quantifying how well a legged actuator absorbs impulsive contact forces; the headline spec for QDD. (Ch 4)
Isaac Gym — Makoviychuk et al. 2021 GPU-native physics simulator that co-locates PhysX rigid-body simulation and PyTorch neural-network inference on a single GPU; the 2021 inflection point. (Ch 5)
Isaac Lab / Orbit — NVIDIA's production-grade RL platform on top of Isaac Sim, successor to Orbit; default for frontier-company pipelines. (Ch 5)
LAFAN1 — Harvey et al. 2020 motion-capture dataset released with the "Robust Motion In-betweening" paper, used alongside AMASS for motion priors. (Ch 6)
LIPM (Linear Inverted Pendulum Model) — Kajita's classical low-dimensional reduced model for bipedal walking; still used as a reward-shaping template and a safety-region reference. (Ch 1, Ch 2)
Loco-manipulation — The joint problem of walking and manipulating simultaneously; the 2024–2026 integration frontier. (Ch 6, Ch 10)
MPC (Model Predictive Control) — Receding-horizon optimization that repeatedly re-solves a short-horizon control problem from the latest state; whole-body MPC is Boston Dynamics's hybrid-stack backbone. (Ch 2, Ch 11)
MuJoCo — DeepMind's rigid-body simulator; its JAX-native MJX port and the MuJoCo Playground framework are the JAX-first alternative to Isaac Lab. (Ch 5)
OmniRetarget — Yang et al. 2025 interaction-preserving retargeting pipeline that generates feasible humanoid trajectories from human motion when object meshes are available. (Ch 6, Ch 7)
PHC / PULSE — Perpetual Humanoid Controller (Luo et al. 2023) and its universal-latent-space successor PULSE (Luo et al. 2024); motion-prior scaffolds for RL policies. (Ch 6, Ch 8)
PhysX — NVIDIA's rigid-body physics engine underlying Isaac Gym / Isaac Sim / Isaac Lab. (Ch 5)
PPO (Proximal Policy Optimization) — Schulman et al. 2017 on-policy actor-critic algorithm; the community default for humanoid locomotion RL. (Ch 8)
Privileged information / privileged teacher — Oracle state (true dynamics, disturbances) available only in simulation, and the teacher policy trained on it; the teacher is later distilled into a student policy that sees only deployable observations. The core of teacher-student sim-to-real RL. (Ch 6, Ch 7)
QDD (Quasi-Direct Drive) — Low-gear-ratio outer-rotor motor with motor-current torque sensing; the MIT-Cheetah-lineage actuator primitive that the 2019–2026 humanoid boom is built on. (Ch 4)
QP (Quadratic Program) — Convex optimization primitive used for whole-body torque distribution under friction and joint-limit constraints; the System 0 of classical stacks and still a safety filter in modern ones. (Ch 1, Ch 2)
Retargeting — Mapping motion-capture trajectories from a source skeleton to a target humanoid's kinematics; a precondition for motion-imitation pipelines. (Ch 6)
SAC (Soft Actor-Critic) — Haarnoja et al. 2018 off-policy maximum-entropy RL algorithm; common alternative to PPO for sample-efficient humanoid RL. (Ch 8)
Sim-to-real — The end-to-end problem of transferring a policy trained in simulation to physical hardware; largely solved for locomotion, only partially solved for dexterous manipulation. (Ch 7)
Sim-to-sim — A cheap bug filter where a policy is validated across two simulators (e.g., Isaac Gym → MuJoCo) before deployment; policies that fail sim-to-sim almost always fail on hardware. (Ch 5, Ch 7)
SLIP (Spring-Loaded Inverted Pendulum) — Energy-conserving reduced model underlying Cassie/ATRIAS leg design; related to LIPM but with elastic energy storage. (Ch 4)
System 0 / System 1 / System 2 — The three-layer humanoid architecture where System 0 (~1 kHz whole-body control) realizes commands from System 1 (~100–200 Hz visuomotor policy), which in turn is conditioned by System 2 (~7–10 Hz VLM). Crystallized by Figure Helix 02. (Ch 9)
TCN (Temporal Convolutional Network) — A causal-convolution history encoder used in early teacher-student policies; later superseded by LSTM and Transformer. (Ch 6)
TD3 (Twin Delayed Deep Deterministic Policy Gradient) — Fujimoto et al. 2018 off-policy RL algorithm; base of FastTD3. (Ch 8)
Teacher-student RL — Two-stage training regime: a privileged teacher with oracle information is distilled into a student that uses only deployable observations. Introduced for legged robots by Lee et al. 2020; extended by Kumar 2021 RMA, Siekmann 2021 Cassie, Radosavovic 2024. (Ch 6)
Transformer (causal) — The history-encoder architecture adopted by Radosavovic 2024 for humanoid locomotion; enables long-context implicit adaptation. (Ch 6, Ch 8)
TWIST — Ze et al. 2025 teleoperated whole-body imitation system for humanoids. (Ch 6, Ch 10)
VLA (Vision-Language-Action) — A class of models that condition action generation on both visual observations and language instructions (RT-2, OpenVLA, π0, GR00T N1, Helix). (Ch 8, Ch 10)
VLM (Vision-Language Model) — A large pretrained model jointly reasoning over images and text; System 2 of the three-layer architecture. (Ch 8, Ch 9, Ch 10)
ZMP (Zero Moment Point) — The point on the contact surface where the horizontal components of the net ground-reaction moment vanish; keeping it inside the support polygon is the classical criterion of bipedal balance. (Ch 1, Ch 2)
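
The capture point entry can be made concrete with the standard LIPM expression x_cp = x + v * sqrt(z0 / g). The sketch below is illustrative only: it assumes a constant CoM height, a point foot, and a single horizontal axis, and the function name `capture_point` is hypothetical rather than taken from any particular library.

```python
import math

GRAVITY = 9.81  # m/s^2


def capture_point(com_pos, com_vel, com_height, gravity=GRAVITY):
    """Instantaneous capture point of a linear inverted pendulum (one horizontal axis).

    com_pos and com_vel are the CoM position [m] and velocity [m/s] along that axis;
    com_height is the constant CoM height z0 [m] assumed by the LIPM.
    """
    if com_height <= 0:
        raise ValueError("com_height must be positive")
    omega = math.sqrt(gravity / com_height)  # LIPM natural frequency [1/s]
    return com_pos + com_vel / omega


# Example: CoM 0.9 m high, directly above the stance foot, moving forward at 0.5 m/s.
cp = capture_point(com_pos=0.0, com_vel=0.5, com_height=0.9)
print(f"step about {cp:.3f} m ahead of the stance foot")
```

A faster CoM or a taller pendulum pushes the capture point further out, which is why it works as a reactive-stepping target.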

A

ACT (Action Chunking with Transformers): Transformer-based imitation-learning policy that predicts chunks of future actions from demonstrations, which reduces compounding error on long-horizon tasks. (Ch1, Ch8, Ch10)
ADR (Automatic Domain Randomization): Auto-expands the range of physical/non-physical parameters during training as a sim-to-real strategy. (Ch6, Ch7)
ALOHA: Low-cost (<$20K) bimanual teleoperation hardware (includes ALOHA and ALOHA 2). (Ch10)
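
The widen-on-success, narrow-on-failure rule behind ADR can be sketched for a single randomized parameter. This is a simplified illustration, not OpenAI's implementation: the class `AdrRange`, the thresholds, and the symmetric update are assumptions made for brevity (the published method evaluates each bound of each parameter separately).

```python
from dataclasses import dataclass


@dataclass
class AdrRange:
    """Sampling range for one simulation parameter (hypothetical helper)."""
    low: float
    high: float
    step: float             # how far each bound moves per update
    min_width: float = 0.0  # the range never shrinks below this width

    def update(self, success_rate, widen_above=0.8, narrow_below=0.4):
        """Widen the range after strong performance, narrow it after weak performance."""
        if success_rate >= widen_above:
            self.low -= self.step
            self.high += self.step
        elif success_rate <= narrow_below:
            # Shrink symmetrically, but stop at min_width.
            slack = max(0.0, (self.high - self.low - self.min_width) / 2)
            shrink = min(self.step, slack)
            self.low += shrink
            self.high -= shrink
        return self.low, self.high


# Example: a friction-coefficient range reacting to three evaluation rounds.
friction = AdrRange(low=0.9, high=1.1, step=0.05)
for rate in (0.9, 0.85, 0.3):
    print(friction.update(rate))
```

Success rates between the two thresholds leave the range unchanged, which keeps the curriculum from oscillating on noisy evaluations.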

C

Closed-loop: Architecture that feeds execution results back to update plans. (Ch2)
Compliance: Mechanical yielding to external force — essential for contact-rich manipulation. (Ch4, Ch15, Ch16)

D

Dexterous manipulation: Precise object manipulation with multi-fingered hands — in-hand rotation, assembly, etc. (Ch3, Ch7, Ch12, Ch14, Ch16)
Diffusion Policy: Policy learning via conditional denoising diffusion over action distributions. (Ch2, Ch3, Ch7, Ch8, Ch9, Ch10, Ch11)
DoF (Degrees of Freedom): Number of independent joint axes. The human hand has ~27 DoF. (Ch2, Ch4, Ch8, Ch11, Ch12, Ch13, Ch15)
Domain Randomization: Randomizes simulation parameters to improve policy robustness. (Ch3, Ch4, Ch5, Ch6, Ch7, Ch9, Ch11, Ch12)

F

Feasibility: Property of actions, states, or regions that satisfy a set of constraints — joint/torque limits, friction cones, contact conditions, support-polygon inclusion. In classical control it serves as a safety certificate; learned policies internalize it via reward shaping or QP-based safety filters. (Ch1, Ch2, Ch6, Ch11, Ch14, Ch16)
Flow Matching: Learns action distributions via continuous normalizing flows; core technique of π0. (Ch8, Ch9, Ch10)
Foundation Model: Large-scale pretrained general-purpose model, e.g., Sparsh (tactile), π0 (VLA). (Ch8, Ch9, Ch10, Ch12, Ch13, Ch14, Ch15, Ch16)
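
As a rough illustration of the flow matching entry, the training signal can be sketched with a straight-line path from Gaussian noise to a demonstrated action. The linear path, the time convention (t = 0 is noise, t = 1 is the action), and the names `flow_matching_pair`, `flow_matching_loss`, and `velocity_model` are assumptions for this sketch; it is not a description of how any specific VLA implements its action head.

```python
import random


def flow_matching_pair(noise, action, t):
    """Point on the straight path from noise (t=0) to action (t=1), plus its target velocity."""
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    v_target = [a - n for n, a in zip(noise, action)]  # constant along a straight path
    return x_t, v_target


def flow_matching_loss(velocity_model, action, rng=random):
    """One-sample Monte Carlo estimate of the flow-matching regression loss."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()
    x_t, v_target = flow_matching_pair(noise, action, t)
    v_pred = velocity_model(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(action)


def zero_model(x_t, t):
    """Placeholder that predicts zero velocity; stands in for a neural network."""
    return [0.0] * len(x_t)


print(flow_matching_loss(zero_model, action=[0.2, -0.1, 0.4]))
```

At inference time a trained velocity model would be integrated from a noise sample at t = 0 to t = 1 to produce an action; that step is omitted here.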

G

Grounding: Connecting an LLM's abstract language to feasible actions, objects, and states in the environment. (Ch3)

I

In-hand manipulation: Changing the pose of a grasped object within the hand, without releasing it. (Ch7, Ch8, Ch10)

O

Open X-Embodiment: Large open-source robot dataset (the largest at its 2023 release), aggregating 1M+ trajectories from 34 labs. (Ch8, Ch10, Ch12, Ch13, Ch15)
OpenVLA: Open-source VLA foundation model (7B params, trained on Open X-Embodiment). (Ch8, Ch9, Ch10, Ch15)

P

PaLM-E: Google's embodied multimodal language model unifying image, state, and language into one token space. (Ch10)
Physical AI: AI that understands and interacts with the physical world — convergence of Foundation Models, simulation, and sensors. (Ch3, Ch9, Ch14, Ch15, Ch16)
Point cloud: Set of 3D points representing geometry captured by visual (depth) or tactile sensing. (Ch7)

R

REFLECT: Closed-loop pattern of reflecting on failures and replanning. (Ch15, Ch16)
RL (Reinforcement Learning): Policy learning by maximizing reward. (Ch1–Ch16)
RT-2: Google DeepMind's VLA model jointly trained on web VQA and robot manipulation. (Ch10)

S

SIMPLER: Benchmark that aligns simulation evaluation with real-world performance. (Ch8, Ch10, Ch12)
Sim-to-Real: Process/strategies for transferring policies trained in simulation to the real world. (Ch1, Ch2, Ch3, Ch4, Ch5, Ch6, Ch7, Ch8, Ch9, Ch10, Ch11, Ch13, Ch14, Ch16)

T

Teleoperation (TeleOp): Humans remotely operating a robot to collect demonstration data. (Ch3, Ch6, Ch8, Ch9, Ch10, Ch11, Ch12, Ch13, Ch15)
Tendon-driven: Actuation via tendons transmitting force — e.g., SoftHand, ORCA. (Ch12)
Torque control: Directly controls joint torque — essential in contact-rich environments. (Ch4)

V

VLA (Vision-Language-Action): Unified model that directly outputs robot actions from vision and language input. (Ch2, Ch3, Ch4, Ch6, Ch7, Ch8, Ch9, Ch10, Ch12, Ch13, Ch14, Ch15, Ch16)