
Week 13 - Complete Humanoid System

Learning Objectives​

By the end of this week, you will be able to:

  • Integrate all curriculum modules into a complete humanoid system
  • Deploy Physical AI systems on edge hardware
  • Design and execute a capstone project
  • Evaluate system performance and plan future improvements

Overview​

This final week brings together all components from the 13-week curriculum into a complete, deployable humanoid robot system. You will integrate perception, reasoning, control, and conversational AI into a unified platform.

Complete System Architecture​

Complete Humanoid System
(all layers communicate over ROS 2 / DDS middleware)

  Perception Layer
    Camera Β· LiDAR Β· IMU Β· Force Sensors
                β”‚
                β–Ό
        Sensor Fusion & VLA
                β”‚
                β–Ό
  Reasoning Layer
    LLM / VLA (Language) Β· Planning (Navigation) Β· Memory (Context)
                β”‚
                β–Ό
  Control Layer
    Whole-Body Control Β· Balance (ZMP) Β· Locomotion (Walking)
                β”‚
                β–Ό
  Hardware Layer
    Jetson Orin (Edge AI) Β· Motor Drivers (Dynamixel) Β· Power System (Battery)
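
In code, these layers map onto the ROS 2 packages used by the launch files and nodes later on this page. One possible workspace layout (the directory arrangement is only a suggestion; the package names are the ones referenced below):

humanoid_ws/src/
β”œβ”€β”€ humanoid_description/   # URDF/Xacro robot model
β”œβ”€β”€ humanoid_perception/    # camera, LiDAR, IMU drivers + sensor fusion
β”œβ”€β”€ humanoid_navigation/    # Nav2 configuration (nav2_params.yaml)
β”œβ”€β”€ humanoid_control/       # whole-body control, balance, locomotion
β”œβ”€β”€ humanoid_ai/            # Whisper ASR, LLM planner, conversational AI
β”œβ”€β”€ humanoid_viz/           # RViz configuration
└── humanoid_system/        # top-level launch files, system monitor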

Hardware Requirements​

Complete System Bill of Materials​

| Component | Specification | Quantity | Purpose |
|---|---|---|---|
| Main Compute | Jetson Orin AGX | 1 | Primary AI processing |
| Edge Compute | Jetson Orin Nano | 2 | Distributed processing |
| GPU (Desktop) | RTX 4090 (24GB) | 1 | Training, simulation |
| Cameras | Intel RealSense D435i | 2 | RGB-D perception |
| LiDAR | Ouster OS1-16 | 1 | 3D mapping |
| IMU | BNO055 9-axis | 1 | Balance sensing |
| Motors | Dynamixel XM430-W350 | 20+ | Joint actuation |
| Motor Drivers | Dynamixel U2D2 | 4 | Motor communication |
| Battery | 6S LiPo 10000mAh | 2 | Power system |
| Power Distribution | 5V/12V regulators | 4 | Voltage regulation |
| Frame | Aluminum extrusion | 1 | Robot structure |

System Integration​

Complete Launch File​

#!/usr/bin/env python3
"""
Complete Humanoid System Launch File
Integrates all subsystems for full robot operation
"""

from launch import LaunchDescription
from launch.actions import (
    DeclareLaunchArgument,
    IncludeLaunchDescription,
    LogInfo,
    TimerAction,
)
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch.substitutions import Command, LaunchConfiguration, PathJoinSubstitution
from launch_ros.actions import Node
from launch_ros.parameter_descriptions import ParameterValue
from launch_ros.substitutions import FindPackageShare


def generate_launch_description():
    # Configuration
    use_sim_time = LaunchConfiguration('use_sim_time')
    robot_model = LaunchConfiguration('robot_model')

    # ========== Perception Stack ==========
    perception_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('humanoid_perception'),
                'launch',
                'perception_stack.launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
        }.items()
    )

    # ========== Navigation Stack ==========
    navigation_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('nav2_bringup'),
                'launch',
                'navigation_launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
            'params_file': PathJoinSubstitution([
                FindPackageShare('humanoid_navigation'),
                'config',
                'nav2_params.yaml'
            ])
        }.items()
    )

    # ========== Control Stack ==========
    control_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('humanoid_control'),
                'launch',
                'whole_body_control.launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
        }.items()
    )

    # ========== Conversational AI Stack ==========
    conversational_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('humanoid_ai'),
                'launch',
                'conversational_ai.launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
        }.items()
    )

    # ========== Robot State Publisher ==========
    # Process the Xacro model at launch time instead of reading the raw file
    robot_description = ParameterValue(
        Command([
            'xacro ',
            PathJoinSubstitution([
                FindPackageShare('humanoid_description'),
                'urdf',
                'humanoid_robot.urdf.xacro'
            ])
        ]),
        value_type=str
    )

    robot_state_publisher = Node(
        package='robot_state_publisher',
        executable='robot_state_publisher',
        name='robot_state_publisher',
        output='screen',
        parameters=[{
            'robot_description': robot_description,
            'use_sim_time': use_sim_time,
        }]
    )

    # ========== System Monitor ==========
    system_monitor = Node(
        package='humanoid_system',
        executable='system_monitor_node',
        name='system_monitor',
        output='screen',
        parameters=[{
            'use_sim_time': use_sim_time,
        }]
    )

    # ========== RViz Visualization ==========
    rviz_node = Node(
        package='rviz2',
        executable='rviz2',
        name='rviz2',
        output='screen',
        arguments=['-d', PathJoinSubstitution([
            FindPackageShare('humanoid_viz'),
            'config',
            'humanoid_system.rviz'
        ])]
    )

    return LaunchDescription([
        DeclareLaunchArgument('use_sim_time', default_value='false'),
        DeclareLaunchArgument('robot_model', default_value='humanoid_v1'),

        LogInfo(msg='Launching Complete Humanoid System...'),

        robot_state_publisher,

        # Start perception first
        perception_launch,

        # Stagger the remaining stacks so each has time to come up.
        # (OnProcessStart cannot target an IncludeLaunchDescription, so
        # simple timer delays are used for the startup order here.)
        TimerAction(period=5.0, actions=[navigation_launch]),
        TimerAction(period=10.0, actions=[control_launch]),
        TimerAction(period=15.0, actions=[conversational_launch]),

        # System monitor
        system_monitor,

        # Visualization
        rviz_node,
    ])
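
Once the workspace is built and sourced, the whole stack can be brought up with ros2 launch humanoid_system complete_system.launch.py (the same command the deployment script at the end of this page prints); pass use_sim_time:=true when running against simulation instead of hardware.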

System Monitor Node​

#!/usr/bin/env python3
"""
System Monitor for Complete Humanoid
Monitors system health, performance, and diagnostics
"""

import rclpy
from rclpy.node import Node
from diagnostic_msgs.msg import DiagnosticArray, DiagnosticStatus, KeyValue
from sensor_msgs.msg import BatteryState
import psutil
import torch


class SystemMonitor(Node):
    def __init__(self):
        super().__init__('system_monitor')

        # Publishers
        self.diagnostics_pub = self.create_publisher(
            DiagnosticArray, '/diagnostics', 10)
        self.battery_pub = self.create_publisher(
            BatteryState, '/battery_state', 10)

        # Remember the start time so uptime can be reported
        self.start_time = self.get_clock().now()

        # Timer
        self.timer = self.create_timer(1.0, self.monitor_callback)

        self.get_logger().info('System Monitor initialized')

    def monitor_callback(self):
        """Collect and publish system diagnostics"""
        # CPU usage (interval=None is non-blocking: usage since the last call)
        cpu_percent = psutil.cpu_percent(interval=None)
        memory = psutil.virtual_memory()

        # GPU usage (if available)
        gpu_memory = 0.0
        if torch.cuda.is_available():
            gpu_memory = torch.cuda.memory_allocated() / 1024**2
            # Actual GPU utilization would require pynvml

        # Create diagnostic array
        diag_array = DiagnosticArray()
        diag_array.header.stamp = self.get_clock().now().to_msg()

        # CPU status
        cpu_status = DiagnosticStatus()
        cpu_status.name = 'CPU'
        cpu_status.level = DiagnosticStatus.OK if cpu_percent < 80 else DiagnosticStatus.WARN
        cpu_status.message = f'CPU Usage: {cpu_percent}%'
        cpu_status.values = [
            KeyValue(key='usage_percent', value=str(cpu_percent)),
            KeyValue(key='memory_percent', value=str(memory.percent)),
            KeyValue(key='memory_available_mb', value=str(memory.available / 1024**2)),
        ]
        diag_array.status.append(cpu_status)

        # GPU status
        gpu_status = DiagnosticStatus()
        gpu_status.name = 'GPU'
        gpu_status.level = DiagnosticStatus.OK
        gpu_status.message = f'GPU Memory: {gpu_memory:.0f}MB'
        gpu_status.values = [
            KeyValue(key='memory_allocated_mb', value=f'{gpu_memory:.0f}'),
            KeyValue(key='cuda_available', value=str(torch.cuda.is_available())),
        ]
        diag_array.status.append(gpu_status)

        # ROS 2 node status
        uptime_sec = (self.get_clock().now() - self.start_time).nanoseconds / 1e9
        ros_status = DiagnosticStatus()
        ros_status.name = 'ROS 2 System'
        ros_status.level = DiagnosticStatus.OK
        ros_status.message = 'All systems operational'
        ros_status.values = [
            KeyValue(key='node_count', value=str(len(self.get_node_names()))),
            KeyValue(key='uptime_sec', value=str(uptime_sec)),
        ]
        diag_array.status.append(ros_status)

        self.diagnostics_pub.publish(diag_array)

        # Log summary
        self.get_logger().info(
            f'System: CPU={cpu_percent}%, Memory={memory.percent}%, '
            f'GPU={gpu_memory:.0f}MB'
        )


def main(args=None):
    rclpy.init(args=args)
    node = SystemMonitor()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
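
The node also creates a /battery_state publisher that the snippet above never fills in. A minimal sketch of a publish method you could add to the SystemMonitor class (the pack voltage itself would come from your own power-monitoring hardware, which is not shown here):

    def publish_battery(self, voltage: float):
        """Publish a BatteryState message for the 6S LiPo pack (sketch)."""
        msg = BatteryState()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.voltage = voltage
        # Rough estimate for a 6S pack: ~19.8 V empty, 25.2 V full
        msg.percentage = max(0.0, min(1.0, (voltage - 19.8) / (25.2 - 19.8)))
        msg.power_supply_status = BatteryState.POWER_SUPPLY_STATUS_DISCHARGING
        self.battery_pub.publish(msg)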

Voice-to-Action Pipeline (Whisper + LLM + ROS 2)​

This is the core of the PDF's capstone requirement: a robot that hears a voice command, reasons about it, and executes physical actions.

Pipeline Overview​

Microphone
     β”‚
     β–Ό
[OpenAI Whisper]        ←── Speech-to-Text (ASR)
     β”‚
     β–Ό
[LLM Planner]           ←── "Pick up the red mug" β†’ structured action plan
     β”‚
     β–Ό
[ROS 2 Action Server]   ←── Navigate β†’ Detect Object β†’ Grasp
     β”‚
     β–Ό
[Robot Hardware / Isaac Sim]
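
The three pipeline stages are implemented as the three nodes in Steps 1–3 below. A minimal launch sketch that brings them up together, assuming they are installed as executables in the humanoid_ai package (the executable names here are assumptions matching the node files below):

#!/usr/bin/env python3
"""Launch sketch: starts the voice-to-action pipeline nodes from Steps 1-3."""

from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(package='humanoid_ai', executable='whisper_asr_node', output='screen'),
        Node(package='humanoid_ai', executable='llm_planner_node', output='screen'),
        Node(package='humanoid_ai', executable='action_executor', output='screen'),
    ])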

Step 1: Whisper Speech Recognition Node​

#!/usr/bin/env python3
"""
Whisper ASR Node β€” converts microphone audio to text commands
Requires: pip install openai-whisper sounddevice numpy
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import whisper
import sounddevice as sd
import numpy as np


class WhisperASRNode(Node):
    def __init__(self):
        super().__init__('whisper_asr_node')
        self.model = whisper.load_model("base")  # or "small", "medium"
        self.command_pub = self.create_publisher(String, '/voice_command', 10)
        self.sample_rate = 16000  # Whisper expects 16 kHz mono audio
        self.duration = 5  # seconds to record per utterance
        self.get_logger().info('Whisper ASR ready β€” listening...')
        self.create_timer(6.0, self.listen_and_publish)

    def listen_and_publish(self):
        self.get_logger().info('Recording...')
        audio = sd.rec(int(self.duration * self.sample_rate),
                       samplerate=self.sample_rate, channels=1, dtype='float32')
        sd.wait()
        audio_np = audio.flatten()
        result = self.model.transcribe(audio_np, language='en')
        text = result['text'].strip()
        if text:
            self.get_logger().info(f'Heard: "{text}"')
            msg = String()
            msg.data = text
            self.command_pub.publish(msg)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(WhisperASRNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()

Step 2: LLM Command Planner Node​

#!/usr/bin/env python3
"""
LLM Planner Node β€” converts natural language to structured robot actions
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import os

# Uses the Groq API (same as the course RAG chatbot)
from groq import Groq

SYSTEM_PROMPT = """You are a robot action planner. Convert the user's voice command
into a JSON action plan. Use only these actions:
- navigate_to: {"action": "navigate_to", "target": "kitchen"}
- detect_object: {"action": "detect_object", "object": "red mug"}
- pick_object: {"action": "pick_object", "object": "red mug"}
- place_object: {"action": "place_object", "location": "table"}
- say: {"action": "say", "text": "I found the mug"}

Return a JSON array of actions only. No explanation."""


class LLMPlannerNode(Node):
    def __init__(self):
        super().__init__('llm_planner_node')
        self.client = Groq(api_key=os.getenv('GROQ_API_KEY'))
        self.command_sub = self.create_subscription(
            String, '/voice_command', self.plan_callback, 10)
        self.action_pub = self.create_publisher(String, '/action_plan', 10)
        self.get_logger().info('LLM Planner ready')

    def plan_callback(self, msg: String):
        command = msg.data
        self.get_logger().info(f'Planning for: "{command}"')
        response = self.client.chat.completions.create(
            model="llama-3.3-70b-versatile",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": command}
            ]
        )
        plan_json = response.choices[0].message.content
        self.get_logger().info(f'Plan: {plan_json}')
        out = String()
        out.data = plan_json
        self.action_pub.publish(out)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(LLMPlannerNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
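
For a command such as "Pick up the red mug from the kitchen counter", the planner is expected to publish a plan on /action_plan that follows the schema in SYSTEM_PROMPT, for example (illustrative only; the exact output depends on the model):

[
  {"action": "navigate_to", "target": "kitchen"},
  {"action": "detect_object", "object": "red mug"},
  {"action": "pick_object", "object": "red mug"},
  {"action": "say", "text": "I picked up the red mug"}
]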

Step 3: Action Executor Node​

#!/usr/bin/env python3
"""
Action Executor β€” reads the LLM plan and dispatches ROS 2 commands
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
import json


class ActionExecutorNode(Node):
    def __init__(self):
        super().__init__('action_executor')
        self.plan_sub = self.create_subscription(
            String, '/action_plan', self.execute_callback, 10)
        self.cmd_vel_pub = self.create_publisher(Twist, '/cmd_vel', 10)
        self.speech_pub = self.create_publisher(String, '/tts_text', 10)

    def execute_callback(self, msg: String):
        try:
            actions = json.loads(msg.data)
        except json.JSONDecodeError:
            self.get_logger().error('Invalid action plan JSON')
            return

        for action in actions:
            name = action.get('action')
            if name == 'navigate_to':
                self.get_logger().info(f"Navigating to: {action['target']}")
                # In real system: send Nav2 goal (see the sketch after this block)
            elif name == 'detect_object':
                self.get_logger().info(f"Detecting: {action['object']}")
                # In real system: call YOLO/VLM service
            elif name == 'pick_object':
                self.get_logger().info(f"Picking: {action['object']}")
                # In real system: call manipulation action server
            elif name == 'place_object':
                self.get_logger().info(f"Placing at: {action['location']}")
                # In real system: call manipulation action server
            elif name == 'say':
                out = String()
                out.data = action['text']
                self.speech_pub.publish(out)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(ActionExecutorNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
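
The navigate_to branch above only logs. A minimal sketch of how it might send a real Nav2 goal through an action client, assuming named waypoints on the robot's map (the WAYPOINTS lookup from place names to map coordinates is hypothetical):

from rclpy.action import ActionClient
from rclpy.node import Node
from nav2_msgs.action import NavigateToPose
from geometry_msgs.msg import PoseStamped

# Hypothetical lookup from place names used by the LLM to map coordinates
WAYPOINTS = {'kitchen': (2.0, 1.5), 'kitchen counter': (2.5, 1.2), 'table': (0.5, 3.0)}


class Nav2GoalSender(Node):
    def __init__(self):
        super().__init__('nav2_goal_sender')
        self.nav_client = ActionClient(self, NavigateToPose, 'navigate_to_pose')

    def navigate_to(self, target: str):
        """Send a NavigateToPose goal for a named waypoint."""
        if target not in WAYPOINTS:
            self.get_logger().warn(f'Unknown waypoint: {target}')
            return
        x, y = WAYPOINTS[target]
        goal = NavigateToPose.Goal()
        goal.pose = PoseStamped()
        goal.pose.header.frame_id = 'map'
        goal.pose.header.stamp = self.get_clock().now().to_msg()
        goal.pose.pose.position.x = x
        goal.pose.pose.position.y = y
        goal.pose.pose.orientation.w = 1.0
        self.nav_client.wait_for_server()
        self.nav_client.send_goal_async(goal)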

Capstone Project​

Project Requirements​

Your capstone project should demonstrate:

  1. Perception: Object detection, depth estimation, or scene understanding
  2. Reasoning: Language understanding, planning, or decision-making
  3. Control: Locomotion, manipulation, or balance control
  4. Integration: Multiple subsystems working together

The Autonomous Humanoid (Main Capstone)​

This is the PDF-defined capstone: a simulated robot that completes a full autonomy loop.

Task: "Pick up the red mug from the kitchen counter"

| Step | What Happens | Technology Used |
|---|---|---|
| 1. Voice Input | User speaks the command | Microphone + Whisper ASR |
| 2. Transcription | Audio β†’ "Pick up the red mug" | OpenAI Whisper (base model) |
| 3. Planning | Text β†’ JSON action plan | LLM (Groq llama-3.3-70b) |
| 4. Navigation | Robot moves toward kitchen | Nav2 + LiDAR SLAM |
| 5. Object Detection | Camera finds the red mug | YOLO / VLM (GPT-4o vision) |
| 6. Manipulation | Arm reaches, grasps mug | Inverse Kinematics + F/T control |
| 7. Confirmation | Robot says "I picked up the mug" | TTS (pyttsx3 / gTTS) |
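
Step 7 needs a text-to-speech node on the receiving end of the /tts_text topic published by the action executor above. A minimal sketch using pyttsx3, one of the TTS options listed in the table:

#!/usr/bin/env python3
"""Minimal TTS node: speaks whatever arrives on /tts_text (sketch using pyttsx3)."""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import pyttsx3


class TTSNode(Node):
    def __init__(self):
        super().__init__('tts_node')
        self.engine = pyttsx3.init()  # uses the default system speech engine
        self.create_subscription(String, '/tts_text', self.speak_callback, 10)
        self.get_logger().info('TTS node ready')

    def speak_callback(self, msg: String):
        self.get_logger().info(f'Speaking: "{msg.data}"')
        self.engine.say(msg.data)
        self.engine.runAndWait()  # blocks until the utterance finishes


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(TTSNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()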

Example Projects​

| Project | Description | Difficulty |
|---|---|---|
| Autonomous Humanoid | Full voice β†’ plan β†’ navigate β†’ pick loop (PDF capstone) | Hard |
| Object Fetching | Robot finds and retrieves objects on command | Medium |
| Guided Tour | Robot navigates and explains environment | Medium |
| Interactive Demo | Robot responds to voice commands and gestures | Hard |
| Autonomous Patrol | Robot patrols area with obstacle avoidance | Hard |

Project Template​

#!/usr/bin/env python3
"""
Capstone Project Template
Base class for humanoid robot capstone projects
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
from sensor_msgs.msg import Image
from typing import Dict, List


class CapstoneProject(Node):
    def __init__(self, project_name: str):
        super().__init__(f'capstone_{project_name}')

        self.project_name = project_name

        # Subscribers
        self.command_sub = self.create_subscription(
            String, '/language_command',
            self.command_callback, 10)
        self.camera_sub = self.create_subscription(
            Image, '/camera/color/image_raw',
            self.camera_callback, 10)

        # Publishers
        self.cmd_vel_pub = self.create_publisher(
            Twist, '/cmd_vel', 10)
        self.feedback_pub = self.create_publisher(
            String, '/project_feedback', 10)

        # State machine
        self.state = 'IDLE'
        self.state_data: Dict = {}

        self.get_logger().info(f'Capstone Project "{project_name}" initialized')

    def command_callback(self, msg: String):
        """Process project command"""
        command = msg.data.lower()
        self.get_logger().info(f'Command: {command}')

        if 'start' in command:
            self.state = 'RUNNING'
            self.send_feedback(f'{self.project_name} started')
        elif 'stop' in command:
            self.state = 'IDLE'
            self.send_feedback(f'{self.project_name} stopped')
        else:
            self.process_command(command)

    def camera_callback(self, msg: Image):
        """Process camera data in any active (non-IDLE) state"""
        if self.state != 'IDLE':
            self.process_visual_input(msg)

    def process_command(self, command: str):
        """Override: Process specific commands"""
        raise NotImplementedError

    def process_visual_input(self, image: Image):
        """Override: Process visual input"""
        raise NotImplementedError

    def send_feedback(self, message: str):
        """Send project feedback"""
        msg = String()
        msg.data = message
        self.feedback_pub.publish(msg)
        self.get_logger().info(f'Feedback: {message}')


# Example implementation
class ObjectFetchingProject(CapstoneProject):
    def __init__(self):
        super().__init__('object_fetching')
        self.target_object = None
        self.detected_objects: List[str] = []

    def process_command(self, command: str):
        if 'fetch' in command:
            # Extract object name
            self.target_object = command.split('fetch')[-1].strip()
            self.state = 'SEARCHING'
            self.send_feedback(f'Searching for {self.target_object}')

    def process_visual_input(self, image: Image):
        if self.state == 'SEARCHING':
            # Run object detection
            # If target found, transition to NAVIGATING
            pass


def main(args=None):
    rclpy.init(args=args)

    # Choose project
    project = ObjectFetchingProject()

    rclpy.spin(project)
    project.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

Deployment​

Edge Deployment Script​

#!/bin/bash
# Deploy Humanoid System to Jetson Orin

JETSON_IP="192.168.1.100"
JETSON_USER="nvidia"

echo "Building packages..."
colcon build --packages-select humanoid_system humanoid_perception humanoid_control

echo "Syncing to Jetson..."
rsync -avz --exclude='build' --exclude='install' \
~/humanoid_ws/ ${JETSON_USER}@${JETSON_IP}:~/humanoid_ws/

echo "Installing dependencies on Jetson..."
ssh ${JETSON_USER}@${JETSON_IP} << 'EOF'
cd ~/humanoid_ws
source /opt/ros/humble/setup.bash
rosdep install --from-paths src --ignore-src -r -y
colcon build --symlink-install
EOF

echo "Deployment complete!"
echo "SSH into Jetson and run: ros2 launch humanoid_system complete_system.launch.py"

Resources​

Documentation​

Tools​

Exercises​

  1. System Integration: Launch all subsystems together
  2. Performance Testing: Measure system latency and throughput (see the latency sketch after this list)
  3. Capstone Project: Implement and demonstrate a complete project
  4. Documentation: Create project documentation and demo video
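
For Exercise 2, a minimal latency sketch: a node that timestamps each /voice_command and measures how long the LLM planner takes to publish the corresponding /action_plan. Topic names follow the voice-to-action pipeline above; this measures only the planner stage, not full end-to-end latency.

#!/usr/bin/env python3
"""Measures voice-command to action-plan latency (planner stage only)."""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class PlannerLatencyMonitor(Node):
    def __init__(self):
        super().__init__('planner_latency_monitor')
        self.last_command_time = None
        self.create_subscription(String, '/voice_command', self.command_cb, 10)
        self.create_subscription(String, '/action_plan', self.plan_cb, 10)

    def command_cb(self, msg: String):
        # Remember when the command was heard
        self.last_command_time = self.get_clock().now()

    def plan_cb(self, msg: String):
        if self.last_command_time is None:
            return
        latency = (self.get_clock().now() - self.last_command_time).nanoseconds / 1e9
        self.get_logger().info(f'Planner latency: {latency:.2f} s')
        self.last_command_time = None


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(PlannerLatencyMonitor())
    rclpy.shutdown()


if __name__ == '__main__':
    main()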

Curriculum Completion​

Congratulations on completing the 13-week Physical AI & Humanoid Robotics curriculum!

What You've Learned​

  • Weeks 1-2: Foundation - Physical AI concepts and architecture
  • Weeks 3-5: ROS 2 - Middleware, nodes, topics, services, actions
  • Weeks 6-7: Gazebo - Robot simulation and modeling
  • Weeks 8-10: NVIDIA Isaac - GPU-accelerated simulation and perception
  • Weeks 11-13: Humanoid - Kinematics, control, conversational AI, integration

Next Steps​

  • Continue with advanced topics (RL, advanced manipulation)
  • Contribute to open-source robotics projects
  • Build your own humanoid robot platform
  • Pursue research in Physical AI