Week 13 - Complete Humanoid System
Learning Objectivesβ
By the end of this week, you will be able to:
- Integrate all curriculum modules into a complete humanoid system
- Deploy Physical AI systems on edge hardware
- Design and execute a capstone project
- Evaluate system performance and plan future improvements
Overviewβ
This final week brings together all components from the 13-week curriculum into a complete, deployable humanoid robot system. You will integrate perception, reasoning, control, and conversational AI into a unified platform.
Complete System Architectureβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Complete Humanoid System β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Perception Layer β β
β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββββββββββ β β
β β β Camera β β LiDAR β β IMU β β Force Sensors β β β
β β ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ ββββββββββ¬ββββββββββ β β
β β β β β β β β
β β βββββββββββββββ΄βββββββ¬βββββββ΄βββββββββββββββββββ β β
β β β β β
β β βββββββββββββββΌββββββββββββββ β β
β β β Sensor Fusion & VLA β β β
β β βββββββββββββββ¬ββββββββββββββ β β
β ββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ β
β β β
β ββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ β
β β Reasoning Layer β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ β β
β β β LLM / VLA β β Planning β β Memory β β β
β β β (Language) β β (Navigation) β β (Context) β β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ β β
β ββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββ β
β β β
β ββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ β
β β Control Layer β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ β β
β β β Whole-Body β β Balance β β Locomotion β β β
β β β Control β β (ZMP) β β (Walking) β β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ β β
β ββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββ β
β β β
β ββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ β
β β Hardware Layer β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ β β
β β β Jetson Orin β β Motor Drivers β β Power System β β β
β β β (Edge AI) β β (Dynamixel) β β (Battery) β β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Communication: ROS 2 (DDS Middleware) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Hardware Requirementsβ
Complete System Bill of Materialsβ
| Component | Specification | Quantity | Purpose |
|---|---|---|---|
| Main Compute | Jetson Orin AGX | 1 | Primary AI processing |
| Edge Compute | Jetson Orin Nano | 2 | Distributed processing |
| GPU (Desktop) | RTX 4090 (24GB) | 1 | Training, simulation |
| Cameras | Intel Realsense D435i | 2 | RGB-D perception |
| LiDAR | Ouster OS1-16 | 1 | 3D mapping |
| IMU | BNO055 9-axis | 1 | Balance sensing |
| Motors | Dynamixel XM430-W350 | 20+ | Joint actuation |
| Motor Drivers | Dynamixel U2D2 | 4 | Motor communication |
| Battery | 6S LiPo 10000mAh | 2 | Power system |
| Power Distribution | 5V/12V regulators | 4 | Voltage regulation |
| Frame | Aluminum extrusion | 1 | Robot structure |
System Integrationβ
Complete Launch Fileβ
#!/usr/bin/env python3
"""
Complete Humanoid System Launch File
Integrates all subsystems for full robot operation
"""
from launch import LaunchDescription
from launch.actions import (
DeclareLaunchArgument,
ExecuteProcess,
IncludeLaunchDescription,
RegisterEventHandler,
LogInfo
)
from launch.event_handlers import OnProcessStart
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch.substitutions import LaunchConfiguration, PathJoinSubstitution
from launch_ros.actions import Node
from launch_ros.substitutions import FindPackageShare
def generate_launch_description():
# Configuration
use_sim_time = LaunchConfiguration('use_sim_time', default='false')
robot_model = LaunchConfiguration('robot_model', default='humanoid_v1')
# ========== Perception Stack ==========
perception_launch = IncludeLaunchDescription(
PythonLaunchDescriptionSource([
PathJoinSubstitution([
FindPackageShare('humanoid_perception'),
'launch',
'perception_stack.launch.py'
])
]),
launch_arguments={
'use_sim_time': use_sim_time,
}.items()
)
# ========== Navigation Stack ==========
navigation_launch = IncludeLaunchDescription(
PythonLaunchDescriptionSource([
PathJoinSubstitution([
FindPackageShare('nav2_bringup'),
'launch',
'navigation_launch.py'
])
]),
launch_arguments={
'use_sim_time': use_sim_time,
'params_file': PathJoinSubstitution([
FindPackageShare('humanoid_navigation'),
'config',
'nav2_params.yaml'
])
}.items()
)
# ========== Control Stack ==========
control_launch = IncludeLaunchDescription(
PythonLaunchDescriptionSource([
PathJoinSubstitution([
FindPackageShare('humanoid_control'),
'launch',
'whole_body_control.launch.py'
])
]),
launch_arguments={
'use_sim_time': use_sim_time,
}.items()
)
# ========== Conversational AI Stack ==========
conversational_launch = IncludeLaunchDescription(
PythonLaunchDescriptionSource([
PathJoinSubstitution([
FindPackageShare('humanoid_ai'),
'launch',
'conversational_ai.launch.py'
])
]),
launch_arguments={
'use_sim_time': use_sim_time,
}.items()
)
# ========== Robot State Publisher ==========
robot_state_publisher = Node(
package='robot_state_publisher',
executable='robot_state_publisher',
name='robot_state_publisher',
output='screen',
parameters=[{
'robot_description': open(
PathJoinSubstitution([
FindPackageShare('humanoid_description'),
'urdf',
'humanoid_robot.urdf.xacro'
])
).read(),
'use_sim_time': use_sim_time,
}]
)
# ========== System Monitor ==========
system_monitor = Node(
package='humanoid_system',
executable='system_monitor_node',
name='system_monitor',
output='screen',
parameters=[{
'use_sim_time': use_sim_time,
}]
)
# ========== RViz Visualization ==========
rviz_launch = IncludeLaunchDescription(
PythonLaunchDescriptionSource([
PathJoinSubstitution([
FindPackageShare('rviz2'),
'launch',
'rviz2.launch.py'
])
]),
launch_arguments={
'rviz_config': PathJoinSubstitution([
FindPackageShare('humanoid_viz'),
'config',
'humanoid_system.rviz'
])
}.items()
)
return LaunchDescription([
DeclareLaunchArgument('use_sim_time'),
DeclareLaunchArgument('robot_model'),
LogInfo(msg='Launching Complete Humanoid System...'),
robot_state_publisher,
# Start perception first
perception_launch,
# Then navigation
RegisterEventHandler(
OnProcessStart(
target_action=perception_launch,
on_start=[navigation_launch]
)
),
# Then control
RegisterEventHandler(
OnProcessStart(
target_action=navigation_launch,
on_start=[control_launch]
)
),
# Then conversational AI
RegisterEventHandler(
OnProcessStart(
target_action=control_launch,
on_start=[conversational_launch]
)
),
# System monitor
system_monitor,
# Visualization
rviz_launch,
])
System Monitor Nodeβ
#!/usr/bin/env python3
"""
System Monitor for Complete Humanoid
Monitors system health, performance, and diagnostics
"""
import rclpy
from rclpy.node import Node
from diagnostic_msgs.msg import DiagnosticArray, DiagnosticStatus, KeyValue
from sensor_msgs.msg import BatteryState
import psutil
import torch
class SystemMonitor(Node):
def __init__(self):
super().__init__('system_monitor')
# Publishers
self.diagnostics_pub = self.create_publisher(
DiagnosticArray, '/diagnostics', 10)
self.battery_pub = self.create_publisher(
BatteryState, '/battery_state', 10)
# Timer
self.timer = self.create_timer(1.0, self.monitor_callback)
self.get_logger().info('System Monitor initialized')
def monitor_callback(self):
"""Collect and publish system diagnostics"""
# CPU usage
cpu_percent = psutil.cpu_percent(interval=0.5)
memory = psutil.virtual_memory()
# GPU usage (if available)
gpu_memory = 0
gpu_utilization = 0
if torch.cuda.is_available():
gpu_memory = torch.cuda.memory_allocated() / 1024**2
gpu_utilization = 0 # Would need pynvml for actual utilization
# Create diagnostic array
diag_array = DiagnosticArray()
diag_array.header.stamp = self.get_clock().now().to_msg()
# CPU status
cpu_status = DiagnosticStatus()
cpu_status.name = 'CPU'
cpu_status.level = DiagnosticStatus.OK if cpu_percent < 80 else DiagnosticStatus.WARN
cpu_status.message = f'CPU Usage: {cpu_percent}%'
cpu_status.values = [
KeyValue(key='usage_percent', value=str(cpu_percent)),
KeyValue(key='memory_percent', value=str(memory.percent)),
KeyValue(key='memory_available_mb', value=str(memory.available / 1024**2)),
]
diag_array.status.append(cpu_status)
# GPU status
gpu_status = DiagnosticStatus()
gpu_status.name = 'GPU'
gpu_status.level = DiagnosticStatus.OK
gpu_status.message = f'GPU Memory: {gpu_memory:.0f}MB'
gpu_status.values = [
KeyValue(key='memory_allocated_mb', value=f'{gpu_memory:.0f}'),
KeyValue(key='cuda_available', value=str(torch.cuda.is_available())),
]
diag_array.status.append(gpu_status)
# ROS 2 node status
ros_status = DiagnosticStatus()
ros_status.name = 'ROS 2 System'
ros_status.level = DiagnosticStatus.OK
ros_status.message = 'All systems operational'
ros_status.values = [
KeyValue(key='node_count', value=str(len(self.get_node_names()))),
KeyValue(key='uptime_sec', value=str(self.get_clock().now().nanoseconds / 1e9)),
]
diag_array.status.append(ros_status)
self.diagnostics_pub.publish(diag_array)
# Log summary
self.get_logger().info(
f'System: CPU={cpu_percent}%, Memory={memory.percent}%, '
f'GPU={gpu_memory:.0f}MB'
)
def main(args=None):
rclpy.init(args=args)
node = SystemMonitor()
rclpy.spin(node)
node.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
Voice-to-Action Pipeline (Whisper + LLM + ROS 2)β
This is the core of the PDF's capstone requirement: a robot that hears a voice command, reasons about it, and executes physical actions.
Pipeline Overviewβ
Microphone
β
βΌ
[OpenAI Whisper] βββ Speech-to-Text (ASR)
β
βΌ
[LLM Planner] βββ "Pick up the red mug" β structured action plan
β
βΌ
[ROS 2 Action Server] βββ Navigate β Detect Object β Grasp
β
βΌ
[Robot Hardware / Isaac Sim]
Step 1: Whisper Speech Recognition Nodeβ
#!/usr/bin/env python3
"""
Whisper ASR Node β converts microphone audio to text commands
Requires: pip install openai-whisper sounddevice numpy
"""
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import whisper
import sounddevice as sd
import numpy as np
class WhisperASRNode(Node):
def __init__(self):
super().__init__('whisper_asr_node')
self.model = whisper.load_model("base") # or "small", "medium"
self.command_pub = self.create_publisher(String, '/voice_command', 10)
self.sample_rate = 16000
self.duration = 5 # seconds to record per utterance
self.get_logger().info('Whisper ASR ready β listening...')
self.create_timer(6.0, self.listen_and_publish)
def listen_and_publish(self):
self.get_logger().info('Recording...')
audio = sd.rec(int(self.duration * self.sample_rate),
samplerate=self.sample_rate, channels=1, dtype='float32')
sd.wait()
audio_np = audio.flatten()
result = self.model.transcribe(audio_np, language='en')
text = result['text'].strip()
if text:
self.get_logger().info(f'Heard: "{text}"')
msg = String()
msg.data = text
self.command_pub.publish(msg)
def main(args=None):
rclpy.init(args=args)
rclpy.spin(WhisperASRNode())
rclpy.shutdown()
Step 2: LLM Command Planner Nodeβ
#!/usr/bin/env python3
"""
LLM Planner Node β converts natural language to structured robot actions
"""
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import json
import os
# Uses Groq API (same as course RAG chatbot)
from groq import Groq
SYSTEM_PROMPT = """You are a robot action planner. Convert the user's voice command
into a JSON action plan. Use only these actions:
- navigate_to: {"action": "navigate_to", "target": "kitchen"}
- detect_object: {"action": "detect_object", "object": "red mug"}
- pick_object: {"action": "pick_object", "object": "red mug"}
- place_object: {"action": "place_object", "location": "table"}
- say: {"action": "say", "text": "I found the mug"}
Return a JSON array of actions only. No explanation."""
class LLMPlannerNode(Node):
def __init__(self):
super().__init__('llm_planner_node')
self.client = Groq(api_key=os.getenv('GROK_API_KEY'))
self.command_sub = self.create_subscription(
String, '/voice_command', self.plan_callback, 10)
self.action_pub = self.create_publisher(String, '/action_plan', 10)
self.get_logger().info('LLM Planner ready')
def plan_callback(self, msg: String):
command = msg.data
self.get_logger().info(f'Planning for: "{command}"')
response = self.client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": command}
]
)
plan_json = response.choices[0].message.content
self.get_logger().info(f'Plan: {plan_json}')
out = String()
out.data = plan_json
self.action_pub.publish(out)
def main(args=None):
rclpy.init(args=args)
rclpy.spin(LLMPlannerNode())
rclpy.shutdown()
Step 3: Action Executor Nodeβ
#!/usr/bin/env python3
"""
Action Executor β reads the LLM plan and dispatches ROS 2 commands
"""
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
import json
class ActionExecutorNode(Node):
def __init__(self):
super().__init__('action_executor')
self.plan_sub = self.create_subscription(
String, '/action_plan', self.execute_callback, 10)
self.cmd_vel_pub = self.create_publisher(Twist, '/cmd_vel', 10)
self.speech_pub = self.create_publisher(String, '/tts_text', 10)
def execute_callback(self, msg: String):
try:
actions = json.loads(msg.data)
except json.JSONDecodeError:
self.get_logger().error('Invalid action plan JSON')
return
for action in actions:
name = action.get('action')
if name == 'navigate_to':
self.get_logger().info(f"Navigating to: {action['target']}")
# In real system: send Nav2 goal
elif name == 'detect_object':
self.get_logger().info(f"Detecting: {action['object']}")
# In real system: call YOLO/VLM service
elif name == 'pick_object':
self.get_logger().info(f"Picking: {action['object']}")
# In real system: call manipulation action server
elif name == 'say':
out = String()
out.data = action['text']
self.speech_pub.publish(out)
def main(args=None):
rclpy.init(args=args)
rclpy.spin(ActionExecutorNode())
rclpy.shutdown()
Capstone Projectβ
Project Requirementsβ
Your capstone project should demonstrate:
- Perception: Object detection, depth estimation, or scene understanding
- Reasoning: Language understanding, planning, or decision-making
- Control: Locomotion, manipulation, or balance control
- Integration: Multiple subsystems working together
The Autonomous Humanoid (Main Capstone)β
This is the PDF-defined capstone: a simulated robot that completes a full autonomy loop.
Task: "Pick up the red mug from the kitchen counter"
| Step | What Happens | Technology Used |
|---|---|---|
| 1. Voice Input | User speaks the command | Microphone + Whisper ASR |
| 2. Transcription | Audio β "Pick up the red mug" | OpenAI Whisper (base model) |
| 3. Planning | Text β JSON action plan | LLM (Groq llama-3.3-70b) |
| 4. Navigation | Robot moves toward kitchen | Nav2 + LiDAR SLAM |
| 5. Object Detection | Camera finds the red mug | YOLO / VLM (GPT-4o vision) |
| 6. Manipulation | Arm reaches, grasps mug | Inverse Kinematics + F/T control |
| 7. Confirmation | Robot says "I picked up the mug" | TTS (pyttsx3 / gTTS) |
Example Projectsβ
| Project | Description | Difficulty |
|---|---|---|
| Autonomous Humanoid | Full voiceβplanβnavigateβpick loop (PDF capstone) | Hard |
| Object Fetching | Robot finds and retrieves objects on command | Medium |
| Guided Tour | Robot navigates and explains environment | Medium |
| Interactive Demo | Robot responds to voice commands and gestures | Hard |
| Autonomous Patrol | Robot patrols area with obstacle avoidance | Hard |
Project Templateβ
#!/usr/bin/env python3
"""
Capstone Project Template
Base class for humanoid robot capstone projects
"""
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
from sensor_msgs.msg import Image
from typing import Dict, List
class CapstoneProject(Node):
def __init__(self, project_name: str):
super().__init__(f'capstone_{project_name}')
self.project_name = project_name
# Subscribers
self.command_sub = self.create_subscription(
String, '/language_command',
self.command_callback, 10)
self.camera_sub = self.create_subscription(
Image, '/camera/color/image_raw',
self.camera_callback, 10)
# Publishers
self.cmd_vel_pub = self.create_publisher(
Twist, '/cmd_vel', 10)
self.feedback_pub = self.create_publisher(
String, '/project_feedback', 10)
# State machine
self.state = 'IDLE'
self.state_data: Dict = {}
self.get_logger().info(f'Capstone Project "{project_name}" initialized')
def command_callback(self, msg: String):
"""Process project command"""
command = msg.data.lower()
self.get_logger().info(f'Command: {command}')
if 'start' in command:
self.state = 'RUNNING'
self.send_feedback(f'{self.project_name} started')
elif 'stop' in command:
self.state = 'IDLE'
self.send_feedback(f'{self.project_name} stopped')
else:
self.process_command(command)
def camera_callback(self, msg: Image):
"""Process camera data"""
if self.state == 'RUNNING':
self.process_visual_input(msg)
def process_command(self, command: str):
"""Override: Process specific commands"""
raise NotImplementedError
def process_visual_input(self, image: Image):
"""Override: Process visual input"""
raise NotImplementedError
def send_feedback(self, message: str):
"""Send project feedback"""
msg = String()
msg.data = message
self.feedback_pub.publish(msg)
self.get_logger().info(f'Feedback: {message}')
# Example implementation
class ObjectFetchingProject(CapstoneProject):
def __init__(self):
super().__init__('object_fetching')
self.target_object = None
self.detected_objects: List[str] = []
def process_command(self, command: str):
if 'fetch' in command:
# Extract object name
self.target_object = command.split('fetch')[-1].strip()
self.state = 'SEARCHING'
self.send_feedback(f'Searching for {self.target_object}')
def process_visual_input(self, image: Image):
if self.state == 'SEARCHING':
# Run object detection
# If target found, transition to NAVIGATING
pass
def main(args=None):
rclpy.init(args=args)
# Choose project
project = ObjectFetchingProject()
rclpy.spin(project)
project.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
Deploymentβ
Edge Deployment Scriptβ
#!/bin/bash
# Deploy Humanoid System to Jetson Orin
JETSON_IP="192.168.1.100"
JETSON_USER="nvidia"
echo "Building packages..."
colcon build --packages-select humanoid_system humanoid_perception humanoid_control
echo "Syncing to Jetson..."
rsync -avz --exclude='build' --exclude='install' \
~/humanoid_ws/ ${JETSON_USER}@${JETSON_IP}:~/humanoid_ws/
echo "Installing dependencies on Jetson..."
ssh ${JETSON_USER}@${JETSON_IP} << 'EOF'
cd ~/humanoid_ws
source /opt/ros/humble/setup.bash
rosdep install --from-paths src --ignore-src -r -y
colcon build --symlink-install
EOF
echo "Deployment complete!"
echo "SSH into Jetson and run: ros2 launch humanoid_system complete_system.launch.py"
Resourcesβ
Documentationβ
Toolsβ
- Foxglove Studio - Data visualization
- ros2doctor - System diagnostics
- htop - Process monitoring
Exercisesβ
- System Integration: Launch all subsystems together
- Performance Testing: Measure system latency and throughput
- Capstone Project: Implement and demonstrate a complete project
- Documentation: Create project documentation and demo video
Curriculum Completionβ
Congratulations on completing the 13-week Physical AI & Humanoid Robotics curriculum!
What You've Learnedβ
- Weeks 1-2: Foundation - Physical AI concepts and architecture
- Weeks 3-5: ROS 2 - Middleware, nodes, topics, services, actions
- Weeks 6-7: Gazebo - Robot simulation and modeling
- Weeks 8-10: NVIDIA Isaac - GPU-accelerated simulation and perception
- Weeks 11-13: Humanoid - Kinematics, control, conversational AI, integration
Next Stepsβ
- Continue with advanced topics (RL, advanced manipulation)
- Contribute to open-source robotics projects
- Build your own humanoid robot platform
- Pursue research in Physical AI