
Week 13 - Complete Humanoid System

Learning Objectives​

By the end of this week, you will be able to:

  • Integrate all curriculum modules into a complete humanoid system
  • Deploy Physical AI systems on edge hardware
  • Design and execute a capstone project
  • Evaluate system performance and plan future improvements

Overview​

This final week brings together all components from the 13-week curriculum into a complete, deployable humanoid robot system. You will integrate perception, reasoning, control, and conversational AI into a unified platform.

Complete System Architecture​

Complete Humanoid System
(all layers communicate over ROS 2 / DDS middleware)

  Perception Layer
    Camera Β· LiDAR Β· IMU Β· Force Sensors
                β”‚
                β–Ό
        Sensor Fusion & VLA
                β”‚
                β–Ό
  Reasoning Layer
    LLM / VLA (Language) Β· Planning (Navigation) Β· Memory (Context)
                β”‚
                β–Ό
  Control Layer
    Whole-Body Control Β· Balance (ZMP) Β· Locomotion (Walking)
                β”‚
                β–Ό
  Hardware Layer
    Jetson Orin (Edge AI) Β· Motor Drivers (Dynamixel) Β· Power System (Battery)
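
In code, these layers map onto the ROS 2 packages used by the launch files and nodes later on this page. One possible workspace layout (the directory arrangement is only a suggestion; the package names are the ones referenced below):

humanoid_ws/src/
β”œβ”€β”€ humanoid_description/   # URDF/Xacro robot model
β”œβ”€β”€ humanoid_perception/    # camera, LiDAR, IMU drivers + sensor fusion
β”œβ”€β”€ humanoid_navigation/    # Nav2 configuration (nav2_params.yaml)
β”œβ”€β”€ humanoid_control/       # whole-body control, balance, locomotion
β”œβ”€β”€ humanoid_ai/            # Whisper ASR, LLM planner, conversational AI
β”œβ”€β”€ humanoid_viz/           # RViz configuration
└── humanoid_system/        # top-level launch files, system monitor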

Hardware Requirements​

Complete System Bill of Materials​

| Component | Specification | Quantity | Purpose |
|---|---|---|---|
| Main Compute | Jetson Orin AGX | 1 | Primary AI processing |
| Edge Compute | Jetson Orin Nano | 2 | Distributed processing |
| GPU (Desktop) | RTX 4090 (24GB) | 1 | Training, simulation |
| Cameras | Intel RealSense D435i | 2 | RGB-D perception |
| LiDAR | Ouster OS1-16 | 1 | 3D mapping |
| IMU | BNO055 9-axis | 1 | Balance sensing |
| Motors | Dynamixel XM430-W350 | 20+ | Joint actuation |
| Motor Drivers | Dynamixel U2D2 | 4 | Motor communication |
| Battery | 6S LiPo 10000mAh | 2 | Power system |
| Power Distribution | 5V/12V regulators | 4 | Voltage regulation |
| Frame | Aluminum extrusion | 1 | Robot structure |

System Integration​

Complete Launch File​

#!/usr/bin/env python3
"""
Complete Humanoid System Launch File
Integrates all subsystems for full robot operation
"""

from launch import LaunchDescription
from launch.actions import (
    DeclareLaunchArgument,
    IncludeLaunchDescription,
    LogInfo,
    TimerAction,
)
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch.substitutions import Command, LaunchConfiguration, PathJoinSubstitution
from launch_ros.actions import Node
from launch_ros.parameter_descriptions import ParameterValue
from launch_ros.substitutions import FindPackageShare


def generate_launch_description():
    # Configuration
    use_sim_time = LaunchConfiguration('use_sim_time')
    robot_model = LaunchConfiguration('robot_model')

    # ========== Perception Stack ==========
    perception_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('humanoid_perception'),
                'launch',
                'perception_stack.launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
        }.items()
    )

    # ========== Navigation Stack ==========
    navigation_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('nav2_bringup'),
                'launch',
                'navigation_launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
            'params_file': PathJoinSubstitution([
                FindPackageShare('humanoid_navigation'),
                'config',
                'nav2_params.yaml'
            ])
        }.items()
    )

    # ========== Control Stack ==========
    control_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('humanoid_control'),
                'launch',
                'whole_body_control.launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
        }.items()
    )

    # ========== Conversational AI Stack ==========
    conversational_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource([
            PathJoinSubstitution([
                FindPackageShare('humanoid_ai'),
                'launch',
                'conversational_ai.launch.py'
            ])
        ]),
        launch_arguments={
            'use_sim_time': use_sim_time,
        }.items()
    )

    # ========== Robot State Publisher ==========
    # Process the Xacro model at launch time instead of reading the raw file
    robot_description = ParameterValue(
        Command([
            'xacro ',
            PathJoinSubstitution([
                FindPackageShare('humanoid_description'),
                'urdf',
                'humanoid_robot.urdf.xacro'
            ])
        ]),
        value_type=str
    )

    robot_state_publisher = Node(
        package='robot_state_publisher',
        executable='robot_state_publisher',
        name='robot_state_publisher',
        output='screen',
        parameters=[{
            'robot_description': robot_description,
            'use_sim_time': use_sim_time,
        }]
    )

    # ========== System Monitor ==========
    system_monitor = Node(
        package='humanoid_system',
        executable='system_monitor_node',
        name='system_monitor',
        output='screen',
        parameters=[{
            'use_sim_time': use_sim_time,
        }]
    )

    # ========== RViz Visualization ==========
    rviz_node = Node(
        package='rviz2',
        executable='rviz2',
        name='rviz2',
        output='screen',
        arguments=['-d', PathJoinSubstitution([
            FindPackageShare('humanoid_viz'),
            'config',
            'humanoid_system.rviz'
        ])]
    )

    return LaunchDescription([
        DeclareLaunchArgument('use_sim_time', default_value='false'),
        DeclareLaunchArgument('robot_model', default_value='humanoid_v1'),

        LogInfo(msg='Launching Complete Humanoid System...'),

        robot_state_publisher,

        # Start perception first
        perception_launch,

        # Stagger the remaining stacks so each has time to come up.
        # (OnProcessStart cannot target an IncludeLaunchDescription, so
        # simple timer delays are used for the startup order here.)
        TimerAction(period=5.0, actions=[navigation_launch]),
        TimerAction(period=10.0, actions=[control_launch]),
        TimerAction(period=15.0, actions=[conversational_launch]),

        # System monitor
        system_monitor,

        # Visualization
        rviz_node,
    ])
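
Once the workspace is built and sourced, the whole stack can be brought up with ros2 launch humanoid_system complete_system.launch.py (the same command the deployment script at the end of this page prints); pass use_sim_time:=true when running against simulation instead of hardware.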

System Monitor Node​

#!/usr/bin/env python3
"""
System Monitor for Complete Humanoid
Monitors system health, performance, and diagnostics
"""

import rclpy
from rclpy.node import Node
from diagnostic_msgs.msg import DiagnosticArray, DiagnosticStatus, KeyValue
from sensor_msgs.msg import BatteryState
import psutil
import torch


class SystemMonitor(Node):
    def __init__(self):
        super().__init__('system_monitor')

        # Publishers
        self.diagnostics_pub = self.create_publisher(
            DiagnosticArray, '/diagnostics', 10)
        self.battery_pub = self.create_publisher(
            BatteryState, '/battery_state', 10)

        # Remember the start time so uptime can be reported
        self.start_time = self.get_clock().now()

        # Timer
        self.timer = self.create_timer(1.0, self.monitor_callback)

        self.get_logger().info('System Monitor initialized')

    def monitor_callback(self):
        """Collect and publish system diagnostics"""
        # CPU usage (interval=None is non-blocking: usage since the last call)
        cpu_percent = psutil.cpu_percent(interval=None)
        memory = psutil.virtual_memory()

        # GPU usage (if available)
        gpu_memory = 0.0
        if torch.cuda.is_available():
            gpu_memory = torch.cuda.memory_allocated() / 1024**2
            # Actual GPU utilization would require pynvml

        # Create diagnostic array
        diag_array = DiagnosticArray()
        diag_array.header.stamp = self.get_clock().now().to_msg()

        # CPU status
        cpu_status = DiagnosticStatus()
        cpu_status.name = 'CPU'
        cpu_status.level = DiagnosticStatus.OK if cpu_percent < 80 else DiagnosticStatus.WARN
        cpu_status.message = f'CPU Usage: {cpu_percent}%'
        cpu_status.values = [
            KeyValue(key='usage_percent', value=str(cpu_percent)),
            KeyValue(key='memory_percent', value=str(memory.percent)),
            KeyValue(key='memory_available_mb', value=str(memory.available / 1024**2)),
        ]
        diag_array.status.append(cpu_status)

        # GPU status
        gpu_status = DiagnosticStatus()
        gpu_status.name = 'GPU'
        gpu_status.level = DiagnosticStatus.OK
        gpu_status.message = f'GPU Memory: {gpu_memory:.0f}MB'
        gpu_status.values = [
            KeyValue(key='memory_allocated_mb', value=f'{gpu_memory:.0f}'),
            KeyValue(key='cuda_available', value=str(torch.cuda.is_available())),
        ]
        diag_array.status.append(gpu_status)

        # ROS 2 node status
        uptime_sec = (self.get_clock().now() - self.start_time).nanoseconds / 1e9
        ros_status = DiagnosticStatus()
        ros_status.name = 'ROS 2 System'
        ros_status.level = DiagnosticStatus.OK
        ros_status.message = 'All systems operational'
        ros_status.values = [
            KeyValue(key='node_count', value=str(len(self.get_node_names()))),
            KeyValue(key='uptime_sec', value=str(uptime_sec)),
        ]
        diag_array.status.append(ros_status)

        self.diagnostics_pub.publish(diag_array)

        # Log summary
        self.get_logger().info(
            f'System: CPU={cpu_percent}%, Memory={memory.percent}%, '
            f'GPU={gpu_memory:.0f}MB'
        )


def main(args=None):
    rclpy.init(args=args)
    node = SystemMonitor()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
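
The node also creates a /battery_state publisher that the snippet above never fills in. A minimal sketch of a publish method you could add to the SystemMonitor class (the pack voltage itself would come from your own power-monitoring hardware, which is not shown here):

    def publish_battery(self, voltage: float):
        """Publish a BatteryState message for the 6S LiPo pack (sketch)."""
        msg = BatteryState()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.voltage = voltage
        # Rough estimate for a 6S pack: ~19.8 V empty, 25.2 V full
        msg.percentage = max(0.0, min(1.0, (voltage - 19.8) / (25.2 - 19.8)))
        msg.power_supply_status = BatteryState.POWER_SUPPLY_STATUS_DISCHARGING
        self.battery_pub.publish(msg)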

Voice-to-Action Pipeline (Whisper + LLM + ROS 2)​

This is the core of the PDF's capstone requirement: a robot that hears a voice command, reasons about it, and executes physical actions.

Pipeline Overview​

Microphone
     β”‚
     β–Ό
[OpenAI Whisper]        ←── Speech-to-Text (ASR)
     β”‚
     β–Ό
[LLM Planner]           ←── "Pick up the red mug" β†’ structured action plan
     β”‚
     β–Ό
[ROS 2 Action Server]   ←── Navigate β†’ Detect Object β†’ Grasp
     β”‚
     β–Ό
[Robot Hardware / Isaac Sim]
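
The three pipeline stages are implemented as the three nodes in Steps 1–3 below. A minimal launch sketch that brings them up together, assuming they are installed as executables in the humanoid_ai package (the executable names here are assumptions matching the node files below):

#!/usr/bin/env python3
"""Launch sketch: starts the voice-to-action pipeline nodes from Steps 1-3."""

from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(package='humanoid_ai', executable='whisper_asr_node', output='screen'),
        Node(package='humanoid_ai', executable='llm_planner_node', output='screen'),
        Node(package='humanoid_ai', executable='action_executor', output='screen'),
    ])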

Step 1: Whisper Speech Recognition Node​

#!/usr/bin/env python3
"""
Whisper ASR Node β€” converts microphone audio to text commands
Requires: pip install openai-whisper sounddevice numpy
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import whisper
import sounddevice as sd
import numpy as np


class WhisperASRNode(Node):
    def __init__(self):
        super().__init__('whisper_asr_node')
        self.model = whisper.load_model("base")  # or "small", "medium"
        self.command_pub = self.create_publisher(String, '/voice_command', 10)
        self.sample_rate = 16000  # Whisper expects 16 kHz mono audio
        self.duration = 5  # seconds to record per utterance
        self.get_logger().info('Whisper ASR ready β€” listening...')
        self.create_timer(6.0, self.listen_and_publish)

    def listen_and_publish(self):
        self.get_logger().info('Recording...')
        audio = sd.rec(int(self.duration * self.sample_rate),
                       samplerate=self.sample_rate, channels=1, dtype='float32')
        sd.wait()
        audio_np = audio.flatten()
        result = self.model.transcribe(audio_np, language='en')
        text = result['text'].strip()
        if text:
            self.get_logger().info(f'Heard: "{text}"')
            msg = String()
            msg.data = text
            self.command_pub.publish(msg)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(WhisperASRNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()

Step 2: LLM Command Planner Node​

#!/usr/bin/env python3
"""
LLM Planner Node β€” converts natural language to structured robot actions
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import os

# Uses the Groq API (same as the course RAG chatbot)
from groq import Groq

SYSTEM_PROMPT = """You are a robot action planner. Convert the user's voice command
into a JSON action plan. Use only these actions:
- navigate_to: {"action": "navigate_to", "target": "kitchen"}
- detect_object: {"action": "detect_object", "object": "red mug"}
- pick_object: {"action": "pick_object", "object": "red mug"}
- place_object: {"action": "place_object", "location": "table"}
- say: {"action": "say", "text": "I found the mug"}

Return a JSON array of actions only. No explanation."""


class LLMPlannerNode(Node):
    def __init__(self):
        super().__init__('llm_planner_node')
        self.client = Groq(api_key=os.getenv('GROQ_API_KEY'))
        self.command_sub = self.create_subscription(
            String, '/voice_command', self.plan_callback, 10)
        self.action_pub = self.create_publisher(String, '/action_plan', 10)
        self.get_logger().info('LLM Planner ready')

    def plan_callback(self, msg: String):
        command = msg.data
        self.get_logger().info(f'Planning for: "{command}"')
        response = self.client.chat.completions.create(
            model="llama-3.3-70b-versatile",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": command}
            ]
        )
        plan_json = response.choices[0].message.content
        self.get_logger().info(f'Plan: {plan_json}')
        out = String()
        out.data = plan_json
        self.action_pub.publish(out)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(LLMPlannerNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
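
For a command such as "Pick up the red mug from the kitchen counter", the planner is expected to publish a plan on /action_plan that follows the schema in SYSTEM_PROMPT, for example (illustrative only; the exact output depends on the model):

[
  {"action": "navigate_to", "target": "kitchen"},
  {"action": "detect_object", "object": "red mug"},
  {"action": "pick_object", "object": "red mug"},
  {"action": "say", "text": "I picked up the red mug"}
]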

Step 3: Action Executor Node​

#!/usr/bin/env python3
"""
Action Executor β€” reads the LLM plan and dispatches ROS 2 commands
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
import json


class ActionExecutorNode(Node):
    def __init__(self):
        super().__init__('action_executor')
        self.plan_sub = self.create_subscription(
            String, '/action_plan', self.execute_callback, 10)
        self.cmd_vel_pub = self.create_publisher(Twist, '/cmd_vel', 10)
        self.speech_pub = self.create_publisher(String, '/tts_text', 10)

    def execute_callback(self, msg: String):
        try:
            actions = json.loads(msg.data)
        except json.JSONDecodeError:
            self.get_logger().error('Invalid action plan JSON')
            return

        for action in actions:
            name = action.get('action')
            if name == 'navigate_to':
                self.get_logger().info(f"Navigating to: {action['target']}")
                # In real system: send Nav2 goal (see the sketch after this block)
            elif name == 'detect_object':
                self.get_logger().info(f"Detecting: {action['object']}")
                # In real system: call YOLO/VLM service
            elif name == 'pick_object':
                self.get_logger().info(f"Picking: {action['object']}")
                # In real system: call manipulation action server
            elif name == 'place_object':
                self.get_logger().info(f"Placing at: {action['location']}")
                # In real system: call manipulation action server
            elif name == 'say':
                out = String()
                out.data = action['text']
                self.speech_pub.publish(out)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(ActionExecutorNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
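
The navigate_to branch above only logs. A minimal sketch of how it might send a real Nav2 goal through an action client, assuming named waypoints on the robot's map (the WAYPOINTS lookup from place names to map coordinates is hypothetical):

from rclpy.action import ActionClient
from rclpy.node import Node
from nav2_msgs.action import NavigateToPose
from geometry_msgs.msg import PoseStamped

# Hypothetical lookup from place names used by the LLM to map coordinates
WAYPOINTS = {'kitchen': (2.0, 1.5), 'kitchen counter': (2.5, 1.2), 'table': (0.5, 3.0)}


class Nav2GoalSender(Node):
    def __init__(self):
        super().__init__('nav2_goal_sender')
        self.nav_client = ActionClient(self, NavigateToPose, 'navigate_to_pose')

    def navigate_to(self, target: str):
        """Send a NavigateToPose goal for a named waypoint."""
        if target not in WAYPOINTS:
            self.get_logger().warn(f'Unknown waypoint: {target}')
            return
        x, y = WAYPOINTS[target]
        goal = NavigateToPose.Goal()
        goal.pose = PoseStamped()
        goal.pose.header.frame_id = 'map'
        goal.pose.header.stamp = self.get_clock().now().to_msg()
        goal.pose.pose.position.x = x
        goal.pose.pose.position.y = y
        goal.pose.pose.orientation.w = 1.0
        self.nav_client.wait_for_server()
        self.nav_client.send_goal_async(goal)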

Capstone Project​

Project Requirements​

Your capstone project should demonstrate:

  1. Perception: Object detection, depth estimation, or scene understanding
  2. Reasoning: Language understanding, planning, or decision-making
  3. Control: Locomotion, manipulation, or balance control
  4. Integration: Multiple subsystems working together

The Autonomous Humanoid (Main Capstone)​

This is the PDF-defined capstone: a simulated robot that completes a full autonomy loop.

Task: "Pick up the red mug from the kitchen counter"

| Step | What Happens | Technology Used |
|---|---|---|
| 1. Voice Input | User speaks the command | Microphone + Whisper ASR |
| 2. Transcription | Audio β†’ "Pick up the red mug" | OpenAI Whisper (base model) |
| 3. Planning | Text β†’ JSON action plan | LLM (Groq llama-3.3-70b) |
| 4. Navigation | Robot moves toward kitchen | Nav2 + LiDAR SLAM |
| 5. Object Detection | Camera finds the red mug | YOLO / VLM (GPT-4o vision) |
| 6. Manipulation | Arm reaches, grasps mug | Inverse Kinematics + F/T control |
| 7. Confirmation | Robot says "I picked up the mug" | TTS (pyttsx3 / gTTS) |
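
Step 7 needs a text-to-speech node on the receiving end of the /tts_text topic published by the action executor above. A minimal sketch using pyttsx3, one of the TTS options listed in the table:

#!/usr/bin/env python3
"""Minimal TTS node: speaks whatever arrives on /tts_text (sketch using pyttsx3)."""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import pyttsx3


class TTSNode(Node):
    def __init__(self):
        super().__init__('tts_node')
        self.engine = pyttsx3.init()  # uses the default system speech engine
        self.create_subscription(String, '/tts_text', self.speak_callback, 10)
        self.get_logger().info('TTS node ready')

    def speak_callback(self, msg: String):
        self.get_logger().info(f'Speaking: "{msg.data}"')
        self.engine.say(msg.data)
        self.engine.runAndWait()  # blocks until the utterance finishes


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(TTSNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()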

Example Projects​

| Project | Description | Difficulty |
|---|---|---|
| Autonomous Humanoid | Full voice β†’ plan β†’ navigate β†’ pick loop (PDF capstone) | Hard |
| Object Fetching | Robot finds and retrieves objects on command | Medium |
| Guided Tour | Robot navigates and explains environment | Medium |
| Interactive Demo | Robot responds to voice commands and gestures | Hard |
| Autonomous Patrol | Robot patrols area with obstacle avoidance | Hard |

Project Template​

#!/usr/bin/env python3
"""
Capstone Project Template
Base class for humanoid robot capstone projects
"""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
from sensor_msgs.msg import Image
from typing import Dict, List


class CapstoneProject(Node):
    def __init__(self, project_name: str):
        super().__init__(f'capstone_{project_name}')

        self.project_name = project_name

        # Subscribers
        self.command_sub = self.create_subscription(
            String, '/language_command',
            self.command_callback, 10)
        self.camera_sub = self.create_subscription(
            Image, '/camera/color/image_raw',
            self.camera_callback, 10)

        # Publishers
        self.cmd_vel_pub = self.create_publisher(
            Twist, '/cmd_vel', 10)
        self.feedback_pub = self.create_publisher(
            String, '/project_feedback', 10)

        # State machine
        self.state = 'IDLE'
        self.state_data: Dict = {}

        self.get_logger().info(f'Capstone Project "{project_name}" initialized')

    def command_callback(self, msg: String):
        """Process project command"""
        command = msg.data.lower()
        self.get_logger().info(f'Command: {command}')

        if 'start' in command:
            self.state = 'RUNNING'
            self.send_feedback(f'{self.project_name} started')
        elif 'stop' in command:
            self.state = 'IDLE'
            self.send_feedback(f'{self.project_name} stopped')
        else:
            self.process_command(command)

    def camera_callback(self, msg: Image):
        """Process camera data in any active (non-IDLE) state"""
        if self.state != 'IDLE':
            self.process_visual_input(msg)

    def process_command(self, command: str):
        """Override: Process specific commands"""
        raise NotImplementedError

    def process_visual_input(self, image: Image):
        """Override: Process visual input"""
        raise NotImplementedError

    def send_feedback(self, message: str):
        """Send project feedback"""
        msg = String()
        msg.data = message
        self.feedback_pub.publish(msg)
        self.get_logger().info(f'Feedback: {message}')


# Example implementation
class ObjectFetchingProject(CapstoneProject):
    def __init__(self):
        super().__init__('object_fetching')
        self.target_object = None
        self.detected_objects: List[str] = []

    def process_command(self, command: str):
        if 'fetch' in command:
            # Extract object name
            self.target_object = command.split('fetch')[-1].strip()
            self.state = 'SEARCHING'
            self.send_feedback(f'Searching for {self.target_object}')

    def process_visual_input(self, image: Image):
        if self.state == 'SEARCHING':
            # Run object detection
            # If target found, transition to NAVIGATING
            pass


def main(args=None):
    rclpy.init(args=args)

    # Choose project
    project = ObjectFetchingProject()

    rclpy.spin(project)
    project.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

Deployment​

Edge Deployment Script​

#!/bin/bash
# Deploy Humanoid System to Jetson Orin

JETSON_IP="192.168.1.100"
JETSON_USER="nvidia"

echo "Building packages..."
colcon build --packages-select humanoid_system humanoid_perception humanoid_control

echo "Syncing to Jetson..."
rsync -avz --exclude='build' --exclude='install' \
~/humanoid_ws/ ${JETSON_USER}@${JETSON_IP}:~/humanoid_ws/

echo "Installing dependencies on Jetson..."
ssh ${JETSON_USER}@${JETSON_IP} << 'EOF'
cd ~/humanoid_ws
source /opt/ros/humble/setup.bash
rosdep install --from-paths src --ignore-src -r -y
colcon build --symlink-install
EOF

echo "Deployment complete!"
echo "SSH into Jetson and run: ros2 launch humanoid_system complete_system.launch.py"

Resources​

Documentation​

Tools​

Exercises​

  1. System Integration: Launch all subsystems together
  2. Performance Testing: Measure system latency and throughput (see the latency sketch after this list)
  3. Capstone Project: Implement and demonstrate a complete project
  4. Documentation: Create project documentation and demo video
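
For Exercise 2, a minimal latency sketch: a node that timestamps each /voice_command and measures how long the LLM planner takes to publish the corresponding /action_plan. Topic names follow the voice-to-action pipeline above; this measures only the planner stage, not full end-to-end latency.

#!/usr/bin/env python3
"""Measures voice-command to action-plan latency (planner stage only)."""

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class PlannerLatencyMonitor(Node):
    def __init__(self):
        super().__init__('planner_latency_monitor')
        self.last_command_time = None
        self.create_subscription(String, '/voice_command', self.command_cb, 10)
        self.create_subscription(String, '/action_plan', self.plan_cb, 10)

    def command_cb(self, msg: String):
        # Remember when the command was heard
        self.last_command_time = self.get_clock().now()

    def plan_cb(self, msg: String):
        if self.last_command_time is None:
            return
        latency = (self.get_clock().now() - self.last_command_time).nanoseconds / 1e9
        self.get_logger().info(f'Planner latency: {latency:.2f} s')
        self.last_command_time = None


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(PlannerLatencyMonitor())
    rclpy.shutdown()


if __name__ == '__main__':
    main()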

Curriculum Completion​

Congratulations on completing the 13-week Physical AI & Humanoid Robotics curriculum!

What You've Learned​

  • Weeks 1-2: Foundation - Physical AI concepts and architecture
  • Weeks 3-5: ROS 2 - Middleware, nodes, topics, services, actions
  • Weeks 6-7: Gazebo - Robot simulation and modeling
  • Weeks 8-10: NVIDIA Isaac - GPU-accelerated simulation and perception
  • Weeks 11-13: Humanoid - Kinematics, control, conversational AI, integration

Next Steps​

  • Continue with advanced topics (RL, advanced manipulation)
  • Contribute to open-source robotics projects
  • Build your own humanoid robot platform
  • Pursue research in Physical AI