T.M.I IThub


Complete Tutorial: Deep RL for Autonomous Obstacle Avoidance (Lidar + ROS 2 + Isaac Sim)


This comprehensive guide walks you through building a drone obstacle avoidance system from scratch, combining Reinforcement Learning theory, ROS 2 infrastructure, and Isaac Sim physics simulation. By the end, you'll have a trained "brain" that can be deployed to real hardware such as an NVIDIA Jetson.

1. Technology Stack

  • OS: Ubuntu 22.04 LTS (recommended for stability)

  • Communication: ROS 2 Humble (LTS)

  • Simulator: NVIDIA Isaac Sim 2023.1.1 or later (Omniverse)

  • RL Libraries: Stable Baselines3 (PPO), PyTorch 2.0+, Gymnasium

  • Sensor: 3D LiDAR (simulated Velodyne VLP-16 in Isaac Sim)

  • Python: 3.10+

2. Environment Setup

2.1 Install ROS 2 Humble

# Add ROS 2 repository
sudo apt update && sudo apt install software-properties-common
sudo add-apt-repository universe
sudo apt update && sudo apt install curl -y
sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null

# Install ROS 2 Humble
sudo apt update
sudo apt install ros-humble-desktop ros-humble-ros2bag ros-humble-rosbag2-storage-default-plugins

# Source ROS 2
echo "source /opt/ros/humble/setup.bash" >> ~/.bashrc
source ~/.bashrc

2.2 Install Isaac Sim

  1. Download and install NVIDIA Omniverse Launcher

  2. From the launcher, install Isaac Sim 2023.1.1 or later

  3. Enable the ROS 2 bridge extension:

    • Open Isaac Sim

    • Go to Window → Extensions

    • Search for omni.isaac.ros2_bridge

    • Enable it and allow dependencies to install

2.3 Install Python Dependencies

# Create a virtual environment (recommended).
# --system-site-packages keeps the ROS 2 Python packages (rclpy, sensor_msgs_py)
# visible inside the venv.
python3 -m venv --system-site-packages ~/rl_drone_env
source ~/rl_drone_env/bin/activate

# Install core dependencies
pip install --upgrade pip
pip install gymnasium==0.29.1
pip install stable-baselines3[extra]==2.2.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install tensorboard
pip install numpy scipy
pip install open3d  # For point cloud visualization/processing

# Note: the ROS 2 Python bindings (rclpy, sensor_msgs_py, geometry_msgs) come from
# the ros-humble-desktop install, not from PyPI. Source /opt/ros/humble/setup.bash
# before running the scripts below so they are importable.

3. Isaac Sim Scene Setup

3.1 Create the Drone Model

  1. Launch Isaac Sim and create a new stage

  2. Add the quadcopter:

    • Go to Isaac Utils → Robots → Quadcopter

    • Or manually: Create → Physics → Rigid Body, then add mesh and configure

  3. Configure physics properties:

    Mass: 1.5 kg
    Center of Mass: (0, 0, -0.05)
    Linear Damping: 0.05
    Angular Damping: 0.05
    
  4. Add motors/thrusters: The Isaac quadcopter comes with 4 rotor controllers

3.2 Add LiDAR Sensor

  1. Add Velodyne VLP-16:

    • Select your drone root prim

    • Go to Create → Isaac → Sensors → Lidar → Rotating

    • Set position: (0.1, 0, 0.05) relative to drone body

  2. Configure LiDAR parameters:

    Horizontal Resolution: 1.0 degree (360 beams)
    Vertical FOV: 30 degrees (-15 to +15)
    Vertical Resolution: 2 degrees (16 channels)
    Range: 0.1 to 10.0 meters
    Rotation Rate: 10 Hz
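A quick back-of-the-envelope check of what these settings produce (an upper bound; the actual number of returns depends on scene geometry):

```python
horizontal_beams = 360          # 360 degrees at 1.0-degree resolution
vertical_channels = 16          # 30-degree vertical FOV in 2-degree steps
rotation_hz = 10                # revolutions per second

points_per_rev = horizontal_beams * vertical_channels
points_per_sec = points_per_rev * rotation_hz
print(points_per_rev, points_per_sec)  # 5760 57600
```

At roughly 5,760 points per revolution, the pure-Python parsing loop in section 4 is workable at 10 Hz, but vectorized NumPy processing leaves more headroom.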
    

3.3 Setup Action Graph for ROS 2

  1. Open Action Graph:

    • Window → Visual Scripting → Action Graph

  2. Create ROS 2 publishing node:

    Add nodes:
    - On Playback Tick (trigger)
    - Isaac Read Lidar Point Cloud
    - ROS2 Publish Point Cloud
    - ROS2 Context (set domain_id = 0)
    
    Connections:
    - Tick → Read Lidar → Publish Point Cloud
    - Set topic name: "/lidar/points"
    - Frame ID: "lidar_frame"
    
  3. Create velocity command subscriber:

    Add nodes:
    - ROS2 Subscribe Twist
    - Isaac Articulation Controller
    
    Set topic: "/cmd_vel"
    Connect Twist message to quadcopter controller
    

3.4 Create Training Environment

  1. Add obstacles: Create a room with walls and random obstacles

    - Create → Shapes → Cube (for walls)
    - Scale: Make walls 0.1m thick, 3m high
    - Add physics collision
    - Add 5-10 random boxes/cylinders as obstacles
    
  2. Add randomization script (optional but recommended):

    • Randomize obstacle positions on reset

    • Randomize lighting conditions

    • Add procedural textures

  3. Save the scene: File → Save As → drone_training_env.usd

4. Point Cloud Processing Implementation

Here's the complete implementation for converting 3D LiDAR data to a 360-dimensional observation:

Code: point_cloud_processor.py

import numpy as np
from sensor_msgs.msg import PointCloud2
import struct

class PointCloudProcessor:
    def __init__(self, num_beams=360, max_range=10.0, height_threshold=0.5):
        """
        Args:
            num_beams: Number of angular bins (360 for 1-degree resolution)
            max_range: Maximum detection distance (normalized to 1.0)
            height_threshold: Accept points within ±height_threshold meters
        """
        self.num_beams = num_beams
        self.max_range = max_range
        self.height_threshold = height_threshold
        
    def process_point_cloud(self, msg: PointCloud2) -> np.ndarray:
        """
        Convert 3D PointCloud2 to 360-dimensional array (bird's-eye view)
        
        Process:
        1. Parse binary point cloud data
        2. Filter points by height (take horizontal slice)
        3. Convert to polar coordinates
        4. Bin by angle (360 bins)
        5. Take minimum distance in each bin
        6. Normalize to [0, 1]
        """
        # Initialize output array (1.0 = no obstacle detected)
        ranges = np.ones(self.num_beams, dtype=np.float32)
        
        # Parse PointCloud2 binary data
        points = self._parse_pointcloud2(msg)
        
        if points is None or len(points) == 0:
            return ranges
        
        # Extract x, y, z
        x = points[:, 0]
        y = points[:, 1]
        z = points[:, 2]
        
        # Filter by height: only keep points near drone's horizontal plane
        height_mask = np.abs(z) < self.height_threshold
        x = x[height_mask]
        y = y[height_mask]
        
        if len(x) == 0:
            return ranges
        
        # Convert to polar coordinates
        distances = np.sqrt(x**2 + y**2)
        angles = np.arctan2(y, x)  # Range: [-pi, pi]
        
        # Convert angles to degrees and shift to [0, 360)
        angles_deg = np.degrees(angles) % 360
        
        # Bin points by angle
        angle_bins = (angles_deg * self.num_beams / 360.0).astype(np.int32)
        angle_bins = np.clip(angle_bins, 0, self.num_beams - 1)
        
        # For each bin, keep the minimum (normalized) distance
        np.minimum.at(ranges, angle_bins, distances / self.max_range)
        
        # Clip to [0, 1] range
        ranges = np.clip(ranges, 0.0, 1.0)
        
        return ranges
    
    def _parse_pointcloud2(self, msg: PointCloud2) -> np.ndarray:
        """
        Parse binary PointCloud2 data to numpy array
        
        PointCloud2 format:
        - data: byte array with packed XYZ values
        - point_step: bytes per point (usually 16: x,y,z,intensity as floats)
        - row_step: bytes per row
        """
        # Determine format
        point_step = msg.point_step
        num_points = len(msg.data) // point_step
        
        if num_points == 0:
            return None
        
        # Parse binary data (assumes XYZ as first 3 floats)
        points = []
        for i in range(num_points):
            offset = i * point_step
            # Extract x, y, z (first 12 bytes = 3 floats)
            x, y, z = struct.unpack('fff', msg.data[offset:offset+12])
            points.append([x, y, z])
        
        return np.array(points, dtype=np.float32)
    
    def visualize_scan(self, scan: np.ndarray):
        """
        Optional: Visualize the 360-degree scan as a polar plot
        Useful for debugging
        """
        import matplotlib.pyplot as plt
        
        angles = np.linspace(0, 2*np.pi, self.num_beams)
        distances = scan * self.max_range
        
        fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
        ax.plot(angles, distances)
        ax.set_ylim(0, self.max_range)
        ax.set_title("LiDAR Scan (Bird's Eye View)")
        plt.show()
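The angular-binning logic above can be exercised without ROS by lifting it into a standalone function over synthetic points (the helper `bin_points` below is illustrative, not part of the class):

```python
import numpy as np

def bin_points(points, num_beams=360, max_range=10.0, height_threshold=0.5):
    """Standalone mirror of the height-filter + polar-binning step in
    PointCloudProcessor.process_point_cloud, for offline testing."""
    ranges = np.ones(num_beams, dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = np.abs(z) < height_threshold          # keep the horizontal slice
    x, y = x[mask], y[mask]
    if len(x) == 0:
        return ranges
    d = np.sqrt(x**2 + y**2)                     # distance in the plane
    a = np.degrees(np.arctan2(y, x)) % 360       # angle in [0, 360)
    bins = np.clip((a * num_beams / 360.0).astype(np.int32), 0, num_beams - 1)
    np.minimum.at(ranges, bins, d / max_range)   # min distance per angular bin
    return np.clip(ranges, 0.0, 1.0)

# One obstacle 2 m straight ahead, one 4 m to the left, one filtered out by height
pts = np.array([[2.0, 0.0, 0.0], [0.0, 4.0, 0.1], [1.0, 0.0, 2.0]])
scan = bin_points(pts)
print(scan[0], scan[90])  # ~0.2 and ~0.4; all other bins stay at 1.0
```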

5. Complete Gymnasium Environment

Code: drone_lidar_env.py

import gymnasium as gym
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile, ReliabilityPolicy, HistoryPolicy
from sensor_msgs.msg import PointCloud2
from geometry_msgs.msg import Twist
from std_srvs.srv import Empty
import numpy as np
import time
from point_cloud_processor import PointCloudProcessor

class IsaacDroneEnv(gym.Env):
    """
    Gymnasium environment for drone obstacle avoidance using Isaac Sim
    """
    metadata = {'render_modes': ['human']}
    
    def __init__(self, max_steps=1000, render_mode=None):
        super().__init__()
        
        # Initialize ROS 2
        if not rclpy.ok():
            rclpy.init()
        
        # Create node with proper QoS settings
        self.node = Node('rl_drone_env')
        
        # QoS profile for reliable communication
        qos_profile = QoSProfile(
            reliability=ReliabilityPolicy.RELIABLE,
            history=HistoryPolicy.KEEP_LAST,
            depth=10
        )
        
        # Observation space: 360 lidar beams (normalized distances)
        self.observation_space = gym.spaces.Box(
            low=0.0, 
            high=1.0, 
            shape=(360,), 
            dtype=np.float32
        )
        
        # Action space: [linear_x, angular_z] in normalized range
        self.action_space = gym.spaces.Box(
            low=np.array([-1.0, -1.0]), 
            high=np.array([1.0, 1.0]), 
            dtype=np.float32
        )
        
        # Publishers and subscribers
        self.vel_pub = self.node.create_publisher(
            Twist, '/cmd_vel', qos_profile
        )
        
        self.lidar_sub = self.node.create_subscription(
            PointCloud2, 
            '/lidar/points', 
            self._lidar_callback, 
            qos_profile
        )
        
        # Point cloud processor
        self.pc_processor = PointCloudProcessor(
            num_beams=360,
            max_range=10.0,
            height_threshold=0.5
        )
        
        # State variables
        self.latest_scan = np.ones(360, dtype=np.float32)
        self.scan_received = False
        self.collision = False
        self.steps_count = 0
        self.max_steps = max_steps
        
        # Action scaling (tune based on drone capabilities)
        self.max_linear_vel = 2.0  # m/s
        self.max_angular_vel = 1.5  # rad/s
        
        # Collision threshold
        self.collision_distance = 0.3  # meters (normalized: 0.03 for 10m max)
        
        # Rendering
        self.render_mode = render_mode
        
    def _lidar_callback(self, msg: PointCloud2):
        """Process incoming LiDAR point cloud"""
        # Convert 3D point cloud to 2D scan
        self.latest_scan = self.pc_processor.process_point_cloud(msg)
        self.scan_received = True
        
        # Check for collision (any beam < threshold)
        min_distance = np.min(self.latest_scan)
        if min_distance < (self.collision_distance / self.pc_processor.max_range):
            self.collision = True
    
    def step(self, action):
        """Execute one step in the environment"""
        self.steps_count += 1
        
        # Scale action to real velocity commands
        cmd_vel = Twist()
        cmd_vel.linear.x = float(action[0] * self.max_linear_vel)
        cmd_vel.angular.z = float(action[1] * self.max_angular_vel)
        
        # Publish command
        self.vel_pub.publish(cmd_vel)
        
        # Wait for sensor update (spin ROS)
        self.scan_received = False
        timeout = time.time() + 0.1  # 100ms timeout
        
        while not self.scan_received and time.time() < timeout:
            rclpy.spin_once(self.node, timeout_sec=0.01)
        
        # Calculate reward
        reward = self._calculate_reward(action)
        
        # Check termination conditions
        terminated = self.collision
        truncated = self.steps_count >= self.max_steps
        
        # Additional info for debugging
        info = {
            'collision': self.collision,
            'min_distance': float(np.min(self.latest_scan)),
            'steps': self.steps_count
        }
        
        return self.latest_scan.copy(), reward, terminated, truncated, info
    
    def _calculate_reward(self, action):
        """
        Reward function design:
        - Survival bonus: +1.0 per step
        - Collision penalty: -200
        - Progress reward: bonus for moving forward
        - Smoothness penalty: discourage erratic movements
        - Proximity penalty: penalize getting too close
        """
        reward = 0.0
        
        if self.collision:
            return -200.0
        
        # Survival bonus
        reward += 1.0
        
        # Progress reward (encourage forward movement)
        forward_speed = action[0]
        if forward_speed > 0:
            reward += forward_speed * 0.5
        
        # Proximity penalty (discourage getting too close to obstacles)
        min_dist = np.min(self.latest_scan)
        if min_dist < 0.1:  # Within 1 meter (0.1 normalized)
            reward -= (0.1 - min_dist) * 10.0
        
        # Smoothness penalty (discourage sharp turns at high speed)
        if abs(action[0]) > 0.5 and abs(action[1]) > 0.5:
            reward -= 0.2
        
        return reward
    
    def reset(self, seed=None, options=None):
        """Reset the environment to initial state"""
        super().reset(seed=seed)
        
        # Stop the drone
        stop_cmd = Twist()
        self.vel_pub.publish(stop_cmd)
        
        # Reset state variables
        self.collision = False
        self.steps_count = 0
        self.latest_scan = np.ones(360, dtype=np.float32)
        
        # Wait for environment to stabilize
        time.sleep(0.5)
        
        # Spin to get fresh sensor data
        for _ in range(10):
            rclpy.spin_once(self.node, timeout_sec=0.01)
        
        return self.latest_scan.copy(), {}
    
    def close(self):
        """Cleanup"""
        # Stop the drone
        stop_cmd = Twist()
        self.vel_pub.publish(stop_cmd)
        
        # Shutdown ROS
        self.node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
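The reward shaping in `_calculate_reward` can be sanity-checked numerically outside ROS; the helper below (illustrative, not part of the class) mirrors the same terms:

```python
import numpy as np

def calc_reward(action, scan, collision=False):
    """Standalone mirror of IsaacDroneEnv._calculate_reward."""
    if collision:
        return -200.0
    reward = 1.0                       # survival bonus
    if action[0] > 0:
        reward += action[0] * 0.5      # progress reward
    min_dist = np.min(scan)
    if min_dist < 0.1:                 # within 1 m (normalized by 10 m range)
        reward -= (0.1 - min_dist) * 10.0
    if abs(action[0]) > 0.5 and abs(action[1]) > 0.5:
        reward -= 0.2                  # smoothness penalty
    return reward

scan = np.full(360, 1.0)
scan[0] = 0.05                         # one obstacle 0.5 m ahead
r = calc_reward([0.8, 0.0], scan)
print(round(r, 3))  # 0.9 = 1.0 survival + 0.4 progress - 0.5 proximity
```

Tabulating a few such cases before training is a cheap way to catch sign errors or badly scaled penalties in a custom reward.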

6. Training Script with Frame Stacking

Code: train.py

import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback
from stable_baselines3.common.monitor import Monitor
from drone_lidar_env import IsaacDroneEnv
import os

def make_env():
    """Create and wrap the environment"""
    env = IsaacDroneEnv(max_steps=1000)
    env = Monitor(env)  # For logging
    return env

# Create directories
os.makedirs("./models", exist_ok=True)
os.makedirs("./logs", exist_ok=True)

# Create vectorized environment
env = DummyVecEnv([make_env])

# Add frame stacking for temporal information
# Agent sees last 4 scans (360x4 = 1440 inputs)
env = VecFrameStack(env, n_stack=4)

# Checkpoint callback: save model every 50k steps
checkpoint_callback = CheckpointCallback(
    save_freq=50000,
    save_path="./models/",
    name_prefix="drone_rl_model"
)

# Configure PPO with optimized hyperparameters
model = PPO(
    policy="MlpPolicy",
    env=env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,  # Encourage exploration
    vf_coef=0.5,
    max_grad_norm=0.5,
    verbose=1,
    tensorboard_log="./logs/",
    device="cuda"  # Use GPU
)

print("Starting training...")
print("Monitor with: tensorboard --logdir ./logs/")

# Train the model
model.learn(
    total_timesteps=1_000_000,
    callback=checkpoint_callback,
    progress_bar=True
)

# Save final model
model.save("./models/drone_avoidance_final")
print("Training complete! Model saved to ./models/drone_avoidance_final.zip")

7. Running the Complete Pipeline

7.1 Launch Isaac Sim

  1. Open Isaac Sim

  2. Load your saved scene: drone_training_env.usd

  3. Press Play to start the simulation

  4. Verify ROS 2 topics are publishing:

    ros2 topic list
    # Should see: /cmd_vel, /lidar/points
    

7.2 Start Training

# Activate virtual environment
source ~/rl_drone_env/bin/activate

# Start training
python train.py

7.3 Monitor Progress

In a new terminal:

tensorboard --logdir ./logs/

Open browser to http://localhost:6006

What to look for:

  • rollout/ep_rew_mean: Should increase over time (target: > 500)

  • train/entropy_loss: Should gradually decrease

  • Episode length: Should increase as drone survives longer

7.4 Expected Training Timeline

  • First 100k steps: Random exploration, many collisions

  • 100k-300k steps: Begins avoiding obvious obstacles

  • 300k-500k steps: Smooth navigation in simple environments

  • 500k+ steps: Handles complex scenarios

8. Deployment to Real Hardware (Sim-to-Real)

8.1 Export Model to ONNX

# export_model.py
import torch
import numpy as np
from stable_baselines3 import PPO

class OnnxablePolicy(torch.nn.Module):
    """SB3's ActorCriticPolicy.forward returns (actions, values, log_probs),
    so wrap it to expose a plain observation -> action mapping for ONNX export."""
    def __init__(self, policy):
        super().__init__()
        self.policy = policy

    def forward(self, observation):
        actions, _values, _log_probs = self.policy(observation, deterministic=True)
        return actions

# Load trained model and wrap its policy network
model = PPO.load("./models/drone_avoidance_final")
onnx_policy = OnnxablePolicy(model.policy)

# Create dummy input (4 stacked frames of 360 beams)
dummy_input = torch.zeros((1, 1440), dtype=torch.float32)

# Export to ONNX
torch.onnx.export(
    onnx_policy,
    dummy_input,
    "drone_brain.onnx",
    export_params=True,
    opset_version=11,
    input_names=['observation'],
    output_names=['action'],
    dynamic_axes={'observation': {0: 'batch_size'}}
)

print("Model exported to drone_brain.onnx")

8.2 Inference Node for NVIDIA Jetson

Code: inference_node.py

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2
from geometry_msgs.msg import Twist
import onnxruntime as ort
import numpy as np
from collections import deque
from point_cloud_processor import PointCloudProcessor

class DroneInferenceNode(Node):
    def __init__(self):
        super().__init__('drone_inference_node')
        
        # Load ONNX model
        self.session = ort.InferenceSession(
            "drone_brain.onnx",
            providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider',
                       'CPUExecutionProvider']  # CPU fallback if GPU providers are unavailable
        )
        
        # Frame buffer for stacking
        self.frame_buffer = deque(maxlen=4)
        for _ in range(4):
            self.frame_buffer.append(np.ones(360, dtype=np.float32))
        
        # Point cloud processor
        self.pc_processor = PointCloudProcessor()
        
        # ROS communication
        self.vel_pub = self.create_publisher(Twist, '/cmd_vel', 10)
        self.lidar_sub = self.create_subscription(
            PointCloud2,
            '/lidar/points',
            self.lidar_callback,
            10
        )
        
        # Safety parameters
        self.emergency_stop_distance = 0.2  # meters
        self.max_linear_vel = 1.5  # m/s (reduced for real flight)
        self.max_angular_vel = 1.0  # rad/s
        
    def lidar_callback(self, msg):
        # Process point cloud
        scan = self.pc_processor.process_point_cloud(msg)
        
        # Emergency stop check (normalize by the processor's max range)
        if np.min(scan) < (self.emergency_stop_distance / self.pc_processor.max_range):
            self.get_logger().warn("EMERGENCY STOP: Obstacle too close!")
            self.publish_stop()
            return
        
        # Update frame buffer
        self.frame_buffer.append(scan)
        
        # Stack frames
        stacked_obs = np.concatenate(list(self.frame_buffer))
        stacked_obs = stacked_obs.reshape(1, -1).astype(np.float32)
        
        # Run inference
        action = self.session.run(None, {'observation': stacked_obs})[0][0]
        
        # Publish command
        cmd = Twist()
        cmd.linear.x = float(np.clip(action[0], -1, 1) * self.max_linear_vel)
        cmd.angular.z = float(np.clip(action[1], -1, 1) * self.max_angular_vel)
        self.vel_pub.publish(cmd)
        
    def publish_stop(self):
        stop = Twist()
        self.vel_pub.publish(stop)

def main():
    rclpy.init()
    node = DroneInferenceNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()
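The frame buffer above keeps only the four most recent scans because `deque(maxlen=4)` discards the oldest entry automatically; the stacking then flattens them into the 1,440-value input the exported policy expects. A minimal standalone illustration:

```python
import numpy as np
from collections import deque

frame_buffer = deque(maxlen=4)
for _ in range(4):
    frame_buffer.append(np.ones(360, dtype=np.float32))

# A new scan pushes the oldest frame out (newest frame is last)
frame_buffer.append(np.zeros(360, dtype=np.float32))

stacked = np.concatenate(list(frame_buffer)).reshape(1, -1).astype(np.float32)
print(stacked.shape)  # (1, 1440)
```

The frame order here (oldest first) must match whatever VecFrameStack produced during training, otherwise the deployed policy sees scrambled temporal information.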

8.3 Deploy to Jetson

# On your development machine
scp drone_brain.onnx jetson@drone-jetson:~/
scp inference_node.py jetson@drone-jetson:~/
scp point_cloud_processor.py jetson@drone-jetson:~/

# SSH into Jetson
ssh jetson@drone-jetson

# Install dependencies
# Note: the generic onnxruntime-gpu wheel on PyPI does not target Jetson (aarch64);
# use NVIDIA's prebuilt onnxruntime-gpu wheel for your JetPack version instead.
pip install numpy
pip install onnxruntime-gpu  # replace with the JetPack-specific wheel

# Run inference node
python3 inference_node.py

9. Safety Considerations

Critical safety features implemented:

  1. Emergency Stop: Hardware override always available

  2. Minimum Distance Check: Classic algorithm runs in parallel

  3. Velocity Limits: Capped at safe speeds for real hardware

  4. Timeout Handling: Stops drone if no sensor data received

  5. Gradual Deployment: Test in safe environment first
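Item 4 (timeout handling) can be built on a monotonic-clock watchdog. A minimal sketch with illustrative names (`SensorWatchdog`, `feed`): in the inference node you would call `feed()` from the LiDAR callback and poll `expired()` from a periodic timer, publishing a stop command when it trips.

```python
import time

class SensorWatchdog:
    """Trips when no sensor update arrives within `timeout` seconds."""
    def __init__(self, timeout=0.5):
        self.timeout = timeout
        self.last_update = time.monotonic()

    def feed(self):
        """Call from the sensor callback on every received message."""
        self.last_update = time.monotonic()

    def expired(self):
        """Poll periodically; stop the drone when this returns True."""
        return time.monotonic() - self.last_update > self.timeout

wd = SensorWatchdog(timeout=0.05)
print(wd.expired())  # False immediately after creation
time.sleep(0.1)
print(wd.expired())  # True once the timeout elapses without feed()
```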

Pre-flight checklist:

  • [ ] Test in simulation until 95%+ success rate

  • [ ] Verify all sensors publishing at expected rates

  • [ ] Test emergency stop button

  • [ ] Start with reduced velocity limits (50% of max)

  • [ ] Use safety nets or fly in padded area

  • [ ] Keep manual override pilot ready

10. Troubleshooting

LiDAR data not received:

# Check topics
ros2 topic list
ros2 topic hz /lidar/points

# Verify Isaac Sim ROS bridge is running
# Check Action Graph is enabled

Training not converging:

  • Increase training steps to 2M+

  • Reduce action space (limit max velocities)

  • Simplify environment (fewer obstacles initially)

  • Check reward function (use tensorboard to monitor)

Sim-to-real gap:

  • Add sensor noise during training

  • Randomize lighting in Isaac Sim

  • Calibrate LiDAR parameters to match real sensor

  • Use domain randomization for textures
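The first bullet (sensor noise) can be applied directly to the normalized 360-beam observation from section 4. A minimal sketch, with the helper name `add_scan_noise` and the noise parameters chosen for illustration: Gaussian range noise plus random beam dropout, where dropped beams read as max range (1.0), roughly mimicking missed LiDAR returns.

```python
import numpy as np

def add_scan_noise(scan, rng, sigma=0.01, dropout_p=0.02):
    """Gaussian range noise + random beam dropout on a normalized scan."""
    noisy = scan + rng.normal(0.0, sigma, size=scan.shape)
    drop = rng.random(scan.shape) < dropout_p    # ~2% of beams return nothing
    noisy[drop] = 1.0                            # dropped beams read as max range
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

rng = np.random.default_rng(0)
scan = np.full(360, 0.5, dtype=np.float32)
noisy = add_scan_noise(scan, rng)
```

Calling this inside `_lidar_callback` during training (and disabling it at deployment) is one way to narrow the sim-to-real gap.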

11. Next Steps

Enhancements to try:

  • Add GPS waypoint navigation

  • Implement dynamic obstacle avoidance

  • Train with curriculum learning (start easy, increase difficulty)

  • Add camera vision for semantic understanding

  • Multi-agent training (multiple drones)


Edited by Elena Eroshina
