Computer Vision for Robots
OpenCV in ROS2 — image topics, CvBridge, color segmentation, and a real HSV-based ball tracker that publishes /cmd_vel.
What you'll build
A color-based ball tracker: a ROS2 node that subscribes to /camera/image_raw, finds a red ball using HSV color segmentation, and publishes a /cmd_vel that drives the robot to keep the ball centred. Your robot becomes a one-trick dog — but the plumbing you learn here (ROS image topics + OpenCV + a control loop) is exactly what fancier perception is built on.
ROS images ↔ OpenCV images: cv_bridge
OpenCV uses NumPy arrays (numpy.ndarray). ROS uses sensor_msgs/msg/Image. The bridge:
from cv_bridge import CvBridge
bridge = CvBridge()
# ROS → OpenCV
cv_image = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
# OpenCV → ROS
ros_image = bridge.cv2_to_imgmsg(cv_image, encoding='bgr8')
Install: sudo apt install ros-humble-cv-bridge ros-humble-vision-opencv. Use bgr8 for color (OpenCV's default), mono8 for grayscale, 32FC1 for depth.
HSV — why it beats RGB for color detection
RGB is sensitive to brightness. A red ball in shadow has RGB ~ (80, 0, 0). The same ball in bright light is (255, 30, 30). Same color, completely different RGB.
HSV (Hue, Saturation, Value) separates color (Hue) from intensity (Value). The hue of red stays ~0° (or ~180° — red wraps around) whether the ball is in shadow or bright light. That's what you want.
OpenCV's HSV is awkward: H is 0–179 (not 360), S and V are 0–255.
Red is split because hue wraps:
# Red has TWO HSV ranges in OpenCV:
LOWER_RED_1 = (0, 120, 70) # First range
UPPER_RED_1 = (10, 255, 255)
LOWER_RED_2 = (170, 120, 70) # Wrapped range
UPPER_RED_2 = (179, 255, 255)
The full ball-tracker node
import cv2
import numpy as np
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist
from cv_bridge import CvBridge
LOWER_RED_1 = np.array([0, 120, 70])
UPPER_RED_1 = np.array([10, 255, 255])
LOWER_RED_2 = np.array([170, 120, 70])
UPPER_RED_2 = np.array([179, 255, 255])
MIN_AREA = 500 # ignore tiny noise blobs
LINEAR_SPEED = 0.2
class BallTracker(Node):
def __init__(self):
super().__init__('ball_tracker')
self.bridge = CvBridge()
self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)
self.cmd_pub = self.create_publisher(Twist, '/cmd_vel', 10)
self.image_width = 640 # will be auto-set from first frame
self.get_logger().info('Ball tracker started')
def on_image(self, msg: Image):
try:
frame = self.bridge.imgmsg_to_cv2(msg, 'bgr8')
except Exception as e:
self.get_logger().error(f'cv_bridge: {e}')
return
h, w, _ = frame.shape
self.image_width = w
# Detect red
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask1 = cv2.inRange(hsv, LOWER_RED_1, UPPER_RED_1)
mask2 = cv2.inRange(hsv, LOWER_RED_2, UPPER_RED_2)
mask = cv2.bitwise_or(mask1, mask2)
# Clean up — morphological operations remove noise
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
# Find largest contour above threshold
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
target = None
if contours:
largest = max(contours, key=cv2.contourArea)
if cv2.contourArea(largest) > MIN_AREA:
M = cv2.moments(largest)
if M['m00'] > 0:
cx = int(M['m10'] / M['m00'])
target = cx
# Control logic
twist = Twist()
if target is None:
# Lost — rotate slowly to search
twist.angular.z = 0.4
else:
# P-controller: turn toward the ball
error = (target - w / 2) / (w / 2) # -1..1
twist.linear.x = LINEAR_SPEED
twist.angular.z = -1.5 * error
self.cmd_pub.publish(twist)
def main():
rclpy.init()
rclpy.spin(BallTracker())
rclpy.shutdown()
if __name__ == '__main__':
main()
Run it. Hold a red object in front of the camera. The robot turns to follow.
This is a proportional controller in image space — error is "pixels off-centre," output is angular velocity. The same pattern, with a real PID, drives lane-keeping in self-driving cars.
What CV doesn't solve
This tracker fails in five obvious ways. Spot them before you read on:
- Bad lighting — a red ball in low light might map to brown HSV.
- Other red things — chair, t-shirt, fire extinguisher.
- Distance — at 5m the ball is 20 pixels and
MIN_AREArejects it. - Occlusion — half the ball behind a leg.
- No depth — the robot doesn't know how far the ball is, only where in the image it appears.
This is why production CV moves to YOLO / neural networks + stereo depth / LiDAR. We cover that in Forge 03. But — every YOLO-on-Jetson stack still uses the same ROS2 image plumbing you learned today.
Try this in the visualizer: open /visualizer and watch the Sense → Think → Act loop. Your CV node is the Sense; your control logic is the Think;
/cmd_velis the Act.
Test Your Understanding
1. Your ball tracker works in a bright room but fails outdoors at noon. Walk through three specific things you'd change before reaching for a neural network.
2. A team-mate suggests using RGB thresholding instead of HSV "because it's simpler and avoids the hue-wrap problem." Construct one concrete scenario where their approach will fail badly and yours will keep working.
3. You want to extend this to track two balls of different colors and follow the closer one. Without depth, can you estimate which ball is closer from a single 2D image? What single OpenCV measurement gives you a reasonable proxy?
India Opportunity
- Computer Vision Engineer · CynLr, Bangalore — vision-based robotic picking, ₹16–28 LPA.
- Perception Engineer · Niqo Robotics, Bangalore — agriculture-spraying drones, ROS2 + OpenCV + ML, ₹12–22 LPA.
- Vision Engineer · GreyOrange, Gurgaon — barcode + pallet recognition in warehouse robots, ₹14–24 LPA.
- Computer Vision Intern · Skylark Drones, Bangalore — aerial imagery analysis, ₹40–60k/month.
Next Step
→ Continue to Wire 06 · Capstone — Autonomous Navigator — combine everything into one obstacle-avoiding robot.
Community discussion
0 questions & insightsLoading discussion…
Spotted something off? Report an error →