I wanted a little hands-on experience with AI-driven robotics, so I built one.
This AR4-based “HandyBot” takes voice commands and autonomously picks and places tabletop objects, using Grounding DINO and Segment Anything for vision and OpenAI Whisper for speech recognition. The $2300 system combines an AR4 robot arm, an Intel RealSense D435 depth camera, and a custom ROS 2 pipeline for real-world object interaction.
🔗 Build it yourself: GitHub Repo
💬 Discuss the ROS driver: Community Thread
1. **Voice Command:** The user speaks a prompt (e.g., “Put the marker in the container”); see the Whisper sketch below.
2. **AI Processing:** OpenAI Whisper transcribes the speech, Grounding DINO detects the named objects in the camera frame, and Segment Anything produces pixel masks for them (detection/segmentation sketch below).
3. **Action Execution:** The AR4 ROS driver calculates a grasp from the mask and depth data, then moves the arm to complete the task (deprojection and command sketches below).
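
A minimal sketch of the speech stage using the open-source `whisper` package. The model size and the audio file path are placeholders; the actual project may capture audio from a live microphone stream instead.

```python
# pip install openai-whisper
import whisper

# "base" is an arbitrary model choice for this sketch; larger models are more accurate.
model = whisper.load_model("base")

# "command.wav" is a placeholder; a real pipeline would record from a microphone.
result = model.transcribe("command.wav")
prompt = result["text"].strip()
print(f"Heard: {prompt}")  # e.g., "Put the marker in the container"
```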
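For the vision stage, here is a hedged sketch assuming Grounding DINO is run through Hugging Face `transformers` and Segment Anything through Meta's `segment_anything` package; the repo's actual checkpoints, thresholds, and glue code may differ.

```python
# pip install transformers segment-anything torch pillow numpy
import numpy as np
import torch
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry
from transformers import AutoProcessor, GroundingDinoForObjectDetection

image = Image.open("scene.jpg").convert("RGB")  # placeholder frame from the D435

# 1) Grounding DINO: text-conditioned detection of the object named in the prompt.
processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
dino = GroundingDinoForObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")
inputs = processor(images=image, text="a marker.", return_tensors="pt")
with torch.no_grad():
    outputs = dino(**inputs)
detections = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]
box = detections["boxes"][0].numpy()  # [x1, y1, x2, y2]; assumes at least one hit

# 2) Segment Anything: refine the detection box into a pixel-accurate mask.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder path
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]  # boolean HxW mask of the target object
```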
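Turning a mask into a grasp target needs the depth image. A sketch of that step, assuming `pyrealsense2` and a simple centroid heuristic (the real grasp calculation may be more sophisticated):

```python
# pip install pyrealsense2 numpy
import numpy as np
import pyrealsense2 as rs

def mask_centroid(mask: np.ndarray) -> tuple[int, int]:
    """Pixel centroid (u, v) of a boolean HxW object mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.mean()), int(ys.mean())

def grasp_point(mask: np.ndarray, depth_frame: rs.depth_frame) -> list[float]:
    """Deproject the mask centroid into a 3-D point (meters, camera frame)."""
    u, v = mask_centroid(mask)
    depth_m = depth_frame.get_distance(u, v)  # depth reading at the centroid
    intrinsics = depth_frame.profile.as_video_stream_profile().intrinsics
    return rs.rs2_deproject_pixel_to_point(intrinsics, [float(u), float(v)], depth_m)
```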
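Finally, a minimal `rclpy` sketch of handing the grasp pose to the arm. The topic name `/grasp_pose` and frame `base_link` are assumptions for illustration; the actual AR4 ROS driver exposes its own MoveIt/ROS interfaces.

```python
import rclpy
from geometry_msgs.msg import PoseStamped
from rclpy.node import Node

class GraspCommander(Node):
    def __init__(self):
        super().__init__("grasp_commander")
        # "/grasp_pose" is a hypothetical topic, not the driver's documented API.
        self.pub = self.create_publisher(PoseStamped, "/grasp_pose", 10)

    def send_grasp(self, x: float, y: float, z: float) -> None:
        msg = PoseStamped()
        msg.header.frame_id = "base_link"  # assumed planning frame
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.pose.position.x = x
        msg.pose.position.y = y
        msg.pose.position.z = z
        msg.pose.orientation.w = 1.0  # identity; a real grasp would align the gripper
        self.pub.publish(msg)

def main():
    rclpy.init()
    node = GraspCommander()
    node.send_grasp(0.35, 0.0, 0.05)  # example target in meters
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```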