camera, code, action

Arash Kamangir
3 min read · Jun 7, 2022

--

backstory: I was a Matlab-on-Windows machine vision developer who wanted to code Python and Bash on hardware; then came the cloud and TensorFlow.

At the end of this post, there are links to download the training data collected by blue1 and a model that was trained on this data and has run on blue1.

May 2022 — blue1 in the field.

build a low-cost rover that carries a camera, runs deep learning vision models through python and bash, and is cloud-connected.

These were the requirements that led to blue1: a Sphero RVR that carries a Raspberry Pi connected to a camera. Download the electronics BOM + obj/stl/gcode for the 3d-printed brackets here.

Other options include BuildMecar, CupBots, Donkey Car, JetBot (including waveshare JetBot kit), and watney, among many others. I will talk about these designs in future posts.

blue1 runs a modified version of this asyncio loop, which captures frames and feeds them into a 3-class TensorFlow classifier named blue_driver to receive one of the following labels: turn_left, turn_right, or no_action. The label is then communicated to the motor controllers over a serial line.
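For a rough sense of the flow (the module layout, the blue_driver file name, and the serial commands below are placeholders of mine, not the actual blue1 code), a minimal capture-and-classify loop could look something like this:

```python
import asyncio

import cv2          # camera capture
import numpy as np
import serial       # pyserial: talks to the motor controllers
import tensorflow as tf

LABELS = ["no_action", "turn_left", "turn_right"]  # label order is an assumption

async def drive_loop(model_path="blue_driver.h5", port="/dev/ttyS0"):
    model = tf.keras.models.load_model(model_path)
    camera = cv2.VideoCapture(0)
    motors = serial.Serial(port, baudrate=115200)

    while True:
        ok, frame = camera.read()
        if not ok:
            break

        # blue_driver expects 128x128 colour images
        image = cv2.resize(frame, (128, 128)).astype("float32") / 255.0
        scores = model.predict(image[None, ...], verbose=0)[0]
        label = LABELS[int(np.argmax(scores))]

        # hypothetical serial protocol: one command line per decision
        if label != "no_action":
            motors.write(label.encode() + b"\n")

        # yield to other asyncio tasks (keyboard handling, uploads, ...)
        await asyncio.sleep(0)

if __name__ == "__main__":
    asyncio.run(drive_loop())
```

This sketch leaves out the keyboard handling and error recovery that the real loop needs, and it blocks on the camera read; it is only meant to show how the pieces connect.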

blue1 is operated through a standard keyboard. The user presses w to start the motors and then steers the robot by pressing d or a (to turn 10 deg left/right). The frames captured by the camera are annotated with these keystrokes, and a cloud worker runs a nightly retrain of blue_driver, triggered by the availability of additional training data. The latest blue_driver is then released through an MLflow-style model lifecycle-management backend. The same backend enables blue1 to always pull the latest blue_driver.
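The annotation step can be as simple as writing each frame to a folder named after the label that the keystroke implies. As a sketch (the key-to-label mapping and the storage layout below are my assumptions, not blue1's actual format):

```python
import time
from pathlib import Path

import cv2

# hypothetical mapping from steering keystrokes to blue_driver labels;
# frames captured without a steering key become no_action examples
KEY_TO_LABEL = {
    "a": "turn_left",
    "d": "turn_right",
}

def save_annotated_frame(frame, key, root="~/storage/blue1/frames"):
    """Write the frame to disk, encoding the label in the folder name."""
    label = KEY_TO_LABEL.get(key, "no_action")
    folder = Path(root).expanduser() / label
    folder.mkdir(parents=True, exist_ok=True)
    filename = folder / f"{int(time.time() * 1000)}.jpg"
    cv2.imwrite(str(filename), frame)
    return filename
```

A nightly cloud job can then sync this folder, notice that new files have arrived, and kick off the blue_driver retrain.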

blue_driver is a 3-class image classifier that takes in 128x128 colour images and produces turn_left/turn_right/no_action decisions. This model and the training data for it are available for download at the end of this post.
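The actual architecture ships with the downloadable model; a small convolutional classifier in the same spirit could be built like this (the layer sizes here are illustrative, not blue_driver's):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(num_classes=3):
    """A small CNN: 128x128 RGB frames -> 3 steering classes."""
    return tf.keras.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```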

At runtime, the user can issue a prediction request on the current frame by pressing m. The result of this prediction is applied in real time, i.e. blue1 turns left or right or continues on its path based on the output of blue_driver on the latest frame. There is also a continuous prediction mode, n, which runs at ~3 fps. Here is what blue1 looks like when it is driving autonomously:

Parts of the terraform of blue1 + the design of blue_driver are courtesy of Donkey Car.

As I researched the electronics parts and the software and cloud components relevant to building blue1, I started to see a bigger picture:

a minimal python+bash machine vision back+frontend that terraforms common Linux machines and enables edge data collection and model execution.

two blue-rvr robots alongside blue variants of BuildMecar, Donkey Car, and JetBot.

more on this in future posts.

Downloads: 4,032 frames of training data + annotations: here (258 MB), and the latest blue_driver: here (5.2 MB).
