(UBC ENPH 353 Course Report)
Keywords: Robot Operating System (ROS), Keras, OpenCV
This is an overview of the work done in the ENPH 353 course.
This is a brand new course at UBC! So firstly I would like to give special thanks to Miti and Griffin at UBC for setting everything up. I have learned a lot in this course.
The goal of ENPH353 is to design a robot to navigate virtual environments and read license plates using machine learning. Fancy. Stay on the road, don’t hit the pedestrians, you know the drill.
For those who are familiar with MIT’s “Duckietown”, our task is quite similar to that. We even use the same framework to control our robots (ROS). The difference is that our track is not physical, and is instead modelled in software called Gazebo, which integrates nicely with ROS.
This is the course we must navigate. We are restricted to only two methods of interfacing with the robot:
- The camera feed
- Twist commands (move forward, move backwards, turn left, turn right)
With these two I/Os, we must design a robot which will accurately report license plates and their location through a ROS message.
Yes, one could set up a full CNN plate reader which scans the whole image for license plate characters, such as this model. However, in terms of training time this model is far from ideal, and considering our short time frame it is a high-risk strategy to rely on one single complex method to do everything for us. Furthermore, since the plates are all the same size and color, it is extremely attractive to break this into two separate systems: one that locates the license plate and another that tells us what is on it.
We had originally planned for the plate detector to be another object detection neural net. Due to time constraints we moved to an OpenCV color mask instead. It’s really nothing impressive so I’d say you can just skip this section.
Once the color mask was chosen, we passed it through an opening in OpenCV (that is, erosion and then dilation) to remove the random white pixels. We then passed that through findContours and filtered the results so that only rectangular shapes with a minimum bounding height and width were included.
The problem here was that, for some reason, OpenCV also counted the road features as white things with 4 edges (see figure 2). This was a simple fix, which involved filtering for the purple in the image, once again finding contours, and then scaling the filled bounding box of the purple so that the white license plate is always covered. Using that bounding box as another bitwise-and mask, we have the final product, which is effective and fast:
Observe that it is not perfect. Because the license plate is not included in our color mask (only the “true” white), the bounding box has simply been stretched a little. This works for angles that are head on, but it leads to imperfection when the plate is read at an angle.
Lab 3 of this course was building a CNN to read the characters of a virtual license plate. However, since the images were perfect (no skew or color deformation) and we could generate a lot of them, it achieved 100% accuracy within the first epoch.
In the real competition, running through Gazebo, where there is skew and color deformation due to lighting and camera effects, the task is not as simple.
To generate data, we employed two methods:
- Create a generator python script, which would generate plates and artificially skew them in front of collected backgrounds. This had the advantage that the corner locations and text were known exactly for an arbitrarily large number of images, but does it map to the real virtual world?
- Collect real-camera data through the use of a bash script. Said script spawned the robot in a specific location, turned it ever so slightly, killed the Gazebo model, and then did the whole thing over again. This has the advantage that it is real-world data, but is it comprehensive? Unfortunately it also neglected to kill the xterm keyboard controller, so after running it overnight my computer looked like this:
- Of course, there was also the option of manually driving around, gathering and labelling data. Sounds like a bad idea, but once you realize that you could have over 1000 images of juicy REAL data in under 3 hours, it becomes pretty appealing.
We elected to use methods 1 and 3. At the end of this process, we had over 700 simulated license plates and over 1000 real license plates. Example data is shown below:
Both types of data were unskewed and separated into individual characters (see the python notebook below) before being fed into the neural network. Each 40×80 character image looked like this:
Keras Neural Network
This is the python notebook housing the neural network. Most of it is just piping the data into the correct form (unskewing the simulated data and cropping, the result being what you saw above). But in the end we were left with this model, which achieved an accuracy of over 95% on real data:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_4 (Conv2D)            (None, 76, 36, 32)        832
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 38, 18, 32)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 36, 16, 32)        9248
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 18, 8, 32)         0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 16, 6, 32)         9248
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 8, 3, 32)          0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 6, 1, 32)          9248
_________________________________________________________________
flatten_1 (Flatten)          (None, 192)               0
_________________________________________________________________
dense_2 (Dense)              (None, 60)                11580
_________________________________________________________________
dense_3 (Dense)              (None, 36)                2196
=================================================================
Total params: 42,352
Trainable params: 42,352
Non-trainable params: 0
_________________________________________________________________
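For reference, a Keras model matching that summary could be written roughly as follows. The kernel sizes and activations are inferred from the output shapes and parameter counts (e.g. 832 = 5·5·1·32 + 32 implies a 5×5 first kernel on a grayscale 80×40 input), not copied from our notebook:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (5, 5), activation="relu",
                  input_shape=(80, 40, 1)),     # -> (76, 36, 32)
    layers.MaxPooling2D((2, 2)),                # -> (38, 18, 32)
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> (36, 16, 32)
    layers.MaxPooling2D((2, 2)),                # -> (18, 8, 32)
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> (16, 6, 32)
    layers.MaxPooling2D((2, 2)),                # -> (8, 3, 32)
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> (6, 1, 32)
    layers.Flatten(),                           # -> 192
    layers.Dense(60, activation="relu"),
    layers.Dense(36, activation="softmax"),     # 26 letters + 10 digits
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The 36-way softmax corresponds to one class per alphanumeric character, which is why one network reads both the letters and the digits.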
That’s all, Folks
There you are! It works. It is accurate enough.
These are some of the cool things I have found while doing research, hopefully they will aid in your next computer vision project: