Computer Vision Project

(UBC ENPH 353 Course Report)

Keywords: Robot Operating System (ROS), Keras, OpenCV

This is an overview of the work done in the ENPH 353 course.

Github repositories for some context: [1] [2]
Plate reader python notebook (uploaded to Colab for readability) [3]

Overview

This is a brand new course at UBC! So first I would like to give special thanks to Miti and Griffin at UBC for setting everything up. I have learned a lot in this course.

The goal of ENPH 353 is to design a robot that navigates a virtual environment and reads license plates using machine learning. Fancy. Stay on the road, don’t hit the pedestrians, you know the drill.

For those who are familiar with MIT’s “Duckietown”, our task is quite similar. We even use the same framework to control our robots (ROS). The difference is that our track is not physical: it is instead modelled in a simulator called Gazebo, which integrates nicely with ROS.

Figure: our gazebo simulation environment. Thanks, Miti and Griffin!

This is the course we must navigate. We are restricted to only two methods of interfacing with the robot:

  1. The camera feed
  2. Twist commands (move forward, move backward, turn left, turn right)

With these two I/Os, we must design a robot which will accurately report license plates and their location through a ROS message.
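The Twist side of this interface can be sketched as a small helper that maps the four allowed motions onto velocity commands and publishes them. The topic name `/cmd_vel` and the specific speed values here are assumptions for illustration, not the course’s exact configuration:

```python
# Sketch of the robot's command interface. The "/cmd_vel" topic and the
# velocity magnitudes are placeholder assumptions.
try:
    import rospy
    from geometry_msgs.msg import Twist
    HAVE_ROS = True
except ImportError:  # the command table below still works without ROS
    HAVE_ROS = False

# (linear.x in m/s, angular.z in rad/s) for each allowed motion
COMMANDS = {
    "forward":  (0.5, 0.0),
    "backward": (-0.5, 0.0),
    "left":     (0.0, 1.0),
    "right":    (0.0, -1.0),
}

def twist_for(command):
    """Return the (linear, angular) velocity pair for a named motion."""
    return COMMANDS[command]

def drive(pub, command):
    """Publish the Twist message for `command` on an existing publisher."""
    if not HAVE_ROS:
        raise RuntimeError("rospy not available")
    msg = Twist()
    msg.linear.x, msg.angular.z = twist_for(command)
    pub.publish(msg)
```

In a node you would create the publisher once, e.g. `pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)`, and call `drive(pub, "forward")` from the control loop.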

Figure: Colormask

Methods

Yes, one could set up a full CNN plate reader which scans the whole image for license plate characters, such as this model. However, that model is far from ideal in terms of training time, and considering our short time frame it is a high-risk strategy to rely on one single complex method to do everything for us. Furthermore, since the plates are all the same size and color, it is extremely attractive to break this into two separate systems: one that collects the license plate and another that tells us what is on it.

Plate Detector

Figure: False Positives for the Plate Detector

We had originally planned for the plate detector to be another object detection neural net. Due to time constraints we moved to an OpenCV color mask instead. It’s really nothing impressive so I’d say you can just skip this section.

Once the color mask was chosen, we passed it through an opening in OpenCV (that is, erosion and then dilation) to remove the random white pixels. We then passed that through findContours and filtered the results so that only rectangular shapes above a minimum bounding height and width are kept.


Figure: Plate Detector

The problem here was that, for some reason, OpenCV also counted the road features as white things with four edges (see the false-positives figure above). This was a simple fix: filter for the purple in the image, once again find contours, and then scale the filled bounding box of the purple so that the white license plate is always covered. Using that bounding box as another bitwise-and mask, we have a final product which is effective and fast:

Observe that it is not perfect. Because the license plate itself is not included in our color mask (only the “true” white), the bounding box is simply stretched a little bit. This works for head-on angles, but it leads to imperfections when the plate is read at an angle.

Data Generation

Lab 3 of this course was building a CNN to read the characters of a virtual license plate. However, since the images were perfect (no skew or color deformation) and we could generate a lot of them, it achieved 100% accuracy within the first epoch….

In the real competition, running through Gazebo, where there is skew and color deformation due to lighting and camera effects, the task is not as simple.

To generate data, we employed two methods:

  1. Create a generator python script, which would generate plates and artificially skew them in front of collected backgrounds. This had the advantage of knowing the corner locations and text for any huge number of images, but does it map to the real virtual world?
  2. Collect real-camera data through the use of a bash script. Said script spawned the robot in a specific location, turned it ever so slightly, killed the Gazebo model, and then did the whole thing over again. This has the advantage that it is real-world data, but is it comprehensive? Unfortunately it also neglected to kill the xterm keyboard controller, so after running it overnight my computer looked like this:
  3. Of course, there was also the option of manually driving around, gathering and labelling data. It sounds like a bad idea, but once you realize that you could have over 1000 images of juicy REAL data in under 3 hours, it becomes pretty appealing.

We elected to use methods 1 and 3. At the end of this process, we had over 700 simulated license plates and over 1000 real license plates. Example data is shown below:

Both types of data were then unskewed and separated into individual characters (see the python notebook below) so they could be fed into the neural network. Each 40×80 character image looked like this:

Figure: Real data after characters cropped in python notebook. Notice that some characters are sometimes quite close to the chosen crop boundaries.
Figure: Synthetic data after characters cropped in python notebook. Notice how well behaved the synthetic data is compared to the real data.

Keras Neural Network

This is the python notebook housing the neural network. Most of it is just piping the data into the correct form (unskewing the simulated data and cropping it, the result being what you saw above). In the end, we were left with this model and an accuracy of over 95% on real data:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 76, 36, 32)        832       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 38, 18, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 36, 16, 32)        9248      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 18, 8, 32)         0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 16, 6, 32)         9248      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 8, 3, 32)          0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 6, 1, 32)          9248      
_________________________________________________________________
flatten_1 (Flatten)          (None, 192)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 60)                11580     
_________________________________________________________________
dense_3 (Dense)              (None, 36)                2196      
=================================================================
Total params: 42,352
Trainable params: 42,352
Non-trainable params: 0
_________________________________________________________________
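For readers who want to reproduce the summary above, a matching model can be sketched in Keras as follows. The layer shapes and parameter counts agree with the printout; the ReLU/softmax activations and the optimizer choice are my assumptions (the linked notebook has the real definitions):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Input: 40x80 grayscale character images, i.e. shape (80, 40, 1) in Keras.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(80, 40, 1)),
    layers.Conv2D(32, (5, 5), activation="relu"),   # -> (76, 36, 32)
    layers.MaxPooling2D((2, 2)),                    # -> (38, 18, 32)
    layers.Conv2D(32, (3, 3), activation="relu"),   # -> (36, 16, 32)
    layers.MaxPooling2D((2, 2)),                    # -> (18, 8, 32)
    layers.Conv2D(32, (3, 3), activation="relu"),   # -> (16, 6, 32)
    layers.MaxPooling2D((2, 2)),                    # -> (8, 3, 32)
    layers.Conv2D(32, (3, 3), activation="relu"),   # -> (6, 1, 32)
    layers.Flatten(),                               # -> 192
    layers.Dense(60, activation="relu"),
    layers.Dense(36, activation="softmax"),         # 26 letters + 10 digits
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```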

That’s all, Folks

There you are! It works. It is accurate enough.

These are some of the cool things I have found while doing research, hopefully they will aid in your next computer vision project:

Really cool license plate recognition (though out of the time scope of our project)

YoloV3 (I really love Redmon’s approach to academic writing here. What a fantastic costly signal to his work)

Creating your own object detector – Towards Data Science
