The Culmination of a Self-Driving Car Journey (Part 2)

Electives for the SDCND

Michael Virgo
6 min read · Mar 21, 2018

This is the second part of my reflections from the final term of the Self-Driving Car Engineer Nanodegree program — you can see Part 1 on the Path Planning project here!

The second project of Term 3 of the SDCND program is actually more than one project, as it’s an elective! Currently, there are two options: Functional Safety or Advanced Deep Learning.

Not the smoothest detection, but it’s found both roads present fairly well.

The Functional Safety content is done in concert with Elektrobit, and is actually quite a bit different from the other projects in the Nanodegree, as this one is not programming based. Instead, the project focuses on documenting the functional safety of a lane assistance system in a car (not a fully autonomous vehicle, in this case). The lesson and project go through the related standard, ISO 26262, and its requirements. Under it, a company develops a safety plan, performs a hazard and risk assessment, develops functional safety requirements, technical safety requirements, and software requirements, and allocates these requirements to various parts of the system architecture.

Although more of my fellow students tended to opt for the Advanced Deep Learning elective over Functional Safety, the validation and verification of systems through functional safety is only continuing to increase in importance. With all the various sensors and other hardware and software components coming together in autonomous vehicles, functional safety processes are absolutely crucial in making sure they operate safely enough to be out on the road.

The Advanced Deep Learning content is done along with Nvidia and focuses on semantic segmentation, which for the project is used to detect open road space. If you’ve seen my article from last year on deep learning for lane detection, this is a fairly similar application, just applied to the entire road instead of a single lane. The content also goes through other more advanced topics, such as using 1x1 convolutions and skip layers, as well as a lot of techniques that can be used to vastly improve the speed of a deep learning algorithm without a significant drop in accuracy.

I ended up choosing to do the Advanced Deep Learning project, so I’ll talk a little bit about how these more advanced techniques work.

The concept of 1x1 convolutions seems a little silly at first. Normally, a convolutional layer gets its power from filters that slide over the image to detect features. For instance, a 5x5 filter detects a feature over a 5x5 patch of pixels at each position as it moves across the input, and the output over that patch becomes one part of the input to the next layer. At a basic level, this can detect lines or gradient shifts, on up to shapes and more advanced features.

What could possibly be gleaned from just a 1x1 filter? My own intuition would be that you’d simply return the same information as before. However, 1x1 convolutions allow you to decrease the depth (i.e. the number of filters) before performing a more complex computation. As such, in a case where you’d normally perform a 5x5 convolution on an input, if you instead use a 1x1 convolution followed by the 5x5 convolution, you end up performing fewer computations overall, with minimal loss of information.
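To make that savings concrete, here is a minimal Keras sketch (my own illustration, not course code, with arbitrary layer sizes) comparing a direct 5x5 convolution against a 1x1 bottleneck followed by the same 5x5:

```python
# Minimal sketch of the 1x1 "bottleneck" idea; layer sizes are arbitrary.
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 256))  # a 256-channel feature map

# Option A: run the 5x5 convolution directly on all 256 input channels
direct = tf.keras.layers.Conv2D(64, kernel_size=5, padding="same")(inputs)

# Option B: a 1x1 convolution first squeezes the depth to 32 channels,
# then the 5x5 convolution runs on the much thinner feature map
squeezed = tf.keras.layers.Conv2D(32, kernel_size=1, padding="same")(inputs)
bottlenecked = tf.keras.layers.Conv2D(64, kernel_size=5, padding="same")(squeezed)

model_a = tf.keras.Model(inputs, direct)
model_b = tf.keras.Model(inputs, bottlenecked)

# roughly 410k parameters vs. roughly 59k -- the extra 1x1 layer pays for itself
print(model_a.count_params(), model_b.count_params())
```

The same saving carries over to the multiply-adds at run time, since each 5x5 output position now convolves over 32 channels instead of 256.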

This use of 1x1 convolutions to reduce dimensionality appears extensively in the Inception model architecture.

Example of an Inception module (Going Deeper with Convolutions, C. Szegedy et al., 2014)

Skip layers can also play a big part in semantic segmentation networks. One of the strengths of convolutional neural networks in image classification is that they are translation invariant (especially when using pooling layers), meaning an object can appear in any part of the image and still be correctly classified. However, this is a big issue for semantic segmentation tasks: the output needs to classify pixels in the correct location of the image. Skip layers help get around this issue, as they pass information from an earlier layer, past at least one intermediate layer, and onward to a later layer. This way, the neural network does not lose spatial information as information is passed forward, and the later layers are still able to ascertain the position of various objects, leading to smoother pixel segmentation. Skip layers also more generally help retain information for better training in very deep neural networks.

Skip layers are used to great success in the ResNet architecture.

Example skip layer (Deep Residual Learning for Image Recognition, K. He et al., 2015)
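As a rough illustration (my own, not the course's or the paper's code), a ResNet-style block in Keras simply adds the block's input back onto its output:

```python
import tensorflow as tf

def residual_block(x, filters=64):
    """One ResNet-style block: the input skips past two conv layers."""
    shortcut = x  # carried forward unchanged past the intermediate layers
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.Add()([shortcut, y])  # merge the skip with the block output
    return tf.keras.layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(64, 64, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
```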

Perhaps the most important part of this section of the Nanodegree comes from a lesson on inference performance. This involves running various operations over your trained neural network in order to optimize it to run faster in actual deployment, helping to enable a trained model to run on something like a smartphone, where computational power is at a premium. A lot of this is done by getting rid of portions of the model that will not need to be changed, or that are otherwise only important during training.

One of these is freezing the graph: removing unnecessary nodes related to training (you don’t need dropout for inference!), as well as changing certain TensorFlow variables into constants, which allows for faster computation. Fusion is another optimizing operation, whereby previously separate operations like ReLU, batch normalization and a convolution can be combined into one, reducing the number of tensors needed for that part of the network (a benefit partly in memory and partly in speed).
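For freezing specifically, the TensorFlow 1.x-era workflow looked roughly like the sketch below (the checkpoint paths and output node name are placeholders, not the project's actual names):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

with tf.Session() as sess:
    # restore the trained graph and its variables from a checkpoint
    saver = tf.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")

    # convert variables to constants and drop anything the listed output
    # node does not depend on (e.g. optimizer and other training-only ops)
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["output_node"])

with tf.gfile.GFile("frozen_model.pb", "wb") as f:
    f.write(frozen.SerializeToString())
```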

Another method is lowering precision, whereby the calculations are performed on numbers stored with fewer bits. While this would seem like a big negative for accuracy, it actually has a fairly small impact on the forward pass of the network. Lowering precision does have a much larger impact during backpropagation, but given that we are optimizing for inference and not training, that impact does not matter here (no backprop occurs during inference).
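A quick toy illustration of why (my own, not from the lesson): cast a layer's weights down to 16-bit floats and compare one forward pass against the full-precision version.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1024)).astype(np.float32)   # one input vector
w = rng.standard_normal((1024, 10)).astype(np.float32)  # a dense layer's weights

full_precision = x @ w
half_precision = x @ w.astype(np.float16).astype(np.float32)  # weights kept in 16 bits

# the outputs here are on the order of tens; the difference is only a few hundredths
print(np.max(np.abs(full_precision - half_precision)))
```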

The last part of this is quantization, which takes the distribution of the neural network’s weights and maps them from floating point values onto a discrete set of integers, with which faster calculations can be performed.
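Here is a toy sketch of the basic idea (my own illustration, not TensorFlow's implementation): map each float weight onto one of 256 evenly spaced integer levels, and note how little is lost going back.

```python
import numpy as np

def quantize(w, bits=8):
    """Map float weights onto integer codes in [0, 2**bits - 1]."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (2 ** bits - 1)
    codes = np.round((w - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize(codes, scale, w_min):
    """Recover approximate float weights from the integer codes."""
    return codes.astype(np.float32) * scale + w_min

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000).astype(np.float32)
codes, scale, w_min = quantize(weights)
recovered = dequantize(codes, scale, w_min)

# the worst-case error is about half a quantization step (scale / 2)
print(np.max(np.abs(weights - recovered)), scale / 2)
```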

What an optimal semantic segmentation model would output

Finally, we’re to the actual project: semantic segmentation. Semantic segmentation involves classifying each and every pixel in an image as part of a given class. For the project, we were tasked only with classifying pixels as either road or not road using the KITTI dataset (very similar to my own previous lane-finding semantic model, which only focused on the vehicle’s own lane and not the whole road); however, there can be many more classes than that. For instance, the often-used Cityscapes dataset has 30 different classes, which is closer to what many actual self-driving car companies use for camera-based perception.

The project also focuses on using a few layers of a pre-trained VGG network, adding skip layers, and then training the last few layers of the network that output the pixel-wise segmentation. For me, this was one of the quicker projects of the Nanodegree, though that was helped by having built a somewhat similar network before.
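For a sense of the architecture, here is a rough Keras sketch of the idea (not the project's actual TensorFlow 1.x starter code; it assumes the standard Keras VGG16 layer names): 1x1 convolutions squeeze three VGG feature maps down to the class dimension, transposed convolutions upsample them, and skip connections add the earlier, higher-resolution features back in.

```python
import tensorflow as tf

num_classes = 2  # road / not road
vgg = tf.keras.applications.VGG16(include_top=False, input_shape=(160, 576, 3))
pool3 = vgg.get_layer("block3_pool").output   # 1/8 of the input resolution
pool4 = vgg.get_layer("block4_pool").output   # 1/16
pool5 = vgg.get_layer("block5_pool").output   # 1/32

def squeeze(x):
    # 1x1 convolution down to one channel per class
    return tf.keras.layers.Conv2D(num_classes, 1, padding="same")(x)

def upsample(x, factor, kernel):
    # transposed convolution to grow the feature map back up
    return tf.keras.layers.Conv2DTranspose(
        num_classes, kernel, strides=factor, padding="same")(x)

x = upsample(squeeze(pool5), 2, 4)              # 1/32 -> 1/16
x = tf.keras.layers.Add()([x, squeeze(pool4)])  # skip from pool4
x = upsample(x, 2, 4)                           # 1/16 -> 1/8
x = tf.keras.layers.Add()([x, squeeze(pool3)])  # skip from pool3
logits = upsample(x, 8, 16)                     # back to full resolution

model = tf.keras.Model(vgg.input, logits)       # per-pixel class scores
```

Per the project setup described above, it is these added layers that get trained to produce the pixel-wise segmentation.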

With that in the bag, I was finally ready for the FINAL project of the Self-Driving Car Engineer Nanodegree. The capstone project is the only team project of the program, and one where I finally got to see some of my own (and my team’s) code on a real self-driving car. You can find that article here!
