Starship is building a fleet of robots to deliver packages according to local demand. To achieve this successfully, robots must be safe, polite and fast. But how to get there with low computational resources and without expensive sensors like LIDAR? This is the engineering reality you have to deal with if you don’t live in a universe where customers are happily paying $ 100 for a delivery.
For starters, robots are beginning to sense the world with radar, numerous cameras, and ultrasound.
However, the challenge is that most of this knowledge is low-level and non-semantic. For example, a robot can sense that an object is ten meters away, but without knowing the category of objects it is difficult to make driving decisions.
Machine learning through neural networks is surprisingly useful for converting this unstructured low-level data into higher-level information.
Starship robots mostly cross the sidewalks and streets when they need to. This poses another set of challenges compared to cars that drive cars. Highway traffic is more structured and predictable. Cars move along lanes and do not change direction too much, but humans often stop suddenly, meandering, accompanied by a dog on a leash, and not expressing their intentions with signal lights.
To understand the surrounding environment in real time, a key component of the robot is an object detection module, a program that inserts images and returns a list of object boxes.
That’s all very well, but how do you write a program like that?
An image is a large three-dimensional matrix consisting of numerous numbers that indicate the intensity of the pixels. These values change significantly instead of taking the picture at night; when the color, scale, or position of the object changes, or when the object itself is cut or closed.
For some complex issues, teaching is more natural than programming.
In robot software, we have a set of trainable units, mostly neural networks, where the code is written by the model itself. The program is represented by a set of weights.
Initially, these numbers start at random, and the output of the program is also random. Engineers present examples of the model they would like to predict and ask them to improve the network the next time they see a similar entry. By changing the weights by repeating them, the optimization algorithm looks for programs that predict the boundary boxes more and more accurately.
However, the examples used to train the model need to be thought through in depth.
- Should the model be punished or rewarded when a car detects a reflection in the window?
- What will he do when he perceives a picture of a man from a poster?
- Should a car-filled car trailer be indicated as an entity or each car separately?
These are all examples of what happened when we built an object detection module in our robots.
When a machine is taught, big data is not enough. The data collected must be rich and varied. For example, using only uniformly sampled images and then warning them, many pedestrians and cars would be displayed, but the model would lack examples of motorcycles or skaters to reliably detect these categories.
The team must specifically undermine hard examples and rare cases, otherwise the model would not move forward. Starship operates in several countries and enriches the set of examples with different weather conditions. A lot of people were surprised when the Starship delivery robots were operating during the snowstorm ‘Emma’ In the United Kingdom, however, airports and schools remained closed.
At the same time, data release requires time and resources. Ideally, it is best to train and improve models with less data. That’s where architectural engineering comes in. We codify prior knowledge in the architecture and optimization processes to reduce the search space to programs that are likely in the real world.
In some computer vision applications, such as pixel segmentation, it is useful for the model to know if the robot is on a sidewalk or at a crossroads. To make a suggestion, we encode global image-level traces in neural network architecture; then the model decides whether or not to use it without having to learn from scratch.
After data and architectural engineering, the models may work well. However, deep learning models require a lot of computing power, and this is a big challenge for the team because we can’t take advantage of the most powerful graphics cards in low-cost battery-powered delivery robots.
Starship wants our shipments to be low cost, which means our hardware needs to be cheap. That’s why Starship is the same reason why LIDAR (a detection system that works on the principle of radar, but uses laser light) would make it much easier to understand the world, but we don’t want our customers to pay more. than necessary for delivery.
State-of-the-art object detection systems published in academic articles run at about 5 frames per second [MaskRCNN], and real-time object detection papers do not significantly indicate rates above 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. Moreover, these numbers are captured in a single image; however, we need a 360-degree understanding (equivalent to approximately 5 single image processing).
To give you an overview, Starship models run at more than 2000 FPS when measured on a consumer-grade GPU, and process a full 360-degree panoramic image in a forward pass. This is equivalent to 10,000 FPS when 5 single images with a batch size of 1 are processed.
Neural networks are better than humans in many visual problems, although they can still have defects. For example, a boundary box may be too wide, the confidence may be too low, or it may be an object hallucinated in a really empty place.
These mistakes are difficult to fix.
Neural networks are considered to be black boxes that are difficult to study and understand. However, in order to improve the model, engineers need to understand cases of failure and delve deeper into the specifics of what the model has learned.
The model is represented by a set of weights, and one can see what each specific neuron wants to detect. For example, the first layers of Starship’s network are activated in standard patterns, such as horizontal and vertical edges. The next layer block detects more complex textures, and the upper layers detect car parts and entire objects.
Technical debt takes on a different meaning with machine learning models. Engineers are constantly improving their architecture, optimization processes and data sets. As a result, the model becomes more accurate. However, changing the detection model to a better one does not necessarily guarantee success in the overall behavior of a robot.
There are dozens of components that use the output of the object detection model, each of which requires a different level of accuracy and memory, which are set based on the existing model. However, the new model may behave differently. For example, the output probability distribution may be set aside at higher values or may be wider. While the average performance is better, it can be worse for a specific group like big cars. To avoid these obstacles, the team calibrates the probabilities and checks the regressions in multiple data sets.
Monitoring of trainable software components poses a different set of challenges compared to standard software monitoring. There is little concern about inference time or memory usage, as these are mostly constant.
However, the change in the data set becomes the main concern: the distribution of data used to train the model is different from what the model is currently deploying.
For example, there may be electric scooters that suddenly circulate on the sidewalks. If the model does not take this class into account, the model will find it difficult to classify it properly. The information derived from the object detection module will agree with other sensory information, and as a result, human operators will be asked for help and, as a result, shipments will be slowed down.
Neural networks give Starship robots the ability to be safe at crossroads, avoiding obstacles like cars, and understanding all directions that can hit humans and other obstacles on sidewalks.
Starship robots achieve this by using inexpensive hardware that poses many engineering challenges but now makes robot deliveries a solid reality. Starship’s robots are making real deliveries seven days a week in many cities around the world, and it’s rewarding to see how our technology is constantly giving people more comfort in their lives.