Tesla’s Dojo Supercomputer Breaks All Established Industry Standards — CleanTechnica Deep Dive, Part 4


If you missed the earlier parts of this series, you can read them first for more background:

Tesla’s Dojo Supercomputer Breaks All Established Industry Standards — CleanTechnica Deep Dive, Part 1

Tesla’s Dojo Supercomputer Breaks All Established Industry Standards — CleanTechnica Deep Dive, Part 2

Tesla’s Dojo Supercomputer Breaks All Established Industry Standards — CleanTechnica Deep Dive, Part 3

Give it to me straight, doc, is it good or bad?

The fact that Dojo is not the supercomputer with the most computational power is not a bad thing, because Tesla built this supercomputer for one very specific task: training neural networks on lots and lots of 360° video. All of the code is written specifically to run ideally on this hardware. Every other supercomputer, and indeed every regular computer in the world, is built with flexibility in mind so that it can accommodate a large variety of tasks. On the one hand, that means other supercomputers, even the most powerful one, Fugaku, will most likely be slower than Dojo at the tasks Tesla has in mind. On the flip side, this might also be Dojo's Achilles heel, since any other kind of simulation that scientists might want to run on a supercomputer will not be easy to port and will likely not run nearly as fast as it would on any of the other supercomputers.

As was said in the Q&A, Tesla built Dojo first and foremost for itself and its own needs. Tesla won't be finished improving FSD until it is 1,000% safer than a human driver. For many years, we will already be sleeping in cars that have no steering wheel while Tesla is still working on the next 9 in the 99.9999999% safety figure. And now that Tesla has announced its robot, called Optimus Sub-Prime (since its prime time has not yet arrived), Tesla, and consequently Dojo, has a whole new realm of challenges to explore. Also, even if Dojo won't be able to help scientists find dark matter or unravel other mysteries of the cosmos, there are tons of other real-world AI applications, like robotic kitchens, factory automation, and space construction robots, for which Dojo will be absolutely perfect.

Dojo could still become the most powerful supercomputer in the world

Dojo is made up of a mere 10 cabinets, which also makes it one of the physically smallest supercomputers in the world. Fugaku, on the other hand, is made up of 256 cabinets. If Tesla were to add 54 cabinets to Dojo V1 for a total of 64 cabinets, Dojo would surpass Fugaku. Then, finally, if Dojo 2.0 really is 10 times more powerful than Dojo is right now, then even with a mere 10 cabinets Dojo V2 would be the most powerful supercomputer in the world by a healthy margin.
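
To sanity-check that scaling argument, here is a minimal back-of-the-envelope sketch in Python. The Dojo numbers are Tesla's own AI Day figures for one 10-cabinet ExaPOD (roughly 67.8 petaFLOPS at FP32, 1.1 exaFLOPS at BF16/CFP8), while Fugaku's ~442 petaFLOPS is an FP64 Linpack score, so the precisions don't match and the comparison is only indicative. With those caveats, 64 cabinets lands right around Fugaku's Linpack number, and a 10× Dojo V2 blows well past it.

```python
# Back-of-the-envelope scaling for the cabinet argument above.
# The Dojo figures come from Tesla's AI Day slides (one ExaPOD = 10
# cabinets, ~1.1 exaFLOPS BF16/CFP8, ~67.8 petaFLOPS FP32); Fugaku's
# number is its FP64 Linpack result, so this is indicative only.

EXAPOD_CABINETS = 10
EXAPOD_FP32_PFLOPS = 67.8      # Tesla AI Day figure for one ExaPOD
FUGAKU_LINPACK_PFLOPS = 442.0  # Fugaku Rmax (it debuted at ~416)

per_cabinet = EXAPOD_FP32_PFLOPS / EXAPOD_CABINETS

for cabinets in (10, 64):
    total = cabinets * per_cabinet
    print(f"{cabinets:>2} cabinets -> ~{total:.0f} PFLOPS FP32 "
          f"({total / FUGAKU_LINPACK_PFLOPS:.2f}x Fugaku's Linpack score)")

# A Dojo V2 that is 10x more powerful per cabinet, as Tesla hinted:
print(f"Dojo V2, 10 cabinets -> ~{10 * per_cabinet * 10:.0f} PFLOPS FP32")
```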

How Tesla will improve Dojo 2.0

From the moment I saw all of the groundbreaking innovations in Tesla's architecture, and knowing how Elon works with his 5-step design process, it became obvious how some of these ideas came to be, and that they happened about halfway through the design process.

Those steps (which we recently learned about thanks to YouTuber and SpaceX expert Tim Dodd, the “Everyday Astronaut,” in his 3-part Elon Musk interview) are:

  1. Make your requirements less dumb
  2. Try very hard to delete the part or process
  3. Simplify or optimize
  4. Accelerate cycle time
  5. Automate

Here is a simplified version of what likely happened halfway through the design process:

Elon: “Okay, walk me through the fabrication steps again.”

Hardware team responds: “So, step 1, all components are added to the silicon wafer. Step 2, you cut all the SoCs out of the wafer.”

Elon interrupts: “What if we don’t?”

Hardware team responds: “What if we don’t what?”

Elon clarifies: “What if we don’t cut them out of the wafer, just leave them in there? Can’t we make them talk to each other right on the wafer?”

And the rest is history.

For the 10× improvement, the first thing Tesla will be able to do is, rather than leave a bunch of SoCs on a wafer, create a true System on a Wafer instead of 25 Systems on a Chip sitting on a wafer. Being modular is useful, but making the whole wafer one system could significantly increase performance. Since this, too, would be unprecedented, it is hard to predict how much of a gain it would bring, but my gut feeling says it would be very powerful. It is also the next logical step that falls under "1. Make your requirements less dumb."
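
To get a feel for how much silicon headroom a full system-on-wafer would offer, here is a rough, purely illustrative calculation. The D1 die area (645 mm²) and the 25 dies per training tile are Tesla's own numbers; the 300 mm wafer size and the usable-area fraction are my assumptions, and the sketch says nothing about yield, which is the hard part of wafer-scale designs.

```python
import math

# Rough silicon-area comparison: 25 discrete D1 dies vs. one full wafer.
# Die area and dies-per-tile are Tesla AI Day figures; the wafer size
# and usable fraction are assumptions for illustration only.
D1_DIE_MM2 = 645.0          # D1 die area per Tesla
DIES_PER_TILE = 25          # D1 dies per training tile
WAFER_DIAMETER_MM = 300.0   # standard wafer diameter (assumed)
USABLE_FRACTION = 0.9       # edge exclusion / packing guess

wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
tile_silicon = D1_DIE_MM2 * DIES_PER_TILE

print(f"Silicon in one training tile: {tile_silicon:,.0f} mm^2")
print(f"Usable wafer area (assumed):  {wafer_area * USABLE_FRACTION:,.0f} mm^2")
print(f"Potential headroom:           {wafer_area * USABLE_FRACTION / tile_silicon:.1f}x")
```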

Then, the second optimization comes not from Tesla but from Samsung (or TSMC). As was said earlier, the fact that the D1 chip is made on a 7nm node indicates that it was either Samsung or TSMC that fabricated the chip/wafer. By 2022, both companies hope to have their 3nm process up and running, and this should improve performance by at least 2.3 times while lowering power consumption. In fact, the performance increase might be even greater, since Tesla has a much better cooling solution, making the higher heat density that comes with the die shrink less of a problem and avoiding any compromise on clock speeds.

With the SoCs on a wafer, it sounded like Tesla did not anticipate that network switching chips would be so slow and that it would be better off making its own. Given more time, Tesla could improve that even further. Finally, right now Tesla is making use of PCIe Gen 4, but in a few years PCIe Gen 5 will again double the speed at which Tesla can connect wafers to one another.

All in all, it is hard to say which improvements, including ones I can't think of right now, will add up to the 10× performance improvement. Nonetheless, that is the figure Tesla gave, and we all know better than to bet against Elon.
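
For what it's worth, improvements like these compound multiplicatively, so it doesn't take many of them to reach 10×. Here is a minimal sketch; the 2.3× and 2× factors are the estimates from the paragraphs above, while the system-on-wafer and custom-switching gains are placeholder guesses of mine, not anything Tesla has stated.

```python
from math import prod

# Purely illustrative: a few independent speedups compounding to roughly
# the 10x figure Tesla quoted for Dojo V2. The first two factors come
# from the estimates above; the last two are placeholder guesses.
speedups = {
    "3nm process node":            2.3,
    "PCIe Gen 5 interconnect":     2.0,
    "system-on-wafer integration": 1.5,  # guess
    "custom network switching":    1.5,  # guess
}

for name, factor in speedups.items():
    print(f"{name:<30} x{factor}")
print(f"{'Combined':<30} x{prod(speedups.values()):.1f}")
```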


Tesla’s Training Tile is the Octovalve of Silicon

Now that you are intimately familiar with Dojo, there is a really good parallel I can draw for you. Some of you may have seen Sandy Munro on YouTube. He has taken apart the Tesla Model Y and the Ford Mustang Mach-E (among other vehicles). I hope you have watched those videos, but if not, please look up his Model Y and Mach-E teardowns.

The Mach-E is a fantastic car (I recently drove it, and a full review is coming really soon), but the way Ford handled the heat exchange, with more than 30 hoses connecting everything, made Sandy Munro literally pretend to faint upon seeing it. Tesla, by contrast, has something called the Octovalve, a very tight all-in-one package built in a way that has never been done before. So, rather than 18 meters worth of hoses, Tesla has 6 meters. Rather than 35 parts, Tesla has 10. Rather than holding 22.4 kg of fluid that is harder to warm and cool, Tesla holds only 9 kg.

As for Dojo, the training tile is undoubtedly the Octovalve of silicon, though, in my opinion, even more impressive. When I look at my beautiful desktop computer to the right of me and the Tesla training tile on the screen to the left of me, it really feels like my powerful computer, with all its tubes and wires, has become that Mach-E. We should all faint. If anything, it also shows just how much the computer industry has been slacking off with conventional thinking and with our comfortable backwards-compatible ports and standards. This really is a Nokia vs. iPhone moment.

The only sad part is that Tesla's SoC architecture is so ill-equipped for purposes besides training neural nets fed with video. Nonetheless, if standard processors, graphics cards, and SoC chips were designed with this same kind of elegance, compression, and modularity, we could make all the computers in the world a lot more powerful. In fact, while the Q&A wasn't very clear on this, it does sound like Tesla might also want to make a "hybrid stack" with different SoCs so that Dojo can handle more kinds of tasks.

HW4 Chip Starting with Cybertruck

During the Q&A, Elon revealed that HW4 will come in about a year and will launch together with the Cybertruck. The Cybertruck will also have at least one improved camera, or maybe even a whole new camera system altogether. Though, Elon did explain that Tesla still hasn't maxed out the cameras it currently uses, and that a new camera system won't be necessary for the car to achieve full autonomy at a safety level 200–300% better than a human driver.

Contradicting what Elon said during Autonomy Day, HW4 will be 4 times more capable than HW3. Earlier in the Q&A, Tesla stated that it can't make the neural nets too complex or the latency on HW3 would be too high. While this was not expressly said during AI Day, the 4× increase in computing power that comes with HW4 means that more numerous, larger, and more complex neural nets become viable, ones that would have taken too long to produce an answer on HW3. Interestingly, Tesla also said that bigger neural networks only work better if you have the data to feed them with. Hence, Dojo, which can process so much video, is exactly what Tesla needs to make more complex neural nets.
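
To make the latency argument concrete, here is a toy frame-budget calculation. The ~36 fps camera rate is the commonly cited figure for Tesla's cameras, and the "3× more expensive network" is an arbitrary placeholder; the point is only that a fixed per-frame time budget caps network complexity, and a 4× faster chip raises that cap by 4×.

```python
# Toy frame-budget math for the HW3 -> HW4 argument above.
# Camera rate is the commonly cited ~36 fps; the network cost factor is
# an arbitrary placeholder used purely to illustrate the relationship.
CAMERA_FPS = 36
frame_budget_ms = 1000 / CAMERA_FPS  # ~27.8 ms available per frame

# Relative throughput: HW4 is said to be ~4x more capable than HW3.
chips = {"HW3": 1.0, "HW4": 4.0}

# Hypothetical future network needing 3x the compute that just fits
# within HW3's frame budget today (placeholder assumption).
network_cost = 3.0

for name, speed in chips.items():
    latency_ms = frame_budget_ms * network_cost / speed
    verdict = "fits" if latency_ms <= frame_budget_ms else "misses"
    print(f"{name}: ~{latency_ms:.1f} ms per frame, "
          f"{verdict} the {frame_budget_ms:.1f} ms budget")
```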

Offering Dojo and AI Neural Net Training Software to Others

I already published a whole article and a video about this right before AI Day, and all of it still holds true after AI Day, especially the analysis. That's great! Depending on how you interpret the Q&A, I might even have been right in my predictions as well, though Tesla didn't expressly confirm it.

Tesla has indeed significantly automated the AI labeling and training process to the point that machine learning experts can focus on the more difficult tasks whereas labelers can do more of the legwork needed to train Autopilot. Elon has now said on Twitter that Tesla will offer Dojo as a service. However, from the presentation and the Q&A, it became clear that this is not very useful unless you: A) are making a real-world AI that is trained with lots of video footage & simulations, and B) are making use of Tesla’s highly automated tools for labeling and training AI. Tesla did state that they will work on a PyTorch extension to make Dojo work well with the tools that ML scientists are used to, but the audience seemed skeptical about how well this would actually run on Dojo’s very specific hardware.
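
As a rough mental model of what such a PyTorch extension would mean in practice, the hope is presumably that Dojo shows up as just another device target, the way GPUs do today. The snippet below is ordinary PyTorch running on CPU or CUDA; the "dojo" device string mentioned in the comment is hypothetical, my guess at how such an extension might surface, not anything Tesla has published.

```python
import torch
import torch.nn as nn

# Ordinary PyTorch: model and data move to whatever accelerator is
# available. A Dojo PyTorch extension would presumably slot in here as
# one more device target (something like device = "dojo"), leaving the
# rest of the training code unchanged. That device string is purely
# hypothetical -- nothing here is Tesla's actual API.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch of eight RGB "video frames" and labels.
frames = torch.randn(8, 3, 64, 64, device=device)
labels = torch.randint(0, 10, (8,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(frames), labels)
loss.backward()
optimizer.step()
print(f"one training step on {device}, loss = {loss.item():.3f}")
```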

While this was not expressly confirmed, it most likely means that Tesla is going to offer its training and labeling tools along with access to Dojo as a full software development stack, and the image above does seem to support that hypothesis. Elon also confirmed once again that Tesla is willing to license FSD to other automakers, and hopefully Tesla AI Day has made them at least think twice about that.

In the End

Watching AI Day was absolutely shocking for me. It shattered my perception of what is possible when it comes to computer technology. This is at least the third time Tesla has done this to me, the previous times being Battery Day and Autonomy Day. Tesla is a company like no other, and it really doesn't let you forget it. Dojo, the Octovalve, the new carbon-overwrapped rotor motor, the 4680 cell, the Terafactory, and the list goes on.

You know, I am also so relieved that after many years I have finally gotten an answer to a question I had for Elon. That question was: Elon, you have the machine that builds the machine, so where is the robot that builds the robot? I briefly covered what a robot future could mean in my previous article/video, but now that Tesla has announced Optimus Sub-Prime, I will also publish a separate analysis of that at the earliest opportunity.

Dojo is mind-blowing, and I hope this analysis has helped you fully grasp the extent of the technological advancements Tesla has made.



Chanan Bos
