> The hardware is built around a stackable 10×10cm compute module with two ARM Cortex-A55 SBCs — one for ROS 2 navigation/EKF localisation, one dedicated to vision/YOLO inference — connected via a single ethernet cable.
I will preface this by saying that I have nothing against ARM per se, that my employer/team supported a good chunk of the work for making ROS 2 actually work on arm64, and that there is some good hardware out there.
I really don't understand why startups and research projects keep using weird ARM SBCs for their robots. The best of these SBCs is still vastly shittier in terms of software support and stability than any random Chinese Intel ADL-N box. The only reasons to use (weird) ARM SBCs in robots are that either (1) you are using a Jetson for Jetson things (i.e. Nvidia libraries), or (2) you have a product which requires serious cost optimization to be produced at a large scale. Otherwise you are just committing yourselves and your users/customers to a future of terrible-to-nonexistent support and adding significantly to the amount of work you need to bring up the new system and port existing tools to it.
> The only reasons to use ARM SBCs in robots are...
Obviously, anyone can have their own opinion on this.
I work in robotics, and we are quite happy with our A53 and M4. Though we use a SOM, not an SBC, if you feel like splitting hairs.
You probably aren't using some weird SOM, though. There is a bit of an unstated exception of "unless said SBC/SOM has specific hardware that is necessary/particularly valuable for your product/project". For example, if you need GMSL you are probably not going to be picking Intel, even though ADL-N and the bigger processors support MIPI, simply because no one else does and the documentation/support for it is basically nonexistent. Designs with closely-coupled A/M/R cores, or CPU/MCU/FPGA hybrids like Zynq would be others.
But generally projects which are choosing some random SBC aren't using any of these features, and are just suffering the pain/imposing it on their users for no good reason.
again, just an opinion, but it feels really weird to hear you find "exception after exception", when the net result is that you've ruled out more real world robotics projects on ARM than likely exist on x86 that you're suggesting should be the "norm".
you've ruled out the entire NXP ecosystem, the entire Nvidia Jetson ecosystem, the entire AMD/FPGA/Zynq ecosystem, even perfectly good options like beagle-board .... who else?
incidentally, you've also ruled out this project - as they are using an M7 microcontroller to meet their hard-real-time timing constraints...
The other poster had said nothing about microcontrollers, e.g. about the various MCU models based on Cortex-M cores.
Some things are best done with a microcontroller, and those are not suitable for being done with a general-purpose CPU either based on Intel/AMD or on Cortex-A cores. Actually there are many projects that mistakenly use something like a Raspberry Pi instead of a better and cheaper implementation with a microcontroller, e.g. one based on Cortex-M7 or its successor, Cortex-M85.
The other poster said that where you do not want a microcontroller, but you want to run a standard operating system, e.g. Linux, then the best choice is much more frequently a SBC with an Intel Alder Lake N or Twin Lake CPU, as these not only have a better performance per dollar than the ARM-based SBCs, but they also avoid any software problems and future maintainability problems.
Unfortunately, during the last few months the price of Intel-based SBCs has been affected by the fact that most of them do not have soldered memory but instead use a single SODIMM memory module. While you can buy an Intel Alder Lake N based SBC for $100, a SODIMM for it may today cost as much or more, depending on the amount of memory with which you are content.
The ARM SBCs that come with soldered LPDDR memory have initially been less affected by the price hikes, though now even for them the prices are rising.
I think you're missing my point entirely. If your project needs specific hardware, you have to use that specific hardware (the obvious examples of which would be Jetsons or Zynq/Zynq-like or something ASIL-D or something that tightly couples "A"/M/R cores together, or you are stuck using a SoC from Qualcomm for cell connectivity). There are a lot of projects that do fall into that category.
There are also a (much smaller) number of projects that will legitimately see the kind of scale of production that justifies aggressive cost optimization for the compute platform, either in terms of designing their own around a SoC or picking some SBC/SoM that they can get a good deal on, where the significant additional up-front engineering cost is outweighed by the production savings (and where the desire/need to keep a fixed platform means the often limited platform support from the vendor is less restrictive).
But a large number of robotics projects (basically everything in the research sphere) - this one very much included - just need "some computer" for general-purpose use. They are already separating realtime control onto a separate microcontroller board. For these projects, it is almost always committing a "premature pessimization" of picking some weird SBC. You are signing up for worse CPU and GPU performance, stability, and development future for very little reward.
There are a variety of x86 products with Coreboot support, if what you are looking for is firmware openness. If what you are looking for is PCB design openness, the options are much fewer, but at that point you are probably optimizing for an overly niche objective.
> Part of the point of this for me is to see what's possible with open hardware (down to chip level at least)
I appreciate the idea, but this is essentially saying "this project will prioritize a specific choice of one (core) piece of hardware to the detriment of everything else, users included". Approximately none of your potential users are going to benefit from the "openness" of the SBC versus that of a more broadly-supported platform (I say "openness" because the reality of SBCs is that actually finding a usefully performant one that is completely blob-free is almost impossible). Open hardware means very little if it isn't running an upstream kernel and userland.
The South-Korean Hardkernel ODROID H4 models are open hardware. There is no need to send one to you, as you can order one yourself from their on-line shop or from local shops.
You get their schematics/PCB documentation and their BIOS has features that are missing in most mini-PCs and laptops with Alder Lake N/Twin Lake, e.g. you can enable in-band ECC for the memory. You can choose various variants of the SBC and you can buy cheaply various accessories, e.g. several case variants and additional peripheral interfaces. Those ODROID H4 SBCs are also correctly designed for cooling inside a box like that used in this project, because the PCB is attached to a big heatsink and you can attach the heatsink directly to an aluminum wall from inside the box, ensuring good thermal contact with pads or grease, so that the electronics will be cooled well.
Most technical information can be found in their Korean site, but there is a UK distributor (though the prices appear greatly inflated here; so much that it might be cheaper to buy from South Korea, depending on shipping costs and applicable taxes):
Also the Chinese Radxa has a Raspberry Pi sized SBC with an Intel N100, which is open hardware, with complete schematics/PCB documentation (but unlike ODROID H4, which has excellent cooling and can be used without a fan, it is unclear how easy it is to cool the Radxa SBC).
Moreover, unlike for many Intel/AMD CPUs, which no longer have public documentation, for Alder Lake N Intel still provides public datasheets, which contain e.g. the thousands of control registers for the on-chip peripherals. Most ARM Cortex-A based CPUs are undocumented, with few exceptions like Rockchip RK3588 and the very expensive NVIDIA Orin/Thor (or the obsolete Xavier). All Cortex-A based CPUs have secret boot loaders, so you can never be certain that your programs really run on bare metal, as the CPU vendor can implement the equivalent of the Intel System Management Mode, where the proprietary vendor firmware can take control from your own operating system whenever it wants.
There are somewhat more ARM-based SBCs than Intel-based SBCs that are open hardware, but there are also plenty of undocumented ARM SBCs that are much worse from this PoV than the Intel/AMD based computers, where at least the IBM PC standards and the later standards pushed by Intel, e.g. ACPI/UEFI, apply. The Allwinner CPU used in this robot has almost non-existent documentation, in comparison with Intel Alder Lake N, so it is much farther from "open hardware".
You have mentioned the NVIDIA Jetson modules, which are based on Thor/Orin/Xavier. Those have excellent documentation, but you have to register at NVIDIA, for a free account, in order to access it. The documentation is not the problem with them, but the fact that they are greatly overpriced, like almost anything made by NVIDIA. Unless your application critically depends on some feature provided by NVIDIA, for which no acceptable alternatives exist, choosing Jetson is a very bad decision, because the alternatives are usually both better and cheaper.
The SBCs based on Cortex-A55 are the cheapest for the purpose of running Linux that still have a decent performance and they may be sufficient for many applications.
However, the SBCs based on either Intel Alder Lake N or on ARM Cortex-A7x cores are in a completely different class of performance, so they are more future-proof as they can enable the implementation of applications that were not taken into consideration in the beginning. Moreover, as pointed out by the other poster, none of the Cortex-A55 SBCs implements any kind of standard, so migration to any different SBC may require significant work, unlike with the Intel/AMD SBCs, which are mostly interchangeable.
The Intel Alder Lake N/Twin Lake cores (Gracemont cores) have a performance similar to the ARM Cortex-A78 cores, which for now can be found only in a few SBCs, which use Qualcomm, Mediatek or NVIDIA CPUs. The Cortex-A76 cores, which are used in Rockchip RK3588 and in the latest Raspberry Pi, have a speed of only around 2/3 of the Gracemont/Cortex-A78 speed, at the same clock frequency.
Cortex-A55 cores are many times slower than any of these bigger cores. A single Intel SBC (or Cortex-A7x based SBC), can replace both Cortex-A55 SBCs of this design, at about the same cost, improving the cooling and probably lowering the power consumption, while also providing a significant performance headroom for future extensions.
While using 1 Cortex-A55 SBC for minimum cost may make sense, using 2 is a definite mistake, as they should be replaced by 1 better SBC.
I have mentioned the open-hardware Intel-based ODROID H4. The same company makes several models of ARM-based SBCs, which I would trust much more in an outdoor robot than the choice made in the parent article, because the cooling behavior of all of them is carefully tested and reported on their site, and because it is a company that has been around for many years, demonstrating reliable hardware. Avaota provides much less information about their product than Hardkernel, i.e. they only give schematics/PCB information, without any information about power consumption, and especially about thermal behavior, which is essential in a robot application.
> There are four pair of wires in the cable. If you use all of them for TX, you can't receive.
No, you absolutely can use them all for transmit and receive at the same time. The device at each end knows what signal it is transmitting, and can remove that from the received signal to identify what has been transmitted by the other end.
This is the magic that made 1000BASE-T win out among the candidates for GigE over copper, since it required the lowest signaling frequencies and thus would run better over existing cables.
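A toy sketch of that idea, assuming an ideal line where the voltage on a pair is simply the sum of what both ends drive onto it. Real 1000BASE-T PHYs also have to cancel reflections and crosstalk with adaptive filters, which is skipped here. The function names and sample symbols are made up for illustration; the five levels are the PAM-5 alphabet that 1000BASE-T uses.

```python
# Toy model of simultaneous bidirectional signalling on one wire pair.
# Assumption: the line voltage is the plain sum of both transmitters,
# with no attenuation, delay, reflections, or noise.

PAM5_LEVELS = (-2, -1, 0, 1, 2)


def line_signal(tx_a, tx_b):
    """Voltage seen on the shared pair: superposition of both ends."""
    return [a + b for a, b in zip(tx_a, tx_b)]


def recover_remote(line, own_tx):
    """Each end subtracts its own known transmission to get the other's."""
    return [v - own for v, own in zip(line, own_tx)]


# Arbitrary example symbols, one list per end of the cable.
tx_a = [2, -1, 0, 1, -2, 2]
tx_b = [-2, -2, 1, 0, 2, 1]
assert all(s in PAM5_LEVELS for s in tx_a + tx_b)

line = line_signal(tx_a, tx_b)
rx_at_a = recover_remote(line, tx_a)  # what end A hears from end B
rx_at_b = recover_remote(line, tx_b)  # what end B hears from end A

print("A recovered B's symbols:", rx_at_a == tx_b)
print("B recovered A's symbols:", rx_at_b == tx_a)
```

Both ends transmit on the same pair in every symbol period and still recover the other side's data, which is why all four pairs can carry traffic in both directions at once.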
It's both. Anything that you effectively have optimized to the details of its environment will end up dependent on that environment. If that environment is the real world, then it's fine. If that environment is a simulator, now you have sim-to-real problems.
There's always going to be a gap between reality and simulation, because simulation fidelity has an incredibly long tail. "Digital twins" are relatives, not identical replicas, because everything about capturing the real world for simulation involves simplification and discretization and abstraction.
Because "brand new" doesn't mean devoid of context. Within your domain, there will still be common libraries, interfaces, and tools.
C++ is very flexible, with a lot of very mature tooling and incredibly broad platform support. If you're writing some web server to run on the hardware of your choosing, then sure, that doesn't matter. But if you're writing something deeply integrated with platform/OS interfaces, or graphics, or needs to support some less common platforms, then C++ is often your only practical option for combining expressiveness and performance.
This is the sort of info I was trolling for, but what are those platforms and OSes? For targets LLVM doesn't handle, yeah, C++ makes sense, or C. A sibling mentions Xcode, which makes sense. Graphics seems questionable; Vulkan support is fine. Windows support has seemed fine too: the same GUI has worked as what we wrote for Linux.
Dependencies. There are billions of lines of C++ out there that have been optimized and production hardened over decades that you might want to reuse. Rust lang interoperability with anything but C sucks in practice.
> At the big tech firms, there are engineers looking for software fixes that make tiny efficiency improvements that can save lots of money at scale.
Meanwhile, Google and Apple look for whatever ways they can to improve battery life on their phones.
While this is true for parts of these companies, I think the user experience with their products makes it clear that the performance focus only goes in a very select few areas.
Perhaps my favorite Google example is the absolute dogshit performance of Google Home devices. By any objective metric, these are fairly capable computers (quad-core arm64 processors) running a tiny number of apps and yet their UI is still incredibly sluggish. Better yet, these are basically the only real-world devices that run Fuchsia - they should be shining examples of the performance of a brand new OS and yet they are anything but.
For one, no one is seriously contemplating a LIDAR-only system, the question is between camera+LIDAR or camera-only.
> Lidar just fundamentally can’t read signs, traffic lights or road markings in a reliable way.
Actually, given that basically every meaningful LIDAR on the market gives an "intensity" value for each return, in surprisingly many cases you could get this kind of imaging behavior from LIDAR so long as the point density is sufficient for the features you wish to capture (and point density, particularly in terms of points/sec/$, continues to improve at a pretty good rate). A lot of the features that go into making road signage visible to drivers (e.g. reflective lettering on signs, cats eye reflectors, etc) also result in good contrast in LIDAR intensity values.
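As a rough illustration of what using the intensity channel looks like, here is a minimal sketch. It assumes points arrive as (x, y, z, intensity) tuples with intensity already normalised to 0..1. The tuple layout, the 0.8 threshold, the sample values and the function name are all invented for this example; real sensors have their own formats and intensity scales, and a usable threshold depends on range and incidence angle.

```python
from typing import List, Tuple

# Hypothetical point layout for this sketch: (x, y, z, intensity),
# with intensity already normalised to the range 0..1.
Point = Tuple[float, float, float, float]


def high_intensity_points(points: List[Point], threshold: float = 0.8) -> List[Point]:
    """Keep returns at or above the threshold as retroreflector candidates."""
    return [p for p in points if p[3] >= threshold]


# Made-up returns: two dull surfaces and two retroreflective ones.
scan = [
    (15.0, -1.8, 0.0, 0.10),  # bare asphalt
    (15.0, -1.7, 0.0, 0.85),  # lane marking
    (30.0, 3.0, 2.5, 0.95),   # sign face
    (30.0, 3.1, 2.5, 0.20),   # dull background next to the sign
]

bright = high_intensity_points(scan)
print(len(bright), "of", len(scan), "returns look retroreflective")
```

Whether the surviving points are dense enough to actually read a sign is the point-density question raised above; the filter only shows that the contrast is available in the data.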
It's like having 2 pilots instead of 1 pilot. If one pilot is unexpectedly defective (has a heart attack mid-flight), you still have the other pilot. Some errors between the 2 pilots aren't uncorrelated of course, but many of them are. So the chance of an at-fault crash goes from p and approaches p^2 in the best case. That's an unintuitively large improvement. Many laypeople's gut instinct would be more like p -> p/2 improvement from having 2 pilots (or 2 data streams in the case of camera+LIDAR).
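To put numbers on the p versus p/2 versus p^2 intuition, here is a small sketch. The model is an assumption made for illustration: with probability `common_cause` a shared fault takes out both channels, and otherwise each channel fails independently with probability `p`. The value of `p` and of the common-cause term are made up.

```python
def both_fail_probability(p: float, common_cause: float = 0.0) -> float:
    """Probability that two redundant channels both fail.

    Assumed model: a shared fault (probability common_cause) kills both;
    otherwise the two channels fail independently, each with probability p.
    """
    return common_cause + (1.0 - common_cause) * p * p


p = 1e-3                                    # illustrative single-channel failure rate
layperson_guess = p / 2                     # the "twice as safe" gut instinct
independent = both_fail_probability(p)      # best case: fully uncorrelated
correlated = both_fail_probability(p, common_cause=1e-5)

print(f"single channel:       {p:.2e}")
print(f"p/2 intuition:        {layperson_guess:.2e}")
print(f"independent pair:     {independent:.2e}")
print(f"with common cause:    {correlated:.2e}")
```

Under these assumptions the shared-fault term quickly dominates the result, so the real benefit depends on how uncorrelated the two channels actually are.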
In the camera+LIDAR case, you conceptually require AND(x.ok for all x) before you accelerate. If only one of those systems says there's a white truck in front of you, then you hit the brakes, instead of requiring both of them to flag it. False negatives are what you're trying to avoid, because the confusion matrix shouldn't be equally weighted given the additional downside of a catastrophic crash. That's where two somewhat independent data streams become so powerful at reducing crashes: you really benefit from those ~uncorrelated errors.
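The AND(x.ok for all x) rule can be sketched in a few lines. The `SensorReport` type and `may_accelerate` function are hypothetical names for this example, and reducing each sensor to a single boolean is a deliberate oversimplification of what a real perception stack reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorReport:
    name: str
    ok: bool  # True means "this sensor sees a clear path"


def may_accelerate(reports: List[SensorReport]) -> bool:
    """AND over all sensors: any single objection vetoes acceleration.

    An empty list is treated as "not ok", since all([]) would be True.
    """
    return len(reports) > 0 and all(r.ok for r in reports)


both_clear = [SensorReport("camera", True), SensorReport("lidar", True)]
lidar_objects = [SensorReport("camera", True), SensorReport("lidar", False)]

print("both clear    ->", may_accelerate(both_clear))
print("lidar objects ->", may_accelerate(lidar_objects))
```

The asymmetry is intentional: a missed obstacle now needs every sensor to miss it, at the cost of more unnecessary braking when one sensor raises a false alarm.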
"In the camera+LIDAR case, you conceptually require AND(x.ok for all x) before you accelerate."
This can be learnt by the model. Let's assume vision is 100% correct, the model would learn to ignore LIDAR, so the worst case scenario is that LIDAR is extra cost for zero benefit.
This is not going to be true for a very long time, at least so long as one's definition of "vision" is something like "low-cost passive planar high-resolution imaging sensors sensitive to the visual and IR spectrum" (I include "low-cost" on the basis that while SWIR, MWIR, and LWIR sensors do provide useful capabilities for self-driving applications, they are often as expensive as LIDARs, if not much more so). Camera sensors have gotten quite good, but they are still fundamentally much less capable than the human eyes plus visual cortex in terms of useful dynamic range, motion sensitivity, and depth cues - and human eyes regularly encounter driving conditions which interfere with or prohibit safe driving (e.g. mist/fog, heavy rain/snow, blowing sand/dust, low-angle sunlight at sunrise/sunset/winter). One of the best features of LIDAR is that it is either immune or much less sensitive to these phenomena at the ranges we care about for driving.
Of course, LIDAR is not without its own failings, and the ideal system really is one that combines cameras, LIDARs, and RADARs. The problem there is that building automotive RADAR with sufficient spatial resolution to reliably discriminate between stationary obstacles (e.g. a car stalled ahead) and nearby clutter (e.g. a bridge above the road) is something of an unsolved problem.
The worst case scenario is that LIDAR is a rapidly falling extra cost for zero benefit? Sounds like it's a good idea to invest into cheap LIDAR just in case the worst case doesn't happen. Even better, you can get a head start by investing in the solution early and abandon it when it has become obsolete.
By the way, Tesla engineers secretly trained their vision systems using LIDAR data because that's how you get training data. When Elon Musk found out, he fired them.
Finally, your premise is nonsensical. Using end to end learning for self driving sounds batshit crazy to me. Traffic rules are very rigid and differ depending on the location. Tesla's self driving solution gets you ticketed for traffic violations in China. Machine learning is generally used to "parse" the sensor output into a machine representation and then classical algorithms do most of the work.
The rationale for being against LIDAR seems to be "Elon Musk said LIDAR is bad" and is not based on any deficiency in LIDAR technology.
If you're on a desert island and you have 2 watches instead of 1, the probability of failure (defined as "don't know the time") within T years goes from p to p^2 + epsilon (where epsilon encapsulates things like correlated manufacturing defects).
So in a way, yes.
The main difference is that "don't know the time" is a trivial consequence, but "crash into a white truck at 70mph" is non-trivial.
It's different because the challenge with self-driving is not to know the exact time. You win for simply noticing the discrepancy and stopping.
Imagine if the watch simply tells you if it is safe to jump into the pool (depending on the time it may or may not have water). If watches conflict, you still win by not jumping.
I was responding to the parent who said if you had to make a choice between lidar and vision, you'd pick lidar.
I know there are theoretical and semi-practical ways of reading those indicators with features that are correlated with the visual data, for example thermoplastic line markings create a small bump that sufficiently advanced lidar can detect. However, while I'm not a lidar expert, I don't believe using a completely different physical mechanism to read that data will be reliable. It will surely inevitably lead to situations where a human detects something that a lidar doesn't, and vice versa, just due to fundamental differences in how the two mechanisms work.
For example, you could imagine a situation where the white lane divider thermoplastic markings on a road has been masked over with black paint and new lane markings have been painted on - but lidar will still detect the bump as a stronger signal than the new paint markings.
Ideally while humans and self driving coexist on the same roads, we need to do our best to keep the behaviour of the sensors to be as close to how a human would interpret the conditions. Where human driving is no longer a concern, lidar could potentially be a better option for the primary sensor.
> For example, you could imagine a situation where the white lane divider thermoplastic markings on a road has been masked over with black paint and new lane markings have been painted on - but lidar will still detect the bump as a stronger signal than the new paint markings.
Conflicting lane marking due to road work/changes is already a major problem for visual sensors and human drivers, and something that fairly regularly confuses ADAS implementations. Any useful self-driving system will already have to consider the totality of the situation (apparent lane markings, road geometry, other cars, etc) to decide what "lane" to follow. Arguably a "geometry-first" approach with LIDAR-only would be more robust to this sort of visual confusion.
Everyone is missing the point, including Karpathy, which is the most surprising because he is supposed to be one of the smart ones.
The focus shouldn't be on which sensor to use. If you are going to use humans as examples, just take the time to think how a human drives. We can drive with one eye. We can drive with a screen instead of a windshield. We can drive with a wiremesh representation of the world. We also use audio signals quite a bit when driving as well.
The way to build a self driving suite is to start with the software that builds your representation of the world first. Then any sensor you add in is a fairly trivial problem of sensor fusion + Kalman filtering. That way, as certain tech gets cheaper or better or more expensive and worse, you can just easily swap in what you need to achieve x degree of accuracy.
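For reference, the Kalman-filtering step in its simplest form is sketched below: a scalar measurement update that blends a prior estimate with a new measurement, weighted by their variances. This is only the linear, single-state case, not an EKF, and the distances and variances are made up. Whether real multi-sensor fusion reduces to something this simple is a separate question.

```python
from typing import Tuple


def kalman_update(x: float, P: float, z: float, R: float) -> Tuple[float, float]:
    """Scalar Kalman measurement update.

    x, P: prior state estimate and its variance
    z, R: new measurement and its variance
    Returns the posterior estimate and variance.
    """
    K = P / (P + R)            # Kalman gain: how much to trust the measurement
    x_new = x + K * (z - x)    # move the estimate toward the measurement
    P_new = (1.0 - K) * P      # uncertainty shrinks after every update
    return x_new, P_new


# Illustrative numbers: distance to an obstacle in metres.
x, P = 10.0, 4.0                            # coarse prior
x, P = kalman_update(x, P, z=12.0, R=4.0)   # first sensor, same variance as prior
x, P = kalman_update(x, P, z=10.0, R=2.0)   # second sensor, more precise

print(f"fused estimate: {x} m, variance: {P}")
```

Adding a sensor here is just one more `kalman_update` call, but only because both sensors are assumed to measure the same scalar directly with known, Gaussian noise.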
> ...just take the time to think how a human drives...
We truly have no understanding of how the human brain really models the world around us and reasons over motion, and frankly anyone claiming to is lying and trying to sell something. "But humans can do X with just Y and Z..." is a very seductive idea, but the reality is "humans can do X with just Y, Z, and an extremely complex and almost entirely unknown brain" and thus trying to do X with just Y and Z is basically a fool's errand.
> ...builds your representation of the world first...
So far, I would say that one of the very few representations that can be meaningfully decoupled from the sensors in use is world geometry, and even that is a very weak decoupling because the ways you performantly represent geometry are deeply coupled with the capabilities of your sensors (e.g. LIDAR gives you relatively sparse points with limited spatial consistency, cameras give you dense points with higher spatial consistency, RADAR gives you very sparse targets with velocity). Beyond that, the capabilities of your sensors really define how you represent the world.
The alternative is that you do not "represent" the world but instead have that representation emerge implicitly inside some huge neural net model. But those models and their training end up even more tightly coupled to the type of data and capabilities of your sensors and are basically impossible to move to new sensor types without significant retraining.
> Then any sensor you add in is a fairly trivial problem of sensor fusion + Kalman filtering
"Sensor fusion" means everything and nothing; there are subjects where "sensor fusion" is practically solved (e.g. IMU/AHRS/INS accelerometer+gyro+magnetometer fusion is basically accepted as solved with EKF) and there are other areas where every "fusion" of multiple sensors is entirely bespoke.
Absent disassembly and direct comparison between a DGX Spark and a Dell GB10, I don't think there's sufficient evidence to say what is meaningfully different between these devices (beyond the obvious of the power LED). Anything over 240W is beyond the USB-C EPR spec, and while Dell does have a questionably-compliant USB-C 280W supply, you'd have to compare actual power consumption to see if the Dell supply is actually providing more power. I suspect any other minor differences in experience/performance are more explainable as the consequences of increasing maturity of the DGX software stack than anything unique to the Dell version; particularly any comparisons to very early DGX Spark behavior need to keep in mind that the software and firmware have seen a number of updates.
Comparing notes with Wendell from Level1Techs, the ASUS and Dell GB10 boxes were both able to sustain better performance due to their better thermal management. That's a fairly significant improvement. The Spark's crusted gold facade seems more form over function.
> It just creates entrenched players and monopolies in domains where it should be near trivial to move (browsers are definitely trivial to jump ship)
I think this is understating the cost of jumping. Basically zero users care about the "technological" elements of their browser (e.g. the render engine, JS engine, video codecs) so long as it offers feature equivalence, but they do care a lot about comparatively "minor" UX elements (e.g. password manager, profile sync, cross-platform consistency, etc) which probably actually dominate their user interaction with the browser itself and thus understandably prove remarkably sticky ("minor" here is in terms of implementation complexity versus the rest of a browser).
Yeah I think you're right. It's the little things that get people upset rather than the big things, weirdly enough. But I think people should have a bit more introspection. Are their complaints things they seriously care about, or justifications for their choices? Can they themselves differentiate? It might seem obvious, but the easiest person to fool is yourself and we're all experts at it.
> C++ has std::bitset and std::vector<bool> and Java similarly has BitSet and Array because using the generic code for arrays of bits is too wasteful.
Rather infamously, C++ tried to be clever here and std::vector<bool> is not just a vector-of-bools but instead a totally different vector-ish type that lacks many of the important properties of every other instantiation of std::vector. Yes, a lot of the time you want the space efficiency of a dynamic bitset, rather than wasting an extra 7 bits per element. But also quite often you do want the behavior of a "real" std::vector for true/false values, and then you have to work around it manually (usually via std::vector<uint8_t> or similar) to get the expected behavior.