Embedded systems often have crappy compilers. And you sometimes have to pay crazy money to be abused, as well.
Years ago, we were building an embedded vehicle tracker for commercial vehicles. The hardware used an ARM7 CPU, GPS, and GPRS modem, running uClinux.
We ran into a tricky bug in the initial application startup process. The program that read from the GPS and sent location updates to the network was failing. When it did, the console stopped working, so we could not see what was happening. Writing to a log file gave the same results.
For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.
This board had no Ethernet and only two serial ports, one for the console and one hard-wired for the GPS. The ROM was almost full (it had a whopping 2 MB of flash: 1 MB for the Linux kernel, 750 KB for apps, and 250 KB for storage). The lack of an MMU meant no shared libraries, so every binary was statically linked and huge. We couldn't install much else to help us.
A colleague came up with the idea of running gdb (the text mode debugger) over the cellular network. It took multiple tries due to packet loss and high latency, but suddenly, we got a stack backtrace. It turned out `printf()` was failing when it tried to print the latitude and longitude from the GPS, which are floating point numbers.
A few hours of debugging and scouring five-year-old mailing list posts turned up a patch to GCC (never applied) that fixed a bug on the ARM7 affecting uClibc.
This made me think of how the folks who make the space probes debug their problems. If you can't be an astronaut, at least you can be a programmer, right? :-)
At least the debugger worked. The processor I used in embedded systems in college, the 68HC11, would stop doing conditional branches when the supply voltage was too low.
We had a battery-powered board with no brownout detection, and I was using rechargeable NiMH batteries to save money/waste. When the students with alkaline batteries had low batteries, the motor load would bring Vcc down far enough that the CPU would reset by itself. With NiMH, the batteries could still drive the motors and keep the CPU alive...
You could single step in the debugger, and see the flag register was set as expected, but the branch didn't happen. Just ran straight through. I can't remember if unconditional jump or call worked. After about the third time this happened, I got good at figuring it out.
> For embedded developers, that's just a typical Tuesday
I was trying to explain to my colleague the other day that I've spent an unhealthy amount of time rebooting devices while staring at an LED wondering why it won't turn on.
It is nuts to have a dev board that is as constrained as the final device. You should have had an additional serial port and 8x as much flash; it would have solved your problem immediately.
It is even better to do the bulk of the dev inside of an emulator if you can swing it. The GPS and GPRS could be tethered into the emulator instead of trying to get a debug link into the system board.
> For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.
Of course where it becomes even more fun is when it's a customer's unit in Peru and you can't replicate it locally :). But oh how I love it. I have definitely spent many a day staring at code piecing things together with what limited info we have.
But to get back on topic, I can definitely attest to the quality of most embedded compilers. It's a great day when I can just use normal old gcc. I've never run into anything explicitly wrong, but I see so many bits of weird codegen or missed optimisations that I keep the disassembly view open permanently, as a sanity check. The assembly never lies to you - until you find a silicon bug, at least.
Tuesday, indeed. :)
In the embedded world, correctly working hardware isn't a given, either. Part of the board bringup/hardware verification process is just determining that everything on the board actually works. Always fun when you have to figure out if a problem is in your code or in the hardware. (HINT: It's often both.)
It's rare that you need to break out the oscilloscope or logic analyzer, but when you absolutely have to know if that line went high or not, there's no substitute. :)
Were these commodity boards? Having to resort to using the cellular connection, instead of attaching a hardware debugging probe (J-Link?) seems like a recipe for a painful squandering of intellect.
One of the lovely "features" of embedded work is that after a while of doing this sort of thing, sometimes you get good enough at the crazy hacks that it becomes faster and easier to do something like this than to track down who has the J-Link (okay, they've usually got more than one) and can they spare it/where did they put it/why does that person have a J-Link at all/is the J-Link still alive....
> For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.
It seems to me that if you can still update and reboot said machine, you can do a bisect on your commits to pinpoint the regression. Once you spot the regression commit you can split it to check what introduced the regression.
It took them multiple tries just to use gdb; I don't think this is a scenario where you can easily reflash the image on the board.
Did the GCC patch get applied after that?
"Never" implies no, I guess. :-)
I’ve spent 30 years working on compilers.
They have bugs. Lots of them.
With that in mind, the article is correct that the vast majority of issues people think might be a compiler bug are in fact user errors and misunderstanding.
My experience actually working with users has been somewhat humorous in the past, including multiple instances of people completely freaking out when something they reported turned out to be a miscompile, to the point that they no longer felt any code could be trusted since it could have been miscompiled in some way.
Compilers are multimillion line programs, and they have an error rate which is commensurate with multimillion line programs.
That said, I think like half the bugs I see get filed against the compiler aren't actually compiler bugs but errors in user code--and this is already using the filter of "took the trouble to file a compiler bug." So it's a pretty good rule of thumb that it's not a compiler bug, unless you understand the compiler rules well enough to articulate why it can't be user error.
It's not quite half the bugs on GCC's bug tracker, but it's very high: https://gcc.gnu.org/bugzilla/report.cgi?x_axis_field=&y_axis...
It's around 10% invalid bugs and another 10% duplicates. A lot of them that I've seen, including one of mine, are a result of misinterpreting details of language standards.
Compilers have a huge advantage over other programs: they are fully deterministic, since they depend only on input files, command-line arguments, and a few environment variables. That makes bugs easier to reproduce and fix compared to interactive applications, programs with networking, multi-threading...
Pretty sure most modern compilers are multithreaded, and do exhibit a slew of practical nondeterminisms, which is how/why projects like Reproducible Builds were formed.
Most compilers are single-threaded for most of the compilation process--at the very least, compiling a single file (translation unit) is almost always done using just one thread.
However, nondeterminism does creep into various places in the compiler. Sorting an array by pointer value is an easy way to get nondeterminism. But the most common nondeterminism in a build system comes not from the compiler but from the filesystem--"for file in directory" usually returns files in inode order, which is effectively nondeterministic across different computers.
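A minimal sketch of that pointer-order nondeterminism (my C++ example; the names are illustrative). With ASLR, heap addresses differ between runs, so deterministic input still yields a different order from run to run:

    #include <algorithm>
    #include <cstdio>
    #include <functional>
    #include <vector>

    int main() {
        // Allocate a few objects; their addresses depend on the allocator and ASLR.
        std::vector<int*> items;
        for (int i = 0; i < 4; ++i)
            items.push_back(new int(i));

        // The sort key is the pointer value itself: std::less gives a total
        // order over pointers, but that order changes from run to run.
        std::sort(items.begin(), items.end(), std::less<int*>{});

        for (int* p : items)
            std::printf("%d ", *p);  // e.g. "2 0 3 1" one run, "1 3 0 2" the next
        std::printf("\n");

        for (int* p : items)
            delete p;
    }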
Yes, that's why I was so careful with the wording. Timestamps are another example.
I am very curious: if these bugs are that common, then why don't we see more programs with weird bugs when they are running, and especially why don't we see them documented? Is it because, when an unknown bug turns out to be a compiler bug and not a code error, it gets fixed right away and with little fanfare? Or is there some sort of resiliency built into the compiled code that can mask compiler bugs? Or is there some other factor?
Also, how easy is it to discover a compiler bug, and how easy is it to identify that a bug in your executable is due to a compiler bug?
A significant factor in my experience is that a lot of programs are quite similar from a compiler's perspective: they use a well-trodden set of features and combine them in a predictable way. Compiling those regular programs is well-tested and well-understood. Compiler bugs tend to be relegated to the exotic paths, hit when using language features in novel and interesting ways.
Large functions are a particular breeding ground.
Ages ago, working on PS2 games, one of our guys had a particularly huge "do-animations-and-interpolations-and-state-and-everything-for-the-hero-in-one-huge-switch" thingy (not uncommon to encounter in games) that crashed GCC; the function had to be split up.
In the sequel, I think a similar function grew enough that they not only split up the function but also spread it across multiple files to avoid miscompiles.
Most recently, I was generating an ORM binding (C#) from the database model of an ERP system, and for mysterious reasons the C# runtime was crashing without stack traces, etc. (no debugger help). Having seen things like this before, I realized that one of the auto-generated functions was huge, so I split it up into multiple units and, lo and behold, it worked.
(Having written a tiny JVM once, I also remembered that jump instructions there are limited to 64 KB; I'm not 100% sure if the .NET runtime inherited that... once it worked, I didn't put any effort into investigating the causes.)
Most of the time though compiler bugs aren't the worst (unless they help cause confusion in already hard scenarios).
> I am very curious, if these bugs are that common then why don’t we see more programs with weird bugs when they are running and especially having them be documented?
Any given program has N "native" bugs and M bugs introduced by the compiler. I think as long as N >> M you won't really notice. Even if you stumble across a compiler bug by chance, proving it is a nightmare: there's so much UB everywhere that any possible output is technically correct. Exceptions are compiler crashes but those are rare.
In my experience, most compiler bugs are found in well-tested and proven software during a compiler version update or when switching compilers. That roughly corresponds to the prerequisite that N is small.
Compiler projects run enormous regression suites, and the CI/git/bisect style of development has made bugs harder to check in and quicker to squash in a lot of cases, I would say.
I have found a number of compiler bugs in GCC and LLVM (and in GAS and LLVM's assembler). Almost without fail they have been in the use of new features (certain new instructions, a new ABI / addressing model) or esoteric things (linker script trickery, unusual use of extended inline asm), etc., where the compilers probably had no or very little "real" code to test against, other than presumably some simple things and basic unit tests when said features were checked in.
Unless you're doing _really_ unusual things, or exercising new paths that don't just get picked up when compiling existing code (e.g., like many/most optimizations would), it's just not that likely you'll write code that triggers some unique path / state that has a noticeable bug.
To identify that a bug is a compiler bug involving silent bad code generation, you basically assume the compiler is correct until you narrow the problem down to a state which should be impossible. After you put in enough assertions and breakpoints and logging (some of which might make the problem mysteriously go away) and reach the point of banging your head on the table, you start side-eyeing the compiler. If you know assembly, you might start looking at some assembly output. Or you would start trying to make a reduced reproducer. E.g., take the suspect function out on its own and write some unit tests for it. A tool like C-Reduce can sometimes help if it's not a relatively simple, small function.
How quickly you reach that point where you can actually start to narrow down on a possible compiler bug entirely depends on the problem. If it's causing some memory ordering or race condition or silent memory corruption that is only detected later or can only be reproduced at a customer sporadically, then who knows? Could be months, if ever. Others could be an almost immediate assert or error log or obvious bad result that you could debug and file a bug report in a day.
It's amazing how many compiler issues never translate into meaningful deviations at the level of application behaviour. Code tends to be highly resilient to small execution errors, seemingly by accident. I wonder what a language/runtime would look like if it were optimized to maximize that resilience, i.e. every line could miscompile in arbitrary ways. Is there a smarter solution than computational redundancy without an isolated verifier system?
I hit a similar issue in 2017, and it is still the case today: Python's builtin `random.shuffle` destroys numpy arrays passed into it [0]. This is apparently a design limitation within numpy and cannot be detected or fixed, so it still stands. I spent hours combing through my own code wondering where the bug was, because there was no way it was caused by numpy or Python, but eventually all the likely scenarios got ruled out...
[0] https://github.com/numpy/numpy/issues/10215
Back when I worked on the MPC-HC project we found a bug in the Visual Studio MSVC compiler. When we upgraded from VS2010 to VS2012 subtitles would fail to render.
We eventually traced it down to a small for loop that added 0.5 to double members of an anonymous struct. For some reason, the combination of three factors (an anonymous struct, double data types, and a for loop) caused those member variables to end up uninitialized.
We extracted this code into a small code sample to make it easily reproducible and reported it to Microsoft. Their compiler team called it one of the most helpful reports they'd gotten and confirmed it was a bug in their for-loop vectorization code. The compiler appeared to have messed up the SIMD instructions to write the results of the addition back to memory.
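Something with roughly this shape has the three ingredients described (a reconstruction under my own assumptions, not the actual MPC-HC repro; note that anonymous struct members are a Microsoft extension, which MSVC accepts):

    #include <cstdio>

    struct Timing {
        struct {          // anonymous struct holding double members
            double start;
            double stop;
        };
    };

    int main() {
        Timing t[8] = {};
        // Under VS2012's auto-vectorizer, a loop like this reportedly had its
        // SIMD write-back miscompiled, leaving the members effectively garbage.
        for (int i = 0; i < 8; ++i) {
            t[i].start += 0.5;
            t[i].stop += 0.5;
        }
        for (int i = 0; i < 8; ++i)
            std::printf("%f %f\n", t[i].start, t[i].stop);
        return 0;
    }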
As a compiler developer, I see plenty of bugs. So, it's sometimes a bug. But, in the case of C (and C++ by extension), it's often a language design bug that unfortunately has no fix and can only be worked around.
There are 830 open and confirmed wrong-code bugs in GCC at the time of writing. Compiler bugs aren't as rare as people think: https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=NEW&bug_...
I think it's just common for people to assume they're wrong and change things blindly rather than carefully checking the standard for their language (assuming their language even has a standard to check). It doesn't help that before AddressSanitizer and co. existed compilers would just do all sorts of nonsense when they detected possibly undefined code in C and C++.
Oh man. I uncovered a hash implementation bug in Go, ca. 2014 or so, and I spent like two days prepping my bug report and tests; I was so certain it was me. The team of course was super nice and like ‘good catch’. Victory lap day for any nerd.
The article is right: it is almost never a compiler bug. I have had that experience of reporting and being wrong. It sucks.
On the other hand, I have a confirmed bug in Clang [1] and a non-rejected bug in GCC [2], so it does happen.
[1]: https://github.com/llvm/llvm-project/issues/61133
[2]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108448
No, it is very often a compiler bug. Just look at the gcc, clang or rustc tickets.
e.g. https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=__open__...
It's massive, and several gcc versions have to be blacklisted. The clang restrict bug is still not fixed; it never worked. rustc was never memory-, type- nor concurrency-safe.
I wonder if the bubble-sort implementation in this library helped prolong the life of this bug. Most people would choose another impl for performance reasons, and thus not find this bug.
Note... it's not really that "it's never a compiler bug," but more like "it's never a backend/codegen bug."
It's not particularly hard (for someone who knows the language rules, which are difficult for a language like C++) to make a widely-used compiler be erroneous in its acceptance or rejection of code.
What's much more difficult ("never" happens) is to make the compiler accept valid code and then generate an incorrect executable. It's possible (and I run into this maybe once a year doing unusual things) but it's really rare. If you think that's what's going on, it's very unlikely to be the case.
Codegen bugs are not particularly rare either, but you usually run into them when doing "weird stuff" (which hits an edge case somewhere within the compiler).
And the first instinct of most C++ programmers when seeing weird compiler behavior is to assume their weird code somehow triggered undefined behavior, so they refactor their program until it's less weird. But then it usually also no longer hits the edge case in the compiler's logic, so the program starts working correctly. Most developers then don't spend additional hours/days investigating whether it was truly undefined behavior or whether they hit a compiler bug.
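For illustration, here is my own example (not from the thread) of why "it's probably UB" is a reasonable first guess -- genuine undefined behavior can look exactly like a codegen bug:

    #include <climits>
    #include <cstdio>

    int main() {
        // Signed overflow is undefined, so an optimizer may assume `i` never
        // wraps, conclude `i > 0` stays true, and emit an infinite loop at
        // -O2, even though the code "obviously" terminates at -O0.
        for (int i = INT_MAX - 2; i > 0; ++i)
            std::printf("%d\n", i);
        std::puts("done");  // may never be reached once optimizations kick in
    }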
Round 'bout 10 years ago, I was working on this Python C extension and, after a distro upgrade, it started segfaulting. Dropping down into gdb, Python was fairly obviously calling the wrong C function. I didn't know if the linker, the compiler, or Python was at fault, and "it is never a compiler error" was at the forefront of my mind, so I never even tried to report the incorrect behavior, out of fear that maybe I was doing something stupid that caused gcc to compile an incorrect shared library without complaining.
IIRC, after the next Fedora release everything started working again, so maybe it was not me? Still don't know.
It really depends on which compiler you are testing and whether the version you are testing has just been released or has been around for some time. If the compiler is for a niche language, then it's possible to find bugs. If the compiler has just been released, it's even possible to be the first person to note the bug. But the bigger the language and the more time has passed, the less likely this is.
This is definitely a big factor. I've found one compiler bug, but it was in a feature that had been added all of two months earlier (optional chaining in Typescript 3.7).
Learning to code (C) I thought I found a compiler bug lots of times and was almost always wrong. It gave me the heuristic that if I thought I found a compiler bug, it was time to take a break, have a snack and go for a walk or something before looking again. It usually helped me find my mistake much faster.
The thing I disliked most about later learning PHP or JavaScript was that my previously usually-wrong reaction of "the compiler is insane" suddenly turned out to be commonly true. Even when it wasn't an actual bug, PHP and JavaScript were often so poorly designed that the intended behaviour wasn't much better than a bug.
Thanks for reminding me that the two programming languages I'm using are poorly designed :) Joke aside, JS is getting better, especially when paired with the right tools, like TypeScript, VS Code, ESLint, Prettier, React+JSX, etc. PHP has been evolving for a long time to be a bit safer, with more static analysis.
I'm not a fan of PHP's variables that are available in a bigger scope than they should be, arrays that can be filled without having been defined in the first place, or arrays that are falsy when empty. The solution is to not abuse these features (not use them at all, really) and code as if it was not PHP.
My worst slowdown ever was when a compiler failed because a bit had flipped somehow. After a month and a half I finally reinstalled it and everything worked perfectly.
That sucks, but feels good to have solved it.
This is why it is important to have portable build environments. And ECC and checksumming file systems.
Do MacBooks have that? What is the best checksumming file system I could use on mainstream Linux?
Only ones I know of are ZFS and BTRFS.
However, duckling the web got this workaround for EXT4:
https://serverfault.com/a/1153319
i.e. use a device-mapper or LVM setup to do the data integrity check at the block level, under the filesystem.
1. No, Macs don't have ECC to my knowledge
2. ZFS if supported by your distro, btrfs otherwise
While it's almost never a compiler error, it happens, and I have personal experience; I once found an error in the VAX/VMS Pascal compiler - and could demonstrate it as such by disassembling the compiler output - and had to work around it until DEC fixed it.
Idk, I report bugs on GCC / clang something like every few months. I used to do it for MSVC too, but there were honestly too many.
I crashed the Oracle HotSpot Java virtual machine back in 2017 with a totally innocuous program involving nested arrays. After reproducing and minimizing it, I filed a bug report. It got fixed quickly.
I'm not sure why the page is no longer publicly available: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-818... (JDK-8181921)
At my first job, it actually was a compiler error, and I'm not sure if my manager ever believed me. We were using an internal gcc fork and cross-compiling, so who knows where the bug was, but the compiler team got back to me. Jump tables were sometimes broken, and we had to add a switch to disable them.
Not the right lesson to learn for a first job.
For anyone who worked in embedded programming in the bad old days of proprietary compilers, it sometimes felt like the compiler working correctly was the common case. One of my first jobs involved programming a smallish, embeddedish, ruggedized computer in C. IIRC I wasted several hours on a bug once before realizing that it was a compiler issue and I needed to try arbitrarily rearranging the buggy function until it generated code that at least appeared to work.
In the early days of C++11, I used to get unique ICEs in both GCC and Clang weekly. One particular annoyance was when a stable release of Debian decided to ship a point release with a regression (not looking it up, but it was something like: 4.6.1 and 4.6.3 worked, but 4.6.2 had completely broken user-defined literals (UDLs) in constant expressions or something). I had just converted the whole codebase to use UDLs aggressively, since they worked everywhere in my tests, not thinking I had to test every point release in between ...
Thankfully I don't think I ever had any miscompilations - that would require the code actually compile across several compiler versions in the first place.
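A sketch of the kind of code involved (my example, not the original codebase): a user-defined literal evaluated in a constant expression, the C++11 combination the point release reportedly broke:

    // A constexpr literal operator: `4_KiB` is evaluated at compile time.
    constexpr unsigned long long operator"" _KiB(unsigned long long n) {
        return n * 1024;
    }

    static_assert(4_KiB == 4096, "UDL usable in a constant expression");

    int main() { return 0; }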
Discussed at the time:
“It is never a compiler error” - https://news.ycombinator.com/item?id=15699675 - Nov 2017 (272 comments)
This brings back memories of XL calculating an address wrong because it lay on a boundary ≡ 0 (mod 2^32). Fortunately, the TOBEY (XL back-end) guys were in the same area of the building, so re-establishing our sanity was faster than it otherwise could have been...
>> It is not a compiler error. It is never a compiler error (2017)
No, not always true. Even in modern compilers -- ones as mature and modern as VS 2022 -- you can still get bugs.
I found one[0]. In my case it's easy to tell it's a compiler bug, because the program simply doesn't compile properly. But it's also not easy to reproduce, which just shows how well-tested compilers usually are.
0: https://github.com/dotnet/roslyn/issues/74872
Our infrastructure team keeps about 2 MSLOC building on several compilers and running on several architectures. They report a new compiler bug every 2-3 years.
I still have a recognition letter from Borland regarding a bug I found in Turbo Pascal 6.0:

    function BrokenResult: Integer;
    var
      BrokenResult: Integer; (* This should not happen *)
    begin
      BrokenResult := 42 (* The local variable gets assigned; the function result is whatever the compiler comes up with *)
    end;
When I started learning Turbo Pascal I came across a problem where an if-statement was obviously decided wrong. I saw the values in the debugger.
My rescue was a more experienced friend who knew that, IIRC, the compiler would choose the data type of the left operand of a comparison for the right operand as well, leading to potential sign switches.
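The analogous pitfall in C/C++ (my illustration; the original case was Turbo Pascal): the usual arithmetic conversions pull the signed operand over to unsigned, silently flipping the comparison:

    #include <cstdio>

    int main() {
        int a = -1;
        unsigned int b = 1;
        if (a < b)  // a converts to 0xFFFFFFFF, so the condition is false
            std::puts("-1 < 1, as expected");
        else
            std::puts("surprise: -1 is not less than 1u here");  // this prints
    }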
I’ve hit so many fun compiler bugs. Usually easy to work around though (yay modern / fp flavored languages). It certainly helps when it also crashes the compiler ;).
Miscompilation bugs are definitely nasty though. Especially if it's a self-bootstrapping compiler. Save your old build artifacts! :)
Back when I was using CodeWarrior to make a game for PlayStation 2, I found a compiler bug, but fortunately, it was one where it gave an error on valid code, rather than generating bad output. I can't remember the details, but I had some sort of equation that my co-workers agreed should have compiled with no problems. I was able to rewrite it a little to get the result I wanted without triggering any compiler errors.
Whoa, CodeWarrior was one of the worst compilers (and IDEs) I've had to use so far.
As the article shows, it's highly dependent on which compiler you're relying on. Always good to keep this in mind when assessing the likelihood of an error.
Just last week I tripped over a couple compilation bugs in (an old version of) bpftrace.
One was caught by internal checks somewhere, something about struct member offsets that I think was an alignment / padding issue and didn't seem to actually break anything. The other made it segfault during compilation, and I had to just tweak my code blindly until it decided to go away.
I’ve thought I’d found a compiler bug maybe 5 times in my life, and it has never actually been a compiler bug.
When I reflect on the ~25 years I’ve been programming C, all of the times I thought I’d found a compiler bug were in the first ~8 years. Dunning-Kruger hard at work :-/
I found one by accident with C++. It was a situation where class A had a protected field x, B inherited from A, and C was a friend of B. Can C access x?
GCC and Clang disagreed on this. Upon close reading, Clang was right, C should be able to access x.
(Why did I do this? C was a helper class for the purpose of running unit tests. Unit tests are supposed to poke around in stuff you wouldn't normally poke around in.)
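A minimal reconstruction of that scenario (A, B, C, and x are from the comment; the exact code is my guess):

    class A {
    protected:
        int x = 0;
    };

    class B : public A {
        friend class C;  // C gets the access of a member of B
    };

    class C {
    public:
        // A member of B may access the protected, inherited x through a B
        // object, and a friend of B gets that same access, so this is valid.
        int peek(B& b) { return b.x; }  // Clang accepted this; GCC did not
    };

    int main() {
        B b;
        C c;
        return c.peek(b);
    }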
I've encountered 1 in ~20 years. I don't even remember what it was, but I remember being shocked when I tracked it down and it actually was a compiler bug.
I found multiple compiler bugs at my first real programming job in 1997.
MSVC did not do a good job of maintaining the FPU stack in those days…
On the other hand, if you really focus on testing a compiler, particularly an immature one, it's remarkable how many bugs you can find.
Or if one is using newly introduced language features or accelerated instruction sets.
I wonder what % of compiler bugs go unidentified due to the user code-massaging them away in some fashion.
Similarly, I once ran into a broken implementation of a Dictionary type (in Mono, I think.) It was only comparing the keys' hash codes, not the keys themselves. In most scenarios this turned out to be more than good enough - for int32 keys obviously it will work, and for most strings it works too if the hash function is good - but I had a great many keys without an amazing hash function for them.
It's funny how sometimes a really glaring bug can hide in a stdlib for months or years just because by luck the stars never align to trigger it where somebody can notice it. In my case, the dictionary bug was causing recoverable errors, and I only noticed because I dug in instead of going "Mono's just broken".
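A C++ stand-in for that Dictionary bug (my sketch; the original was in Mono's C# class library): a key-equality predicate that compares hash codes instead of the keys themselves, which works by luck until two distinct keys collide:

    #include <cstdio>
    #include <functional>
    #include <string>
    #include <unordered_map>

    struct HashOnlyEqual {
        bool operator()(const std::string& a, const std::string& b) const {
            // BUG: distinct keys with colliding hashes compare "equal",
            // so one entry silently shadows the other.
            return std::hash<std::string>{}(a) == std::hash<std::string>{}(b);
        }
    };

    int main() {
        std::unordered_map<std::string, int,
                           std::hash<std::string>, HashOnlyEqual> m;
        m["alpha"] = 1;
        m["beta"] = 2;  // fine only while no hash collision occurs
        std::printf("%zu entries\n", m.size());
    }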
...unless it is. Compiler crashes are easy to see, but it can actually be nontrivial to identify miscompilations, as they may only trigger in certain code paths, and only with careful observation can you notice the second-order effects...
If you specifically look for them you might find quite a bit: https://web.cs.ucdavis.edu/~su/publications/emi.pdf [disclosure: an author]
In my case, it wasn't a compiler bug - it was a bug in the STL, before the STL was part of the compiler. It was a separate thing you downloaded. I found a bug, and emailed Stepanov (or Lee - I forget). Me, just some random nobody on the internet. I got a fix, and then an improved fix, and then a final fix, all within two hours. I was floored.
Thankfully, though, we can still look at the STL source easily, and presumably determine the source of a bug, trace behavior, or design test cases more easily, etc.
I was playing with Java 1.0.1, trying to make an app screen with a GridBagLayout. It made an utter hash of my layout, drawing things on top of each other, etc. Applying the First Rule of Compiler/Runtime Bugs, I double-checked and triple-checked and quadruple-checked my work, making sure I used the GridBagLayout API exactly according to spec. Eventually I posted to USENET comp.lang.java asking, "Is there a bug in GridBagLayout?"
The problem disappeared in Java 1.0.3.