Debugging is a discipline that requires patience and close attention to detail. In the often fast-paced world of software development, when we're faced with deadlines and an ever-growing list of features to add and bugs to resolve, it can be difficult to slow down and proceed in a meticulous, measured fashion. When it comes to solving difficult problems, though, this fastidious approach is exactly what's required to locate and resolve a problem's root cause.
Stop thinking, and look!
"Stop thinking, and look!"—is one of the maxims from David Agan's book Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems. As Agan points out in his book, engineers like to think. It's fun. In fact, for a lot of us it's probably why we became software engineers in the first place, but I guarantee you there are vastly more ways that your software could be broken then you could possibly imagine, even on your most inspired of days. So why do we insist on being able to locate a bug by thinking about it? Probably because it's a lot easier than looking for it.
So you want to get better at debugging? Here's a tip: Stop thinking, and look!
I break this rule far more often than I would like. It's easy to approach a bug and immediately say, "Oh, I think I know what's going on here, let me just try ...". Then, after churning and burning through one idea after another and hitting F5 for an hour (usually more like 2 or 3 hours), I'm no closer to solving the problem than I was when I started. All I've done is waste a few hours and work myself into a desperate, frustrated state that is not conducive to thinking rationally and applying logic, simply because I don't actually have any clue what's causing the problem and I'm just shooting blindly.
The problem is, looking is often, if not always, more complicated than we would like. Making empirical observations generally involves a lot of setup. It's easy to jump right in and start writing code, and it's hard to wire up your debugger, add breakpoints, monitor variables, and actually observe the problem. As developers, we would much rather be adding new features, or writing the next great algorithm, than trying to figure out why something we wrote a year ago is suddenly no longer working. And we're also wickedly good at finding shortcuts.
Looking is tedious. Sometimes you don't even know where to start looking. When you have to wrestle with those bugs that only manifest themselves occasionally, and only on the production server, and only after bedtime, even just coming up with a plan for how you might observe the issue in action can be a chore in and of itself. It's easy to say, "Eh, I know this code pretty well, and it looks like this line might be the problem so I'll just tweak that, and there, we're good to go."
The thing is, you don't know for sure if you've fixed it, or frankly if the bug is even in your code. And it will come back to bite you. Scout's honor.
The Curious Case of the Leaky Toilet
I used to live in an older house, built in 1908, and spent many of my evenings and weekends troubleshooting and fixing things like squeaky floorboards and windows that wouldn't open. One morning I woke up and noticed that the bathroom floor was covered in a thin layer of water. I couldn't find anything leaking, though, so I mopped it up and went about my day. A few days later – same problem. The bathroom floor was covered in water.
I was eventually able to reproduce the problem. Sometimes, but not every time, when the toilet was flushed, water would seep out from underneath and onto the floor. A quick Google search informed me that the wax seal that secures the toilet to the top of the drain was likely loose, bad, or both. So I removed the toilet. I didn't notice the wax seal being particularly bad, or unsealed. But what I did notice is that the problem was worse than I had originally thought. The leak had been going on for some time, and the water only occasionally seeped out from under the toilet. Generally, it just got the floorboards underneath wet. And now they were all starting to rot.
So I tore those out, replaced them, put on a new wax seal (fixing the original problem), put down replacement tiles, and put everything back together, feeling proud of my accomplishment.
About two months later I woke up one morning, and my bathroom floor was covered in water. Again. I thought about it for a while, and decided that I must not have installed the wax seal properly. So I went back to Google, read some tips on installing wax seals, and put in a new one. Problem solved.
Or not. Within a few months it was happening again. This time, I stopped thinking about what the problem might be, and ultimately fixing the wrong thing, and resolved to figure out how to see the problem in action. In the end, after taking the toilet outside and subjecting it to a garden hose for a while, I eventually discovered a crack in the base of the toilet. And I could actually watch the water leaking out of this crack. The problem, after all of that, wasn't the wax seal at all, though the symptom was very much the same as what you would expect to see with a poor seal. The seal itself was fine; the leak was actually occurring above the seal.
I ended up getting some porcelain patching compound, sealed the crack, and never had another problem with that leak again. Unfortunately, I had spent nearly 4 months solving this problem, and replaced the wax seal no fewer than 3 times. I wasted time and money because I allowed myself to think about the problem, and its solution, without ever taking the time to actually see it in action.
So here's the moral of the story: if you can't actually observe the problem, and not just its symptoms, how do you know if you've fixed the right problem? If you guess at how something is failing, you often fix something that isn't the bug. Remember too that replicating the issue is not the same as observing the problem. I had no problems replicating that the floor got wet when the toilet was flushed. I failed to observe where the water was coming from. I didn't even really know if I had fixed the problem or not because, at the end of the day, I didn't know what the problem was.
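To make the difference between replicating and observing concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the functions parse_price and apply_discount, the trace list, and the bug itself. Seeing a negative total is only replicating the symptom (the wet floor). Recording the value after each stage is what shows where the water is actually coming from.

```python
def parse_price(text: str) -> float:
    # Hypothetical stage 1: the usual suspect (the "wax seal").
    return float(text.strip())


def apply_discount(price: float, percent: float) -> float:
    # Hypothetical stage 2: the real defect (the "crack").
    # The percentage is never divided by 100.
    return price - price * percent


def checkout_total(text: str, percent: float, trace: list) -> float:
    # Record the value after every stage so the failure can be observed,
    # not just inferred from the final result.
    price = parse_price(text)
    trace.append(("parse_price", price))
    total = apply_discount(price, percent)
    trace.append(("apply_discount", total))
    return total


trace = []
total = checkout_total(" 20.00 ", 10, trace)

# The symptom alone: a negative total, with no hint about its source.
print("total:", total)

# The observation: parse_price produced a sane value,
# so the value goes wrong inside apply_discount.
for stage, value in trace:
    print(f"{stage}: {value}")
```

In this sketch the trace shows that the parsed price is fine and the total only goes wrong after the discount step, so tweaking the parser would be the software equivalent of replacing the wax seal a third time.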
When You Debug, You're Fighting Your Own Brain
As humans, we have a tendency to understand the world around us by starting with a conclusion, and then polishing that conclusion with biased observations that support it. This phenomenon is known as confirmation bias: the tendency to seek out information that confirms our existing beliefs. Combine this with the availability heuristic, which is our brain's tendency to judge by whatever comes to mind most easily, favoring vivid memories over facts even when those memories are wrong, and every time you try to observe a bug in action you're fighting an uphill battle against your own brain.
These biases are made even worse when we're debugging code that we either wrote ourselves, or are already deeply familiar with. Our memories of previous interactions with that code, even if those memories are actually wrong, make us think that we can simply think about the code and know where the problem is, and thus what the answer ought to be. Sure, you might get lucky once in a while, but those instances are likely to be few and far between.
"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." – Sherlock Holmes, A Scandal in Bohemia
So now you've thought about the problem for a while, and have generated a hypothesis regarding both where the problem is, and what the fix is. You open your IDE, skip right to the file you think contains the offending code, and sure enough it's right there. Exactly where you thought it would be. Make a couple of quick edits, commit your code, and on to the next problem. Good work, Joe, you just closed 7 tickets in one afternoon. Or so you've convinced yourself.
Don't fall prey to the tricks your brain is playing on you when it's time to debug and solve a problem. Make sure you're fixing the right problem, and that you can prove you've fixed it, not just hidden it away until it rears its head again in a month.
How Do You Know It's Fixed?
Another reason that it's important to actually observe the problem in action, and not just the symptom, is that doing so lets us know that the fix we put in place actually fixed the issue. If I hadn't seen the water leaking out of the crack in the bottom of the toilet before sealing it, and then observed that it no longer leaked afterward, I would have been right back where I was when I wrongly assumed it was the seal: thinking I had fixed it, but not really knowing for certain.
So make sure that every time you debug a problem you do this:
- Observe the issue in action
- Apply your fix
- Observe that the issue is resolved
If you can't do these steps, in that order, there's no way of knowing whether you actually solved the issue at hand, solved some other issue, or just wrote a bunch of code that does nothing at all.
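In code, one way to follow those three steps is to write a check that makes the failure visible, confirm that it really does fail before the fix, and then run the same check again afterward. The sketch below is only an illustration under invented assumptions: the page-count bug, both function names, and the bug_is_visible helper are hypothetical.

```python
def page_count_buggy(total_items: int, page_size: int) -> int:
    # Hypothetical bug: floor division drops the final partial page.
    return total_items // page_size


def page_count_fixed(total_items: int, page_size: int) -> int:
    # Candidate fix: round up so a partial page still counts.
    return (total_items + page_size - 1) // page_size


def bug_is_visible(page_count) -> bool:
    """Return True if the failure can be observed with this implementation."""
    # 10 items at 3 per page need 4 pages.
    return page_count(10, 3) != 4


# 1. Observe the issue in action: the check must catch the bug before the fix.
observed_before = bug_is_visible(page_count_buggy)

# 2. Apply the fix, then 3. observe that the issue is resolved.
observed_after = bug_is_visible(page_count_fixed)

print("bug visible before fix:", observed_before)
print("bug visible after fix:", observed_after)
```

If the check had not caught the bug before the fix, a clean result afterward would prove nothing. That first observation is what turns "I think it's fixed" into "I watched it stop leaking."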
So avoid frustration, don't waste time solving the wrong things, be confident that you've resolved the problem at hand, and stop thinking, and look already!