I’ve been spending the past couple of weeks working on stored procedures. Glimpsing into my career so far, I realize how much stored procedures are the backbone of many organizations dealing with data. Stored procedures are something of a potpourri between magic behavior, bespoke black boxes, and the sedimentation of code layers accumulated over years of feature additions implemented by a battalion of sometimes well-intented PL/SQL programmers with tight deadlines. Furthermore, stored procedures, more than any other type of data manipulation, are what the actual live production systems rely upon. It is not uncommon for a piece of software to have hundreds of store procedures essential for it to work, and for good reason. Indeed, store procedures are extremely efficient. So much so that even unoptimized pieces of code harboring redundant test and an unreasonable amount of nested outer joins still run in a few milliseconds. Efficient they are. But you know what they are not? Commented. Seriously, the packages I worked with recently contain tens of thousands of lines of code but never contain more than 10 lines of comments, mostly containing something along the lines of “– 10/10/2014 added by Jay” or “– requirement R3045”. And as far as I can remember, relying solely on my flawed memory and anecdotal evidence, this is the case with the vast majority of stored procs. Therefore, after I spending some time curved into a ball crying, I asked myself: “why?”.
Common consensus about commenting code
Childishly, I first assumed that every piece of code should be commented, and the only reason for not commenting code would be laziness/lack of time/lack of understanding/hatred for whomever would read your code in the future. I was obviously misguided as one often is when assuming anything to be simple. Indeed, they are many times when commenting renders your code less readable, or is an excuse for bad coding. This article in particular, Common Excuses Used To Comment Code and What To Do About Them does an excellent job at highlighting when commenting is sub-optimal:
The code is not readable without comments. Or, when someone (possibly myself) revisits the code, the comments will make it clear as to what the code does. The code makes it clear what the code does. In almost all cases, you can choose better variable names and keep all code in a method at the same level of abstraction to make is easy to read without comments.
We want to keep track of who changed what and when it was changed. Version control does this quite well (along with a ton of other benefits), and it only takes a few minutes to set up. Besides, does this ever work? (And how would you know?)
I wanted to keep a commented-out section of code there in case I need it again. Again, version control systems will keep the code in a prior revision for you – just go back and find it if you ever need it again. Unless you’re commenting out the code temporarily to verify some behavior (or debug), I don’t buy into this either. If it stays commented out, just remove it.
The code too complex to understand without comments. I used to think this case was a lot more common than it really is. But truthfully, it is extremely rare. Your code is probably just bad, and hard to understand. Re-write it so that’s no longer the case.
Markers to easily find sections of code. I’ll admit that sometimes I still do this. But I’m not proud of it. What’s keeping us from making our files, classes, and functions more cohesive (and thus, likely to be smaller)? IDEs normally provide easy navigation to classes and methods, so there’s really no need to scan for comments to identify an area you want to work in. Just keep the logical sections of your code small and cohesive, and you won’t need these clutterful comments.
Natural language is easier to read than code. But it’s not as precise. Besides, you’re a programmer, you ought not have trouble reading programs. If you do, it’s likely you haven’t made it simple enough, and what you really think is that the code is too complex to understand without comments.
Why this consensus does not apply to stored procedures
As much as these arguments make sense I don’t think they apply to store procedures:
- “you can choose better variable names and keep all code in a method at the same level of abstraction”: You can’t easily change table fields and names, nor can you cut a big nested SQL statement gracefully.
- “Version control does this quite well”: Version control is almost never implemented for stored procedures.
- “I wanted to keep a commented-out section of code there in case I need it again.”: OK, that’s just BS.
- “[complex code] is extremely rare.”: Nested SQL queries are inherently complex and MUCH less readable than traditional code.
- “Markers to easily find sections of code.”: I never saw a problem with that.
- “you ought not have trouble reading programs”: Except queries are the opposite of natural language. Please, please, please SQL developer, let me know why you are doing this particular join
To summarize, I still don’t understand why stored procedures are generally not commented, while it would seem they are the type of code that could benefit the most from comments. Maybe NoSQL will change this, but in the meantime, I will start this crusade, and make sure people explain their code, yo!