Thursday, March 27, 2008

Lines Of Code - Dispelling The Myths

You see posts related to Lines Of Code (LOC) everywhere on programming blogs. People discussing it as a metric of project size, or of programmer productivity, often drawing conclusions in absolutist tones such as “fewer lines of code means less bugs”. I’m writing today to dispute the idea that anything related to LOC is absolute. And I’m going to do it one myth at a time.

LOC As A Metric of Accomplishment or Project Size
This one is old, and most experienced programmers now officially reject it outright. But still, I feel it worth repeating: The number of LOC you write in a single day is worthless as a metric of what you accomplished that day. After all- If someone wrote a 1000 line class in a single day, and you wrote the same class in 500 lines (and maybe tossed in a little extra functionality while you were at it), who was more productive?

As a metric of project size, it makes a little more sense, but just barely- The Wikipedia article on LOC mentions that “Many useful comparisons involve only the order of magnitude of lines of code in a project.” That at least makes a little sense. One can reasonably assume that the same set of programmers, using the same language, will be able to accomplish more with 10000 lines of code than with 1000.

Editors' note: The lines below may be wrapped and on more than the lines stated due to our template.

Ambiguity: Too Many Ways To Count
The next issue is this: How do you COUNT lines of code? That same Wikipedia article linked above has an interesting example:

for (i=0; i < 100; ++i) printf("hello"); /* How many lines of code is this? */ How many lines is that? One for the line? Two for logical seperation of forloop and print statement? What if you wrap the line in curly braces, one on each line, and move the comment to it’s own line, so it looks more like for (i=0; i < 100; ++i) { /* How many lines of code is this? */ printf("hello"); } Did you see that? I just took one line of code, and turned it into 5! Things get so much more confusing than stylistic issues like this, though. What about libraries? What about frameworks? Jeff Atwood wrote a post a couple of months ago wherein he mentioned that the original version of Basecamp had been written in only 600 lines of code. Given that Basecamp was written using Ruby on Rails, which handles all sorts of crazy things for you, I think the more honest metric would have to be the 600 lines of basecamp, plus however large the RoR framework is, plus the code RoR generated for them when they started the project. Money says it’s more than 600 lines. I’ll give you another example. Currently, the amount of code I’ve written for Migratr is 5000 lines and change. However, Migratr uses FlickrNet, a popular .NET library for interacting with the Flickr API. FlickrNet, if you download the source, is more than 12,371 lines of code (according to the unix util “wc”). I’ve also incorporated a few other libraries, all of which are linked in Migratr’s “about” page. Why? Because I didn’t want to write those 12,000 lines. Someone already had, and they did a fantastic job of it, and I didn’t see any point in re-inventing the wheel. So the total code involved in Migratr is, at this point, more than 17,000 lines. I’ve written 5,000 of them, and I’m the sole developer of Migratr. But I wouldn’t dream of saying “I built this application that interacts with a whole slew of online photo services and migrates all your photos and does all this neat stuff, and it was only 5,000 lines!” No. MY contribution was 5000 lines, but Migratr depends on a lot more than that. The point being, where do you draw the line? Is Migratr a 5000+ LOC project? or 17,000+? Do you only count LOC if you have access to the source of the library you’re using? Do you only count the code written for your specific project? Could I export Migratr’s backend as a library and let someone write public void migrate(String filepath) { ImportFromSource(Services.Flickr, filepath); exportToDestination(Services.SmugMug, filepath); } Did you see that? My whole project was just replaced! Some twerp in a garage somewhere just re-wrote Migratr in 5 lines of code! It doesn’t matter that it depended on my 5,000 line library, which depends on (among others) 12,000 lines of the Flickrnet library. You only saw 5 lines. It was only 5 lines. But clearly, since 5,000 is more than 5, I can at least say that I was more productive. The relationship between line count and bug count only goes so far The idea that “Less lines of code mean less bugs” is some sort of universal absolute is the biggest of all the LOC myths. In a desperate effort to nullify your “oh snap, time for a flamewar” reflex, I’m going to add a couple of disclaimers before I get into this. Any reference I make to a programming language is not an attack on that programming language, just an attack on the abuse of that language. Also, I realize that reducing the amount of code required for a specific task can help reduce the bug count, but only so far. I’m saying there’s a limit to this. Here’s the line I want to draw for you. Reducing the code involved in a task will only help reduce the bug count IF THE PROCESS MAKES THE CODE MORE READABLE. Past that, you’re probably creating lines of code that do more than one thing per line, and thus increasing the number of bugs possible per line of code. Perl programmers are especially frightening in this regard. I really love PERL and use it instead of shell scripting languages such as BASH whenever possible, but there’s something in the PERL culture that demands it’s programmers to try and do something in the fewest number of characters possible. There’s even a name for this as a sport- It’s referred to as “Perl Golf”. Really quick, can you tell me what this is? -pl s!.!y$IVCXL426(-:$XLMCDIVX$dfor$$_.=5x$&*8%29628; $$$_=$_!egfor-4e3..y/iul-}/-$+ /%s''$';*_=eval I’ll give you a hint- It’s not a cat walking across my keyboard. If that was your first guess, though, we’re at least on the same page. This was the winning entry in a game of Perl Golf: The problem was to write a Roman Numeral Calculator in the fewest characters possible. Submissions can be found here. Now, this is an amazing piece of code. The person who wrote it is clearly very skilled. The fact that Perl makes code like that possible is pretty impressive. The problem here is that people write code like this outside of PERL golf, and think to themselves, “it’s only one line, so it must have fewer bugs than a 50 line solution”. Really? Because if I had to debug one, I’d have definitely gone with the 50 line solution. And don’t tell me that that one line of code worked on the first, second, or 5th try. The 50+ line solutions might have, but not that catwalk. Writing unmaintanable code does not reduce the bug count for a project. A friend of mine coined a term for code like that- “Write-Only Code”- a takeoff on the permissions you can set with chmod. Think about it: Readable, writeable, and executable. Shouldn’t code be all three? Another quick example: (0/:l)(_+_) The first time I saw this snippet, I thought someone was, via emoticons, trying to re-enact the facial expression one wore when seeing “Goatse” for the first time. It’s not actually a story told by emoticon. It’s a piece of scala code that sums the elements in a list. And I just don’t see it as having been easier to write than a forloop. It’s definitely not as maintainable. This piece of code, actually, prompted a conversation with a friend about holding a “code or emoticon” contest, but we really couldn’t come up with any serious contenders to this one. Maybe some Perl Golfers could throw their hat in the ring? I want to state too, that for this section, I’m well aware that I didn’t provide buggy one-liners. I provided working one-liners. This is because I was trying to emphasize, the point isn’t if it works or not. The point is if you can tell by looking at it what it should do. If you can’t, it’s unmaintainable. If it’s unmaintanable, then even if it works the first time, it’ll produce a slew of bugs as soon as anyone tries to modify it. So, in essence - stop paying attention to Lines of Code. They’re impossible to count in any useful way. They’re a crappy metric of project size or personal accomplishment. And they don’t control how many bugs your code has. You do. cheers Aurobindo
AddThis Feed Button

No comments: