Thursday, October 25, 2007

The programming paradigm needs an update

The way I view programming is different from what it was a week ago, thanks to several factors. First, I have spent a lot of time lately with my head buried in the .NET documentation, and something in the back of my mind kept asking, “If C# is so much more efficient than VB.NET, why do all of the code samples appear to be the same length and look virtually identical?” Then, I reread an e-mail from someone telling me that XYZ language was awesome because it needed dramatically fewer source lines of code (SLOC) to do the same thing as more mainstream languages. And I have still been multithreading the same dumb little piece of code. Finally, I read a post by Mark Miller about Emacs and Lisp.

It is all coming together for me now, the conscious understanding of what I really dislike about programming. I got into programming because I like solving problems, but solving problems is really only a small fraction of the work. The rest of it is giving exact, precise details as to how to perform the tasks that solve the problems. It is like painting the Sistine Chapel by putting a seven-year-old boy on a scaffold and dictating to him every stroke you want him to make. It is ridiculous.

This highlights the real problem in the programming industry: Everything we slap on top of the development process is more garbage on top of a rotting foundation. Let’s take a closer look at what we do here, folks, and examine how much the programming paradigm has (or has not) changed.

Then (i.e., 10-20 years ago)
A programmer sits down and analyzes the process of the work. Using a stencil and a pencil, he lays out a workflow on graph paper. The programmer sits down, fires up his trusty text editor (if he is lucky, it supports autoindent), and creates one or more text files containing source code. He then compiles the source code (or tries to run it through an interpreter) to check for syntactical correctness. Once it compiles or interprets correctly, he runs it with test cases to verify logical correctness and to check for run-time bugs. Any bugs are logged to a text file; or maybe he has a debugger or crash dump analyzer to find out what went wrong. This continues until the program is considered complete.

Now
A programmer sits down and analyzes the process of the work. Using a flowcharting tool or maybe a UML editor, he lays out a workflow. The programmer sits down, fires up his trusty IDE (if he is lucky, it supports version control), and creates one or more text files containing the source code. Some of this code may be auto-generated, such as data objects based on the database schema or some basic code from the UML. Half of this generated code will need to be discarded unless the project is very basic, but at least it is a start. The IDE will handle the basics of getting a form to display regardless of whether it is a desktop app or a Web app. The programmer then compiles the source code (or tries to run it through an interpreter) to check for syntactical correctness. Once it compiles or interprets correctly, he runs it with test cases to verify logical correctness and to check for run-time bugs. Any bugs are logged to a text file; or maybe he has a debugger or crash dump analyzer to find out what went wrong. This continues until the program is considered complete.

Wow… we’ve traded in vi for Visual Studio and a bunch of print statements for [F11]. At the end of the day, nothing has really changed except for the tools, which are barely keeping pace with the increasing complexity of developing software. We are stuck, and we are stuck badly.

The Lisp advocates like to say, “Lisp would have taken over if only we had better/different hardware!” Now we have the hardware, and Lisp is still not taking over. I assert that we need to go far beyond something like Lisp at this point; we need an entirely new type of language. A great example of where our current languages fall short is multithreading.

IT pros usually say two things about bringing multithreaded development mainstream: Compilers need to get a lot smarter before this can happen, and we cannot make the compilers smart enough for it to happen well. If this seems like an irresolvable contradiction, it isn’t. The problem is not the compilers — it’s our paradigms.

If you’ve spent a fair amount of time with multithreading, you know that it will never become a commonplace technique until the compiler can handle most of it automatically. The fact is that it’s tricky for most programmers to keep track of all the details of a system where concurrency is an issue — and debugging these systems is a pure terror.
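To make that concrete, here is a minimal C# sketch (using nothing newer than .NET 2.0’s ThreadPool; the method names and the crude partitioning scheme are mine, purely for illustration) of what it takes to parallelize even a trivial summation by hand:

using System;
using System.Threading;

class ParallelSumSketch
{
    // Serial version: four lines, obviously correct.
    static long SumSerial(int[] data)
    {
        long total = 0;
        foreach (int n in data)
            total += n;
        return total;
    }

    // Hand-rolled parallel version: the programmer now owns the
    // partitioning, the shared-state locking, and the completion signaling.
    static long SumParallel(int[] data, int threadCount)
    {
        long total = 0;
        object totalLock = new object();
        ManualResetEvent[] doneEvents = new ManualResetEvent[threadCount];
        int chunk = data.Length / threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;
            // The last thread picks up the remainder of the array.
            int end = (t == threadCount - 1) ? data.Length : start + chunk;
            ManualResetEvent done = new ManualResetEvent(false);
            doneEvents[t] = done;

            ThreadPool.QueueUserWorkItem(delegate
            {
                long subtotal = 0;
                for (int i = start; i < end; i++)
                    subtotal += data[i];
                lock (totalLock) // forget this lock and you get silent data corruption
                    total += subtotal;
                done.Set();
            });
        }

        WaitHandle.WaitAll(doneEvents);
        return total;
    }
}

Four lines of logic became thirty, and every one of the new lines is about the “how” of threading rather than the “what” of adding numbers.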

Smarter compilers are the answer, but there is a truism among the compiler folks that compilers will never get that smart. Why? Because properly translating a serial operation into a parallel one requires understanding the code’s intentions and not just the step-by-step how-to guide that the source code models. I can read your code and (provided it is clear enough) understand probably 80% of your intentions up front and possibly figure out an additional 10% or 15% of your code with time. The other 5% or 10% will be a mystery until I ask you.

Read the comments on this post from Chad Perrin about Python and Ruby; these people are trying to optimize a trivial chunk of code. What is even more interesting is the point Chad makes over and over again in his responses (he posts as apotheon): Maybe the intention of the code was to produce that onscreen counter that is eliminated in most of the optimizations. Or maybe the concatenation of immutable strings, slow as it is, had a deliberate knock-on effect as well. Who knows? The intentions cannot be made clear through code without having ridiculously verbose comments in the code or having an exacting product spec available.

Perl appears to understand intentions, but it is a trick. Perl was simply designed to make certain assumptions based on the way most programmers usually think in the absence of explicit logic. If your intentions do not square with Perl’s assumptions, it either will not run, or it will not run properly. If you do not believe me, try abusing Perl a little with a record format that specifies the tilde as a record separator instead of newline. Perl’s assumptions fly out the window unless you set the input record separator variable ($/) in your code; and, at that point, you are no longer actually using the assumptions — you are giving it precise directions.

I am all for self-documenting code. I like to think that no one reading my code has to struggle to figure out what it does. But aside from design specs and/or pages of inline comments, there is no way for someone reading it to know why I wrote the code that I did. It is like being in the Army: they usually don’t tell you why you need to march down a road; they just tell you to do it.

Programmers are in a bind. Without a way for the compiler to know the design specifications, the compiler cannot truly optimize something as complex as a parallel operation beyond the really obvious optimizations. There are no hard and fast rules for making a serial operation parallel; it is a black art that involves a lot of trial and error even for frequent practitioners. In order for the compiler to be able to understand those design specifications, it would require an entirely different type of language and compiler. Assertions and other “design by contract” items are not even the tip of the iceberg — they are the seagull sitting on the iceberg in terms of resolving the issue. If the compiler is smart enough to understand the design document, why even bother writing the code? UML-to-code tools generally try to do this, but they do not go far enough. At best, these tools can translate a process into code that literally expresses that process; there is no understanding of intention.

There are a few major schools of programming languages — object-oriented, procedural, functional, and declarative — all of which have their strengths and weaknesses. Functional and declarative languages come the closest to what I am talking about. SQL is a great example: You tell it what you want, not how to get what you want. The details of the “how” are up to the database engine. As a result, databases are optimized in such a way that they run a lot faster than anything 99.9% of programmers could write by hand to do the same work (not to mention redundancy, transactions, etc.), despite their rather general-purpose nature. Even SQL shows a lot of weakness; while it is a declarative language, it is extremely domain specific. As soon as you want to manipulate that data, you land yourself in the world of procedural code, either within a stored procedure or in your application.

Most applications (and every library) become miniature domain-specific languages (DSLs) unto themselves — not syntactically (although some, such as regex libraries, reach that status), but in terminology, subject matter, and so on. The problem you are most likely trying to address has absolutely nothing to do with the language you are using and everything to do with a specific, nonprogramming problem.

A great example of the disconnect between code and the problem it addresses is the subject of loops. I bet over 50% of the loops out there go over an entire data collection and do not exit without hitting every single element in the collection. The logic we are implementing is something like “find all of the items in the collection that meet condition XYZ” or “do operation F to all items in the collection.” So why are we working with languages that do not deal with set theory?

Programmers end up implementing the same things over and over again: a function (or object, or whatever) in a library that takes that big set as input and a filter command that returns a smaller, filtered set, or an “ExecuteOnAll” method that takes a pointer to a function as its argument.
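Here is what that reinvented wheel looks like as a minimal C# 2.0 sketch; Filter and ExecuteOnAll are my own illustrative names (in fact, .NET 2.0’s List<T> already ships FindAll and ForEach methods that do exactly this):

using System;
using System.Collections.Generic;

class SetishOperations
{
    // The filter we all keep rewriting: walk the whole collection,
    // keep the elements that satisfy some condition.
    static List<T> Filter<T>(IEnumerable<T> items, Predicate<T> condition)
    {
        List<T> result = new List<T>();
        foreach (T item in items)
            if (condition(item))
                result.Add(item);
        return result;
    }

    // The "ExecuteOnAll" method we all keep rewriting: do operation F
    // to every item in the collection.
    static void ExecuteOnAll<T>(IEnumerable<T> items, Action<T> operation)
    {
        foreach (T item in items)
            operation(item);
    }

    static void Main()
    {
        List<int> numbers = new List<int>(new int[] { 1, 2, 3, 4, 5, 6 });

        // "Find all of the items in the collection that meet condition XYZ."
        List<int> evens = Filter<int>(numbers, delegate(int n) { return n % 2 == 0; });

        // "Do operation F to all items in the collection."
        ExecuteOnAll<int>(evens, delegate(int n) { Console.WriteLine(n); });
    }
}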

When the language folks try to make it easier on us, the result is stuff like LINQ, the welding of a DSL to a framework or language to compensate for that language or framework’s shortcomings in a particular problem domain.
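For the curious, this is roughly what the filter-and-print example above collapses into under LINQ in the forthcoming C# 3.0 (sketched from the public betas, so treat the details as provisional):

using System;
using System.Collections.Generic;
using System.Linq;

class LinqSketch
{
    static void Main()
    {
        List<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

        // Declare what you want (the even numbers), not how to loop for them.
        IEnumerable<int> evens = from n in numbers
                                 where n % 2 == 0
                                 select n;

        foreach (int n in evens)
            Console.WriteLine(n);
    }
}

Notice how much closer this is to SQL’s “tell it what you want” model than to the hand-written loop.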

The only way LINQ could get implemented is if the languages that support it have closures, which are “the thing” in a functional language like Lisp; closures are even important in Perl. But programmers have been working in Java and Java-like languages for the last five to 10 years (depending on how much of an early adopter you were), and a lot of folks missed Perl entirely, either sticking with VB and then the .NET languages or starting with C/C++ (or Pascal) and going to Java and/or .NET. The only exposure most programmers have ever had to a truly dynamic language is JavaScript (aka ECMAScript). (By dynamic language, I mean a language that can edit itself or implement or re-implement functionality at run-time. JavaScript, with its support for eval(), has it, and so does Perl. In functional languages like Lisp, this is all the language really can do.) In reality, JavaScript is not that bad; in fact, I like it quite a bit. I am just not a huge fan of the environment and object model programmers are used to seeing it in (the browser DOM).
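For anyone whose language path skipped closures entirely, here is a minimal C# 2.0 sketch of one (the names are mine; the point is that the anonymous method captures the threshold variable itself, not a snapshot of its value):

using System;

class ClosureSketch
{
    static void Main()
    {
        int threshold = 3; // a local variable captured by the delegate below

        // The anonymous method closes over 'threshold'; change the variable
        // later and the delegate sees the new value.
        Predicate<int> aboveThreshold = delegate(int n) { return n > threshold; };

        Console.WriteLine(aboveThreshold(5)); // True
        threshold = 10;
        Console.WriteLine(aboveThreshold(5)); // False
    }
}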

Most of the programmers I have dealt with have little (if any) experience with a dynamic language. The result is that they don’t have the ability to think the way you need to about a dynamic language. These programmers stumble around blind; they intuitively know that they aren’t working to their potential, but they have no idea where to start. (Why else would there be the huge market for code generators and other gadgets to improve productivity by reducing typing?)

I do not believe that Lisp and its ilk are the answer; they are too difficult for most programmers to read through and understand, and the entire codebase becomes one big back-reference to itself. Procedural languages have also proven hard to maintain. Object-oriented languages are the easiest to maintain — at the expense of being insanely verbose, which is another reason for the code generators. Declarative languages are almost always going to be little more than DSLs.

Where do we go from here?

I want to see programmers headed towards intentional languages. I have no idea what they would look like, but the source code would have to be a lot more than the usual text file and would need to be able to express relationships within the data format, the way XML or a database does. The programming paradigm needs to be able to quickly reduce any business problem to an algorithm, so that the real work can go into writing the algorithm and then, at the last minute, translating the results back into the domain’s data structures. The new paradigm would look like a mix of a lot of different ideas, but syntax and punctuation would have to be fairly irrelevant.

What are your thoughts about the programming paradigm? What direction do you think it needs to take?
