07 July 2010

What I Would Like in a Programming Language (Part 2)

Continuing from part 1...

Functions

Functions are what do the work. You can have your objects exist and hold a bunch of data, but without functions to do work on and with that data, we would live in a world of pure XML, which I think everybody can agree would be horrific. We use functions to find our social security number, pay our taxes and help the landlady with her garbage. I'm not saying that objects are guilty of virtually every computer crime we have a law for, but functions should really take a more prominent role in programming languages. There are two types of functions that I have special feelings for: the pure function and the lambda function.

Pure and Constant Functions

Pure functions are awesome. For those who hate reading, a pure function can be summarized as a function whose only effect is its return value: it may read global memory, but it cannot write to it, touch I/O devices or alter anything beyond its own stack. The GCC documentation also defines a constant function as a special case of a pure function: one that does not even read from global memory. The C function strlen would be an example of a pure function, since it reads but does not alter global memory (dereferencing the pointer is considered an access to global memory). A function like sqrt is considered a constant function since it touches nothing (as would Q_rsqrt).
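To make this concrete, here is a small sketch of how GCC exposes the distinction today (the function names are my own, not from any real library):

```cpp
#include <cstddef>

// Pure: may read global memory (through the pointer) but never writes it.
__attribute__((pure))
std::size_t count_nonzero(const int *data, std::size_t n) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (data[i] != 0)
            ++hits;
    return hits;
}

// Constant: depends only on its arguments -- no memory access at all.
__attribute__((const))
int clamp(int value, int lo, int hi) {
    return value < lo ? lo : (value > hi ? hi : value);
}
```

Everything the compiler needs is in those two attribute lines; the bodies are ordinary code.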

Okay, so what's the point? There are three main reasons: optimization, multi-processing and verification. Optimization from marked pureness comes in two forms: dead code elimination and common subexpression elimination. Explaining how this works would be a blog post on its own, but LWN did a pretty good job of this already. In summary: since the compiler can guarantee more about your code, it can do more with it.
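A quick sketch of the common-subexpression case: glibc declares strlen pure, so in a loop like this one, where the string is never written, the compiler is free to hoist the call out of the loop -- one scan of the string instead of one per iteration:

```cpp
#include <cstring>

// strlen is marked pure, and s is never modified inside the loop, so
// the compiler may compute strlen(s) once and reuse the result on
// every iteration instead of re-scanning the string each time.
std::size_t count_char(const char *s, char c) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < std::strlen(s); ++i)
        if (s[i] == c)
            ++hits;
    return hits;
}
```

Without the pure marking, the compiler would have to assume strlen could do anything and re-evaluate it every time around the loop.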

The multi-processing benefit comes from the fact that once you know a function is constant, only the parameters you pass it matter. This means that all you need to do is move the function's data to the processor running it and let it go. That's pretty abstract...how about a bigger example? Say you wanted to do some difficult task like find things in a million images. The constant function in this case would be the evaluation of an individual image against a set of feature descriptors. In the end, a central system can hand out a feature set and an image to a bunch of computers individually, knowing that they do not touch anything, and collect their results at will because nothing affects anything else. Cool, huh? Now imagine if your compiler did this automatically. Awesome stuff.
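A toy version of that fan-out, with threads standing in for the cluster (the names and the trivial "matching" are made up for illustration):

```cpp
#include <functional>
#include <future>
#include <vector>

// Stand-in for the image-matching example: score() plays the role of
// the constant function -- its result depends only on its two
// arguments, so every call can run on any processor, in any order,
// with no locking.
int score(const std::vector<int> &image, const std::vector<int> &features) {
    int s = 0;
    for (int f : features)
        for (int p : image)
            if (p == f)
                ++s;
    return s;
}

// The "central system": hand each image out as its own task and
// collect the results whenever they finish.
std::vector<int> score_all(const std::vector<std::vector<int>> &images,
                           const std::vector<int> &features) {
    std::vector<std::future<int>> tasks;
    for (const auto &img : images)
        tasks.push_back(std::async(std::launch::async, score,
                                   std::cref(img), std::cref(features)));
    std::vector<int> results;
    for (auto &t : tasks)
        results.push_back(t.get());
    return results;
}
```

No mutexes, no coordination: the guarantee that score() touches nothing shared is what makes the whole thing safe.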

The last reason I listed was verification, which is probably the most important. What I mean by this is that you should be able to mark a function as pure and have the compiler check this for you. The most helpful case I can imagine rests on the fact that a pure function can only call other pure functions (or constant ones, since a constant function is a special case of a pure one). Likewise, a constant function can only call other constant functions. So you can easily guarantee that everything you do is working exactly like you expect, which is just fantastic.

I actually really like the way GCC already does this for C and C++ and I wish it would become more prominent in other languages. A similar feature in .NET is Microsoft Code Contracts, a pretty sweet tool that fits nicely with their system (although I would like to see it featured more prominently - as a first-class citizen in the .NET world).

Lambda Functions

Lambda functions are awesome. I am not just saying that because one of my three readers would kill me with a rusty spork if I said otherwise, but because they are genuinely awesome. Lambda functions are one of the reasons I prefer C# and Scala over Java. Comparing those three languages is actually a great demonstration of why lambda functions should be a first-class member of any language that wants to call itself awesome, precisely because Java lacks them. Sure, anonymous inner classes can stand in for them, but lambda functions are part of the culture of a language. As a functional programmer, I find it irritating that I have to write my own Function class in basically every Java project that I do. The fact that they are not already there is what keeps them from populating the Java library. Imagine Java with something like Linq and minus a lot of random code bloat. Hmm...I just described Scala. People have been asking for lambdas in Java for a while and it looks like they are finally coming.

Yes, I realize I just harped on about Java, Scala and C#. My point is that lambda functions are just plain awesome and you should put them in your language no matter what, because they are incredibly beneficial. If C++0x can add lambda functions to that horror of a compilation model, you can too!
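For the record, the C++0x version looks something like this (a small sketch, nothing from any real project):

```cpp
#include <algorithm>
#include <vector>

// Count how many elements pass a caller-supplied test -- the kind of
// one-liner you would otherwise need a hand-rolled Function class for.
int count_matching(const std::vector<int> &xs, int threshold) {
    // [threshold] captures the local variable by value; the rest of
    // the lambda reads like an ordinary function body.
    return std::count_if(xs.begin(), xs.end(),
                         [threshold](int x) { return x > threshold; });
}
```

One expression, no named helper class, and the predicate sits right where it is used.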

Self-Modifying Code

Optimization based on run-time properties

So let's say I have a structure called a Vector4, which contains four floating point numbers (all aligned properly in memory). If I have two of these things and want to add them together, I would like to do it really quickly (especially since this is something I do all the time). I can do this really quickly on x86 with the addps instruction from the SSE instruction set. However, I would really like my code to work perfectly fine on CPUs that do not support SSE and work faster on those that do. All of this in a single executable, so the user does not even realize what is happening. Intel uses a technique in all their compilers called "CPU dispatching," which I think is a horrible name since that name is already taken by the actual CPU dispatcher. Whatever.
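A rough sketch of the dispatch pattern itself (the feature probe is stubbed out here; a real build would check CPUID - GCC has __builtin_cpu_supports - and the fast path would use the _mm_add_ps intrinsic):

```cpp
struct Vector4 { float x, y, z, w; };

// Portable fallback path.
Vector4 add_scalar(const Vector4 &a, const Vector4 &b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// On x86 this would be a single addps via _mm_add_ps; it is left
// scalar here so the sketch compiles anywhere.
Vector4 add_sse(const Vector4 &a, const Vector4 &b) {
    return add_scalar(a, b);
}

// Stand-in for a real CPU feature probe.
bool cpu_has_sse() { return false; }

Vector4 (*vec_add_impl)(const Vector4 &, const Vector4 &) = nullptr;

// The dispatcher: the first call probes the CPU and caches the chosen
// implementation; every later call goes straight through the pointer.
Vector4 vec_add(const Vector4 &a, const Vector4 &b) {
    if (!vec_add_impl)
        vec_add_impl = cpu_has_sse() ? add_sse : add_scalar;
    return vec_add_impl(a, b);
}
```

Callers only ever see vec_add; which instruction set does the work is decided once, at run time, inside a single executable.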

Anyway, there is all sorts of cool stuff you can do with this. In a language that allows you to express your intentions (the what instead of the how), this sort of thing could be taken to the max. Language writers should look to the way SQL servers optimize queries -- it is pretty cool and I think lessons from SQL could be taken into a compiled language. Related: Optimizing Hot Paths in a Dynamic Binary Translator.

Multi-stage compilation

Say what you will, but just-in-time compilation is really cool. Believe it or not, some people do not like to distribute their source code to all their customers (crazy, huh?). However, very few people have problems delivering byte code to people. Every decent scripting language has some sort of intermediate representation and some of the most popular languages today compile to a byte code. LLVM uses an intermediate representation so that it can perform common operations like optimization on any input language and easily generate code for multiple architectures. Bart de Smet had a good blog post on JIT optimization in .NET byte code. Pretty cool stuff.

Yeah, so there is the startup cost of compiling the intermediate language down to the native architecture, and the extra expense of having a compiler sitting around on every system you want to run software on. But it's really not that bad, especially considering how cheap hard drive space is these days. And for really performance-critical things, you can do something like ahead-of-time compilation for a specific architecture (like Mono does).
