Continuing from part 1...

Functions

Functions are what do the work. You can have your objects exist and hold a bunch of data, but without functions to do work on and with this data, we would live in a world of pure XML, which I think everybody can agree would be horrific. We use functions to find our social security number, pay our taxes and help the landlady with her garbage. I'm not saying that objects are guilty of virtually every computer crime we have a law for, but functions should really take a more prominent role in programming languages. There are two types of functions that I have special feelings for: the pure function and the lambda function.

Pure and Constant Functions

Pure functions are awesome. For those who hate reading, a pure function can be summarized by saying that it is a function that cannot alter anything but itself: no global memory, no I/O devices - nothing but the stack. The GCC documentation also defines a constant function as a special case of a pure function: it does not even read from global memory. The C function strlen would be an example of a pure function, since it reads but does not alter global memory (dereferencing the pointer is considered an access to global memory). A function like sqrt is considered a constant function since it touches nothing (as would Q_rsqrt).
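To make the distinction concrete, here is a minimal sketch using GCC's attribute syntax (Clang accepts it as well; other compilers may not). The function names count_chars and square are made up for illustration. As far as I know, GCC takes these attributes on trust rather than verifying them.

```cpp
#include <cstddef>

// GCC/Clang-specific attributes; hypothetical example functions.

// "pure": reads memory through the pointer (treated as global memory)
// but writes nothing outside its own stack frame.
__attribute__((pure))
std::size_t count_chars(const char* text) {
    std::size_t n = 0;
    while (text[n] != '\0') ++n;
    return n;
}

// "const": the result depends only on the argument value; no global
// memory is even read.
__attribute__((const))
int square(int x) {
    return x * x;
}
```

With these markings, a compiler is allowed to fold two calls like square(n) + square(n) into a single call, or to drop a call whose result is never used.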

Okay, so what's the point? There are three main reasons: optimization, multi-processing and verification. Optimization from marked pureness comes in two forms: dead code elimination and common subexpression elimination. Explaining how this works is a blog post on its own, but LWN did a pretty good job of this already. In summary: since the compiler can guarantee more about your code, it can do more with it.

Multi-processing comes in the form that once you are aware that a function is constant, you know that only the parameters you pass it matter. This means that all you need to do is move the data of the function to the processor running it and let it go. That's pretty abstract...how about a bigger example? Say you wanted to do some difficult task like find things in a million images. The constant function in this case would be the evaluation of an individual image against the set of feature descriptors. In the end, a central system can hand out a feature set and an image to a bunch of computers individually, knowing that they do not touch anything and getting their results at will because nothing affects anything else. Cool, huh? Now imagine if your compiler did this automatically. Awesome stuff.

The last thing I said was verification, which is probably the most important. What I mean by this is that you should be able to mark a function as pure and have the compiler check this for you. The most helpful case I imagine is based on the fact that a pure function can only call other pure functions (or constant, because a constant is a pure function). Likewise, a constant function can only call other constant functions. So you can easily guarantee that everything you do is working exactly like you expect, which is just fantastic.
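The closest thing I know of in C++ to this kind of compiler-checked guarantee is C++0x's constexpr. It is only loosely related and is not the same thing as GCC's pure and const attributes. Still, when a constexpr function is evaluated at compile time, everything it calls has to be usable at compile time too, and the compiler enforces that. A small sketch with made-up function names:

```cpp
// Hypothetical functions, used only to illustrate the "can only call its
// own kind" rule in a form the compiler actually checks.
constexpr int square(int x) {
    return x * x;
}

constexpr int sum_of_squares(int a, int b) {
    // Calling a non-constexpr function here would make the static_assert
    // below fail to compile.
    return square(a) + square(b);
}

// Forces compile-time evaluation, so the check happens before the
// program ever runs.
static_assert(sum_of_squares(3, 4) == 25, "3*3 + 4*4 should be 25");
```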

I actually really like the way GCC already does this for C and C++ and I wish it would become more prominent in other languages. A similar feature in .NET is Microsoft Code Contracts, which is a pretty sweet tool that fits nicely with their system (although I would like to see it more prominently featured - a first-class citizen in the .NET world).

Lambda Functions

Lambda functions are awesome. I am not just saying that because one of my three readers would kill me with a rusty spork if I said otherwise, but because they are genuinely awesome. Lambda functions are one of the reasons I prefer C# and Scala over Java. The comparison of those three languages is actually a great example of why I think that lambda functions should be a first-class member of any language that wants to call itself awesome, precisely because of Java's lack of them. Sure, anonymous inner classes can be useful, but lambda functions are more a part of the culture of the language. As a functional programmer, I find it irritating that I have to write my own Function class in basically every Java project that I do. The fact that they are not already there keeps them from populating the Java library. Imagine Java with something like Linq and take off a lot of random code bloat. Hmm...I just described Scala. People have been asking for lambdas in Java for a while and it looks like they are finally coming.

Yes, I realize I just harped on about Java, Scala and C#. My point is that lambda functions are just plain awesome and you should put them in your language no matter what, because they are incredibly beneficial. If C++0x can add lambda functions to that horror of a compilation model, you can too!
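For a taste of what this buys you, here is a small sketch in C++0x syntax. The helper name count_above is made up. The point is that the predicate lives right at the call site instead of in a separate functor class.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: counts the values strictly greater than threshold.
int count_above(const std::vector<int>& values, int threshold) {
    // The lambda captures threshold by value and is handed straight to a
    // standard algorithm; no hand-written Function class required.
    return static_cast<int>(std::count_if(
        values.begin(), values.end(),
        [threshold](int v) { return v > threshold; }));
}
```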

Self-Modifying Code

Optimization based on run-time properties

So let's say I have a structure called a Vector4, which contains four floating point numbers (all aligned properly in memory). If I have two of these things and want to add them together, I would like to do it really quickly (especially since this is something I do all the time). I can do this really quickly on x86 with the addps instruction from the SSE instruction set. However, I would really like my code to work perfectly fine on CPUs that do not support SSE and work faster on those that do. All in a single executable so the user does not even realize what is happening. Intel uses a technique in all their compilers called "CPU dispatching," which I think is a horrible name since that name is already taken by the actual CPU dispatcher. Whatever.
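A minimal sketch of the idea, assuming GCC or Clang on x86 (the __builtin_cpu_supports builtin and the target attribute are specific to those compilers). The names add_scalar, add_sse, pick_add and vector_add are made up. On any other compiler or architecture, the sketch quietly falls back to the scalar path.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <xmmintrin.h>
#define HAS_X86_DISPATCH 1
#endif

// Four floats, aligned so the SSE aligned load/store instructions are legal.
struct alignas(16) Vector4 {
    float v[4];
};

// Portable fallback: works on any CPU.
Vector4 add_scalar(const Vector4& a, const Vector4& b) {
    Vector4 out;
    for (int i = 0; i < 4; ++i) out.v[i] = a.v[i] + b.v[i];
    return out;
}

#ifdef HAS_X86_DISPATCH
// SSE version: _mm_add_ps maps to the addps instruction.
__attribute__((target("sse")))
Vector4 add_sse(const Vector4& a, const Vector4& b) {
    Vector4 out;
    _mm_store_ps(out.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
    return out;
}
#endif

using AddFn = Vector4 (*)(const Vector4&, const Vector4&);

// Ask the CPU what it supports and choose an implementation.
AddFn pick_add() {
#ifdef HAS_X86_DISPATCH
    if (__builtin_cpu_supports("sse")) return add_sse;
#endif
    return add_scalar;
}

// The choice is made once, on first use; callers never see the difference.
Vector4 vector_add(const Vector4& a, const Vector4& b) {
    static const AddFn chosen = pick_add();
    return chosen(a, b);
}
```

This is dispatch through a function pointer rather than true self-modifying code, but the effect for the user is the same: one executable, with the fast path taken wherever it is available.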

Anyway, there are all sorts of cool things you can do with this. In a language that allows you to express your intentions (the what instead of the how), this sort of thing could be taken to the max. Language writers should look to the way SQL servers optimize queries -- it is pretty cool and I think lessons from SQL could be taken into a compiled language. Related: Optimizing Hot Paths in a Dynamic Binary Translator.

Multi-stage compilation

Say what you will, but just-in-time compilation is really cool. Believe it or not, some people do not like to distribute their source code to all their customers (crazy, huh?). However, very few people have problems delivering byte code to people. Every decent scripting language has some sort of intermediate representation and some of the most popular languages today compile to a byte code. LLVM uses an intermediate representation so that it can perform common operations like optimization on any input language and easily generate code for multiple architectures. Bart de Smet had a good blog post on JIT optimization in .NET byte code. Pretty cool stuff.

Yeah, so there is a startup cost of having to compile the intermediate language to native architecture and extra expense of having to have a compiler sitting around on every system you want to run software on. But it's really not that bad, especially considering how cheap hard drive space is these days. And for really performance-critical things, you can do something like ahead-of-time compilation for a specific architecture (like Mono).

Compiler Intrinsics

Intrinsic functions and properties are wonderful things. All the decent C family languages have intrinsic functions like sizeof and alignof, but sometimes you need support for more. While a language designer can try to think of every possible need and expose a good intrinsic for all potential future requirements, this is ultimately a losing battle for pretty obvious reasons. It would be really great if a user could extend the intrinsic properties of the compiler with their own domain-specific needs. I am imagining the compiler gets something like an extension sheet along with the source code so that these properties can be added to the system quite trivially. It would make it so that people could extend the compiler without actually having to recompile the compiler -- it is the culture of extension that you really care about. Of course, I have no idea how the implementation would work, but it would be nice to have.

Ultimately, this could give a system like C++ type traits that do not feel like a complete hack. Of course, C++ type traits are extremely powerful, but frankly, they were never meant to do what they now do and, of course, just don't feel right. If concepts had not been dropped from the C++0x standard, we could be about halfway to a cleaner solution; as it stands, we are stuck with using type traits.
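For reference, this is roughly what a user-defined "property" looks like with type traits today. The trait is_math_vector and the helper fits_simd_register are made-up names, and the size check assumes a 4-byte float. It works, but note that it is template specialization pressed into service as a property system.

```cpp
#include <type_traits>

struct Vector4 {
    float x, y, z, w;
};

// Primary template: by default, a type is not a math vector.
template <typename T>
struct is_math_vector : std::false_type {};

// Opt Vector4 in by specializing the trait for it.
template <>
struct is_math_vector<Vector4> : std::true_type {};

// A built-in intrinsic (sizeof) and the user-defined trait combined into
// one compile-time question.
template <typename T>
constexpr bool fits_simd_register() {
    return is_math_vector<T>::value && sizeof(T) == 16;
}
```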

To run completely away with the idea, something on the order of having a Lisp-like macro system where you have an extra program which spits out some abstract syntax tree from the input would be totally awesome. Okay, so this feature already exists in perfect form in Lisp, but I would love to see it in other languages as well.

Unit Testing

QuickCheck

Haskell is a wonderful place to draw examples from. A framework like QuickCheck is just awesome. Because let's face it: nobody likes writing unit tests. Now, I'm not saying that they are completely unnecessary, just that they are a pain in the ass to write. Say you have a function with the signature sqrt(x : real) : real.

If you wanted to write some unit tests for this function, you would pound away at some known values of various square roots. For brevity, I'll eliminate specifying some range of results that we consider "valid."

assert_equal(sqrt(4), 2)
assert_equal(sqrt(100), 10)
assert_equal(sqrt(2), 1.4142135)
assert_failure(sqrt(-1))

Okay, that's halfway decent and pretty clear what I mean. But let's face it: out of the limited representational power of a real (whatever that may be), I am testing a very pathetic subset of all the possibilities. There might be negative values that somehow work or positive ones that do not - which is especially probable for very small or very large numbers. What would be nice is something with a signature like this:

sqrt(x : real @AcceptableRange(0 .. INFINITY))
: real @ResultCheck(r => (r * r) == x)

So the syntax is not completely readable, but the idea is that we are attaching some annotations to the parameter and the result of the function. These can be processed by whoever might need them, similar to the use of .NET attributes and Java annotations. However, I have stretched the allowable syntax to whatever you want -- in this case, a ranged primitive and a lambda function. The possibilities are endless! Compilers for the language doing static checking could find failures before they happen and IDEs could assist people with their problems.
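Without the dream syntax, the same idea can be approximated by hand in the QuickCheck style: state the property once and throw a pile of random inputs at it. This is a rough sketch, not QuickCheck itself; the function names are made up, and the exact == from the annotation is swapped for a relative tolerance because floating point will not round-trip exactly.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// The property from the annotation: squaring the result should give back
// the input, within a relative tolerance.
bool sqrt_round_trips(double x) {
    const double r = std::sqrt(x);
    return std::fabs(r * r - x) <= 1e-12 * std::max(1.0, x);
}

// Try the property on `samples` random inputs drawn from [0, upper) and
// count how many of them violate it.
int count_sqrt_failures(int samples, double upper, unsigned seed) {
    std::mt19937_64 rng(seed);  // fixed seed keeps any failure reproducible
    std::uniform_real_distribution<double> input(0.0, upper);
    int failures = 0;
    for (int i = 0; i < samples; ++i) {
        if (!sqrt_round_trips(input(rng))) ++failures;
    }
    return failures;
}
```

A call like count_sqrt_failures(10000, 1e12, 42) covers far more of the input range than four hand-picked asserts, although it still says nothing about inputs outside the range it samples.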

Better yet, we could use...

Static Type Checking

I am a huge believer in static type checking. From the example above, just having a type modifier called unsigned makes the exceptional conditions of the sqrt function impossible, which is really nice, since this is what you actually mean. Potential errors due to negative numbers are eliminated at compile time, because you simply cannot compile when there is a chance for an error. Compiler-enforced consistent behavior is an awesome thing.

User-Definable Primitive-looking Types

Let's say you are writing a math function that has the signature rotate(thing : Shape, angle : Real). This function rotates the Shape called thing by angle radians. Oh, you could not tell that I use radians by the method signature? That's a problem...

If we were doing some C++, we might have lines like:
typedef float Radians;
typedef float Degrees;

So the signature would look like: void rotate(Shape* thing, Radians angle). Now the method signature tells you which kind of unit you are using. Of course, the problem here is obvious: since Radians and Degrees are actually the same type, we are free to convert between the two and the compiler will not actually care that there is a difference (because, as far as it is concerned, there is not a difference).

So how can you make the compiler care? In C++, this is notoriously difficult (although possible). Once again, let us pretend that there is something perfect out there for me that looks like this:
type Radians is real range 0 .. 2 * PI
type Degrees is real range 0 .. 360

And then one could specify that there exists a scalar conversion between the two units:
conversion Radians <=> Degrees is implicit direct

That looks a little funny, but stay with me for a second. The <=> token means that the conversion is two-way. After that, I added some words just to show that my dream is really powerful. implicit means that there is an implicit conversion (as opposed to an explicit one) - the compiler is allowed to freely convert the units between each other (assuming it follows the rules of conversion). Then, direct is just a way to specify that there are no fancy conversion rules: the compiler just figures out that 180/PI is a good conversion factor for the number specified.

var angle = 180 degrees
rotate(thing, angle)

This is kind of along the lines of Ada with a little bit of extra conversion logic. A convenient system like this would be great for certain NASA contractors.

I've decided to split this incredibly long post up...

Travis Gockel

07 July 2010

What I Would Like in a Programming Language (Part 2)

21 April 2010

What I Would Like in a Programming Language (Part 1)
