12 Mar 2017

Languages Should Let Us Control Side-effects

The thing I worry about the most in my programs is state — the values of variables and objects. Do they have the right values? Are impossible values prohibited? Are the state transitions correct? I've found it prudent to stop and think about how to structure my program to minimise the chance of objects getting into invalid states.

The language can help here, by supporting immutable classes and by letting us control side-effects.

Immutable Classes

Immutable objects help reduce unintended side-effects, especially when working with a third-party library, and even more so with a closed-source one. But immutable objects are very hard to implement in languages that don't natively support them.

For example, do you think this Java class is immutable?

class Person {
    private int age;
    static Person lastCreatedPerson = null;

    Person(int age) {
        lastCreatedPerson = this;
        this.age = age;
    }

    int getAge() {
        return age;
    }
}

This isn't immutable, because a reference to the object can be accessed by another thread before the field is initialised. So let's fix that, by eliminating the static variable, and make the field final too, to be safe:

class Person {
    private final int age;

    Person(int age) {
        this.age = age;
    }

    int getAge() {
        return age;
    }
}

Is this immutable? Again, no:

class MischievousPerson extends Person {
    public int age2;

    MischievousPerson(int age) {
        super(age);  // Person has no no-argument constructor
        age2 = age;
    }

    @Override int getAge() {
        return age2;
    }
}

Since the subclass's age2 field is public, and getAge() now returns it, the state that callers observe can be changed. Suppose we fix this by making the class final. Consider this example:

final class Person {
    private final List<Person> friends;

    Person(List<Person> friends) {
        this.friends = friends;
    }

    List<Person> getFriends() {
        return friends;
    }
}

Is this immutable? No, because you could pass in a reference to a mutable list:

List<Person> friends = ...;
Person kartick = new Person(friends);
friends.add(...);

So let's say we change the constructor to make a defensive copy:

Person(List<Person> friends) {
    this.friends = new ArrayList<>(friends);
}

Is this now immutable? No:

Person kartick = ...;
kartick.getFriends().add(...);

See how hard it is to define an immutable class?
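Putting the fixes so far together, a version that survives every attack above might look like the sketch below. It assumes Java 10 or later for List.copyOf, which returns an unmodifiable copy, and the name ImmutablePerson is made up here to keep it apart from the earlier examples.

```java
import java.util.List;

// Hypothetical name; a hand-written attempt at an immutable Person.
final class ImmutablePerson {                 // final: no mischievous subclasses
    private final int age;                    // final: assigned exactly once
    private final List<ImmutablePerson> friends;

    ImmutablePerson(int age, List<ImmutablePerson> friends) {
        this.age = age;
        // List.copyOf (Java 10+) makes an unmodifiable copy, so later changes
        // to the caller's list are not visible through this object.
        this.friends = List.copyOf(friends);
    }

    int getAge() {
        return age;
    }

    // Safe to hand out: the returned list rejects add() and remove()
    // with UnsupportedOperationException.
    List<ImmutablePerson> getFriends() {
        return friends;
    }
}
```

Even then, every one of these rules is upheld only by convention. A later edit can quietly break any of them, and the compiler won't complain.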

So languages should natively support immutable classes, perhaps via an immutable keyword:

immutable class Person {

The compiler would do whatever it takes to make the class immutable. It would make the class final. It would make all fields final and, if they're of class type, ensure that those classes are themselves immutable, or else that the objects are defensively copied and that only side-effect-free functions are invoked on them.
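For comparison, Java later gained records (a standard feature since Java 16), which cover part of this wish list: a record is implicitly final, and its fields are private and final. The sketch below, with the made-up names Team and FrozenTeam, suggests that the guarantee is only shallow. A component of a mutable class type can still be changed unless you copy it yourself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example type. The compiler makes the record final and its
// field private and final, but does nothing about what the field refers to.
record Team(List<String> members) {}

// Same shape, with the defensive copy written by hand in a compact constructor.
record FrozenTeam(List<String> members) {
    FrozenTeam {
        members = List.copyOf(members); // unmodifiable copy (Java 10+)
    }
}

class RecordDemo {
    public static void main(String[] args) {
        Team team = new Team(new ArrayList<>(List.of("Ann")));
        team.members().add("Bob");          // allowed: the list itself is mutable
        System.out.println(team.members().size());

        FrozenTeam frozen = new FrozenTeam(new ArrayList<>(List.of("Ann")));
        try {
            frozen.members().add("Bob");    // rejected, but only at run time
        } catch (UnsupportedOperationException e) {
            System.out.println("FrozenTeam rejected the change");
        }
    }
}
```

So records take care of the final class and final fields, but not the deep immutability an immutable keyword would have to provide.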

The compiler will make sure that objects remain immutable even in the presence of race conditions, that the state of the object won't appear to have changed when accessed by multiple threads without synchronisation. Maybe you haven't thought of multithreading. Or you don't understand the memory model of the language. Or don't know what a memory model is. The compiler will still guarantee that an immutable object is, in fact, immutable.

You don't have to be concerned about all these implementation details, just that it is immutable.

Even highly-skilled programmers make mistakes, so let's have the compiler do the hard work for us [1]. We humans can then focus on the high-level properties of our classes, not on the mechanics of their implementation.

Controlling Side Effects

Not all objects can be immutable, so we should think about how to control side effects [2]. A side effect is when you invoke a function that, rather than just returning a value, modifies something, like an argument, or the receiver [3].

You may think you can figure that out by looking at the implementation of a function, but it may call other functions, which call yet others, and so on, so it's hard to track whether an argument you pass in will be modified somewhere down the call tree.

To fix this problem, functions that modify their parameters should annotate those parameters as mutating, as in this example:

static void copy(mutating List destination, List source)

This says that the copy() method may modify the destination list, say by adding elements to it. You can safely pass any list as the source parameter, knowing it won't be modified [3]. No need to make defensive copies, which you can forget, causing bugs. Defensive copies also hurt performance. You shouldn't need to waste time making a copy if you can prove that it won't be modified, anyway.
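Java has no such keyword today, but the contract can be roughly approximated at run time. In this sketch (the class name CopyDemo is made up), the caller wraps the source in Collections.unmodifiableList, so any attempt by copy() to modify it fails with an exception instead of silently succeeding. This is a weaker, run-time check, not the compile-time guarantee proposed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyDemo {
    // Intended contract: destination may be modified, source may not.
    // Java cannot state this in the signature, so it lives in this comment.
    static <T> void copy(List<T> destination, List<? extends T> source) {
        destination.addAll(source);
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("x", "y"));
        List<String> b = new ArrayList<>();
        // The unmodifiable view is a thin wrapper, not a copy. It throws
        // UnsupportedOperationException if copy() tries to modify the source.
        copy(b, Collections.unmodifiableList(a));
        System.out.println(b);
    }
}
```

The wrapper avoids the cost of a defensive copy, but the caller still has to remember to use it, which is exactly the kind of thing a language-level annotation would make unnecessary.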

The mutating keyword also applies to methods:

class Person {
  int getAge();
  mutating void setAge(int age);
}

This says that the setter modifies the Person object [4].

Immutability of parameters cascades up and down the call tree. The copy() method above can call only side-effect-free methods on source. Mutable functions and mutable methods go together: if the List class didn't annotate its methods as being mutable or not, the compiler wouldn't be able to verify that the copy() function conforms to its contract.
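One way to picture the cascade in today's Java is to split a class's methods across a read-only interface and a mutable one. This is only a sketch under that assumption, and every name in it (ReadableList, WritableList, SimpleList, Lists) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// The read-only view exposes only side-effect-free methods.
interface ReadableList<T> {
    int size();
    T get(int index);
}

// The mutable view adds the methods that would be marked as mutating.
interface WritableList<T> extends ReadableList<T> {
    void add(T item);
}

final class SimpleList<T> implements WritableList<T> {
    private final List<T> items = new ArrayList<>();

    public int size() { return items.size(); }
    public T get(int index) { return items.get(index); }
    public void add(T item) { items.add(item); }
}

class Lists {
    // Calling source.add(...) here would not compile, because
    // ReadableList declares no such method.
    static <T> void copy(WritableList<T> destination, ReadableList<T> source) {
        for (int i = 0; i < source.size(); i++) {
            destination.add(source.get(i));
        }
    }
}
```

The approximation leaks, though: copy() could downcast source to WritableList and modify it anyway. The language proposed here would have no such escape hatch.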

For clarity, the language could require the mutating keyword to be used at the call site, too:

List a = ...
List b = ...
copy(mutating b, a);

This eliminates the risk that you'll accidentally pass in an object without realising it can be modified, causing bugs.
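Java can only imitate this with a convention. In the sketch below, a made-up wrapper type Mutating forces the caller to write mutating(...) around any argument the function is allowed to change. All names are hypothetical, and unlike the proposed keyword, nothing here stops copy() from modifying source as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper whose only job is to make mutation visible at the call site.
final class Mutating<T> {
    final T target;

    private Mutating(T target) {
        this.target = target;
    }

    static <T> Mutating<T> mutating(T target) {
        return new Mutating<>(target);
    }
}

class CallSiteDemo {
    // A caller cannot pass a bare List as destination; it must be wrapped.
    static <T> void copy(Mutating<List<T>> destination, List<? extends T> source) {
        destination.target.addAll(source);
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("x", "y"));
        List<String> b = new ArrayList<>();
        copy(Mutating.mutating(b), a);   // the call site shows that b may change
        System.out.println(b);
    }
}
```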

This is a stronger guarantee than C++'s const, which you can const_cast away. Here, there'll be no const_cast, which would defeat the point.

This offers a stronger guarantee than Haskell, even, in a way. In Haskell, functions are supposedly side-effect-free, but you can use unsafePerformIO to cause side-effects, like modifying a variable. A Haskell function a() can call b() which calls c() which uses unsafePerformIO, and callers of a() won't know that a() can have a side-effect. Not in our language, which requires callers to be warned via the mutating keyword.

Conclusion

Tracking state is one of the hard parts of building software, and the most important, from the users' point of view, so our languages should evolve to let us control side effects.

[1] It's more helpful to have the compiler validate such high-level properties, as opposed to the administrivia that today's statically-typed languages distract us with.

[2] We don't need pure functions. It's okay for a function to do some computation based on the current time, for example. Or print something to the console. The thing we want to prevent is changing the values of the variables and objects you have. Requiring pure functions would be too high a barrier, with little benefit.

[3] The receiver is the object you're calling the method on.

[3] Parameters default to readonly, with mutable parameters requiring an annotation. This is perhaps better than the opposite, because people may not mark readonly parameters as such, which in turn forces their callers to make them mutable. The right thing should be the path of least resistance.

[4] Though any method name beginning with "set" may be automatically considered mutable, to reduce the overhead of declaring things mutable that are obviously mutable.
