21 Mar 2017

Four Levels of Productivity with Memory Management

Programming languages offer different ways of managing memory, trading off programmer productivity for performance [1]. Let's rank them in terms of productivity: which would let you build your software faster, and with fewer bugs to fix?

The easiest kind of memory management to use is garbage collection as in Java or Go.

Slightly less productive is Swift's automatic reference-counting [2]. This maintains a reference count for each object, and when the count reaches zero, the object is deallocated. You still have to worry about reference cycles, breaking them by annotating one of the references as weak. Weak references don't prevent the deallocation of the referred-to object. Instead, they become nil when the target is deallocated. Swift offers a second kind of weak reference, unowned references, which are essentially weak references with a "not nil" runtime check at each point of access [3].
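
To make this concrete, here's a minimal Swift sketch (the class names are made up for illustration): two objects that refer to each other would form a cycle and never be deallocated, so one side is declared weak, and that reference automatically becomes nil when its target goes away.

class Person {
  var apartment: Apartment?
  deinit { print("Person deallocated") }
}

class Apartment {
  // Without `weak`, Person and Apartment would retain each other,
  // forming a cycle that ARC could never free.
  weak var tenant: Person?
  deinit { print("Apartment deallocated") }
}

var person: Person? = Person()
var apartment: Apartment? = Apartment()
person!.apartment = apartment
apartment!.tenant = person

person = nil     // the weak `tenant` reference silently becomes nil
apartment = nil  // both deinits run; nothing leaks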

All this has a much steeper learning curve than garbage collection.

Even after you're comfortable with it and have used it for years, as I have, ref-counting keeps producing occasional bugs. I recently had a crash: I had registered a callback for the system to notify me when the user's location changed, but the callback object had been deallocated by the time the notification fired, because the system held only a weak reference to it, not a strong one, and invoking it crashed the app [4]. You can also make the opposite mistake and end up with a strong reference cycle, causing a memory leak. Those are two problems garbage collection doesn't have.
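
Here's a rough sketch of that first failure mode, with made-up types standing in for the real location API: the only reference to the callback object is weak, so it's already gone by the time the notification fires.

protocol LocationObserver: AnyObject {
  func locationChanged()
}

class LocationWatcher {
  // The system keeps only a weak reference to the callback object,
  // so registering it does nothing to keep it alive.
  weak var observer: LocationObserver?

  func notify() {
    // Force-unwrapping a weak reference that has already become nil
    // traps at runtime: this is the crash described above.
    observer!.locationChanged()
  }
}

class Listener: LocationObserver {
  func locationChanged() { print("location updated") }
}

let watcher = LocationWatcher()
watcher.observer = Listener()  // nothing else retains the Listener,
                               // so it's deallocated immediately
watcher.notify()               // crash: observer is already nil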

So, in addition to the steeper learning curve, even experienced developers aren't immune to these mistakes, which makes automatic reference-counting less productive to program in than garbage collection.

The next lower level of abstraction is something like Rust, which has Swift-style, thread-safe reference counts (Arc), but also other kinds of references: thread-unsafe reference counts (Rc), unique ownership (Box), and temporary, borrowed references (&). These are merely special cases of Swift-style reference counting, offered for performance, so choosing between them is an overhead you don't have in Swift.

Again, there are two aspects to the complexity: the learning curve, and the continual overhead of thinking about and choosing the right kind of reference. This makes Rust's memory management less productive than Swift's.

Even less productive is something like C or C++, where it's much easier to accidentally make mistakes that all the higher levels of abstraction above prevent: use after free, double frees, freeing an object through a pointer of the wrong type [5], freeing an array the wrong way [6], and leaking memory. That's a whole host of failure modes the more abstracted memory management systems above don't suffer from [7].

Ranking these from most to least productive: the most productive is garbage collection. Slightly less productive is Swift-style automatic reference counting. Slightly less productive still is Rust, with four different kinds of references you must choose from, all of which are special cases of ref-counting. The least productive is the C++ way, which exposes you to a whole range of runtime crashes and bugs.

Use the most productive abstraction that works for your use case. Otherwise, you're wasting your time, delaying your product, and risking losing in the market.

For example, for a server, I'd pick a garbage-collected language over a ref-counted one, everything else being equal.

[1] Which includes responsiveness and peak memory required.

[2] I'm ignoring languages that require you to manually insert calls to increment and decrement the reference count, since there's no reason to do manually what the compiler can do for you, without bugs.

[3] As far as I understand, Swift doesn't let you recover from such errors. This is a problem when you're invoking a third-party library, or want to contain failures. For example, I'm building a camera app, and one of the things a camera app must do is geo-tag photos. In Java, I can do:

try {
  geotag();
} catch (Throwable t) {
  logToServer(t);
}

I can't do this in Swift since I can't catch the nil dereference as an exception and keep going. The app crashes, which is the last thing the user wants at that point. That's a bad failure mode, the opposite of graceful degradation. A camera app that can't geotag the photo should at least be able to save the photo without geotagging it.

As another example, if you have a server that encounters a bug while servicing a request, you'd want that request to fail, rather than bring down the entire server. Again, Swift doesn't let you do that.
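
For comparison, here's a minimal Swift sketch, where geotag() and logToServer() are hypothetical stand-ins mirroring the Java snippet above: do/catch handles only errors that are explicitly thrown, not runtime traps.

enum GeotagError: Error { case noLocation }

func geotag() throws { throw GeotagError.noLocation }
func logToServer(_ error: Error) { print("logged: \(error)") }

do {
  try geotag()
} catch {
  logToServer(error)  // only thrown Swift errors land here
}
// If geotag() instead force-unwraps nil or touches a dangling unowned
// reference, the whole process traps; there's no Throwable-style
// catch-all that would let the app keep running.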

[4] Which is still better than C++, where you can corrupt memory and keep running. Swift crashes immediately and cleanly.

[5] That is, deleting a derived object through a base-class pointer, which is undefined behavior unless the base class has a virtual destructor.

[6] In C++, if you allocate an array with new[], you should deallocate it using the delete[] operator, not the normal delete operator. Mixing them is undefined behavior and could corrupt memory.

[7] C++ does have ref-counted pointers, but they're painful to use and produce less readable code:

shared_ptr<Foo> p(new Foo);

... compared to plain pointers:

Foo *p = new Foo;

C++ tutorials also typically start by teaching people to use plain pointers.

Swift and Rust make the opposite choice, by making safe references the default type. If anything, using an unsafe pointer in Swift is harder:

let p = UnsafeMutablePointer<Foo>.allocate(capacity: 1)
p.initialize(to: ...)

than using a ref-counted pointer:

let p = Foo()

Just because C++ and Swift both have safe and unsafe pointers doesn't make them equivalent. The important point is that Swift chooses a more productive default than C++. Safe references in Swift are easier to learn and produce more readable code, and beginners start with them, as opposed to unsafe references.

Defaults are powerful. Most people stick with the default, which makes C++ memory management a lower level of abstraction than Swift's. Even if you use a ref-counted pointer in C++, a library you use may not, so you're back at the lower level of abstraction.
