The expression problem as a litmus test
The expression problem is a famous problem in programming languages.
“The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).”
Using interfaces (like in Java) as the datatype example, the problem simply asks whether it is possible to derive the interface and add new methods to the interface, without having to recompile existing code or to resort to using casts.
Obviously, in a OOP language it’s easy to derive interfaces, but the problem uncovers the rigidity of the type system: you can’t modify (i.e. extend) the interface, because you have to modify all the classes classes that implement the interface.
Conversely, in functional programming languages, adding new methods operating on the interface is easy. Consider the canonical OCaml example:
type shape = Circle of float | Rectangle of float * float
let area shp = match shp with
Circle radius -> 3.14159 *. radius *. radius
| Rectangle (width, height) -> width *. height
let vertices shp = match shp with
Circle _ -> infinity
| Rectangle (_, _) -> 4
So in FP, you could create a function called volume
that computes the volume for the existing
types, and you needn’t touch the above code. However, as soon as you do that, you realize you’ve
made a silly mistake: our shapes are flat so their volume is zero. Quickly, you realize you need a
three-dimensional Cube
shape.
let volume shp = match shp with
Circle _ -> 0.
| Rectangle _ -> 0.
| Cube s -> a *. a *. a (* Cube isn't defined *)
Oops!
Here’s the onion: to get the Cube
working, you’ll have to modify the existing code in two places:
the definition of shape
and both methods area
and vertices
.
In OOP, this isn’t the case, since you can just derive the new IShape
interface and be done with
it, but the problem arises when you’re adding the volume
function, because you need to modify
IShape
, and thus every class that derives it.
In FP, adding new functions over the datatypes is easy, but adding new cases to the datatype is tricky, because you have to modify existing functions. In OOP, adding new functions over the datatype is hard, because you need to modify each implementation; adding new cases to the datatype is easy, since all you need to do is derive the interface. FP has invented a multitude of ways to deal with this problem, ranging from type classes, traits to protocols; OOP usually solves with either patterns or open classes. Ruby’s refinements can be used for this purpose as well.
Polymorphic variants
That’s a quick introduction to the problem. I think the expression problem is a perfect litmus test of sorts for programming languages, that is, the measure of the expressive power of the language is the quality of the solutions the language presents to the expression problem.
The expression problem is theoretically solvable in any language, but to varying degrees of elegance. In Java one must resort to using the visitor pattern, and in my mind this is the most inelegant way of going about it. I would rate the solutions on a spectrum: with the most basic solution being the visitor pattern, at the other end we have something like polymorphic variants and type classes. Multimethods and protocols are somewhere in between.
When you compare polymorphic variants of OCaml with Haskell’s type classes, there’s a marked difference in brevity. Polymorphic variants are succincter than type classes but cannot provide the same level of type safety.
type shape = [ `Circle of float | `Rectangle of float * float ]
let area shp = match shp with
`Circle radius -> radius *. radius *. 3.14159
| `Rectangle (w, h) -> w *. h
let vertices shp = match shp with
`Circle radius -> infinity
| `Rectangle (w, h) -> 4.
Not too different from the above declaration, the type is surrounded with brackets and the types are preceded with backticks. Recreating the volume function is easy.
let volume shp = match shp with
`Circle _ -> 0.
| `Rectangle _ -> 0.
| `Cube a -> a *. a *. a
So now I’ve extended the shape
type with another type Cube
, and I haven’t touched vertices
and
area
functions. The volume
function can be done even more succinctly:
let short_volume shp = match shp with
(* no volume in two dimensions! *)
#shape -> 0.
| `Cube a -> a *. a *. a
It is also possible to constrain the polymorphic variants:
let flatten shp = match shp with
#shape as x -> x
| `Cube a -> `Rectangle a
The type of this function is
[ < `Circle of float | `Cube of float | `Rectangle of float * float ] -> [> shape]
. The
[< A | B]
means a closed type: it can be only A
or B
, but nothing else, and [> Foo]
means
“Foo or something else”. So the flatten
function accepts Circle
, Rectangle
or Cube
and
returns a shape
(or possibly something else). Trying to run flatten (`Sphere 4)
produces a
type error:
# flatten (`Sphere 3);;
Characters 8-19:
flatten (`Sphere 3);;
^^^^^^^^^^^
Error: This expression has type [> `Sphere of int ]
but an expression was expected of type
[< `Circle of float
| `Cube of float * float
| `Rectangle of float * float ]
The second variant type does not allow tag(s) `Sphere
However, the following code compiles:
type polytope = [ shape | `Cube | `Octahedron ]
let frobnicate pt =
let flattened = flatten pt in
match flattened with
#shape -> "Already flaaat!"
| `Octagon -> "Eight coorneeeeerss"
The compiles, although we didn’t tell the compiler that flatten
does not return
Octagon
. There are two ways to fix this: either explicitly annotate pt
to be of type
polytope
, which produces this error:
Error: This expression has type polytope
but an expression was expected of type
[< `Circle of float | `Cube of float | `Rectangle of float * float ]
The second variant type does not allow tag(s) `Octahedron
It is possible to further constrain the type with type annotations. We can make sure that the
flatten
function returns only flat shapes:
let safe_flatten shp : [< shape] = match shp with
#shape as x -> x
| `Cube a -> `Rectangle a
| `Sphere r -> `Circle r
This produces the error:
Error: This pattern matches values of type [? `Octagon ]
but a pattern was expected which matches values of type shape
The second variant type does not allow tag(s) `Octagon
Not a silver bullet
Unfortunately, polymorphic variants are problematic. The problem with polymorphic variants is you quickly reach an absurd level of complexity and are forced to use annotations or subtyping to ensure maximal type safety. So although polymorphic variants are nice, and they do let us solve the expression problem, they’re an unsteady compromise between type safety and brevity. You can certainly make elegant abstractions with them but they get unwieldy quickly. They aren’t as efficient compared to regular variants either.
So what are the options? In OCaml 4.02, you can use extensible variant types:
type boring_shape = ..
type boring_shape += Circle of float | Square of float
let boring_area shp = match shp with
Circle r -> r *. r *. 3.14159
| Square a -> a *. a
| _ -> 0.
type boring_shape += Rectangle of float * float
let radical_area shp = match shp with
Circle _ as c -> boring_area c
| Square _ as s -> boring_area s
| Rectangle (w, h) -> w *. h
| _ -> 0.
An extensible variant is defined using ..
, and extension is done with the +=
operator. The
caveat is that you must handle the default _
case in pattern matching. Extensible variants are
another neat trick for solving the expression problem.
A measure of expressive power
The expression problem is a great litmus test that measures the expressive power of a programming language. The actual measurement of the test can be either the brevity of the code or its type safety. The solutions range from the clumsy Visitor Pattern in Java to polymorphic and extensible variants in OCaml and to type classes in Haskell. Clojure and Elixir have protocols that are both quite nice but not so type-safe since both are dynamically typed languages. What is more, since the expression problem is also about type safety, then strictly speaking the problem isn’t valid in a dynamic language. Any Lisper knows that Lisps are super expressive anyway.