Implicit power

Posted on March 15, 2017

Scala gets a lot of flak for implicits. Some of the criticism is justified: implicits can be intimidating or confusing for beginners. That does not justify dismissing them, though: used correctly, implicits in all their flavours can be quite simple and powerful.

I recently had to refactor a large program. The codebase was old and wasn’t designed to cope with the sort of change I was going to introduce, and I didn’t have much time either. Pressed for time, but compelled by a modicum of professional pride, I didn’t want to half-ass the task with jury-rigged solutions that would have left me feeling dirty and empty inside and, at worst, left a rotting mess for future developers, including me.

The codebase itself was simple, but large. Its task was more or less to serve as a REST API in front of a fast, high-availability database (Cassandra). One part of the program provided abstractions over database tables, called collections. Each collection had a set of methods (such as get) that were translated into database queries to fetch entities. An entity is a piece of data encapsulating some value. Using a fictional and simplified example of a Bork from a database:

case class Bork(id: Int,
                date: ZonedDateTime,
                frobnicate: Decimal)

trait Borks {
  // find a bork by id
  def get(by: Int): Future[Option[Bork]]
}

// ... in another place, another module

class BorksImpl extends Borks {
  def get(by: Int): Future[Option[Bork]] = {
      // fetch by id from the database
  }
}

Each collection trait was implemented by a concrete class (like BorksImpl), hiding the database logic behind the trait. Other parts of the program accessed these entities via the collection trait, like the API front here:

val borks: Borks = ...

// spray dsl
pathPrefix("borks" / IntNumber) { id =>
  get {
    complete {
      borks.get(id).toJson
    }
  }
}

The database in question was Cassandra, where the “database” abstraction doesn’t really exist: what Cassandra calls keyspaces are essentially prefixes that map to physical directories on disk. Keyspaces have some properties that separate one from another, but the point is that they are unlike traditional SQL databases: you need not connect to a particular database. To switch a query on the table Foo from keyspace A to keyspace B, you simply change A.Foo to B.Foo. In other words, you choose the keyspace by qualifying the table name in the query.
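As a rough sketch of what keyspace-qualified table names look like in practice, the snippet below builds the same query against two keyspaces. The helper selectById and the query shape are assumptions made for illustration; they are not taken from the original codebase or from any driver API.

```scala
object KeyspaceDemo {
  // Hypothetical helper: builds a keyspace-qualified CQL query string.
  // Switching keyspaces only changes the prefix before the table name.
  def selectById(keyspace: String, table: String): String =
    s"SELECT * FROM $keyspace.$table WHERE id = ?"

  val fromA: String = selectById("A", "Foo") // "SELECT * FROM A.Foo WHERE id = ?"
  val fromB: String = selectById("B", "Foo") // "SELECT * FROM B.Foo WHERE id = ?"
}
```

No reconnecting is involved: the two queries differ only in the prefix.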

The task was to support multiple, concurrent databases of entries. Previously, the program operated as a monolith, i.e. there was only ever one database it operated on. It now had to support concurrent access to several (possibly an unbounded number of) databases, and the support had to come quickly.

Turns out the simple solution – instantiate one BorksImpl for each keyspace – was not available, as there could be entities in one shared keyspace mapping to other keyspaces. So, one collection like BorksImpl needed to know which keyspaces it was supposed to query, because this information is unavailable to the caller.

A way around the splitting and namespacing was consolidation, but this introduced security problems. We couldn’t simply consolidate all the entries into the same database, as we had access limitations – callers of get acting on keyspace Foo were not allowed to see the data in keyspace Bar. This justified the creation of a split by keyspace, isolating data for the purposes of permission control. This also destroyed the possibility of the above solution, i.e., instantiate one BorksImpl for each keyspace, because one BorksImpl might have needed to query for data from many keyspaces.

So, when a request with the id 123 comes in at /borks/123, the application uses the central lookup table to find the target keyspace. The initial implementation looked like this:

trait Borks {
  // find a bork by id within the given namespace (keyspace)
  def get(namespace: String, by: Int): Future[Option[Bork]]
}

// ... in another place, another module

class BorksImpl extends Borks {
  def get(namespace: String, by: Int): Future[Option[Bork]] = {
      // fetch by id from the given keyspace
  }
}

And the updated caller API:

val borks: Borks = ...

pathPrefix("borks" / IntNumber) { id =>
  queryNamespace(id) { namespace =>
    get {
      complete {
        borks.get(namespace, id).toJson
      }
    }
  }
}

This was fairly simple, but painful, as the get methods of collections like Borks may call methods on other collections, nesting calls ever downward; in the example below, Borks calls barksImpl.get, and so forth. As a result, I had to add the namespace: String parameter to all methods on all collections. Remember, storing the namespace as a field was not an option: the namespace had to be an extra parameter to every method invocation.

So I was dealing with transforming code that looked like this:

val barksImpl: Barks = ...
def aggregateWithBarks(id: Int, barks: Set[Int]): Future[Seq[Bork]] = {
   val aggregates = get(id) map { b =>
     b map { bork =>
       barks flatMap { bark =>
         // bark is already an id, since barks is a Set[Int]
         barksImpl.get(bark) match {
            ...
         }
       }
     }
   }
   ...
}

and by adding namespace everywhere, I had to transform it into

val barksImpl: Barks = ...
def aggregateWithBarks(namespace: String, id: Int, barks: Set[Int]): Future[Seq[Bork]] = {
   val aggregates = get(namespace, id) map { b =>
     b map { bork =>
       barks flatMap { bark =>
         // bark is already an id, since barks is a Set[Int]
         barksImpl.get(namespace, bark) match {
            ...
         }
       }
     }
   }
   ...
}

So I had to add namespace: String to barks.get and borks.aggregateWithBarks. Sounds tedious? Well, imagine there wasn’t just one call to barksImpl.get but tens, and not just two collections but a hundred, with tens of thousands of lines to refactor.

I didn’t want to keep adding namespace to every method call inside a method call, so I chose to make it implicit instead. This way, I only needed to declare the implicit parameter on each method, and I didn’t need to modify any of the nested method calls. I wrapped the namespace in a custom case class and added it as an implicit argument:

case class Namespace(namespace: String)

trait Borks {
  def get(id: Int)(implicit namespace: Namespace) = ...
  def aggregateWithBarks(id: Int, barks: Set[Int])(implicit namespace: Namespace) = ...
}

trait Barks {
  def get(id: Int)(implicit namespace: Namespace) = ...
}

So, that was one particularly nice use case for implicit parameters. The good thing is that if the datastore is later redesigned cleanly, so that one namespace (keyspace) can no longer reference another, all you need to do is instantiate BorksImpl with an implicit val namespace = ... in scope, and the code will work just fine. Implicit parameters let me implement a painful refactoring very quickly.
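To make the mechanics concrete, here is a small self-contained sketch of an implicit parameter travelling through nested calls. It is a simplification under stated assumptions: the collections are in-memory stand-ins, Future is dropped to keep the example short, and the Bark case class and the "tenant_a" keyspace name are invented for illustration.

```scala
object ImplicitParamDemo {
  final case class Namespace(namespace: String)
  final case class Bark(id: Int, keyspace: String)

  // Hypothetical in-memory stand-in for a Cassandra-backed collection.
  class Barks {
    def get(id: Int)(implicit namespace: Namespace): Bark =
      Bark(id, namespace.namespace)
  }

  class Borks(barks: Barks) {
    // No namespace is written at the nested call site: the compiler
    // forwards this method's implicit parameter to barks.get.
    def aggregateWithBarks(ids: Seq[Int])(implicit namespace: Namespace): Seq[Bark] =
      ids.map(id => barks.get(id))
  }

  val borks = new Borks(new Barks)

  def run(): Seq[Bark] = {
    // The namespace is declared once, at the outermost caller.
    implicit val ns: Namespace = Namespace("tenant_a")
    borks.aggregateWithBarks(Seq(1, 2))
  }
}
```

Only the method signatures mention Namespace; the bodies stay as they were before the refactoring, which is the whole appeal.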

Naturally, had I had more time, I would’ve done the separation properly, implemented namespacing rules more clearly, completely redesigned the database, and so forth. Anyway, with Scala implicits, I was able to build a less-than-proper solution in a way that did not elicit a “jesus christ what a hack” feeling. It didn’t pollute my code too much, and it will be easy to refactor out when it’s no longer needed.

And, it turned out, I was able to benefit from other kinds of implicits: conversions and arguments. I needed to convert from the Namespace entity into a String for the query builder syntax. With an implicit conversion, I only needed to write namespace instead of namespace.namespace.

object Namespace {
  implicit def namespace2String(n: Namespace): String = n.namespace
}

Session.prepare(QueryBuilder.insertInto(namespace, table).values(...))
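The conversion can be tried without a Cassandra driver. In the sketch below, insertInto is a hypothetical stand-in for a query-builder method that accepts plain strings, and "tenant_a" is an invented keyspace name; only the Namespace companion mirrors the snippet above.

```scala
import scala.language.implicitConversions

object ImplicitConversionDemo {
  final case class Namespace(namespace: String)

  object Namespace {
    // Found automatically because it lives in Namespace's companion object.
    implicit def namespace2String(n: Namespace): String = n.namespace
  }

  // Hypothetical stand-in for a query-builder method taking plain strings.
  def insertInto(keyspace: String, table: String): String =
    s"INSERT INTO $keyspace.$table"

  val ns: Namespace = Namespace("tenant_a")

  // A Namespace is passed where a String is expected;
  // the compiler inserts namespace2String(ns).
  val query: String = insertInto(ns, "borks") // "INSERT INTO tenant_a.borks"
}
```

Placing the conversion in the companion object means callers need no extra import for it to apply.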

Another nice thing was using implicit arguments. The REST API gets the namespace from the URI segment as a parameter to the anonymous function. To call borks.get, I would otherwise have needed to add an implicit val n: Namespace = namespace inside the closure. I avoided that by marking the function argument itself as implicit:

val borks: Borks = ...

pathPrefix("borks" / IntNumber) { id =>
  queryNamespace(id) { implicit namespace: Namespace =>
    get {
      complete {
        borks.get(id).toJson
      }
    }
  }
}

The implicit namespace: Namespace => is equivalent to writing namespace => implicit val n: Namespace = namespace; .... This is very useful if you’re calling methods requiring implicits in closures, though potentially hazardous if you’re not giving your implicits dedicated types! A simpler example:

trait Vyx {
  def frobnicate(num: Int): Int
}

// contrived example, makes no sense
def foo(i: Int)(implicit vyx: Vyx): Int = {
   i * vyx.frobnicate(i)
}

// the closure argument must itself be a Vyx to be usable as the implicit
val vyxes: Seq[Vyx] = ...
val results = vyxes map { implicit v: Vyx => foo(1) }
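For a version that can be run as-is, the sketch below puts the short and the long form side by side. The two anonymous Vyx instances and their arithmetic are made up for the example and carry no meaning beyond giving the code something to compute.

```scala
object ImplicitClosureDemo {
  trait Vyx {
    def frobnicate(num: Int): Int
  }

  def foo(i: Int)(implicit vyx: Vyx): Int = i * vyx.frobnicate(i)

  // Hypothetical Vyx instances, invented for this example.
  val vyxes: Seq[Vyx] = Seq(
    new Vyx { def frobnicate(num: Int): Int = num + 1 },
    new Vyx { def frobnicate(num: Int): Int = num * 10 }
  )

  // Short form: the closure argument is itself the implicit value.
  val shortForm: Seq[Int] = vyxes.map { implicit v: Vyx => foo(2) }

  // Long form: bind the argument, then re-declare it as an implicit val.
  val longForm: Seq[Int] = vyxes.map { v =>
    implicit val vyx: Vyx = v
    foo(2)
  }
}
```

Both forms produce Seq(6, 40): 2 * (2 + 1) for the first instance and 2 * (2 * 10) for the second.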

It’s a good idea to give your implicit values dedicated types, as declaring a parameter implicit x: X will yoink any implicit X in scope. If X happens to be a basic type like String and you’re not careful, you end up with the wrong implicit value.
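The hazard is easy to reproduce. In the following sketch, an unrelated implicit String silently satisfies a String-typed implicit parameter, while the Namespace-typed variant picks up the intended value. The method names, the defaultCharset value and the "tenant_a" keyspace are all assumptions made for the demonstration.

```scala
object TypedImplicitDemo {
  final case class Namespace(namespace: String)

  // Risky: any implicit String in scope satisfies this parameter.
  def tableUntyped(table: String)(implicit namespace: String): String =
    s"$namespace.$table"

  // Safer: only a Namespace can satisfy this parameter.
  def tableTyped(table: String)(implicit namespace: Namespace): String =
    s"${namespace.namespace}.$table"

  def run(): (String, String) = {
    // An unrelated implicit String (hypothetical) that happens to be in scope.
    implicit val defaultCharset: String = "UTF-8"
    implicit val ns: Namespace = Namespace("tenant_a")
    // The first call compiles happily and uses the charset as a keyspace.
    (tableUntyped("borks"), tableTyped("borks"))
  }
}
```

Here run() returns ("UTF-8.borks", "tenant_a.borks"): the untyped version compiles without complaint and queries a keyspace that does not exist.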

Implicits weren’t a new thing to me; this was just a scenario where I was able to benefit simultaneously from several of the kinds of implicits Scala has to offer (parameters, conversions and arguments). They let me perform an annoying refactoring quickly and painlessly, in a manner that was also future-proof.


Antoine Kalmbach