Antoine Kalmbach

Something that often frustrates me in object-oriented code is the prevalence of useless interfaces. Interface isn’t meant literally here: this applies to the traits of Rust and Scala and the protocols of Clojure as well.

The advice to plan for the interface is sound, but people tend to follow it to the extreme. It is not uncommon to see people design a module or class by defining its interface first, with the actual implementation following later.

One eagerly designs an interface as follows:

trait Mediator {
  def validate(r: Request): Boolean
  def postpone(r: Request, d: Duration): Postponed[Request]
  def frobnicate(a: Request, b: Request): Frobnicated[Request]
}

Then the implementation, in a class called MediatorImpl or DefaultMediator, in a separate directory, implements the Mediator interface:

class MediatorImpl extends Mediator {
  def validate(r: Request): Boolean = { ... }
  def postpone(r: Request, d: Duration): Postponed[Request] = { ... }
  def frobnicate(a: Request, b: Request): Frobnicated[Request] = { ... }
}

Dependents of the Mediator trait then naturally get their dependency provided with a constructor argument:

class Resequencer(m: Mediator, c: Clock) {
  def resequence(requests: Seq[Request]): Seq[Postponed[Request]] = 
    requests map { r => m.postpone(r, c.randomDelay()) }
}

val m = new MediatorImpl
val c = new Clock
val resequencer = new Resequencer(m, c)

This pattern is older than the sun, and has been characteristic of modern, inheritance-based object-oriented programming for ages. Fundamentally, it is fine to separate implementation from specification; it goes wrong when the paradigm is overused and the separation becomes superfluous.

This separation is superfluous when it serves no purpose. You could as well call it dependency injection on the toilet. There is no fundamental reason why a class or module like Mediator warrants an interface when it is likely that there will never be an alternative implementation.

At a glance, Guy Steele’s influential plan for growth idea from his “Growing a Language” talk seems to contradict what I just said. Shouldn’t defining an interface help us plan for future, alternative implementations of the Mediator? Perhaps a different kind of Mediator?

Removing the Mediator trait and simply renaming its only implementation will still keep the code working, with the benefit that there are fewer lines of code now, and it isn’t any harder to extend.
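Concretely, the collapsed version is just the implementation wearing the trait’s name; dependents like Resequencer don’t change at all, since they already speak in terms of the Mediator type:

class Mediator {
  def validate(r: Request): Boolean = { ... }
  def postpone(r: Request, d: Duration): Postponed[Request] = { ... }
  def frobnicate(a: Request, b: Request): Frobnicated[Request] = { ... }
}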

This is actually more in line with Steele’s idea, which nowhere says that a trait or interface cannot be distilled from a set of concrete implementations. In other words, when our intuition says to prepare for repetition, we should first identify the actual repetition. The Gang of Four book was never a recipe book for building great programs. It was a catalog! Its authors observed several kinds of large-scale systems and programs and extracted repetitive behaviours from the code: patterns. They never said that to do things right, one ought to use the visitor pattern or some other pattern, otherwise your programs will be bad.

Back to interface distillation. Programming is about getting rid of repetition. The more experienced the programmer, the better they get at noticing patterns of repetition. The downside is that this may also lead to overengineering for repetition.

So, an experienced programmer thinks: this behaviour I have specified may be repeated, so let me first create a construct that lets me share the repetition (an interface), and then proceed with the implementation. This is fine if the actual code is known to be repeated, but by seeing interfaces as a hammer and every bit of code as a nail, you will soon bury yourself in pointless dependency injection scenarios.

It may be just as easy to first create a concrete implementation and, once you must duplicate its behaviour, only then extract the abstraction. You will probably spend less total time wiring up the interface, since you derived it from observed repetition. Creating an abstract implementation first always involves a good deal of speculation, and speculation is unreliable.

The more experienced programmer understands that you don’t always need to plan for repetition. In fact, repetition is sometimes good. Not every piece of shared functionality needs to be extracted into its own module, because sometimes shared dependencies turn into liabilities.

The approach I suggest is to structure your program into small, reusable pieces, so that modularity arises as a side effect. Don’t create huge, monolithic interfaces. Functional programming shows us that dependency injection can be done just by passing a function.

class Foo(postpone: (Request, Duration) => Postponed[Request], delay: () => Duration) {
  def resequence(requests: Seq[Request]): Seq[Postponed[Request]] = 
    requests map { r => postpone(r, delay()) }
}

// ...
val m = new Mediator() // now a concrete class, the trait is gone
val c = new Clock()
val foo = new Foo(m.postpone, c.randomDelay)

One sees that a higher-order function like the above can just as well be represented by a trait with a single method. If you only need that method, depend only on that. With a structural type system it is easy to decompose types; an alternative is to stack traits, which is fairly easy in a language like Scala. You could decompose Mediator into Validator, Postponer, et cetera. Ideally, interfaces should be fairly homogeneous in purpose: if your interface defines methods for reading, keep it separate from the writing interface, and if you need to read and write, compose the two interfaces, and so on.
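For instance, Mediator could be decomposed into single-purpose traits, and dependents would declare only the capability they actually use. A sketch, reusing the types from above:

trait Validator {
  def validate(r: Request): Boolean
}

trait Postponer {
  def postpone(r: Request, d: Duration): Postponed[Request]
}

// Depend only on what you use:
class Resequencer(m: Postponer, c: Clock) { ... }

// Stack the small traits when you need the full surface:
class Mediator extends Validator with Postponer { ... }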

It also helps if your language is powerful enough to do without excessive DI – the reason why horrors like Spring exist is that Java simply wasn’t expressive enough to do dependency injection the traditional way without setting your hair on fire. That, and for some odd reason, people thought writing constructor arguments was so painful it warranted a gigantic framework for it.

Overall, it’s usually a good idea to toy around with the concrete first, and then extract the abstraction. Going the other way around is a dangerous swamp. It’s certainly something I used to do – overengineer for patterns a priori – but I found better results by getting my hands dirty: writing repetitive code first and then cleaning it up.


Scala gets lots of flak for implicits. Some of the feedback is justified: implicits in Scala can be quite intimidating or confusing for beginners. That does not justify dismissing them, as implicits, in all of their flavours, can actually be quite simple and powerful when used correctly.

I recently had to do a refactoring on a large program. The codebase was old and wasn’t designed to cope with the sort of change I was about to introduce, and I didn’t have much time either. Pressed for time, but compelled by a modicum of professional pride, I didn’t want to half-ass the task with jury-rigged solutions that would have left me feeling dirty and empty inside and, at worst, left a rotting mess to future developers—me. The codebase itself was simple, but large. Its task was more or less to serve as a REST API in front of a high-availability, fast database (Cassandra). One part of the program provided abstractions over database tables called collections. Each collection had a set of methods (such as get) that were translated into database queries fetching entities, where an entity is a piece of data encapsulating some value. Here is a fictional and simplified example of a Bork from a database:

case class Bork(id: Int,
                date: ZonedDateTime,
                frobnicate: BigDecimal)

trait Borks {
  // find a bork by id
  def get(by: Int): Future[Option[Bork]]
}

// ... in another place, another module

class BorksImpl extends Borks {
  def get(by: Int): Future[Option[Bork]] = {
      // fetch by id from the database
  }
}

Each collection trait was implemented by a concrete class (like BorksImpl above), hiding database logic behind the implementation. Other parts of the program accessed these entities via the collection trait, like the API front here:

val borks: Borks = ...

// spray dsl
pathNamespace("borks" / IntNumber) { id =>
  get {
    complete {
      borks.get(id) map (_.toJson)
    }
  }
}

The database in question was Cassandra, where this notion of separate databases doesn’t really exist: databases in Cassandra are actually just prefixes called keyspaces that map to physical directories on disk. These keyspaces have some properties that separate one keyspace from another, but the point is that they are unlike traditional SQL databases: you need not connect to a particular database, you can simply redirect a query for the table Foo from keyspace A to keyspace B by switching A.Foo to B.Foo. In other words, keyspaces are opaque, and you choose the appropriate one with the right prefix in the table name part of the query.
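With the Java driver’s query builder, for example, switching keyspaces amounts to changing a single argument; a sketch (the keyspace and table names are made up):

import com.datastax.driver.core.querybuilder.QueryBuilder

// The same table in two keyspaces; only the prefix differs:
val fromA = QueryBuilder.select().from("a", "foo").where(QueryBuilder.eq("id", 123))
val fromB = QueryBuilder.select().from("b", "foo").where(QueryBuilder.eq("id", 123))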

The task was to support multiple, concurrent databases of entries. Previously the program operated as a monolith, i.e. there was only ever one database it operated on. Support was needed for concurrent access to several databases, potentially unbounded in number, and the support had to come quickly.

It turns out the simple solution – instantiate one BorksImpl for each keyspace – was not available, as there could be entities in one shared keyspace mapping to other keyspaces. So a single collection like BorksImpl needed to know which keyspaces it was supposed to query, because this information was unavailable to the caller.

A way around the splitting and namespacing would have been consolidation, but that introduced security problems. We couldn’t simply consolidate all the entries into the same database, as we had access limitations: callers of get acting on keyspace Foo were not allowed to see the data in keyspace Bar. This justified the split by keyspace, isolating data for the purposes of permission control, and it also ruled out the solution above – one BorksImpl per keyspace – because a single BorksImpl might have needed to query data from many keyspaces.

So, when a request with id 123 comes in at /borks/123, the application uses the central lookup table to find the target keyspace. The initial implementation looked like this:

trait Borks {
  // find a bork by id in the given namespace
  def get(namespace: String, by: Int): Future[Option[Bork]]
}

// ... in another place, another module

class BorksImpl extends Borks {
  def get(namespace: String, by: Int): Future[Option[Bork]] = {
      // fetch by id from the given keyspace
  }
}

And update the caller API:

val borks: Borks = ...

pathPrefix("borks" / IntNumber) { id =>
  queryNamespace(id) { namespace =>
    get {
      complete {
        borks.get(namespace, id) map (_.toJson)
      }
    }
  }
}

This was fairly simple, but painful, as the get methods of collections like Borks may call other methods on other collections, nesting calls ever downward, as shown in the example below, where Borks calls barks.get and so forth. As a result, I had to add the namespace: String parameter to every method on every collection. Remember, storing the namespace as a field was not an option – the namespace had to be an extra parameter to every method invocation.

So I was dealing with transforming code that looked like this:

val barksImpl: Barks = ...
def aggregateWithBarks(id: Int, barks: Set[Int]): Future[Seq[Bork]] = {
   val aggregates = get(id) map { b =>
     b map { bork => 
       barks flatMap { bark =>
         barksImpl.get(bark) match {
            ...
         }
       }
     }
   }
   ...
}

and by adding namespace everywhere, I had to transform it into

val barksImpl: Barks = ...
def aggregateWithBarks(namespace: String, id: Int, barks: Set[Int]): Future[Seq[Bork]] = {
   val aggregates = get(namespace, id) map { b =>
     b map { bork => 
       barks flatMap { bark =>
         barksImpl.get(namespace, bark) match {
            ...
         }
       }
     }
   }
   ...
}

So I had to add namespace: String to barks.get and borks.aggregateWithBarks. Sounds tedious? Well, imagine there wasn’t just one call to barksImpl.get but tens, and imagine there weren’t just two collections but a hundred – and tens of thousands of lines to refactor.

Specifically, I didn’t want to keep adding namespace to every method call inside a method call, so I chose to make it implicit instead. This way I only needed to pass the implicit parameter around, and I didn’t need to modify any of the nested method calls. I typed the namespace as a custom case class and added it as an implicit parameter:

case class Namespace(namespace: String)

trait Borks {
  def get(id: Int)(implicit namespace: Namespace) = ...
  def aggregateWithBarks(id: Int, barks: Set[Int])(implicit namespace: Namespace) = ...
}

trait Barks {
  def get(id: Int)(implicit namespace: Namespace) = ...
}
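With that in place, the refactored aggregateWithBarks changes only in its signature; none of the nested calls in the body mention the namespace any more:

def aggregateWithBarks(id: Int, barks: Set[Int])(implicit namespace: Namespace): Future[Seq[Bork]] = {
   val aggregates = get(id) map { b =>      // namespace flows in implicitly,
     b map { bork =>
       barks flatMap { bark =>
         barksImpl.get(bark) match {        // here, too, without being spelled out
            ...
         }
       }
     }
   }
   ...
}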

So, that was one particularly nice use case for implicit parameters: they let me implement a painful refactoring very quickly. The good thing is that if the datastore is later redesigned cleanly, so that one namespace (keyspace) cannot reach another, all you need is to instantiate a BorksImpl per keyspace and set implicit val namespace = ... upon instantiation, and the code will keep working just fine.
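In sketch form, assuming such a redesign (the keyspace name is made up):

implicit val namespace: Namespace = Namespace("tenant_a") // one keyspace per instance
val borks = new BorksImpl
borks.get(123) // the namespace is resolved implicitly; no extra argument in sight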

Naturally, had I had more time, I would’ve done the separation properly: implemented the namespacing rules more cleanly, redesigned the database completely, and so forth. Still, with Scala implicits I was able to build a non-proper solution in a way that did not elicit a “jesus christ what a hack” feeling. It didn’t pollute the code too much, and it will be easy to refactor out when it’s no longer needed.

And, it turned out, I was able to benefit from other kinds of implicits: conversions and implicit closure arguments. I needed to convert the Namespace entity into a String inside the query builder syntax; with an implicit conversion I could simply write namespace where a String was expected, instead of namespace.namespace.

object Namespace {
  implicit def namespace2String(n: Namespace): String = n.namespace
}

session.prepare(QueryBuilder.insertInto(namespace, table).values(...))

Another nice thing was using implicit closure arguments. The REST API gets the namespace from the URI segment as a parameter to an anonymous function. If I called borks.get inside that closure, I would have needed to declare an implicit val n: Namespace = namespace first. I avoided that by making the closure parameter itself implicit:

val borks: Borks = ...

pathPrefix("borks" / IntNumber) { id =>
  queryNamespace(id) { implicit namespace: Namespace =>
    get {
      complete {
        borks.get(id).toJson
      }
    }
  }
}

The implicit namespace: Namespace => is equivalent to writing namespace => implicit val n: Namespace = namespace; .... Very useful if you’re calling methods that require implicits inside closures, though potentially hazardous if you don’t give your implicits specific types! A simpler example:

trait Vyx {
  def frobnicate(num: Int): Int
}

// contrived example, makes no sense
def foo(i: Int)(implicit vyx: Vyx) = {
   i * vyx.frobnicate(i)
}

// given some Vyx instances a, b and c:
val results = Seq(a, b, c) map { implicit v: Vyx => foo(1) }

It’s a good idea to give your implicit values specific types: defining an implicit x: X will yoink any implicit X in scope, and if this X happens to be a basic type like String and you’re not careful, you end up with the wrong implicit value.
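A contrived sketch of that hazard, with made-up names:

// Somewhere far away, an unrelated implicit String happens to be in scope:
implicit val dbHost: String = "db.internal"

// Elsewhere, a bare String implicit parameter will happily yoink it:
def greet(name: String)(implicit greeting: String) = s"$greeting, $name"

greet("world") // "db.internal, world" – compiles, silently wrong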

Implicits weren’t a new thing to me; this was just a scenario where I was able to simultaneously benefit from many of the kinds of implicits Scala has to offer (parameters, conversions and closure arguments). They let me perform an annoying refactoring quickly and painlessly, in a manner that was also future-proof.


Apache Camel is a routing and mediation engine. If that doesn’t say anything to you, let’s try this: Camel lets you connect endpoints together. These endpoints can vary: they can be simple local components, like files, or external services like ActiveMQ or web services. Camel has a common format for your data, so that the data can be protocol agnostic, and an intuitive DSL for specifying the connections and how the data should be processed along the way.

The common format consists of exchanges and messages. These are translated into protocol-specific formats (like an HTTP request) by components, which provide the technical implementation of a given service, i.e., the translation of a plain Message into an actual HTTP request.

Connections are specified with an intuitive DSL that speaks in terms such as from and to. Informally, you can create a route that, for example, reads messages from ActiveMQ and writes them to a file. The language is much richer than this, covering aggregation, filtering, routing, splitting, load balancing; the list goes on.
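For example, the informal route just described is a one-liner inside a RouteBuilder (a sketch; the URIs are illustrative):

from("activemq:queue:incoming").to("file:/var/spool/outbox")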

Choosing which component to instantiate is done using a URI. The URI identifies the target component: e.g., rabbitmq://myserver:1234/... instantiates the RabbitMQ component, file:... the file component, and netty4:... the Netty component (version 4). As long as the component is available on the classpath, Camel will instantiate it in the background. The number of available components is huge! You have, e.g.:

  • ActiveMQ, RabbitMQ, Kafka, AVRO connectors
  • Files and directories
  • REST, SOAP, WSDL, etc.
  • More esoteric ones like SMPP – yes, you can send SMSes with Camel!

So what’s the point? Let’s assume we need to integrate an upstream system Xyz into Bar. Xyz provides data to you in a binary JSON format over some known protocol, like ActiveMQ. Then you need to apply some transformations to the data, finally sending it to Bar, which accepts XML and requires the information to be POSTed to someURL.

In a non-Camel setting, using your favorite language, you would:

  1. Build your queue reader and de-serializer using an ActiveMQ connector
  2. Apply your business logic (whatever that is) to the de-serialized data
  3. Transform into XML
  4. POST the data towards someURL using some HTTP library

Fairly straightforward, right? All you need is an ActiveMQ library, an HTTP library and something that works with JSON and XML.

Here’s where it gets hairy. Three months in, you are informed that the upstream source is converting to RabbitMQ. Oh well, you think, it’s nicer, faster, and implements a saner version of AMQP, why not. So you refactor ActiveMQ to RabbitMQ and there it is.

The point of Camel is this: the previous step requires you to manually refactor your ActiveMQ logic to RabbitMQ. But you’re just sending messages; you don’t really care about the protocol. You’re sending messages to an endpoint, and it’s the data you should care about, nothing else.

So here’s where Apache Camel comes in. It lets you specify a URI like

rabbitmq://localhost/blah?routingKey=Events.XMC.*

to use the RabbitMQ component. To painlessly switch to Kafka, you’d add a dependency on the camel-kafka artifact and specify the URI as

kafka:localhost:9092?topic=test

and the Camel Kafka component handles message delivery for you. Since you’re sending canonical Camel messages, you needn’t trouble yourself with how the message actually gets sent. It is likely that you will have to add or remove some message headers, though.
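Put together, the whole Xyz-to-Bar route might look like the sketch below, written against Camel’s Java API from Scala; transform and toXml are hypothetical stand-ins for the business logic and the XML conversion:

import org.apache.camel.{Exchange, Processor}
import org.apache.camel.builder.RouteBuilder
import org.apache.camel.impl.DefaultCamelContext

val ctx = new DefaultCamelContext()
ctx.addRoutes(new RouteBuilder {
  override def configure(): Unit =
    from("rabbitmq://localhost/blah?routingKey=Events.XMC.*")
      .process(new Processor {
        // apply business logic, then convert JSON to XML (helpers are made up)
        def process(exchange: Exchange): Unit =
          exchange.getIn.setBody(toXml(transform(exchange.getIn.getBody)))
      })
      .to("http://someURL")
})
ctx.start()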

Now, you may be asking, is that it? Is it really that simple?

The answer is that it depends. Some components are better than others. If you want to be truly protocol and component agnostic, and you want to move from protocol Foo to Bar just by switching the URI foo://... to bar://..., you need to make sure that

  1. You can configure everything for that endpoint using the URI
  2. Message exchanges do not require extra shenanigans to work (no custom headers or a special format required)

Case in point: let’s compare switching from ActiveMQ to RabbitMQ. The first glaring difference is that the ActiveMQ component does not accept the host part in the URI, so we need to do something like

CamelContext ctx = new DefaultCamelContext();
ctx.addComponent("activemq", 
    ActiveMQComponent.activeMQComponent("tcp://USER:PASS@HOSTNAME?broker.persistent=false"));

This makes any activemq:... URI in the context ctx connect with the parameters configured above.

Conversely, the RabbitMQ component lets you set all of this directly in the URI (multiple addresses can be given with the addresses parameter). So if you’re going from ActiveMQ to RabbitMQ, your code actually becomes simpler, but the complexity merely moves into the URI. The other way around, you have to move your URI configuration into actual code (or XML, but please, don’t).

So where does this lead us? Ideally, given a choice between several components, you could use an external configuration file that supplies a simple URI; the right component is identified from the URI and pulled from the classpath. This assumes that, in order of importance,

  1. the endpoints are volatile and finite and can vary between different implementations,
  2. each implementation has a Component which is in the classpath, and
  3. the endpoints change often enough to warrant dynamic configurability via configuration edits and app restarts.

If all of the above hold true, Camel might be a good fit for you. Otherwise, I’d be careful: the abstraction isn’t free! What it leads to is a kind of complexity shoveling: although the RabbitMQ component needs no code to configure it, the configuration merely moves into the URI. So it’s still a configuration point, just a nicer one. In the example above, the connection contains three configurable variables: USER, PASS and HOSTNAME. So, in addition to configuring the system in code, we still have to configure it by other means, lest we hard-code the values into the application.

The above approach suffers from decentralization: you now have two places where you customize your system. The first is defining the custom component in code; the second is configuring said component via other means.
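When the component is fully URI-configurable, you can at least centralize by pushing the whole URI into external configuration, so the component choice and its parameters live in one place. A minimal sketch with Typesafe Config, inside a RouteBuilder, with made-up keys:

import com.typesafe.config.ConfigFactory

val conf = ConfigFactory.load()
// application.conf decides both the component and its parameters, e.g.
//   upstream.uri = "rabbitmq://localhost/blah?routingKey=Events.XMC.*"
from(conf.getString("upstream.uri")).to(conf.getString("downstream.uri"))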

Our ability to centralize configuration – any configuration, not just that of Camel – depends on the power of the configuration language. Too powerful, you end up in DSL hell. Not powerful enough, people write their own horror shows to add power.

Lastly, we run into the problem of universal pluggability, or universal composition. We imagine that systems like Camel let us “run anything” and “connect everything”, but the reality is different. Systems are usually made of a finite set of components, and for practical purposes it makes no sense to depend on every Camel component. You have to pick your dependencies from a finite set of known endpoints, which effectively shatters the myth of universal pluggability.

Most importantly, though, nobody really needs this. What really matters is simplicity of extension. A well-designed component is completely configurable through its URI parameters, and such components are easy to add to your Camel-based system: you only need to understand the new configuration, add the dependency, and you’re done.

In summary, if you’re considering Apache Camel, make sure you check both of these, of which the second is the more important:

  1. The components are volatile and you need to change them often, so that you can justify the pluggable hole (the changing URI!)
  2. The components you want exist and are completely configurable via that pluggable hole

If you’re unsure of the first item, you can still treat Camel as a lazy way to future-proof the system, e.g., by using one component now, while knowing that another may be used in the future. To that end, you need to make sure that the components fit the above requirements.

I’m currently working on a Clojure library that provides a Clojure routing DSL for Camel. It’s shaping up to be quite nice! Here’s an example of the routing DSL:

(route (from "netty4-http:localhost:80/foo")
       (process 
         (comp println body in))
       (to "rabbitmq://localhost:5672/foo"))

My goal is to make the DSL terse and functional (which the current model really isn’t) and to add Akka Camel Consumers and Producers to it. The nice thing about Clojure is that the macro system lets me define these really easily!

Overall, Camel is a nice abstraction, well worth the effort and years that have been put into it. It’s not a free abstraction, since there’s always a slight compatibility or configuration overhead. When it works, it removes programmers from the protocol level and moves them to the data level, which is the level you should be working at if your goal is to shuffle data around. For that purpose, Camel is excellent.

Conversely, when it doesn’t work, it puts programmers in an awkward position: you’re still working with both data and protocol, and you have the overhead of the framework to deal with. Worse, your code is now polluted by the requirements of Camel endpoints, when the goal of Camel is to remove the requirements imposed by endpoints altogether.

That said, in integration scenarios, Camel works most of the time, so you should always have a think about it before you start using it.