# Envisioning a new tracing system

I’ve begun designing a new kind of tracing system for distributed systems. By a tracing system I mean something like Zipkin, or, more loosely, a monitoring system like Prometheus: tools that track the flow of information through a distributed system.

I think the tool should offer a complementary view on the system: it’s not going to offer detailed logs or individual traces.

Instead, it’s what I call an overhead monitor, an observatory from above. It doesn’t track individual elements flowing through the system; to put it metaphorically, it lives too high up to see the individual ants in a colony. What it can see is the flow and non-flow of ants, and it can measure their speed.

I’m interested in the big picture. Is information flowing the way it should? How fast is it traveling? Is my context propagated correctly? How much information is there flowing per second, minute, or hour?

The idea is to monitor rates. It would be a bit like a traffic monitor: you could use it to instantly read the amount of information flowing through the system. The flow would be represented as a directed graph.
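To make this concrete, here is a minimal sketch of how such a rate-carrying flow graph might be modelled. All names here (Edge, FlowGraph, observe, ratePerSecond) are hypothetical, invented for illustration:

```scala
// Hypothetical model of the big picture: a directed graph whose edges
// carry counts of observed messages, from which rates can be derived.
case class Edge(from: String, to: String)

case class FlowGraph(counts: Map[Edge, Long] = Map.empty) {
  // record one observed message travelling from one node to another
  def observe(from: String, to: String): FlowGraph = {
    val e = Edge(from, to)
    copy(counts = counts.updated(e, counts.getOrElse(e, 0L) + 1L))
  }

  // average messages per second over an observation window
  def ratePerSecond(from: String, to: String, windowSeconds: Long): Double =
    counts.getOrElse(Edge(from, to), 0L).toDouble / windowSeconds
}
```

An immutable map of counters like this is trivially cheap to snapshot, which matters when the big picture is assembled from multiple observers.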

Earlier last year I sketched a design document, so I won’t go into the details too much; if you’re interested in those, go read it.

So far, I have settled on the following design characteristics:

• All you need to provide is the names of the nodes in the network; the system will figure out the directions of the information flows.
• Optionally, you can define quality attributes, but these require manual configuration. For example, you’d have to state that “95% of requests from A to B should happen within 200 ms”.
• The system is God; it sees all. That is to say, all the events in the network should be fed into the system.
• Because God systems are terrible and prone to failure, the aim is to support distribution, such that you can have multiple observers to which you can partition the events. The big picture will be assembled by combining the observer data.
• Configuration should be easy, with a humane syntax like YAML, and the possibility to override this.
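As a sketch of what checking such a quality attribute could look like, here is a hypothetical LatencyTarget type; the name, fields, and check are all invented for illustration:

```scala
// "95% of requests from A to B should happen within 200 ms",
// encoded as data and checked against observed latencies.
case class LatencyTarget(from: String, to: String,
                         percentile: Double, maxMillis: Long) {
  // true if at least `percentile` of the observed latencies meet the bound
  def satisfiedBy(latenciesMillis: Seq[Long]): Boolean =
    latenciesMillis.isEmpty || {
      val within = latenciesMillis.count(_ <= maxMillis)
      within.toDouble / latenciesMillis.size >= percentile
    }
}
```

Encoding the attribute as plain data keeps it configurable from the outside, e.g. from a YAML file.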

The system doesn’t have to worry about spans; that’s for other systems. All you need to do is propagate context.

I have thought about distributed tracing previously, and I’ve found that many of those questions are still unanswered. Tracing systems like Zipkin do very well with individual elements and span lengths, but they naturally require you to configure the spans in your code.

My aim with the observatory-like approach is to make it extremely simple to track flow rates. All you need is the following:

• Propagate context in the system, like a tracing token coupled with a timestamp
• Aggregate logs to the observatory
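The propagated context can be tiny. A minimal sketch, with hypothetical names (TraceToken, FlowEvent), of what a node would pass along and what an observer would derive from it:

```scala
import java.util.UUID
import java.time.Instant

// all a node has to propagate downstream: a token plus a timestamp
case class TraceToken(id: UUID, emittedAt: Instant)

// what an observer records when the token passes a node
case class FlowEvent(token: TraceToken, node: String, observedAt: Instant) {
  // transit time from emission to this observation point
  def latencyMillis: Long =
    observedAt.toEpochMilli - token.emittedAt.toEpochMilli
}
```

Because the token carries its emission timestamp, the observatory can derive speeds without ever storing spans.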

The second problem is the harder one: if you centralize the observatory, every node in the system logs events to it, and that central system becomes a single point of failure. My idea is to overcome this limitation by decentralizing observation and then using something like a gossip protocol to combine the results.

The system doesn’t need to react immediately (within seconds is probably fine), so the slowness of gossip is not a threat to the design. Observation data is also initially ephemeral; I’d most likely prefer using event sourcing to build the observation model.
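Combining decentralized observations can be straightforward if each observer keeps simple per-edge counters. A sketch, assuming events are partitioned across observers so that plain addition is the correct merge:

```scala
// per-edge message counters held by a single observer
type EdgeCounts = Map[(String, String), Long]

// merge two observers' views by summing their counters;
// associative and commutative, so gossip rounds can apply it in any order
def merge(a: EdgeCounts, b: EdgeCounts): EdgeCounts =
  b.foldLeft(a) { case (acc, (edge, n)) =>
    acc.updated(edge, acc.getOrElse(edge, 0L) + n)
  }
```

A merge that is order-insensitive is exactly what makes the eventual consistency of gossip acceptable here.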

I haven’t chosen the technology yet, but most likely the system will be written in Scala, with the event sourcing based on Kafka.

Now only to find the time necessary to build this.

# Useless interfaces

A feature that often frustrates me in object-oriented code is the prevalence of useless interfaces. Interface isn’t meant literally here: this applies to the traits of Rust and Scala and the protocols of Clojure as well.

The advice to design for the interface is sound, but people tend to follow it to the extreme. It is not uncommon to see people design a module or class by defining its interface first, with the actual implementation following later.

One eagerly designs an interface as follows:

```scala
trait Mediator {
  def validate(r: Request): Boolean
  def postpone(r: Request, d: Duration): Postponed[Request]
  def frobnicate(a: Request, b: Request): Frobnicated[Request]
}
```


Then the implementation, a class called MediatorImpl (or MainMediator, or DefaultMediator) in a separate directory, implements the Mediator interface:

```scala
class MediatorImpl extends Mediator {
  def validate(r: Request): Boolean = { ... }
  def postpone(r: Request, d: Duration): Postponed[Request] = { ... }
  def frobnicate(a: Request, b: Request): Frobnicated[Request] = { ... }
}
```


Dependents of the Mediator trait then naturally get their dependency provided with a constructor argument:

```scala
class Resequencer(m: Mediator, c: Clock) {
  def resequence(requests: Seq[Request]): Seq[Postponed[Request]] =
    requests map { r => m.postpone(r, c.randomDelay()) }
}

val m = new MediatorImpl
val c = new Clock
val r = new Resequencer(m, c)
```


This pattern is older than the sun and has been characteristic of modern, inheritance-based object-oriented programming for ages. Fundamentally, it is fine to separate implementation from specification; it goes wrong when the paradigm is overused, when the separation is superfluous.

This separation is superfluous when it serves no purpose. You could just as well call it dependency injection on the toilet. There is no fundamental reason why a class or module like Mediator warrants an interface when it is likely there will never be an alternative implementation.

At a glance, Guy Steele’s influential “plan for growth” idea from his “Growing a Language” talk seems to contradict what I just said. Shouldn’t defining an interface help us plan for future, alternative implementations of the Mediator? Perhaps a different kind of Mediator?

Removing the Mediator trait and simply renaming its only implementation keeps the code working, with the benefit that there are now fewer lines of code, and it is no harder to extend.

This is actually more in line with Steele’s idea. Nothing says a trait or interface cannot be distilled from a set of basic implementations. In other words, when our intuition tells us to prepare for repetition, we should instead wait and identify the repetition once it occurs. The Gang of Four book was never a recipe book for building great programs. It was a catalog! Its authors observed several kinds of large-scale systems and programs and extracted repetitive behaviours in the code: patterns. They never said that to do things right, one ought to use the visitor pattern, or some other pattern, lest your programs be bad.

Back to interface distillation. Programming is about getting rid of repetition. The more experienced the programmer, the better they get at noticing patterns of repetition. The downside is that this may also lead to overengineering for repetition.

So an experienced programmer thinks: this behaviour I have specified may be repetitive, so let me first create a construct that lets me share the repetition (an interface), and then proceed with the implementation. This is fine when the code is known to be repeated, but if you see interfaces as a hammer and every bit of code as a nail, you will soon bury yourself in pointless dependency injection scenarios.

It may be just as easy to first create a base implementation, and only once you must duplicate its behaviour create the abstraction. You may actually spend less total time wiring up the interface, since you extracted it from observed repetition. Creating an abstract implementation first always involves a good deal of speculation, and speculation is unreliable.

The more experienced programmer understands that you don’t always need to plan for repetition. In fact, repetition is good sometimes. Not every shared functionality needs to be extracted to its own module, because sometimes, shared dependencies will be bad.

The approach I suggest is to structure your program into small, reusable pieces, so that modularity emerges as a side effect. Don’t create huge, monolithic interfaces. Functional programming shows us that dependency injection can be done just by passing a function.

```scala
class Foo(postponer: (Request, Duration) => Postponed[Request], delayer: () => Duration) {
  def resequence(requests: Seq[Request]): Seq[Postponed[Request]] =
    requests map { r => postponer(r, delayer()) }
}

// ...
val m = new MediatorImpl
val c = new Clock
val foo = new Foo(m.postpone, c.randomDelay)
```


A higher-order function like the above could just as well be represented by a trait with a single method. If you only need that method, depend only on that. With a structural type system it is easy to decompose types; an alternative is to stack traits, which in languages like Scala is fairly easy. You could decompose Mediator into Validator, Postponer, et cetera. Ideally, interfaces should be homogeneous in purpose: if your interface defines methods for reading, keep it separate from the writing interface, and if you need both reads and writes, compose the two interfaces together.
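A sketch of that decomposition, with stub Request and Postponed types standing in for the article’s; the names Validator, Postponer, and DefaultMediator are hypothetical:

```scala
import scala.concurrent.duration._

// stub types standing in for the article's domain types
case class Request(id: Int)
case class Postponed[A](value: A, by: Duration)

// Mediator's responsibilities split into small, single-purpose traits
trait Validator {
  def validate(r: Request): Boolean
}
trait Postponer {
  def postpone(r: Request, d: Duration): Postponed[Request]
}

// a component that needs both simply mixes the traits together;
// dependents can still depend on just Validator or just Postponer
class DefaultMediator extends Validator with Postponer {
  def validate(r: Request): Boolean = r.id > 0
  def postpone(r: Request, d: Duration): Postponed[Request] = Postponed(r, d)
}
```

A dependent that only postpones can take a Postponer (or just the function), and never learns that validation exists.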

It also helps if your language is powerful enough to do without excessive DI machinery. The reason why horrors like Spring exist is that Java simply wasn’t expressive enough to do dependency injection the traditional way without setting your hair on fire. That, and for some odd reason, people thought writing constructor arguments was so painful it warranted a gigantic framework.

Overall, it’s usually a good idea to toy around with the concrete first, and then extract the abstraction. Going the other way around is a dangerous swamp. Overengineering for patterns a priori is certainly something I used to do, but I have found better results by getting my hands dirty: writing repetitive code first and cleaning it up afterwards.

# Implicit power

Scala gets a lot of flak for implicits. Some of the feedback is justified: implicits in Scala can be quite intimidating or confusing for beginners. That does not justify dismissing them, as implicits, in all of their flavours, can actually be quite simple and powerful when used correctly.

I recently had to do a refactoring in a large program. The codebase was old and wasn’t designed to cope with the sort of change I was about to introduce, and I didn’t have much time either. Pressed for time, but compelled by a modicum of professional pride, I didn’t want to half-ass the task with jury-rigged solutions that would have left me feeling dirty and empty inside, at worst leaving a rotting mess to future developers, i.e., me.

The codebase itself was simple, but large. Its task was more or less to serve as a REST API in front of a highly available, fast database (Cassandra). One part of the program provided abstractions over database tables, called collections. Each collection had a set of methods (such as get) that were translated into database queries to fetch entities. An entity is a piece of data encapsulating some value. Here is a fictional and simplified example of a Bork from a database:

```scala
case class Bork(id: Int,
                date: ZonedDateTime,
                frobnicate: BigDecimal)

trait Borks {
  // find a bork by id
  def get(by: Int): Future[Option[Bork]]
}

// ... in another place, another module

class BorksImpl extends Borks {
  def get(by: Int): Future[Option[Bork]] = {
    // fetch by id from the database
  }
}
```


Each collection trait was implemented by a concrete class (like BorksImpl), hiding the database logic behind it. Other parts of the program accessed entities via the collection trait, as in the API frontend here:

```scala
val borks: Borks = ...

// spray dsl
pathPrefix("borks" / IntNumber) { id =>
  get {
    complete {
      borks.get(id) map (_.toJson)
    }
  }
}
```


The database in question was Cassandra, where this database abstraction doesn’t really exist: databases in Cassandra are just prefixes called keyspaces that map to physical directories on disk. Keyspaces have some properties that separate one from another, but the point is that they are unlike traditional SQL databases: you need not connect to a particular database; you can switch a query for table Foo from keyspace A to keyspace B simply by changing A.Foo to B.Foo. In Cassandra, then, keyspaces are opaque, and you choose the appropriate keyspace with the right prefix in the table-name part of the query.

The task was to support multiple concurrent databases of entries. Previously, the program had operated as a monolith, i.e., there was only ever one database it operated on. Support was needed for concurrent access to several (possibly unbounded in number) databases, and the support had to come quickly.

It turns out the simple solution, instantiating one BorksImpl per keyspace, was not available, as there could be entities in one shared keyspace mapping to other keyspaces. So a collection like BorksImpl needed to know which keyspaces it was supposed to query, because this information was unavailable to the caller.

A way around the splitting and namespacing would have been consolidation, but this introduced security problems. We couldn’t simply consolidate all the entries into the same database, as we had access limitations: callers of get acting on keyspace Foo were not allowed to see the data in keyspace Bar. This justified a split by keyspace, isolating data for the purposes of permission control, and it ruled out the solution above, one BorksImpl per keyspace, because a single BorksImpl might have needed to query data from many keyspaces.

So, when a request with id 123 comes in at /borks/123, the application uses the central lookup table to find the target keyspace. The initial implementation looked like this:

```scala
trait Borks {
  // find a bork by id
  def get(namespace: String, by: Int): Future[Option[Bork]]
}

// ... in another place, another module

class BorksImpl extends Borks {
  def get(namespace: String, by: Int): Future[Option[Bork]] = {
    // fetch by id from the given keyspace
  }
}
```


And the caller API was updated accordingly:

```scala
val borks: Borks = ...

pathPrefix("borks" / IntNumber) { id =>
  queryNamespace(id) { namespace =>
    get {
      complete {
        borks.get(namespace, id) map (_.toJson)
      }
    }
  }
}
```


This was fairly simple, but painful, as the get methods of collections like Borks may call methods on other collections, nesting calls ever downward, as shown in the example below, where aggregateWithBarks calls barksImpl.get and so forth. As a result, I had to add the namespace: String parameter to every method of every collection. Remember, adding the namespace as a constructor field was not an option; the namespace had to be an extra parameter to every method invocation.

So I was dealing with transforming code that looked like this:

```scala
val barksImpl: Barks = ...

def aggregateWithBarks(id: Int, barks: Set[Int]): Future[Seq[Bork]] = {
  val aggregates = get(id) map { b =>
    b map { bork =>
      barks flatMap { bark =>
        barksImpl.get(bark) match {
          ...
        }
      }
    }
  }
  ...
}
```


and by adding namespace everywhere, I had to transform it into

```scala
val barksImpl: Barks = ...

def aggregateWithBarks(namespace: String, id: Int, barks: Set[Int]): Future[Seq[Bork]] = {
  val aggregates = get(namespace, id) map { b =>
    b map { bork =>
      barks flatMap { bark =>
        barksImpl.get(namespace, bark) match {
          ...
        }
      }
    }
  }
  ...
}
```


So I had to add namespace: String to barks.get and borks.aggregateWithBarks. Sounds tedious? Well, imagine there wasn’t just one call to barksImpl.get but tens of them, and not just two collections but a hundred, with tens of thousands of lines to refactor.

Specifically, I didn’t want to keep adding namespace to every method call inside a method call, so I chose to make it implicit instead. This way, I needed only pass the implicit parameter around, and I didn’t need to modify any of the nested method calls. I typed the namespace with a custom case class and added it as an implicit argument:

```scala
case class Namespace(namespace: String)

trait Borks {
  def get(id: Int)(implicit namespace: Namespace): Future[Option[Bork]]
  def aggregateWithBarks(id: Int, barks: Set[Int])(implicit namespace: Namespace): Future[Seq[Bork]]
}

trait Barks {
  def get(id: Int)(implicit namespace: Namespace): Future[Option[Bark]]
}
```


So, that was one particularly nice use case for implicit parameters. The good thing is that if the datastore is later redesigned cleanly, so that one namespace (keyspace) cannot access another, all you need is to instantiate a BorksImpl and set implicit val namespace = ... upon instantiation, and the code will keep working. Implicit parameters let me implement a painful refactoring very quickly.
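The threading trick can be shown in a self-contained form. A minimal sketch (the method names and query strings are invented for illustration): once the namespace is an implicit parameter, nested calls pick it up without mentioning it.

```scala
case class Namespace(namespace: String)

// innermost call: the only place the namespace is actually used
def innerQuery(id: Int)(implicit ns: Namespace): String =
  s"SELECT * FROM ${ns.namespace}.borks WHERE id = $id"

// outer call: never mentions the namespace, yet it flows through
def outerAggregate(id: Int)(implicit ns: Namespace): String =
  innerQuery(id)

implicit val ns: Namespace = Namespace("tenant_a")
```

Only the signatures change; every call site in between stays untouched, which is exactly what made the refactoring cheap.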

Naturally, had I had more time, I would have done the separation properly: implemented the namespacing rules more cleanly, completely redesigned the database, and so forth. Still, with Scala implicits I was able to produce a stopgap solution that did not elicit a “jesus christ what a hack” feeling. It didn’t pollute the code too much, and it will be easy to refactor out when it’s no longer needed.

It also turned out I could benefit from other kinds of implicits: conversions and implicit closure arguments. I needed to convert the Namespace entity into a String for the query-builder syntax; with an implicit conversion, I needed only to write namespace instead of namespace.namespace.

```scala
object Namespace {
  implicit def namespace2String(n: Namespace): String = n.namespace
}

Session.prepare(QueryBuilder.insertInto(namespace, table).values(...))
```


Another nice thing was implicit arguments in closures. The REST API receives the namespace from the URI segment as a parameter to an anonymous function. Had I called borks.get directly, I would have needed an implicit val n: Namespace = namespace inside the closure. I avoided that by marking the closure argument implicit:

```scala
val borks: Borks = ...

pathPrefix("borks" / IntNumber) { id =>
  queryNamespace(id) { implicit namespace: Namespace =>
    get {
      complete {
        borks.get(id) map (_.toJson)
      }
    }
  }
}
```


The implicit namespace: Namespace => is equivalent to writing namespace => implicit val n: Namespace = namespace; .... This is very useful if you’re calling methods requiring implicits inside closures, though potentially hazardous if you don’t give your implicits distinct types! A simpler example:

```scala
trait Vyx {
  def frobnicate(num: Int): Int
}

// contrived example, makes no sense
def foo(i: Int)(implicit vyx: Vyx) = {
  i * vyx.frobnicate(i)
}

val vyxes: Seq[Vyx] = ...
val results = vyxes map { implicit v: Vyx => foo(1) }
```


It’s a good idea to give your implicit values their own types, as defining an implicit x: X will yoink any implicit X in scope; if this X happens to be a basic type like String and you’re not careful, you end up with the wrong implicit value.
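A sketch of the hazard and its fix (the names greetRaw, greet, and UserName are invented): an implicit String is bait for any implicit String parameter in scope, while a wrapper type keeps resolution unambiguous.

```scala
// hazardous: any implicit String in scope will be picked up here,
// whether or not it was meant as a name
def greetRaw(implicit who: String): String = s"hello, $who"

// safer: the wrapper type can only match deliberately created values
case class UserName(value: String)
def greet(implicit who: UserName): String = s"hello, ${who.value}"

implicit val u: UserName = UserName("alice")
```

With the wrapper in place, an unrelated implicit String elsewhere in scope can no longer leak into greet.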

Implicits weren’t a new thing to me; this was just a scenario where I was able to simultaneously benefit from many of the kinds of implicits Scala has to offer (parameters, conversions, and implicit closure arguments). They let me perform an annoying refactoring quickly and painlessly, in a manner that was also future-proof.