Antoine Kalmbach

I have always used Jekyll on GitHub Pages to host this site. It has been very pleasant to use, since GitHub Pages builds your site automatically. On the other hand, you can only use a limited set of plugins, so if you want a Jekyll extension outside that set, you may be out of luck, as I was recently.

I don’t mind Markdown as an input format, but it is a limited one. Several extensions to Markdown, like CommonMark and kramdown, address its deficiencies, but even those have their limits.

My solution was to convert to AsciiDoc, a much richer text markup language. It has every feature the Markdown extensions offer and more: tables of contents, admonitions, and diagrams! Yes, that includes Graphviz, PlantUML, and even ditaa. I use those tools a lot, but so far I haven’t used them much here.

So the elephant in the room is that AsciiDoc doesn’t work on the default GitHub Pages installation, since it’s not in the plugin whitelist. I had several choices:

  1. Switch to another static site generator, manually upload the published HTML to github

  2. Do the above but use another hosting system, my own VPS, whatever

  3. Stick with GitHub Pages, but build the site yourself and upload it manually — ugh!

I didn’t want to switch site generators, as I like Jekyll: it’s fairly mature and popular, and support is easy to find on the web.

I couldn’t be bothered to switch hosting systems, so I had to find another way around it. I quickly realized that option 3 can be solved by automating the build process. All you’re doing is running bundle exec jekyll build anyway, and uploading _site somewhere! That has got to be easy to automate on Travis.

It turned out Travis has a GitHub Pages deployment plugin, so that part was easy. Hooray, my automation problems were solved!

Well, not so fast.

I wanted the Asciidoctor diagram plugin, but that requires the diagramming binaries to be present: for it to work, I had to have the graphviz, plantuml, and ditaa executables installed on the build machine. And here is where I ran into problems. Travis (as of 2018) runs Ubuntu Trusty (14.04) on its default container platform, and that version of Ubuntu doesn’t have plantuml as a package.

The package is available in the Debian unstable source list, but that no longer works on Ubuntu: Debian unstable switched over to .tar.xz packages, and Trusty’s apt doesn’t understand them.

Unstable builds are unstable. Who knew?

So, the fix was to add sudo: required to the .travis.yml file and add a PPA that contains plantuml. This worked, but because I’m using sudo, my builds run on a virtual machine; without sudo they could run on a container, which builds much faster. Building the site now takes about two minutes. That’s not too bad, but with a container-based build it could be less than 30 seconds. So finally, I ended up with this:

language: ruby
rvm:
  - 2.3.3
sudo: required

# no emails
notifications:
  email: false

# asciidoctor diagram dependencies
before_install:
  - sudo add-apt-repository -y ppa:jasekpetr/trusty-backports
  - sudo add-apt-repository -y ppa:jonathonf/rustlang
  - sudo apt-get -qq update
  - sudo apt-get install rustup
  - rustup update && rustup default stable
  - cargo install svgbob_cli
  - sudo apt-get install -y graphviz ditaa plantuml

# build the site
script:
  - bundle exec jekyll build

# upload to the master branch
deploy:
  provider: pages
  skip-cleanup: true
  github-token: $GITHUB_TOKEN
  local-dir: _site
  target-branch: master
  verbose: true
  on:
    branch: develop

Anyway, after banging my head against the beast that is dependency management, I am now happily running my own custom Jekyll build that is fully automated: I don’t have to do anything except push new content via Git, and there it is.

And I’m using AsciiDoc! Here’s a pretty ditaa picture:

Figure 1. A diagram

Here’s the source I used:

                                          +------------------+
                                          |cEEE              |
                                     +--->|      Nice!       |
                                     |    |                  |
     +-------------------+           |    +------------------+
     |  Art with text?   +-----------+
     +---------+---------+
               ^                             /--------------\
               |                             |cPNK          |
               +---------------------------->|   colors!    |
                                             |              |
                                             \--------------/

Neat, huh? It’s a beautiful day!

I’ve begun designing a new kind of tracing system for distributed systems. By tracing system I mean something like Zipkin or Prometheus: tools that trace the flow of information in distributed systems.

I think the tool should offer a complementary view of the system: it’s not going to offer detailed logs or individual traces.

Instead, it’s something I call an overhead monitor, an observatory from above. It doesn’t track individual elements flowing in the system; to put it metaphorically, it lives too high up to see the individual ants in a colony. What it can see is the flow, and non-flow, of ants, and it can measure their speed.

I’m interested in the big picture. Is information flowing the way it should? How fast is it traveling? Is my context propagated correctly? How much information is there flowing per second, minute, or hour?

The idea is to monitor rate, a bit like a traffic monitor: you could use it to instantly read the amount of information flowing through the system. The flow would be represented as a directed graph.
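As a minimal sketch of that idea (all names here are hypothetical, not from any design document), counting events per directed edge over a time window gives a flow rate for each edge of the graph:

```scala
// Hypothetical sketch: derive per-edge flow rates from raw events.
case class Edge(from: String, to: String)

// One event: some context was observed crossing an edge at a given time.
case class FlowEvent(edge: Edge, timestampMillis: Long)

// Events per edge within a window, expressed as events per second.
def flowRates(events: Seq[FlowEvent], windowMillis: Long): Map[Edge, Double] =
  events
    .groupBy(_.edge)
    .map { case (edge, es) => edge -> es.size * 1000.0 / windowMillis }

val events = Seq(
  FlowEvent(Edge("A", "B"), 0L),
  FlowEvent(Edge("A", "B"), 500L),
  FlowEvent(Edge("B", "C"), 900L)
)

// Over a one-second window: A->B flows at 2 events/s, B->C at 1 event/s.
val rates = flowRates(events, windowMillis = 1000L)
```

A real observatory would of course compute this over sliding windows and decentralized inputs; this only shows the shape of the data.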

Figure 1. Something like this.

Earlier last year I sketched a design document, so I won’t go into too much detail here; if you’re interested in the details, go read it.

So far, I have settled on the following design characteristics:

  • All you need to provide is the names of the nodes in the network; the system will figure out the directions of the information flows.

  • Optionally, you can specify quality attributes, but these require manual configuration. For example, you’d have to state that "95% of requests from A to B should happen within 200 ms".

  • The system is God; it sees all. That is to say, all the events in the network should be fed into the system.

  • Because God systems are terrible and prone to failure, the aim is to support distribution, such that you can have multiple observers to which you can partition the events. The big picture will be assembled by combining the observer data.

  • Configuration should be easy, with a humane syntax like YAML, and the possibility to override this.
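To make the last two points concrete, a configuration in that spirit might look like the following. This syntax is purely illustrative, invented for this sketch, not an existing format:

```yaml
# Illustrative only: node names plus optional quality attributes.
nodes:
  - A
  - B
  - C

quality:
  - from: A
    to: B
    percentile: 95
    within: 200ms
```

The nodes list is all that’s required; the quality section is the optional, manually configured part.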

The system doesn’t have to worry about spans; that’s for other systems. All you need to do is propagate context.

I have thought about distributed tracing previously. I’ve found that many of those questions are still unanswered. Most tracing systems like Zipkin and Prometheus do very well with individual elements and span lengths, but they naturally require you to configure the spans in the code.

My aim with the observatory-like approach is to make it extremely simple to track flow rates. All you need is the following:

  • Propagate context in the system, like a tracing token coupled with a timestamp

  • Aggregate logs to the observatory
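The first item amounts to passing a small token along with every message. A minimal Scala sketch, with made-up names standing in for whatever the real system would use:

```scala
// Hypothetical context token: an id plus the time it entered the system.
case class TraceToken(id: String, bornAtMillis: Long)

// A message carries its payload and the propagated token.
case class Message[A](payload: A, token: TraceToken)

// Any processing step transforms the payload but keeps the token intact,
// so the observatory can later measure how long the context took to travel.
def step[A, B](msg: Message[A])(f: A => B): Message[B] =
  Message(f(msg.payload), msg.token)

val m1 = Message("order-42", TraceToken("t-1", 0L))
val m2 = step(m1)(_.toUpperCase)
// The token survives the hop unchanged.
```

The point is only that propagation is mechanical; no spans need to be declared anywhere.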

The second problem is the harder one: if you centralize the system, every node will just log its events to one place, and that place becomes prone to failure. My idea is to overcome this limitation by decentralizing observation, and then using something like a gossip protocol to combine the results.

It doesn’t need to react immediately (within seconds is probably fine), so the slowness of gossip is not a threat to the design. Observation data is also initially ephemeral; I’d most likely prefer using event sourcing to build the observation model.

I haven’t chosen the technology yet, but most likely it will be written in Scala, with the event sourcing most likely based on Kafka.

Now I only need to find the time to build this.

A feature that often frustrates me in object-oriented code is the prevalence of useless interfaces. "Interface" isn’t meant literally here: this applies equally to Rust and Scala traits and to Clojure protocols.

The advice to plan for the interface is sound, but people tend to take it to the extreme. It is not uncommon to see people design a module or class by defining its interface first, with the actual implementation following later.

One eagerly designs an interface as follows:

trait Mediator {
  def validate(r: Request): Boolean
  def postpone(r: Request, d: Duration): Postponed[Request]
  def frobnicate(a: Request, b: Request): Frobnicated[Request]
}

Then the implementation, a class called MediatorImpl (or MainMediator, or DefaultMediator), often in a separate directory, implements the Mediator interface:

class MediatorImpl extends Mediator {
  def validate(r: Request): Boolean = { ... }
  def postpone(r: Request, d: Duration): Postponed[Request] = { ... }
  def frobnicate(a: Request, b: Request): Frobnicated[Request] = { ... }
}

Dependents of the Mediator trait then naturally get their dependency provided with a constructor argument:

class Resequencer(m: Mediator, c: Clock) {
  def resequence(requests: Seq[Request]): Seq[Postponed[Request]] =
    requests map { r => m.postpone(r, c.randomDelay()) }
}

val m = new MediatorImpl
val c = new Clock
val r = new Resequencer(m, c)

This pattern is older than the sun, and has been a characteristic of modern, inheritance-based object-oriented programming for ages. Fundamentally, it is fine to separate implementation from specification; where it goes wrong is when the paradigm is overused, or when the separation is superfluous.

This separation is superfluous when it serves no purpose; you could as well call it dependency injection on the toilet. There is no fundamental reason why a class or module like Mediator warrants an interface when it is likely there will never be an alternative implementation.

At a glance, Guy Steele’s influential plan for growth idea from his “Growing a Language” talk seems to contradict what I just said. Shouldn’t defining an interface help us plan for future, alternative implementations of the Mediator? Perhaps a different kind of a Mediator?

Removing the Mediator trait and simply renaming its only implementation keeps the code working, with the benefit that there are now fewer lines of code, and it isn’t any harder to extend.
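Concretely, the trait simply disappears and the one concrete class remains. A sketch of the collapsed version, with stub types and placeholder logic standing in for the elided bodies above:

```scala
// Stub types standing in for the real ones in the example.
case class Request(id: Int)
case class Postponed[A](value: A, delayMillis: Long)

// No trait: the single concrete class *is* the interface.
// Dependents take a Mediator directly.
class Mediator {
  def validate(r: Request): Boolean = r.id > 0
  def postpone(r: Request, delayMillis: Long): Postponed[Request] =
    Postponed(r, delayMillis)
}

val m = new Mediator
val p = m.postpone(Request(1), 100L)
```

If an alternative implementation ever does appear, extracting a trait from this class at that point is a mechanical refactoring.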

This is actually more in line with Steele’s idea: nothing says a trait or interface cannot be distilled from a set of concrete implementations. In other words, when our intuition tells us to prepare for repetition, we should first identify actual repetition. The Gang of Four book was never a recipe book for building great programs. It was a catalog! Its authors observed several kinds of large-scale systems and programs and extracted repetitive behaviours from the code: patterns. They never said that to do things right one ought to use the visitor pattern, or some other pattern, lest your programs be bad.

Back to interface distillation. Programming is largely about getting rid of repetition, and the more experienced the programmer, the better they are at noticing patterns of repetition. The downside is that this can also lead to overengineering for repetition.

So, an experienced programmer thinks: this behaviour I have specified may be repetitive, so let me first create a construct that lets me share the repetition (an interface), and then proceed with the implementation. This is fine if the code is actually known to repeat, but if you see interfaces as a hammer and every bit of code as a nail, you will soon bury yourself in pointless dependency injection scenarios.

It may be just as easy to first write a concrete implementation and only create the abstraction once you actually must duplicate its behaviour. You might even spend less total time wiring up the interface, since you do it only after observing real repetition. Creating the abstract implementation first always involves a good deal of speculation, and speculation is not reliable.

The more experienced programmer understands that you don’t always need to plan for repetition. In fact, repetition is sometimes good: not every piece of shared functionality needs to be extracted into its own module, because sometimes shared dependencies are bad.

The approach I suggest is to structure your program into small, reusable pieces, so that modularity emerges as a side effect. Don’t create huge, monolithic interfaces. Functional programming shows us that dependency injection can be done just by passing a function:

class Foo(postpone: (Request, Duration) => Postponed[Request], delay: () => Duration) {
  def resequence(requests: Seq[Request]): Seq[Postponed[Request]] =
    requests map { r => postpone(r, delay()) }
}

// ...
val m = new MediatorImpl
val c = new Clock
val foo = new Foo(m.postpone, c.randomDelay)

One sees that a higher-order function like the above can just as well be represented by a trait with a single method. If you only need that method, you should depend only on that. With a structural type system it is easy to decompose types; an alternative is to stack traits, which in languages like Scala is fairly easy. You could just as well decompose Mediator into Validator, Postponer, et cetera. Ideally, interfaces should be fairly homogeneous in their purpose: if your interface defines methods for reading, keep it separate from the writing interface, and if you need both reading and writing, compose the two interfaces together.
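A sketch of that decomposition, again with stub types and only illustrative method bodies:

```scala
// Stub types standing in for the real ones in the example.
case class Request(id: Int)
case class Postponed[A](value: A, delayMillis: Long)

// Small, single-purpose traits...
trait Validator { def validate(r: Request): Boolean }
trait Postponer { def postpone(r: Request, delayMillis: Long): Postponed[Request] }

// ...stacked only where something really provides both capabilities.
class Mediator extends Validator with Postponer {
  def validate(r: Request): Boolean = r.id > 0
  def postpone(r: Request, delayMillis: Long): Postponed[Request] =
    Postponed(r, delayMillis)
}

// This dependent only postpones, so it only asks for a Postponer.
class Resequencer(p: Postponer) {
  def resequence(rs: Seq[Request]): Seq[Postponed[Request]] =
    rs map { r => p.postpone(r, 100L) }
}

val res = new Resequencer(new Mediator)
```

Resequencer can now be tested with any two-line Postponer stub; it never learns that validation even exists.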

It also helps if your language is powerful enough to do without excessive DI. The reason horrors like Spring exist is that Java simply wasn’t expressive enough to do dependency injection the traditional way without setting your hair on fire. That, and for some odd reason, people thought writing constructor arguments was so painful it warranted a gigantic framework.

Overall, it’s usually a good idea to toy around with the concrete first, and then extract the abstraction; going the other way around is a dangerous swamp. Overengineering for patterns a priori is certainly something I used to do, but I’ve found better results by getting my hands dirty: writing repetitive code first and then cleaning it up.