# Software customization is really hard

I have been thinking long and hard about how to do software customization properly at a software architecture level. I don’t want to claim that I have figured things out, but I offer what I think is a reasonably abstract solution, if that makes sense. I will detail my findings below, but first let’s start by understanding the problem.

The primary goal of software customization is to satisfy the needs of its users. This means implementing their requirements into the product. The secondary goal is to do this efficiently. Needless repetition is what characterizes a bad software product line. Figure 1 illustrates this problem: two teams are building a similar twice from scratch.

Customization is, at heart, modification. By modifying a certain basic version of a product, one creates a new product that behaves differently, works differently, looks different---it’s not the same product anymore.

The extent and size of modifications defines the difficulty of customization. A product can be easily customizable for different reasons. The modifications required to build a custom product might be small. Or there might be a lot of them, but they are easy to implement. It’s not always a game of numbers.

When creating a custom product, we look back to what we are customizing. The base product that requires modification is a baseline product. This baseline product establishes the reference implementation from which we create custom products. The modifications, when split into discrete parts, are customizations. Together, a customized reference is a custom version or a custom product.

Figure 1. Two custom versions of the same product, differing in shape and color.

From a product management perspective, the reference is a product. Yet, from a business perspective, nobody sells a reference product. Customers always require some modifications, and won’t accept an unmodified reference implementation. So what is the business value of the reference? Is there any value in keeping it alive? Why not always do custom versions from scratch?

The answer is scale. It depends. If the deliverable units are small, then it might actually be easier to do complete customization. This approach will eventually prove unscalable: there comes a limit where it is more advantageous to start sharing technology by establishing a reference implementation. Figure 2 shows a situation where there are parallel teams building the same product in isolation.

Figure 2. Repetition can indicate the need for a reference product.

Sometimes it is wise to go fully custom, even with a lot teams. If the teams are small and building different things, it is natural, since the things they build are not related. There can always be some technology sharing in such a scenario. If, however, the units are building the same thing with slight differences, breaking the development process into a reference implementation makes sense. This is often the case in product development, so it is also the case in which I will focus here. Let’s ignore the cases where the products are different, like in Figure 3.

From an engineering perspective, the biggest challenge in customization is how to keep the reference clean. Nobody wants to have customization artifacts, the customizations themselves, sticking out jaggedly in the code. It is much more advantageous to have a plug-in approach, whereby the architecture isolates customizable units in such a way that it does not make the product inflexible. If the code looks like it’s full of switches and toggles for customizationz the code can be really hard to reason about and maintain.

On the other hand, customization has to be efficient. Most of the effort in the product development process should always be in the reference product. The customizations come later, they are modifications, they are not a new product. If a product team spends more time in producing the user-facing custom versions than the actual reference product, this says that there are a lot of issues with the architecture.

Figure 3. Establishing a reference can be hard if product instances aren’t alike.

## On palimpsests

When I think of customization gone wrong, I think of the word palimpsest. Palimpsest derives from Ancient Greek palímpsēstos, meaning "again scraped". Palimpsests were manuscript pages created by erasing them so that they could be reused for writing new documents. Palimpsests were feasible when the raw material for parchment was expensive. It was more economical to erase and reuse a piece of parchment than getting new parchment.

A palimpsest is eerily similar to a failed software customization process. In a failed customization, the reference is written over to create a custom product. This is not efficient. If the reference product is developed in parallel by a separate team, every time they introduce a new feature, the customizing team is impacted. They must either figure out how to bring this new feature in the custom product, or worse, remove it entirely. Either way, this is negative work: work is done to undo work done by others.

This inefficiency will eventually sink the teams. For every new feature potentially a new anti-feature is made. In an ideal development process, reference features flow seamlessly in the custom versions. It does not create conflicts.

Figure 4. Overwritten custom versions of a reference product (gray).

Overwriting is particularly prominent in version control customization. Customization using version control, also known as branching or forking, is antithetical to the idea of proper customization. I find that most branching-based customization is just overwriting existing functionality, as seen in Figure 4.

What is more, people don’t want to do branching customization but they’re often forced to, because the software wasn’t designed for customization from the get-go. The is the heart of the issue. it is important to design for customization from the beginning. You cannot add it later.

It’s now obvious that the product should be designed to be extended so that no part is is overwritten. Parts can be replaced, added, removed or altered. The customization process should feel like drawing on an outline, instead of using an eraser to blank a canvas to draw something new.

## Efficiency is paramount

So the goal is to be able to maintain and develop a reference product and the custom versions. This process should be as efficient as possible. There should be zero rewriting. Most of the time should be spent on installing the customizations. This part of the customization I call wiring.

There are many ways to do wiring. One of the easiest, and probably the least complicated solution, is to use plain if statements to toggle custom logic. The switches are turned on or off using run-time configuration. The application reads the configuration at run-time and determines its behavior based on the configuration.

At this point I have to make a distinction between configuration and application. It is easy to overlook but the detail is important. Configuration is something that’s metadata which governs the behavior of an application, the application itself is an implementation of the behavior. I won’t specify any particular medium for configuration, it can be text files, databases, a remote server. The important part is that it is somehow structured and human-readable, and that it can alter the application behavior.

Configuration alone isn’t going to solve the problem. Programming the customizations into the reference product and selecting them at run-time makes the product bigger. A larger product is harder to maintain.

Figure 5. Custom versions (A, B, C) are just subsets of a large reference product.

So what is it then? By now it is apparent that customizations should not overwrite existing functionality, the reference product should lend itself to extension. On the other hand, piling customizations together and selecting a subset of them to create a custom version can make the product really big. What if there are five different implementations of the same reference functionality, and this is true for five other features? Now you have 25 permutations to choose from! Figure 5 illustrates a big reference product. Customs versions originate from subsets of a large reference.

Different languages give different tools for doing customization tricks. Perhaps the most basic one of them after branching logic—​if’s and else’s—​is trying to use inheritance from object oriented programming.

You really don’t want to do customization using inheritance.

## Inheritance is not a solution

Another troublemaker is to use inheritance from object-oriented programming to do customization. This is extremely dangerous because it doing so tends to break the Liskov substitution principle. With inheritance, customization is created by overriding behavior in a reference class. While this makes sense for simple behavioral subtyping scenarios, where an abstract entity is implemented using inheritance, the customization approach tends to inherit the implementation, providing the custom implementation.

This is particularly harmful because the Liskov substitution principle asserts that

Let $$q(x)$$ be a property provable about objects $$x$$ of type $$T$$. Then $$q(y)$$ should be provable for objects $$y$$ of type $$S$$, where $$S$$ is subtype of $$T$$.[1]

To paraphrase Wikipedia, this means that objects of type $$T$$ should be replaceable by objects of type $$S$$, without altering the behavior of the program. In the principle any $$S$$ behaves the same way as any $$T$$. Substituting one with the other has no overall effect on the program.

This is where the principle collides with inheritance-based customization. The whole point of customization is to alter program behavior, using inheritance to do customization decidedly violates the substitution principle!

Of course it is possible to ignore the principle, but to me, it is a valuable property of any object-oriented design. By obeying the principle, we gain composability, since we can replace any $$T$$ with a $$S$$, and we can expect the same invariants to hold. To me, behavioral subtyping is the only principle of object-oriented programming that makes sense and is useful.

## Plug-ins are not a panacea

Let’s address the elephant in the room. By now, astute readers might have guessed that the we should be using modules and build a plug-in architecture to get easy customization.

A plug-in architecture is obviously a solution to customization. The process is as follows. We take the core product and inspect it and determine parts that are customizable. We then build the product in such a fashion that swapping out these parts is easy. Each part has alternatives, at least one.

In engineering lingo, these parts are modules, and a product engineered like this is a modular product. The idea is to have a mechanism that can support different implementations of the same thing, built in such a way that the changing of implementations is easy.

To create a customized version, we take the core product and choose our set of parts. A custom version, voilà ! Now the customization process becomes a part-picking experience, by taking features off the shelf.

The reality is somewhat darker than this. By emphasizing somewhat I mean a lot darker than this. The preceding paragraphs described the ideal scenario of a modular architecture.

Figure 6. Emmental cheese.

Building modularity properly is tremendously difficult. You not only have to plan for the known use cases—​the custom scenarios—​you also have to plan for the unknown use cases. If your universal interface stops working because you didn’t consider a case where the customization explicitly requires non-universality, tough shit! Maybe you didn’t enforce the Liskov substitution principle, and your messaging system was co-opted into a customer profiling engine, and then the GDPR kicked in, and now your data protection officer wants a word with you!

## A strong reference

A rather typical nightmare scenario is that the reference is like a block of Emmental, only the holes are too big, or there are too many of them. This is usually a symptom of insufficient reference engineering, that is, the reference is not given the attention it deserves. This is the thin reference scenario. In the thin reference scenario, the reference is not a viable product, because the customizations, not the reference itself, received the brunt of engineering focus.

It is often the case that the reference product is never a viable product, but it should be viable enough. The reference needs to be concrete enough to build a model of what the application is. Figure 7 illustrates a modular architecture where most of the implementation is in the modules. While this approach can be viable, if the modules lack strong defaults, it might be hard to say what the reference does.

Figure 7. An extremely modular architecture.

If the reference implementations of the modules are poorly done or unusuable, it will be hard to say what the reference product does. This makes customization difficult, since the only actual product instances are the customized ones. This creates an awkward situation where the reference serves no purpose but to act as a template for customizations, but the reference isn’t a template!

A strong reference product is also useful for quality purposes. If any module has a reference implementation, the custom implementation can be verified against the reference implementation. If the reference implementation doesn’t exist, one must implement new quality checks for the custom implementation.

Having a strong reference will prove problematic when the reference is extremely modular. A modular architecture enables customization. A modular architecture isn’t a goal in itself. The problem with an extremely modular architecture is you now need to maintain a reference product. That can get onerous if the amount of modularity is large, because now every customizable module has to be built and validated twice.

## Reintegration

Organizing the customization into a smaller set of modules makes maintaining the modular architecture easier. If we rearrange the modules of Figure 7 and group them together as a customization layer, we get something like in Figure 8. The idea is to organize the architecture into the static, non-customizable parts into a separate unit, and the customizable part as the customizable unit.

Figure 8. Visualizing the customizations as extensions on top of a base layer. This is most likely not how the customizations are organized concretely.

In Figure 8 we see that the area marked Default is the reference implementation of the customization part. The architecture is now easier to understand from this picture. Customizable plugin belong to the customizable layer and the non-customizable parts are in the static layer.

Figure 9. Identifying a reusable part.

It is a question of architectural taste how big the customizable area of the product should be. Some applications like Eclipse are completely[2] modular. The inverse of a completely modular architecture is an non-modular architecture. By now it is clear that a non-modular architecture is not good for customization. On the other hand, when working with a totally modular architecture, if the development team is willing to put with maintaining a strong reference product and separate custom versions, a totally modular architecture might be fine.

Figure 10. Static reintegration. X is made a non-custom part of the reference.

Sometimes customizations can be seen as reusable assets that should exist in all versions, in the reference product. This is the reintegration process where custom features are made a part of the reference product. There are two approaches to reintegration. Once the reusable part is identified in a customization (see Figure 9), we can choose whether it should be a global customization. A global customization means that this is a customization point in every version. So the feature is made a module in the reference and custom versions.

What if the feature is not seen as a customization, just as a feature that should be made a static part of the reference? In this case we make the feature a non-customizable part of the reference product. This is the static extension process: a custom asset, from a custom version, is made part of the reference product. This happens when the customization is not really a customizable thing, it’s something every instance of the product benefits from.

Figure 11. Static reintegration. X is made a non-custom part of the reference.

The problem with reintegration is the architecture might not allow to do any of this. The extraction of the reusable part might be impossible (Figure 9) because the feature too tightly coupled to the customized product. It might also be impossible to bring new features into the reference product because it wasn’t built to support static extension. This forces the hand of the design to try the global customization approach.

Any of the aforementioned scenarios are time-consuming and risky solutions to an underlying design failure. These scenarios are symptoms. They are artifacts of a design process gone wrong. The architectural design of the product has failed.

Perhaps the most important part of customization is to design for it, to anticipate it. But that’s the hardest part of all!

## No easy wins

The unfortunate truth is that you can always prepare for customization but you can never prepare for it perfectly. Either one is too prepared with an over-engineered product or one is not prepared enough. These are the usual judgments laid a posteriori of a customization scenario.

I have observed that we are just as likely to over-engineer than to under-engineer. This factoid is based on my idea that people tend to place too much emphasis on the things they do know and too little emphasis on the things that they don’t know, and these estimation errors tend to be usually equal in measure. The things that we do know characterize our design with a vision of "holes" or "modules", the actual customization points, and the things that we don’t know add new places for these modules.

Each step towards implementing a requirement, a product feature, always creates an inflexibility of sorts. After all, a product is the sum of the features. To create a customizable feature, one must imagine the product with the feature removed or significantly altered. Omit that step, and you will have a difficulty customizing it!

But this step can be taken to extremes. Exercising caution when planning for customization is necessary, because over-engineering a product for customizations delays the time to market. Creating a minimum viable product will take significantly longer by planning too much for customization. The flip side is that under-engineering makes customization difficult, because you’re forced to take the product apart and redesign for customization.

It varies on a case-by-case basis which one takes the least time or other resources. If you don’t over-engineer too much, your investment might pay off in the end, since adding new features will be easy. Conversely, going over the top might have made the product too expensive.

I have observed some heuristical approaches towards finding a good synthesis. One of them is a rule of thumb to never build customization on the first iteration. Then on the second instance, when the customization becomes necessary, customizability is added. I think this is a very Brutalist approach, but it is feasible, if the process is done correctly. It eliminates the risk for over-engineering customization, but it creates a need for effort when the customization is necessary. It is obvious that this heuristic is only feasible when the planned customization requirements aren’t certain. If they’re certain, this approach is harder to justify.

By now I would say that the question of how much engineering should be done towards customization depends on the following factors:

1. The size and scope of the product itself. What is the product, what does it do?

2. The size and scope of the customizations. What can be customized? How hard is it do a customization?

3. The foreseeable necessity of customizations. How is going to be customized, if at all?

And therefore it’s necessary to evaluate all three carefully before choosing the right amount of customization. But there’s no universal heuristic. You always aim too high or too low. This may sound a little fatalistic, but I think it’s possible to improve the accuracy of this process as one learns to estimate the above points.

## Towards a software customization framework

Programming languages in all their variety offer tons of techniques for building customizable software. From extensible classes to monkey patching to type classes and run-time class loading, there are many tools out there. I think the programming part of building customizability is just one part of the process, and while it’s an important part, it’s not the only part, as you have probably read now.

As I stated at the start, I’m not interested in offering an end-it-all solution or programming technique to tackle the issue of software customization. What I’m interested in is building a framework, in the methodology sense, towards doing customization. This post is just the beginning.

That said, I’m not going to just write a post that is basically just a brain dump of the things that I find difficult on the topic. I said in the beginning that I have developed some rough ideas on how to address the issue of customization. This lays the groundwork on what the framework I mentioned above is going to solve.

A strong reference implementation is necessary

When building a product that is going to have multiple custom instances, it is important to have some sort of a reference to which the instances can be compared. Not only this makes quality assurance easier, because you can validate the instance against the reference, it also makes it easier to separate what is custom and what isn’t. The reference product is the product itself, the custom instances are just extensions of it. Furthermore, custom instances can produce features that are desirable in the core product or other customizations, and the reference product is a channel for adding new features to the product.

Identifying the necessity of customization

It is important to known in advance whether the product is going to have custom instances or not. This makes it easier to know which parts will require customization and which parts are more static. That said, it is often difficult to anticipate what parts will require customization. The architecture should be flexible enough to permit adding customizability as easily as possible. The architecture should also understand that some requirements may be missing entirely, so in one sense, the architecture must always anticipate some customization or redesigning.

The quantity of architectural design

In anticipating the customizations, it is just as easy to over-engineer for every customization as it is to under-engineer for no customization. It can be said that for every known requirement for customization another unknown customization requirement is going to materialize eventually. Creating an overly composable, supremely modular architecture is not a good idea; conversely creating a rigid and static architecture is equally a bad idea. In my view, there are no universal heuristics on just how much is necessary. It requires judgment on a case-by-case basis, especially by taking the previous two points into account.

## Conclusion

This post became way longer than I originally intended to. I originally wanted to present a particular customization technique using the tricks of one particular programming language and programming environment, but as it grew larger I wanted to give the topic a broader treatment. But this barely even scratches the surface. I have not even spoken on how to do customization in practice at all.

I suppose at this point this post is a beginning in a series of longer ones, treating the individual elements of the previous list in a more profound manner. This is a very large topic in general, as it broaches fields from software architecture to product management to requirements engineering to actual programming.

That is not to say I plan to devote the whole blog towards software product engineering, let alone write a book about it (though that would be interesting), but as I deal with these topics daily in my job, it’s a very interesting topic to me. So interested readers can possibly expect more of the subject!

These ideas are just materializations of my recurring thoughts while designing software products, and most of them are not new. A step in any direction is going to lead one towards what is known as software product line engineering, that is, recognizing that the creation of software products can be understood using product lines.

Fundamentally, the art of customization is about software reuse. For a deeper introduction to the idea of reuse I recommend the book Software Reuse: Architecture, Process and Organization for Business Success by Ivar Jacobson, Martin Griss and Patrik Jonsson (Addison-Wesley, 1997). From this is is easy to make the transition to the idea of software product lines. A good introduction to the subject are the books Software Product Lines: Practices and Patterns by Klaus Pohl, Günter Böckle and Frank J. van der Linden (Springer, 2005) and Software Product Line Engineering: Foundations, Principles and Techniques by Paul Clements and Linda Northrop (Addison-Wesley, 2002).

1. Liskov substitution principle. On Wikipedia, retrieved 7th April 2018.
2. Developing Eclipse plug-ins, retrieved 11th April 2018.

# The Joy of AsciiDoc

I have always used Jekyll on Github Pages to host this site. It’s always been very pleasant to use, since Github Pages builds your site automatically. On the other hand, you can only use a limited set of plugins, so if you want a Jekyll extension, you might be out of luck, as I was recently.

I don’t mind Markdown as an input format, but it’s a very limited format. So there are several extensions to Markdown like CommonMark and kramdown addressing the deficiencies of Markdown. Even those extensions have limits.

My solution was to convert to AsciiDoc — a much richer text markup language. It contains every feature Markdown extensions have and more. Tables of contents, admonitions, and diagrams! Yes, that includes Graphviz, PlantUML and even Ditaa. I use those tools a lot, but so far haven’t used them that much here.

So the elephant in the room is that AsciiDoc doesn’t work on the default Github Pages installation since it’s not in the plugin whitelist. I had several choices:

1. Switch to another static site generator, manually upload the published HTML to github

2. Do the above but use another hosting system, my own VPS, whatever

3. Stick with Github pages but build yourself and upload manually — ugh!

I didn’t want to switch site generators as I like Jekyll. Support is easy to get from the web and it’s fairly mature and popular.

I couldn’t be bothered to switch hosting systems, so I was to find another way around it. I instantly realized that option C can be solved by automating the build process. All you’re doing is running bundle exec jekyll build anyway, and uploading _site somewhere! This has got to be easy to automate on Travis.

Turned out Travis did have a Githug Pages deployment plugin, so that was easy. Hooray, my automation problems were solved!

Well, not so fast.

I wanted the Asciidoctor diagram plugin but that required the presence of the diagramming binaries, so in order for it to work I had to have the graphviz, plantuml and ditaa executables installed on the machine. And here is where I ran into problems. Travis (as of 2018) runs Ubuntu Trusty (14.04) on the default container platform, and this version of Ubuntu doesn’t have plantuml as a package.

That package is available in the unstable Debian source list, but that doesn’t work on Ubuntu anymore given Debian unstable switched over to using .tar.xz and the apt packager of Ubuntu doesn’t understand that.

Unstable builds are unstable. Who knew?

So, the fix was to add sudo: required to the .travis.yml file, and add a PPA that contains plantuml. This worked, but because I’m using sudo, my builds run on a virtual machine. Without sudo I could run on a container which build really fast. So building the site takes now about two minutes. That’s not too bad, but with a container-based build that could be less than 30 seconds. So finally, I ended up with this:

language: ruby
rvm:
- 2.3.3
sudo: required

# no emails
email: false

# asciidoctor diagram dependencies
before_install:
- sudo apt-get -qq update
- sudo apt-get install -y graphviz ditaa plantuml

# build the site
script:
- bundle exec jekyll build

# upload to the master branch
deploy:
provider: pages
skip-cleanup: true
github-token: \$GITHUB_TOKEN
local-dir: _site
target-branch: master
verbose: true
on:
branch: develop

Anyway, after banging my head against the beast that is dependency management, I am now happily running my own custom Jekyll build that is fully automated: I don’t have to do anything except push new content via Git, and there it is.

And I’m using AsciiDoc! Here’s a pretty ditaa picture:

Figure 1. A diagram

Here’s the source I used:

[ditaa]
....
+------------------+
|cEEE              |
+--->|      Nice!       |
|    |                  |
+-------------------+           |    +------------------+
|  Art with text?   +-----------+
+---------+---------+
^                             /--------------\
|                             |cPNK          |
+---------------------------->|   colors!    |
|              |
\--------------/
....

Neat, huh? It’s a beautiful day!

# Envisioning a new tracing system

I’ve begun designing a new kind of tracing system for distributed systems. With tracing system I mean something like Zipkin or Prometheus: they trace the flow of information in distributed systems.

I think the tool should offer a complementary view on the system, it’s not going to offer detailed logs or individual traces.

Instead, it’s something what I call an overhead monitor, an observatory from above. It doesn’t track individual elements flowing in the system, to put it metaphorically, it lives too high up in order to see individual ants in a colony. What it can see is the flow and not-flow of ants, and measure their speed.

I’m interested in the big picture. Is information flowing the way it should? How fast is it traveling? Is my context propagated correctly? How much information is there flowing per second, minute, or hour?

The idea is to monitor rate. It would be a bit like a traffic monitor. What you could use it for is to instantly read the amount of information flow in the system. The flow would be represented as a directed graph.

Figure 1. Something like this.

Earlier last year I sketched a design document, so I won’t go into the details too much, if you’re interested in those, go read it.

So far, I have figured the following design characteristics:

• All you need is to provide the names of nodes in the network, and the system will figure out the directions of the information flows.

• It is optionally possible to design some quality attributes but these require manual configuration. For example, you’d have to state that "95% of requests from A to B should happen within 200 ms".

• The system is God; it sees all. That is to say, all the events in the network should be fed into the system.

• Because God systems are terrible and prone to failure, the aim is to support distribution, such that you can have multiple observers to which you can partition the events. The big picture will be assembled by combining the observer data.

• Configuration should be easy, with a humane syntax like YAML, and the possibility to override this.

The system doesn’t have to worry about spans, that’s for other systems. All you need to do is propagate context.

I have thought about distributed tracing previously. I’ve found that many of those questions are still unanswered. Most tracing systems like Zipkin and Prometheus do very well with individual elements and span lengths, but they naturally require you to configure the spans in the code.

My aim with the observatory-like approach is to make it extremely simple to track flow rates. All you need is the following:

• Propagate context in the system, like a tracing token coupled with a timestamp

• Aggregate logs to the observatory

The second problem is the harder one: if you centralize the system, every node in the system will just log events there, and the system is prone to failure. My idea is to overcome this limitation by decentralizing observation, and then using something like the Gossip protocol to combine the results.

It doesn’t need to react immediately (probably within seconds), so the slowness of Gossip is not a threat to the design. Observation data is also initially ephemeral, I’d most likely prefer using event sourcing to build the observation model.

I haven’t chosen the technology yet, but most likely it will be written in Scala and then event sourcing will most likely based on Kafka.

Now only to find the time necessary to build this.