Introduction¶
What is it?¶
Observatory is a distributed tracing system that can be used to monitor traffic latencies between networked services.
Observatory is similar to Zipkin or Kamon.
Observatory aggregates tracing tokens received from various inputs. These tracing tokens are used to render an image as seen below.
Why would I use it?¶
They say a picture is equal to a thousand words. Observatory displays the status of a distributed system as a visual graph. Observatory can be a simple auto-refreshing web-page that looks like this:
The system in it has four nodes: web-server
, a root node, event-processor
, journal
, and
database
. For every web-server
request, we expect event-processor
to have reacted within
10 seconds. None of the observed requests have failed (i.e. taken more than 10s), so the status is
marked as OK. We’ve defined a window of 3 requests out of which three must succeed in order to mark
as OK. Since warn=0
and fail=0
should any of the requests fail the edge will automatically
be marked as NOK
(fail trumps warn).
For a event-processor
→ database
message, we expect it to react within 500ms. The size of
the observation window is 5 requests (3+1+2). For an OK status, 3 requests are required to succeed
within this window. One request in this window has failed. Since warn
is set to 1, but fail
is at 2, we mark the edge as WARN
.
event-processor
→ journal
is much stricter: no request in the window (3+1+0 = 4) must fail
(fail = 0
). As can be seen, only one out of four passed the check, so it is marked as FAIL
.
Think of it as what a logging server contains, but relevant parts visualized, and the visualization can be customized.
Who is it for?¶
Sysadmins, developers, anyone who is interested in the status of information flows in a distributed system they are developing or monitoring. Observatory can be used to get a “high-altitude” view, using which you can identify the likely sources of problems. The typical use case scenario is this:
- Some component in a system is failing – not receiving messages, data isn’t showing up, etc.
- The cause is identified, messages are missing. Where’s the broken link?
- Check Observatory if anything is broken.
- Observatory tells us the connection between
event-processor
andjournal
is broken, no events have been recorded for the requests theweb-server
component received. - Investigate the problem between
event-processor
andjournal
.
How does it work?¶
Each component is programmed to log its messages to Observatory, associating a unique tracing token to a message. Whenever an event occurs in a component, the component tells Observatory about it, associating a unique tracing token. The component passes this event downstream. When the downstream component(s) have received the events, they tell Observatory about it, and Observatory correlates the tracing tokens and timestamps. This lets Observatory track the message rates.
The quality of message rates is configured using health checks. These are customizable and can be based on time or count. For example, we can require a system to have a message pass through it within 1 second. Or we can say that whenever one component generates five events, x events must occur in a downstream component.
If the system deviates from these health checks, the graph indicates this visually, with a red arrow between the nodes. For more information, see Glossary.
How do I integrate it to my system?¶
You pre-configure the information flow with a manifest, which at its simplests, is just a list of components in the system. Each component has an automated mechanism, e.g., a request middleware, that performs the message sending to Observatory, which listens to various sources of input (HTTP, MQ, log files). Manifest configuration is done using a simple TOML syntax, see
For more information, see Configuration.
Summary¶
Observatory is
- A distributed system monitoring tool that measures message rates in the system and displays the rates visually as a graph
- Its health check mechanism can also be
- A tool for developers and sysadmins that care about the above information
- Integrated into your system using plugins that tell Observatory about messages, enabling measurement (see Glossary)
- Configured using a simple TOML syntax (see Configuration) and run as a stand-alone server program with an optional web front-end (the graph)
Now head over to Glossary to learn more!