.. _introduction:

==============
Introduction
==============

What is it?
===========

**Observatory** is a distributed tracing system that can be used to monitor
traffic latencies between networked services. Observatory is similar to
Zipkin or Kamon. Observatory aggregates tracing tokens received from various
inputs. These tracing tokens are used to render an image as seen
:ref:`below <sample>`.

Why would I use it?
===================

They say a picture is worth a thousand words. Observatory displays the status
of a distributed system as a visual graph. Observatory can be a simple
auto-refreshing web page that looks like this:

.. _sample:

.. graphviz:: images/intro.dot
   :caption: **A prototype of Observatory's user interface.** Yes, at this point, it's just a Graphviz graph.

The system in it has four nodes: ``web-server`` (the root node),
``event-processor``, ``journal``, and ``database``.

For every ``web-server`` request, we expect ``event-processor`` to have
reacted within 10 seconds. None of the observed requests have failed (i.e.
taken more than 10s), so the status is marked as OK. We've defined a window
of 3 requests, all three of which must succeed for the edge to be marked as
OK. Since ``warn=0`` and ``fail=0``, should any of the requests fail, the edge
will automatically be marked as ``NOK`` (fail trumps warn).

For an ``event-processor`` → ``database`` message, we expect it to react
within 500ms. The size of the observation window is 6 requests (3+1+2). For
an OK status, 3 requests are required to succeed within this window. One
request in this window has failed. Since ``warn`` is set to 1, but ``fail``
is at 2, we mark the edge as ``WARN``.

``event-processor`` → ``journal`` is much stricter: no request in the window
(3+1+0 = 4) may fail (``fail = 0``). As can be seen, only one out of four
passed the check, so it is marked as ``FAIL``.

Think of it as the contents of a logging server with the relevant parts
visualized, in a visualization that can be customized.

Who is it for?
==============

Sysadmins, developers, anyone who is interested in the status of information
flows in a distributed system they are developing or monitoring. Observatory
can be used to get a "high-altitude" view, from which you can identify the
likely sources of problems.

The typical use case scenario is this:

1. Some component in a system is failing -- not receiving messages, data
   isn't showing up, etc.
2. The cause is identified: messages are missing. Where's the broken link?
3. Check Observatory to see if anything is broken.
4. Observatory tells us the connection between ``event-processor`` and
   ``journal`` is broken: no events have been recorded for the requests the
   ``web-server`` component received.
5. Investigate the problem between ``event-processor`` and ``journal``.

How does it work?
=================

Each component is programmed to log its messages to Observatory, associating
a unique tracing token with each message. Whenever an event occurs in a
component, the component tells Observatory about it, including the tracing
token, and passes the event downstream. When the downstream component(s) have
received the event, they tell Observatory about it, and Observatory
correlates the tracing tokens and timestamps. This lets Observatory track the
message rates.

The *quality* of message rates is configured using *health checks*. These are
customizable and can be based on time or count. For example, we can require a
system to have a message pass through it within 1 second. Or we can say that
whenever one component generates five events, `x` events must occur in a
downstream component.

If the system deviates from these health checks, the graph indicates this
visually, with a red arrow between the nodes.

For more information, see :ref:`overview`.

How do I integrate it into my system?
=====================================

You **pre-configure** the information flow with a *manifest*, which at its
simplest is just a list of components in the system.

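
To make the manifest and health-check ideas concrete, here is a minimal
Python sketch. It is not Observatory's configuration format or code. The
names ``Check``, ``edge_status``, ``COMPONENTS`` and ``CHECKS`` are
hypothetical, and the rule that a threshold is reached once the failure count
equals or exceeds it (with zero failures always OK) is an assumption chosen
only because it reproduces the three edges described for the sample graph.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Check:
    """Hypothetical windowed health check for one edge of the graph."""

    source: str
    target: str
    ok: int    # successes required for an OK status
    warn: int  # assumed: failure count at which the edge turns WARN
    fail: int  # assumed: failure count at which the edge turns FAIL

    @property
    def window(self) -> int:
        # The introduction sizes the window as ok + warn + fail.
        return self.ok + self.warn + self.fail


def edge_status(check: Check, results: Sequence[bool]) -> str:
    """Classify an edge from its latest results (True = reacted in time)."""
    recent = list(results)[-check.window:]
    failures = sum(1 for passed in recent if not passed)
    if failures == 0:
        return "OK"
    if failures >= check.fail:  # checked first, so fail trumps warn
        return "FAIL"
    if failures >= check.warn:
        return "WARN"
    return "OK"


# Illustrative stand-in for a manifest: components plus one check per edge.
COMPONENTS = ["web-server", "event-processor", "journal", "database"]
CHECKS = [
    Check("web-server", "event-processor", ok=3, warn=0, fail=0),
    Check("event-processor", "database", ok=3, warn=1, fail=2),
    Check("event-processor", "journal", ok=3, warn=1, fail=0),
]

# Made-up observations matching the sample graph's description.
OBSERVED = {
    ("web-server", "event-processor"): [True, True, True],
    ("event-processor", "database"): [True, True, True, False, True, True],
    ("event-processor", "journal"): [False, True, False, False],
}

for check in CHECKS:
    status = edge_status(check, OBSERVED[(check.source, check.target)])
    print(f"{check.source} -> {check.target}: {status}")
```

With these made-up observations the three edges come out as OK, WARN and
FAIL, in that order. A real deployment would read the thresholds from the
manifest rather than hard-coding them.
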
Each component has an automated mechanism, e.g., a request middleware, that
sends the messages to Observatory, which listens to various sources of input
(HTTP, MQ, log files).

Manifest configuration is done using a simple TOML syntax. For more
information, see :ref:`configuration`.

Summary
=======

Observatory is

- A distributed system monitoring tool that measures message rates in the
  system and displays the rates visually as a graph
- Its health check mechanism can also be
- A tool for developers and sysadmins who care about the above information
- Integrated into your system using plugins that tell Observatory about
  messages, enabling measurement (see :ref:`overview`)
- Configured using a simple TOML syntax (see :ref:`configuration`) and run as
  a stand-alone server program with an optional web front-end (the graph)

Now head over to :ref:`overview` to learn more!