Configuration¶
Observatory is configured with a manifest, which is written in TOML. An example follows below.
Manifest¶
The Observatory configuration lives in the manifest file, usually a file called config.toml. It contains some general variables that are unrelated to the content of the system itself.
For Observatory to function, we need the following:
- An input source for events, e.g. an HTTP server or a message queue
- An output interface, e.g. an HTTP server
This is an example configuration.
name = "My Observatory"
description = "blah blah"
# create an HTTP server
[input.http]
interface = "localhost"
port = "8080"
path = "/events"
# listen from rabbitmq exchange 'observatory'
# with routing key 'observatory.input'
[input.rabbitmq]
address = "localhost"
port = 5672
username = "rabbit"
password = "asdfasdf"
exchange = "observatory"
exchangeType = "topic"
routingKey = "observatory.input"
# read from ActiveMQ using Camel connector
[input.camel]
url = "activemq://localhost:8888/observatory.in"
# frontend HTTP server
[output.http]
interface = "localhost"
port = "9090"
This will start an HTTP server that listens for incoming event packets as POST requests on /events, bind a RabbitMQ queue to the exchange observatory with the routing key observatory.input, and read from ActiveMQ through the Camel connector. It will also start an HTTP server that provides the REST API for the front-end.
Observatory is meant to be agnostic about input sources and won’t be tied down to any central mechanism. Initially it will support HTTP and a message queue of some kind (probably RabbitMQ).
Do note that the inputs and outputs are entirely optional, but a configuration with either no inputs or no outputs is rather silly.
After this configuration, we can configure streams.
Streams¶
A part of the system we wish to monitor for data flows is called a stream. This is an example:
[[stream]]
name = "my-sample-stream"
nodes = ["web-server", "event-processor", "journal", "database"]
A configuration like this is absolutely minimal. Once Observatory is launched for the first time, it will look like this:
This stream is still in an unobserved state, because we haven’t seen any data flowing in it yet.
With such a minimal configuration, an edge is rendered once there is traffic, but the edge never disappears afterwards, since Observatory only knows that traffic has occurred at least once. So, if web-server gets a request and sends (web-server,foo,TIME1) to Observatory, nothing will happen, except that web-server is rendered as having sent traffic:
Let’s assume traffic has occurred, whereby event-processor has received that message from upstream, with the tracing token foo, and sends (event-processor,foo,TIME2). Observatory does some internal calculations, notes the equal tokens and successive time stamps, and gives us this:
Let’s assume the same thing happens for the other nodes, journal and database, giving this:
This information tells us traffic has occurred once between all the nodes. This configuration is useless! Let’s configure some edges.
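The correlation step described above (equal tokens, successive time stamps) could be sketched roughly as follows. This is an assumed interpretation, not Observatory's actual algorithm: the function observed_edges is hypothetical, events are modelled as plain (node, token, timestamp) tuples, and consecutive events with the same token are simply treated as one hop.

```python
from collections import defaultdict

NODES = ["web-server", "event-processor", "journal", "database"]


def observed_edges(events, nodes=NODES):
    """Derive (from, to) pairs from (node, token, timestamp) events.

    Events sharing a token are ordered by timestamp; each consecutive pair
    of nodes is treated as one observed hop. Unknown node IDs are ignored.
    """
    known = set(nodes)
    by_token = defaultdict(list)
    for node, token, timestamp in events:
        if node in known:  # exact, case-sensitive comparison
            by_token[token].append((timestamp, node))

    edges = set()
    for hops in by_token.values():
        hops.sort()  # order the hops of one token by timestamp
        for (_, src), (_, dst) in zip(hops, hops[1:]):
            edges.add((src, dst))
    return edges


events = [
    ("web-server", "foo", 1),
    ("event-processor", "foo", 2),
]
print(observed_edges(events))  # {('web-server', 'event-processor')}
```

A single event on its own yields no edge, which mirrors the first step of the walkthrough: the node is known to have sent traffic, but nothing has been correlated yet.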
Edges¶
An edge means communication between two nodes. It is configured thusly:
[[stream.edges]]
name = "http traffic sent to processor"
from = "web-server"
to = "event-processor"
Field | Description |
---|---|
name | A description of the edge |
from | Edge start node ID |
to | Edge destination node ID |
The node IDs are matched against the tracing information that is sent. An edge like the one above defines two nodes: web-server and event-processor. Node ID matching in tracing is case sensitive.
You may guess that an edge without health checks is useless, because it doesn’t really tell us what to look at. For that we need health checks.
Checks¶
Health checking means that data must flow in the stream within a certain time period. Health checks generally have a kind and thresholds. The kind determines which metric the health check uses, and the thresholds define how many times the check must succeed or fail before a status change is triggered.
If we have an edge (web-server,event-processor), we can define that if web-server receives a request, event-processor must correlate it within N seconds (or any other time unit).
The syntax for a temporal health check is this:
[[stream.edges]]
name = "http traffic sent to processor"
from = "web-server"
to = "event-processor"
[stream.edges.check]
kind = "time"
within = 500
unit = "msec"
ok = 3
warn = 0
nok = 1
Field | Description |
---|---|
within | The time window as a number |
unit | The time unit (see Accepted time units) |
ok | Optional. How many times the check must succeed before setting OK status (default: 1) |
nok | Optional. How many failures we allow before setting NOK status (default: 0) |
warn | Optional. How many failures we allow before setting WARN status (default: 0). Note: if you set this field, Observatory will complain if you set warn > nok. |
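One way to read the thresholds above is as counters over consecutive results. The Python sketch below is an assumed interpretation only, not Observatory's implementation: the class CheckState, the UNOBSERVED starting status, and the choice to count consecutive (rather than cumulative) results are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    ok: int = 1      # successes needed before reporting OK
    warn: int = 0    # failures tolerated before reporting WARN
    nok: int = 0     # failures tolerated before reporting NOK
    status: str = "UNOBSERVED"
    successes: int = 0  # consecutive successes so far
    failures: int = 0   # consecutive failures so far

    def __post_init__(self):
        # Mirrors the note in the table: warn must not exceed nok.
        if self.warn > self.nok:
            raise ValueError("warn must not exceed nok")

    def record(self, elapsed_ms: float, within_ms: float) -> str:
        """Record one correlated request and return the resulting status."""
        if elapsed_ms <= within_ms:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.ok:
                self.status = "OK"
        else:
            self.failures += 1
            self.successes = 0
            if self.failures > self.nok:
                self.status = "NOK"
            elif self.failures > self.warn:
                self.status = "WARN"
        return self.status


# Thresholds taken from the check above: ok = 3, warn = 0, nok = 1.
check = CheckState(ok=3, warn=0, nok=1)
statuses = [check.record(elapsed, within_ms=500) for elapsed in (120, 80, 200, 950, 700)]
print(statuses)  # ['UNOBSERVED', 'UNOBSERVED', 'OK', 'WARN', 'NOK']
```

Under this reading, three requests inside the 500 ms window are needed before the edge turns OK, the first slow request after that produces WARN, and the second produces NOK.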
Once data starts flowing and we’ve received four requests, all of which have passed, we get an output like this:
A configuration like this can be defined between any two nodes in the graph, and there can be any number of them. The from and to fields are limited to the configured nodes.
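The restriction on from and to can be checked mechanically. The following Python sketch shows one possible check over a stream that has already been parsed into a dictionary; the function invalid_edge_endpoints is hypothetical and only illustrates the rule, including the case-sensitive comparison of node IDs.

```python
def invalid_edge_endpoints(stream: dict) -> list[str]:
    """List edge endpoints that do not name a configured node."""
    nodes = set(stream.get("nodes", []))
    problems = []
    for edge in stream.get("edges", []):
        for field in ("from", "to"):
            value = edge.get(field)
            if value not in nodes:  # exact, case-sensitive match
                label = edge.get("name", "?")
                problems.append(f"{label}: {field}={value!r} is not a configured node")
    return problems


stream = {
    "name": "my-sample-stream",
    "nodes": ["web-server", "event-processor", "journal", "database"],
    "edges": [
        {"name": "http traffic sent to processor",
         "from": "web-server", "to": "event-processor"},
        {"name": "typo example", "from": "Web-Server", "to": "journal"},
    ],
}
print(invalid_edge_endpoints(stream))
```

The second edge is reported because Web-Server differs in case from the configured node web-server.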
Accepted time units:
Unit | Accepted inputs |
---|---|
year | year, y |
month | month, mo |
day | day, d |
hour | hour, h |
minute | minute, min, m |
second | second, sec, s |
millisecond | millisecond, msec, millis, ms |
microsecond | microsecond, usec, us, μs |
nanosecond | nanosecond, nsec, ns |
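The alias table above maps naturally onto a lookup. The Python sketch below normalises an accepted input to its unit name and converts a within/unit pair to nanoseconds. The function names are hypothetical, matching is assumed to be exact and case sensitive, and year and month are deliberately not converted because their length depends on the calendar (how Observatory handles them is not stated here).

```python
ALIASES = {
    "year": ("year", "y"),
    "month": ("month", "mo"),
    "day": ("day", "d"),
    "hour": ("hour", "h"),
    "minute": ("minute", "min", "m"),
    "second": ("second", "sec", "s"),
    "millisecond": ("millisecond", "msec", "millis", "ms"),
    "microsecond": ("microsecond", "usec", "us", "μs"),
    "nanosecond": ("nanosecond", "nsec", "ns"),
}
# Reverse lookup: accepted input -> unit name.
CANONICAL = {alias: unit for unit, names in ALIASES.items() for alias in names}

# Fixed-length units only, expressed in nanoseconds.
NANOS = {
    "day": 86_400 * 10**9,
    "hour": 3_600 * 10**9,
    "minute": 60 * 10**9,
    "second": 10**9,
    "millisecond": 10**6,
    "microsecond": 10**3,
    "nanosecond": 1,
}


def normalize_unit(text: str) -> str:
    """Map an accepted input such as 'msec' to its unit name."""
    try:
        return CANONICAL[text]
    except KeyError:
        raise ValueError(f"unknown time unit: {text!r}") from None


def window_nanos(within: int, unit: str) -> int:
    """Convert a within/unit pair to nanoseconds for fixed-length units."""
    canonical = normalize_unit(unit)
    if canonical not in NANOS:
        raise ValueError(f"{canonical} has no fixed length")
    return within * NANOS[canonical]


print(normalize_unit("msec"))    # millisecond
print(window_nanos(500, "ms"))   # 500000000
```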
Example¶
This is how the edges in the example figure were configured:
[[stream]]
name = "my-sample-system"
nodes = ["web-server", "event-processor", "journal", "database"]
[[stream.edges]]
name = "web server to event processor"
from = "web-server"
to = "event-processor"
[stream.edges.check]
within = 10
unit = "sec"
min = 3
warn = 0
fail = 0
[[stream.edges]]
name = "event-processor to database"
from = "event-processor"
to = "database"
[stream.edges.check]
within = 500
unit = "ms"
min = 3
warn = 1
fail = 2
[[stream.edges]]
name = "event-processor to journal"
from = "event-processor"
to = "journal"
[stream.edges.check]
within = 500
unit = "ms"
min = 3
warn = 0
fail = 0
If you can’t make sense of TOML, here’s the equivalent JSON:
{"stream": [{
"edges": [
{
"check": {
"fail": 0,
"min": 3,
"unit": "sec",
"warn": 0,
"within": 10
},
"from": "web-server",
"name": "web server to event processor",
"to": "event-processor"
},
{
"check": {
"fail": 2,
"min": 3,
"unit": "ms",
"warn": 1,
"within": 500
},
"from": "event-processor",
"name": "event-processor to database",
"to": "database"
},
{
"check": {
"fail": 0,
"min": 3,
"unit": "ms",
"warn": 0,
"within": 500
},
"from": "event-processor",
"name": "event-processor to journal",
"to": "journal"
}
],
"name": "my-sample-system",
"nodes": [
"web-server",
"event-processor",
"journal",
"database"
]
}
]}