Glossary

The mechanism is simple. Observatory listens to an input for events coming from different services. Each individual service is called a node. A group of nodes is called a stream. To form these groups, nodes are connected via edges. An edge is an implied connection between two nodes: it can be, e.g., a web server talking to a logging mechanism, or an event system pushing events downstream to a message queue.

Movement on a stream is initiated by the root node. The message that passes through the stream is tagged with a tracing ID, so it can be tracked as it moves down the stream. All of these pieces are called elements, while checks are behaviours.

Below is a helpful glossary of the terms.

Node
An individual node in the network, e.g., a web service.
Edge
A connection between two nodes, e.g., from a web service to a database, or from a database to a logging system.
Stream
A group of connected edges that define an abstract data stream, e.g., from a web service to a database to a logging system.
Health check
How to identify whether individual edge traffic is healthy or not
Event
Information used to tell Observatory traffic has occurred
Tracing token
A unique tracing ID that can be used throughout the stream to identify data moving across the stream

Node

digraph { "web-server"; }

A node is an individual component in the stream. It can be a web service or any program that reads input from one place and produces output to another.

Attributes

name: A unique node identifier

Edge

digraph { rankdir=LR; "web-server" -> "event-processor"; }

An edge is a proof of connectivity between two nodes. You can define, e.g., that a web service A talks to web service B and by observing traffic between these two nodes, you can identify some

Attributes

name: The name of this connection (e.g. “web server to database”)
from: The source node of this edge
to: The destination node of this edge

Stream

digraph { rankdir=LR; "web-server" -> "event-processor" -> "data-warehouse"; }

A stream is a group of connected edges, a pipeline of nodes. It represents the movement of individual messages within an observed system. The above figure illustrates, in very broad terms, that each message from web-server will move to event-processor and from there to data-warehouse.

Attributes

name: The description of the stream
node: The nodes inside the stream
edges: The edges of the stream
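
The three elements described so far can be modelled with a few small data classes. This is only an illustrative sketch, not Observatory's actual schema or API: the class names, the `source`/`target` field names (standing in for `from` and `to`, since `from` is a reserved word in Python), and the helper `stream_from_path` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str  # unique node identifier


@dataclass(frozen=True)
class Edge:
    name: str     # e.g. "web-server to event-processor"
    source: Node  # the "from" attribute
    target: Node  # the "to" attribute


@dataclass
class Stream:
    name: str          # description of the stream
    nodes: List[Node]  # nodes inside the stream
    edges: List[Edge]  # edges of the stream


def stream_from_path(name: str, node_names: List[str]) -> Stream:
    """Hypothetical helper: link consecutive nodes into a linear stream."""
    nodes = [Node(n) for n in node_names]
    edges = [
        Edge(f"{a.name} to {b.name}", a, b)
        for a, b in zip(nodes, nodes[1:])
    ]
    return Stream(name, nodes, edges)


stream = stream_from_path(
    "web traffic", ["web-server", "event-processor", "data-warehouse"]
)
for edge in stream.edges:
    print(f"{edge.source.name} -> {edge.target.name}")
```

The loop prints one line per edge, mirroring the arrows in the figure for the stream above.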

Health check

Monitoring stream traffic is of little interest if you don’t define how traffic should move. For example, from observational data (by analyzing logs, etc.) we can say that requests from web-server should reach event-processor within 300ms. We can then define edge traffic as “OK” when, out of every hundred requests (or any such number), at least eighty reach event-processor within this time. If this doesn’t happen, we say that there is something wrong in the connection.

A health check defines three thresholds: the OK threshold, the WARN threshold, and the FAIL threshold. An individual observation window is the sum of the thresholds. If you define 3 for all thresholds, this would create a sliding observation window of 9 events.

Note: You must have OK >= FAIL >= WARN, otherwise the observations don’t make sense.

Attributes

within: The time window for the edge
unit: The time unit for the window (see Accepted time units)
ok: Minimum events that should pass in order to trigger OK
warn: Minimum events that should pass in order to trigger WARN
fail: Minimum events that should pass in order to trigger FAIL
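
The window arithmetic described above can be checked with a short sketch. The `HealthCheck` class, its method names, and the `"ms"` unit spelling are assumptions made for illustration; only the attribute names, the "sum of the thresholds" rule, and the ordering from the note come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    within: int  # time window for the edge
    unit: str    # time unit for the window ("ms" is an assumed spelling)
    ok: int      # OK threshold
    warn: int    # WARN threshold
    fail: int    # FAIL threshold

    @property
    def window_size(self) -> int:
        # The observation window is the sum of the three thresholds.
        return self.ok + self.warn + self.fail

    def satisfies_ordering(self) -> bool:
        # Ordering exactly as given in the note: OK >= FAIL >= WARN.
        return self.ok >= self.fail >= self.warn


check = HealthCheck(within=300, unit="ms", ok=3, warn=3, fail=3)
print(check.window_size)           # 9
print(check.satisfies_ordering())  # True
```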

Event

An event is a signal to Observatory that a node has registered traffic.

Attributes

timestamp: An RFC 3339 date-time or a 64-bit integer in microseconds since the Unix epoch
node: The node from which the event is sent
tracing: The tracing token of this event
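
Because the timestamp may arrive in either of two forms, a consumer has to normalise it before comparing events. The sketch below shows one way to do that with the Python standard library; the function name `parse_timestamp` and the sample event are hypothetical, and the string branch handles only common RFC 3339 shapes (older Python versions reject some valid variants, such as a lowercase `z`).

```python
from datetime import datetime, timedelta, timezone
from typing import Union

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value: Union[str, int]) -> datetime:
    """Normalise either accepted timestamp form to an aware UTC datetime."""
    if isinstance(value, int):
        # Integer form: microseconds since the Unix epoch.
        return UNIX_EPOCH + timedelta(microseconds=value)
    # String form: datetime.fromisoformat() rejects a trailing "Z"
    # before Python 3.11, so rewrite it as an explicit UTC offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


# A sample event using the three attributes listed above (values made up).
event = {
    "timestamp": "1970-01-01T00:00:01Z",
    "node": "web-server",
    "tracing": "example-token",
}
print(parse_timestamp(event["timestamp"]) == parse_timestamp(1_000_000))
```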

Tracing token

format: A unique string, preferably a UUID.

The tracing token is used to identify the movement of a message. When the message originates at the root node, the root node attaches a unique tracing token to the message. When that message is passed to the next node, e.g., in an HTTP/MQ header, that node uses the same tracing token to inform Observatory. That way, Observatory can identify that messages are moving successfully.
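
The token flow described above can be simulated in a few lines. This sketch is not Observatory client code: the header name `X-Tracing-Token`, the message layout, and the helpers `new_message` and `make_event` are all hypothetical, and events are collected in a list rather than sent anywhere.

```python
import time
import uuid

TRACING_HEADER = "X-Tracing-Token"  # hypothetical header name


def new_message(payload: str) -> dict:
    """Root node: create a message and attach a fresh tracing token."""
    return {
        "headers": {TRACING_HEADER: str(uuid.uuid4())},
        "payload": payload,
    }


def make_event(node: str, message: dict) -> dict:
    """Any node: build the event reported for this message."""
    return {
        "timestamp": time.time_ns() // 1000,  # microseconds since epoch
        "node": node,
        "tracing": message["headers"][TRACING_HEADER],
    }


# The root node creates the message; downstream nodes reuse its token.
message = new_message("hello")
events = [
    make_event(node, message)
    for node in ["web-server", "event-processor", "data-warehouse"]
]

for e in events:
    print(e["node"], e["tracing"])
```

Every printed line carries the same token, which is what would let the events be correlated as one message moving down the stream.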