This is the second article in our "Help Your Future Self" series on how to do logging well. Check out the first article on considering your audience while logging.
If you read our previous post, you know that logs are about laying out enough information that the reader can garner a sufficient understanding of what an application has done in its past to further one or more of a specific set of new objectives. Each of these objectives is different, and the key to achieving them is having enough context to understand the relevant actions taken by the application. In a future post, we will explore some specific use cases, but before we do that, there are some bits of context that are essential to pretty much every imaginable use case for consuming application logs.
There’s a reason the timestamp is usually the first field in any log message. In most cases, it’s not enough to know that something happened, you also need to know when it happened. There are a few fine points that you want to get right.
There are a ton of date/time formats out there, but you should probably just use RFC 3339. If you aren’t familiar with RFC 3339, and you find yourself thinking what’s wrong with good old ISO 8601, put down your fountain pen. There’s no need to draft a formal protest letter. RFC 3339 is just a profile of ISO 8601 which makes a few decisions about things which ISO 8601 leaves as available alternatives. It’s considered the standard date/time format of Internet protocols, and while you are probably not writing one of those, you can still take advantage of the interoperability problems it solves.
If RFC 3339 or another ISO 8601 profile doesn’t do it for you, there are plenty of other formats out there. Still, try to ensure whatever you use has at least the following properties:
- formatted date/time strings sort chronologically
- time zone information is included
A common bit of guidance you’ll see is to use UTC everywhere. This isn’t bad advice, but in keeping with the idea that logs are often read by humans, it can help your readers a great deal to have timestamps in their local time. If you’ve picked a good date format, the time zone information should be part of the format, so you can always use a tool to reformat the timestamp in another time zone for convenient reading. If, for some reason, however, you can’t use that tool, and you know all of your readers will be consuming the log from a single specific time zone, it may make sense to log in that time zone. But... probably just use UTC.
Another problem you can run into occurs when your timestamp’s granularity is too coarse. Ideally, no two log messages should share a timestamp (at least not from the same log source). If they do, tools which sort log messages according timestamp may paint a confusing picture by presenting events in a different order than that in which they occurred. To avoid this, you can either ensure that your timestamp includes a sufficiently high-precision fractional second or add a sequence number to each log message.
While not strictly necessary, a coarse log level is a very common and helpful field to include in your log messages. Typically this looks something like the following:
- FATAL / PANIC / CRITICAL / EMERGENCY — a severe event, very likely indicating the failure of a significant component or even the entire application
- ERROR — an application-level or system-level event not severe enough to impede continuing application or system function
- WARN / WARNING — a noteworthy event that, while not in itself a failure condition, could indicate degraded application function or could contribute to a failure condition given compounding events
- NOTICE — a significant event that may require special attention or intervention
- INFO — a normal event, notable enough to keep a durable record of having occurred
- DEBUG — an event or detail regarding an event which is useful for investigating the correctness or performance of some part of an application and is not generally needed to understand what has occurred during normal operation of the application
- TRACE — similar to debug, but typically with much more detail and/or higher granularity and/or frequency
The levels above are ordered to facilitate filtering either when the log event is being produced or when the log event is being consumed (i.e. “only show me warnings or worse”), but that may not be the dimension you care the most about. Sometimes a specific error can be more important to you than a component panic. If you do use some sort of logging framework to control whether log messages at or beyond a certain threshold get logged, pay very close attention to both the threshold and the level of each particular message. You don’t want to miss a serious problem by accidentally throwing its log message into the void.
Category / Component
The most common expression of a log category is found in the Java ecosystem. In a typical Java application using Log4J, for example, log messages will be prefixed either with the fully-qualified or arbitrarily truncated class name (e.g. com.wrble.logging.Logger or logging.Logger). Especially in a large system, this allows log consumers to narrow their focus to one or a few of the specific components they are working on without being overwhelmed by the noise of the complete system. Using a hierarchical category like a class name can make it very easy to dial in at just the right level of granularity, from debug logs of very specific components to everything from entire packages.
Not all logging frameworks have built-in support for a category field, but that doesn’t mean it shouldn’t be considered a best practice. If your logging facility does not have this concept, consider rolling your own analogue. If you organize your application into modules of any kind, logging the module name as the category is simple and effective.
The concept of a log source isn’t applicable to all applications, but it is absolutely critical for some, like a distributed application or one where you are managing a large number of remote clients. Assuming that every source has some sort of unique identifier, you should include it with every log message originating from that endpoint, along with a prefix of some kind that makes it easy to filter aggregated logs down to just those from a particular endpoint. For example:
2021-04-30T12:34:56.789Z - client=314159 - app launched
If you don’t have a unique identifier, other options might be the remote IP address or perhaps a session identifier. Avoid sensitive data like name, email address, etc. (more on this in a future post on privacy). Also, don’t mix and match identifiers — if you have two valid ways to identify the same log source (say, a user ID and a username), pick one and stick with it. Don’t force consumers of the log to read one, query the other, and build a log search query that returns all events matching both identifiers.
This should go without saying, but don’t forget to log the actual log message itself.
As state up top, the recommendations in this post are fairly general and apply broadly to logging in general for most applications. In the next few posts, we’ll take a look at different scenarios in which one may find oneself as a log consumer and describe some specific practices to help you as the log producer simplify the downstream task at hand.