Code and Chips

Application logging

2021-09-17T00:00:00+00:00

Diagnosing unexpected issues in long-running background processes can be hard. The issue may only arise rarely and it may be difficult to reproduce outside of the production environment.

There is quite an array of information on how to implement logging, but less on what to log. This blog entry offers some guidelines for a solid approach to application logging that will help reconstruct the timeline of events leading up to a software failure.

So without further ado, let’s get stuck in. Here are my tips for writing application logs.

Remember why you are logging

In a nutshell, an application that works perfectly doesn’t need logs. It’s only when something goes wrong that logs are even looked at. So when writing code that emits logs, aim to provide relevant contextual information about what the software is doing.

Keep in mind who reads logs

The reader could be a support analyst or another developer on your team, and they will not necessarily have the same depth of knowledge of the application as you do. Aim to give them the information they need to figure out what the application was doing before it failed.

Add detail to each log

Imagine that you have been asked to investigate an issue that affects some users when they log in to your system. You recover the logs and start browsing through them. Would you prefer to read ‘Logging on’ or ‘Logging on user {userName} at URL {loginUrl}’?

Maybe the issue being investigated is only related to a particular user, or maybe the login URL has been incorrectly configured. You can’t know that ahead of time, so provide relevant contextual information in the log message.

logger.Trace($"Logging on user {userName} at URL {loginUrl}...");

Avoid ambiguity

Be clear about whether an event is about to happen or has already happened.

Prefer the present continuous tense for something that is about to happen: ‘Trying to connect’ or ‘Querying DB’
Use the past tense for something that just happened: ‘Successfully connected’ or ‘Queried DB’ or ‘Failed while…’

Add a log for each external call

Before the call, emit a trace.
If the call succeeds, emit a second trace.
If it fails, emit an error and include the exception message and call stack if available.

try
{
    logger.Trace("Querying DB for product list...");
    var products = db.GetProducts();
    logger.Trace($"Got {products.Length} products");
}
catch (Exception e)
{
    logger.Error(e, "Failed to query DB for product list");
    // The constructor is Exception(string message, Exception innerException): message first
    throw new Exception("Failed to query DB for product list", e);
}
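The before/after/on-failure pattern can be packaged once so it isn’t repeated at every call site. Here is a minimal sketch using Python’s standard logging module rather than NLog; the context manager name log_external_call and the get_products stand-in are hypothetical, and DEBUG stands in for Trace because Python has no built-in Trace level.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("app")


@contextmanager
def log_external_call(description):
    """Trace before and after an external call; log and re-raise on failure."""
    # Python has no built-in Trace level, so DEBUG stands in for it here
    logger.debug("%s...", description)
    try:
        yield
    except Exception as e:
        # logger.exception logs at ERROR level and appends the full traceback
        logger.exception("Failed: %s", description)
        # Re-raise with context; "from e" keeps the original exception as __cause__
        raise RuntimeError(f"Failed: {description}") from e
    logger.debug("Done: %s", description)


def get_products():
    return ["apple", "pear"]  # hypothetical stand-in for db.GetProducts()


with log_external_call("Querying DB for product list"):
    products = get_products()
```

The generic ‘Done’ message loses the detail of the second trace in the C# version (the product count), so a call site with something useful to report can still log it inside the with block.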

Log all exceptions as they happen

Log exceptions systematically, even if they are handled by the code. This may help if the exception is not being handled correctly, or the exception is a precursor to a later software failure.

try
{
    // ... send the email here
    logger.Info($"Sent email to {emailAddress}");
}
catch (Exception e)
{
    // Log the error
    logger.Error(e, $"Failed while sending an email to {emailAddress} using smtp server {smtpServer}");
    // Rethrow with extra contextual information
    throw new Exception($"Failed while sending an email to {emailAddress} using smtp server {smtpServer}", e);
}

Use a logging framework

Logging frameworks like Microsoft.Extensions.Logging, NLog and log4net offer many options. A few common concepts are log targets, levels and rules, which are managed in configuration and can usually be changed without any code changes, in some cases even while the application is running.

Targets define where the log is sent, from console output, through rotating timestamped files, to advanced network distributed logging.
Log levels indicate the significance of an event and can be used to filter log entries.
Log rules allow you to apply filters to log entries and route them to logging targets.

Use log levels

The logging framework will provide different severity levels. NLog gives this handy list of log levels, with examples of typical use:

Level	Typical Use
Fatal	Something bad happened; application is going down
Error	Something failed, application may or may not continue
Warn	Something unexpected; application will continue
Info	Normal behaviour like mail sent, user updated profile etc.
Debug	For debugging; executed query, user authenticated, session expired
Trace	For trace debugging; begin method X, end method Y
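Python’s logging module has a similar ladder (CRITICAL, which it also aliases as FATAL, then ERROR, WARNING, INFO and DEBUG) but no Trace level. A minimal sketch of registering one and of filtering by level; the numeric value 5 and the helper name trace are my own choices, not part of the standard library.

```python
import logging

# Register a TRACE level below DEBUG (10); the value 5 is an arbitrary choice
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("levels-demo")


def trace(msg, *args):
    # Logger.log accepts any integer level, including custom ones
    logger.log(TRACE, msg, *args)


# Filtering by level: with the threshold at INFO, TRACE and DEBUG entries are dropped
logger.setLevel(logging.INFO)
trace("Begin method %s", "X")   # filtered out
logger.debug("Executed query")  # filtered out
logger.info("Mail sent")        # passes the level check
```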

Use rolling logs

If logging to file, make sure you use rolling logs, and if available use a disk partition that is separate from the OS. Without these precautions, over time the log files will grow, and you don’t want to fill up the OS partition with logs.
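As an illustration of size-based rolling, here is a minimal sketch using RotatingFileHandler from Python’s standard library (TimedRotatingFileHandler in the same module rolls by time instead). The function name, the logger name and the 10 MB / 5 file defaults are assumptions for the example.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rolling_logger(path, name="app", max_bytes=10 * 1024 * 1024, backups=5):
    """Return a logger whose file rolls over at max_bytes, keeping `backups` old files.

    Disk usage is capped at roughly max_bytes * (backups + 1).
    """
    handler = RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A call such as make_rolling_logger("/logs/myapp/app.log"), with the path on a dedicated partition, keeps app.log plus app.log.1 to app.log.5 and deletes anything older (the path is hypothetical).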

Don’t pretty print exception messages

Surface all the gory details of the message and stack trace. Logging frameworks provide a way of formatting exception messages and stack traces, so let them do their job. You don’t want to hide details that might be useful.
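In Python’s standard logging module, letting the framework do the job means calling logger.exception inside the except block rather than logging str(e) yourself. A minimal sketch; the load_profile function is hypothetical, and the StringIO stream stands in for a real log target.

```python
import io
import logging

stream = io.StringIO()  # stands in for a real log target
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("exceptions-demo")
logger.addHandler(handler)


def load_profile(profiles, user_id):
    try:
        return profiles[user_id]["name"]
    except Exception:
        # Don't format the exception yourself: logger.exception appends the
        # exception type, message and full traceback to the log entry
        logger.exception("Failed while loading profile for user %s", user_id)
        raise


try:
    load_profile({}, 42)
except KeyError:
    pass

print(stream.getvalue())
```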

In one company where I worked there was a high number of unhandled exceptions with just the information ‘Null reference exception’ but no call stack or contextual information. It was basically impossible to resolve any of those issues without going back to the code and fixing the exception handling first.

Don’t leak secrets

Redact passwords and keys. You can’t be certain of the role of the person reading the logs, and the log contents may end up in an email chain in a support request, so take care when logging sensitive information. A common practice is to replace secrets and passwords with asterisks. Or you might decide to show the start or end of the secret and replace the rest with asterisks.
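A minimal sketch of the ‘show the end, mask the rest’ approach; the function name mask_secret, the four visible characters and the fixed-width mask are all assumptions rather than an established convention. Using a fixed number of asterisks avoids leaking the length of the secret.

```python
def mask_secret(secret, visible=4, mask="********"):
    """Return a fixed-width mask followed by the last `visible` characters.

    Secrets too short to reveal safely are masked entirely.
    """
    if len(secret) <= visible * 2:
        return mask
    return mask + secret[-visible:]
```

A log call would then pass the masked value, e.g. logger.info("Connecting with API key %s", mask_secret(api_key)), where api_key is a hypothetical variable.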

In summary

Logs are your friend: they can help you get out of a tight spot. So look after them, log often and log clearly.

I don’t think I have ever complained about having too many logs. However, badly written logs, ambiguous logs, too few logs or no logs at all make diagnosing and fixing production failures difficult.

Scrum and organic growth

2021-03-31T00:00:00+00:00

The agile manifesto has been around for twenty years now, and probably the most frequently used approach is scrum. It’s the best project management method for software development we have. But there are some pitfalls, and this article is about one of them.

Scrum is great for flexibility, concentrating on business requirements, delivering new features as soon as they are complete, and quickly adjusting when business requirements change. It is driven by business requirements as set out by the product owner and it puts delivering value at the heart of the software development process. Find out what the user needs to be more productive and concentrate on delivering that.

What it completely misses is the long-term cost of making decisions driven by short-term gain. If left to run wild, technical debt can be the unwanted side effect. There is no inherent architectural design and there is no short-term incentive to keep a tidy code-base. And of course none of these components of a healthy software project is visible to the product owner who is running the show.

Does this scenario sound familiar? The pressure is on you to deliver new features without adjusting the underlying architecture. After a while the code starts getting entangled because developers are encouraged to take the shortest path, and you only realize it when your technical debt starts to bite. This can lead to a downturn in productivity: initially features are shipped regularly to production, but after a while new features become harder to add. In extreme cases, you might even cut your losses and choose to rewrite, with all the risks of failure that restarting from scratch entails.

So what do I mean by organic growth? I have this mental image of code being like a bunch of plants in a garden. The flowers are the features, while the stems and roots are all the code needed to support them. The scrum method focuses on features; it nurtures the flowers. But the supporting structures are neglected because they are just not considered valuable. The stems and roots are left to multiply in whatever way makes the most flowers.

Scrum can lead to a disorganized jungle of tangled shoots, whereas we as software developers would prefer to work in a meticulously maintained formal French garden. It is a constant battle against entropy: there is an energy barrier to overcome before the code can relax back into its lowest energy state.

So what can we do? Empower developers to keep their backyard tidy. Include them in the scrum preparation process, ask them when time is required for refactoring work, and take account of it in the sprint plan.

If no-one is keeping on top of code quality, it will quickly get out of control.

Real book adventures

2021-02-17T00:00:00+00:00

My hobby projects often combine two or more of my interests, and music has been a recurring theme for me. This post is about music theory, coding and data modelling.

First though, what is a real book? Well, you could say it’s the bible of jazz musicians. It’s a book of jazz songs, or standards, where each standard has a title, a composer, and a musical score in which you can find the melody and chord chart. In short, it’s a kind of sheet music.

A chord chart represents harmony. It’s the list of chords, in order, that an accompanist needs to play while the singer sings the words or the soloist plays the tune. It’s the harmonic structure that underpins the melody of a song.

So one day I got to wondering what defines a musical style. Now, there are lots of different elements to music, including rhythm, tempo, orchestration, etc., but one of the key components is harmony. For example, many blues tunes need just three chords, whereas some forms of jazz fusion harmonize each note of the melody with a different chord.

It would be kind of neat to be able to model harmonic structure. For example, it might be possible to create a chord chart generator based on existing songs. The first step to building a model is gathering some data, and that is the subject of this post. So let’s make a start by considering some options.

Manual data entry is one, but it would be tedious and error-prone, and we software developers are lazy by nature. Far too much like hard work for a hobby project.

Scraping online song charts is another possibility, and I had a quick look at parsing sites like this one with Beautiful Soup and a simple script like the one below. There may be some mileage in this line of attack.

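import re
from bs4 import BeautifulSoup

# html holds the page source of a chord chart page; name an explicit parser
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", attrs={"data-name": re.compile(".*")})
chords = [span["data-name"] for span in spans]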

And then I considered using the iReal Pro forums. iReal Pro is a mobile app that plays songs as backing tracks. It’s a great tool for music practice, with some nice MIDI audio (apart from being a fan, I’m not in any way connected to the app, but it is really awesome). The good news here is that there are songbooks available for download, and the data is structured well enough for the app to play them, so parsing it should also be possible.

And there is more good news: the format is documented, although closer inspection reveals that the documentation is incomplete and there is a totally undocumented obfuscation thing going on. Now, I should mention that I am not the first to tread this path: there are already a few repositories on GitHub that decode the format, and in particular accompaniser has a nearly complete description of the grammar.

Let’s now take a deeper look at the iReal Pro data format. Each song chart has some metadata followed by a description of the score. The example given in the documentation page looks like this:

irealbook://Song Title=LastName FirstName=Style=Ab=n=T44*A{C^7 |A-7 |D-9 |G7#5 }

However, the actual href elements do not follow this URL scheme. For example, after URL decoding, the start of the first URL on the forum page looks like this:

irealb://26-2=Coltrane John==Medium Up Swing=F==1r34LbKcu7ZL7bD4F^7 ZL7F 7-CZL7C...

At first glance this seems to follow the same format, but there are a few surprises. The first thing to note is that songs are concatenated into a songbook, although you can’t see that in the example above.

Next up, there’s something weird going on at the start of the chord progression. The first sequence of characters is not documented and makes no sense at all. The answer is in accompaniser on this line: each song is separated by that same sequence of characters, which you can see at the start of the chord progression in the example above.

But that’s not the end of the surprises. When you split the URL into songs using the separator sequence, the resulting chord charts are garbage. There is a further undocumented mechanism that scrambles the chord progression. That is also solved by accompaniser here.

And there’s still more. Once you split into songs and unscramble the chord progression, some of the symbols in the chord progression are not documented at all.

Ignoring the scrambling for the moment, each URL contains a bunch of song charts, a bit like this:

URL
- Song
  - Title
  - Composer (Last name, first name)
  - Style
  - Key Signature
  - Chord Progression
- Song
- …
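As a rough illustration of splitting out the metadata, here is a minimal Python sketch for the documented irealbook:// form only, holding a single song as in the documentation example above. It does not handle irealb:// songbooks, the song separator or the scrambling; the field names are my own labels, and the meaning of the ‘n’ field is not covered here, so it is kept as-is.

```python
from urllib.parse import unquote

# Field order assumed from the documented single-song example:
# title=composer=style=key=n=chord progression
FIELDS = ["title", "composer", "style", "key_signature", "n_field", "chord_progression"]


def parse_irealbook_song(url):
    """Split a documented-format irealbook:// URL holding one song into named fields."""
    prefix = "irealbook://"
    if not url.startswith(prefix):
        raise ValueError("expected an irealbook:// URL")
    body = unquote(url[len(prefix):])
    # The chord progression is the last field, so cap the number of splits
    parts = body.split("=", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


song = parse_irealbook_song(
    "irealbook://Song Title=LastName FirstName=Style=Ab=n=T44*A{C^7 |A-7 |D-9 |G7#5 }"
)
print(song["title"], "|", song["key_signature"], "|", song["chord_progression"])
```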

To decode each URL, I first broke it down into separate songs and split out the metadata. You can find some C# code to do this on GitHub. I then ran pandas profiling over the metadata to see what I could find.

import pandas as pd
from pandas_profiling import ProfileReport

style = 'Catalog'

df = pd.read_csv(
  style + '.csv',
  sep=';',
  usecols=[
    'Style',
    'StyleId',
    'KeySignature',
    'KeySignatureId',
    'TimeSignature',
    'TimeSignatureId'])

profile = ProfileReport(
  df,
  title=style,
  correlations={
    "pearson": {"calculate": True},
    "spearman": {"calculate": True},
    "kendall": {"calculate": True},
    "phi_k": {"calculate": False},
    "cramers": {"calculate": False},
  })

profile.to_file(style + '.html')

So to whet your appetite, here is a quick visualization of key signatures filtered by styles:

When I filter for styles including the word ‘Pop’, the key signatures are dominated by keys that are easy to play on guitar and keyboard (C, D and E). And when I filter for ‘Swing’ styles, I get a different set of key signatures, preferred by jazz solo instruments (Eb and Bb for saxophone, Bb for trumpet).
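The style filter itself is a one-liner in pandas. A minimal sketch, assuming the same Style and KeySignature columns as the profiling script above; the sample rows are made up purely to show the call and are not real catalogue data.

```python
import pandas as pd


def key_counts_for_style(df, word):
    """Count key signatures among songs whose Style contains `word` (case-insensitive)."""
    mask = df["Style"].str.contains(word, case=False, na=False)
    return df.loc[mask, "KeySignature"].value_counts()


# Made-up rows for illustration only
sample = pd.DataFrame({
    "Style": ["Pop", "Medium Swing", "Pop Rock", "Up Tempo Swing", "Ballad"],
    "KeySignature": ["C", "Bb", "C", "Eb", "F"],
})
print(key_counts_for_style(sample, "Pop"))
```

With the real data, the DataFrame would come from pd.read_csv('Catalog.csv', sep=';') as in the profiling script.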

That’s all for now, but in a future post I will describe how to linearize the chord progression, in other words how to play it like a musician would to produce a single linear sequence of chords. Once that is done, it should be possible to build some kind of model, maybe based on Markov chains, that can generate completely new chord progressions.
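To give a flavour of the Markov chain idea, here is a minimal first-order sketch: it records which chord followed which in already-linearized progressions, then takes a random walk through those transitions. The two training progressions are toy examples I made up, not data from the songbooks.

```python
import random
from collections import defaultdict


def build_transitions(progressions):
    """First-order Markov chain: map each chord to the list of chords that followed it."""
    transitions = defaultdict(list)
    for chords in progressions:
        for current, following in zip(chords, chords[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng=random):
    """Random walk through the chain; stops early if a chord has no known successor."""
    progression = [start]
    while len(progression) < length:
        successors = transitions.get(progression[-1])
        if not successors:
            break
        # Repeated entries make frequent transitions proportionally more likely
        progression.append(rng.choice(successors))
    return progression


# Toy training data (made up)
songs = [
    ["Dm7", "G7", "Cmaj7", "A7", "Dm7", "G7", "Cmaj7"],
    ["Em7", "A7", "Dm7", "G7", "Cmaj7"],
]
chain = build_transitions(songs)
print(generate(chain, "Dm7", 8, random.Random(1)))
```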

Minimal Scope

2021-02-07T00:00:00+00:00

Code complexity can be split in two: the innate complexity of the problem, and the accidental complexity that has been allowed to creep into the solution.

So an obvious question is: can we reduce accidental complexity, and will it help us to write more expressive code?

Reducing scope reduces complexity

So what do I mean exactly by scope?

Let’s review a simple code example to explore some ideas. Suppose we’re writing a service in a web app that needs to log user sessions. A reasonable implementation, one that would sit comfortably in many codebases, would be something like this:

void LogSession(HttpContext httpContext, User user)
{
  var url = httpContext.Request.RawUrl;
  var username = user.UserName;
  this.db.tblSessionLogs.Add(new SessionLog(username, url));
}

This code is not bad code: it’s fairly clear in what it does, the method is named appropriately, there are no real surprises.

However I would argue that it has a weakness: the method signature is far too generous. Both arguments are rich objects that contain all kinds of properties, and yet the method only needs two key pieces of information.

And this is really what I mean by scope. Another term could be the size of the execution context. I can imagine a code metric that quantifies it by something like the number of objects that could be accessed. It would be a count of all variables, local or global, and their properties, and their properties’ properties, etc. A count of all the leaves of the object graph that is in scope.
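That imagined metric can be sketched in a few lines. This is a rough illustration, assuming Python dicts stand in for HttpContext and User (the field names are made up): it counts every leaf reachable from the values in scope, and skips objects it has already visited so that cycles don’t loop forever.

```python
def count_leaves(obj, _seen=None):
    """Count the leaves of the object graph reachable from obj.

    Dicts, lists, tuples, sets and objects with __dict__ are walked;
    anything else counts as one leaf.
    """
    if _seen is None:
        _seen = set()
    if obj is None or isinstance(obj, (str, bytes, int, float, bool)):
        return 1
    if id(obj) in _seen:
        return 0
    _seen.add(id(obj))
    if isinstance(obj, dict):
        children = obj.values()
    elif isinstance(obj, (list, tuple, set)):
        children = obj
    elif hasattr(obj, "__dict__"):
        children = vars(obj).values()
    else:
        return 1
    return sum(count_leaves(child, _seen) for child in children)


# Hypothetical stand-ins for the rich arguments
http_context = {
    "request": {
        "raw_url": "/login",
        "method": "GET",
        "headers": {"Host": "example.com", "Cookie": "session=xyz"},
    },
    "session": {"id": "abc"},
}
user = {"user_name": "alice", "email": "alice@example.com", "roles": ["admin", "editor"]}

print(count_leaves([http_context, user]))  # 9 leaves in scope for the wide signature
print(count_leaves(["/login", "alice"]))   # 2 leaves for the narrow one
```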

Reducing scope means the object graph that is available is smaller. As a consequence, the developer has fewer concepts to keep track of and can instead concentrate on the job in hand, which is in this case writing session logs.

With the method signature as it stands, the code could potentially access all kinds of information from the HttpContext, and from the outside there is no way of telling: the only way is to read the code.

As is often the case, the problem is laid bare when we try to write some unit tests around this method. To exercise LogSession, we would need to provide an HttpContext and a User object, and to determine which properties would need to be set, the only way forward would be to read the code.

Now I have nothing against reading code, on the contrary I positively love reading code. But to understand how a method works, it shouldn’t be necessary. It should be obvious what a method is going to do from the outside. We shouldn’t have to fish out the source code and wade through all the ifs and fors.

If UX design is centred around the user experience of an app, then when writing code we should be thinking of DX design: focusing on the developer experience.

So can we improve this method? Well let’s try by reducing the scope. To log a session, all that is required is a url and a username, so let’s see how that works out.

void LogSession(string url, string username)
{
  this.db.tblSessionLogs.Add(new SessionLog(username, url));
}

By requiring two basic types we have refined the expectation of what this method does. We shouldn’t need access to the source code and so the developer experience is improved.

As a welcome side effect, writing a test becomes much easier; no need for partially initialized HttpContext and User objects.
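To make the testing benefit concrete, here is a Python analogue of the narrowed method; it is a sketch, not the original C# code, and the SessionLogger class and the list standing in for db.tblSessionLogs are hypothetical. The test needs nothing more than two strings and an empty list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLog:
    username: str
    url: str


class SessionLogger:
    """Python analogue of the narrowed LogSession; the store is a plain list here."""

    def __init__(self, store):
        self.store = store  # hypothetical stand-in for db.tblSessionLogs

    def log_session(self, url: str, username: str) -> None:
        self.store.append(SessionLog(username, url))


def test_log_session():
    # No partially initialized context or user objects required
    store = []
    SessionLogger(store).log_session("https://example.com/login", "alice")
    assert store == [SessionLog("alice", "https://example.com/login")]


test_log_session()
```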

Now you could say that I have just displaced the problem, and I think to some extent that is true. The calling code now needs to break down the HttpContext and User objects whereas before it just passed those in without any fuss. I would argue that the calling method is better placed to do that job. It will probably know all about the HttpContext and User objects since they are already in its scope.

But reducing scope brings another welcome benefit: the method is now more reusable. It isn’t limited to logging HTTP user sessions; it could just as well log FTP sessions. Although the example is a little contrived, this generalisation comes about precisely because the scope has been reduced.

We may be able to apply the same reduction to the calling code too. It might be that it has no direct use of the HttpContext apart from when logging sessions. The code could be simplified several levels through the call stack as we work up through the calling methods.

So to wrap up, reducing the scope of a function is a simple technique, but it can bring several benefits: reasoning about the code becomes easier, writing unit tests is simplified, and the code may be more reusable.