Software Engineerium

Matt Steward


Performance Driven Development

User choice is growing whilst user patience is diminishing. We no longer have a captive audience of internal users in predictable numbers. Sometimes the assumption is that we can buy ourselves out of trouble and infinitely (auto) scale in the cloud. But if you're not measuring and optimising performance, how can you be sure this isn't wasted spend?

Mention of the O word will often lead to a debate about the folly of early optimisation. It's arguably pointless unless metrics for performance-based acceptance criteria are in place; there is no doubt that it burns additional cycles; and highly (and potentially over-) optimised code can be hard to read and maintain, all with potentially no benefit to the product or user experience. However, if it's possible to measure performance with limited investment in time, doesn't that make sense? Even if only to drive out benchmarks against which any post go-live optimisations are measured, or to highlight performance regression in subsequent releases.

Benchmarking

Leaving aside the early optimisation debate, it’s indisputable that you can’t improve what you can’t measure. And for taking these measurements, the best tool out there is BenchmarkDotNet.

There is a load of material on their website, and elsewhere, on how to use BenchmarkDotNet, but by way of an intro I would sell it as: an incredibly comprehensive but low-cost-of-entry performance measuring tool that requires no instrumentation of your code and takes measurements (benchmarks) with a unit-testing style of approach.

This means that if you're already writing unit testable code it requires very little additional effort (literally measured in minutes) to also benchmark your code. Once you have the boilerplate in place (a very rudimentary 'one line' console app and a single 'benchmarking' class as a minimum), you can simply annotate expression-bodied functions that call the subject under test. For example:

[Benchmark]
public string NameFormatterBenchmark() => myClass.FormatName("Matt Steward");

As with unit-testing tools, you can set up and tear down, both globally and per iteration, and parameterise your benchmarks. This allows the benchmark to measure just the execution of the SUT, without also measuring the instantiation of the owning class, its parameters and so on. There are myriad configuration options for the tool, but one I would strongly recommend is the MemoryDiagnoser. This includes GC collections and memory allocation in your results, which is very useful for the extrapolation step described later.
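To make that concrete, here is a minimal sketch of what that boilerplate might look like. The NameFormatter class and its inputs are purely illustrative inventions; only the BenchmarkDotNet attributes and runner are the real API.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Illustrative subject under test.
public class NameFormatter
{
    public string FormatName(string name) => name.Trim().ToUpperInvariant();
}

[MemoryDiagnoser] // include GC collections and allocated bytes in the results
public class NameFormatterBenchmarks
{
    private NameFormatter myClass;

    [Params("Matt Steward", "A Much Longer Name Than Usual")] // re-run the benchmark for each input
    public string Name { get; set; }

    [GlobalSetup] // runs once, so instantiating the owning class isn't part of the measurement
    public void Setup() => myClass = new NameFormatter();

    [Benchmark]
    public string NameFormatterBenchmark() => myClass.FormatName(Name);
}

// The very rudimentary 'one line' console app that runs the benchmarks (build and run in Release).
public class Program
{
    public static void Main() => BenchmarkRunner.Run<NameFormatterBenchmarks>();
}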

Remember that at this point we haven't optimised (early or otherwise), and we aren't attempting to reconcile our benchmarks with any acceptance criteria. We have simply, with minimal effort, driven out some reference benchmarks. On the subject of acceptance criteria, even if any have been given they're unlikely to apply at this micro-benchmark level: 'the user will receive a response to their initial enquiry within 2 seconds when the system is managing 1000 active connections' is hard, if not impossible, to reconcile with a multitude of extremely granular micro-benchmarks. So if these benchmarks can't be used to sign off on performance-based acceptance criteria, what use are they?

Well, the performance of a system is the (performance) sum of its parts, so if we can identify quick optimisation wins we are likely to improve overall system performance, and in doing so increase the likelihood of acceptance criteria being met.

Firstly, apply your gut. Experience will give you a feel for how long something should take: if your benchmarks show that your email address validation function takes a second to complete or allocates 1 KB, something is awry. Then identify hot paths and extrapolate. Say you have a function that extracts a message identifier from every message your micro-services receive, and they receive two million messages an hour: extrapolate how many GC collections will take place each hour, or the total time lost to this function every hour. Does it frighten you? If yes, look to optimise. If you have your 'before' benchmark in place, and good, proven unit test coverage, you should be able to iterate quickly with full visibility of any performance improvement (or regression), plus any inadvertent functional regression.
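By way of a rough worked example (the figures below are invented for illustration, not taken from a real benchmark run):

// Hypothetical figures plugged into the extrapolation - not real measurements.
const double nanosecondsPerCall = 850;      // mean execution time reported by BenchmarkDotNet
const double bytesAllocatedPerCall = 320;   // 'Allocated' column from the MemoryDiagnoser
const double messagesPerHour = 2_000_000;

double secondsLostPerHour = nanosecondsPerCall * messagesPerHour / 1_000_000_000;            // ~1.7s of CPU time per hour
double megabytesAllocatedPerHour = bytesAllocatedPerCall * messagesPerHour / (1024 * 1024);  // ~610 MB of GC pressure per hour

System.Console.WriteLine($"{secondsLostPerHour:F1}s lost and {megabytesAllocatedPerHour:F0} MB allocated per hour");

A couple of seconds of CPU time per hour is probably nothing to worry about; hundreds of megabytes of avoidable allocations feeding the GC every hour might well be.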

In terms of performance regression testing, you could bake this tool into your DevOps pipeline. The tool does take a decent chunk of time to deliver its results (one area where it deviates significantly from unit testing), so I would not recommend including it in CI, but you could certainly re-run the benchmarks against any production candidate through automation, highlight any performance regression and fail the release.
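As a sketch of what that automation could look like (assuming a dedicated benchmarks project that a pipeline stage runs with dotnet run -c Release), you can swap the single BenchmarkRunner call for BenchmarkSwitcher so that arguments, such as a filter for which benchmarks to run, can be passed in from the pipeline:

using BenchmarkDotNet.Running;

public class Program
{
    // BenchmarkSwitcher discovers every [Benchmark] class in the assembly and honours
    // command-line arguments, which makes it easy to drive from an automated pipeline stage.
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}

The comparison against the previously recorded baseline, and the decision to fail the release, would sit in your own pipeline scripting rather than in BenchmarkDotNet itself.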

The .NET team's continuous focus on performance improvement should translate into continuous performance improvements for us consumers of .NET Core. Some improvements are inherent and can be acquired by simply upgrading to the latest version, but others, for example using Span<T>, require refactoring. Having a baseline of benchmarks means that determining whether or not your refactor has resulted in tangible improvements should be straightforward.
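For instance, here is a sketch of how a Span<T> refactor might be measured against the original (the message format and the parsing code are purely illustrative): mark the existing implementation as the baseline and BenchmarkDotNet will report the ratio and allocation figures side by side.

using System;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class MessageIdBenchmarks
{
    private const string Message = "12345:some-message-payload"; // illustrative message format

    [Benchmark(Baseline = true)] // existing implementation: Substring allocates a new string per call
    public int ExtractIdWithSubstring() =>
        int.Parse(Message.Substring(0, Message.IndexOf(':')));

    [Benchmark] // Span<T> refactor: parses a slice of the original string without allocating
    public int ExtractIdWithSpan() =>
        int.Parse(Message.AsSpan(0, Message.IndexOf(':')));
}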

In Conclusion…

I think all of the above can be considered a form of Performance Driven Development. Sure, we don't start with a failing test; instead we start with a benchmark, and indeed this benchmark is neither a pass nor a fail at this stage. But it gives us a reference point. From it we can extrapolate whether or not the SUT is likely to cause performance issues in the wider context of the overall system, and use it as the yardstick for establishing whether attempts to optimise have succeeded or failed. We can automate benchmarking within our delivery pipeline. I have been benchmarking for a while now; creating a benchmark at the same time as building out the unit tests is a relatively small additional overhead, and it has given some real gains in terms of early visibility of code that is performing outside the expected bounds.