## Archive for the ‘**In The Field**’ Category

## Improving the Temperature Metric

In the book Codermetrics, I introduced a metric called Temperature to measure how ‘hot’ or ‘cold’ a coder is at any given time, which might be useful to know when you are planning new work or analyzing team performance. This is an idea borrowed from Bill James, who introduced Temperature as a metric for measuring hot and cold streaks in baseball (you can read an overview of the baseball version here or watch a video of Bill James explaining it here). The formula that I used in the book sets the starting Temperature for any developer at 72 degrees (“room temperature”, also borrowed from Bill James), and then moves the Temperature up or down in proportion to the change in “Points” accumulated by the individual from one development iteration to the next. So the formula looks like:

Current Temperature = Previous Temperature * (Points in The Most Recent Iteration / Points in The Prior Iteration)

A reader, Alex Mera, recently pointed out that this formula is flawed in that it is entirely relative to each individual and that significant differences in early iterations make Temperature ratings difficult to compare. As Alex correctly stated, for example, “scoring low on the first iteration will raise your temperature on every subsequent iteration.” While the formula can show you the trend for each developer, two people performing similarly might have two very different Temperatures (based on the results of much earlier performance) and two people with the same Temperature might actually have very different recent performance.
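To make Alex's point concrete, here is a minimal Python sketch of the book's formula. The function name and the Points values are hypothetical, chosen only for illustration (they are not the example data discussed in this article). Because each step multiplies by the ratio of consecutive Points, the ratios cancel and the latest Temperature always works out to 72 times the latest Points divided by the first iteration's Points.

```python
def chained_temperatures(points, start=72.0):
    """Apply the book's formula: each Temperature is the previous one
    scaled by (Points this iteration / Points in the prior iteration)."""
    temps = [start]
    for prev, cur in zip(points, points[1:]):
        temps.append(temps[-1] * cur / prev)
    return temps

# Hypothetical Points: identical except that the first two iterations are swapped.
coder_a = [16, 24, 22, 26]
coder_b = [24, 16, 22, 26]

temps_a = chained_temperatures(coder_a)
temps_b = chained_temperatures(coder_b)

# Same recent Points, very different Temperatures, because each series
# is anchored to that coder's first iteration.
print(temps_a[-1])  # 72 * 26 / 16
print(temps_b[-1])  # 72 * 26 / 24
```

In this sketch both coders scored 26 Points in the latest iteration, yet the coder who started lower ends up with a much higher Temperature.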

Take for example two people, Coder A and Coder B, who have the following Points over twelve development iterations (you can assume for this example that Points are assigned to a completed task based on the complexity of each task):

In the example data, the Points per iteration are not that different except in a few places. Each coder has one early iteration where the Points are 16: for Coder A it’s the first iteration, and for Coder B it’s the second. The other noticeable difference is a rise in Coder B’s Points in later iterations.

If you use the Temperature metric as calculated in the book for this data, then the results look like this:

Although the Points were similar, the Temperatures are very different. This is because the “baseline” for each coder’s “room temperature” is the number of Points in their first iteration, which for Coder A was much lower than for Coder B, resulting in much higher Temperatures overall for Coder A. In later iterations, when Coder B is clearly “hotter” than Coder A, Coder B’s Temperature is still lower. You can see the trends, and you could say that both Coder A and Coder B are “hot” when their Temperatures are, for example, above 90 degrees, but the difference highlights the kind of problem that Alex noted.

So what can you do to change and improve this? One technique would be to set the baseline “room temperature” in a different way. For example, if you knew that 24 Points was the average for a developer in each iteration, you could use 24 Points as room temperature, and compare the Points in every iteration to that. You might get this baseline by taking the average for all the individuals on the team (maybe just for recent iterations or maybe over a longer period) or you might only use the average for each individual (in other words, compare each individual to their own average). While there are a number of variations you could use, each with different benefits, the general approach can be described with the following formula:

Current Temperature = Room Temperature * (Points in the Most Recent Iteration / Points for Room Temperature)

If you set room temperature to 72 and the points for room temperature to 24 for both coders, then using the example data above you get the following results:

This appears to be a much “fairer” way to calculate Temperature, and it provides a good way to compare individuals since all Temperatures are relative to the same baseline. The variance based on the differences in the initial iterations is gone. Also, using this type of approach you are better able to evaluate what 80 degrees means versus 70 degrees or 90 degrees. On the negative side, however, you’ll notice that the graph of Temperature looks pretty much exactly the same as the graph of Points above. All you’ve really done is translate Points into a different metric, which may provide a different way to analyze Points, but the Temperature rises and dips exactly as the Points do.
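A short Python sketch of the fixed-baseline formula shows why the Temperature graph mirrors the Points graph: every value is just Points multiplied by the constant 72 / 24 = 3. The function name and the sample Points are hypothetical, used only to illustrate the calculation.

```python
def baseline_temperatures(points, room_temp=72.0, room_points=24.0):
    """Temperature relative to a shared baseline:
    room_temp * (Points in the iteration / Points for room temperature)."""
    return [room_temp * p / room_points for p in points]

# Hypothetical Points for one coder over four iterations.
points = [16, 24, 28, 32]
temps = baseline_temperatures(points)

for p, t in zip(points, temps):
    # With the defaults, Temperature is always exactly 3x the Points.
    print(p, t)
```

With these defaults, an iteration at the 24-Point baseline reads 72 degrees, and 80 degrees corresponds to roughly 26.7 Points, which is what makes the scale easier to interpret across individuals.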

So another improvement to consider would be to look at a group of recent iterations together, as opposed to one at a time. Since Temperature is meant to measure “hot” and “cold” it makes sense that it should focus on trends and not just on one period. To do this, you can use moving averages, which would modify the formula to the following:

Current Temperature = Room Temperature * (Moving Average of Points in Recent Iterations / Points for Room Temperature)

If you set room temperature to 72 and the points for room temperature to 24, and then you calculate the moving average over the three most recent iterations at every point, your results will look like this:

This approach, using a common baseline for room temperature, and making use of moving averages, probably gives you the best result and the best response to Alex’s concern. Temperature is more comparable this way, and less subject to isolated bursts or dips.
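The moving-average version can be sketched in Python as follows. The function name and the sample Points are hypothetical, and one detail is an assumption of this sketch rather than something the formula specifies: for the first iterations, before a full window exists, it averages whatever iterations are available.

```python
def moving_average_temperatures(points, window=3, room_temp=72.0, room_points=24.0):
    """Temperature from the moving average of the most recent `window`
    iterations, measured against a shared baseline."""
    temps = []
    for i in range(len(points)):
        # Assumption: early iterations use a shorter window.
        recent = points[max(0, i - window + 1): i + 1]
        average = sum(recent) / len(recent)
        temps.append(room_temp * average / room_points)
    return temps

# Hypothetical Points with a one-iteration burst in the third iteration.
points = [24, 24, 48, 24]
smoothed = moving_average_temperatures(points)
single = moving_average_temperatures(points, window=1)  # no smoothing

print(smoothed)
print(single)
```

In this sketch the single-iteration Temperature doubles during the burst and falls straight back, while the three-iteration average rises more modestly and stays elevated for as long as the burst remains inside the window.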

Other improvements that could be considered:

- Rather than just using Points, calculate Temperature from a combination of other metrics, taking into account other positive contributions (like helping others) to increase an individual’s Temperature and negative outcomes (like attributable production bugs) to decrease Temperature; this, however, would require a more complex formula for increasing and decreasing Temperature, so while I think this idea has merit, because of the increased complexity I decided not to delve into the details in the book or here (I may do that at a later time, but my main interest so far has been to share the idea of Temperature and “hot” and “cold” streaks for software engineers)
- Rather than only calculating the moving average of recent iterations, you could have multiple moving averages with different weights; for example, you could take the moving average of the three most recent iterations and weight that as 75% and then take the moving average of the three prior iterations and weight that as 25%, giving you a longer trend to feed into the Temperature formula
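The weighted variant in the second bullet might be sketched like this in Python. The function name, parameter names, and sample Points are hypothetical; the 75% and 25% weights and the three-iteration windows come from the example in the bullet.

```python
def weighted_trend_temperature(points, window=3, recent_weight=0.75,
                               room_temp=72.0, room_points=24.0):
    """Blend the average of the most recent `window` iterations with the
    average of the `window` iterations before them, then convert the
    blended Points to a Temperature against a shared baseline."""
    if len(points) < 2 * window:
        raise ValueError("need at least 2 * window iterations")
    recent = points[-window:]
    prior = points[-2 * window:-window]
    blended = (recent_weight * sum(recent) / window
               + (1 - recent_weight) * sum(prior) / window)
    return room_temp * blended / room_points

# Hypothetical Points: three iterations at 20, then three at 28.
current = weighted_trend_temperature([20, 20, 20, 28, 28, 28])
print(current)  # blended Points = 0.75 * 28 + 0.25 * 20 = 26
```

The older window pulls the result below what the three most recent iterations alone would give, which is the longer-trend effect described above.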

For what it’s worth, all of the possible improvements mentioned in this article are in line with the techniques used by Bill James in his baseball Temperature metric. Other, more detailed tweaks could be identified, too. But as usual I suggest starting with a simple approach that’s understandable and explainable for you and your team, knowing that you can add more complex metrics later.

## Cost of Implementing Metrics

Paul Loubriel asks “**What do you think would be the expense to the organization for implementing a program like the one you suggest (in terms of time and resources). For example, for an organization with 1,000+ developers?**”

Calculating the cost of gathering and using metrics within normal development processes isn’t something I’ve done in the past. To help analyze the costs, I created a simple spreadsheet, and I’ve added that into the Spreadsheet Template Resources (it is the one titled **Calculated Costs of Implementing Metrics**). The spreadsheet is set up so you can input numbers for your organization at the top, then it will calculate the costs below.

The initial costs for implementing metrics will be in any new tools you might have to buy, and the work involved in setting up the initial tools. If, as I suggest in Codermetrics and as I do myself, you just use existing tools or you use free software like the Spreadsheets in Google Docs, then your additional tools investment is probably zero. The other setup cost is the time it takes you to adjust spreadsheets to your liking, to set up data gathering processes (maybe new mailing lists), and to train everyone. This setup can probably be done by one person or a small group and then shared with others, but for simplicity in the spreadsheet I’ve put placeholders to specify the hours for setup and training for both managers and coders.

The ongoing costs will be the time spent gathering data, and the time spent by managers analyzing or preparing metrics to be used in team meetings or in other reviews. In the spreadsheet I’ve allowed input of the weekly amount of time spent by coders and by managers. Putting in these numbers, along with average salaries, allows you to estimate the costs of the time involved. Note that in the spreadsheet I’ve made a few assumptions which you could adjust if you choose. One is that the inputs for time spent are weekly minutes, which are then multiplied by four to get the monthly time spent (rounded to hours). And the calculations use 1880 as the number of hours worked per year (which I arrived at by assuming 5-day work weeks, 8-hour days, and 47 weeks worked in a year). I’m also assuming you don’t hire or assign full- or part-time specialists to your metrics program, which is something that might make sense in a very large organization but is probably not something you would do immediately.
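For readers who prefer code to spreadsheets, the calculation described above can be approximated with a short Python sketch. This is not the spreadsheet's actual formulas: the class, the function names, and every input value below are hypothetical, and only the three stated assumptions (weekly minutes multiplied by four, monthly time rounded to hours, 1880 working hours per year) come from the description above.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 1880  # 5-day weeks * 8-hour days * 47 weeks

@dataclass
class Group:
    headcount: int
    weekly_minutes: float   # time spent on metrics per person per week
    setup_hours: float      # one-time setup and training per person
    salary: float           # average annual salary

    @property
    def hourly_rate(self):
        return self.salary / HOURS_PER_YEAR

def monthly_hours(weekly_minutes):
    # Weekly minutes * 4, rounded to whole hours. Note that Python's
    # round() sends exact halves to the nearest even number, which may
    # differ from how a spreadsheet rounds.
    return round(weekly_minutes * 4 / 60)

def recurring_annual_cost(group):
    return group.headcount * monthly_hours(group.weekly_minutes) * 12 * group.hourly_rate

def setup_cost(group):
    return group.headcount * group.setup_hours * group.hourly_rate

# Hypothetical inputs, not the figures used in this article.
coders = Group(headcount=10, weekly_minutes=30, setup_hours=2, salary=94_000)
managers = Group(headcount=2, weekly_minutes=60, setup_hours=4, salary=112_800)

annual = recurring_annual_cost(coders) + recurring_annual_cost(managers)
first_year = annual + setup_cost(coders) + setup_cost(managers)
print(round(annual), round(first_year))
```

Swapping in your own headcounts, salaries, and time estimates gives a rough first-year and recurring figure in the same spirit as the spreadsheet.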

In my experience, the amount of time necessary for coders to gather the type of metrics I discuss in the book is small (since most of the data will come from existing systems), and the time spent by managers in data gathering, analysis, and preparation for presentation, might amount to two or three hours every few weeks. Assuming that time spent is something like that, then using the spreadsheet the costs for a fairly small organization might look like those shown in the screenshot below (this is basically the size of my current engineering team at Vocalocity). In this case it’s about $15K for the first year and about $14K per year after that.

If, as Paul asks, we calculate for 1,000 coders, assuming that setup and weekly time spent are the same, and guessing that maybe there are 180 managers in an organization that size, then the spreadsheet gives the estimated costs below. In this case, even though the time spent per week is modest, the year-one (~$965K) and recurring annual (~$862K) costs are of course much higher since so many people are involved.

You can add your own inputs and play around with the numbers or formulas in the spreadsheet to determine costs for your organization. The interesting thing I noticed about the time and calculations I put in, based on my own estimates, is that the cost of implementing metrics in the two examples above comes out to something less than but maybe around $1K per year per coder. I’m certainly interested to hear if others have feedback on this.