Archive for the ‘New Ideas’ Category
Frequently in software engineering we face hard choices about system and architecture changes. This isn’t unique to software development, of course, but like many other fields we often have to weigh decisions about whether it’s better to fix, upgrade, or replace problem components.
Some decisions are straightforward and obvious, especially when the pain of sticking with what we have is great and a better alternative exists. It’s also easier when our software is in an early stage, especially if only a small number of users are affected. In other words, it’s easy to decide if the risk is low and the chance of a positive outcome is high.
But as our software matures and our user base grows, decisions become harder. “Rip and replace” gets riskier, so it makes sense to have some kind of process or model to help us weigh the risks and make a good decision.
One analytic framework for architecture or other potentially risky change decisions can be found in the domain of Risk Arbitrage. Risk arbitrage is an area of financial speculation where an investor tries to take advantage of the discrepancy in stock prices between two public companies involved in a merger or acquisition (M&A). Public companies in M&A create opportunities for investors to profit if they can forecast whether or not the merger will go through. The key to successful risk arbitrage is being able to assess the risk and likely outcomes of corporate mergers, as well as the likely results on stock prices, and to use this information to make investment decisions.
Expected Value is a concept from risk arbitrage that you can apply to other fields including software engineering. The idea of Expected Value is to come up with a single number that measures potential gain versus potential loss based on estimated probabilities. The formula can be written as:
Expected Value = (Forecast Gain x Success Probability) – (Forecast Loss x Failure Probability)
In the case of risk arbitrage, Expected Value is used to compare the estimated stock price gain if an acquisition is successful versus the expected stock price reduction if an acquisition fails. By estimating the stock prices and the probabilities, a risk arbitrageur can use Expected Value to guide his or her investment decision.
To use this concept in software engineering decisions, you need to start by finding a way to forecast gain or loss for a specific project. Example measures are:
- The number of minutes or hours of system problems or downtime
- The count of system failures or outages or reported user issues
- The response times of specific components
Any of the above can work. What you need to do is to choose one that is applicable to your project and then estimate the potential gain or loss. For the gain, forecast your expected results. For the loss, forecast what could realistically happen if something goes wrong when you put the change into production. Finally, you need to estimate the probability of success (no unexpected problems in rollout) and failure (unexpected problems in rollout).
For example, let’s say you’re considering an upgrade of a core component in your system. Suppose that bugs in the current version are causing four hours of production problems per month, and you believe the new version will eliminate the production issues. But if the upgrade fails, it could cause ten hours of system problems while you either fix the issue or roll back. You estimate the likelihood of success as 90%, which leaves a 10% likelihood of failure. You can calculate your Expected Value as:
Expected Value = (4 hours gained x 90%) – (10 hours lost x 10%) = 3.6 – 1.0 = 2.6 hours gained, or roughly 3 hours
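The calculation is simple enough to script, which makes it easy to try different estimates. The following is a minimal sketch rather than a definitive tool; the function name `expected_value` and its parameters are illustrative, and it assumes success and failure are the only two outcomes.

```python
def expected_value(gain, p_success, loss, p_failure=None):
    """Expected Value = (gain x P(success)) - (loss x P(failure))."""
    if p_failure is None:
        # Assumption: success and failure are the only two outcomes.
        p_failure = 1.0 - p_success
    return gain * p_success - loss * p_failure


# Figures from the upgrade example: 4 hours gained, 10 hours at risk, 90% success.
ev = expected_value(gain=4, p_success=0.90, loss=10)
print(f"Expected Value: {ev:.1f} hours gained")
```

With the example figures this prints an Expected Value of 2.6 hours, which rounds to about 3 hours. Re-running it with a lower success probability or a larger forecast loss shows how quickly the value drops toward zero.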
So how can that help you make a decision?
First of all, you’re forcing yourself to think analytically about the project. You have to identify a measure, forecast the likely gain and the potential loss, and estimate the probability of each. If the potential gain is low or the likelihood of failure is high, the Expected Value will come out near or below zero, and that tells you the project is either too low-value or too high-risk.
Second, you can now compare the Expected Value to your current situation. In the above example, the current version is causing four hours (per month) of production problems and the Expected Value of the upgrade is a gain of about 2.6 hours (roughly three), or about a 65% improvement. Looking at it this way, Expected Value can help you decide if a project is worthwhile. Obviously it will depend on the details and the effort required for the project, and you can develop your own rule of thumb, but personally I think that an Expected Value that shows 50% or more improvement indicates a project that is at least worth serious consideration, while an Expected Value that shows less than 20% improvement indicates a project that may not be worth the effort.
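The rule of thumb can be written down as a small helper. This is only a sketch: the 50% and 20% thresholds come from the paragraph above, while the function names and the "judgment call" label for the middle band are assumptions added for illustration.

```python
def relative_improvement(expected_value, current_cost):
    """Expected Value as a fraction of the current cost (same unit, e.g. hours)."""
    return expected_value / current_cost


def rule_of_thumb(improvement, consider_at=0.50, skip_below=0.20):
    """Map a relative improvement to a rough recommendation.

    The thresholds follow the rule of thumb in the text; the label for
    the band in between is an assumption.
    """
    if improvement >= consider_at:
        return "worth serious consideration"
    if improvement < skip_below:
        return "may not be worth the effort"
    return "judgment call"


ev = 4 * 0.90 - 10 * 0.10  # unrounded Expected Value from the example, in hours
improvement = relative_improvement(ev, current_cost=4)
print(f"{improvement:.0%}: {rule_of_thumb(improvement)}")
```

Using the unrounded Expected Value of 2.6 hours, the improvement comes out at 65%; rounding to 3 hours first gives 75%. Either figure clears the 50% bar.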
Expected Value is an analytic technique to frame your thinking and analysis on a decision. You will need to decide how high the Expected Value should be to make a project worthwhile. Of course, Expected Value alone is not enough to make a go or no-go decision. In the next article in this series I will cover another analytic concept that can be used to help in critical software engineering decisions.
The folks at Business of Software have published an article I wrote discussing one simple way to measure teams. The technique borrows some philosophy from Clayton Christensen’s new book How Will You Measure Your Life? The technique involves a single session in which a team identifies the qualities that matter most and then self-evaluates on those qualities, and the idea is meant to be useful for engineering but also other types of teams (like marketing or sales). You can read the full article here.
We’re at the start of another major league baseball season. Hope springs eternal for every team, though for some more than others. Here’s a simple question: why are some fans more hopeful or confident of their team’s chance of success, while others are less hopeful or downright pessimistic? The answer has to do with a simple concept we can call Variance, and it’s a useful concept to consider in the analysis of any team, including software teams.
The concept of Variance is tied directly to track record and predictability. An individual (or team) with a consistent track record as measured for a specific area or skill over time, and for whom factors have not changed significantly, can be categorized as Low Variance. A Low Variance individual or team, therefore, is more predictable, meaning that their consistent track record is more likely to be repeated in the “near-term” future. Alternatively, an individual (or team) with an inconsistent or insufficient track record, or for whom critical factors related to performance have changed significantly, can be categorized as High Variance. A High Variance individual or team is less predictable, meaning that there is a greater likelihood that they will do much better or much worse than might be guessed from what was seen before. In other words, High Variance means that there is a much wider range of outcomes, good or bad, that have a higher probability to occur.
A simple way to categorize whether an individual or team is High or Low Variance would be to assign subjective percentages to how likely they are to perform much better, similar to, or much worse than before. For example, a Low Variance individual or team might be projected as:
- 15% Likely to Perform Significantly Better Than Recent History
- 70% Likely to Perform Similar To Recent History
- 15% Likely to Perform Significantly Worse Than Recent History
Alternatively, a High Variance individual or team might be projected as something like this:
- 30% Likely to Perform Significantly Better Than Recent History
- 40% Likely to Perform Similar To Recent History
- 30% Likely to Perform Significantly Worse Than Recent History
In the Low Variance example, the total probability of variance from recent history is placed at 30%, while in the High Variance example the variance probability is 60%. By definition, High Variance has a higher chance of “upside” but a higher chance of “downside” too.
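One way to keep these projections honest is to record them in a structure that checks the three percentages sum to 100%. The sketch below is illustrative only; the `Projection` class and its field names are assumptions, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """Subjective odds of performing better than, similar to, or worse than recent history."""
    better: float
    similar: float
    worse: float

    def __post_init__(self):
        # The three outcomes are meant to cover every case.
        if abs(self.better + self.similar + self.worse - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")

    @property
    def variance_probability(self):
        """Total chance of departing from recent history, up or down."""
        return self.better + self.worse


low_variance = Projection(better=0.15, similar=0.70, worse=0.15)
high_variance = Projection(better=0.30, similar=0.40, worse=0.30)

print(f"Low Variance:  {low_variance.variance_probability:.0%}")
print(f"High Variance: {high_variance.variance_probability:.0%}")
```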
So what does all this have to do with baseball or software teams? The answer is that while having more High Variance individuals may make you feel more hopeful about your team’s upside (hello Royals and Astros fans) the probability of multiple “risks” paying off is low. Having more Low Variance individuals (hello Yankees and Red Sox fans) is a much better prescription for repeatable success.
To illustrate, suppose there are two teams with twelve individuals each. These teams are made up of veterans with consistent track records, some high performing and some medium performing, and “unproven” team members who don’t have a significant track record and who lack backgrounds that would make you highly confident in success. Let’s call the first team “Sky’s-The-Limit” because they have a lot of unproven individuals who they are hoping will be a big success. Let’s call the second team “Steady-As-She-Goes” because they have a lot of medium performing veterans. The team make-ups are:
- Sky’s-The-Limit team members: 1 high performing, 2 medium performing, 9 high potential
- Steady-As-She-Goes team members: 1 high performing, 8 medium performing, 3 high potential
The question is: which team is likely to have better results?
Assume that 1-in-3 of the High Variance individuals becomes a high performer, 1-in-3 becomes a medium performer, and 1-in-3 becomes a low performer. Then the results would be:
- Sky’s-The-Limit results: 4 high performing, 5 medium performing, 3 low performing
- Steady-As-She-Goes results: 2 high performing, 9 medium performing, 1 low performing
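The arithmetic behind these results can be sketched as follows. The function name `projected_team` is hypothetical, and the code simply applies the 1-in-3 assumption stated above to the unproven members of each team.

```python
from collections import Counter


def projected_team(high, medium, unproven):
    """Split the unproven members evenly into high, medium, and low performers."""
    if unproven % 3:
        raise ValueError("this sketch assumes the unproven count divides by 3")
    share = unproven // 3
    return Counter(high=high + share, medium=medium + share, low=share)


sky = projected_team(high=1, medium=2, unproven=9)
steady = projected_team(high=1, medium=8, unproven=3)

print("Sky's-The-Limit:   ", dict(sky))
print("Steady-As-She-Goes:", dict(steady))
```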
One view of these results would be that “the two teams perform exactly the same because the high-performing individuals make up for the low-performing ones.” But the reality on many teams is that the problems and issues related to low-performing individuals are disproportionate. It’s the “weak links in the fence” theory: teams with a greater number of low performers are dragged down by those individuals. In this view, the Steady-As-She-Goes team is stronger, in that it has a significantly smaller percentage of low performers (8% vs. 25%).
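The two views can be put side by side in a few lines. This is a sketch under a stated assumption: the "net score" counts each high performer as +1, each medium performer as 0, and each low performer as -1, which is one simple way to express the idea that high performers cancel out low ones.

```python
teams = {
    "Sky's-The-Limit": {"high": 4, "medium": 5, "low": 3},
    "Steady-As-She-Goes": {"high": 2, "medium": 9, "low": 1},
}


def net_score(team):
    # Assumed scoring: high = +1, medium = 0, low = -1.
    return team["high"] - team["low"]


def low_share(team):
    """Fraction of the team projected to be low performers."""
    return team["low"] / sum(team.values())


for name, team in teams.items():
    print(f"{name}: net score {net_score(team):+d}, "
          f"low performers {low_share(team):.0%}")
```

Under that scoring both teams net out the same, while the share of low performers differs by a factor of three.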
To the extent you can abide and succeed with a certain number of low performers, maybe because you make up for it in numbers or in enough known high performers, then having a bunch of High Variance individuals who could end up as low performers might not be a concern for you. Maybe it could even be a known part of your team-building and success-building strategy, like the venture capitalist strategy of betting on a bunch of long shots in hopes that one or a few will pay off big. But, in general, look out for pinning your team’s hopes on too many High Variance individuals. A set of steady performers is an important part of most successful teams, while teams that take too many personnel risks usually don’t get positive results.
To read how some High Variance players might factor into the chances of winning for your favorite baseball team, you can check out Jonah Keri’s article on the subject at Grantland.com. If you do have a rooting interest, here’s hoping that your baseball team has a great season.
If you follow baseball statistics or read baseball sportswriters like those at ESPN.com, over the last few years you may have heard about VORP or WAR. These intense-sounding acronyms stand for Value Over Replacement Player (VORP) and Wins Above Replacement (WAR). While the method of calculating these for baseball players is somewhat arcane (and in fact there are at least 3 different methods that people use to calculate WAR), the concept itself is simple and persuasive, which has led to the increased popularity of these statistics.
The general concept behind VORP and WAR is that a good way to identify the value of a player is to rate them relative to an average replacement player. For example, let’s say you want to rate the 3rd baseman for your favorite baseball team. You could look at his offensive and defensive statistics, and compare the details to 3rd basemen on other teams. But VORP and WAR attempt to create a single number that rates your 3rd baseman against an “average” replacement 3rd baseman. An above-average VORP/WAR means that your player contributes more than an average replacement.
This is based on common statistical techniques (various comparisons to average). A similar idea can be useful as a way to categorize coders and their skills. I call this Value Above Replacement, or VAR. Unlike baseball’s VORP or WAR, which seek to create a single metric to rate baseball players, the concept of VAR can be applied to any metric. The formula is based on using standard deviation, as follows:
For any given metric X –
- Calculate X for every coder
- Calculate the average (mean) of X across all coders
- Calculate the population standard deviation of X across all coders
- Calculate VAR X for each coder as VAR X = ((X of coder) – (average X)) / (standard deviation X), then truncate the result to a whole number (drop the fractional part)
For any given metric, this shows you how many “standard deviations” each coder is from the average. If you are familiar with normal distributions or bell curves, this applies the same concept. Someone who is in the top 3% for a given area (measured by metric X) might be a +2 for example, meaning that the person is 2 standard deviations above average (and, conversely, someone who is in the bottom 3% might be a -2).
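The steps above can be sketched in Python using only the standard library. The function name `value_above_replacement` and the sample values are hypothetical; the sketch assumes truncation means dropping the fractional part (so -0.5 becomes 0), and it returns 0 for everyone when all values are identical, which is a choice made here to avoid dividing by zero.

```python
import math
from statistics import mean, pstdev


def value_above_replacement(values):
    """Return the truncated number of standard deviations from the mean, per coder."""
    xs = list(values.values())
    avg = mean(xs)
    spread = pstdev(xs)  # population standard deviation
    if spread == 0:
        # Assumption: with no spread, everyone is treated as average.
        return {name: 0 for name in values}
    return {name: math.trunc((x - avg) / spread) for name, x in values.items()}


# Hypothetical metric values for three coders, purely for illustration.
sample = {"P": 10, "Q": 20, "R": 30}
print(value_above_replacement(sample))
```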
Those of you whose teachers “graded on the curve” in school might be recoiling at the thought of applying this to coders. But as I’ve mentioned in other writing on metrics, this isn’t a grading system. Metrics are best used as a categorization system useful to help you more objectively identify or confirm the strengths and weaknesses of individuals and teams. In this regard, VAR can be extremely helpful as a way to focus on the most meaningful data, namely the distribution and categorization of contributions and skills.
For example, let’s say you have 7 coders on a software team, and you measure their productivity by looking at the number of tasks each coder completes and the complexity of each task (this metric is called Points in Codermetrics). For a one month period, you might have data like the following:
- Coder A Productivity = 24 tasks completed x average task complexity 2 = 48
- Coder B Productivity = 20 tasks completed x average task complexity 2 = 40
- Coder C Productivity = 26 tasks completed x average task complexity 1 = 26
- Coder D Productivity = 38 tasks completed x average task complexity 1 = 38
- Coder E Productivity = 17 tasks completed x average task complexity 3 = 51
- Coder F Productivity = 22 tasks completed x average task complexity 2 = 44
- Coder G Productivity = 15 tasks completed x average task complexity 3 = 45
For this group, then, the average productivity for the month (rounded) is 42, and the standard deviation is 8. With these values, you can calculate the Productivity VAR for each coder. I’ve created an example Google Docs spreadsheet which you can access here and it has also been posted in Shared Resources. Below is a screenshot showing the calculated values for this group.
This provides a good example of how Coder VAR helps. If you just look at the Productivity metrics, as highlighted in the pie chart, it isn’t that easy to see how the values are grouped, and which values stand out as separate from the others. With Productivity VAR, you can easily see that there are three groups, one high (Coder E at +1), one low (Coder C at -2), and one in the middle (everyone else at 0).
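The grouping can be reproduced from the task counts and complexity values listed earlier. The sketch below is not the spreadsheet mentioned above; it is a separate, minimal illustration that uses the unrounded mean and population standard deviation, and the variable names are assumptions.

```python
import math
from statistics import mean, pstdev

# (tasks completed, average task complexity) for each coder, from the example data.
coders = {
    "A": (24, 2), "B": (20, 2), "C": (26, 1), "D": (38, 1),
    "E": (17, 3), "F": (22, 2), "G": (15, 3),
}

productivity = {name: tasks * complexity
                for name, (tasks, complexity) in coders.items()}
scores = list(productivity.values())
avg, spread = mean(scores), pstdev(scores)

# Productivity VAR: standard deviations from the mean, fractional part dropped.
var = {name: math.trunc((p - avg) / spread) for name, p in productivity.items()}

# Group coders by their VAR value, highest group first.
groups = {}
for name, score in var.items():
    groups.setdefault(score, []).append(name)

print(f"mean {avg:.0f}, standard deviation {spread:.0f}")
for score in sorted(groups, reverse=True):
    print(f"{score:+d}: {', '.join(groups[score])}")
```

The three groups fall out directly: Coder E alone at +1, Coder C alone at -2, and the remaining five coders at 0.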
In studying codermetrics for your software team, this is often the kind of information that can be extremely useful. How many (if any) coders are above-average or below-average in a specific area? What specific areas of strength might be lost if someone left and was replaced by an average coder? Do areas of strength or weakness correlate with coders’ level of experience, and what areas of weakness might be improved?
Coder VAR also provides a useful way to discuss and convey key findings from your metrics. This is part of what has driven the popularity of WAR in baseball. It’s easier to understand if you say “Coder E is plus one for Productivity” or “Coder B’s Productivity is average,” than if you say “Coder E’s Productivity was fifty-one” or “Coder B’s Productivity is forty.” Or if you are looking to hire a new coder, it might be useful to know and discuss that you are looking for “a coder whose Productivity is plus one.”
There are some limitations to be aware of when using VAR. For example, VAR draws a line that separates values that may in fact be close, such as Coder A (Productivity 48 and Productivity VAR 0) and Coder E (Productivity 51 and Productivity VAR 1) in the data above. This is a reminder of what VAR is useful for: it is a method of categorization, not a method of detailed analysis and certainly not a method of grading. Also, you should be careful about VAR analysis of coders who are known to have different levels of experience or who have very different roles on your team. Coder VAR is best used as an analysis of coders who are generally similar in experience and roles, and therefore you might want to analyze VAR separately for your senior and junior coders, for example.
The biggest limitation of Coder VAR is that it is clearly relative to your population, so it is limited if your population (the number of coders analyzed) is small and isolated. For example, if you have a team with three senior coders, then it can be somewhat useful to look at their Productivity VAR. But it would also be helpful to know how the coders’ Productivity compares to other senior coders, either on other teams in your organization, or in other organizations. Maybe your senior coders are all similarly productive (Productivity VAR of 0 within the team) but maybe they are highly productive compared to other senior coders (Productivity VAR +1 or +2 when compared across teams). This is a general problem with codermetrics, namely the lack of normalized data we all have and share, and something I hope to address more in the future. For now, those in larger organizations would be able to address this by measuring across teams and applying techniques to establish normalized baselines (something also discussed in my book).
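To make the cross-team idea concrete, here is a rough sketch that scores a small team against a wider baseline instead of against itself. All of the numbers are hypothetical and invented purely for illustration, and the function name `var_against_baseline` is an assumption; it also assumes the baseline population is measured with the same metric as the team.

```python
import math
from statistics import mean, pstdev


def var_against_baseline(values, baseline):
    """Score each coder against the mean and spread of a separate baseline population."""
    avg = mean(baseline)
    spread = pstdev(baseline)
    if spread == 0:
        raise ValueError("baseline has no spread to measure against")
    return {name: math.trunc((x - avg) / spread) for name, x in values.items()}


# Hypothetical Productivity values for senior coders across an organization.
org_senior_productivity = [30, 35, 40, 40, 45, 50, 40]

# Hypothetical three-person senior team with similar Productivity to one another.
team = {"Senior 1": 47, "Senior 2": 48, "Senior 3": 50}

print(var_against_baseline(team, org_senior_productivity))
```

In this made-up data the three coders are close to one another, yet each lands at +1 against the wider group, which is the kind of signal a within-team comparison cannot show.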
As with other metrics, however, if you are aware of the limitations then Coder VAR can still be very useful. It can help you to increase your understanding of team dynamics, to identify and analyze the characteristics of successful teams, and to plan ways to improve your software team.