TP-PM.1.0
21 April 1997
MANAGING RISK WITH METRICS
A TERM PAPER
FOR THE
MJY TEAM
SOFTWARE RISK MANAGEMENT WWW SITE
Prepared for:
Dr. Richard Bechtold
SWSE 625
Prepared by:
Pat McNeece
Approved by:
P. McNeece
MJY Program Manager
Abstract
Software intensive development projects still fail to be delivered on time, within budget, and with desired quality. One area of concentration in software project management that has developed to solve these problems is Risk Management, which attempts to assess and then control the risks that precipitate them. Concurrently, another management area that has received much effort and attention is software project management metrics, i.e. collecting measures and using the resulting metrics to gain insights into and control over software development projects. The connection between these two areas seems obvious: these project management metrics appear to be a useful tool for risk management. However, a survey of several risk management methodologies shows little mention of project management metrics as a tool to either assess or control risk.
This paper briefly discusses software metrics, including what they are and how they're used, and gives a sample set of attributes on which to focus a metrics program. It then surveys several risk management processes, looks at those steps of the process where tools are used, examines the tools being used, and then discusses using software metrics as a tool for risk assessment and control. One potential risk area, personnel shortfalls, and the corresponding measures and metrics are used as an example.
Despite much research and progress in the area of Software Project Management, software development projects still fail to deliver acceptable systems on time and within budget. Much of the failure could be avoided by managers proactively planning for and dealing with risk factors rather than waiting for problems to occur and then trying to react. Usually this reaction is too little and too late, because by the time the problem is fully recognized, the schedule has already slipped, a considerable investment has been made, and the product quality has suffered due to the introduction of errors or work-arounds. Risk management has been proposed as a way to provide insight into potential problem areas and to identify, address, and eliminate them before they derail the project.
In order to implement a successful risk management program, project managers need tools to help them. One possible tool is collecting measurements and using them as part of a metrics program. A good measurement program helps managers [1]:
Despite the acceptance of a metrics program as a project management tool, it is rarely discussed as a risk management tool. This seems an oversight. This paper will briefly discuss measurements and metrics as a project management tool, and look at several risk management methodologies and their enabling tools. Then it will propose that there are overlapping areas where a good metrics program can be used as a tool for portions of the risk management process. Selecting a potential risk area, this paper will give examples of ways in which metrics can help identify and track risk.
Much attention has been focused recently on software measurement. By collecting, analyzing and using measurement data, software project managers gain better visibility into and control over their projects. Measures and the metrics that can be derived from them allow managers to determine the status of their programs, track the project's progress against the plan, measure the quality of the product being developed, and become aware of potential problems. This data gives managers objective information that helps them make the daily decisions necessary to run their projects. Collecting it also provides a basis for improved estimates on future projects.
Most discussions on starting metrics programs stress that the measures chosen to be collected should be simple, and if possible be those already being collected for other uses. There are two reasons for this. First, managers and developers can balk at initiating a metrics program because they feel that measurement collection is time consuming, people-intensive, and expensive, and that it distracts workers from their primary task of developing software. Collecting measures can add anywhere from five to ten percent to the development cost of a program [2], so if a program can reuse data that is already being collected, e.g., timecard or financial data, the process is less expensive and more likely to be accepted. Second, collecting low level, simple data allows managers to use it in a variety of different analyses, allowing for more flexibility. It also allows managers to combine different measures to obtain a better metric which may tell them more than any measurement alone.
Project managers need to select a measurement and metrics set that is appropriate for their project, rather than pick a set and then try to fit their project to it. There are many measures and derived metrics that can be proposed, but it should be the program management and technical issues and objectives that drive the measurement requirements. Whatever set is chosen, it should be easy to collect and analyze, cover all phases of the life cycle, give management the desired insight into the project, and deal with specific, defined issues or attributes of the program.
There are two perspectives, feasibility and performance, which should be considered during measurement analysis. Feasibility is more static, dealing with the accuracy and realism of the plans, estimates, or assumptions associated with an issue. Performance is more dynamic, dealing with the adherence to those plans, estimates and assumptions [1].
There are many metric sets proposed throughout DoD and in Software Engineering literature. The following is one very simple set of "core" attributes which drive the list of measures to collect [3]:
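As a rough illustration of the five to ten percent figure cited above, the added cost of measurement collection can be bounded for a given development budget. This is a minimal sketch: the function name and the sample budget are hypothetical, and only the percentage range comes from the text.

```python
def collection_cost_range(development_cost, low_pct=5, high_pct=10):
    """Return (low, high) estimates of the added cost of collecting measures.

    low_pct and high_pct default to the five to ten percent range cited
    in the text; development_cost is in any currency unit.
    """
    if development_cost < 0:
        raise ValueError("development_cost must be non-negative")
    low = development_cost * low_pct / 100
    high = development_cost * high_pct / 100
    return low, high


# Hypothetical project budget, chosen only for illustration.
low, high = collection_cost_range(2_000_000)
print(f"Added measurement cost: {low:,.0f} to {high:,.0f}")
```

Reusing data that is already collected for other purposes, such as timecards, would push a project toward the low end of such a range or below it.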
Most metric sets deal with a variation of these attributes and are chosen to help project managers gain insight into their product (size, software quality, rework), process (rework, software quality) and project (effort, schedule).
Software development risk has been defined as the exposure to one or more of four types of risk [4]:
Risk management has been defined as the practice of controlling risks that have the potential for causing unwanted program effects [5]. This control is an entire development life cycle activity, starting with planning for risk at the earliest stages of the project and continuing with monitoring and alleviating risk through the support stage. Several risk management methodologies have recently been offered in the literature. The following is an overview of several approaches, with a brief look at the tools that have been proposed to aid those processes.
Barry Boehm is a pioneer in software risk management, developing his risk management methodology in conjunction with his risk driven spiral development model [6] [7]. Boehm offers a six step risk management process, composed of two main steps, each divided into three sub-steps:
Boehm discusses sample tools for use at each of his steps, ranging from checklists to cost models to cost-benefit analysis. No mention is made of any standard project management metrics as a potential tool for aiding risk managers.
The Rand Corporation has developed an excellent "Guide for the Management of Expert Systems Development" using Boehm's risk driven spiral model [8]. In this guide, expert system development is evolutionary, taking place through six phases: initiation, concept, definition/design, development, deployment, and post-deployment. For each phase, the guide discusses the risk containment activities, but no tools are recommended.
The Software Engineering Institute (SEI) has also developed a risk management guidebook for software acquisition managers, with steps very similar to Boehm's [9]. In Appendix C of the guidebook, there is a list of recommended tools for each step in risk management. Under "Planning Methods and Tools," Goal-question-measure, which is used to define a set of measures, is listed as a method or tool, and under "Tracking Approaches" is the use of spreadsheets or graphs to track those metrics selected.
Richard Fairley offers a seven step risk management process based on his work identifying and overcoming risk factors on software development projects with various organizations [10]. His process starts much like Boehm's, but unlike Boehm's, Fairley's process seems to assume that some risk items will overtake the project and require remedial action. The steps of his process are:
Outside of assorted plans and his mathematical model for determining risk probabilities and effects, Fairley discusses no tools or tool methodology.
In the latest update of the DoD series 5000 acquisition documents, DoD required risk management in all defense programs [5]. An example of one such risk management process is that of the F/A-18E/F [11], which contains the following four steps performed in a continuous feed-back loop:
Under risk identification, this approach asks, "What causes a risk to be surfaced?" and then suggests a set of tools including "negative trends or forecasts" along with a set of metrics.
Another management process, based on the principles of Dr. Robert Charette, is that of Rockwell [12], which is made up of five steps:
This methodology is tool-based and lists many tools for possible use for each of these five steps, but none is related to common project management metrics.
The risk management processes surveyed above have several steps in common. All start with the identification of risk items, followed by some method of weighting and prioritizing those risks, moving to developing strategies to mitigate risks when and if they occur, and ending with some way of planning for and tracking or monitoring risks. Looking at the common steps through the processes, and looking at the applications of a metrics program, it seems there are two places where basic project management metrics can be applied: identifying risk, and planning for and tracking some risk items.
Furthermore, there are two ways in which metrics can help with risk identification. Using feasibility analysis, measures can be used at the initial risk identification step to help managers create a risk list. Since risk identification should be an ongoing task that happens throughout the life cycle, using performance analysis and looking at negative trends can alert managers to a risk area that might not have been previously identified.
In order to show this connection between metrics and risk management, we need to select an example from common software development risks while keeping in mind the initial set of "core" attributes listed above in the metrics discussion.
Boehm's list of the top ten risk items, based on a survey of several experienced project managers, is [6]:
In the "Guide for the Management of Expert Systems Development" a sample set of ten prioritized risk items is given for each of the six phases of development. For each phase the top priority risk is personnel shortfall, the same as Boehm's number one risk item.
In "Toward an Assessment of Software Development Risk," the authors attempt to develop a conceptual definition of and an initial measure for the construct of software development risk. As part of their study, they reviewed several other studies that had identified risk items that threatened project success. In doing so, they found a strong degree of resemblance between what many authors had labeled "risk factors" and what others called "uncertainty factors" in IS. They grouped these "risk factors" and "uncertainty factors" according to their shared meanings and identified their underlying concepts. They came up with eighteen underlying concepts, of which nine, or one-half, are personnel related [13]:
Looking at Boehm's list, the Rand work and this study, it seems that if one is to pick a risk item that is prevalent across all, personnel shortfalls is the one. In addition, since software development is such a labor intensive enterprise, personnel shortfalls can have a major impact. Staff hours are the primary means of planning and tracking human resources assigned to various tasks and activities. Tracking staff hours allows for accurate scheduling and resource allocation and allows managers to assess the impact of personnel changes on cost and schedule [14]. Based on these reasons, this paper uses personnel issues as its example.
One simple measure that can be collected is staff level, sometimes called personnel, which counts the total number of software personnel available for a project. At initial risk identification, this measure can be used in a feasibility analysis and compared against the estimated staffing level. The results can give managers an indication of whether they will have enough personnel for a project or whether they will have to start looking for new team members.
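The feasibility comparison described above can be sketched in a few lines. This is an illustrative sketch only: the function name, the status labels, and the 10 percent tolerance are assumptions made for the example and do not come from the cited sources.

```python
def staffing_feasibility(available, estimated, tolerance=0.10):
    """Classify the gap between available staff and the estimated need.

    tolerance is an assumed threshold: a shortfall of up to that fraction
    of the estimate is reported as "watch", anything larger as "risk".
    """
    if estimated <= 0:
        raise ValueError("estimated staffing level must be positive")
    shortfall = (estimated - available) / estimated
    if shortfall <= 0:
        return "adequate"   # enough people are available
    if shortfall <= tolerance:
        return "watch"      # small gap, keep monitoring
    return "risk"           # candidate entry for the risk list


# Hypothetical staffing numbers for illustration.
for available in (12, 11, 8):
    print(available, staffing_feasibility(available, estimated=12))
```

A "risk" result would add personnel shortfall to the initial risk list, while "watch" suggests revisiting the measure at the next review.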
However, this measure by itself will probably not give sufficient insight to help identify all personnel shortfall risks. Two further refinements of this measure are software development staff profile and software development personnel qualification, sometimes also called staff experience [15].
Software development staff profile is composed of several components which managers can measure and assess during the risk identification period:
Another measure is staff experience, or software personnel qualification, which deals with individual team members' proficiencies. Referring back to the list of underlying risk factors developed by the authors in "Toward an Assessment of Software Development Risk," team expertise is listed in four different categories as potential risk factors.
Staff experience or qualification level can refer to several different things:
All of this measurement data is available to managers who are trying to assess whether their team has sufficient experience to complete the planned project. None of this experience necessarily guarantees capability, but these types of measurements give managers a tool to determine whether their proposed team members can perform the tasks required of them.
Since risk identification is an ongoing process, measures like the above can periodically be reviewed if other indicators point to staffing problems. Managers must be careful to include any experience obtained on the current project if an interim analysis is done.
Once managers have built their teams and the project has begun, their metrics programs allow them to track the progress of the project, product and process. The metrics programs also offer insight into those areas that were identified as potential risk items in the early planning stage. Once they have begun to collect data, managers can begin to use performance analysis and look for trends to indicate whether or not a risk item is under control, as well as to indicate that a new, previously unidentified item may be becoming a risk.
The following is an example that discusses tracking the two personnel issues discussed above, staff level and staff experience [16][17]. Figure 1, to which the discussion refers, appears below.
Assume that personnel shortfalls were originally identified as a risk item, and that the project manager is closely tracking personnel. The project manager should have plotted the planned staffing profiles for the total staff and for the experienced staff at the beginning of the contract. As time passes, some deviation from the curve is expected, but too great a deviation is cause for alarm. A program that does not have enough experienced personnel, or that tries to bring too many people into the project toward the end of the schedule, is a project that is at risk. The planned software staff curve should grow through the requirements and design phases, peak in code and early test, and begin to fall as acceptance and integration tests are completed. The profile of the experienced staff should be high in the beginning of the project, decrease slightly during coding and increase again during test. The ratio of total to experienced personnel should be near 3:1 and should never exceed 6:1.
As discussed above, the staff level refers to the ability of the developer to maintain a sufficient level of staff to complete the project on time. In addition to tracking total staff, tracking experienced staff is also important, as they are crucial to maintaining schedule and product quality. Finally, tracking staff losses is important because staff turnover can impact the stability of the work force. Even when a team member who leaves is quickly replaced, there is usually an impact due to the learning curve while the new person is trained and acclimates to the existing team.
This sample chart is made up of only three measures: total staff level, experienced staff level, and unexpected staff losses. Nevertheless, it shows several things to a manager. Initially, the total number of personnel on the project was lagging behind the estimate, but the number of experienced personnel working on it was higher than planned. This could mean that there were problems getting enough personnel to work on the project at first and the shortage was being covered by a higher than planned number of experienced staff. It could also mean that the schedules can be maintained and the project is on track, but it may be at the expense of cost, since experienced personnel are usually more expensive. The manager should continue to monitor this.
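The 3:1 and 6:1 guideline above lends itself to a simple check. The sketch below takes the ratio to mean total staff divided by experienced staff, which is an interpretation made for this example; the function names and status labels are likewise hypothetical.

```python
def staff_ratio(total_staff, experienced_staff):
    """Ratio of total staff to experienced staff (assumed interpretation)."""
    if experienced_staff <= 0:
        return float("inf")  # no experienced staff at all
    return total_staff / experienced_staff


def ratio_status(total_staff, experienced_staff, target=3.0, limit=6.0):
    """Compare the ratio against the 3:1 target and the 6:1 ceiling."""
    ratio = staff_ratio(total_staff, experienced_staff)
    if ratio > limit:
        return "at risk"  # ceiling exceeded
    if ratio > target:
        return "watch"    # above target but under the ceiling
    return "ok"


# Hypothetical monthly head counts: (total, experienced).
for total, experienced in [(30, 10), (40, 10), (70, 10)]:
    print(total, experienced, ratio_status(total, experienced))
```

Running the check for each reporting period gives a compact companion to the plotted staffing profiles.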
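The comparison of actual against planned staffing described above can be expressed as a per-period deviation. This is a minimal sketch: the sample data, the function names, and the 15 percent tolerance are all assumptions for illustration and are not values taken from Figure 1.

```python
def relative_deviation(planned, actual):
    """Per-period deviation of actual from planned, as a fraction of planned.

    Planned values are assumed to be non-zero.
    """
    return [(a - p) / p for p, a in zip(planned, actual)]


def periods_out_of_tolerance(planned, actual, tolerance=0.15):
    """Indices of reporting periods whose deviation exceeds the tolerance."""
    deviations = relative_deviation(planned, actual)
    return [i for i, d in enumerate(deviations) if abs(d) > tolerance]


# Hypothetical head counts per reporting period.
planned_total = [20, 30, 40, 40]
actual_total = [15, 24, 38, 41]
planned_experienced = [8, 10, 10, 9]
actual_experienced = [10, 12, 11, 9]

print(periods_out_of_tolerance(planned_total, actual_total))
print(periods_out_of_tolerance(planned_experienced, actual_experienced))
```

In this made-up data the total staff lags the plan in the early periods while the experienced staff runs above plan, which is the same pattern the paragraph above describes.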
The number of unplanned losses is nominal and does not seem to indicate any problems or risks. However, the number of experienced personnel is beginning to fall faster than the unplanned turnover rate. This may be a risk if the project is starting into the testing phase. Again, the manager may want to look at other measures or begin to track this item more closely.
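One way to make the comparison above concrete is to look at how much of each period's drop in experienced staff is not accounted for by unplanned losses. The sketch below is an assumption-laden illustration: the names and data are hypothetical, and it ignores any planned reduction in experienced staff, which a real analysis would subtract as well.

```python
def unexplained_experienced_drop(experienced, unplanned_losses):
    """For each period after the first, return the part of the drop in
    experienced staff that unplanned losses in that period do not explain.

    Both lists are indexed by reporting period and must have equal length.
    """
    if len(experienced) != len(unplanned_losses):
        raise ValueError("series must cover the same reporting periods")
    result = []
    for i in range(1, len(experienced)):
        drop = experienced[i - 1] - experienced[i]
        result.append(max(0, drop - unplanned_losses[i]))
    return result


# Hypothetical data: experienced head count and unplanned losses per period.
experienced = [10, 10, 9, 6]
unplanned_losses = [0, 0, 1, 1]
print(unexplained_experienced_drop(experienced, unplanned_losses))
```

A non-zero value in a late period, as in the last entry of this made-up series, is the kind of signal that would prompt the manager to look at other measures or track the item more closely.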
There are variations of this metric that can also be used to give the manager further insights. These include reporting staffing separately for each development task, e.g., QA, CM, or testing; and reporting staffing separately for special skills, e.g., Ada, client-server, or database development.
Figure 1. Software Personnel Indicator
Normally, understaffing as seen in Figure 1 indicates a possible schedule slippage that must be further tracked. The manager would do well to also look at the schedule and other indicators to assess the impact. If the project does continue to slip due to a personnel shortage, adding new personnel is not necessarily the answer, as this may add further delays due to the learning curve [18]. If the turnover rate becomes too high, this could also become a major risk due to lack of continuity, impairing project knowledge and eroding the knowledge-base.
Software project managers need to manage risk and use every tool available to them for this management. If they can use a tool that is already being used on their project for other purposes, they save themselves time and money. Most managers use some form of a metrics program to track their project for cost, schedule, effort, and quality. Many of the measures used to help them with this project management can also serve additional use in identifying and tracking risk. This paper used a common software risk, personnel shortfall, to show how managers can use measures and metrics to help identify and track risk items. This technique can be applied to other common risk items, such as requirements changes and unrealistic schedules and budgets, to help managers have visibility into and control over their overall projects in addition to identifying and monitoring their risk items.