ESSI Process Improvement Experiment (PIE)

27719 - PIOJAVA

Design process improvement aspects of Object Oriented JAVA in multi-tasking, multi-media, communication intensive, real-time mobile/portable systems

 

 

 

 

 

 

Experimental Plan

 

 

 

 

 

Version 1.0

23rd June 1999

Contents

1. Experiment Overview

2. Experiment Design

2.1 Product Metrics

2.2 Process Metrics

2.3 Data Analysis Techniques

2.4 Experimental Constraints

3. Metrics Collection

3.1 Tools

3.2 Choice of Metrics

3.3 Object-oriented Metrics

4. Relationship to other published results

5. References

Appendix 1 Programs for Reference Data Set

Appendix 2 Metrics from the Power Software Krakatau Package


  1. Experiment Overview

    The experimental process in this project largely follows the model described in [Fenton and Pfleeger 1996], which involves the following stages:

      • Conception
      • Design
      • Preparation
      • Execution
      • Analysis, and
      • Dissemination.

    The aim is to identify the impact of introducing object-oriented development techniques into an environment that produces multi-tasking, multi-media, real-time systems. This impact has two dimensions. First, the 'quality' of the product is expected to change with the introduction of object orientation. Second, the productivity of the staff will change, which will have a longer-term effect on their cost-effectiveness. Introducing object orientation requires the staff involved in the development to undergo a paradigm shift, which includes assimilating object-oriented concepts and learning the new languages and tools used in software development. The experimental process is therefore designed to assess the impact in both dimensions.

    The first dimension, product quality, will be assessed by comparing the "quality" of a software system developed using the new paradigm with that of software in an equivalent application domain (real-time and embedded systems). A set of ‘product’ metrics is therefore relevant in this case.

    The second dimension, the assimilation of object orientation, will be addressed by monitoring the behaviour of staff over time, as the new concepts are learnt and applied and as experience is gained. This implies a temporal dimension to the gathering of metrics, so that the behaviour of individual developers and of teams can be monitored and assessed. It also implies the use of metrics appropriate for ‘process’ evaluation.

    This report is primarily concerned with the Experiment Design phase of the metrics programme and addresses both the product and process metrics and their collection.

     

  2. Experiment Design

    The experimental monitoring will last approximately 18 months in total. During this time, a new baseline product for the company is being designed and implemented, and metrics will be collected at all stages of its development. This has led to the experimental structure illustrated in the following figure:

     




    Figure 1 Experimental Structure

    This structure presents a number of opportunities for evaluating the impact of the paradigm change from both a product and a process perspective: metrics gathered for the baseline product can be compared with a reference data set, and changes in the baseline product can be observed through measures gathered at different points in time during its development.

    It is proposed that metrics be gathered that contribute to an understanding of four important software engineering characteristics of the software:

      • Structural. This is sometimes referred to as Structural Complexity. Although such metrics have traditionally been used mainly for cost estimation, for example within Putnam's Model [Putnam 1978], COCOMO [Boehm 1981], Jensen's Model [Jensen 1984] and COPMO [Conte 1986], they also have a quality dimension, as discussed in [Pfleeger 1991].
      • Module Complexity. This is generally accepted as a major factor in software quality, as it affects the testability and overall manageability of the software components. In contrast with structural complexity, which is primarily concerned with the external relationships of components, module complexity focuses on the internal characteristics of the software.

    The Cyclomatic Complexity metric proposed by McCabe [McCabe 1976] is the de facto standard, and is one of the few metrics for which accepted norms have been published [values below 10 are considered good].

      • Cohesion. This property is concerned with the functional relatedness of software units. Software may exhibit increasing strengths of cohesion: seven categories were identified in [Constantine and Yourdon 1979] (and are reproduced in many software engineering texts), increasing from coincidental through logical, temporal, procedural, communicational and sequential to functional. High cohesion is generally accepted as a desirable property.
      • Coupling. This is a measure of the interconnection between modules. Loose coupling is generally regarded as a desirable property. Again, subcategories of coupling have been identified, covering content, common data, control, and stamp/data coupling. These are described in [Pfleeger 1991].

    The detailed metrics that relate to these four categories will be described in section 3.

    2.1 Product Metrics

      A reference data set is needed to compare the products developed under the new and old paradigms. Software with similar functionality, including both embedded system software and software running on traditional workstations, has been identified as this reference. The software components to be used as reference data are listed in Appendix 1.

      Although the main role of the reference data set is to provide a benchmark for comparison, an initial analysis of its software characteristics will be performed to identify whether the different categories of software (embedded or workstation-based) exhibit distinctive characteristics. It is hypothesised, drawing on the philosophy that underpins languages and support environments such as Java, that such distinctions should no longer exist within modern software system environments. This dimension will be investigated as part of the PIOJava experiment.

      The reference software has been implemented in a number of ways, using a mixture of formal and informal approaches, which have been found pragmatically to suit the mixture of applications and interfaces involved. Little data exists on the design process of the reference software, as it has evolved over many years. Measuring appropriate design metrics, while not impossible, is difficult to carry out consistently when the design information is not held in computer-based form. Techniques applied to pictorial or semi-formal descriptions can be time-consuming, and are not favoured for continuous monitoring in a commercial environment. Information may be gathered on a "one-off" snapshot basis for comparison with metrics from the baseline project, but it is not considered viable to re-evaluate metrics whenever changes are made to the reference system. Consequently, the main metrics derived from the reference data relate to the implementation of the software components, usually in either C or C++.

      The new baseline product is being generated using UML as the design notation and Java as the implementation language. From the outset, metrics will be collected for both the design and implementation processes in as automated a way as possible. Transparency of data collection is important to ensure that the measurement process does not interfere with the development activity.

    2.2 Process Metrics

      As the most significant impact of the paradigm shift is probably on the developers themselves, staff training will play an increasingly important role as product development proceeds. This provides an opportunity to monitor how the experience gained at different stages affects the developers. That is, one would expect observable differences in 'quality' between components developed immediately after initial training and those produced after 12 months' experience.

      The baseline product will be developed under a strict configuration management regime, with the configuration management tool used for all aspects of the new product (documentation, design, implementation, etc.). As high integrity is a major factor, traceability of the design and implementation is a high priority from a commercial and marketing point of view, and this can only be achieved through a controlled and automated regime.

      The benefit of this approach is that it provides, from the outset, an opportunity to configure and extend the chosen configuration management software, in this case PERFORCE, to enable the collection of metrics on the development process. The primary extensions proposed are to date-stamp all system changes, each accompanied by a (constrained) rationale describing the nature of the change and an identification of the development staff involved. This approach enables a number of interesting analyses:

        • A standard trend analysis can be undertaken to monitor the impact on the development process of experience with object-oriented paradigms. This will largely use the software characteristics that can be derived from the Product Metrics, and the change in these characteristics during the product development life cycle.
        • Other object-oriented metrics, such as those relating to re-use and inheritance, will be derived from the changes captured by the configuration management software. [This latter aspect is secondary to the main experimental process, and it is recognised that the data collected may not yield tangible results within the confines of the baseline project development].
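      The trend analysis in the first bullet can be illustrated with a minimal sketch, assuming that a product metric (for example mean V(G) per method) has already been extracted at equally spaced snapshots. It fits an ordinary least-squares slope against the snapshot index. For equally spaced snapshots this slope is proportional to the linear orthogonal-polynomial contrast commonly used in trend analysis, but it is a simplified stand-in rather than the full, significance-tested procedure of [Lindman 1974]. The class name MetricTrend and the sample values are hypothetical.

```java
import java.util.Arrays;

/**
 * Minimal sketch: linear trend of one product metric across equally spaced
 * snapshots, estimated as an ordinary least-squares slope.
 * MetricTrend is a hypothetical name, not part of any metrics tool.
 */
final class MetricTrend {

    /** Least-squares slope of values against snapshot index 0, 1, 2, ... */
    static double slopePerSnapshot(double[] values) {
        int k = values.length;
        if (k < 2) {
            throw new IllegalArgumentException("need at least two snapshots");
        }
        double meanX = (k - 1) / 2.0;  // mean of the indices 0..k-1
        double meanY = Arrays.stream(values).average().orElse(0.0);
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < k; i++) {
            double dx = i - meanX;
            sxy += dx * (values[i] - meanY);
            sxx += dx * dx;
        }
        return sxy / sxx;  // change in the metric per snapshot
    }

    public static void main(String[] args) {
        // Hypothetical mean V(G) per method at four consecutive snapshots.
        double[] meanVg = {10, 8, 6, 4};
        System.out.println(slopePerSnapshot(meanVg));  // prints -2.0
    }
}
```

      A negative slope for a complexity metric would be consistent with developers producing simpler methods as experience grows, but with few snapshots such a trend would need the significance testing described under Data Analysis Techniques before any conclusion is drawn.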
    2.3 Data Analysis Techniques

      Formal statistical analysis methods will be applied to the data sets. For comparing groups of data, Student's t statistic will be applied [Hoel 1971]. For more complex comparisons, the Planned Comparison technique will be used [Lindman 1974]. To evaluate time-related effects, Trend Analysis will be applied [Lindman 1974].

      All these techniques provide significance measures for any hypothesised relationships and will therefore give a good indication of the validity of any conclusions.
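      As a sketch of the group comparison described above, the following computes the pooled two-sample Student's t statistic for one metric measured in two groups, such as reference and baseline methods. It assumes the two groups have similar variances (otherwise Welch's unequal-variance form would be more appropriate) and leaves the conversion to a significance level, using the t distribution with n1 + n2 - 2 degrees of freedom, to statistical tables or a statistics library. The class name TwoSampleT and the sample data are hypothetical.

```java
/**
 * Minimal sketch: pooled two-sample Student's t statistic for comparing a
 * metric between two groups. TwoSampleT is a hypothetical name; the p-value
 * lookup is deliberately left out.
 */
final class TwoSampleT {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divisor n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** t = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2)), where sp^2 is the pooled variance. */
    static double pooledT(double[] a, double[] b) {
        int n1 = a.length, n2 = b.length;
        if (n1 < 2 || n2 < 2) {
            throw new IllegalArgumentException("each group needs at least two values");
        }
        double pooledVar = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooledVar * (1.0 / n1 + 1.0 / n2));
    }

    static int degreesOfFreedom(double[] a, double[] b) {
        return a.length + b.length - 2;
    }

    public static void main(String[] args) {
        double[] groupA = {1, 2, 3};  // hypothetical metric values, group A
        double[] groupB = {4, 5, 6};  // hypothetical metric values, group B
        System.out.printf("t = %.4f with %d degrees of freedom%n",
                pooledT(groupA, groupB), degreesOfFreedom(groupA, groupB));
    }
}
```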

    2.4 Experimental Constraints

    Within any development team, there is inevitably considerable variability in the skill and experience of team members. As the team itself is quite small, the experimental results could be distorted, if only because of the small sample of designers and implementors being monitored. This could call into question any results that emerge, particularly whether the outcomes would transfer to a different development environment. It was concluded that this is a fact of life that cannot be removed through the experiment design, and that the experiment is still of value within these constraints. However, the effects of this variability can be evaluated by comparison across team members, which is in itself a useful measure. Where possible, tasks should be allocated randomly (assuming no a priori knowledge of task difficulty).

    The configuration management system will be set up to allow the monitoring of individuals' activities. In this respect, the company culture is very important: the development team is close-knit and culpability is not an issue. This aspect has been raised in many other metrics programmes, where the emphasis is on quality and company improvement rather than on individual performance.

    Alternative approaches to taking measurements have been explored, varying the extent and frequency of measurement. It was decided that measurements will be taken as regular snapshots, expected to be every two weeks, with corresponding archiving of the relevant source and design files. This represents a compromise between diverting too much effort into essentially non-productive measurement activity and the need to obtain information at a suitably fine level of granularity.

    The choice of archiving frequency is not a major consideration in the experiment design. The archived source and design files, from which class and method metrics can be derived, together with the ongoing time-stamped log from the configuration management system, allow post-processing at any multiple of the snapshot interval.
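    The post-processing described above can be sketched as follows, assuming the configuration management log has been exported as a list of change dates (the developer identity and constrained rationale recorded with each change are omitted for brevity). Each change is assigned to a snapshot period whose length is a chosen multiple of the two-week interval. The class and method names and the start date are hypothetical; the real log format depends on how PERFORCE is configured.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Minimal sketch: regroup time-stamped change-log entries into snapshot
 * periods of (multiple x 14) days. SnapshotBuckets is a hypothetical name.
 */
final class SnapshotBuckets {

    static final int BASE_INTERVAL_DAYS = 14;  // assumed two-week snapshot interval

    /** Zero-based snapshot index of a change, for periods of (multiple * 14) days. */
    static int snapshotIndex(LocalDate start, LocalDate changeDate, int multiple) {
        long days = ChronoUnit.DAYS.between(start, changeDate);
        if (days < 0 || multiple < 1) {
            throw new IllegalArgumentException("change before start, or multiple < 1");
        }
        return (int) (days / (BASE_INTERVAL_DAYS * (long) multiple));
    }

    /** Number of logged changes falling in each snapshot period, keyed by index. */
    static Map<Integer, Integer> changesPerSnapshot(LocalDate start, List<LocalDate> changeDates, int multiple) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (LocalDate d : changeDates) {
            counts.merge(snapshotIndex(start, d, multiple), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(1999, 7, 1);  // hypothetical project start
        List<LocalDate> log = List.of(
                LocalDate.of(1999, 7, 1), LocalDate.of(1999, 7, 14),
                LocalDate.of(1999, 7, 15), LocalDate.of(1999, 7, 29));
        System.out.println(changesPerSnapshot(start, log, 1));  // prints {0=2, 1=1, 2=1}
        System.out.println(changesPerSnapshot(start, log, 2));  // prints {0=3, 1=1}
    }
}
```

    Because the snapshot index is computed from the raw dates, the same log can be regrouped at two, four or six weeks without re-running any measurement.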

  3. Metrics Collection

    3.1 Tools

      The programme has considered whether its requirements could be met using commercially available metrics packages, or whether special-purpose metrics software would need to be developed. A number of packages capable of handling both 'classical' implementation languages and the object-oriented languages projected for the baseline product have been identified and evaluated.

      The reference project has been used as the test vehicle for these tools. No tool proved completely satisfactory, and in some cases the results generated could best be described as bizarre. Nevertheless, it was decided that commercially available packages offer the best return on investment, and that special software, particularly an interface to the configuration management suite, would be developed only as needed.

      The metrics package identified as best meeting the needs of this project is KRAKATAU produced by Powersoft [Powersoft 1999].

    3.2 Choice of Metrics

      The metrics considered suitable for assessing the quality of the software components are defined in Appendix 2. Those relating to methods or procedures were chosen because they can be interpreted in similar ways for both object-oriented and non-object-oriented software development. They are therefore appropriate for comparing the baseline project with the reference data software.

      The relevance of the metrics to the four main classifications, namely structural complexity, module complexity, cohesion and coupling, is as follows:

      Structural complexity:

      Lines Of Code - although a very primitive measure, it is fundamental to many software engineering models.

      Executable Statements - as above, but focussing on method attributes rather than module characteristics.

      The Halstead Software Science measures [Halstead 1977] are a classical form of metrics relating to structural characteristics, which have application beyond the simple cost estimation of software systems.

      Module Complexity:

      V(G) and OC, the cyclomatic and operational complexities. These are complementary, as OC examines the complexity of expressions rather than the control structure and number of decision points within the code.

      Control statements within the method or function.

      Branch statements. These are of particular interest for the type of software being developed in the baseline project. Although GOTOs and abnormal exits are generally regarded as bad practice, servicing interrupt and exception conditions with time-critical requirements may require low-level exception handling software to take actions of this kind.

      Nesting of control constructs, such as IF … THEN … ELSE statements, places an additional cognitive load on software developers, particularly on those who may be required to maintain the software from its source code.

      Cohesion:

      A measure of particular interest is NION, the number of Input and Output Nodes. Although common practice is One In, One Out, a higher number could be indicative of a functional Fan-out within the module, which might be better addressed within calling procedures and methods.

      The number of Calls is also indicative of a high fan-out to subordinate functions and methods.

      Coupling:

      The focus is specifically on the data/stamp coupling and control coupling identified earlier. Consequently, Calls and Number of Parameters are measures of specific interest within a tightly encapsulated environment.

      Some of the metrics identified in the appendix focus specifically on the characteristics of classes. These are particularly useful for assessing the adoption of the object-oriented paradigm, and will be used to track changes in developer performance as development of the baseline project progresses.

    3.3 Object-oriented Metrics

    A set of metrics specifically concerned with object-oriented software has also been defined (see Appendix 2). These are based on the metrics of Chidamber and Kemerer [Chidamber and Kemerer 1994]. The use of these metrics will provide additional information about the four main areas of interest for the baseline product. Although comparable measures are not available for the reference data set, since by their very nature the measures relate to object-oriented characteristics, the data collected is useful for trend analysis when comparing the software produced at different stages of the baseline product development.

    Structural complexity:

    CSA measures class size by counting the number of attributes of a class. It is therefore a basic size measure that may be used to indicate whether subclasses should be generated.

    NOCC, the number of child classes measures the number of classes that inherit from a particular class. As this value increases, it is indicative that the particular class is being re-used well. However, it is important to consider that where there are many child classes, they may not all be genuinely appropriate members of the parent class, in which case, the abstraction of the class may be poor. It is also true that as NOCC increases, the amount of testing required for each child class will increase.

    DIT identifies how deeply a class resides in the class inheritance tree. High values imply that the class is quite specialised. They may also indicate method extension and overriding, in which case the behaviour of the class may be harder to predict.

    NOAC measures the number of operations added by a class. As this value becomes larger for a class, the functionality of that class becomes increasingly distinct from that of its parent classes. In this case, it should be considered whether the class should genuinely inherit from the parent, or whether it could be broken down into several smaller subclasses.

    Module Complexity:

    CSO measures class size by counting the number of operations (methods) in a class. If a class has a high number of methods, it may be wise to consider whether it would be appropriate to divide the class into subclasses.

    WMC is the sum of all method complexities within a class. Although primarily considered a complexity measure, WMC is also indicative of structural complexity when high values arise from the number of methods rather than from the complexity of each. A large number of highly complex methods may also indicate low cohesiveness.

    Cohesion:

    NOCC, described above under Structural complexity, is also relevant to cohesion: where a class has many child classes, they may not all be genuinely appropriate members of the parent class, in which case the abstraction of the parent class may be poor.

    LOCM measures the percentage of methods that do not access a specific attribute, averaged over all attributes in the class, and is consequently a measure of Lack of Cohesion of Methods. A low percentage indicates strong coupling between methods through shared attributes, which may also imply high testing effort and potentially low reusability.

    NOOC measures the Number of inherited Operations Overridden by a Class. High values for NOOC tend to indicate design problems; subclasses should generally add to and extend the functionality of the parent classes rather than overriding them.

    Coupling:

    PPP is the percentage of Package, Public and Protected members in a Class

    Members that have package level protection are visible to other classes in the same package. Public members are available to classes in all packages. Protected members are available to subclasses.

    Extensive use of such members violates the encapsulation principle. This metric shows the proportion of ‘vulnerable’ members in a class. A large proportion of such members means that the class has a high potential to be affected by external classes, and that increased effort will be needed to test it thoroughly.

    CBO is a direct measure of coupling, being the number of classes to which a particular class refers.

    PA (Public Accessors) measures class coupling as the number of other classes that access / use this class.

  4. Relationship to other published results

    The metrics resulting from this programme will be compared with data from other available sources. Published results exist for non-object-oriented software, although in general these have been qualitative rather than quantitative in nature. During the lifetime of the project, the suppliers of metrics measurement tools will be contacted with a view to obtaining typical measurements derived from their tools for comparison purposes.

    Similar issues have arisen with published measures for object-oriented software development, which have also tended to be qualitative. However, a few quantitative results have been published, for example the work of Lorenz undertaken within IBM [Lorenz 1993, Lorenz and Kidd 1994]. These results may be of limited use within this project, as they have tended to focus on other object-oriented languages such as Smalltalk.

    Some C++ measures are reported in Lorenz and Kidd, and more recent papers are starting to appear including metrics relating to Java [Kaczmarek and Kucharski 1999]. Other relevant sources will be used as they become available.

  5. References

    Boehm B W, Software Engineering Economics, Prentice-Hall, 1981.

    Chidamber S R and Kemerer C F, A Metrics Suite for Object-Oriented Design, IEEE Transactions on Software Engineering, Vol 20, No 6, 1994.

    Constantine L L and Yourdon E, Structured Design, Prentice-Hall, 1979.

    Conte S D, Dunsmore H E and Shen V Y, Software Engineering Metrics and Models, Benjamin-Cummings, 1986.

    Fenton N E and Pfleeger S L, Software Metrics: A Rigorous and Practical Approach, International Thomson Computer Press, 1996.

    Halstead M, Elements of Software Science, Elsevier North-Holland, 1977.

    Hoel P G, Introduction to Mathematical Statistics, Wiley, 1971.

    Jensen R W, A Comparison of the Jensen and COCOMO Schedule and Cost Estimation Models, Proceedings International Society of Parametric Analysis, 1984.

    Kaczmarek J and Kucharski M, Application of Object-Oriented Metrics for Java Programs, ESCOM (European Software COntrol and Metrics) and SCOPE (Software Certification PrOgramme in Europe), 1999.

    Lindman H R, Analysis of Variance in Complex Experimental Designs, Freeman, 1974.

    Lorenz M, Object-Oriented Software Development: A Practical Guide, Prentice-Hall, 1993.

    Lorenz M and Kidd J, Object-Oriented Software Metrics, Prentice-Hall, 1994.

    McCabe T, A Complexity Measure, IEEE Transactions on Software Engineering, Vol 2, No 4, 1976.

    Pfleeger S L, Software Engineering: The Production of Quality Software, Macmillan, 1991.

    Powersoft, Krakatau Software Metrics, www.powersoftware.com, 1999.

    Pressman R S, Software Engineering: A Practitioner's Approach, McGraw-Hill, 1987.

    Putnam L H, A General Empirical Solution to the Macro Software Sizing and Estimating Problem, IEEE Transactions on Software Engineering, Vol 4, No 4, 1978.

     

     

    Appendix 1 Programs for Reference Data Set

     

    PROGRAM          OPERATING SYSTEM    LANGUAGE    NUMBER OF MODULES

    EMBEDDED SOFTWARE
    IIU              RTOS                C           20
    MIP              RTOS                C           10
    CIU-CHANNEL      RTOS                C           6
    CIU-CONTROL      RTOS                C           8

    WORKSTATION SOFTWARE
    DGPSSERVER       Windows NT          C++         17
    BASESTAT         DOS                 C           98

    This range of program modules covers both the components of the embedded systems, and the software running under standard operating systems as part of the Command and Control system.

     

    Appendix 2 Metrics from the Power Software Krakatau Package

    LOC – Method Lines of Code

    LOC is a primitive metric to measure the size of a method.

    EXEC – Number of Executable Statements

    This is a measure of the number of executable statements in a method or function.

    NEST – Maximum Number of Levels

    Research in the cognitive sciences has shown that groups containing more than seven pieces of information become increasingly hard for people to understand in problem solving. To reflect this, the maximum number of IF…THEN or IF…THEN…ELSE statements nested within one another is counted. Logical units with a large number of nested levels may need implementation simplification and process improvement.

    V(G) – Cyclomatic Complexity

    Cyclomatic Complexity measures the number of linearly independent paths through an algorithm, which can be found by counting the number of distinct regions on a planar flowgraph. It is taken as an indicator of the cognitive complexity of the method.
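    A minimal sketch of the calculation, assuming the flowgraph of a method is available as numbered nodes and directed edges. It applies McCabe's formula V(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components (1 for a single method); for a planar flowgraph this equals the number of regions mentioned above. The edge-list representation and the class name Cyclomatic are illustrative assumptions; a metrics tool would derive the graph from the source code itself.

```java
import java.util.List;

/**
 * Minimal sketch: cyclomatic complexity V(G) = E - N + 2P for a flowgraph
 * given as directed edges {from, to} between nodes 0..N-1.
 * Cyclomatic is a hypothetical name.
 */
final class Cyclomatic {

    /** Union-find root lookup with path halving. */
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    static int vOfG(int nodes, List<int[]> edges) {
        int[] parent = new int[nodes];
        for (int i = 0; i < nodes; i++) parent[i] = i;
        int components = nodes;
        for (int[] e : edges) {  // count connected components, ignoring edge direction
            int a = find(parent, e[0]), b = find(parent, e[1]);
            if (a != b) {
                parent[a] = b;
                components--;
            }
        }
        return edges.size() - nodes + 2 * components;
    }

    public static void main(String[] args) {
        // if (c) { A } else { B }: 0 = decision, 1 = A, 2 = B, 3 = join/exit
        List<int[]> ifElse = List.of(new int[]{0, 1}, new int[]{0, 2}, new int[]{1, 3}, new int[]{2, 3});
        // while (c) { A }: 0 = loop test, 1 = body, 2 = exit
        List<int[]> whileLoop = List.of(new int[]{0, 1}, new int[]{1, 0}, new int[]{0, 2});
        System.out.println(vOfG(4, ifElse) + " " + vOfG(3, whileLoop));  // prints "2 2"
    }
}
```

    Both examples contain a single binary decision, so V(G) = 2 in each case; straight-line code with no decisions gives V(G) = 1.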

    OC – Operational Complexity

    This metric assigns weights to operations which can occur in expressions. The values of the weights for all the expressions in a method are summed to provide a value for OC. This is complementary to V(G) since it looks at the complexity of the expressions which are being evaluated rather than the number of decision points in the method.

    CONTROL – Number of Control Statements

    This is a measure of the number of control statements (selection, iteration) in a method or function.

    BRANCH – Number of Branching Nodes

    Higher values indicate possible use of GOTOs and / or abnormal exits from control structures such as loops. This is an indicator of unstructured design and increases the testing difficulty. However, it may also indicate exception handling conditions.

    NION – Number of Input / Output Nodes

    NION is a measure of the number of input / output nodes in a given method / function. Structured programming practice states that there should be one way into a module and one way out. This metric measures the difficulty of testing the control logic of software. Logical units with a large number of input / output nodes may need implementation simplification and process improvement.

    CALLS – Number of Calls

    CALLS is the number of calls from a method or function to subordinate logical units (methods or functions). This is a measure of the degree of FAN-OUT.

    NP - Number of Parameters

    NP is simply a measure of the number of parameters that a method accepts. High values for NP can mean that a method will require extensive testing (since the range of possible inputs may be greater). As a rule of thumb, methods with many parameters also tend to be more specialised and so are less likely to be reusable.

     

    The following metrics relate directly to Halstead's Software Science measures [Halstead 1977]. They will be collected, although their relevance is primarily as ‘added value’.

    N1 – Total Number of Operators

    N2 – Total Number of Operands

    n1 – Number of Unique Operators

    n2 – Number of Unique Operands

    These are the inputs to Halstead's Software Science metrics.

    N – Halstead Program Length

    N is calculated as N1 + N2 and is a general measure of the program length for a given file.

    n – Halstead Program Vocabulary

    n is calculated as n1 + n2 and is a measure of the number of unique operands and operators used in a particular file. This can provide an impression of comprehensibility and complexity for that file.

    V – Halstead Program Volume

    V is the program volume metric from the Halstead Software Science metrics. It is calculated as V = N × log2(n).
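    The three derived measures follow directly from the four basic counts. A minimal sketch, assuming N1, N2, n1 and n2 have already been obtained (the rules for classifying tokens as operators or operands are tool-specific and are not reproduced here); the class name Halstead and the counts used are hypothetical.

```java
/**
 * Minimal sketch: Halstead length, vocabulary and volume from the basic
 * counts N1, N2, n1, n2. Halstead is a hypothetical class name.
 */
final class Halstead {

    /** N = N1 + N2 */
    static int length(int totalOperators, int totalOperands) {
        return totalOperators + totalOperands;
    }

    /** n = n1 + n2 */
    static int vocabulary(int uniqueOperators, int uniqueOperands) {
        return uniqueOperators + uniqueOperands;
    }

    /** V = N * log2(n) */
    static double volume(int length, int vocabulary) {
        return length * (Math.log(vocabulary) / Math.log(2));
    }

    public static void main(String[] args) {
        int bigN = length(12, 10);      // hypothetical N1 = 12, N2 = 10
        int smallN = vocabulary(5, 3);  // hypothetical n1 = 5, n2 = 3
        System.out.println("N = " + bigN + ", n = " + smallN + ", V = " + volume(bigN, smallN));
    }
}
```

    With these illustrative counts, N = 22, n = 8 and V = 22 × log2 8 = 66.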

     

    The following metrics are also collected by the Krakatau software, and provide a further dimension to interpreting the object oriented properties of the base-line project.

    CSA – Class Size in Attributes

    CSA measures class size by counting the number of attributes of a class (not including inherited attributes).

    CSO – Class Size in Operations

    CSO measures class size by counting the number of operations (methods) in a class (not including inherited methods).

    NOCC – Number of Child Classes

    This metric counts the number of classes which inherit from a particular class.

    PPP – Percentage of Package, Public and Protected members in a Class

    Members which have package level protection are visible to other classes in the same package. Public members are available to classes in all packages and protected members are available to subclasses.
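    A sketch of PPP for compiled Java classes using the standard reflection API, assuming that only a class's own fields and methods are counted (constructors and inherited members are excluded) and that compiler-generated synthetic members are ignored. Krakatau's exact counting rules may differ; the names PppMetric and Sample are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/**
 * Minimal sketch: percentage of a class's own fields and methods that are
 * package-level, public or protected (that is, not private).
 * PppMetric is a hypothetical name.
 */
final class PppMetric {

    static double ppp(Class<?> c) {
        int total = 0;
        int exposed = 0;
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) continue;  // skip compiler-generated fields
            total++;
            if (!Modifier.isPrivate(f.getModifiers())) exposed++;
        }
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;  // skip compiler-generated methods
            total++;
            if (!Modifier.isPrivate(m.getModifiers())) exposed++;
        }
        return total == 0 ? 0.0 : 100.0 * exposed / total;
    }

    /** Hypothetical example class: 3 of its 5 members are not private. */
    static class Sample {
        private int a;
        int b;  // package-level
        public int c;
        protected void m() { }
        private void n() { }
    }

    public static void main(String[] args) {
        System.out.println(ppp(Sample.class));  // prints 60.0
    }
}
```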

    DIT – Depth in Class Inheritance Tree

    This metric reports how deeply a class resides in the class inheritance tree.
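    DIT, together with NOCC defined earlier in this appendix, can be approximated for Java classes through reflection. This sketch assumes that DIT counts superclass steps up to java.lang.Object (so Object has depth 0, and interfaces, which have no superclass, also report 0), and that NOCC counts only direct subclasses among a supplied list of classes, since reflection alone cannot enumerate every subclass in a program. Other tools may count from a different root; the names used are hypothetical.

```java
import java.util.List;

/**
 * Minimal sketch: DIT and NOCC via reflection. InheritanceMetrics and the
 * nested classes A to D are hypothetical.
 */
final class InheritanceMetrics {

    /** Number of superclass steps from c up to java.lang.Object. */
    static int dit(Class<?> c) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    /** Number of direct subclasses of parent among the candidate classes. */
    static int nocc(Class<?> parent, List<Class<?>> candidates) {
        int children = 0;
        for (Class<?> c : candidates) {
            if (c.getSuperclass() == parent) children++;
        }
        return children;
    }

    // Hypothetical hierarchy: B and C extend A; D extends B.
    static class A { }
    static class B extends A { }
    static class C extends A { }
    static class D extends B { }

    public static void main(String[] args) {
        List<Class<?>> all = List.of(A.class, B.class, C.class, D.class);
        System.out.println("DIT(D) = " + dit(D.class) + ", NOCC(A) = " + nocc(A.class, all));
    }
}
```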

    CBO – Coupling between Object Classes

    The value for CBO is the number of classes to which a particular class refers. References can be uses of classes as member types, parameter types, method local variable types or casts.
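    An approximation of CBO can be sketched with reflection by collecting the distinct classes used as field types, method parameter types and method return types. Local variable types and casts, which the definition above also counts, are not visible through reflection and would require source or bytecode analysis, so this sketch will generally undercount. Excluding primitive types, void and the class itself is an assumption; the names CboApprox and Sample are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

/**
 * Minimal sketch: approximate CBO from field, parameter and return types.
 * CboApprox is a hypothetical name.
 */
final class CboApprox {

    private static void addType(Set<Class<?>> refs, Class<?> owner, Class<?> t) {
        while (t.isArray()) t = t.getComponentType();     // String[] counts as String
        if (!t.isPrimitive() && t != owner) refs.add(t);  // void.class is primitive
    }

    static Set<Class<?>> referencedClasses(Class<?> c) {
        Set<Class<?>> refs = new HashSet<>();
        for (Field f : c.getDeclaredFields()) {
            if (!f.isSynthetic()) addType(refs, c, f.getType());
        }
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            addType(refs, c, m.getReturnType());
            for (Class<?> p : m.getParameterTypes()) addType(refs, c, p);
        }
        return refs;
    }

    static int cbo(Class<?> c) {
        return referencedClasses(c).size();
    }

    /** Hypothetical class referring to String, List, Integer and StringBuilder. */
    static class Sample {
        String name;
        java.util.List<Integer> values;
        int count;                                          // primitive: not counted
        Integer total(StringBuilder log, int[] raw) { return null; }
        Sample copy() { return this; }                      // self-reference: not counted
    }

    public static void main(String[] args) {
        System.out.println(cbo(Sample.class));  // prints 4
    }
}
```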

    LOCM – Lack of Cohesion of Methods

    LOCM measures the percentage of methods that do not access a specific attribute averaged over all attributes in the class.
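    The LOCM definition above translates directly into code. This minimal sketch assumes that a parser or metrics tool has already produced, for each method, the set of attributes it accesses; here that map is written by hand for a hypothetical class with attributes x and y and methods m1, m2 and m3. Returning 0 for a class with no attributes or methods is an assumed convention, and the class name Locm is hypothetical.

```java
import java.util.Map;
import java.util.Set;

/**
 * Minimal sketch: for each attribute, the percentage of methods that do not
 * access it, averaged over all attributes. Locm is a hypothetical name.
 */
final class Locm {

    static double locm(Set<String> attributes, Map<String, Set<String>> accessesByMethod) {
        int methods = accessesByMethod.size();
        if (attributes.isEmpty() || methods == 0) return 0.0;  // assumed convention
        double sum = 0.0;
        for (String attr : attributes) {
            long notAccessing = accessesByMethod.values().stream()
                    .filter(used -> !used.contains(attr))
                    .count();
            sum += 100.0 * notAccessing / methods;
        }
        return sum / attributes.size();
    }

    public static void main(String[] args) {
        // Hypothetical access map: method name -> attributes it reads or writes.
        Map<String, Set<String>> access = Map.of(
                "m1", Set.of("x"),
                "m2", Set.of("x", "y"),
                "m3", Set.of());
        System.out.println(locm(Set.of("x", "y"), access));
    }
}
```

    For this example, x is not accessed by one of the three methods (33.3%) and y by two (66.7%), giving LOCM = 50%.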

    NOOC – Number of Operations Overridden by a Class

    NOOC measures the number of inherited operations which a class overrides. High values for NOOC tend to indicate design problems; subclasses should generally add to and extend the functionality of the parent classes rather than overriding them.

    NOAC – Number of Operations Added by a Class

    This metric measures the number of operations added by a class.

    WMC – Weighted Methods per Class

    WMC is a count of the methods in a class (weighted according to complexity).

    PA – Public Accessors

    This measures class coupling as the number of other classes which access / use this class.