The first sentence of the ISO 29119 Introduction states:
The purpose of the ISO/IEC/IEEE 29119 series of software testing standards is to define an internationally agreed set of standards for software testing that can be used by any organization when performing any form of software testing.
Don’t think I need to emphasize the “any”s, but go ahead and bold those in your mind if it helps.
Like my other blog posts, this one far exceeds widely-held standards for blog post length, using the best practice metric of word count. So, to summarize:
- I have some experience with creating ISO Standards in another field. My experience was that it is a difficult and highly political process.
- Of course our community objects to standards, because they cause crappy testing, here defined as testing that fails to provide a meaningful, accurate account of a project’s quality risks. ISO 29119-1 says it “is informative and no conformance with it is required”.
- To ISO 29119-1, “Testing” is a “set of activities conducted to facilitate discovery and/or evaluation of properties of one or more test items”, noted to include “planning, preparation, execution, reporting, and management activities, insofar as they are directed towards testing.” It seems that the term “test item” (a system, a software item, a requirements document, a design specification, a user guide) is deliberately constructed to suggest these items can all be tested.
- This particular document (ISO 29119-1) has a lot of definitions and Naming of Documents. The SDLC model referenced for context is Waterfall-y (Not Iterative).
- Exploratory Testing, or as it has recently come to be called, “Testing”, is found buried under “Experience-Based Testing”. Further discussion is promised in 29119-2.
- “Risk-based” is referenced multiple times, with strong language around claims of wide adoption. Given that choosing what to test should always(?) involve evaluating risk, this is hard to argue with: “…truisms…to those new to software testing they are a useful starting point“, indeed.
- Overall? Several years behind the state of the art, overly focused on formality and control, and barely concerned with technique.
Context and Perspective
I spent a little more than two years in a Working Group like the one that produced ISO 29119. I worked on trusted archiving of digital data, specifically document and records management. Most of the stuff I wrote concerned storage requirements for permanent, reliable, auditable, and discoverable electronic storage of documents. Others worked on PDF standards – one great example of how and where standards are helpful. Of course, a certain vendor associated with PDF tools drove a lot of very specific format standards, so vendor capture is a real concern as well. The last point: standards get written by people who keep showing up.
It was occasionally interesting work to research and write about the reliability and trustworthiness of the different types of proprietary electronic WORM storage at a time when write-once optical media was generally being replaced by software. It was also very deliberative, highly political, entirely subject to who got to edit last, and all progress on publishing anything was blocked by one person’s two-handed, white-knuckled grip on every work item. They were very busy in their consulting business – having successfully marketed their committee position. It was with some relief that I resigned from the group, though there was some useful work being done there. Eventually, there could be guidance for how to properly store electronic documents and records so that data will be permanently preserved – a real problem that lacks clear guidance.
I believe standards can be very useful in some circumstances. A standard for what constitutes proper archival of scanned documents makes a lot of sense. Railways, communications protocols, food labeling, and a thousand more things are all good examples of areas where standards can be very valuable, even essential. I just don’t agree that software testing is one of them.
I strongly identify with the Context-Driven School of Testing (CDT), so it should be no surprise that my default position on any testing standard is against. I’ve said previously that my community could object to a standard without knowing its content, because a standard means that context follows process, whatever its prescriptions. By existing, it conflicts with the idea that context should inform method. As Fiona Charles says: “Where is the standard for writing software?” I’d add “Where are the standards for writing, editing, cooking, or painting?”
I came to my understanding of why making software is not manufacturing on my own, well before I found my way to CDT. To refine my problem statement for Sick Sigma, bug density metrics, standards, and anywhere else where someone is trying to force industrial quality processes and thinking on software: making software is not producing widgets. All software projects are one-offs created by developers, testers, and project staff of various skill levels and engagement, to solve different problems, using different tools and often shared components, based on variably incomplete and conflicting understandings of requirements. In a given context, at a given time, a team of (hopefully) smart and (always) flawed people work together to create something that barely works, and follow-up with as much bug-fixing as they are allotted time, money, and energy for. Requirements, conditions, and the skill and engagement of the people on the team are all moving targets.
Criticism aside, there is a large contingent of people who worked on these standards for many years. I respect their effort, and once someone decided there needs to be a software testing standard, a lot of work went into making these happen. There is some good work in here. There are other things I disagree with. I consider the people that worked on this standard colleagues, and there is no personal animus. We disagree about software testing, and this is reasonable professional debate.
I know that there are people who want predictability and certainty in planning projects. Sometimes the stakes are high enough (or feared to be high enough) that a detailed structure such as this process definition will comfort a stakeholder just by virtue of its specificity and number of control metrics. I also understand that some of the context in any situation is externally imposed, and not always available for debate.
I think that employing skilled and experienced testers who study context, asking them lots of questions and listening closely to the answers, and leaving broad latitude in implementation is a better strategy. I think that the approach to software testing I espouse can lead to better results than following a cookbook, but it is possible to screw up either (or any!) approach. A meal prepared by a skilled cook from the ingredients on hand is delicious, nutritious, and sustainable. I don’t eat McDonalds, and I don’t recommend anyone else does, either.
If cheap and fast are the objectives, you might make a different choice. There is enough room in the world for people to use whatever approach makes sense for them. The problem is that when a specific approach is published as an international standard, it is asserted to be the best way to test software, using prescription to trump skill. When we join projects that already have engineering managers (or even worse, test managers) invested in following standards, they are (a) almost certainly in dire need of our help, and (b) going to be hard to separate from the security blanket of Bestandardices. Our stakeholders and the craft of testing suffer as a result.
I will report on the standard from my perspective as a Context-Driven Tester. I intend to review the pieces of ISO 29119 separately at first, as they are a lot to digest. Also, these get long enough. Let’s start with Part 1, or ISO/IEC/IEEE 29119-1:2013 as it is formally known. I’ll try to extract points of interest, but when other parts (2-5) of the standard are explicitly referenced, I’ll save the discussion of those issues until we get to the appropriate part of the standard.
This is a lot of stuff to read, friends. I’ve been chipping away at this for weeks. I encourage you to do your own review if you have an interest.
As noted, the standard starts with a statement of purpose. Next, it acknowledges the existence of many contexts (domains, organizations, methodologies), but claims applicability across all of them.
In the next part, words like “conformance” and “required” appear. No conformance with this part is possible; 29119 Parts 2-4 are the standards “where conformance can be claimed”. There is something important about safety and liability to think about there.
In the next section, almost a hundred terms are defined. Someday, our community should consider creating a dictionary of terms we can debate from.
“Test case(s)” appears 30 times. The definition of test case includes preconditions (state?), input, and expected results. “Test item” is referenced here as something to be executed, while defined elsewhere as an object of testing (system, piece of software, requirements, document, user documentation). Test cases are noted as the lowest level of test input (cannot be nested) for a “test sub-process”.
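The standard’s definition of a test case (preconditions, inputs, expected results) maps naturally to a data structure. A minimal sketch in Python, with all names and the example values my own invention, not taken from the standard:

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """Minimal model of the standard's test case definition:
    preconditions (required starting state), inputs, expected results."""
    name: str
    preconditions: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)
    expected: object = None

# Under this definition, a test case is fully scripted in advance:
tc = TestCase(
    name="login rejects empty password",
    preconditions={"user_exists": True},
    inputs=["alice", ""],
    expected="error: password required",
)
print(tc.name)
```

Note what the structure implies: everything, including the expected result, is decided before execution, which is exactly the assumption exploratory testers push back on.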
Exploratory testing uses “spontaneously” where we would typically use “simultaneously” in front of “designs and executes”; perhaps this is a simple error. “Unscripted testing” is dynamic testing in which the tester’s actions are not prescribed by written instructions in a test case. So this implies that they usually are?
“Test coverage” is the degree, expressed as a percentage, to which test coverage items have been exercised by a test case or test cases. This seems difficult to calculate. A “test coverage item” is an attribute or combination of attributes derived from one or more test conditions using a test design technique, with the technique left unspecified.
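The arithmetic the definition implies is simple enough; the hard part, which the definition hides, is enumerating the coverage items in the first place. A sketch, with the items themselves invented for illustration:

```python
def coverage_percent(exercised, all_items):
    """Test coverage as the standard defines it: the percentage of
    identified test coverage items exercised by the test cases."""
    all_items = set(all_items)
    if not all_items:
        return 0.0
    return 100.0 * len(set(exercised) & all_items) / len(all_items)

# Hypothetical coverage items for an input field; the real difficulty
# is deciding that this list is complete, not computing the ratio.
items = {"empty", "one_char", "max_length", "over_max"}
exercised = {"empty", "max_length"}
print(coverage_percent(exercised, items))  # 50.0
```

The percentage is only as meaningful as the item list, which depends entirely on the unspecified test design technique.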
Testing itself is a “set of activities conducted to facilitate discovery and/or evaluation of properties of one or more test items”, noted to include “planning, preparation, execution, reporting, and management activities, insofar as they are directed towards testing.” It seems that the term “test item” (which as defined here includes both software and documentation) is deliberately constructed to suggest these items can all be tested, and it is explicitly stated that it is not necessary to execute software to test. For example, a specification could be “tested” for correctness against requirements, and this would be considered a testing activity by this standard. It apparently would also be testing to discuss which team member will be responsible for reviewing the standard.
- “Oracle” is missing altogether; pass/fail criteria is the substituted term.
- Three separate terms are used to describe parts of equivalence partitioning. This is the only real testing skill described in this section. Later, fuzz testing and a few sampling techniques (for choosing test cases) are referenced, but not described.
- Black box testing is folded into “specification-based testing”.
- Documents are Named for Several Things, including Describing the Status of each Test Data Requirement and a Separate Document describing Test Environment Requirements (caps party inspired by ISO 29119-1).
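Equivalence partitioning, the one real technique described in this section, is easy to sketch: divide the input domain into partitions assumed to be handled uniformly, then test one representative per partition plus the boundaries. The field, partitions, and values below are my own illustrative assumptions:

```python
def partition_age(age):
    """Hypothetical 'age' field valid from 18 to 65: three equivalence
    partitions, each assumed to behave uniformly, so a single
    representative value per partition suffices."""
    if age < 18:
        return "too_young"
    if age <= 65:
        return "valid"
    return "too_old"

# One representative per partition, plus boundary values:
representatives = [10, 40, 90]
boundaries = [17, 18, 65, 66]
print([partition_age(a) for a in representatives])
# → ['too_young', 'valid', 'too_old']
```

The “assumed to behave uniformly” part is the whole bet, and it is why partitioning is a sampling heuristic rather than a guarantee.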
Next is background about what testing is and what it is intended to accomplish. This is definitely a target-rich environment, with several points worthy of extended discussion. I’ve pulled out a couple that seem worth deconstructing. My comments in italics.
– Every defect is believed to be introduced by a human’s error or mistake, either in understanding the requirements specification or in creating the software code.
I disagree with this point, as an incomplete understanding of requirements is the only state of discovery I’ve ever seen, meaning, the specification is never “complete”. This is natural and expected when one considers the limitations of communication between any two people. One of the strengths of iterative development is exposing these gaps and eventually fixing them (usually). Thankfully, the standard includes room for “undocumented understanding of the required behaviour”.
– The primary goals of testing are to provide information about the quality of the test item and residual risk. This risk is said to be related to how much the test item has been tested.
If all test items (by the standard’s definition) were equal, there might be something to that, but my experience is that different pieces of software/documents/etc vary greatly in complexity, risk, and in any other meaningful attribute that must be ignored in order to say one piece of software is equivalent to another. What does “how much” mean? Relative to what?
ISO 25010 is referenced for its eight quality characteristics, but the standard later details slightly different characteristics: Functional Suitability, Performance Efficiency, Compatibility, Usability, Reliability, Security, Maintainability, and Portability.
There are many mentions of “Dynamic testing”, which is said to include not only executing test items but also preparation and follow-up. More detail is promised in 29119-2.
Testing is said to be a subset of verification and validation. Other standards are referenced: ISO/IEC 12207 (software life cycle processes) and IEEE 1012-2012 (Standard for System and Software Verification and Validation).
At some future date, I would like to figure out what the distinction being made here is; since this standard includes evaluation of specifications as “testing” along with a broad definition of software quality characteristics, it’s hard to imagine what’s left.
There is some discussion of testing in an organizational and project context. It is said that an organization might supplement the standard, though it “would adopt” the standard. It is then said that conformity with the organization’s processes is more typical than conforming directly to the standard. Still, if an organization does not have “an appropriate set of processes”, it should apply this standard directly. The case is being made for standardization more than strict adoption of the standard. A welcome bit of guidance, though: “The experience of the industry is that no single test strategy, plan, method or process will work in all situations. Hence organizations and projects should tailor and refine the details of testing with reference to standards such as this.”
Many artifacts are Named and Capitalized, from Organizational Test Policies and Strategies to Project Test Plan, down to “Test Sub-process Plan”, with two examples given as System Test Plan and Performance Test Plan.
Some of these documents may be appropriate in some contexts, but I find heavy documentation to be one of the most undesirable characteristics of industrial testing, and a major contributor to why the Agile community seems to think most testing (besides automated checks) is a waste of time. Time spent writing documentation no one reads or updates is time better spent on gathering information.
There is a very complicated diagram describing the relationship between standards, test strategy, test processes, test policy, etc.
The most important point I see is that standards are put in the same box as Regulations and Laws – an especially toxic outcome of pushing adoption of standards such as these. I’ve written about this before, but it is hard enough getting people to use modern testing techniques without having to give the straw man of liability the illusion of a brain.
“Dynamic Test Processes” appears again – in a diagram saying that test design, test specification, and the “Test Environment Readiness Report” are all necessary to get to Test Execution. The term “Issue(s) Notice” is used after Test Result, as a decision loop for whether or not to report issues; another pointer to 29119-2.
Software Life Cycle and Testing’s Place
The next part talks about the life cycle of software projects between conception and retirement. ISO/IEC 15288 is referenced as a source for life cycles, and ISO/IEC 25051 is mentioned for testing software written by another company. A Requirements…Design…Coding…Acceptance example is given, and then it is said that defining a development model is out of scope. Still, an iterative example probably would have been better.
Quality Assurance is described as a support process required to aid the SDLC:
a set of planned and systematic supporting processes and activities required to provide adequate confidence that a process or work product will fulfill established technical or quality requirements. This is achieved by the imposition of methods, standards, tools, and skills that are recognised as the appropriate practice for the context.
Actually, not that bad!
Measures should be collected during testing, as they can provide information about the quality of test processes and/or test items and the effectiveness of their application on each project.
Oh, there we go.
Testing information is to be provided to Project Management with completion reports and test measures. There is some talk about process improvement, and pushing process improvements across an organization.
This is introduced as “Testing is a sampling activity”. The need to identify product, project, and organizational risks, and then to tailor the test processes to address those risks, is described in some detail. “Risk-based” is referenced multiple times, with strong language around claims of wide adoption. Given that all test choices involve evaluating risk, this is hard to argue with.
There is a distinction made between choosing test cases for requirements coverage, and for risk, which is good. There is some talk about polling a wide group of stakeholders to develop a risk model, and most welcome is an introduction of context, using the Medical Device example of compliance.
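The risk model the standard gestures at, scoring items by likelihood and impact and then ordering test effort accordingly, is easy to sketch. The items, scale, and scores below are invented for illustration, not taken from the standard:

```python
# Risk-based prioritization: score each test item by likelihood x impact
# (each on a hypothetical 1-5 scale, as polled from stakeholders) and
# spend test effort on the highest scores first.
items = {
    "payment processing":  {"likelihood": 3, "impact": 5},
    "report formatting":   {"likelihood": 4, "impact": 1},
    "user authentication": {"likelihood": 2, "impact": 5},
}

def risk_score(attrs):
    return attrs["likelihood"] * attrs["impact"]

ranked = sorted(items, key=lambda name: risk_score(items[name]), reverse=True)
print(ranked)  # highest-risk first
```

A model like this is only as good as the stakeholder judgments feeding it, which is why the standard’s advice to poll a wide group is sensible.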
Five annexes are included. Some quick tastes:
Annex 1 is about testing as a part of “Verification and Validation”, and presents the following model. The idea of Metrics outside of Testing is worth more contemplation.
Annex 2, Metrics: “In order to monitor and control testing and provide timely information to stakeholders it is necessary to effectively measure the test process…Thus all test efforts need to define and use metrics and provide measures in respect of both products and processes.”
Annex 3 has our first discussion of Iterative Development processes, comparing them to “Sequential” and something called “Evolutionary” that seems to be trying to split the difference.
In Annex 4, there is a discussion of “test sub-processes”, defined earlier in the standard as “test management and dynamic (and static) test processes used to perform a specific test level (e.g. system testing, acceptance testing) or test type (e.g. usability testing, performance testing) normally within the context of an overall test process for a test project”.
This new term is used constantly throughout the standard, but it doesn’t seem to add a lot of value beyond discussing control of the overall test process. Examples given here include: Acceptance testing, Detailed design testing, Integration testing, Performance testing, Regression testing, Retesting, System testing, Component testing. There are some tables for each type, listing objectives, claims of detailed processes, and technique (usually followed by “As Appropriate”).
My last piece of criticism here is that this supposes these are all distinct and separate activities, as opposed to overlapping and concurrent ones. Perhaps I don’t yet fully understand the usage here.
Annex 5 is about Testing Roles. It names a Strategist who establishes process and ensures conformance, a Test Manager who manages the process, and a Tester who executes the processes. There is an interesting discussion about the independence between who writes the code and who tests it that unfortunately concludes with “The intention is to achieve as much independence between those who design the tests and those who produced the test item as possible, within the project’s constraints of time, budget, quality and risk.” Sounds like designing as large a communication gap as possible.
A bibliography lists many ISO and IEEE standard documents, plus an ISTQB glossary of terms. Agile Testing (Crispin and Gregory) is listed, but no other books on testing that we would be familiar with are included, nor any contemporary sources.
We made it! It looks like we have the result of a committee: several years behind the state of the art, overly focused on formality and control, barely concerned with technique. So, about as expected.
The other parts of the standard will be reviewed here soon, and by soon, I mean this year. I have not had any real progress on my attempt to formalize this work with AST, but I will continue to work on that, too.