Analysis of ISO 29119-2: Test Processes

This is the second post in a series following and analyzing the ISO 29119 standard. Most of the essential context references were covered in the first post, Analysis of ISO 29119-1. One thing that has changed since that first post is the AST Committee I proposed has been formalized. Watch for more from us soon!

So, what can we expect in Part 2 of the Standard?

ISO/IEC/IEEE 29119 supports dynamic testing, functional and non-functional testing, manual and automated testing, and scripted and unscripted testing. The processes defined in this series of international standards can be used in conjunction with any software development lifecycle model. Each process is defined…and covers the purpose, outcomes, activities, tasks and information items of each test process.

Can’t wait!

Please remember I am criticizing the standard (and the idea of a testing standard), not the people who worked on it. I believe that smart, experienced people attempted to lay out their view(s) of testing, hoping to help people test effectively. I think that in the right discussion about the many contexts software might be tested in, they might concede that no prescriptive standard can be relevant and useful in every context. In fact, some of them are already doing that. Whatever the shortcomings of 29119 (and there are plenty) it could never possibly satisfy its mission, even if it was a better standard than it actually is.

TL;DR

My best-practice, conform-ational approach is to summarize my primary conclusions at the top of my blog posts, sparing tens of readers the post’s full brilliance. Here are my “above the fold” takeaways from analyzing ISO 29119-2:

  • 29119 literally puts process (Part 2) before technique (promised in Part 4, still not published)
  • 29119 claims to be applicable to testing in *all* software development lifecycle models, despite heavy documentation and compliance burdens
  • 29119-2 has Conformance on page 1. To claim Conformance, there are 138 “Shalls” to conform to in this document. To claim “Tailored Conformance” without meeting every “Shall”, “justification shall be provided…whenever a process defined in…29119 is not followed”
  • Part 2’s vocabulary section has conflicts, revisions, and pointers to new terms relative to Part 1. This is not a “gotcha” – but is worth remembering when someone claims that with a test standard “At least there is a common vocabulary for testing”.
  • Conformance is driven by fear. Fear is the mind-killer.
  • Some of the “shalls” are highly specific. Some are vague and hard to understand. Some, through reference, contain multitudes. Some are nonsense.
  • The standard is not detailed enough to be very useful to someone who doesn’t already understand a fair amount about testing, yet an experienced tester would waste a lot of time and effort attempting to comply with it.

Conformance

29119-2 goes to Conformance very early – Page 1. Either Full or Tailored conformance can be claimed for the standard.

  • “Full conformance is achieved by demonstrating that all of the requirements (i.e. shall statements) of the full set of processes defined in this part of ISO/IEC/IEEE 29119 have been satisfied.”
  • “Tailored conformance is achieved by demonstrating that all of the requirements (i.e. shall statements) for the recorded subset of processes have been satisfied. Where tailoring occurs, justification shall be provided (either directly or by reference), whenever a process defined in…ISO 29119 is not followed. All tailoring decisions shall be recorded with their rationale, including the consideration of any applicable risks.”

I can find no guidance on what “the recorded subset of processes” means – nor on what the various nesting levels of “process” in the standard are. Are these the processes that reference record-keeping and documentation? I bet I can find a consultant to help not-interpret that…

There is a “Reference” example given for exclusion from the requirement for providing direct justification:

“Where organizations follow information item management processes in standards such as ISO 15489… ISO 9001…or use similar internal organizational processes, they can decide to use those processes in place of the information item management tasks defined in this part of ISO/IEC/IEEE 29119.”

So, no exclusion from the requirement to document and describe the justifications – just an exclusion from the requirement to provide a separate document including these justifications for ISO 29119, as long as they are in another document somewhere else.

After 10 months, the only defense the standard’s authors have raised to questions about its compliance burden is to claim it is more flexible than what the standard actually says:

 

… and that’s the last message in the conversation. I suppose we could take the word of a standard author over the standard itself, which says with little ambiguity under Intended Usage: “The organization shall assert whether it is claiming full or tailored conformance to this part of ISO/IEC/IEEE 29119”.

Clashing Definitions

Clause 2 of Part 2 spells out definitions for some terms. There is overlap with Part 1 – and some disagreement with what was found there.

For example, in Part 1, Feature Set meant “collection of items which contain the test conditions of the test item to be tested which can be collected from risks, requirements, functions, models, etc.” Part 2 says: “logical subset of the test item(s) that could be treated independently of other feature sets in the subsequent test design activities”. Additional differences, revisions, and pointers to new terms are found throughout. This is not a “gotcha” – but it is worth remembering when someone claims that a test standard at least provides “a common vocabulary for testing”: ISO 29119 already diverges on critical definitions between its first two parts.

At least these terms are interesting to think about. It’s far less interesting to trace the relationships between test activity, test item, test condition, test requirement, test phase, test plan, test policy, test planning process, test procedure, test procedure specification, test process, test sub-process, test script, test set, test model, test technique, test specification, and test type. Yes, these are all separate things, but time spent debating their boundaries is time not spent “testing”.

Exploratory testing is again defined as “spontaneously designs and executes”, not “simultaneously” as we define it.

Process and Hierarchy


This diagram shows a hierarchy of test processes. It doesn’t actually cover all the processes referenced in the standard, despite the caption’s claim. The diagram does demonstrate the standard’s insistence on separating control processes from execution processes.

It is intended to illustrate that each layer defines the one below it. First comes the organizational level, where the Organizational Test Process defines test policy, strategy, process, procedure, and “other assets”. Test Management Processes are defined at the project level, while Dynamic Test Processes are said to control a phase or a particular type of testing.

This seems tailored for adoption by the mid-level executive who wants to put their stamp on an organization’s entire testing practice. Over and over again, the standard lays out separate process nodes for each possible step of testing. This exhaustive documentation of the steps involved in one view of testing is way too much for an experienced tester, who would rather provide useful information to stakeholders. It’s still not enough to arm someone with no testing experience to plan and supervise good testing. So who is it for?

When Fear Drives Testing


Software testing is frequently perceived as a high-risk, low-reward activity by people who aren’t testers. It’s thought of as a cost center (“there is no ROI in testing”), and if anything goes wrong, someone’s in trouble. Over and over again, testing is blamed for poor quality, despite the fact that most people who work in software engineering know “you can’t test quality into the product”. Testing is often thought of as less intellectually rigorous than other parts of software engineering. It is frequently not a prestigious area to work in, is sometimes led by people without real training, experience, or skill in testing, and is a convenient scapegoat for quality issues – particularly for people who should know better.

Many people who work in testing fear the buck stopping on their desk after a quality failure, and for good reason. If you are likely to have blame imposed for a bug escape, the most rational response for a skilled person might be to interrogate the context and demand the tools and latitude to gather the most comprehensive and useful set of information about the system under test.

If you are controlled by fear, you might shy away from the responsibility, and look for some cover under best practices. After all, if you faithfully observed and obeyed someone else’s plan, you can’t be blamed if the plan fails, right? It wasn’t you, it was the plan!

If you don’t know what you are doing, you might be even more likely to seek the comfort of an externally defined standard that removes your responsibility to decide what to do. If you don’t trust your team (and yourself), you hand off control to someone or something else. Like a prescriptive standard, full of “shall statements” to replace “you thinking”.

The standard is still not detailed enough to be very useful to someone who doesn’t already understand a fair amount about testing, yet an experienced tester could waste a lot of time and effort trying to comply with it. Any discussion of actual techniques seems to be waiting for 29119-4 – at one point promised for late 2014, currently late in the approval process.

You Shall…

There are 138 instances of “shall” in this document. Some of them are highly specific. Some, by reference, contain multitudes. Some are simply nonsense. Some of them are too vague to be useful, though that may make them more applicable in multiple contexts. Some real wisdom can be found in here.

I spent some time pulling apart the various processes, sub-processes, dependencies, and circular references. Rather than try to further sketch out the overall shape of process (and documentation) requirements, I present my 10 most entertaining/concerning/Kafkaesque “Shall Statements” in ISO 29119-2:

  1. The person responsible for organizational test specifications shall implement the following activities and tasks in accordance with applicable organization policies and procedures with respect to the Organizational Test Process.
  2. The organizational test specification requirements shall be used to create the organizational test specification.
  3. Appropriate actions shall be taken to encourage alignment of stakeholders to the organizational test specification.
  4. The traceability between the test basis, feature sets, test conditions, test coverage items, test cases and test sets shall be recorded.
  5. The testing of the feature sets shall be prioritized using the risk exposure levels documented in the Identify and Analyze Risks activity (TP3).
  6. Any risks that have been previously identified shall be reviewed to identify those that relate to and/or can be treated by software testing.
  7. Each required test activity in the Test Strategy shall be scheduled based on the estimates, dependencies and staff availability.
  8. Those actions necessary to implement control directives received from higher level management processes shall be performed.
  9. Readiness for commencing any assigned test activity shall be established before commencing that activity, if not already done.
  10. The test coverage items to be exercised by the testing shall be derived by applying test design techniques to the test conditions to achieve the test completion coverage criteria specified in the Test Plan…
    NOTE 2 Where a test completion criterion for the test item is specified as less than 100% of a test coverage measure, a subset of the test coverage items required to achieve 100 % coverage needs to be selected to be exercised by the testing.
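Shall #10 is at least concrete enough to sketch. Here is a toy, hypothetical illustration – mine, not the standard’s; none of these names or structures come from 29119 – of “deriving test coverage items by applying a test design technique to a test condition”, using boundary value analysis:

```python
# Hypothetical sketch: deriving test coverage items from a test
# condition via boundary value analysis (one common test design
# technique). These names are illustrative, not defined by ISO 29119.

def boundary_value_items(name, low, high):
    """Derive coverage items for a condition 'value must be in [low, high]'."""
    return [
        (name, low - 1,  "invalid: below lower bound"),
        (name, low,      "valid: lower bound"),
        (name, high,     "valid: upper bound"),
        (name, high + 1, "invalid: above upper bound"),
    ]

# Test condition: "age field accepts 18..65"
for field, value, rationale in boundary_value_items("age", 18, 65):
    print(f"{field}={value:<3} {rationale}")
```

NOTE 2 then tells you that with a completion criterion under 100%, you must select a subset of these items to exercise – which is precisely where the judgment that no standard can supply comes in.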

It’s not all baffling. Here’s a richly meaningful shall statement that demonstrates something about the depth necessary to understand context:

A Test Strategy (comprising choices including test phases, test types, features to be tested, test design techniques, test completion criteria, and suspension and resumption criteria) shall be designed that considers test basis, risks, and organizational, project and product constraints…

NOTE 3 This takes into consideration the level of risk exposure to prioritise the test activities, the initial test estimates, the resources needed to perform actions (e.g. skills, tool support and environment needs), and organizational, project and product constraints, such as:
a) regulatory standards; b) the requirements of the Organizational Test Policy, Organizational Test Strategy and the Project Test Plan (if designing a test strategy for a lower level of testing); c) contractual requirements; d) project time and cost constraints; e) availability of appropriately-skilled testers; f) availability of tools and environments; g) technical, system or product limitations.

Mapping

The last third of 29119-2 is an Annex mapping clauses of other standards (ISO 12207, ISO 15288, ISO 17025, ISO 25051, BS 7925, and IEEE 1008) to 29119-2. Rather than critique these other standards, I will simply question the value and purpose of this exercise. Is it to justify the standard, or to prove that it equals or even supersedes the others?

Conclusion

We still have parts 3 (and 4? soon?) of 29119 to go. Having processes defined before considering what we want to accomplish will guarantee we end up at our desired results (whatever those might be), right?

Metrics Fixations: How People Feel About Numbers

The tweet that inspired this post:

Metrics that are not valid are dangerous.

TL;DR

My blog posts sometimes branch and overlap like legacy code that no one feels confident enough to refactor. So, new feature: the Too Long, Didn’t Read summary:

  1. Metrics are useful tools for helping evaluate and understand a situation. They have similar problems to other kinds of models.
  2. People believe metrics provide facts for reasoning, credibility in reporting, and safety in decision-making.
  3. Questioning metrics remains an important mission of our community.

Metrics are Models

A metric is a model. I see modeling here as a way of representing something so that we can more easily understand or describe it. Metrics have value in expressing a measurement of data, but they need context to become information.

She Chooses Whoever Shoots Last?

I could look into my pasture full of hundreds of nerfs grouped in their pods, and communicate what I see as “There sure are lots of them.” Or, I might say “There are 1138 of them in 82 pods. Well, there were 1138 when I counted them all up last week. Oh wait, there have been seven calves, one death, and two missing since then. Yes, 1142, definitely 1142. I think. Unless some died or came back. And there are a few pregnant females out there. Still, only males for meat until wool production recovers.”

Other people have dug into the validity of metrics in great detail previously, and I don’t want to get sidetracked into (just) validity. We will get to the use of metrics shortly, but to get us into the right state of mind:

  • If I were to say that after implementing goat pairing in one pod of nerfs as a trial, nerf losses were at 7%, is that a good or bad number?
  • If nerf losses were 14% in the period before introducing goat pairing, does that help? What if I point out that there are an average of 14 nerfs in a pod? Are you going to ask where in the sample period today is?
  • Did I mention that wool production is down 38% because of the goats snacking on nerf fur clumps?
  • Meat revenue is up 3% this season.
  • Per animal? No, overall.
  • Meat prices are down relative to wool prices lately, but still up 5% this year to about $5.25.
  • How many animals butchered? I record that separately, but usually just divide pounds sold by 600 and use that for investor reports and taxes.
  • “All models are flawed. Some are useful.”
  • Remember not to confuse models for what they represent, lest you get the metrics – as opposed to the results – that you are looking for.
  • Correlation is not causation. It’s especially suspect when you are trying to explain something in retrospect.
  • The last, hardly subtle point: make sure what you measure matters.
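For a sense of scale on the pod numbers above (continuing the invented ranch example – all figures are the post’s fictional ones): with about 14 animals in a pod, each animal is worth roughly 7 percentage points of loss rate, so the drop from 14% to 7% is the difference of a single animal.

```python
# With a pod of ~14 nerfs, one animal is worth ~7 percentage points of
# loss rate, so "losses halved from 14% to 7%" is literally one animal.

pod_size = 14
losses_before = round(0.14 * pod_size)  # ~2 animals
losses_after = round(0.07 * pod_size)   # ~1 animal

print(f"before: {losses_before} lost, after: {losses_after} lost")
print(f"the 'halved loss rate' is a change of "
      f"{losses_before - losses_after} animal(s)")
```

A percentage without its denominator (and its time window) is a feeling, not a fact.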

And….time. It’s not that helpful to pick apart specific metrics – whether they measure something real, or if they are based on CMM Levels, KLOCs, defect densities, nerf herd finances, and other arbitrary/imaginary constructs. It’s not that helpful because it doesn’t necessarily change minds. Let’s instead discuss why people are so enamored with metrics, how they use them, and speculate on what they might be getting from them.

Quantifying With Measurement

By measuring something, we may feel like we are replacing feel with facts, emotions with knowledge, and uncertainty with determinism. By naming a thing, we abstract it; the constant march of computer science is to reduce complexity by setting aside information we don’t need, and simplify things to fewer descriptors. Everybody enjoys the idea of being a scientist.

894 story points, at 37 story points per dev per sprint, is….

Similarly, we feel more control when we can point to a number. We can say that a thing is a certain height, length, size, etc, and we feel like we understand it. We’ve reduced the complexity to where we can describe a thing, removing the need to try to transfer some bit of tacit knowledge if we understand what we are looking at, or deceiving ourselves about how much we actually understand if we don’t. Everyone likes to feel clever.

We can then discuss quantities, group things that seem to be similar, and so forth. This means we can put it in spreadsheets, we can talk about how many people are needed to produce certain quantities, etc.

Of course, once something is represented by a number, it invites dangerous extrapolation: “Once we implement goat pairing across all pods, we’ll make $252,000 more!”
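The arithmetic behind that kind of projection is trivially easy, which is exactly what makes it seductive. Here is a sketch of the naive extrapolation; the per-pod figure is invented to match the quoted total, and the 82 pods come from the herd description earlier:

```python
# Naive linear extrapolation: measure a gain in one trial pod, multiply
# by the number of pods, and ignore every confounder (the wool hit from
# goats snacking on fur clumps, meat price swings, pod-to-pod variation).

trial_pods = 1
total_pods = 82           # pod count from the herd description
trial_gain = 3_073.17     # invented per-pod revenue gain, in dollars

naive_projection = trial_gain / trial_pods * total_pods
print(f"Goat pairing across all pods: ${naive_projection:,.0f} more!")
# The formula is correct; the assumption that one trial pod represents
# all 82 pods (and that the gain wasn't noise) is doing all the work.
```

The multiplication is never the flawed part; the hidden premise that the sample generalizes is.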

You Can’t Argue With Facts

When we can cite a number, wherever it comes from, we might feel like we are making quantitative judgments, removing our judgment and opinions. Something that is a fact isn’t open for interpretation, right?


This provides us with cover and safety. Instead of stating an opinion, we can claim we’re simply pointing at reality. If you make a mistake in judgment, metrics can be the justification for why you did it. Wouldn’t anyone else have made that choice with those facts at hand?

Where did my facts come from? If they are measurements, how do I take them, and what do I discard? Why do they mean what I say they mean, and why do they mean that here and now? This is the slippery stuff that allows us to frame a discussion with our version of “facts” and interpretation of what they mean, inserting our biases and opinions while maintaining the illusion that we are making completely quantitative decisions, using only logic and reason, denying our influence in stacking the deck in the first place.

“Quantitatively, we’ve had the same experience as everyone else – goat pairing is essential for maximizing wool production.”

We Have to Measure Our Progress Somehow

If you do get a person pursuing metrics to admit problems with validity, a common deflection is to claim that, however flawed they might be, metrics are an external requirement that is not open for discussion. When the boss’s boss demands metrics (or when we say that they do), we are attempting to end the conversation about the validity of, and need for, metrics. Persisting with these questions past that signal will reduce your future influence, or worse.

Is this Accurate? Precisely.

This resolve comes from the experience of being asked to report status, which is essentially answering the following set of questions:

  • Is there progress being made?
  • Is the schedule still accurate?
  • Do you need help with anything?

If the answer is No, No, or Yes, there will need to be additional supporting detail. You are persuading another person to act or not act, committing personal credibility, and taking the risk that what you claim is correct enough that they won’t look foolish for endorsing it and you.

Reporting, Cloaked in Metrics

We often have limited opportunities to prove ourselves. We want our bosses, and our boss’ bosses, to believe that we are smart and capable. Presenting metrics to bolster our conclusion makes us feel more credible – and it can’t be denied that when the subject isn’t understood, almost any metrics are going to sound impressive and credible, making everyone involved feel smarter.

Many of us have found ourselves in discussions where a stakeholder is looking at a chart where the underlying measurements are barely – or not at all – understood, but they will still question the shape of curves and graph lines, asking for explanations when any troughs appear. This can be a powerful mechanism for having a discussion about the relevant issues, but there is a tradeoff in presenting a single metric – and having that become the standard.

Good reporting communicates facts, risks, context, and recommendations. Metrics that don’t support one of these are not in the mission of reporting.

What Does it All Mean?

Is it really true we can’t run a business without metrics? I don’t think I am advocating that, but I am suggesting we can help make it disreputable to manage to flat, two-dimensional metrics as if they were reality.

Managers have been simmered in the pot of using best practices to manage to metrics for at least a generation. Questioning metrics, both in formulation and usage, is an important mission of our community. We need to be thoughtful about when and how we raise these issues, but understanding the components of our reasoning is necessary to be confident that we are reasoning well.

Arguments to Moderation

In the last couple of months, other people outside of the Context-Driven Community have spoken up about the disagreements we’ve long had with certification and standards. One of the articles is here. Go ahead and read, it’s short and I’ll wait.

On first reading, the implication seemed to be that the Context-Driven Community’s approach to testing is from a single perspective – even though the editorial’s pronouncement is essentially CDT:

Limiting oneself to a single perspective is misguided and inefficient…It’s not either this or that. It’s all of the above. Testers must come at complex problems from a variety of ways, combining strategies that make sense in a given situation—whatever it takes to mitigate risks and ensure code and software quality.

Does the editorial writer know what CDT is about? This is something that could be said by any number of people in my community. My concern is that people who are not familiar will get the impression that CDT simply has different process or method prescriptions – a common fallacy amongst people who don’t (or won’t) understand what Context-Driven means. This is really frustrating, since this is the opposite of one of the most important things to us. We keep saying that our prescription is to examine the context, and select tools, methods, and reporting that are appropriate for the context. We have a bias against doing things that we see as wasteful, but we also acknowledge that these things may need to be done to satisfy some piece of the context.

Despite essentially agreeing with us, the mischaracterization of our point of view was necessary to serve the structure of the article as an Argument to Moderation. This is both a trope of modern “journalism” and a logical fallacy: selecting/characterizing two points of view as opposites, and then searching for some middle, compromise position, usually with pointed criticism directed at “both sides” to demonstrate how much more reasonable and wise the observer is.


This is a flawed model, though. Sometimes one position is simply correct. Often the two positions are not talking about the same reality, and the framing is important. There are typically more than two positions available on an issue, but as with politics, two seems to be the intellectual limit, with every point of view placed somewhere on a spectrum between slippery-slope extremes.

The debate – such as it is – about ISO 29119 is suffering from a lack of voices willing to take up for the standard’s content and mission. Even the authors of the standard are responding to criticism by defining down what “standard” means and what it’s for. No one seems to be speaking up against the things CDT says, but there are people who seem to be enjoying contradiction for its own sake, or taking on a bystander role, clucking about personal agendas without naming anyone or anything as an example.

Debate is appropriate for describing conversations about subjects where there is professional disagreement. That’s what’s here – and that’s all that’s here. We can disagree, as professionals, and it’s fine. “Can’t we all just get along” was first uttered as a call for peace during riots where people were being injured and killed. A professional debate is not a riot. I don’t hate people I disagree with. I consider them colleagues, and if we didn’t disagree, what would we talk about? If we didn’t feel passionately, why would we bother debating?

I’m not a fan of yelling at people on Twitter. It makes many people uncomfortable, nuance is lost, and often, the person doing the yelling just looks mean. These are all valid criticisms of communication style, but not of substance – both in the sense that it ignores the issues at hand, and in that complaining about the PR instead of the content is a transparent mechanism to claim the higher ground.

If you want to talk about how our community supports and nurtures young thinkers, discussion of this particular subject is valid and important. If you want to talk about twitter manners in order to not-so-subtly discredit a point of view without actually engaging with it, it’s not hard to see that.

People working within and profiting from a system are almost always going to think the system works well, despite whatever flaws they might acknowledge. Any criticism of the system is a challenge to the status quo, and will be opposed by the people inside it. Particularly when you profit from a system, you should not expect to be exempted from criticism of that system, or of your role in it. It was ever thus, and there is no reason why this field, or this subject, should be any different.

I speak at conferences about the things I do and think that pertain to my field of study. I expect to encounter other experts, and be asked questions. If I didn’t get any questions, I probably didn’t say anything new, important, or relevant.

If you sell certification training or work on standards bodies, you nominate yourself as a spokesperson for the ideas you clearly support – or that support you, more like. If you claim expertise on a subject, or purport to accumulate anecdotes and then pass off your opaque classifications and conclusions from them as statistical evidence, you should expect to be asked questions and asked to provide more detail. If you are not willing to speak for and defend your ideas, maybe you shouldn’t be willing to profit from them, either?

If you’re an observer, you could add something to the discussion by debating the issues at hand. If your contribution is just to tone police, maybe sit this one out?

Standardization (is) for Dummies

A theme seems to have developed on this blog. There has been a lot of complaining about the control mechanisms of industrial-scale quality management. Let’s not change course now; we must stay committed to the process.

Very irony…much humor…wow

Today, I want to talk about the pathology of “Standardization” – the idea that the most efficient way for large organizations to “manage” work done by many people is to make the tools and/or processes the same across the organization, specifically for testing. I am not talking about rolling tool licenses up into an enterprise license, or even reporting structures coalesced into “establishing centers of excellence” (more like “marketing scheme to justify cost for enterprise tools of mediocrity”, amirite?)

Note the diversity of characters, even with “common tooling”

And of course, many things do benefit from standardization. Railroad travel, measurement (Metric, ‘Murica!), Lego, hardware interfaces, operating systems, etc. Often, it is the right thing to standardize.

Achieving standardization through generalized process engineering really amounts to trying to replace the variables of skill and experience with a perceived constant: documentation of some idealized process and a set of required artifacts.

The desire to model our world as full of easy equivalencies is easy to understand; we make decisions all the time between two or more choices – or at least “choices” as we have framed them (or allowed them to be framed for us). The reduction of complexity to symbols is necessary for us to decide what to do, where to go, and how to get there without paralysis.

Testers are rich sources and frequent users of heuristics. Heuristics are very effective when used responsibly, skillfully, and in the right circumstances. What always matters is context. Nothing is truly abstract.

Choosing between an apple or an orange for a morning glass of juice is a matter of preference, and a very different choice than deciding which tree to plant in your yard, which requires considering climate, sun exposure, and soil. Apples and oranges is not even really “Apples and Oranges” without understanding why the choice is being made, who is making it, and what the desired outcome(s) is.

Standardizing Test Practices

This weenie would like to see your test case documentation format

I believe that process weenies who lead standardization efforts really believe most of the things that they say. They believe that if they can properly standardize, document, and implement the Amalgamated Consolidated, Inc way to do things, they will save the company money, shorten testing cycles, implement the proper metrics, and reduce hard and soft costs. I didn’t say they are right, but I am saying that they are not intending to mislead when they make those claims. Given how the Iron Triangle of Cost, Time, and Quality works, they can indeed move towards two of the corners.

In addition to the very common pathology of presenting goal statements like “save money and improve efficiency” as a strategy, there are some other things that are said that I find troubling. Let’s unpack some of these.

“If we standardize, our training costs will be lower.”

“Standardizing will make it easy to transfer work and employees between groups.” 

Effective testers know the software and system they test, and how best to work with the people on their team. If they go to a new team, that knowledge will be lost from their old team. The relocated tester will need to build new contextual information again. If they get work from another team, they will have the same challenges.

It is incrementally cheaper to have one set of process documentation than two, or one set of software manuals. Of course, that documentation will already have holes by the time it is published, and by the time processes have a year or two to evolve, the documentation will have notes, exceptions, and whole sections that are flat-out ignored. Oh, you are going to keep the documentation updated, you say? Does that ever really happen?

The truth is, new employees aren’t really going to learn any faster because every department does things the same way. They are going to have to learn whatever process they need to know to be successful in the area in which they are hired. Who cares whether Group A does things the same way as Group B? Not the new employee; they only care about how their group does things.

Every department/business unit/team will have a number of “local variables” – where data is stored, how to get equipment, refreshing builds – all of the contextual parts of the process that can only be learned through practice. It is also hugely important to learn how the group’s priorities are set, who gets mad, who is receptive, how this manager likes to run things, and what that director means when they say things cryptically in email; every new employee has to learn these to be effective.

What portion of onboarding time is *really* spent on learning the steps of a process?

“HR says we have to formalize job descriptions/responsibilities/salary bands.” 

The 4 different shapes make them appear more natural...
These are the 4 testing role descriptions we will use across the company…

Sometimes, the organization is looking to rationalize job descriptions, salaries, and other things that make it easier to bracket employees against each other. This process is almost never good for workers. There might be some very pious talk about making sure pay is “fair” across the organization; but my experience is that it is far more likely that high salaries end up targeted for cuts than that low salaries get raised.

Treating people like interchangeable, programmable cogs is not only dumb, it’s dehumanizing and demotivating. Smart, passionate people will be motivated to find somewhere else to work where they can use and grow judgment and skill. If you are looking to commoditize testers between groups, you are likely to end up with a pool of McTesters with similar skill levels, and results across projects may also be similar in an equally undesirable fashion.

“It costs us money to have redundant tools in the organization.” 

Yes, this is absolutely true. But enough about middle management.

Tool-centric views of testing are somewhat less prevalent than they were a few years ago (though automation obsession remains a large problem). Open source and custom tooling seem to be pulling ahead – because there is always a bias towards, and a significant focus on, cost-cutting.

If you believe that all tools are equivalent, it becomes easy to make “dollars and sense” decisions at some remove from the front lines to reduce redundancy and merge the organization’s knowledge. Unfortunately, this is simply not true. Tools are not equally fit for the same purposes.

All tools have strengths and weaknesses – most of which are less important than the skill and judgment of the tool operator. If you actually do take away the tool an experienced and skilled person is comfortable using and replace it with another, you are also discarding a great deal of experience and developed work – the cost of which might be difficult to measure, but is really hard to replace without significant time and energy. Sure, some of it is crap – most automation code is. The useful bits would go into the garbage with the rest of it. A sunk cost, perhaps – but still discarded value.

Briefly, this trope: “We have wasted redundancy trapped inside more than one code base”. As if code were perfectly commented and interchangeable, ready to be pulled off a shelf and swapped into a waiting receptacle as if it were a battery.

“All of our documentation looks different.”

Standardization of document templates is probably harmless, beyond giving anal-retentive ninnies some cover in the form of work product to justify their grab at “thought leadership”.

Perhaps the first question should be “So?” or “And?” Documentation is just a way to communicate important information to the people who need it. Who actually consumes the documentation? What information are they looking for?

“We’re all doing things differently.”

We Must Be In Lockstep

Different groups of people will choose different methods for attacking different problems – or perhaps even the same ones. The collective skills, experience, and inclinations of one group will be different than another’s – so of course they will come up with different ways to do things.

This “argument” is a great example of “begging the question”: why is it bad to do things differently? What is to be gained by forcing people to work at the same pace in the same way? Effective groups will develop ways to work together efficiently – a process of continuing improvement. It has to be asked – why is standardization so important, really? Does it make people feel safer, more secure, or less at risk? What is the real value of “consistency”? Are we solving real problems, or soothing neuroticism?

It was in a sales context (full of other buzzwords like “cadence”) that I first heard the phrase “in lockstep”. This led me to call this “Three Legged Racing” – an old children’s activity that rewards careful synchronization, and is sometimes intended to deliver a teamwork lesson. Each child could run to the finish much faster separately, but tying them together induces a crippling limitation, forcing them to discard their natural instincts and abilities to stumble along, trying to get somewhere while struggling against their constraints, with grass-stained clothes, shortened tempers, and injured ankles sure to follow.

Human beings have some natural skills and tendencies – which vary from individual to individual, but in the same way children run freely and naturally, people think, talk, and work in ways that feel comfortable and effortless. The best work product results from people not wasting effort on process overhead and administrivia, allowing them to find their flow. When people aren’t allowed to work in natural ways, it’s much harder for them to accomplish anything at all, and they will be unhappy.

I remind myself not quite frequently enough that I must be careful assigning motivations to others; this is how you end up begrudging and resenting people who don’t really spare you a second thought. Most people (everyone) are trying to fumble their way along with the rest of us, and are doing what they think is the right thing to do, by some formulation of what they think the right thing to do is. People are never as sinister as an irritated person might think – though even the worst people feel justified in what they are doing, trapped by circumstance into making perfectly rational and logical decisions.

“Train Tracks are SUPPOSED to be the same width!”

When we try to prevent mistakes by attempting to dictate future activities, we are using the fear of what truly incompetent people might do to force competent people to discard their judgment. We are harshly judging people we don’t know, and we are supposing that we can still make better decisions than them without context.

If we are arrogant enough to try to dehumanize people in the future by giving them questionable marching orders from the present, we create environments that are not healthy for thinkers and passionate people – and they will leave.

Mashing “Best Practices”

Once, someone irritated me about using the term “Best Practices”. I spouted off.

After reflection, I’ve realized that while comparing test case spreadsheets to Nickelback was good for getting people who already agree with me to snicker, it was not helping me make my point to people who did not already have that point of view.

Canadian Arena Rock Best Practices

Clearly, “Vancouver Creed” evokes strong emotions in many music fans, but lots of people like them, and I don’t just mean Avril Lavigne and other Canadian Mediocrities. They sell millions of records and sell out arenas, even today. These guys are rich beyond even *my* wildest dreams, and that is a pretty frightening Lynchian hip hop video.

The core issue is not the wording of whatever label is used for defining helpful practices as an abstract concept, but with trying to control dialogue about practices by insisting that precedent is the most important concept in deciding what to do. Here, I’ve tried to spell out what I think the Worst of the “Best Practices” label really is: how it can be used to fool ourselves and each other into rushing without sufficient care or information to bad decisions.

1. Implied Framing

Anyone can point at anything and say that it is a “Best Practice”. I think that adding chicken stock before whipping is a good practice for mashed potatoes. I’ve made mashed potatoes that way and liked it, so I could decide to call that a “Best Practice”.

However, that would be dumb. My wife doesn’t like mashed potatoes that way. She, and many other people, have different techniques to reach different definitions of delicious, creamy mashed potatoes.

If I insist on a practice because it will create an outcome *I* want, I am assuming that the outcome I want is exactly the same as everyone else wants. There are other problems here to discuss, but that is the first – that my view of things is the only valid one, and that everything must serve the end.

Many recommended testing practices seem to be focused on the end of establishing central control and repeatable testing practices. That may serve the purposes of a person “managing” a large testing project, but most stakeholders don’t care about that.

2. Over-Simplification – Problem and Implementation

Mashed potatoes are a terrible analogue for testing software. In fact, any analogy that treats testing as a process deterministically driven by a handful of factors is the absolute opposite of useful.

All software projects are one-offs created by developers, testers, and project staff of various skill levels and engagement, to solve different problems, using different components, based on variably incomplete understandings of requirements. In a given context, at a given time, a team of (hopefully) smart and (always) flawed people work together to make JUST ONE thing that never existed before, and follow up with as much bug-fixing as they have time, money, and energy for. They stumble through a project, splitting their time with other projects and non-work situations, and eventually call it done. Then they move on to create something else, with a different team addressing a different set of requirements, some a little more skilled, some a little more cynical or a little less interested. Requirements, conditions, and the skill and engagement of the people on the team are all moving targets.

Software testing isn’t mashed potatoes. It isn’t even a restaurant; the closest I can get is an “Iron Chef” style competition: here are some ingredients and requirements you may or may not be familiar with and an arbitrary timeline – go!

A Demanding Stakeholder who doesn't care about process, just results

That being said, indulge me. Let’s briefly consider the simple system of mashed potatoes:

– Mashed potatoes involve, first, potatoes. Selecting Yukon Gold, Redskin, Russet, Purple Peruvian, or another variety makes a huge difference. This woman is all wrong about types of potatoes, by the way. Redskin potatoes make awesome mashed potatoes. The age of the potatoes at picking, and how long they hung around before cooking, matters too. Finally, do you peel them first or not?

– Usually, the potatoes are boiled until soft. The mineral content of the water? Salt and garlic cloves in the water (no and yes for me)? How soft?

– Then they get whipped, often with a hand mixer, sometimes with a stand mixer, maybe with a food processor. Some amount of salt and butter is added. A splash of milk is a good idea, though I like sour cream or yogurt better. Some people like cheese.

– Then some amount of time passes, and the potatoes are eaten hot, reheated, or cooled off, and may or may not have gravy.

Are all mashed potatoes the same? I guess that depends on how much you care about the quality of the mashed potatoes. What’s good enough for a Tuesday night? What’s good enough for Christmas Dinner? If you wanted commodity potatoes, why not just get instant?

3. False Assertion of Authority/Expertise

“I’m an expert on potatoes. And mashing them. So let’s do it my way.”

If the person is saying that because they have a true love of potatoes and many years of experience, they should still be sensitive to the context: the available materials, time, budget and skill.

This is far preferable to the next most likely situation: someone read something or watched a presentation, and is attempting to apply techniques they have never used. This turns on the worst assumption of a process engineer – that process trumps skill, experience, and context.

I love aphorisms, but one of my favorites is “Knows enough to be dangerous”. This is one of the places it applies.

4. Shut Down Debate/Appeal to Fear

This is related to the previous. Once you believe that testing software well is a matter of selecting the right process, you can justify doing whatever it takes to advocate for a “pure” implementation of it.

One highly effective way to control people is with fear.

"And then we will probably all get sued!"

Whether you construct legal strawmen, invoke the boss’ name to try to access people’s fear buttons, or go straight for the fear of losing their jobs if the project fails, the easiest way to halt debate is to make people think that implementing processes that worked somewhere else has less risk – even if you have to make it personal. You can always claim later that you were just being cautious, which will have many people forgiving you for your attempted manipulation.

I don’t mean to minimize the importance of safety. If people don’t feel safe, of course they will be more conservative in decision making. My experience is that there is a lot more fear out there than is appropriate. I am saying that amplifying the level of fear is unkind at best, often cruel, and never fair as a technique for debate.

Sometimes, there is no debate. A consensus builds (or the HiPPO makes the call), and perceived risk is reduced; whether that is project or political risk is rarely untangled.

I SAID TMap!!!

Somehow, I think we are a long way from being done with debating these issues. “Use Best Practices” is a mantra that has spread far and wide. We have a lot of work left to do.

Happy holidays to everyone. I hope that wherever you are, and whoever you’re with, that your mashed potatoes are satisfying to all.

That *Best* Time on Twitter…

Here’s a story. It even has a message.  What it doesn’t have is a villain – though he has a FANTASTIC name for it.  I am definitely not a hero, but I’ll settle for thinker. Let’s start at the beginning.

Before I found Context-Driven Testing in 2004, everything I had heard and read about testing was not applicable or helpful to the work I was doing. The guidance I could find was highly prescriptive, and described testing as an exercise in keeping records of how you had done what you were supposed to do, which had been decided by someone smarter at some previous time. I knew it was wrong – and had done my part to challenge and defeat an ISO 9001 effort at my company – but I had a hard time explaining why.

Then… (take cover! incoming name drops!) …Ross Collard brought me to WOPR3, where I met Rob Sabourin, Bret Pettichord, Antony Marcano, Paul Holland, Roland Stens, Mike Kelly, Karen Johnson, Cem Kaner, Julian Harty, Richard Leeke, Dawn Haynes, and others. There, I heard a reference to schools, and found my way to an earlier draft of Bret Pettichord’s formulation of schools of testing.

I had found thinking about testing that was relevant to me, and that made sense. It changed me forever. Judgement and skill were more important than citing process documentation or checked-off spreadsheets of test cases; providing information was more important than gate-keeping; adapting practice and process to the situation was more appropriate than imposing a “Best Practice”. Ever since, I am happy and proud to be a context-driven tester.

Yes, I helped myself to a nice tall glass of Kool-Aid. It tasted pretty good, too. So I had seconds. Mmm, Kool-Aid. I’ll talk more about this in a future post.


The value in discussing testing “schools” today is to understand different approaches to testing – or quality – or quality assurance – or whatever it is called at the place they pay you money to get you to show up and interact with complex systems to mitigate risk. I do not see others who work in testing with different approaches as adversaries, though I do disagree with them, sometimes vigorously. Still, I rep my school aggressively enough that I use the term “Modern Testing” sometimes. More on this another time, as well. Enough exposition, let’s get to the story.

In April, I had seen some Twitter sniping at Rex Black (@RBCS). Rex is a successful testing consultant, speaker, and trainer. He has been around a long time – and he is credited as a contributor to Bret’s schools presentation.

Over the last few months, Rex has taken some tough shots from the CDT community on certification, metrics, and other issues. At that point, Keith Klain (@keithklain) had been challenging him on Twitter about the reliability coefficient of the ISTQB exam. If you don’t know about that, Keith launched a petition about it here. I don’t have a strong or informed opinion on the matter – but the only test certification I have is BBST Foundations.

So, at STPCon in April, I found myself sitting at a table with Rex before his keynote/panel discussion. I made some joke about him trolling the context-driven community, he suggested it might be the other way around, and it was all very friendly. Then he went on stage for a panel discussion, and that was that.

A week later, I was at the stage of waking where I was not out of bed yet, but I was reading Twitter. I saw this tweet:

‏@RBCS Question for “context-driven”/RSTs: Why the animus toward the common business mgmt phrase “best practice”? Even twitter has best practices.

I thought about it for a minute, and started tapping. 30 minutes later, I had replied with 9 tweets. 10 might have been more psychologically satisfying, but 9 turned out to be the right number.

1. Context matters. You wouldn’t test a free mobile game the same way you would test a medical record system, for example.
2. The idea of Best Practices attempts to substitute process for skill, which leads to crap results.
3. Substituting process for skill is dehumanizing. Do you tell an artisan he’s wrong for not making beer the way Budweiser does?
4. Best Practices are a B-School fairy tale executives think they can use to manage work they don’t understand.
5. Best Practices stifle innovation and improvement by tying processes to something that seemed to work once, and got good PR
6. Best Practices use command-and-control to deliver the efficiency of government and the consistency of fast food.
7. Best Practices don’t evolve well. How do you identify Better Practices when they come along from somewhere that innovates?
8. The term is problematic. “Best” implies there is nothing better – and imposes framing that is not explained.
9. Who decides what “Best” is? Someone sees a presentation, it sounds good? Why won’t everyone say theirs are best? When do we vote?

Rex and I went back and forth on whether I was constructing straw men (somewhat, it must be admitted), and whether oatmeal was a breakfast best practice. Others piped up; there was a lot of talking past each other, and it doesn’t seem like anyone’s mind got changed.


OK, if you insist. Some more things I am glad I said:

DDT was a best practice once. Waterfall was the only way to develop software once. Is Agile or Scrum best practice today? Who decided?
The tweet you replied to pointed out a problem in selecting the best practice: if we each do it differently, which is “best”?

On the claim that Best Practices are “the consensus of the majority of experienced professionals”:

How was that consensus arrived at? When do we review and update? Anyone can point to a best practice and say a lot of people agree

Where is this consensus published so that we can know we are talking about the same consensus? What if you have last year’s version?

And, since it is my blog post, I’ll repeat my favorite exchange:

‏@RBCS Example: it is a best practice to study defects found in sw dev in order to learn how to write better sw and how to test better.
‏@ericproegler @RBCS …In almost every case, reviewing bugs is valuable. But there are times it isn’t. Can I trust you to know when?

End result? Nothing much. I’m not disciplined or thoughtful enough to have leveraged it into anything beyond a self-aggrandizing (and now recursive) blog post several weeks later. I don’t tweet or blog much, but I did appreciate the feedback and RTs.

I’ve learned that Rex isn’t bothered by the use of the word “Best” the way I and others in my community are.  He has suggested that we are focused way too much on the meanings of words, and others have responded. Six weeks later, he’s getting the same kind of flak, and is saying things like “Best practices must allow adaptation.”

Rex is definitely a good sport, and I respect him for hanging in there while maintaining a couple of other debates. He’s still taking shots from people in my community on this and other subjects. Most seem to be addressing genuine items of disagreement with him, though some of the outrage could be dialed back a bit. I certainly reserve the right to take more shots myself – though some days on Twitter are best left alone.

Blog you later, test debaters.