Many hands make software work
The stakes for Microsoft, which was outlining its Office 2010 product strategy, were extremely high. According to Microsoft’s earnings statements, the Microsoft Office productivity suite generates more revenue than any other business division, says Gregg Keizer, who covers Microsoft and general technology news for Computerworld.
Months before Microsoft released the Office 2010 productivity suite, 9 million people downloaded the beta version to test the software and provide feedback. Through this program, Microsoft collected 2 million valuable comments and insights from those testers.
Denise Carlevato, a Microsoft usability engineer for 10 years, and her colleagues from Microsoft’s Virtual Research Lab observed how people used new features. Their objective was to make Microsoft Office fit the way millions of people used their product and to help them work better. It was a massive, controlled crowdsourcing project.
Developing a new software product is always exciting, especially watching ideas take form and become a reality. Sometimes a fresh perspective or an innovative use case is all it takes to turn a product from good to great. However, when it comes to testing, we often find ourselves in uncharted waters, wondering whether the product will actually work across diverse customer landscapes. It is virtually impossible to test the vast number of devices and software configurations that web-based software can run on today. Truly robust testing is time-consuming, and ensuring that every possible permutation and combination of features, localizations, and platforms works as intended is nearly impossible.
Often, comprehensive testing proves too challenging and buggy code is delivered to the customer. For example, if a Software-as-a-Service (SaaS) application does not render in a particular browser or a critical software tool fails to deliver its intended functionality, a bug fix or a patch is promised and the vicious cycle starts all over again. Either way, the customer bears the brunt of inadequate testing, especially when faced with the escalating costs of software maintenance and poor performance. For the software development company, the ramifications include damage to brand image, perceived quality, customer relationships, trust, and prospects for future projects.
Welcome to the new world of crowdsourced testing, an emerging trend in software engineering that applies the benefits, effectiveness, and efficiency of crowdsourcing and the cloud platform to software quality assurance and control. With this form of software testing, the product is put to the test on diverse platforms, which makes the testing more representative, reliable, cost-effective, and fast, and brings the product closer to being bug-free.
Crowdsourced testing, conceived around a Testing-as-a-Service (TaaS) framework, helps companies reach out to a community to solve problems and remain innovative. When it comes to testing software applications, crowdsourcing helps companies reduce expenses and time to market, increase the resources available for testing, manage a wide range of testing projects, fill gaps in testing competence, resolve high defect rates under time pressure, and use a third party’s test environment to meet project requirements.
It differs from traditional testing methods in that the testing is carried out by many different testers from across the globe rather than by locally hired consultants and professionals. In other words, crowdsourced testing is a form of outsourcing in which software testing, a time-consuming activity, is handed to testers around the world. This enables small startups that could not afford a traditional quality assurance team to use ad hoc quality assurance teams instead.
Why Does Crowdsourced Testing Work?
To understand why crowdsourced testing works, it is important to understand the set of biases that affect most testers and test managers around the world. This phenomenon is called “The Curse of Knowledge,” a phrase used in a 1989 paper in The Journal of Political Economy. It means that a subject expert finds it nearly impossible to imagine and look beyond the knowledge they have acquired, i.e., the set of concepts, beliefs, and scenarios that they know or predict. As a result, it is particularly challenging for a tester to think outside the box and conceive of the various ways a typical end user would use a particular piece of software.
This phenomenon was demonstrated in a well-known experiment conducted by Elizabeth Newton, a Stanford University graduate student in psychology. She illustrated it through a simple game in which people were assigned one of two roles: tappers and listeners. Each tapper was to select a well-known song, such as “Happy Birthday,” and tap the rhythm on a table. The listeners were to guess the song from the taps. However, before the listeners guessed the song, tappers were asked to predict the probability that listeners would guess correctly. They predicted 50%. Over the course of the experiment, 120 songs were tapped out, but listeners guessed only three of them correctly, a success rate of merely 2.5%.
The explanation is as follows: when tappers tap, it is impossible for them to avoid hearing the tune playing along to their taps. Meanwhile, all the listeners could hear was a kind of bizarre Morse code. The problem is that once we know something, we find it impossible to imagine the other party not knowing it.
Extrapolating this experiment to software testing, most testers conduct a battery of tests that they feel is representative and that captures the set of end-user scenarios for how the software would be used. The reality is far from this. Any expert tester would assert that it is impossible to capture the complete set of scenarios that an end user may throw at a software system. As a result, critical path(s) of the code under certain scenarios go untested, which leads to software malfunctioning, production system crashes, customer escalations, long hours of meetings, debugging, etc.
Crowdsourced testing circumvents many of these headaches by bringing broad code coverage and a comprehensive set of end-user scenarios to bear during the design and development stages of software engineering, when the cost of modification is low. This results in identifying critical use cases early on and providing for those contingencies, which reduces software maintenance costs later on, during and after production deployment. Besides progressively wider code coverage, it improves the quality and depth of testing across the vital software modules, which ultimately results in higher code quality, among other benefits.
Crowdsourced testing – the framework
At the heart of crowdsourced testing is the community that tests a given software product. The community encompasses people from diverse backgrounds, cultures, geographies, and languages, all with a diverse approach to software usage. The community, represented by a diverse and extended user space, tests any given software by putting it to use under realistic scenarios, which a tester in the core test team may not be able to envision, given that tester’s constraints, such as limited bounds of operation, knowledge, and scenarios. Thus, it is easy to observe the broad set of usage patterns that put the software under intense scrutiny. Crowdsourced software testing draws its benefits from delegating the task of testing a web or software project, while it is in development, to a large number of Internet users in order to find as many defects as possible before release.
Crowdsourced testing is particularly useful when the software is user-centric, that is, when the software’s success and adoption are determined by user feedback. It is frequently used for gaming or mobile applications, when specific testing requires experts who may be difficult to find in one place, or when the company lacks the resources or time to carry out internal testing.
The spectrum of issues that such test efforts can uncover within a short lead time is particularly noteworthy. Such testing efforts yield productive results at reasonable cost. Often, the product company pays only for valid reported bugs. Hence, the Return on Investment (ROI) is high compared to traditional means of software testing.
How does it work?
Most crowdsourced testing companies provide the platform for the testing cycles. Clients specify the type of tests that they wish to have performed and the types of devices that the software product must be tested on.
Testers complete a profile, indicating the skills they have, the devices to which they have access, and the countries where they reside. Once testers have completed their profiles, they can check the project dashboard for a listing of projects and releases that are available for testing. The dashboard may also include sample test scenarios, additional tools and scripts, instructions for testers about what is expected from them, and so on. Usually, the testers are required to submit a QA plan, which outlines both high-level test cases and detailed test scenarios. The plan may also state whether the test can be automated and what results are expected.
A qualified Project Manager, who is typically a proven community leader or a person from the client or the platform company, reviews these plans and approves or amends them to cater to the client’s specific testing requirements.
Each project includes an explanation and access to a forum where bugs and issues are discussed and additional questions can be asked. Testers document bug reports and are rated based on the quality of their reports. The amount the testers earn increases as their rating increases.
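The link between report quality, rating, and earnings can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 quality scale, the neutral starting rating, the base payout, and the bonus rule are all assumptions made up for this sketch, not the rules of any real platform.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Tester:
    name: str
    # Hypothetical quality scores (1 = poor, 5 = excellent) for past bug reports.
    report_scores: list = field(default_factory=list)

    @property
    def rating(self) -> float:
        """Average report quality; testers with no history start at a neutral 3.0."""
        return mean(self.report_scores) if self.report_scores else 3.0


def payout_per_bug(tester: Tester, base_payout: float = 10.0) -> float:
    """Assumed rule: each rating point above 3.0 adds 25% to the base payout."""
    bonus = max(0.0, tester.rating - 3.0) * 0.25
    return round(base_payout * (1 + bonus), 2)


novice = Tester("novice")
veteran = Tester("veteran", report_scores=[5, 5, 4, 5, 5])
print(payout_per_bug(novice), payout_per_bug(veteran))
```

Under these assumed numbers, the tester with consistently high-quality reports earns more per valid bug than the newcomer, which is the incentive the rating system is meant to create.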
The community combines aspects of collaboration and competition, as members work to find solutions to the stated problem. Forums facilitate networking and discussion of bugs or relevant issues; rating systems allow for recognition of a job well done, which helps participants gain credibility and advance their careers.
Checks & Balances
Security is a crucial element of crowdsourced testing. More often than not, confidential customer information is exposed to testers during application testing. Any breach of this data can lead to serious damage, both to the brand and to the business. Test data management ensures the availability and security of test data by obfuscating sensitive information for large-scale testing engagements. Masking such information or creating ‘test-only’ data helps maintain privacy and security while using crowdsourced testing services.
In almost all cases, the testers are required to sign a Non-Disclosure Agreement (NDA) when they join the community. The NDA forbids them from talking about customers, their products, or specific defects, both offline and online, whether on Facebook, Twitter, personal blogs, or anywhere else outside the confines of the private testing platform. Beyond that, customers can upload a customized NDA, which testers must sign before viewing the customer’s project. For projects that require a high level of security, testers are selected from a pre-screened list of white hat engineers who have a long professional relationship with the platform company.
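As a rough illustration of masking, the Python sketch below replaces sensitive fields in a customer record before it is shared with external testers. The field names, the sample record, and the salt are hypothetical; a real engagement would keep the salt secret and would follow its own data-protection rules rather than this simplified scheme.

```python
import hashlib


def pseudonymize(value: str, salt: str = "test-only-salt") -> str:
    """Replace a value with a stable token derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_customer(record: dict) -> dict:
    """Return a masked copy of the record; the original is left untouched."""
    masked = dict(record)
    masked["name"] = "Customer-" + pseudonymize(record["name"])
    # Keep a syntactically valid address so email-related features still work.
    masked["email"] = pseudonymize(record["email"]) + "@example.com"
    # Keep only the last four digits of the card number.
    masked["card_number"] = "*" * 12 + record["card_number"][-4:]
    return masked


real = {
    "name": "Jane Doe",
    "email": "jane@corp.com",
    "card_number": "4111111111111111",
    "plan": "premium",
}
print(mask_customer(real))
```

Because the same input always maps to the same token, records that refer to the same customer still line up after masking, so testers can exercise realistic scenarios without seeing the underlying data.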
Furthermore, standardized communication patterns help users secure their data and gain confidence in their testing vendors, which results in a seamless transition.
By combining an internal, permanent team of testers with a crowd of experienced software testers working from around the globe, a platform can deliver superior testing quality. The network of testers is constantly filtered to accept only experienced software testing professionals, so applicants without formal training and significant professional experience are eliminated. This helps ensure the quality and validity of the bugs reported. Last but not least, tests are dispatched to individual testers based on their experience, available equipment, and languages mastered. The testers and their exposure to test projects are continually monitored to ensure both quality and integrity, not only of the test results but also of the associated environment.
Crowdsourced testing works best when the product under development is consumer-centric rather than enterprise-centric, such as gaming or web-driven consumer applications. A global user base to test the product should exist, and the product should be relevant to the community at large. This is also a test of the application’s potential success in the marketplace.
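The dispatch step described above, matching tests to testers by experience, equipment, and language, might look something like the following Python sketch. The profile fields, the sample testers, and the ranking by years of experience are assumptions chosen for illustration; actual platforms will use their own criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrowdTester:
    name: str
    years_experience: int
    devices: frozenset
    languages: frozenset


@dataclass(frozen=True)
class JobRequest:
    device: str
    language: str
    min_years_experience: int = 0


def eligible_testers(request, pool):
    """Return testers who have the device, speak the language, and meet the
    experience bar, with the most experienced listed first."""
    matches = [
        t for t in pool
        if request.device in t.devices
        and request.language in t.languages
        and t.years_experience >= request.min_years_experience
    ]
    return sorted(matches, key=lambda t: t.years_experience, reverse=True)


pool = [
    CrowdTester("Asha", 6, frozenset({"Android phone", "Windows laptop"}), frozenset({"en", "hi"})),
    CrowdTester("Bruno", 2, frozenset({"iPhone"}), frozenset({"pt", "en"})),
    CrowdTester("Chen", 9, frozenset({"Android phone"}), frozenset({"zh", "en"})),
]
request = JobRequest(device="Android phone", language="en", min_years_experience=3)
print([t.name for t in eligible_testers(request, pool)])
```

Filtering on hard requirements first and ranking afterwards keeps the rule easy to audit, which matters when the same pool is monitored for quality and integrity.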
The community should also have a genuine incentive, such as a monetary reward, to offer critical feedback on the product under consideration. This brings forth another interesting challenge. The product company is not obliged to follow through on the community’s recommendations and may disregard the feedback for various internal reasons. In this case, the community may feel unheard, so the entire ecosystem calls for a fine balancing act.
The product company should be committed to working with a large group of people and should understand that such a decentralized test effort involves some degree of overhead. It also requires subject matter experts to mentor and monitor the various testing efforts and to offer support and relevant guidance to the testing teams. If the product team does not have the resources to take on full-fledged testing in-house but has a good understanding of the testing requirements, it can realize its overall strategy with a globally sourced team.
With normal employment contracts, employees receive a salary for their contribution, and the firm owns any intellectual property developed by employees during their tenure with the organization. In a crowdsourcing arrangement, people participate voluntarily. Unless the position on Intellectual Property (IP) is clear and explicitly stated, i.e., accepting the transfer of Intellectual Property to the client is a condition of the right to participate, there is potential for IP infringement by the contributor.
A crowdsourced project requires skill in designing the compensation structure, in both monetary and non-monetary terms. The testers are usually paid a certain amount of money for each successful bug or issue discovery. In some cases, testers prefer non-monetary rewards such as recognition and personal satisfaction to monetary compensation. Thus, it is vital to understand their motivators before mission-critical deployments.
In cases where participants are compensated on a per-task basis, they have an incentive to choose speed over accuracy. This is especially the case with micro tasks, which are susceptible to mistakes and could result in erroneous overall outcomes. Therefore, robust governance mechanisms need to be installed and continually monitored, and policies need to be updated regularly to reflect changing trends.
Advantages of crowdsourced testing:
- Representative scenarios from the real user base, not hypothetical test cases
- Tight feedback loop with rapid feedback processing and agility
- Comprehensiveness in use cases, platforms, tools, browsers, testers, etc. that is typically impossible to replicate by any single product company
- Cost efficiency, as the product company pays only for the valid bugs reported
- Diversity among the pool of testers leads to comprehensive testing, especially for applications that are localization-based
- Reduced time to test, time to market and total cost of ownership as critical paths of a software module are tested during design time, which leads to a reduced maintenance cost
- Better productivity and improved product development focus
Disadvantages of crowdsourced testing:
- Governance issues around security, exposure, and confidentiality when offering a community project to a wide user base for testing
- Quality and workload challenges that arise from the unpredictable nature of customer demands
- Project management challenges that stem from the users’ diverse backgrounds, languages and experience levels
- Documentation issues, such as poor quality of bug reports, bug duplicates, and false alarms
- Equity and equality constraints in the reward mechanism, where remuneration depends on the quality of contributions and on their meeting a prescribed minimum standard
- Management overhead associated with managing an active and growing community
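One way a platform might reduce the duplicate bug reports mentioned in the list above is an automated first pass before human triage. The sketch below is a minimal example, assuming that duplicates can be caught by comparing normalized report titles with Python's standard difflib module; the 0.85 similarity threshold and the sample reports are made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two titles as duplicates when their similarity ratio reaches the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def deduplicate(titles):
    """Keep the first report from each group of near-identical titles."""
    unique = []
    for title in titles:
        if not any(is_duplicate(title, kept) for kept in unique):
            unique.append(title)
    return unique


reports = [
    "Login button unresponsive on Android",
    "login button  unresponsive on android",
    "Crash when exporting report to PDF",
]
print(deduplicate(reports))
```

A title comparison like this only catches near-verbatim repeats; reports that describe the same defect in different words would still need a human reviewer.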
What does the future hold?
Crowdsourced testing clearly has its advantages and limitations. It cannot be considered a panacea for all testing requirements, and the power of the crowd should be employed diligently. The key to avoiding failure in crowdsourcing is to use it prudently, depending on the tactical and strategic needs of the organization that seeks crowdsourced testing services. It is important for the organization to embrace the correct model, identify the target audience, offer the right incentives, and have a suitable workforce to manage the task results.
Crowdsourced testing is a relatively new application in software engineering, and as we continue to experiment with and learn about crowdsourcing, we will gain the experience and maturity needed to identify best practices and to harvest the full value it offers. With this learning, we will become better equipped to mitigate the associated risks and to deal with the operational issues around applying crowdsourcing to new sets of activities.
Keeping the above points in mind and taking cues from individual scenarios will help determine whether crowdsourced testing really makes sense and, if so, what, when, and how to leverage the crowdsourced community.