Customer data is critical when building new services. However companies and other organisations use that data, they now all face the same General Data Protection Regulation (GDPR) challenge: how to mask customer data, or build accurate and usable synthetic data, while retaining referential integrity for testing new products or services.
The imminent arrival of the GDPR in May 2018 is bringing the testing community to the forefront of data handling practices. The penalties for non-compliance are now well known. But despite the fact that organisations accessing or processing EU-based personal information must comply with the GDPR, only 19 per cent of UK CIOs currently have plans in place to deal with it.
Against this backdrop, testing providers are being called upon to ensure that individuals’ data is processed securely and that an audit trail is in place for compliance purposes. Let’s take a look at the key challenges presented by the GDPR and the transformative role test data management can play in a new product or service development environment.
Test Data Management
I believe that the single biggest risk of a business breaching the GDPR will come from poor test data management. Testing is a critical element of innovation. Test data management covers a wide range of quality assurance disciplines that support all IT and Business test phases. Test data can be the ‘forgotten man’ when building business-driven test scenarios. From a GDPR compliance viewpoint, however, it will need to take centre stage.
Test data management typically covers several key activities that promote rigorous quality assurance, including:
- The targeting and creation of non-production data sets that mimic actual data so that IT and the Business can perform accurate and relevant tests before releasing or updating a new service or product.
- The building of synthetic data where it is not possible or acceptable to use ‘real’ data.
- Ensuring the data can be shared across IT and Business teams.
- Enabling data to ‘time travel’ to support complex business test scenarios.
- Allowing for effective backup and restore capabilities.
- Supporting the effective build and deployment of the corresponding test environments.
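To make the first few activities above more concrete, here is a minimal Python sketch of building synthetic customers and orders that keep referential integrity, plus a simple ‘time travel’ helper that shifts dates. All names (`build_synthetic_data`, `time_travel`, the field names and the sample name lists) are assumptions made for illustration, not part of any particular test data management tool.

```python
import random
from datetime import date, timedelta

# Small hypothetical name pools; a real tool would use far richer generators.
FIRST_NAMES = ["Alex", "Sam", "Jo", "Chris", "Pat"]
LAST_NAMES = ["Smith", "Jones", "Taylor", "Brown", "Evans"]


def build_synthetic_data(n_customers, orders_per_customer, seed=42):
    """Return (customers, orders); every order references an existing customer."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    customers, orders = [], []
    for cid in range(1, n_customers + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        customers.append({
            "customer_id": cid,
            "name": f"{first} {last}",
            # example.com is reserved for documentation, so no real mailbox is used
            "email": f"{first}.{last}.{cid}@example.com".lower(),
        })
        for _ in range(orders_per_customer):
            orders.append({
                "order_id": len(orders) + 1,
                "customer_id": cid,  # foreign key back to the customer
                "order_date": date(2018, 1, 1) + timedelta(days=rng.randrange(365)),
            })
    return customers, orders


def time_travel(orders, days):
    """Return copies of the orders with every order date shifted by `days`."""
    return [{**o, "order_date": o["order_date"] + timedelta(days=days)} for o in orders]


customers, orders = build_synthetic_data(n_customers=3, orders_per_customer=2)
known_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in known_ids]
print(len(customers), len(orders), len(orphans))  # 3 6 0
```

Seeding the generator is a deliberate choice here: it lets IT and Business teams share exactly the same data set across test phases.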
Transparency & Consent
Individuals will need to provide specific and active consent covering the use of their data, and this demands a change in processes around the gathering and withdrawal of consent. Consent for third-party processing is also affected as the data owner is liable for its data, wherever the data is handled.
Companies will need to define and then manage legitimate data use and the length of time data is stored before archival and deletion. Test data management will be pivotal in supplying evidence to regulators that a business has due diligence in place, particularly for exceptions around legitimate business uses such as pursuing outstanding debts or holding on to an address for a product under warranty.
While a copy of real information may have been used to test systems in the past, this simply cannot continue; individuals need to give explicit and informed consent that their data can be used for testing. Making this a prerequisite for allowing a customer access to your system or service would be looked on poorly by the regulator and could represent a breach of the GDPR in its own right.
Testing new features will be necessary, for instance to determine how a consent slip will be processed. Different test data sets, scenarios and combinations of consent will need to be created, ensuring that all processes can be measured correctly.
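One way to picture the ‘combinations of consent’ is to generate them mechanically. The sketch below is only an illustration: the purpose names in `PURPOSES` and the `may_process` rule are hypothetical, and a real system would take both from its own consent model.

```python
from itertools import product

# Hypothetical consent purposes, assumed for this example only.
PURPOSES = ["marketing_email", "third_party_sharing", "use_in_testing"]


def consent_scenarios(purposes):
    """Yield one dict per combination of granted/withheld consent."""
    for flags in product([True, False], repeat=len(purposes)):
        yield dict(zip(purposes, flags))


def may_process(consent, purpose):
    """Processing is allowed only when consent for that purpose is explicitly True."""
    return consent.get(purpose) is True


scenarios = list(consent_scenarios(PURPOSES))
print(len(scenarios))  # 8, i.e. 2 ** 3 combinations

for scenario in scenarios:
    # Withdrawal: once a consent is set to False, processing must be blocked.
    withdrawn = {**scenario, "marketing_email": False}
    assert not may_process(withdrawn, "marketing_email")
```

Treating a missing consent flag as ‘no’ reflects the requirement for specific and active consent: silence is never read as agreement.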
The GDPR also states that individuals have the right to data portability. It allows citizens to move, copy or transfer personal data easily from one IT environment to another in a safe and secure way. Data portability will require testing, and ensuring compliance for data in flight will be a major exercise for organisations that have high volumes of live data in non-protected environments.
Through test data management, testers will have access to the data in a structured and readable format and be able to confirm that the original data has been removed from the ‘source’ system.
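As a rough sketch of such a check, the example below exports a record as JSON, loads it into a second system and confirms it is gone from the first. The in-memory dictionaries stand in for real systems, and `export_customer` and `transfer_customer` are hypothetical helper names; the record itself is synthetic.

```python
import json


def export_customer(store, customer_id):
    """Return the customer's record as JSON, a structured and readable format."""
    return json.dumps(store[customer_id], sort_keys=True)


def transfer_customer(source, target, customer_id):
    """Copy a record to the target system, then delete it from the source."""
    payload = export_customer(source, customer_id)
    target[customer_id] = json.loads(payload)
    del source[customer_id]
    return payload


# Synthetic record only; plain dicts stand in for the two IT environments.
source = {101: {"name": "Test Person", "email": "test.person@example.com"}}
target = {}

payload = transfer_customer(source, target, 101)
assert json.loads(payload) == target[101]  # the export is readable and complete
assert 101 not in source                   # the original has left the source system
```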
All of the above will drive new risk-based test scenarios, which in turn will affect how late-phase testing such as User Acceptance Testing (UAT) is defined, planned and undertaken. It will also affect the number of ‘Must Tests’ that need to be executed within a UAT test window.
Data Masking For Anonymisation
There are two key types of data commonly used in testing: real and synthetic. While synthetic data may be preferable because it can be highly targeted, it is not always possible to use it, and in such cases anonymisation of real data is essential. With anonymisation, or ‘masking’ as it is sometimes known, the format of the data stays the same but the values are altered in such a way that the new values cannot be used to identify the original information. Synthetic data and data masking are two of the approaches that organisations will now need to consider as they move away from using copies of live production data.
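A minimal sketch of format-preserving masking that keeps referential integrity might look like the following. It assumes a keyed hash (HMAC-SHA256) as the masking technique, which is one option among several; `MASKING_KEY`, `mask_email` and the sample rows are hypothetical. Whether a keyed approach like this amounts to full anonymisation depends on how the key is handled, so treat it as an illustration of the mechanics only.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; it would be kept out of the test environment.
MASKING_KEY = b"replace-with-a-secret-key"


def mask_email(email, key=MASKING_KEY):
    """Replace an email with a same-format value derived from a keyed hash."""
    digest = hmac.new(key, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


# Made-up rows standing in for two related production tables.
customers = [{"customer_id": 1, "email": "jane.doe@real-mail.test"}]
orders = [{"order_id": 10, "customer_email": "jane.doe@real-mail.test"}]

masked_customers = [{**c, "email": mask_email(c["email"])} for c in customers]
masked_orders = [{**o, "customer_email": mask_email(o["customer_email"])} for o in orders]

# The same input always maps to the same masked value, so the join still works.
assert masked_orders[0]["customer_email"] == masked_customers[0]["email"]
assert "jane" not in masked_customers[0]["email"]
```

Because the mapping is deterministic, a customer masked in one table matches the same customer masked in another, which is what keeps joins and lookups valid in the test environment.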
Testing teams can help to create the data, and powerful tools enable anonymisation whilst protecting the real source data. Some tools work with data snapshots: users no longer work on the database itself but on a snapshot of the data, and it is this snapshot that is anonymised. Other strategies are also available, for example dynamic anonymisation, where the result of a query is anonymised in real time so there is no need to take a snapshot.
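The idea of masking a query result on the fly can be sketched with Python's built-in `sqlite3` module, which lets a Python function be called from SQL. This is a toy stand-in for commercial tooling, and the table, the `mask_name` rule and `fetch_customers_masked` are assumptions made for the example.

```python
import sqlite3


def mask_name(value):
    """Keep the first character and hide the rest, preserving length."""
    if not value:
        return value
    return value[0] + "*" * (len(value) - 1)


conn = sqlite3.connect(":memory:")
conn.create_function("mask_name", 1, mask_name)  # expose the Python function to SQL
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Jane Doe')")  # synthetic row


def fetch_customers_masked(connection):
    """Return customer rows with names masked at query time; stored data is untouched."""
    sql = "SELECT id, mask_name(name) FROM customers ORDER BY id"
    return connection.execute(sql).fetchall()


masked_rows = fetch_customers_masked(conn)
stored_name = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()[0]
print(masked_rows)   # [(1, 'J*******')]
print(stored_name)   # Jane Doe
```

The stored value never changes and no snapshot is taken; only what the tester sees is altered.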
New assurance processes and procedures will be needed, as personal data should not be exposed to persons who are not authorised to handle it, and anyone who requires such access should be aware of the rules regarding data usage.
Additionally, with the fines for non-compliance so high, there is a need to ensure that any new functionality is not downgraded or negatively impacted by the changes. In terms of ongoing assurance, this clearly increases the need for regression testing across projects. GDPR testing and test cases will now need to be added to your regression pack.
Tools & Data Discovery
In terms of data discovery, 75 per cent of organisations said the complexity of modern IT services means they can’t always know where all customer data resides. A retail client recently conducted a discovery exercise and found terabytes of customer data over ten years old. Under GDPR, the retailer needs to be able to find that data and justify holding on to it for ten years. This will have an impact on Business as Usual and risk management within the organisation.
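A discovery exercise like this often starts with a simple age check. The sketch below flags records whose last activity is older than a retention period; the six-year value in `RETENTION_YEARS` is a hypothetical policy figure, not something the GDPR prescribes, and the record layout is assumed.

```python
from datetime import date

RETENTION_YEARS = 6  # hypothetical policy value, not taken from the GDPR


def years_between(earlier, later):
    """Whole years elapsed between two dates."""
    years = later.year - earlier.year
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1  # the anniversary has not been reached yet this year
    return years


def overdue_records(records, today, retention_years=RETENTION_YEARS):
    """Return records whose last activity is older than the retention period."""
    return [r for r in records if years_between(r["last_activity"], today) >= retention_years]


# Synthetic records for illustration.
records = [
    {"customer_id": 1, "last_activity": date(2007, 3, 14)},
    {"customer_id": 2, "last_activity": date(2017, 11, 2)},
]
flagged = overdue_records(records, today=date(2018, 5, 25))
print([r["customer_id"] for r in flagged])  # [1]
```

Each flagged record would then need either a documented justification for keeping it or a route to archival and deletion.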
A similar example is the TalkTalk breach revealed last year, in which some of the breached data was ten years old. In May, a company that TalkTalk had purchased, a small, now-defunct regional cable and TV company, was revealed as the source of the leaked data.
To avoid non-compliance, documentation of the use of personal data in all test environments is necessary, including backups and personal copies created by testers. An understanding of all real data sources and the current location of data is key to ensuring that no real personal data is exposed to software testers, test managers, business users and other team members during software development, maintenance and test phases.
Some organisations have legacy, poorly supported IT systems, and unstructured data makes discovery even more complex, especially if an organisation has emails on file relating to individuals that contain names, addresses, telephone numbers and other contact information. Those responsible for the GDPR must be able to examine the database and find the related email attachments and data relevant to an individual.
This presents a major challenge of scale in testing, as discovery could be the equivalent of searching for a needle in a haystack, and a decently sized haystack is necessary for valid tests. Some IT teams will need to test what they haven’t tested before, such as how to find data on the system and how to identify data that’s not supposed to be there or data that’s been audited out.
Unfortunately, finding the right data within gigabytes of information can be a hugely time-consuming task. Testing teams can help search for the data using the same tools that would be typically used for automation.
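To give a flavour of what an automated search might look like, here is a small sketch that scans unstructured text for values that look like personal data. The two regular expressions are deliberately simplified assumptions and would miss many real-world formats; `find_personal_data` and the sample documents are made up for the example.

```python
import re

# Simplified patterns for illustration; real discovery tooling needs broader rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "uk_phone": re.compile(r"\b0\d{4} ?\d{6}\b"),
}


def find_personal_data(documents):
    """Return (document name, pattern name, matched text) for every hit."""
    hits = []
    for name, text in documents.items():
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((name, label, match.group()))
    return hits


# Synthetic documents; the phone number is from a range reserved for fictional use.
documents = {
    "ticket_001.txt": "Customer asked us to call 01632 960123 after 5pm.",
    "notes.txt": "Forwarded to test.person@example.com for review.",
    "readme.txt": "No customer details here.",
}
hits = find_personal_data(documents)
for hit in hits:
    print(hit)
# ('ticket_001.txt', 'uk_phone', '01632 960123')
# ('notes.txt', 'email', 'test.person@example.com')
```

A scan like this can run as part of an automated test suite, so that personal data turning up where it should not be is caught on every run rather than in a one-off exercise.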
It’s not enough to simply think of the GDPR as just another regulatory requirement; the transition to becoming a GDPR-compliant organisation is a major undertaking and maintaining compliance requires an ongoing commitment and new ways of testing. Indeed, GDPR compliance will be one of the major IT challenges over the next few years, with initial compliance followed by continuous testing of that compliance.
A robust test data management strategy will save money, and indeed could save an organisation full stop. We see test data management, and the provision of ongoing assessments to prove compliance, as a necessary and increasingly important investment to guard organisations against non-compliance.