Non-production data is safer, legally compliant and easily provisioned


Every organization has some data that is core to its line of business: data without which it would not be able to function. A retailer needs lists of products and prices. A bank needs accounts, transactions, and customer details.

An HR and administration department needs a payroll database, with employee details, tax codes, budget allocations, and so on. These data requirements are all critical to the continued operation of the organization. The environments in which this data is used are called ‘production’ or ‘live’ environments, because they support the day-to-day activity of the organization and absolutely cannot be allowed to fail.

Most organizations have needs beyond the purely tactical, though. An organization also has strategic interests, from improving the efficiency with which it operates to better understanding the needs and preferences of its customers. These strategic undertakings aren’t strictly necessary for the organization to keep operating for another day, but they are clearly valuable, so they’re present in the vast majority of organizations, and certainly in any that aim to grow.

Many of these strategic processes need data as well. For example:

  • Development of new IT systems is a common way to improve efficiency or extend the capabilities of the business. When a new system is destined to replace an existing one, the data in the existing system must be compatible with the new system. Developers need that data in order to analyze it and understand any compatibility issues.
  • Similarly, the new system must be tested to ensure that the developers actually implemented it correctly, and testing requires data to run against. Alongside testing that serves the development process, the system often needs to be demonstrated to other parties – senior managers, investors, partners, customers, and so on – and a realistic demo usually requires data as well.
  • “Business intelligence,” or general-purpose analysis of an organization and its environment, naturally requires data about both. A retailer, for example, might be interested in customer buying habits, so it would need data about all the purchases customers make. This often includes data that is never used in production at all, such as the alternative products a customer viewed before making a purchase. Storing large quantities of data for this kind of analysis is the subject of Data Warehousing.

Many other activities need data too, varying from organization to organization: wherever non-critical business activity is taking place, you’ll usually find that it’s consuming data somehow. The environments that support these activities are considered ‘non-production’ environments.

It’s fairly clear how data is supplied to a production environment: it accumulates naturally over the course of business operation. Customers supply their account details; vendors supply product information; orders and transactions arise as trade and interaction take place. This data, a native child of the production environment, is known as ‘production’ or ‘live’ data.

But what of the non-production environments? How is data supplied to them?

The first and most obvious answer is: get it from the production environment. Just take a copy of the production data. Very simple. But, unfortunately, this path is fraught with problems.

Firstly, consider security. Your production environments will doubtless have a robust supporting infrastructure, there to protect both against accidental problems like power failures and against malicious acts like data vandalism or theft. Not only would it be very bad news for a competitor to obtain a copy of your customer list, it would also severely damage the relationship between you and your customers, both present and potential. So you’ve got measures in place to ensure that your data is kept secret and safe: passwords, firewalls, encryption, activity monitoring, intrusion detection, locked doors.

Now you’re thinking of placing a copy of that production data in a non-production environment. Are the same measures in place there? Stealing data from your production environment might be a seriously difficult undertaking, but stealing it from a non-production environment might be as simple as walking off with a developer’s laptop while its owner dozes on the train home.

The environment may have changed, but the data hasn’t; it is still every bit as valuable. If your non-production environments are not as secure as your production environment, they represent a back door that renders your production security useless. If they are as secure as production, then maybe you don’t have a security concern – but security tends to come at the cost of convenience. How much will those extra measures slow down your development schedule?

Secondly, beyond security, there’s the simple risk of honest mistakes. If you give your QA team data that includes genuine customer email addresses, and they begin testing the ‘system announcement’ functionality, what are the chances that they’ll inadvertently mass-email all your customers? Such a mistake is embarrassing – potentially very much so, depending on exactly what they send – and does not inspire confidence in your organization.

Thirdly, though – and this, for many people, is the clincher – it may simply be illegal. Certain kinds of data – notably things relating to finance, payments, healthcare, and personal information – are, in many parts of the world, governed by laws and regulations that lay down strict requirements for the environments in which the data is used.

The need to secure your non-production environments changes from a matter of risk into a matter of legal liability. And if you thought security was a headache, there are also reporting requirements, staff training, disposal issues… everything you had to cover when building your production environment, you now have to cover again.

So what’s the alternative? Non-production data: data that is not simply a copy of your production data. You could take the production data and censor it, carefully stripping out and scrambling all the sensitive parts (an approach known as Data Masking). Or you could avoid the production data entirely and synthesize all your data from scratch; that way, nothing sensitive can leak out of your organization via a non-production environment. Toolkits for both approaches are available on the market today.
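To make the two approaches concrete, here is a minimal Python sketch. It assumes a hypothetical `Customer` record with `name`, `email` and `card_number` fields; `mask_record`, `synthesize_record` and `MASKING_KEY` are illustrative names, not the API of any real toolkit. Masking here uses a keyed hash (HMAC-SHA-256) so that the same production value always maps to the same pseudonym, while synthesis draws every value from a seeded random generator and never reads production data at all.

```python
import hashlib
import hmac
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical customer record; real schemas will differ."""
    name: str
    email: str
    card_number: str


# Hypothetical masking key. Keep it out of non-production environments,
# or the pseudonyms could be reproduced from guessed inputs.
MASKING_KEY = b"replace-with-a-secret-key"

FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]
LAST_NAMES = ["Smith", "Jones", "Brown", "Patel", "Garcia", "Evans"]


def _token(value: str) -> str:
    """Keyed SHA-256 hash: the same input always yields the same token."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(c: Customer) -> Customer:
    """Data masking: keep the shape of a production record, replace the contents."""
    n = int(_token(c.name), 16)
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[(n // len(FIRST_NAMES)) % len(LAST_NAMES)]
    # The .invalid top-level domain is reserved (RFC 2606), so mail sent to
    # these addresses during testing should never reach a real customer.
    email = f"user-{_token(c.email)[:12]}@example.invalid"
    # Keep only the last four digits of the card number.
    digits = c.card_number.replace(" ", "")
    card = "X" * (len(digits) - 4) + digits[-4:]
    return Customer(f"{first} {last}", email, card)


def synthesize_record(rng: random.Random) -> Customer:
    """Data synthesis: build a record from scratch, never reading production data."""
    name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    email = f"user-{rng.randrange(10**8):08d}@example.invalid"
    # Random digits, generated without reference to any production record.
    card = "".join(str(rng.randrange(10)) for _ in range(16))
    return Customer(name, email, card)


if __name__ == "__main__":
    prod = Customer("Jane Doe", "jane.doe@example.com", "4111 1111 1111 1111")
    print(mask_record(prod).card_number)  # XXXXXXXXXXXX1111
    print(synthesize_record(random.Random(42)))
```

Making the masking deterministic is a design choice: it preserves relationships such as one customer appearing in several tables, at the cost of having to protect the key. Synthesis avoids that trade-off, but the generated data is only as realistic as the rules used to produce it.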

Either way, non-production data is safer, legally compliant, and easily provisioned.

Richard Fine is a Technical Writer for Grid-Tools. He graduated from Oxford University in 2009 with a Master's Degree in Computer Science. A programmer since the age of 4, he has extensive experience working with real-time interactive simulation systems, as well as a range of Web technologies. He has received the MVP Award from Microsoft 5 times for his contributions to online communities and IT education.