If the database we need for our development and testing contains sensitive or personal data, be it names, phone numbers or credit card numbers, then the answer might seem obvious: simply replace all this data with randomly generated names or numbers. The immediate problem with this is that the resulting data will neither look nor act ‘real’.

To make the data look real requires a bit of thought and effort; let’s take credit card data as an example. To mask this data realistically, your data replacement ruleset must, firstly, use the standard format of credit card numbers, i.e. four groups of four digits, such as 1234 5678 9012 3456, although you’ll also see 15 digits for American Express. It will also need to ensure that each replacement number is unique (more than one person can share a credit card, but for billing purposes, it’s one card).

So, our replacement method could invoke a random number generator, or a set of them, one for each of the four groups of digits, and check that each generated number is unique and in the correct format. Job done? Not quite because, of course, the numbers in a credit card are not random.
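As a minimal sketch of this naive approach, the Python below draws random 16-digit numbers, discards duplicates, and prints them in four groups of four. The function name `random_card_numbers` and the seeded `rng` parameter are assumptions made for illustration only; the output looks right at a glance but ignores issuer prefixes and the check digit.

```python
import random


def random_card_numbers(count, rng=None):
    """Return `count` distinct strings formatted as 'dddd dddd dddd dddd'.

    Hypothetical helper: the digits are purely random, so the results
    only mimic the layout of a card number, not its internal structure.
    """
    if rng is None:
        rng = random.Random()
    seen = set()
    # Keep drawing until we have enough distinct values; the set
    # silently drops any duplicate that the generator produces.
    while len(seen) < count:
        seen.add("".join(rng.choice("0123456789") for _ in range(16)))
    return [
        " ".join(digits[i:i + 4] for i in range(0, 16, 4))
        for digits in sorted(seen)
    ]


if __name__ == "__main__":
    for number in random_card_numbers(3, random.Random(42)):
        print(number)
```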

Card numbers follow the international standard ISO/IEC 7812 which, based on my research, is accepted around the world. In the United States, issuer numbers have been applied for through the American National Standards Institute (ANSI), the same organization that publishes the SQL standard, among other things, and the assigned numbers are recorded in the Issuer Identification Number (IIN) register. The first six digits of your credit card identify the institution that issued your card. So, if you have a Venture MasterCard issued by CapitalOne, I know that the first six digits of your card are 552851. Further, the first one or several digits quickly identify the type of card. A MasterCard has traditionally started with 51 through 55. A Visa card starts with 4. The very first digit, the Major Industry Identifier, also indicates the issuer's industry, such as airlines, petroleum or telecommunications.
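To illustrate how much the leading digits reveal, here is a small sketch that extracts the six-digit issuer prefix and guesses the card network. The helper names `issuer_prefix` and `card_network` are hypothetical, the prefix table is deliberately incomplete, and the American Express prefixes (34 and 37) are added here as well-known values rather than taken from the text above.

```python
def _digits_only(number):
    """Strip spaces, dashes and any other non-digit characters."""
    return "".join(ch for ch in number if ch.isdigit())


def issuer_prefix(number):
    """Return the first six digits, which identify the issuing institution."""
    return _digits_only(number)[:6]


def card_network(number):
    """Guess the card network from the leading digits.

    Illustrative subset only; real prefix tables are larger and
    change over time.
    """
    digits = _digits_only(number)
    if digits.startswith("4"):
        return "Visa"
    if digits[:2] in {"51", "52", "53", "54", "55"}:
        return "MasterCard"
    if digits[:2] in {"34", "37"}:
        return "American Express"
    return "unknown"


if __name__ == "__main__":
    sample = "5528 5100 0000 0000"  # made-up number using the 552851 prefix
    print(issuer_prefix(sample), card_network(sample))
```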

The next 9 (or up to 12) digits are your personal identifier within the issuing organization, your GUID or IDENTITY value, if you will. The last digit is a check digit, computed from all the preceding digits with the Luhn algorithm.
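The Luhn calculation is short enough to sketch. Working from the rightmost digit of the number without its check digit, every second digit is doubled (subtracting 9 when the result exceeds 9), all digits are summed, and the check digit is whatever brings the total up to a multiple of 10. The function names below are illustrative choices, not part of any particular tool.

```python
def luhn_check_digit(payload):
    """Compute the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the digit nearest the check digit is doubled first.
    for position, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def luhn_is_valid(number):
    """Return True if the last digit matches the Luhn check digit."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return len(digits) > 1 and luhn_check_digit(digits[:-1]) == digits[-1]


if __name__ == "__main__":
    print(luhn_check_digit("7992739871"))        # 3
    print(luhn_is_valid("4111 1111 1111 1111"))  # True
```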

To produce a realistic masked credit card number:

  • The first 6 digits (the Bank Identification Number) are copied from the original credit card number

  • The next 9 digits (the account number) are generated randomly

  • The last digit is a check digit, calculated with the Luhn algorithm
