Shuffling

Name in the application: SHUFFLE

Shuffling is a very common form of data obfuscation. It is similar to the substitution method, but it derives the substitution set from the same column of data that is being masked. In simple terms, the values are randomly reordered within the column. Used in isolation, however, shuffling is weak: anyone with knowledge of the original data can apply "what if" reasoning to the masked data set and piece a real identity back together. Shuffling can also be reversed if the shuffling algorithm can be deciphered.

Shuffling does, however, have real strengths in certain areas. Take, for instance, end-of-year financial figures in a test database: one can mask the supplier names and then shuffle the account values throughout the masked database. It is then highly unlikely that anyone, even someone with intimate knowledge of the original data, could trace a data record back to its original values.

Example:
Source rows:

First_Name	Last_Name	Birth_Date	Salary	Education
John	Smith	01/10/1978	10000	University
David	Disney	25/07/1995	15000	College
Larry	King	08/02/1999	8000	High-school

We now mask the columns First_Name, Last_Name and Salary using shuffle.

We build three lists: First_Name {John, David, Larry}; Last_Name {Smith, Disney, King}; Salary {10000, 15000, 8000}.

Destination rows:

First_Name	Last_Name	Birth_Date	Salary	Education
Larry	Disney	01/10/1978	8000	University
John	King	25/07/1995	10000	College
David	Disney	08/02/1999	15000	High-school
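
The column-wise idea can be sketched in a few lines of Python. This is only an illustration, not the application's implementation: the function name `shuffle_columns` and the `seed` parameter are hypothetical, and the sketch assumes a strict permutation, so each masked column keeps exactly the values it started with, only in a different order. The application's own algorithm may behave differently.

```python
import random


def shuffle_columns(rows, columns, seed=None):
    """Return a copy of rows in which each named column is permuted independently.

    Hypothetical helper for illustration; assumes a strict permutation per column.
    """
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]  # copy so the source rows stay untouched
    for col in columns:
        values = [row[col] for row in rows]  # the "list" built for this column
        rng.shuffle(values)                  # reorder the values in place
        for row, value in zip(masked, values):
            row[col] = value
    return masked


source = [
    {"First_Name": "John", "Last_Name": "Smith", "Birth_Date": "01/10/1978",
     "Salary": 10000, "Education": "University"},
    {"First_Name": "David", "Last_Name": "Disney", "Birth_Date": "25/07/1995",
     "Salary": 15000, "Education": "College"},
    {"First_Name": "Larry", "Last_Name": "King", "Birth_Date": "08/02/1999",
     "Salary": 8000, "Education": "High-school"},
]

masked = shuffle_columns(source, ["First_Name", "Last_Name", "Salary"], seed=42)
for row in masked:
    print(row)
```

Because each column is shuffled on its own, the link between a name and a salary is broken, while Birth_Date and Education stay on their original rows.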

Known Issues:

The shuffling algorithm does not guarantee that values remain unique after obfuscation.
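
One way to spot this after masking is to count repeated values in the columns that are expected to be unique. The sketch below runs such a check on the destination rows from the example; `duplicate_keys` is a hypothetical helper written for illustration and is not part of the application.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once in rows (hypothetical helper)."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Masked names taken from the destination rows in the example.
masked = [
    {"First_Name": "Larry", "Last_Name": "Disney"},
    {"First_Name": "John", "Last_Name": "King"},
    {"First_Name": "David", "Last_Name": "Disney"},
]

print(duplicate_keys(masked, ["Last_Name"]))                # [('Disney',)]
print(duplicate_keys(masked, ["First_Name", "Last_Name"]))  # []
```

If a masked column or column combination backs a unique constraint in the target database, a check like this could be run before loading the data.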
