The shuffling method is a very common form of data obfuscation. It is similar to the substitution method but it derives the substitution set from the same column of data that is being masked. In very simple terms, the data is randomly shuffled within the column. However, if used in isolation, anyone with any knowledge of the original data can then apply a "What If" scenario to the data set and then piece back together a real identity. The shuffling method is also open to being reversed if the shuffling algorithm can be deciphered.
Shuffling, however, has some real strengths in certain areas. If for instance, the end of year figures for financial information in a test data base, one can mask the names of the suppliers and then shuffle the value of the accounts throughout the masked database. It is highly unlikely that anyone, even someone with intimate knowledge of the original data could derive a true data record back to its original values.
Example:
Source rows:
First_Name | Last_Name | Birth_Date | Salary | Education |
---|---|---|---|---|
John | Smith | 01/10/1978 | 10000 | University |
David | Disney | 25/07/1995 | 15000 | College |
Larry | King | 08/02/1999 | 8000 | High-school |
Now, we are masking columns First_Name, Last_Name, Salary using shuffle.
We build 3 lists: First_name{John, David,Larry } ; Last_Name {Smith, Disney,King }; Salary {10000, 15000, 8000}
Destination rows:
First_Name | Last_Name | Birth_Date | Salary | Education |
---|---|---|---|---|
Larry | Disney | 01/10/1978 | 8000 | University |
John | King | 25/07/1995 | 10000 | College |
David | Disney | 08/02/1999 | 15000 | High-school |
Known Issues:
The Shuffling algorithm doesn’t guarantee the uniqueness after the Data obfuscation.