
The shuffling method is a very common form of data obfuscation. It is similar to the substitution method, but it derives the substitution set from the same column of data that is being masked. In very simple terms, the values are randomly shuffled within the column. However, if shuffling is used in isolation, anyone with some knowledge of the original data can apply a "what if" scenario to the data set and piece a real identity back together. The shuffling method can also be reversed if the shuffling algorithm can be deciphered.
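The sketch below illustrates both the basic idea and the reversal risk. It assumes the shuffle is driven by a seeded pseudo-random generator (Python's `random.Random`). The function names `shuffle_column` and `unshuffle_column` are hypothetical and do not describe any particular masking tool. Because the permutation depends only on the seed and the column length, anyone who learns the seed and the algorithm can regenerate it from the masked data alone and restore the original order.

```python
import random


def shuffle_column(values, seed=None):
    """Return a shuffled copy of one column plus the permutation used.

    permutation[i] is the index in the original column whose value
    ends up at position i of the shuffled column.
    """
    rng = random.Random(seed)
    permutation = list(range(len(values)))
    rng.shuffle(permutation)  # Fisher-Yates shuffle of the row indices
    shuffled = [values[j] for j in permutation]
    return shuffled, permutation


def unshuffle_column(shuffled, permutation):
    """Undo a shuffle when the permutation (or the seed behind it) is known."""
    original = [None] * len(shuffled)
    for i, j in enumerate(permutation):
        original[j] = shuffled[i]
    return original


salaries = [10000, 15000, 8000]
masked, perm = shuffle_column(salaries, seed=42)

# An attacker who knows the algorithm and the seed only needs the masked
# column (its length) to replay the same permutation...
_, replayed = shuffle_column(masked, seed=42)
# ...and put every value back in its original row.
recovered = unshuffle_column(masked, replayed)
```

This is why a seed or key used for shuffling should be treated as a secret, and why shuffling is usually combined with other masking methods.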

Shuffling does, however, have real strengths in certain areas. For instance, in a test database holding end-of-year financial figures, one can mask the supplier names and then shuffle the account values throughout the masked database. It is highly unlikely that anyone, even someone with intimate knowledge of the original data, could trace a record back to its original values.
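To make the financial scenario concrete, here is a minimal Python sketch that combines substitution (neutral labels replace supplier names) with shuffling (account values are permuted within their own column). The function name `mask_financial_rows`, the `Supplier_NNN` label format, and the sample rows are hypothetical illustrations, not part of any particular masking tool.

```python
import random


def mask_financial_rows(rows, seed=None):
    """Mask supplier names and shuffle account values across rows.

    rows: list of (supplier_name, account_value) tuples.
    """
    rng = random.Random(seed)

    # Substitution: each distinct supplier name gets a neutral label.
    labels = {}
    for name, _ in rows:
        labels.setdefault(name, f"Supplier_{len(labels) + 1:03d}")

    # Shuffle: permute the account values within their own column.
    values = [value for _, value in rows]
    rng.shuffle(values)

    return [(labels[name], value) for (name, _), value in zip(rows, values)]


# Hypothetical sample data (account values in whole currency units).
rows = [("Acme Ltd", 12000), ("Globex", 8300), ("Initech", 41000)]
masked = mask_financial_rows(rows, seed=7)
```

Because shuffling only reorders values, the column total and the overall distribution of account values are unchanged, so reports run against the test database still produce realistic figures while no supplier can be matched to its own balance.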

Example:
Source rows:

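| First_Name | Last_Name | Birth_Date | Salary | Education   |
|------------|-----------|------------|--------|-------------|
| John       | Smith     | 01/10/1978 | 10000  | University  |
| David      | Disney    | 25/07/1995 | 15000  | College     |
| Larry      | King      | 08/02/1999 | 8000   | High-school |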

Now we mask the First_Name, Last_Name, and Salary columns using shuffling. Birth_Date and Education are left unchanged.

We build three lists, one per masked column: First_Name {John, David, Larry}; Last_Name {Smith, Disney, King}; Salary {10000, 15000, 8000}.

Destination rows:

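| First_Name | Last_Name | Birth_Date | Salary | Education   |
|------------|-----------|------------|--------|-------------|
| Larry      | Disney    | 01/10/1978 | 8000   | University  |
| John       | King      | 25/07/1995 | 10000  | College     |
| David      | Disney    | 08/02/1999 | 15000  | High-school |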
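The steps of this example (build one list per masked column, randomize it, write it back) can be sketched in Python as follows. This is an illustrative assumption of how such a step might be implemented, not the tool's actual code; `shuffle_columns` is a hypothetical name. It uses `random.shuffle`, a strict permutation, so each value in a masked column is used exactly once, while unmasked columns keep their original order.

```python
import random

source_rows = [
    {"First_Name": "John", "Last_Name": "Smith", "Birth_Date": "01/10/1978",
     "Salary": 10000, "Education": "University"},
    {"First_Name": "David", "Last_Name": "Disney", "Birth_Date": "25/07/1995",
     "Salary": 15000, "Education": "College"},
    {"First_Name": "Larry", "Last_Name": "King", "Birth_Date": "08/02/1999",
     "Salary": 8000, "Education": "High-school"},
]


def shuffle_columns(rows, columns, seed=None):
    """Shuffle each listed column independently; other columns are untouched."""
    rng = random.Random(seed)
    result = [dict(row) for row in rows]  # copy so the source rows stay intact
    for column in columns:
        # Build the list for this column, e.g. Salary {10000, 15000, 8000}.
        values = [row[column] for row in rows]
        rng.shuffle(values)
        for row, value in zip(result, values):
            row[column] = value
    return result


destination_rows = shuffle_columns(
    source_rows, ["First_Name", "Last_Name", "Salary"], seed=1
)
for row in destination_rows:
    print(row)
```

Shuffling each column independently, rather than moving whole rows, is what breaks the link between a first name, a last name, and a salary.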

Known Issues:

  • The shuffling algorithm does not guarantee that values remain unique after data obfuscation; in the destination rows above, for example, Disney appears as Last_Name in two rows.

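The page does not say how the tool selects values, so the following is only one plausible explanation for the uniqueness issue. If each row is filled by drawing a random value from the column's list (sampling with replacement), values can repeat and others can disappear. A strict permutation keeps every value exactly once. The helper names below are hypothetical; the sketch compares the two approaches and adds a simple duplicate check that could be run on masked output.

```python
import random


def pick_with_replacement(values, rng):
    """Fill each row with a random value from the column's list (repeats possible)."""
    return [rng.choice(values) for _ in values]


def permute(values, rng):
    """Reorder the column's values; each value is used exactly once."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def has_duplicates(values):
    """True if any value occurs more than once in the column."""
    return len(set(values)) != len(values)


last_names = ["Smith", "Disney", "King"]
rng = random.Random(0)

permuted = permute(last_names, rng)
# Over repeated runs, sampling with replacement frequently produces repeats.
repeats_seen = any(
    has_duplicates(pick_with_replacement(last_names, rng)) for _ in range(100)
)
print(permuted, repeats_seen)
```

Where a masked column must stay unique (for example, a key column), a permutation-based shuffle or an explicit `has_duplicates`-style check after masking avoids this problem.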