Accelario Data Masking: Sensitive data search

The Data Masking service includes an intelligent search engine that combines advanced search algorithms, lookup lists, and AI technology to find sensitive data.

This article explains how to use this tool to identify sensitive data in your database environment and assign the required masking rules.

Getting started
After you set up your database environment, you can identify sensitive data with the Sensitive data search tool of the Data Masking service.

Search engine functionality
The intelligent search engine scans every data column in the selected environment configuration, using a combination of lookup lists, AI technology, and advanced algorithms.

Sensitive data search parameters:

	Parameter	Description
1	Privacy Policy	A predefined set of masking rules designed to comply with specific privacy regulations (such as GDPR, CCPA, or HIPAA) or to meet your organization's own privacy requirements.
2	Parallel Processes	The number of processes that run concurrently during a sensitive data search. Higher values can speed up the search but require more system resources; lower values use fewer resources but may extend scan time.
3	A number of unique values to analyze	Denotes how many unique values in each column the system should scan to identify the appropriate masking rule. Higher values might yield more accurate results but may consume more resources and time. Carefully balance the need for precision and resource usage.
4	Search optimization	Allows you to choose between Performance and Accuracy. Performance prioritizes speed over completeness, meaning the service may not analyze all the unique values specified in A number of unique values to analyze. Accuracy guarantees that the system will analyze the exact number of unique values you've specified, regardless of the time it takes. This is achieved by executing SQL queries with the DISTINCT clause. While this method may take longer, it provides a precise analysis of unique values within your data.
5	Search depth	Applies only when Search optimization is set to Performance. Instead of the time-consuming DISTINCT query used in Accuracy mode, Performance mode uses a faster approximation method. Search depth sets a multiplier on the value of A number of unique values to analyze; the system uses it to estimate how many values it must scan to reach the specified number of unique values.
6	Auto Refresh	Determines whether the Masking service automatically rescans the database structure and picks up any changes before the search. Enable this setting if the data structure may have changed since the last operation: a search performed on an outdated schema can produce incorrect results.
7	Incremental	When enabled, the current masking rules in the user-provided masking configuration are preserved, and the search analyzes only changes in the data schema. Because a smaller portion of the database is analyzed, operation time is significantly reduced. When disabled, the search scans the entire environment. Use this setting to maintain continuity in your masking rules and speed up scans of environments with evolving masking configurations.
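To make the difference between the two Search optimization modes concrete, the following is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. It is not Accelario's implementation: the function names, the demo table, and the interpretation of Search depth as "read n_unique * search_depth raw values, then deduplicate" are assumptions for illustration only. Accuracy mode issues a DISTINCT query; Performance mode skips DISTINCT, reads a bounded number of raw values, and deduplicates them in memory.

```python
import sqlite3


def accuracy_sample(conn, table, column, n_unique):
    # Accuracy mode (sketch): a DISTINCT query returns exactly n_unique distinct
    # values, provided the column contains at least that many.
    # Identifiers are interpolated directly, so they must come from trusted metadata.
    sql = f'SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL LIMIT ?'
    return [row[0] for row in conn.execute(sql, (n_unique,))]


def performance_sample(conn, table, column, n_unique, search_depth):
    # Performance mode (sketch, assumed semantics): read n_unique * search_depth
    # raw values without DISTINCT and deduplicate them in memory. Cheaper, but it
    # may yield fewer than n_unique distinct values when the rows repeat a lot.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL LIMIT ?'
    unique_values = []
    seen = set()
    for (value,) in conn.execute(sql, (n_unique * search_depth,)):
        if value not in seen:
            seen.add(value)
            unique_values.append(value)
            if len(unique_values) == n_unique:
                break
    return unique_values


def build_demo_db():
    # Hypothetical demo table: 200 repeated emails followed by 20 distinct ones.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (email TEXT)")
    rows = [("dup@example.com",)] * 200 + [(f"user{i}@example.com",) for i in range(20)]
    conn.executemany("INSERT INTO customers (email) VALUES (?)", rows)
    return conn


conn = build_demo_db()
print(len(accuracy_sample(conn, "customers", "email", 10)))
print(len(performance_sample(conn, "customers", "email", 10, search_depth=5)))
print(len(performance_sample(conn, "customers", "email", 10, search_depth=100)))
```

On this skewed demo data, the Accuracy query always finds 10 distinct values, while Performance mode with a low search depth reads mostly duplicates and finds fewer. Raising the search depth widens the scanned range, trading speed for completeness, which mirrors the trade-off described in the table.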

Proposing masking rules
The system scans each data column for sensitive information and then proposes masking rules based on the following techniques:

Data look-up rules: use predefined lists to mask data.
Column rules: apply rules based on column names.
Regexp rules: match column values against regular expressions.
AI technology: uses artificial intelligence for adaptive masking.
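The following minimal sketch shows how the first three techniques could each contribute a score to a candidate rule. Everything in it is hypothetical: the rule names, lookup list, patterns, the fixed 0.5 weight for a column-name match, and the choice to combine signals with max() are illustrative assumptions, not Accelario's rule catalog or scoring. The AI technique is omitted because the article does not describe how it works.

```python
import re

# Hypothetical rule catalog, for illustration only.
COLUMN_NAME_RULES = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "first_name": re.compile(r"first[-_]?name|fname", re.IGNORECASE),
}
LOOKUP_LISTS = {"first_name": {"alice", "bob", "carol", "david"}}
REGEXP_RULES = {"email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")}
COLUMN_NAME_WEIGHT = 0.5  # assumed fixed score for a column-name match


def propose_rules(column_name, sample_values):
    """Return {rule: score} with scores in [0, 1] for rules that matched at all."""
    scores = {}
    n = len(sample_values) or 1  # avoid division by zero on empty samples

    # Regexp rules: fraction of sampled values that fully match the pattern.
    for rule, pattern in REGEXP_RULES.items():
        hits = sum(1 for v in sample_values if pattern.fullmatch(str(v)))
        scores[rule] = max(scores.get(rule, 0.0), hits / n)

    # Data look-up rules: fraction of sampled values found in the lookup list.
    for rule, words in LOOKUP_LISTS.items():
        hits = sum(1 for v in sample_values if str(v).strip().lower() in words)
        scores[rule] = max(scores.get(rule, 0.0), hits / n)

    # Column rules: a match on the column name alone gives a fixed score.
    for rule, pattern in COLUMN_NAME_RULES.items():
        if pattern.search(column_name):
            scores[rule] = max(scores.get(rule, 0.0), COLUMN_NAME_WEIGHT)

    return {rule: s for rule, s in scores.items() if s > 0}


print(propose_rules("contact_email", ["a@x.com", "b@y.org", "n/a"]))
print(propose_rules("fname", ["Alice", "Bob", "Zed", "carol"]))
```

Taking the maximum keeps the sketch simple; a real engine would more likely weight and combine the signals from each technique.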

Calculated probability
Each proposed masking rule comes with a calculated probability; the higher the probability, the more suitable the rule is for that column. The rule with the highest probability is automatically selected as the default choice if its probability is greater than the Sensitive Data Search Threshold parameter. You can adjust this threshold in the system settings; its default value is 20%.
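The default-selection logic described above can be sketched as follows. This is a minimal illustration, not Accelario's code: the function name and the dictionary of rule probabilities are assumptions, and the threshold is expressed as a fraction (0.20 for the default of 20%). Note the strict comparison, since the rule is selected only when its probability is greater than the threshold.

```python
def select_default_rule(proposals, threshold=0.20):
    """Return the highest-probability rule if it exceeds the threshold, else None.

    proposals: hypothetical mapping of rule name -> probability in [0, 1].
    threshold: Sensitive Data Search Threshold as a fraction (0.20 == 20%).
    """
    if not proposals:
        return None
    rule, probability = max(proposals.items(), key=lambda item: item[1])
    # Strictly greater than: a rule exactly at the threshold is not auto-selected.
    return rule if probability > threshold else None


print(select_default_rule({"email": 0.85, "full_name": 0.10}))
print(select_default_rule({"email": 0.15}))
```

When no rule clears the threshold, the column is left without a default, and the user assigns a rule during review.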

Review and confirmation
Users can review the proposed masking rules and either confirm them or adjust them to meet their specific requirements.

Saving the masking configuration
After reviewing and modifying the proposed masking rules as needed, users can save the final configuration.

The Sensitive data search tool is particularly useful for databases with large schemas and many columns: it streamlines the setup of masking rules and removes the need to assign a rule to each column manually.