Parallel masking of a single table

The parallel update of a single table works in this way:

  • The main table process prepares pool of 3 Writers. Each one of them is checking if BlockingQueue has items to write. If queue is empty, Writers are just been blocked.

  • The main table process reads all RowIDs for current table into ResultSet. Then it iterates through that ResultSet and adds to a dedicated list of RowIDs. When size of that list reach 10_000 items, it creates instance of Reader with parameters of RowIds and clears that dedicated list of RowIDs

  • Each Reader reads data using RowID as WHERE clause, masks data in the java engine and puts results to BlockingQueue . It uses the provided fetchSize to read the data from DB.

  • Once data appears in blocking queue each writer begins to collect records for writing until their buffer reaches batchSize parameter. After that Writer writes masked data to the DB using updates.

  • The BlockingQueue here is a good negotiator and balancer between Readers, Writers and allocated memory

  • If BlockingQueue is not empty and all Writers have full buffer (it means that they all are busy now writing data to the DB) then the main table process adds new instance of Writer.

  • Max number of Writers is 3 (it's BlockingPoolExecutor with capacity 3)

  • BlockingQueue has a hardcoded limit - 100_000