Table Chunking for Parallel Migrations

 

Sometimes we need to migrate a database with one (or a few) giant tables and many small tables. The migration of such giant tables:

  • Takes too long to finish (up to 99% of the entire migration time);

  • Uses lots of resources such as temp space, undo, or redo;

  • Blocks concurrent online transactions.

 

The main idea of chunking is to divide these tables into blocks (chunks) and migrate them in parallel (a minimal sketch follows this list). Chunking will only have an effect if:

  • The Destination Database supports parallel commits to the same table;

  • The Source Database supports reading by chunks.
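
Below is a minimal sketch of the chunking idea in Python. It assumes the table has a numeric primary key named id and standard DB-API connections on both sides; the table name, column names, queries, and helper functions are illustrative only and are not part of the tool itself.

    # Minimal sketch of the chunking idea; all names (big_table, id, the
    # connection factories) are illustrative, not the tool's internal API.
    from concurrent.futures import ThreadPoolExecutor

    def chunk_ranges(min_id, max_id, chunks):
        # Split the key range [min_id, max_id] into roughly equal pieces.
        step = (max_id - min_id) // chunks + 1
        return [(lo, min(lo + step - 1, max_id))
                for lo in range(min_id, max_id + 1, step)]

    def migrate_chunk(open_source, open_dest, lo, hi):
        # Read one chunk from the source and write it to the destination.
        src, dst = open_source(), open_dest()
        cur = src.cursor()
        cur.execute("SELECT id, col1, col2 FROM big_table "
                    "WHERE id BETWEEN %s AND %s", (lo, hi))
        dst.cursor().executemany(
            "INSERT INTO big_table (id, col1, col2) VALUES (%s, %s, %s)",
            cur.fetchall())
        dst.commit()   # works only if the destination allows parallel commits
        src.close(); dst.close()

    def migrate_table(open_source, open_dest, min_id, max_id, parallel=4):
        # One worker per chunk; effective only with more than one process.
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            for lo, hi in chunk_ranges(min_id, max_id, parallel):
                pool.submit(migrate_chunk, open_source, open_dest, lo, hi)

The SELECT ... WHERE id BETWEEN ... part is what "reading by chunks" means in practice: the source returns only the requested range, so no worker has to scan the whole table.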

 

Important: It only makes sense to turn Chunking on if the number of parallel processes is more than one.

 

By default, Chunking is turned off. To turn it on for a specific table, go to Workflow View → Config Tables and click the chunking icon.

 

During migration, in the Advanced monitor view, the Process by Quantity pie will show the progress by chunks:

 

The log for a table with chunking enabled (Advanced monitor view → Status → Selected Table → Log) will show information about each chunk:

 

Our recommendations:

Turn chunking on for huge tables only.

Explanation: It is not effective to cut small tables into chunks and insert them concurrently in a few threads, because concurrent inserts (the switching between threads) take extra time.

Don’t set the number of parallel processes higher than 4 for source databases that don’t support reading by chunks.

Explanation: If your source database is MySQL or PostgreSQL, which don’t allow reading only a specific chunk (a part of the data), then each parallel process has to scan all records (perform a “full table scan”), filter out its own chunk, and send it. Spending time on scanning all records in every parallel process only pays off when writing is several times slower than scanning.
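
To make the trade-off concrete, the following sketch (again with illustrative names only, not the tool's actual code) shows what every parallel process effectively has to do when the source cannot read by chunks: scan the whole table and keep just its own share of the rows.

    # Without chunked reads, each worker repeats the full table scan
    # and filters rows on the client side; all names are illustrative.
    def read_my_chunk(source_conn, worker_index, workers):
        cur = source_conn.cursor()
        cur.execute("SELECT id, col1, col2 FROM big_table")   # full scan per worker
        for row in cur:
            if row[0] % workers == worker_index:               # keep this worker's rows
                yield row

A rough per-worker estimate is then scan_time + write_time / workers, versus scan_time + write_time for a single process, so the repeated scans only pay off when writing is several times slower than scanning. That is why we recommend no more than 4 parallel processes for such sources.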