How can we extend DataArchiver? It is limited to 10,000 records per execution

We have a large amount of data in a table, and we need to delete items that are older than a certain threshold.

After examining the code in the DbArchiver.php class, I noticed that the limit is hardcoded as a constant and is not configurable through config_override.php:

private const ARCHIVE_LIMIT = 10000;

  1. How can we change this limit?
  2. We have already altered the scheduler to run once per day instead of the OOTB once per week.
  3. Should we write a custom scheduler? (A rough sketch of this option follows below.)

If we have 1M rows, at 10,000 records per daily run it is going to take 100 days to purge the older data, and we will never catch up because we keep adding data to the table.
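
For reference, a minimal sketch of what option 3 could look like: a custom scheduled task that runs several 10,000-record archive passes per execution instead of one. This assumes the usual Scheduler extension directory and the $job_strings extension point for custom scheduled tasks; runSingleArchivePass() is a hypothetical stand-in for whatever call drives one DbArchiver pass in your version, and the pass count is only an example.

<?php
// custom/Extension/modules/Schedulers/Ext/ScheduledTasks/custom_bulk_archiver.php
// (assumed location for a custom scheduled-task extension)

$job_strings[] = 'custom_bulk_archiver';

function custom_bulk_archiver()
{
    // Run up to 20 passes of 10,000 records each per scheduler execution
    // (20 x 10,000 = up to 200,000 rows per day) -- the numbers are assumptions.
    $maxPassesPerRun = 20;

    for ($pass = 0; $pass < $maxPassesPerRun; $pass++) {
        // Hypothetical helper: performs one ARCHIVE_LIMIT-sized archive/delete
        // pass and returns the number of rows it processed.
        $processed = runSingleArchivePass();

        if ($processed < 10000) {
            break; // nothing older than the threshold is left, stop early
        }
    }

    return true; // custom scheduler jobs report success by returning true
}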

Thoughts, Suggestions?

  • Hi Jeff,

    This is a very sensitive topic. In an old project we had database tables with more than 500k records to be deleted, some of them with more than 1M. The issue with MySQL was that beyond a certain number of records, database access can take much longer than expected.

    When we tried to clean up the tables, we found that a normal DELETE is not just a delete: because the database used logging for its tables, the DELETE statement created a backup copy of all these records before they were removed.

    The first step in our environment was to reduce the number of records created each day. Having to delete 100k records every day is often the result of creating 100k unnecessary records every day, so we first corrected the design a little.

    Then we found out that deleting more than 20k records in one statement can result in very long runtimes, so the constant of 10k for ARCHIVE_LIMIT seems to be a good choice. Cleaning up the already existing records was not done by the archive process in our environment; that was not an option because the process took too long. Instead we did it manually by creating new tables and copying the data to be preserved into them - fortunately the database was not in the cloud.
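
    As an illustration of that copy-and-swap approach, here is a minimal sketch using plain PDO against MySQL. The table name, date column and 90-day retention threshold are assumptions; you would take a backup and a maintenance window first.

    <?php
    // Keep only the rows worth preserving by copying them into a fresh table,
    // then swapping the tables. Table/column names are hypothetical examples.
    $pdo = new PDO('mysql:host=localhost;dbname=sugarcrm', 'user', 'secret');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // 1. New empty table with the same structure (indexes included).
    $pdo->exec('CREATE TABLE audit_log_new LIKE audit_log');

    // 2. Copy only the rows to be preserved (e.g. newer than 90 days).
    $pdo->exec("INSERT INTO audit_log_new
                SELECT * FROM audit_log
                WHERE date_created >= DATE_SUB(NOW(), INTERVAL 90 DAY)");

    // 3. Swap tables in a single rename and keep the old one as a fallback.
    $pdo->exec('RENAME TABLE audit_log TO audit_log_old, audit_log_new TO audit_log');

    // 4. Drop the old table only after everything has been verified.
    // $pdo->exec('DROP TABLE audit_log_old');

    RENAME TABLE is atomic in MySQL, so the application only ever sees a complete table during the swap.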

    For your project I would recommend rethinking the data collection and deletion process. Perhaps you can bring the cleanup down to 10k chunks that the archiver can handle, or write your own archiver scheduler process that optimizes these deletions.
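
    One possible shape for such a chunked cleanup, sketched with plain PDO; the table, column, threshold and batch size are assumptions:

    <?php
    // Delete old rows in small batches so no single statement locks the table
    // or bloats the logs for too long. Names and the 10k batch size are examples.
    $pdo = new PDO('mysql:host=localhost;dbname=sugarcrm', 'user', 'secret');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $batchSize = 10000;

    do {
        // DELETE ... LIMIT keeps each statement small and short-lived.
        $deleted = $pdo->exec(
            "DELETE FROM audit_log
             WHERE date_created < DATE_SUB(NOW(), INTERVAL 90 DAY)
             LIMIT {$batchSize}"
        );
        sleep(1); // brief pause so normal traffic is not starved between batches
    } while ($deleted === $batchSize); // a partial batch means we are done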

    Good luck!
