Under Review

Idea: pruneDatabase allow an older-than parameter

The pruneDatabase job deletes all records marked as deleted = 1 at the beginning of each month.

Users being users they sometimes change their mind about the deletion and not always in a timely fashion...

I'd like to suggest an additional parameter to the pruneDatabase job to allow admins to set a 'grace period' whereby the script prunes deleted records older than x-number-of-days.

Until then, I guess I'll create my own scheduled job for this purpose :)

thanks,
FrancescaS
  • Here is my first iteration at a custom prune job.

    Note that the emails_text and emails_email_addr_rel tables are treated separately when deleted emails are removed.

    I chose to ignore other tables that don't have dates and just delete anything that is marked deleted regardless of date for those.

    $date_limit defines how far back in the past you want to go. I'm deleting records 3 years old or more for testing purposes.

    And I don't want my backups in the cache, that gets blown away too often to be safe so I'm putting them in a backups dir in uploads.

    Critique welcome.

    FrancescaS

    <?php

    global $sugar_config;

    $sugarroot = '/var/www/html/sugarcrm';

    $GLOBALS['log']->fatal('----->Scheduler fired job of type pruneDatabaseWR()');

    //$backupDir      = sugar_cached('backups');

    $backupDir = $sugar_config['upload_dir'].'/backups';

    $backupFile = 'backup-pruneDatabase-GMT0_'.gmdate('Y_m_d-H_i_s', strtotime('now')).'.php';

    $db = DBManagerFactory::getInstance();

    $tables = $db->getTablesArray();

    $queryString = array();

    $timedate = TimeDate::getInstance();

    $date_limit = date_format(new SugarDateTime('-1095 days'), 'Y-m-d'); //start with 3 year old deletes

     

    if(!empty($tables)) {

      foreach($tables as $kTable => $table) {

        //array and list identify records to be deleted within this table

        $id_arr = array();

        $id_list = '';

        // find tables with deleted=1

        $columns = $db->get_columns($table);

        // no deleted - won't delete

        if(empty($columns['deleted'])) continue;

        // limit to records older than xx days

        // not all tables have a date_modified

        $where = '';

        if(!empty($columns['date_modified'])){

           $where = " AND date(date_modified) < '{$date_limit}' ";

        }elseif(!empty($columns['date_entered'])){

           $where = " AND date(date_entered) < '{$date_limit}' ";

        }elseif(!empty($columns['date_created'])){

           $where = " AND date(date_created) < '{$date_limit}' ";

        }else{

          if($table == 'emails_email_addr_rel' || $table=='emails_text') continue;

          //do not process email tables that don't have a data to limit by

          //these will be addressed individually

        }

        $custom_columns = array();

        if(array_search($table.'_cstm', $tables)) {

          $custom_columns = $db->get_columns($table.'_cstm');

          if(empty($custom_columns['id_c'])) {

            $custom_columns = array();

          }

        }

        $qDel = "SELECT * FROM {$table} WHERE deleted = 1 {$where} LIMIT 20 ";

        $GLOBALS['log']->fatal('pruneDatabaseWR::'.$qDel.'<br>');

        $rDel = $db->query($qDel);

        // make a backup INSERT query if we are deleting.

        while($aDel = $db->fetchByAssoc($rDel, false)) {

          // build column names

          $queryString[] = $db->insertParams($table, $columns, $aDel, null, false);

          if(!empty($custom_columns) && !empty($aDel['id'])) {

            //add to the array of IDs

            $id_arr[]=$db->quoted($aDel['id']);

            $qDelCstm = 'SELECT * FROM '.$table.'_cstm WHERE id_c = '.$db->quoted($aDel['id']);

            $rDelCstm = $db->query($qDelCstm);

            // make a backup INSERT query if we are deleting.

            while($aDelCstm = $db->fetchByAssoc($rDelCstm)) {

              $queryString[] = $db->insertParams($table, $custom_columns, $aDelCstm, null, false);

            } // end aDel while()

            $db->query('DELETE FROM '.$table.'_cstm WHERE id_c = '.$db->quoted($aDel['id']));

            $GLOBALS['log']->fatal('pruneDatabaseWR::'.'DELETE FROM '.$table.'_cstm WHERE id_c = '.$db->quoted($aDel['id']).'<br>');

            //deal with related tables which don't have a date

            //check all related records with deleted = 1

          }

        } // end aDel while()

        // now do the actual delete

        $id_list = implode(",", $id_arr);

        if(!empty($id_list)){

          $db->query('DELETE FROM '.$table.' WHERE deleted = 1 AND date_modified < '.$date_limit );

          $GLOBALS['log']->fatal('pruneDatabaseWR::'.'DELETE FROM '.$table.' WHERE id IN(' .$id_list. ')'.'<br>' );

          //if processing emails also process emails_text and emails_email_addr_rel

          if($table == 'emails'){

            $emails_email_addr_rel_columns = $db->get_columns('emails_email_addr_rel');

            $emails_text_columns = $db->get_columns('emails_text');

            foreach($id_arr as $id){

              //back up emails_email_addr_rel records

              $qemailDel = "SELECT * FROM emails_email_addr_rel where id = {$id}";

              $remailDel = $dv->query($qemailDel);

              while($aemailDel = $db->fetchByAssoc($remailDel) {

                $queryString[] = $db->insertParams('emails_email_addr_rel', $emails_email_addr_rel_columns, $aemailDel, null, false);

              } // end aemailDel while()

     

              //back up emails_text records

              $qemailDel = "SELECT * FROM emails_text where id = {$id}";

              $remailDel = $dv->query($qemailDel);

              while($aemailDel = $db->fetchByAssoc($remailDel) {

                $queryString[] = $db->insertParams('emails_text', $emails_text_columns, $aemailDel, null, false);

              } // end aemailDel while()

              $GLOBALS['log']->fatal('pruneDatabaseWR::'. $emailDel);

            }

            //delete related emails_text and emails_email_addr_rel

            $db->query('DELETE FROM emails_email_addr_rel  WHERE email_id in (' .$id_list. ')');

            $db->query('DELETE FROM emails_text  WHERE email_id in (' .$id_list. ')');

          }

        }

      } // foreach() tables

      if(!file_exists($backupDir) || !file_exists($backupDir.'/'.$backupFile)) {

        // create directory if not existent

        mkdir_recursive($backupDir, false);

      }

      // write cache file

      write_array_to_file('pruneDatabase', $queryString, $backupDir.'/'.$backupFile);

      return true;

    }

    return false;

  • One problem with trying to create a custom job is that some tables don't have a last modified date or even a create date, and even when they do date fields are not named consistently from table to table so there is no simple way to add to the query on the current prune job.

    Perhaps all tables with a delete field should also have a date_deleted field that can then be used for the purpose of pruning.

    Some table entries should only be deleted if the parent is deleted but, of course, that's not a generic statement.

    For example emails_text does not have any date fields and should be deleted when the parent record in the emails table is deleted.

  • Yeah, great idea. The fear of losing accidently deleted data from the last of the month usually keeps us from activating the scheduler job all together (because restoring from backups is a hassle).