How to turn this into a SugarQuery?

Context:

Our database contains orphaned records (e.g., Tasks, Notes) where the parent_type and parent_id reference deleted parent records. These orphaned records, some over five years old, need cleanup to improve database efficiency.

Objective:

Convert the following MySQL queries into a SugarCRM function that adheres to its logic, framework, and ORM practices for managing data integrity.

Request:

Provide guidance or implementation steps for converting this functionality into a SugarCRM-compliant solution.

SET SESSION group_concat_max_len = 100000;

SELECT GROUP_CONCAT(
    CONCAT(
        "SELECT m.id, m.parent_type, m.parent_id FROM tasks m ",
        "LEFT JOIN ", lower(parent_type), " jt ON m.parent_id = jt.id ",
        "WHERE m.deleted = 0 AND m.parent_type = '", parent_type, "' AND jt.id IS NULL"
    )
    SEPARATOR " UNION ALL "
)
INTO @dynamic_query
FROM (SELECT DISTINCT parent_type FROM tasks) t;

PREPARE stmt FROM @dynamic_query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Parents
  • Hello  , 

    If you’re using SugarCloud and don’t have SQL access to handle this through a database query.
    I believe that a good approach might be to leverage a custom Scheduler. 


    This is what I thought it might work: 


    1. Create a New Checkbox Field: Add a custom checkbox field (e.g., “No Parent Records”) to mark orphaned records.
    I called it orphaned_c on the tasks module


    2. Set Up a Scheduler: Create a Scheduler that identifies orphaned records and checks the field you created in Step 1. 

    When an Account is deleted, the parent_id field in the Tasks module is set to NULL. To address this, we want to identify and mark:


    . Tasks where the parent_id is NULL (indicating no parent).
    . Tasks where the parent_id points to a record that either has deleted = 1 or no longer exists.

    This is the Scheduler I created on my tests, seems to work fine on my end but do make sure you test to make sure I didn't miss anything: 

    <?php
    
    array_push($job_strings, 'checkOrphanedTasks');
    /**
     * Scheduler job to detect orphaned Task records in batches
     */
    function checkOrphanedTasks()
    {
        $batchSize = 1000; // Number of records to process in one batch
        $offset = 0;       // Start from the beginning
        $continueProcessing = true;
    
        while ($continueProcessing) {
            // Initialize SugarQuery for batch processing
            $sq = new SugarQuery();
            $sq->from(BeanFactory::newBean('Tasks'), ['alias' => 't']);
            $sq->select(['id', 'parent_type', 'parent_id']);
            $sq->where()
                ->notNull('parent_type')  // Ensure the parent_type is not null
                ->equals('deleted', 0);   // Exclude deleted tasks
            $sq->limit($batchSize);
            $sq->offset($offset);
    
            $results = $sq->execute();
    
            // If no more records, stop the loop
            if (empty($results)) {
                $continueProcessing = false;
                break;
            }
    
            foreach ($results as $row) {
                $taskId = $row['id'];
                $parentType = $row['parent_type'];
                $parentId = $row['parent_id'];
    
                if (empty($parentId)) {
                    // If parent_id is null, mark the task as orphaned
                    $taskBean = BeanFactory::retrieveBean('Tasks', $taskId, ['use_cache' => false]);
    
                    if ($taskBean) {
                        $taskBean->orphaned_c = 1;
                        $taskBean->update_date_modified = false; // Prevent date_modified from changing
                        $taskBean->save();
                    }
                } else {
                    // Retrieve the parent record without creating a new bean if it doesn't exist
                    $parentBean = BeanFactory::retrieveBean($parentType, $parentId, ['use_cache' => false]);
    
                    if (empty($parentBean) || $parentBean->deleted == 1) {
                        // Mark the Task as orphaned
                        $taskBean = BeanFactory::retrieveBean('Tasks', $taskId, ['use_cache' => false]);
    
                        if ($taskBean) {
                            $taskBean->orphaned_c = 1;
                            $taskBean->update_date_modified = false; // Prevent date_modified from changing
                            $taskBean->save();
                        }
                    }
                }
            }
    
            // Move to the next batch
            $offset += $batchSize;
        }
    
        return true;
    }

    After the Scheduler runs, all tasks identified as orphaned are easily recognisable



    3. Archive or Delete Records: Once the checkbox is in place, use the Data Archiver to archive or delete the orphaned records as needed.





    This approach enables you to report on orphaned tasks, giving you the opportunity to review them before deciding whether to delete them. 

    Let me know your thoughts—does this align with what you were looking for?

    Cheers,

    André

  • Hi  

    Thank you for your through example. The issue is that we have more than 1M Tasks. Looping through all of the tasks and checking each one for a parent is extremely slow. The prepared statement I provided skips all of this and finds just the Tasks with a missing parent. 

    I'd like to turn my dynamic_query into a query that I can receive just the Tasks without a parent. In our case there are about 92,000 without a related record. Some that go back 8 years. 

    Then I need to do the same for Notes, Calls, Meetings, etc. Any OOTB module that has parent_type and parent_id. 

  • Hello  , 

    If you’re concerned that the Scheduler approach might not perform well enough to handle the volume of data you need to clean up, here are a few optimizations to consider:

    1. Add a forced timeout for each Scheduler run

    Setting a maximum runtime of 10 minutes per Scheduler execution can prevent it from blocking other scheduled jobs. This ensures that the Scheduler runs multiple times without overloading the system, gradually processing the necessary data. You’ll just need to configure it to run enough times (Perhaps out of office hours) to mark all the tasks as orphaned.

    2. Include the orphaned_c field in the initial query conditions

    By excluding already-processed tasks from the query (i.e., where orphaned_c = 1), you ensure that each run processes only unprocessed records. This approach incrementally sets more records to orphaned_c = 1 with each execution until all are handled.

    3. Avoid bean instantiation; use updateParams() for updates instead

    The primary bottleneck in operations like this is often the overhead of instantiating and saving beans. Since you’re only updating a single field, you can also use the DBManager’s updateParams() method to perform a silent database update. This approach is significantly faster and meets the requirements for this task.

    For more information, refer to the documentation here:

    DBManager Best Practices

    With these changes, your Scheduler should be capable of efficiently processing the 1 million tasks.
    This will make them easier to locate and manage using the Data Archiver tool.


    I hope this helps, and happy DB cleansing! 

    Cheers, 


    André 

Reply
  • Hello  , 

    If you’re concerned that the Scheduler approach might not perform well enough to handle the volume of data you need to clean up, here are a few optimizations to consider:

    1. Add a forced timeout for each Scheduler run

    Setting a maximum runtime of 10 minutes per Scheduler execution can prevent it from blocking other scheduled jobs. This ensures that the Scheduler runs multiple times without overloading the system, gradually processing the necessary data. You’ll just need to configure it to run enough times (Perhaps out of office hours) to mark all the tasks as orphaned.

    2. Include the orphaned_c field in the initial query conditions

    By excluding already-processed tasks from the query (i.e., where orphaned_c = 1), you ensure that each run processes only unprocessed records. This approach incrementally sets more records to orphaned_c = 1 with each execution until all are handled.

    3. Avoid bean instantiation; use updateParams() for updates instead

    The primary bottleneck in operations like this is often the overhead of instantiating and saving beans. Since you’re only updating a single field, you can also use the DBManager’s updateParams() method to perform a silent database update. This approach is significantly faster and meets the requirements for this task.

    For more information, refer to the documentation here:

    DBManager Best Practices

    With these changes, your Scheduler should be capable of efficiently processing the 1 million tasks.
    This will make them easier to locate and manage using the Data Archiver tool.


    I hope this helps, and happy DB cleansing! 

    Cheers, 


    André 

Children
No Data