Elasticsearch reindexing best practice

We occasionally end up in a situation where we search for newly created records and they don't come back in the search results. NOTE: usually new records become searchable just fine; this only happens intermittently. We can fix the issue by running a system reindex from the admin panel, but we'd rather not have to do that manually, and I see there is a CLI command to reindex. Is it good practice to run a system reindex via the CLI on a schedule? How often, nightly? According to the documentation [1], this sounds like a not-unusual occurrence. Does that seem accurate to others?

[1] Performing a system index may resolve search issues for records that have been recently created or modules that were recently enabled for search

  • Hi Brad,

    For one of our customers we implemented a weekly cron job that reindexes the whole database; it runs on Sunday evening so that the fresh index is available on Monday morning.

    Here is the code:

    #!/bin/bash
    # Authenticate against the REST API to obtain an OAuth token.
    # Fill in client_secret, username and password for an admin user,
    # and replace crm-address with the URL of your instance.
    json=$(curl -X POST -H "Cache-Control: no-cache" -H "Content-Type: application/json" -d '{
        "grant_type":"password",
        "client_id":"sugar",
        "client_secret":"",
        "username":"",
        "password":"",
        "platform":"mobile"
    }' http://crm-address/rest/v10/oauth2/token)

    # Extract the access_token value from the JSON response.
    access_token=$(echo "$json" | sed "s/{.*\"access_token\":\"\([^\"]*\).*}/\1/g")

    # Trigger a full system reindex, clearing the existing index data first.
    curl -X POST -H "oauth-token: $access_token" -H "Cache-Control: no-cache" -H "Content-Type: application/json" -d '{
        "clear_data":true
    }' http://crm-address/rest/v10/Administration/search/reindex
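
    We run the script from cron on Sunday evening; the crontab entry looks roughly like this (the exact time, script path and log file are just examples, adjust them for your server):

    # run the full reindex every Sunday at 22:00
    0 22 * * 0 /path/to/sugar-reindex.sh >> /var/log/sugar-reindex.log 2>&1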

    There is probably a better way to do this, but it works as expected :-)

    Best regards,

    Enes

  • Thank you Enes, we did end up setting up a cron job to re-index, using the CLI:

    ./bin/sugarcrm search:reindex -n --clearData

    We are running it nightly, but we are still running into issues during the day. It happened again today, for example, and with accounts that are not even new. It's almost as if the reindex did not include them for some reason. Do you have any idea how we can go about debugging this?

  • You're welcome Brad.

    How are your accounts created? I assume via the REST API, or manually in the CRM? Could you share the following config values, please?

    'search_engine' => array(
        'max_bulk_query_threshold' => 15000,
        'max_bulk_delete_threshold' => 999,
    ),
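
    If you are not sure where these are set, running something like this from the Sugar root directory should surface them (config.php and config_override.php are the usual places; if nothing comes back, you are on the defaults):

    grep -E "max_bulk_(query|delete)_threshold" config.php config_override.php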

    We faced an issue three years ago where the indexing never finished because the threshold was too high: the query never completed, the cron job ended in an error, and the fts_queue table was very, very, very (very, very) big :-)

    You can also apply the following non-upgrade-safe (NUS) modification:

    src/Elasticsearch/Queue/QueueManager.php

    function generateQueryModuleFromQueue

    protected function generateQueryModuleFromQueue(\SugarBean $bean, int $bucketId = self::DEFAULT_BUCKET_ID)
    {
        // Get all bean fields
        $beanFields = array_keys(
            $this->container->indexer->getBeanIndexFields($bean->module_name, true)
        );
        $beanFields[] = 'id';
        $beanFields[] = 'deleted';
     
        $sq = new \SugarQuery();
        // disable team security
        // add erased fields
        $sq->from($bean, ['add_deleted' => false, 'team_security' => false, 'erased_fields' => true]);
        $sq->select($beanFields);
        $sq->limit($this->maxBulkQueryThreshold);
     
        // join fts_queue table
        if ($this->isDefaultBucketId($bucketId)) {
            $sq->joinTable(self::FTS_QUEUE)->on()
                ->equalsField(self::FTS_QUEUE . '.bean_id', 'id')
                ->equals(self::FTS_QUEUE . '.bean_module', $bean->module_name);
        } else {
            $sq->joinTable(self::FTS_QUEUE)->on()
                ->equalsField(self::FTS_QUEUE . '.bean_id', 'id')
                ->equals(self::FTS_QUEUE . '.processed', $bucketId)
                ->equals(self::FTS_QUEUE . '.bean_module', $bean->module_name);
        }
     
        $additionalFields = array(
            array(self::FTS_QUEUE . '.id', 'fts_id'),
            array(self::FTS_QUEUE . '.processed', 'fts_processed'),
        );
     
        $sq->select($additionalFields);
        return $sq;
    }

    The part we modified is after the comment "// join fts_queue table", where we added the condition:

    ->equals(self::FTS_QUEUE . '.bean_module', $bean->module_name);

    This code is fine for v10.

    You can start by checking the fts_queue table daily; it should be empty most of the time.
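
    Something like this shows what is still waiting to be indexed, per module (the database name and credentials are placeholders):

    # count pending rows in fts_queue, grouped by module
    mysql -u sugar -p sugarcrm -e "SELECT bean_module, COUNT(*) AS pending FROM fts_queue GROUP BY bean_module;"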

    Hope this helps :)

    Best regards,

    Enes

    We are creating accounts via the REST API. We have not changed those config values, so I assume we are using the defaults of `max_bulk_query_threshold=15000` and `max_bulk_delete_threshold=3000`. This is a very new project (not even released yet) with a small amount of data. We've added some custom fields, but otherwise everything is out-of-the-box configuration.

    We ran a reindex every hour overnight last night, and the number of searchable documents was different after every run. Any idea why that would be? The data was not changing during most of this time, and the `fts_queue` table is empty.
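
    In case it helps with debugging, a rough way to sanity-check the counts would be to compare what Elasticsearch reports against the database directly (the host, index name, database name and credentials below are placeholders, not our real values):

    # list the indices Sugar created, then count documents in the index that holds Accounts
    curl -s "http://localhost:9200/_cat/indices?v"
    curl -s "http://localhost:9200/<accounts_index_from_the_list_above>/_count?pretty"

    # compare against the number of non-deleted account records in the database
    mysql -u sugar -p sugarcrm -e "SELECT COUNT(*) FROM accounts WHERE deleted = 0;"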