Elastic search reindexing best practice

We have found that we occasionally end up with a situation where we search for new items and they don't come back in the search results. NOTE: Usually new things populate just fine, this only happens intermittently. We are able to fix the issue by running a system reindex from admin. We'd rather not have to do this manually and I see that there is a CLI command to reindex. Is it good practice to just run a system reindex using the CLI occasionally? How often, nightly? According to the documentation[1], this sounds like a not unusual occurrence, does that seem accurate to others?

[1] Performing a system index may resolve search issues for records that have been recently created or modules that were recently enabled for search

  • In which module does that happen? What is the type of ES fields configured in those modules?

    André Lopes
    Lampada Global
    Skype: andre.lampada
  • It happens both with accounts and contacts. It's happening with some accounts right now even though we just reindexed last night. Not sure how relevant this is but Amazon ESS always has a non-zero amount of "Deleted Documents" when the problem is occuring and a re-index drops that number to 0

  • I'm working on the same project as Brad...

    What is the type of ES fields configured in those modules?

    All of the fields for `Accounts` and `Contacts` that ship as globally searchable, plus a few custom fields.

  • We faced a situation in a customer's project. We implemented Visibility Strategy specific bool fields in ES.

    On reindexing it works great, but on editing any record, even if that target bool fields remain intact, the record is not fetched from ES anymore.

    By debugging ES database we realized that, after reindexing the bool fields are set as true or false, but after editing some record those fields are updated to 0 or 1, so ES wlll not fetch them anymore once the value types are not the same.

    We fixed that weird issue by adding both possible values in the Visibility Strategy (true or 1 / false or 0).

    I'm not sure if your issue is related.

    André Lopes
    Lampada Global
    Skype: andre.lampada
  • André, please excuse my ignorance on this subject but I just don't understand what you mean by this: "We implemented Visibility Strategy specific bool fields in ES"
    Is there any sugarcrm documentation describing what this means?

  • Hi Brad,

    For one of our customer we implemented a weekly cron job that reindex the whole database, running on sunday evening in order to make it available on monday morning.

    Here is the code : 

    #!/bin/bash
    json=$(curl -X POST -H Cache-Control:no-cache -H "Content-Type: application/json" -d '{
        "grant_type":"password",
        "client_id":"sugar",
        "client_secret":"",
        "username":"",
        "password":"",
        "platform":"mobile"
    }' http://crm-address/rest/v10/oauth2/token)
    access_token=$(echo $json | sed "s/{.*\"access_token\":\"\([^\"]*\).*}/\1/g") \
    
    curl -X POST -H "oauth-token: $access_token" -H Cache-Control:no-cache -H "Content-Type: application/json" -d '{
        "clear_data":true
    }' http://crm-address/rest/v10/Administration/search/reindex

    Probably there is a better way to do that, but this is working as expected :-)

    Best regards,

    Enes

  • Yes, here is the official documentation.

    Purpose of such a strategy is to, on the fly, restrict access to data depending on criterias (Teams, Roles, User itself). This dynamic filtering affects both SugarQuery and ES, under any platform (base, portal, mobile, custom atc).

    Regards

    André Lopes
    Lampada Global
    Skype: andre.lampada
  • Thank you Enes, we did end up setting up a cron job to re-index, using the CLI:

    ./bin/sugarcrm search:reindex -n --clearData

    We are running it nightly but we are still ending up with issues during the day. Today for example, and these are not even new accounts. It's almost like the re-index did not include them for some reason. Do you have any idea how we can go about debugging this?

  • You're welcome Brad.

    How are created your accounts ? I suppose via REST Api or manually via the CRM ? Can you share me the following config values please ? 

    'search_engine' =>
    array (
    'max_bulk_query_threshold' => 15000,
    'max_bulk_delete_threshold' => 999,
    ),

    We faced an issue three years ago where the indexation did not end because the threshold was too important, the query didn't end and the cron job failed into error, the fts_queue table was very very very (very very) big :-)

    You can also set the following NUS modification : 

    src/Elasticsearch/Queue/QueueManager.php

    function generateQueryModuleFromQueue

    protected function generateQueryModuleFromQueue(\SugarBean $bean, int $bucketId = self::DEFAULT_BUCKET_ID)
    {
        // Get all bean fields
        $beanFields = array_keys(
            $this->container->indexer->getBeanIndexFields($bean->module_name, true)
        );
        $beanFields[] = 'id';
        $beanFields[] = 'deleted';
     
        $sq = new \SugarQuery();
        // disable team security
        // adde erased fields
        $sq->from($bean, ['add_deleted' => false, 'team_security' => false, 'erased_fields' => true]);
        $sq->select($beanFields);
        $sq->limit($this->maxBulkQueryThreshold);
     
        // join fts_queue table
        if ($this->isDefaultBucketId($bucketId)) {
            $sq->joinTable(self::FTS_QUEUE)->on()
                ->equalsField(self::FTS_QUEUE . '.bean_id', 'id')
                ->equals(self::FTS_QUEUE . '.bean_module', $bean->module_name);
        } else {
            $sq->joinTable(self::FTS_QUEUE)->on()
                ->equalsField(self::FTS_QUEUE . '.bean_id', 'id')
                ->equals(self::FTS_QUEUE . '.processed', $bucketId)
                ->equals(self::FTS_QUEUE . '.bean_module', $bean->module_name);
        }
     
        $additionalFields = array(
            array(self::FTS_QUEUE . '.id', 'fts_id'),
            array(self::FTS_QUEUE . '.processed', 'fts_processed'),
        );
     
        $sq->select($additionalFields);
        return $sq;
    }

    the part we modified are after the comment "// join fts_queue table" where we add the condition : 

    ->equals(self::FTS_QUEUE . '.bean_module', $bean->module_name);

    This code is ok for a V10.

    You can begin by checking the fts_queue table daily, it must be empty most of the time.

    Hope that it would help you :)

    Best regards,

    Enes

  • We are creating accounts via the Rest API. We have not changed those config values so I guess we are using the defaults of `max_bulk_query_threshold=15000` and `max_bulk_delete_threshold=3000`. This is a very new project (not even released yet) with a small amount of data. We've added some custom fields but otherwise everything is just out of the box configuration

    We ran a reindex every hour overnight last night and the number of searchable documents was different after every run. Any idea why that would be? Data was not changing during most of this time. The `fts_queue` table is empty