Monitoring Elastic Search Indexing With job_queue Queries

The Sugar Global Search feature is powered by a service called ElasticSearch which is updated asynchronously through the scheduler called “Elasticsearch Queue Scheduler”. Since it is an asynchronous process, its progress or problems are often not made apparent to the user or even admins. Here is a way to monitor the job_queue table to determine if this process is executing as it should.

From either Sugar’s Advanced Reports feature or through a SQL client accessing the underlying database to for your Sugar instance, periodically run this query (every 5 or 10 minutes):

SELECT id, name, date_entered, date_modified, execute_time, status, resolution, data, message
FROM job_queue
WHERE name IN ('Elasticsearch Queue Consumer','Elasticsearch Queue Scheduler')
ORDER BY job_queue.date_modified DESC;

The “Elasticsearch Queue Scheduler” job checks the fts_queue table to see if there are records that need to be re-indexed. This need can be caused by a record being added or updated or by a reindexing request (e.g. Admin > Search > Schedule System Index). This scheduler job will set up needed “Elasticsearch Queue Consumer” jobs in the job_queue table which will process fts_queue records for a given module. On the next scheduler run, these jobs will execute. Each of these “Elasticsearch Queue Consumer” jobs will process fts_queue lines for the given module limited, in number, by either the number of records for that module in the fts_queue or the value set for this config setting:

$sugar_config['search_engine']['max_bulk_query_threshold']

If that setting is not set, the default is 15000 records per run. This method basically breaks the reindexing into multiple smaller jobs as to not occupy the scheduler for too long. The idea is to not block other scheduler jobs from executing.

In short, this SQL query shows the relevant fields from the job_queue table that have to do with the reindexing process in chronological order so that you can easily monitor recent activity first

Here are the fields and their significance:

  • id: ID field of the job_queue record.
  • name: this will be either “Elasticsearch Queue Consumer” or “Elasticsearch Queue Scheduler” – the two jobs that make up this process.
  • date_entered: When the record was entered into the system.
  • date_modified: When the record was modified.
  • execute_time: When the job was actually executed.
  • status: “done”, “queued” or “running”
  • resolution: “success”, “failure”, “queued”, “partial”
  • data: tells what module was processed in “Elasticsearch Queue Consumer” jobs
  • message: provides more information about what happened.

This is what to look for:

Is the “Elasticsearch Queue Scheduler” job running. If it does not run, most likely your scheduler is not running or something else is causing the scheduler to stall. By default this should run once a minute.

With “Elasticsearch Queue Consumer” records, if there are a lot of status “queue” records and the date_entered, date_modified or execute_time arare more than an hour old, then something is probably clogging your scheduler.

With “Elasticsearch Queue Consumer” records with a status of “done” check the message field. If it has a message like “Processed X records in Y second(s)” and if the number for “X” is not what $sugar_config['search_engine']['max_bulk_query_threshold'] is set to and there are still records in the fts_queue table for that module, then something is failing. If it is off by a few, usually the problem will correct itself. If there is not a message like this, rather an error, something is failing in this process. The resolution is usually set to “failure” for these records. If this persists, contact SugarCRM Technical Support.

With “Elasticsearch Queue Consumer” records with a status “failure”, see if the failure is consistent for only some of the modules. Frequently, setting the config_override.php value

$sugar_config['search_engine']['max_bulk_query_threshold'] = 5000;

to a lower number will allow these processes to execute successfully. Though it may be slower, at least it runs. This problem happens if the data in these modules in this instance are particularly large (e.g. email bodies or Description fields).

If there are persistent resolution “failure” records, contact SugarCRM Technical Support for more help.