Elasticsearch Indexing Jobs in Sugar 7

In this post, Jelle Vink, SugarCRM's Security Architect and resident Elasticsearch expert, explains how the Sugar Job Scheduler and Job Queue affect Sugar 7's record indexing behavior.

Cron.php Execution

When cron.php is executed, there are limits on how many jobs the driver executes and how long it runs. When either maximum is reached, the current cycle terminates. The default maximums are 25 jobs and 1,800 seconds. Both can be changed in config_override.php:

$sugar_config['cron']['max_cron_jobs'] = 25;
$sugar_config['cron']['max_cron_runtime'] = 1800;

There is also a minimum interval in minutes (which defaults to 1). If cron is executed multiple times in quick succession, it only does any work once the minimum interval has elapsed since the previous cycle. Set the interval to 0 to allow another cycle to run immediately after the previous one finishes:

 $sugar_config['cron']['min_cron_interval'] = 0;
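
To illustrate how the two maximums interact, here is a minimal sketch of a single cron cycle. It is a simplified model, not Sugar's actual driver code: simulateCronCycle() is a hypothetical function, the per-job durations are made-up inputs, and it assumes both limits are checked before each job starts.

```php
<?php
// Hypothetical model of one cron cycle; not Sugar's real driver logic.
// $jobDurations: assumed run time in seconds of each queued job, in queue order.
function simulateCronCycle(array $jobDurations, int $maxJobs = 25, int $maxRuntime = 1800): int
{
    $executed = 0;
    $elapsed = 0;
    foreach ($jobDurations as $seconds) {
        // The cycle ends as soon as either maximum has been reached.
        if ($executed >= $maxJobs || $elapsed >= $maxRuntime) {
            break;
        }
        $elapsed += $seconds;
        $executed++;
    }
    return $executed;
}

// 40 quick jobs of 10 seconds each: the job limit ends the cycle.
echo simulateCronCycle(array_fill(0, 40, 10)), "\n";  // 25
// 40 slow jobs of 10 minutes each: the runtime limit ends the cycle.
echo simulateCronCycle(array_fill(0, 40, 600)), "\n"; // 3
```

In both cases the remaining jobs stay queued for a later cycle.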



Elasticsearch Job Creation

A certain number of schedulers are configured out of the box in Sugar 7. When cron is executed, the driver starts by executing the schedulers that are due. These schedulers are not jobs themselves. They simply create new jobs to be executed. These jobs are then stored in the job_queue table.

Once the schedulers have created the necessary jobs, the driver starts executing the jobs based on their order of creation, status, job delay and execution time. For Elasticsearch there is one scheduler, which is configured to run as often as possible, meaning every time cron is executed. This scheduler creates a consumer job for every module that has queued Elasticsearch records in the fts_queue table.

When a full reindex has been triggered by a Sugar Administrator, a consumer job is created and queued for every FTS-enabled module.

Always remember that your Elasticsearch jobs are not alone in the job queue.  There are other schedulers that create jobs like Email reminders, Database pruning, Check inbound email boxes, etc.  Jobs can also be created outside of schedulers via logic hooks or other custom code.

Job execution

As explained above, the cron driver runs at most 25 jobs from the queue during each cycle by default. There is no guarantee that these are going to be Elasticsearch jobs, since other jobs may also be waiting in the queue. Elasticsearch jobs are not given priority: we treat all jobs equally to guarantee that every job is executed eventually.

For Elasticsearch-specific jobs there is also a maximum number of records that one job will consume from the queue for a given module. As explained above, one Elasticsearch (consumer) job processes only a single module. By default, a consumer job processes at most 15,000 records for its module. This can be configured using the following setting:

$sugar_config['search_engine']['max_bulk_query_threshold'] = 15000;
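
As a rough illustration of what this threshold means for a large module, the arithmetic below estimates how many consumer job runs one module needs to drain its queue. consumerRunsNeeded() is a hypothetical helper, and the sketch assumes every run consumes the full threshold and that no new records are queued in the meantime.

```php
<?php
// Illustrative arithmetic only; consumerRunsNeeded() is not part of Sugar.
function consumerRunsNeeded(int $queuedRecords, int $threshold = 15000): int
{
    if ($threshold < 1) {
        throw new InvalidArgumentException('threshold must be at least 1');
    }
    // Each consumer job run takes at most $threshold records for its module.
    return (int) ceil($queuedRecords / $threshold);
}

echo consumerRunsNeeded(40000), "\n"; // 3
echo consumerRunsNeeded(15000), "\n"; // 1
echo consumerRunsNeeded(0), "\n";     // 0
```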

Effects on Elasticsearch indexing

In the demo data, no single module has more than 15,000 records. The only limiting factor here is the number of jobs created, which in certain cases is higher than the default of 25. For a full reindex, at least two cron runs are needed on average to get everything indexed.

When testing Elasticsearch (full) reindexing after running cron, you should ensure that there are no records left in the fts_queue table. This is the only confirmation that all records are present in Elasticsearch.  A single cycle may not be enough to ensure all records have been indexed!
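
One way to script that check is to count the rows remaining in fts_queue. The sketch below assumes you can open a PDO connection to the Sugar database yourself. ftsQueueCount() and reindexIsComplete() are hypothetical helper names, not Sugar APIs. Only the table name comes from this post.

```php
<?php
// Sketch only: assumes a PDO connection to the Sugar database.
// Both functions are hypothetical helpers, not part of Sugar.
function ftsQueueCount(PDO $pdo): int
{
    // fts_queue holds the records still waiting to be sent to Elasticsearch.
    $stmt = $pdo->query('SELECT COUNT(*) FROM fts_queue');
    if ($stmt === false) {
        throw new RuntimeException('Could not query fts_queue');
    }
    return (int) $stmt->fetchColumn();
}

function reindexIsComplete(PDO $pdo): bool
{
    // An empty queue is the confirmation that everything has been indexed.
    return ftsQueueCount($pdo) === 0;
}
```

A test script could call reindexIsComplete() after each cron run and keep running cron until it returns true.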

While this can be an issue for Sugar Developers doing local development without cron set up, it is not an issue on a properly configured production system. For example, once a cron cycle stops after 25 jobs, the next cycle will happen soon, since we typically recommend triggering cron every minute. That next run picks up the next 25 jobs, and so on, until indexing is complete.
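
As a back-of-the-envelope sketch, the number of cycles needed to drain the queue can be estimated as follows. cronCyclesNeeded() is a hypothetical helper, and the job counts are made-up examples. It assumes no new jobs arrive while the queue drains and that the runtime limit is never hit.

```php
<?php
// Illustrative arithmetic only; cronCyclesNeeded() is not part of Sugar.
function cronCyclesNeeded(int $queuedJobs, int $maxCronJobs = 25): int
{
    if ($maxCronJobs < 1) {
        throw new InvalidArgumentException('max_cron_jobs must be at least 1');
    }
    // Integer ceiling division: each cycle runs at most $maxCronJobs jobs.
    return intdiv($queuedJobs + $maxCronJobs - 1, $maxCronJobs);
}

echo cronCyclesNeeded(40), "\n";      // 2
echo cronCyclesNeeded(40, 500), "\n"; // 1
```

With cron triggered every minute, the first figure corresponds to roughly two minutes before all queued jobs have been picked up.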

Additional Elasticsearch fine tuning

The following config_override options are available for an admin to fine-tune indexing performance. This might change in the future, as we are considering refactoring our queue out of the Sugar database. The values below are the defaults:

$sugar_config['search_engine']['max_bulk_query_threshold'] = 15000;
$sugar_config['search_engine']['max_bulk_delete_threshold'] = 3000;
$sugar_config['search_engine']['force_async_index'] = false;
$sugar_config['search_engine']['max_bulk_threshold'] = 100;

Development / QA recommendations

We recommend adding the following to our deploy/automation to circumvent any issues regarding Elasticsearch (re)indexing and general cron usage.

All changes have to be done in config_override.php:

$sugar_config['cron']['max_cron_jobs'] = 500;
$sugar_config['cron']['min_cron_interval'] = 0;

This ensures that when a QA person or Sugar Developer executes cron.php multiple times in a short time frame, cron runs immediately each time and tends to clear the queue fully, even when there are a lot of jobs to run.

  • Comment originally made by TamW.

    Thanks for this useful article. We really struggle with no easy way for Administrators to see what's in the fts_queue table after a re-index, i.e. no visual way to know that the re-index has finished successfully. Calling on our db team to check progress frequently is not ideal. We also found that when the job for each module hits a timeout it doesn't restart or seem to even make use of the retry columns in the job queue table. We had to make db changes to restart the jobs. Admittedly some of the tables we index are huge. Combined with the fact that the full re-index job starts by clearing all the ETS index information, we've been badly caught out when the re-indexing stops part way through ... and the global search stops working for our users! We've created our own scheduled task which does not clear the already indexed data. Would also be keen to know how to create a scheduled task that allows passing through extra parameters so that we can make one where the administrator can specify a specific module to re-index. We run more than one ETS server and while the config.php file takes multiple servers the Admin UI breaks that config. So we look forward to any improvements in the ETS indexing :-) BTW ... our users love the global search! (although now they want us to customise the displayed results to better match their Account groupings, e.g. by region)
  • Comment originally made by Matthew Marum.

    Hi Tam,

    I know this message was from a while ago but hopefully you will appreciate some of the FTS monitoring commands that were added as part of Sugar CLI in 7.7.x releases.
