How to track issues with Sugar Cron?

Frédéric Rinaldi over 1 year ago

Hi guys,

I was wondering how you manage potential issues and warnings with the Sugar cron scripts.

With the out-of-the-box lines provided, everything is redirected to /dev/null.

The problem with redirecting to a log file instead is that some jobs produce a lot of info and warning messages that are not really errors.

Any suggestion is welcome.

Fred

Francesca Shiekh over 1 year ago
Disclaimer: I am old school and work in vi and on the command line on my servers all the time. No IDE. We are on-site, so we have a lot more flexibility than cloud instances. What I am sharing here is likely far from the best tracking system, but it has been working for me for about a decade.

- I changed the cron entry to write errors to a cron.raw.err file in a subdirectory of logs; my sysadmin set up logrotate on his end so that it does not grow too large. Caveat: you need to remember to tail the log while the cron job is running, or go back to it regularly, to see what is happening there. So it is basically there when I need it, but it provides no "warning" that something is up.

- For schedulers that are sync jobs, where we need much more insight, we write a separate log for each run in another subdirectory of logs, named after the scheduler and timestamped. A separate system cron job (outside of Sugar) then zips older logs and deletes the gz files older than x days so we don't have a ton of logs (we keep about 3 days' worth).
These logs do not report "errors" though; they are more like debug statements included in the scripts to track where things are, so we can debug particular records that are not syncing for some reason (sadly that happens way more than I would like). Again, this was created more than a decade ago, it is not something we look at often, and we are working to get away from it. The cron script itself will not re-run if there is a log open for the prior run (meaning it is still running or it got stuck). I have a custom web page on my Sugar server, linked from the mega menu for admins, that shows these logs so I can see if one has been "stuck".

- I also have a scheduler that queries job_queue and sends me an email with all the jobs that failed in the last hour, which includes the sync jobs above if they time out. This is pretty crude: it was built back in v6 as an Entry Point and essentially executes a query on job_queue and composes an email to the Sugar admin (UserID = 1) listing the failed schedulers and the content of the "resolution" field, which occasionally gives a hint of what's wrong.

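// Assumes $db is Sugar's database manager (e.g. from DBManagerFactory::getInstance()).
// Select jobs marked as failed whose execute_time falls within the last hour (UTC).
$query = "SELECT * FROM job_queue WHERE resolution = ? AND execute_time > DATE_SUB(UTC_TIMESTAMP(), INTERVAL 1 HOUR)";
$conn = $db->getConnection();
// 'failure' is bound as a parameter rather than concatenated into the SQL.
$stmt = $conn->executeQuery($query, array('failure'));
$rows = $stmt->fetchAll();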
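
For anyone wanting to reproduce the log housekeeping from the second bullet (zip older per-run logs, then delete archives past a retention window), here is a minimal sketch of a standalone script that a system cron could run. It is written in Python purely for illustration and is not the setup described above: the function name `rotate_logs`, the `.log` suffix, and the one-day and three-day thresholds are all assumptions.

```python
import gzip
import os
import shutil
import time
from pathlib import Path


def rotate_logs(log_dir, compress_after_days=1, delete_after_days=3, now=None):
    """Gzip *.log files older than compress_after_days and delete *.gz
    files older than delete_after_days.

    Returns (compressed_names, deleted_names). All names and thresholds
    are hypothetical; adjust them to your own log layout.
    """
    now = time.time() if now is None else now
    compress_cutoff = now - compress_after_days * 86400
    delete_cutoff = now - delete_after_days * 86400
    compressed, deleted = [], []

    # sorted() takes a snapshot, so archives created below are not revisited.
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if path.suffix == ".gz" and mtime < delete_cutoff:
            path.unlink()
            deleted.append(path.name)
        elif path.suffix == ".log" and mtime < compress_cutoff:
            gz_path = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamp so retention counts from the run date.
            os.utime(gz_path, (mtime, mtime))
            path.unlink()
            compressed.append(path.name)
    return compressed, deleted
```

Note that a log still being written by a running job would also be compressed if it is old enough, so the compression threshold should be comfortably longer than the longest expected run.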

Finally, I sometimes watch the logs. Yes, I tail the Sugar log and the PHP log on my production system and keep them open in a corner of my desktop, so it catches my eye when something is logged (the text moves on the screen). I can then see what it is and decide whether it needs action now or later, or whether I can live with it and do nothing. My PHP logs are set to include warnings and notices, so I see it all.
To be honest, I really don't do this as often as I should.

FrancescaS
Enrico Simonetti over 1 year ago in reply to Francesca Shiekh
Looking back at my partner days, with hundreds of Sugar systems hosted on my infrastructure, we had monitoring in place to know if any instance was taking too long to run its cron.

It was something that I believe is or was called "passive checks": you expect something to happen at every interval (30 minutes, an hour, or whatever you want), and if it does not happen within that timeframe, an alarm goes off.
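
A passive check of this kind can be sketched with a heartbeat file: the cron job touches the file when it finishes, and a separate watcher raises an alarm when the file is too old. This is only an illustration in Python, not the tooling used back then; the function names and the idea of using a file's modification time as the heartbeat are assumptions.

```python
import os
import time


def touch_heartbeat(heartbeat_path):
    """Call at the end of a successful cron run to record 'I am alive'."""
    with open(heartbeat_path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(heartbeat_path, None)  # set its modification time to now


def is_heartbeat_stale(heartbeat_path, max_age_seconds, now=None):
    """Return True if the heartbeat is missing or older than max_age_seconds."""
    now = time.time() if now is None else now
    try:
        last_beat = os.path.getmtime(heartbeat_path)
    except OSError:
        # No heartbeat file at all: the job has never reported in.
        return True
    return (now - last_beat) > max_age_seconds
```

The watcher has to run independently of the job it monitors (a different cron entry or a different host), otherwise a dead cron daemon silences both the job and the alarm.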

This is really old school though, as it was more than 10 years ago. I am sure there are better ways and tools now :)

Another thing you can do is write a script that parses the error logs (PHP web, PHP CLI, sugarcrm.log) and looks for specific mentions like "fatal", "timeout", etc.: basically a set of keywords that concern you. It can then alert you either right away or in a digest every 30 minutes, every hour, and so on.
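
As a rough sketch of such a keyword scan, the function below returns the lines of a log that mention any keyword of concern, ignoring case. It is an illustration in Python only: the function name, the default keyword list (just the two examples mentioned above), and the choice to return line numbers are assumptions, and the alerting itself (email, chat, pager) is left out.

```python
import re

# The two example keywords from the post; extend with whatever concerns you.
DEFAULT_KEYWORDS = ("fatal", "timeout")


def find_alert_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive and on substrings, so "Fatal error" and
    "FATAL" both match "fatal".
    """
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

Since `lines` can be any iterable of strings, an open file handle can be passed directly, for example `find_alert_lines(open(path, errors="replace"))`. To avoid repeated alerts for the same entries, a periodic run would also need to remember how far it read last time.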

And now, with ChatGPT-like APIs, you could even send some of those entries to the API (not too many, unless you have the need and budget for it...), which could help you determine the issue, cause, severity, time to fix, etc.

Just a couple of thoughts.
--

Enrico Simonetti

Sugar veteran (from 2007)

www.naonis.tech

Feel free to reach out for consulting regarding:

API Integration and Automation Services

Sugar Architecture

Sugar Performance Optimisation

Sugar Consulting, Best Practices and Technical Training

AWS and Sugar Technical Help

CTO-as-a-service

Solutions-as-a-service

and more!

All active SugarCRM certifications

Actively working remotely with customers based in APAC and in the United States
