HOW TO - Cleanup your UPLOAD directory

I've seen this question asked a lot and never seen any answers.  I found a way to reduce the size of my on-site "uploads" folder by almost 30GB.  Thats a lot of recovered file system storage especially if you perform regular backups of your file system.

 

Uploads are mentioned in the developer guide and it was enough to point me in the right direction however there wasn't enough information to complete the task.  

https://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_9.0/Architecture/Uploads/ 

 

Specifically what I was targeting here was "Email Attachments".  Any time an email is sent from Sugar with an attachment, Sugar stores a copy of that file in the uploads directory but renames it to a GUID matching the ID of a note record.  What?!?!  Well if you think about it, it makes sense.  Multiple copies of "MyDoc.pdf" with different content would be impossible for Sugar to identify which is the right "MyDoc.pdf".  They used this mechanism to create a unique ID for all documents in the uploads directory.  They just never gave us an easy way to delete or cull this info.  It just grows and grows.....

 

This process was performed on CentOS 6 on an on-site installation but the process will basically be the same for any other on-site platforms.  There is a query in the developer doc to locate email attachments but it is kind of flawed and just needs a little tweaking.  Here is the query Sugar suggests:

 

SELECT filename, file_mime_type FROM notes
INNER JOIN emails ON notes.parent_type = 'Emails' AND notes.parent_id = emails.id
INNER JOIN emails_text ON emails.id = emails_text.email_id;

That will produce a TON of results.  Some even duplicated.  Slightly change this query's first line
and add another line at the bottom.

SELECT notes.id FROM notes
INNER JOIN emails ON notes.parent_type = 'Emails' AND notes.parent_id = emails.id
INNER JOIN emails_text ON emails.id = emails_text.email_id
WHERE notes.date_entered < '2019-09-01';


What I did here was added the note ID (to find the matching document GUIDs) and set a time frame. In
my case, I wanted to delete all email attachments over 3 months old.  Export this list to CSV which
will give you one column with GUIDS.  Then its a simple formula in Excel.  In Column B:
="/path/to/upload/directory/" & A2  {basically the cell to the left of it}.  Now drag that column down
the whole list, highlight Column B, copy and paste values.  Now delete Column A and you now have a
listing of full path file names to be deleted.  Copy this list to a text file on your Sugar server.

Now for CentOS, RHEL... Lets say I named the file "clearuploads.txt"
sudo xargs rm < clearuploads.txt


Thats it! This will remove the email attachments but leave the note in place. The note record will
still exist in Sugar and it will still have the attachment filename for reference however clicking the
hyperlink just presents a "javascript void(0)" dialog and nothing happens when you click it.

Parents
  • I'm not sure why you would want to keep the note and not the attachment, it would seem to me that one would want to keep both or delete both. But perhaps I just don't know your scenario.

    Angel Magana has a nice modification to split the upload directory by date, which makes it a lot cleaner. I have yet to look into this myself but knowing his reputation it should be a really good one! 

    https://github.com/elchele/UploadSplit 

    FrancescaS

  •  this is common request. Several customers face similar issue and ask us to take safe actions which doesn't loose relevant data.

    The most effective action we realized is to identify and remove all emails as well as their attachments which are not associated to the relationship itself, literally (TO and FROM not related to Contact, Account, Lead, Prospect, User).

    The explanation: several customers don't realize that the corporate emails eventually are in charge for personal purpose and/or automatic emails (bounce, third party companies like Google, Apple etc) are stored, eventually with big attachments.

    For one specific customer we were able to reduce around 60% of upload folder and 20% of database size by removing those emails and attachments.

    It becomes a so life savior in such a way that we built a sql and php scripts to identify them all and generate sql scripts to remove the entries and txt files to be loaded with "xargs rm -rf < somefile.txt" in order to remove the files. So you just need to take the backup, identify those data to be removed and file a ticket to Sugar to remove them all accordingly.

    André Lopes
    Lampada Global
    Skype: andre.lampada
Reply
  •  this is common request. Several customers face similar issue and ask us to take safe actions which doesn't loose relevant data.

    The most effective action we realized is to identify and remove all emails as well as their attachments which are not associated to the relationship itself, literally (TO and FROM not related to Contact, Account, Lead, Prospect, User).

    The explanation: several customers don't realize that the corporate emails eventually are in charge for personal purpose and/or automatic emails (bounce, third party companies like Google, Apple etc) are stored, eventually with big attachments.

    For one specific customer we were able to reduce around 60% of upload folder and 20% of database size by removing those emails and attachments.

    It becomes a so life savior in such a way that we built a sql and php scripts to identify them all and generate sql scripts to remove the entries and txt files to be loaded with "xargs rm -rf < somefile.txt" in order to remove the files. So you just need to take the backup, identify those data to be removed and file a ticket to Sugar to remove them all accordingly.

    André Lopes
    Lampada Global
    Skype: andre.lampada
Children
No Data