Upload directory contents

Let me preface this post: I don't know anything about this at all. Because I don't know anything, I am hoping my ignorance may somehow stumble into some outside-the-box thinking.

My customer's file storage is huge. 

We've determined that the Upload directory has 139 GB in it.

It's not really my job to fix it but I am frustrated and want to help. 

Obviously, through my research I've figured out what a GUID is. I've read a million posts on club and elsewhere to see if there is a way to figure out from the GUID what the file actually is. I understand we can set a date range and just have support blanket delete up to a certain date.

Is there any way to use the GUIDs in the directory to determine what the file actually is? For example:

Let's say you have this: 
00019efe-6622-11e6-aa95-02a6691319f3
Can you use that to backtrack to what this actually was?
When Sugar generates the GUID there must be some criteria it uses to create that right? Can it be reversed so you can tell anything about it?
Second question: This might be even crazier than the first. Not sure. 

Could you somehow move the contents of the upload directory out of Sugar to an external place and then point to that external place without screwing up everything? 
  • Part 1:

    The GUID is tied to the row to which the data belongs. This could be a Note, Document, Document Revision, or the field picture (which is in certain modules, like Users, Accounts, Contacts, Leads, and Prospects). Additionally, it could be tied to any custom module you may have that has the ability to upload a file. 
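
    Because the file name is the ID of the row it belongs to, one way to backtrack is to look the GUID up in each table that can own an upload. Here is a minimal sketch of that idea, not a tested procedure: it assumes you have database access (for example, from a restored backup), and the table list is an assumption that you would extend with your own custom module tables. It uses an in-memory SQLite database only so the example runs on its own.

```python
import sqlite3

# Assumed list of tables whose row ID can double as an upload file name.
# Extend it with the tables behind any custom modules that have a file field.
CANDIDATE_TABLES = ["notes", "document_revisions"]


def find_owner(conn, guid, tables=CANDIDATE_TABLES):
    """Return the name of the first table holding a row with this ID, or None."""
    for table in tables:
        # Table names come from our own fixed list, so interpolating them is safe;
        # the GUID itself is passed as a bound parameter.
        row = conn.execute(f"SELECT 1 FROM {table} WHERE id = ?", (guid,)).fetchone()
        if row is not None:
            return table
    return None


def demo_connection():
    """Build a throwaway in-memory database with one made-up Notes row."""
    conn = sqlite3.connect(":memory:")
    for table in CANDIDATE_TABLES:
        conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY)")
    conn.execute(
        "INSERT INTO notes VALUES (?)",
        ("00019efe-6622-11e6-aa95-02a6691319f3",),
    )
    return conn
```

    Calling find_owner(demo_connection(), "00019efe-6622-11e6-aa95-02a6691319f3") returns "notes" for the demo data. A file whose GUID is found in no table is a candidate orphan.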

    One of the largest sources of "files" is the Emails module. If you are archiving emails to Sugar, any attachment, image, etc., that is attached to or embedded in the body of the email gets stored in the uploads folder.

    For any of the above modules, if you delete the record from Sugar, the corresponding upload files are deleted as well. For example: if you delete an Email record, all of the associated files will also be deleted. It might be a good practice to use the Archive Records functionality to remove records that are older than a certain date.

    Additionally, we have a scheduler that cleans up emails older than a certain time period. For example, we only keep 3 years' worth of emails in our Sugar. We picked 3 years arbitrarily; it could have been 1, or 10. It was a choice. We determined that people are not going to look in Sugar at an email older than a certain date.
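
    On whether the GUID itself can be reversed: the example ID has the shape of a standard version-1 UUID, and if that is what it is, the only things encoded in it are a creation timestamp and a node identifier, not the module. The sketch below decodes those with Python's standard uuid module. That Sugar generates these IDs as standard version-1 UUIDs is an assumption based on the format of the example; older records may use a different scheme.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-nanosecond ticks since 1582-10-15 (RFC 4122).
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def describe_guid(guid):
    """Return the version of a UUID and, for version 1 only, its embedded time and node."""
    u = uuid.UUID(guid)
    info = {"version": u.version, "created": None, "node": None}
    if u.version == 1:
        # u.time is in 100 ns ticks; integer-divide by 10 to get microseconds.
        info["created"] = UUID_EPOCH + timedelta(microseconds=u.time // 10)
        # For version 1 the node is usually derived from the generating machine.
        info["node"] = f"{u.node:012x}"
    return info


info = describe_guid("00019efe-6622-11e6-aa95-02a6691319f3")
print(info["version"], info["created"], info["node"])
```

    For the example ID this reports version 1 and a creation time in 2016. Under that assumption, the "11e6" group is the version digit plus the high bits of the timestamp, so it changes with time rather than with the module. It can help sort files by age, but not by what they belong to.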

    Part 2:

    Without knowing which of your modules is using so much storage, it is difficult to provide you with an exact solution.

    There are Module Loadable Packages that will utilize AWS S3 to store files; there is an example written by

    There are Module Loadable Packages for Box or Google Drive, MS365, etc. 

    OOTB, Sugar has the ability to use Google Drive and MS365 in the Documents module. 

    Good Luck,

    Jeff

  • Hi  ,

    Sorry it has been a long time since you wrote the reply to   above....

    Just reading through the thread now as I am currently trying to reduce file storage in our cloud instance and wanted to re-ask one of Frank's questions if that is OK.

    Frank asked this below:

     Is there any way to use the GUIDs in the directory to determine what the file actually is. For example: 

     Let's say you have this: 
     00019efe-6622-11e6-aa95-02a6691319f3
     
     Can you use that to backtrack to what this actually was?
     When Sugar generates the GUID there must be some criteria it uses to create that right? Can it be reversed so you can tell anything about it?

     

    ...but this bit specifically was not really answered. The answer you gave was excellent; I just wanted to ask about this specific part.

    The reason I ask is I have a backup with our Upload folder attached but there is no way really to find in there for example only files from one specific module.

    If there were some clue in the naming of the GUID that would help with the find, it would really help. For example, if the second set of 4 characters was always 11e6 when the file came from a specific module. This would help identify the right/desired files to delete first.

    Or is there some way in Postman to search everywhere for the GUID, and return the module?

    Hope this makes sense...   I did a massive delete of older files and it hardly made a dent in the FS storage which is at 80%.

    Also, I'm not sure exactly when Sugar changed the upload structure to be split into subfolders, but are these subfolders any clue as to which module the file relates to? (We have custom modules using file fields too.)

    Your advice for older emails is also my next plan of attack - so thank you for that too, from your last reply.

    Best regards,
    Luke.

  • Hi  ,

    In addition to the answers from others, there is also the EmbeddedFiles module that can contribute to the upload directory.

    If you store a lot of emails in Sugar, there is a strong likelihood you have a lot of files duplicated in your upload directory. Our Upsert Deduplicate plug-in has an optional add-on to deduplicate file attachments in your instance. Let me know if you'd like an in-depth demo of the plug-in!

    Chris 

  • Thank you  ,  ,  and  ...    Really appreciate the replies above - what a great resource this SugarClub is and for an admin like me to be able to tap into the knowledge and experience of awesome folk like you guys is fantastic, thank you.

    OK, all understood.  

    What I did to start with was use a Python script to give me a CSV of every file in the Upload folder (from the downloaded backup, locally) that has a checksum matching another file, ordered by size descending.

    If the checksum matches another file then it is probably a file that is duplicated, uploaded more than once.
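
    A script along these lines could look like the sketch below. It is a reconstruction of the idea rather than the exact script: SHA-256 as the checksum, the CSV column names, and the function names are all assumptions.

```python
import csv
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return (size_bytes, sha256, path) rows for files whose content occurs more than once."""
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    rows = []
    for digest, paths in by_digest.items():
        if len(paths) > 1:  # keep only content that appears in two or more files
            rows.extend((os.path.getsize(p), digest, p) for p in paths)
    # Largest files first; digest and path break ties so the output is stable.
    rows.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rows


def write_csv(rows, out_path):
    """Write the duplicate rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["size_bytes", "sha256", "path"])
        writer.writerows(rows)
```

    Usage would be something like write_csv(find_duplicates("upload"), "duplicates.csv"), where both paths are placeholders. Hashing every file is slow on a large folder; grouping by size first and hashing only files that share a size would cut the work.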

    But to check them all is where I get stuck in a very manual process.   I am working on the larger files first.

    I previously (before the backup) used reports to find files with no parent, for example, and then generated a list of IDs and used Postman to delete those IDs/records. (One issue here is that our custom modules with a file field do not have a file_size field, so the report is less useful, though date created is still worthwhile.)
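
    The Postman step could also be scripted. This is a minimal sketch, assuming the Sugar REST API accepts DELETE on <module>/<record id> with the access token in an OAuth-Token header; the base URL, API version, and token are placeholders to check against your own instance. It only builds the requests so the list can be reviewed before anything is sent.

```python
import urllib.request


def build_delete_requests(base_url, module, record_ids, token):
    """Build (but do not send) one DELETE request per record ID."""
    requests = []
    for record_id in record_ids:
        url = f"{base_url.rstrip('/')}/{module}/{record_id}"
        requests.append(
            urllib.request.Request(
                url,
                method="DELETE",
                headers={"OAuth-Token": token},  # assumed auth header name
            )
        )
    return requests


# Placeholder values only; nothing here contacts a server.
pending = build_delete_requests(
    "https://example.invalid/rest/v11",
    "Notes",
    ["00019efe-6622-11e6-aa95-02a6691319f3"],
    "TOKEN",
)
for req in pending:
    print(req.get_method(), req.full_url)
```

    Each request would then be sent with urllib.request.urlopen(req) once the printed list looks right.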


      the plugin looks fantastic, but the cost per user (on my current user count, so this is very approximate) works out at almost the same as a Sugar licence per month. There are a handful of things like this which I wish SugarCRM would just acquire and build into the product. It does look great, but I can't imagine getting that cost signed off. On my salary, too, it would equate to an undisclosed ;-/ number of weeks working on storage, which is unlikely. Well, I hope so anyway!


    We could just buy more storage from SugarCRM, but even if we get to that point, I am continually educating my users about file storage (don't store massive images, do you really need it in CRM, etc.) and also trying to keep my admin eyes on it and come up with nice ways to flush out the old stuff as and when.

    I actually made a Google Apps Script to log the daily storage from Insights into a sheet, then I pull that published report into my admin dash so I can see the storage growth over time and predict when disaster will likely occur. When we hit 80% I spring into action...
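
    The prediction part can be as simple as fitting a straight line through the logged values. This is a minimal sketch in Python rather than Apps Script, assuming the log is a list of (date, percent used) pairs; the function name and the sample figures are made up for illustration.

```python
import math
from datetime import date, timedelta


def forecast_threshold(samples, threshold=100.0):
    """Estimate the date usage reaches `threshold` percent, using a least-squares line.

    `samples` is a list of (date, percent_used) pairs. Returns None when there
    is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    start = samples[0][0]
    xs = [(d - start).days for d, _ in samples]
    ys = [pct for _, pct in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all samples fall on the same day
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:  # flat or shrinking usage never reaches the threshold
        return None
    intercept = mean_y - slope * mean_x
    days_needed = (threshold - intercept) / slope
    return start + timedelta(days=math.ceil(days_needed))


# Made-up log: usage grows by 5 points every 10 days.
log = [(date(2024, 1, 1), 70.0), (date(2024, 1, 11), 75.0), (date(2024, 1, 21), 80.0)]
print(forecast_threshold(log))
```

    A straight line is a crude model: a bulk delete or a burst of uploads will throw it off, so fitting only the most recent few weeks may track reality better.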

    For now I will carry on as I am, with the additional knowledge learned from you all in here. I may tweak my Python script to add the date of creation so I can begin with the older files.

    Many thanks again to you all,
    Luke.