Upload directory contents

Let me preface this post: I don't know anything about this at all. Because I don't know anything, I am hoping my ignorance may somehow luck into outside of the box thinking. 

My customer's file storage is huge. 

We've determined that the Upload directory has 139 GB in it. 

It's not really my job to fix it but I am frustrated and want to help. 

Obviously, through my research I've figured out what a GUID is. I've read a million posts on club and elsewhere to see if there is a way to figure out from the GUID what the file actually is. I understand we can set a date range and just have support blanket delete up to a certain date.

Is there any way to use the GUIDs in the directory to determine what the file actually is? For example: 

Let's say you have this: 
00019efe-6622-11e6-aa95-02a6691319f3
Can you use that to backtrack to what this actually was?
When Sugar generates the GUID there must be some criteria it uses to create that right? Can it be reversed so you can tell anything about it?
Second question: This might be even crazier than the first. Not sure. 

Could you somehow move the contents of the upload directory out of Sugar to an external place and then point to that external place without screwing up everything? 
  • Part 1:

    The GUID is tied to the row to which the data belongs. This could be a Note, Document, Document Revision, or the picture field (which is in certain modules, like Users, Accounts, Contacts, Leads, and Prospects). Additionally, it could be tied to any custom module you may have that has the ability to upload a file. 

    One of the largest sources of "files" is the Emails module. If you are archiving emails to Sugar, any attachment, image, etc., that is attached to or in the body of the email gets stored in the uploads folder.

    For any of the above modules, if you delete the record from Sugar, the corresponding upload files are also deleted. For example: if you delete an Email record, all of the associated files will also be deleted. It might be a good practice to use the Archive Records functionality to remove records that are older than a certain date.

    Additionally, we have a scheduler that will clean up emails older than a certain time period. For example, we only keep 3 years' worth of emails in our Sugar. We arbitrarily picked 3 years; it could have been 1, or 10. It was a choice. We determined that people are not going to look in Sugar at an email older than a certain date.
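
If the file name really is the ID of the owning row, as described above, then one way to work out which module a file belongs to is to export the record IDs per module (from a report or a database query) and look each file name up in those lists. The following is only a sketch under that assumption: `classify_upload_files` and the `ids_by_module` mapping are hypothetical names, not part of Sugar, and files that match no exported ID are reported as `None`.

```python
from pathlib import Path


def classify_upload_files(upload_dir, ids_by_module):
    """Map each file under upload_dir to the module whose ID list contains its name.

    ids_by_module is a hypothetical mapping such as {"Notes": [...ids...]},
    built from whatever export of record IDs is available.
    """
    # Invert {module: ids} into {id: module} so each lookup is constant time.
    module_by_id = {
        record_id.lower(): module
        for module, ids in ids_by_module.items()
        for record_id in ids
    }
    result = {}
    # rglob also descends into any subdirectories of the upload folder.
    for path in Path(upload_dir).rglob("*"):
        if path.is_file():
            # No matching record means None, i.e. a candidate orphan file.
            result[path.name] = module_by_id.get(path.name.lower())
    return result
```

Running this against a local backup of the upload folder would give a per-file module label, which can then be combined with file sizes to see which module is using the space.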

    Part 2:

    Without knowing which of your modules is using so much storage, it is difficult to provide you with an exact solution.

    There are Module Loadable Packages that will utilize AWS S3 to store files; there is an example written by

    There are Module Loadable Packages for Box or Google Drive, MS365, etc. 

    OOTB, Sugar has the ability to use Google Drive and MS365 in the Documents module. 

    Good Luck,

    Jeff

  • We are an on-site customer and my sysadmins simply mounted a directory in the sugar root for the upload directory so all our "upload" files are not physically on the same server as sugar but in their own file system.

    I also started looking into the possibility of dividing the Upload directory by modules and/or dates.

    We have multiple "documents"-type modules, from Case Attachments to Contracts, and everything ends up in this big bucket called "upload", together with things like email attachments (as Jeff explained).
    The directory is so big that the check inbound email scheduler, for example, keeps recording errors because the stat cannot be executed.

    Sadly I never got far enough with splitting directories to make it viable. I worked with Angel Magaña for a while back in 2015 (I am some of the "Unknown" comments on the blog) trying to test various things but had to give up to tend to other issues and never really got back to it. 

    https://cheleguanaco.blogspot.com/2015/05/sugarcrm-customization-custom-upload.html

    If you have better luck, let me know! :)

    FrancescaS

  • And do not forget, there is a sugar_config setting in config.php

    'upload_dir' => 'upload/',

    Harald Kuske
    Principal Solution Architect – Professional Services, EMEA
    hkuske@sugarcrm.com
    SugarCRM Deutschland GmbH

  • Hi  ,

    Sorry it has been a long time since you wrote the reply to   above....

    Just reading through the thread now as I am currently trying to reduce file storage in our cloud instance and wanted to re-ask one of Frank's questions if that is OK.

    Frank asked this below:

     Is there any way to use the GUIDs in the directory to determine what the file actually is. For example: 

     Let's say you have this: 
     00019efe-6622-11e6-aa95-02a6691319f3
     
     Can you use that to backtrack to what this actually was?
     When Sugar generates the GUID there must be some criteria it uses to create that right? Can it be reversed so you can tell anything about it?

     

    ...but this bit specifically was not really answered. Even though the answer you gave was excellent, I just wanted to ask about this specific part.

    The reason I ask is that I have a backup with our Upload folder attached, but there is no real way to find in there, for example, only the files from one specific module.

    If there was some clue in the naming of the GUID that would help with the Find, it would really help. For example, if the second set of 4 characters was always 11e6 when the file was from a specific module. This would help identify the right files to delete first.

    Or is there some way in Postman to search everywhere for the GUID, and return the module?

    Hope this makes sense...   I did a massive delete of older files and it hardly made a dent in the FS storage which is at 80%.

    Also not sure exactly when sugar changed the upload structure to be split into sub folders - but are these sub folders any clue as to what module the file relates?  (we have custom modules too using file fields)

    Your advice for older emails is also my next plan of attack - so thank you for that too, from your last reply.

    Best regards,
    Luke.

  • Hi Luke,

    unfortunately the directory groupings don't seem to have anything to do with the parent bean of the file:

    "Each uploaded file is now stored in a subdirectory derived from the UUID filename. Existing files will be moved into subdirectories during upgrade. For example, the file 3657325a-bdd6-11eb-9a6c-08002723a3b8 will now be stored in the ./upload/25a/ subdirectory. These UID character locations were selected to ensure even distribution of files across the new subdirectories."

    https://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_13.0/Architecture/Uploads/UploadFile/
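
To locate a given file in a backup, the subdirectory can be computed from the GUID. The sketch below assumes, from the single documented example above (3657325a-... stored under 25a), that the subdirectory is the last three characters of the first GUID group. That rule is inferred from the example rather than taken from Sugar's code, and `upload_subdir` is a hypothetical helper name.

```python
def upload_subdir(guid):
    """Return the assumed upload subdirectory for a GUID-named file.

    Assumption: the subdirectory is the last three characters of the
    first dash-separated group, as in the documented example.
    """
    first_group = guid.split("-")[0]
    return first_group[-3:]


def upload_path(guid, upload_dir="upload"):
    """Build the relative path where the file would be expected."""
    return f"{upload_dir}/{upload_subdir(guid)}/{guid}"
```

For the documented example, `upload_path("3657325a-bdd6-11eb-9a6c-08002723a3b8")` gives `upload/25a/3657325a-bdd6-11eb-9a6c-08002723a3b8`. As the quoted text says, the grouping exists to spread files evenly, so it carries no information about the module.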

    It would have been nice if they could have segmented it by parent bean_module and by Year/Month created inside that. That way we could distinguish, say, email attachments from contract documents. I don't care to keep a full history of 12 years of emails, but would want to keep every contract.

    I wonder if the DataArchiver system would also move the uploads to a different directory? I assume that if records are deleted the corresponding document will be deleted too, but when the record is archived, does the document move elsewhere?
    https://support.sugarcrm.com/Documentation/Sugar_Versions/13.0/Ent/Administration_Guide/System/Data_Archiver/

    I will be interested in seeing what others have to say about this.

    FrancescaS

  •  as mentioned the GUID is a unique identifier and is intended to be random. There is nothing that indicates which Module a GUID is associated with.

    As for the upload folder, there are really only two modules that have attachments. Documents and Notes.

     I believe if a record is marked for deleting the associated attachments are deleted. 

    For example: say you have an Email with a bunch of attachments (i.e., Notes). If you call $email->mark_deleted($id), the associated notes (attachments) will be marked as deleted and all of the associated files will be removed from the filesystem.
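
One caveat on what a GUID can reveal: the two GUIDs quoted in this thread have the layout of RFC 4122 version 1 UUIDs (the third group starts with 1), and that layout embeds a creation timestamp. It still says nothing about the module, but it may help sort files by age. The sketch below uses Python's standard `uuid` module; whether every ID in a given Sugar instance is generated this way is an assumption, so the hypothetical helper `guid_created_at` returns `None` for anything that is not version 1.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def guid_created_at(guid):
    """Return the UTC timestamp embedded in a version 1 UUID, else None.

    Raises ValueError if the string is not a well-formed UUID at all.
    """
    parsed = uuid.UUID(guid)
    if parsed.version != 1:
        # Random (version 4) or non-RFC IDs carry no timestamp.
        return None
    # parsed.time is in 100 ns units; integer-divide to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)
```

For the example from the original question, `guid_created_at("00019efe-6622-11e6-aa95-02a6691319f3")` decodes to a date in August 2016. The timestamp reflects when the ID was generated, which may not match the record's date fields for imported or migrated data.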

  • Don't forget the custom modules of type "file" which store their files in the upload directory also. So, you have

    • notes attachments
    • document revisions (the documents files)
    • custom files objects 

    When deleting files, you should take all of these into account.

  • Hi  ,

    In addition to the answers from others, there is also the EmbeddedFiles module that can contribute to the upload directory.

    If you store a lot of emails in Sugar, there is a strong likelihood you have a lot of files duplicated in your upload directory. Our Upsert Deduplicate plug-in has an optional add-on to deduplicate file attachments in your instance. Let me know if you'd like an in-depth demo of the plug-in!

    Chris 

  • Thank you  ,  ,  and  ...    Really appreciate the replies above - what a great resource this SugarClub is and for an admin like me to be able to tap into the knowledge and experience of awesome folk like you guys is fantastic, thank you.

    OK, all understood.  

    What I did to start with was use a Python script to give me a CSV of every file in the Upload folder (from the downloaded backup, locally) that has a matching checksum, ordered by size descending.

    If the checksum matches another file then it is probably a file that is duplicated, uploaded more than once.
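
A minimal sketch of the kind of script described above, not the actual script: it hashes every file under a local copy of the upload folder, keeps only checksums that occur more than once, and writes a CSV with the largest files first. The function names, the SHA-256 choice, and the CSV columns are assumptions made for illustration.

```python
import csv
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(upload_dir):
    """Return (size_bytes, checksum, path) rows for files sharing a checksum."""
    by_checksum = defaultdict(list)
    for path in Path(upload_dir).rglob("*"):
        if path.is_file():
            by_checksum[file_checksum(path)].append(path)
    rows = [
        (path.stat().st_size, checksum, str(path))
        for checksum, paths in by_checksum.items()
        if len(paths) > 1  # keep only checksums seen more than once
        for path in paths
    ]
    # Largest files first; equal sizes are grouped by checksum, then path.
    rows.sort(key=lambda row: (-row[0], row[1], row[2]))
    return rows


def write_report(rows, csv_path):
    """Write the duplicate rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["size_bytes", "sha256", "path"])
        writer.writerows(rows)
```

Usage would be along the lines of `write_report(find_duplicates("backup/upload"), "duplicates.csv")`, with both paths being placeholders. Hashing a very large folder takes a while, so grouping by file size first and hashing only sizes that repeat would be a reasonable refinement.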

    But to check them all is where I get stuck in a very manual process.   I am working on the larger files first.

    Previously (before the backup) I used reports to find files with no parent, for example, then generated a list of IDs and used Postman to delete those records. (One issue here is that our custom modules with a file field do not have a file_size field, so the report is less useful, though date created is still worthwhile.)


      the plugin looks fantastic but the cost per user (on my current user count, so this is very approximate) works out at almost the same as a Sugar licence per month.   There are a handful of things like this which I wish SugarCRM would just acquire and build into the product. It does look great, but I can't imagine getting that cost signed off.   On my salary too it would equate to an undisclosed ;-/ number of weeks working on storage, which is unlikely. Well I hope so anyway!


    We could just buy more storage from SugarCRM, but even if we get to that point I am continually educating my users about file storage (don't store massive images, do you really need it in CRM, etc.) and also trying to keep my admin eyes on it and come up with nice ways to flush out the old stuff as and when.

    I actually made a Google Apps Script to log the storage from Insights into a sheet daily, then I pull that published report into my admin dash, so I can see the storage growth over time and predict when disaster will likely occur.   When we hit 80% I spring into action...  
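
The prediction step can be as simple as fitting a straight line to the daily readings. The sketch below is one way to do that in plain Python, assuming one usage percentage per day in chronological order. `days_until_threshold` is a hypothetical helper, and a linear trend is an assumption that will not hold if usage grows in bursts.

```python
def days_until_threshold(daily_usage_pct, threshold_pct=80.0):
    """Estimate days from the last reading until usage reaches threshold_pct.

    Fits a least-squares line to one reading per day. Returns None when
    there are fewer than two readings or usage is not growing.
    """
    n = len(daily_usage_pct)
    if n < 2:
        return None
    # Day indices are 0..n-1, so their mean is (n - 1) / 2.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage_pct))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_pct - intercept) / slope
    # Measure from the most recent reading; 0.0 means already at or past it.
    return max(0.0, crossing_day - (n - 1))
```

With readings of 70, 71, 72 and 73 percent, the fitted growth is one point per day, so the 80% mark is about seven days after the last reading.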

    For now I will carry on as I am with the additional knowledge learned from you all in here.  I will tweak my Python script to maybe add in the date of creation, to begin with the older files.

    Many thanks again to you all,
    Luke.