Upload directory contents

Let me preface this post: I don't know anything about this at all. Because I don't know anything, I am hoping my ignorance may somehow luck into outside of the box thinking. 

My customer's file storage is huge. 

We've determined that the Upload directory holds 139 GB. 

It's not really my job to fix it but I am frustrated and want to help. 

Obviously, through my research I've figured out what a GUID is. I've read a million posts on club and elsewhere to see if there is a way to figure out from the GUID what the file actually is. I understand we can set a date range and just have support blanket delete up to a certain date.

Is there any way to use the GUIDs in the directory to determine what the file actually is? For example: 

Let's say you have this: 
00019efe-6622-11e6-aa95-02a6691319f3
Can you use that to backtrack to what this actually was?
When Sugar generates the GUID there must be some criteria it uses to create that right? Can it be reversed so you can tell anything about it?
Second question: This might be even crazier than the first. Not sure. 

Could you somehow move the contents of the upload directory out of Sugar to an external place and then point to that external place without screwing up everything? 
  • Part 1:

    The GUID is tied to the row to which the data belongs. This could be a Note, Document, Document Revision, or the field picture (which is in certain modules, like Users, Accounts, Contacts, Leads, and Prospects). Additionally, it could be tied to any custom module you may have that has the ability to upload a file. 
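Because the file name matches a row, one way to trace a file is to look its GUID up in each candidate table. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in, the table list is an assumption (your instance may store uploads for other modules or custom modules), and it presumes you have direct database access, which a cloud instance may not give you.

```python
import sqlite3

# Hypothetical list of tables whose row id may double as an upload file name.
CANDIDATE_TABLES = ["notes", "document_revisions"]

def find_owner_table(conn, guid, tables=CANDIDATE_TABLES):
    """Return the first table containing a row whose id equals guid, else None."""
    for table in tables:
        # Table names come from the fixed list above, never from user input.
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE id = ? LIMIT 1", (guid,)
        ).fetchone()
        if row is not None:
            return table
    return None

# Tiny in-memory stand-in for the real database, for illustration only.
conn = sqlite3.connect(":memory:")
for table in CANDIDATE_TABLES:
    conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY)")
conn.execute(
    "INSERT INTO notes (id) VALUES (?)",
    ("00019efe-6622-11e6-aa95-02a6691319f3",),
)

print(find_owner_table(conn, "00019efe-6622-11e6-aa95-02a6691319f3"))
```

Against a real database you would swap the connection for your own driver and extend the table list to every module that can hold a file.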

    One of the largest sources of files is the Emails module. If you are archiving emails to Sugar, any attachment, image, etc., that is attached to or in the body of the email gets stored in the uploads folder.

    For any of the above modules, if you delete the record from Sugar, the corresponding upload files will also be deleted. For example: if you delete an Email record, all of the associated files will also be deleted. It might be good practice to use the Archive Records functionality to remove records that are older than a certain date.

    Additionally, we have a scheduler that cleans up emails older than a certain time period. For example, we only keep 3 years' worth of emails in our Sugar. We arbitrarily picked 3 years; it could have been 1 or 10. It was a choice. We determined that people are not going to look in Sugar at an email older than a certain date.

    Part 2:

    Without knowing which of your modules is using so much storage, it is difficult to provide you with an exact solution.

    There are Module Loadable Packages that will utilize AWS S3 to store files; there is an example written by

    There are Module Loadable Packages for Box or Google Drive, MS365, etc. 

    OOTB, Sugar has the ability to use Google Drive and MS365 in the Documents module. 

    Good Luck,

    Jeff

  • Hi  ,

    Sorry it has been a long time since you wrote the reply to   above....

    Just reading through the thread now as I am currently trying to reduce file storage in our cloud instance and wanted to re-ask one of Frank's questions if that is OK.

    Frank asked this below:

     Is there any way to use the GUIDs in the directory to determine what the file actually is? For example: 

     Let's say you have this: 
     00019efe-6622-11e6-aa95-02a6691319f3
     
     Can you use that to backtrack to what this actually was?
     When Sugar generates the GUID there must be some criteria it uses to create that right? Can it be reversed so you can tell anything about it?

     

    ...but this bit specifically was not really answered. The answer you gave was excellent; I just wanted to ask about this specific part.

    The reason I ask is that I have a backup with our Upload folder attached, but there is really no way to find in there, for example, only the files from one specific module.

    If there was some clue in the naming of the GUID that would help with the Find, then it would really help. For example, if the second set of 4 characters was always 11e6 when the file was from a specific module. This would help identify the right/desired files to delete first.
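On the 11e6 idea: that group starts with the digit 1, which in the RFC 4122 layout marks a time-based (version 1) UUID, and the GUIDs quoted in this thread all follow that pattern. Assuming Sugar generated them as standard version-1 UUIDs, the name does not reveal the module, but it does embed a creation timestamp. A minimal sketch using Python's standard `uuid` module (the helper name `guid_created_at` is made up for illustration):

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-nanosecond ticks from this date (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def guid_created_at(guid):
    """Return the embedded UTC timestamp of a version-1 UUID, or None otherwise."""
    parsed = uuid.UUID(guid)
    if parsed.version != 1:
        return None  # random (version 4) or non-standard ids carry no timestamp
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)

print(guid_created_at("00019efe-6622-11e6-aa95-02a6691319f3"))
```

For the GUID in the original question this yields a date in August 2016. That could at least let you sort or filter a backup of the upload folder by age, even without knowing the module.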

    Or is there some way in Postman to search everywhere for the GUID and return the module?

    Hope this makes sense...   I did a massive delete of older files and it hardly made a dent in the FS storage which is at 80%.

    Also, I'm not sure exactly when Sugar changed the upload structure to be split into subfolders, but are these subfolders any clue as to which module the file relates to? (We have custom modules using file fields too.)

    Your advice for older emails is also my next plan of attack - so thank you for that too, from your last reply.

    Best regards,
    Luke.

  • Hi Luke,

    Unfortunately, the directory groupings don't seem to have anything to do with the parent bean of the file:

    "Each uploaded file is now stored in a subdirectory derived from the UUID filename. Existing files will be moved into subdirectories during upgrade. For example, the file 3657325a-bdd6-11eb-9a6c-08002723a3b8 will now be stored in the ./upload/25a/ subdirectory. These UID character locations were selected to ensure even distribution of files across the new subdirectories."

    https://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_13.0/Architecture/Uploads/UploadFile/
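Going only by the single example in that quote, the subdirectory looks like characters 6 to 8 of the GUID (the last three characters of the first group). A small sketch under that assumption, useful for locating a given file in a backup; the helper name and the `upload` root are illustrative, and the rule should be checked against your own upload folder:

```python
from pathlib import PurePosixPath

def upload_path(guid, upload_root="upload"):
    """Build the assumed storage path: <root>/<chars 6-8 of the GUID>/<GUID>."""
    subdir = guid[5:8]  # inferred from the one documented example, not confirmed
    return PurePosixPath(upload_root) / subdir / guid

print(upload_path("3657325a-bdd6-11eb-9a6c-08002723a3b8"))
```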

    It would have been nice if they could have segmented it by parent bean_module and by Year/Month created inside that. That way we could distinguish, say, email attachments from contract documents. I don't care to keep a full history of 12 years of emails, but would want to keep every contract.

    I wonder if the DataArchiver system would also move the uploads to a different directory? I assume that if records are deleted the corresponding document will be deleted too, but when the record is archived, does the document move elsewhere?
    https://support.sugarcrm.com/Documentation/Sugar_Versions/13.0/Ent/Administration_Guide/System/Data_Archiver/

    I will be interested in seeing what others have to say about this.

    FrancescaS

  •  as mentioned the GUID is a unique identifier and is intended to be random. There is nothing that indicates which Module a GUID is associated with.

    As for the upload folder, there are really only two modules that have attachments: Documents and Notes.

    I believe that if a record is marked for deletion, the associated attachments are deleted too.

    For example: if you have an Email with a bunch of attachments (i.e., Notes) and you call $email->mark_deleted($id), the associated notes (attachments) will be marked as deleted and all of the associated files will be removed from the filesystem.
