Reducing file storage consumption

I'd like to share a process I use in storage reduction scenarios involving the file system.

Sugar stores files such as email and notes attachments, document revisions, embedded images and files in Knowledge Base articles, user pictures, and even custom module attachments, if applicable.

All of the SELECT database queries I provide for data gathering can be used in the Custom Queries section of the Advanced Reports module.

Alternatives

SugarCRM's team can run a de-duplication script for your file system. I’ve seen numerous duplicates occur due to files included in email signatures, such as business logos or social media icons. Imagine your users have archived 10,000 emails, all containing the same image. In that scenario, there are 10,000 copies of that file stored in your file system, consuming valuable storage space.

Upon request through a support case, the support team can run the de-duplication script. It will identify the 10,000 copies, keep one, and map all references to that single copy, freeing up 99.99% of the storage space consumed by that one image.

Additionally, wherever Data Archiver provides the ability to filter records to accurately target those you wish to hard delete, I recommend using it. If you delete notes, emails, or documents using Data Archiver, their attachments will also be deleted. However, at the time of writing this post, Data Archiver cannot target email attachments alone. If you want to delete attachments but keep the emails, you will need to use a process like the one described below.

Now, onto the steps for identifying and deleting files to reduce storage consumption.


Step 1: Determine types of files, i.e., attachments or documents?

The first step is to determine what types of files are consuming the most space. The two most common culprits are email/notes attachments or document revisions.

For this post, I won’t focus on Knowledge Base articles, user pictures, or custom module attachments, as these are less common. Instead, I'll concentrate on the scenarios you're most likely to encounter.

Email and notes attachments are both recorded as attachments in the same database table: notes.

The following queries can indicate how much of your file system storage is being used by attachments or document revisions:

Fullscreen
1
2
SELECT COUNT(id), SUM(file_size) FROM notes WHERE file_size > 0;
SELECT COUNT(id), SUM(file_size) FROM document_revisions WHERE file_size > 0;
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The results are in bytes, so adding commas (or periods) will help you visualize the file sizes in KB, MB, or GB.

In most cases, the majority of the size is recorded in the notes module. For simplicity, I will focus on notes examples below, but similar queries can be used for document_revisions by adjusting the module name in the queries.


Step 2: Identify when volume of files were created.

In most storage cases I’ve worked on, decisions about what to delete are filtered by date range.

If you want to see a breakdown of how much content was created each year, you can use this query:

Fullscreen
1
SELECT SUBSTRING(date_entered, 1, 4) a, COUNT(id), SUM(file_size) FROM notes WHERE file_size > 0 GROUP BY a ORDER BY a ASC;
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

To break it down by month in 2024:

Fullscreen
1
SELECT SUBSTRING(date_entered, 1, 7) a, COUNT(id), SUM(file_size) FROM notes WHERE SUBSTRING(date_entered, 1, 4) = '2024' AND file_size > 0 GROUP BY a ORDER BY a ASC;
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

And to break it down by day in August 2024:

Fullscreen
1
SELECT SUBSTRING(date_entered, 1, 10) a, COUNT(id), SUM(file_size) FROM notes WHERE SUBSTRING(date_entered, 1, 7) = '2024-08' AND file_size > 0 GROUP BY a ORDER BY a ASC;
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


Step 3: Gather a list of file names to delete

When a file is associated with a note or document revision, it is usually named after the ID number of the corresponding database row.

Let’s say I’ve decided to delete the attachments for all records created in August 2024. To get the IDs of the relevant rows, thereby listing the file names I want to target for deletion, I can run this query:

Fullscreen
1
SELECT id FROM notes WHERE SUBSTRING(date_entered, 1, 7) = '2024-08' AND file_size > 0;
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Export the results into a CSV file and open it in Excel to perform the following operations.

Note: There are other columns in the notes module, like attachment_flag, that you can filter on to distinguish note records from attachments. While this can be helpful, focusing on file_size is typically sufficient when the goal is storage reduction.


Step 4: Identify the parent folder containing each file.

All files associated with records in Sugar are stored in the upload/ directory, but they’re split across up to 4,095 subdirectories named in hexadecimal numbers, 000 through fff.

Each file’s subdirectory is predictable based on its ID. For example, the file named f6376262-8663-11ef-99ae-0684a87b501c will be stored at: 

Fullscreen
1
upload/262/f6376262-8663-11ef-99ae-0684a87b501c
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The name of the subdirectory is determined by the sixth, seventh, and eighth characters of the file name.

In Excel, with the ID numbers in column A, starting at row 1, you can use this function in column B:

Fullscreen
1
=MID(A1,6,3)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Copying this formula down the column will automatically adjust the reference for each row. You can quickly copy it down by double-clicking the bottom-right corner of cell B1.


Step 5: Prepare the list of delete commands.

With columns A and B filled in, generate the necessary delete command for each row by entering this function in C1:

Fullscreen
1
=CONCATENATE("rm -f ~/upload/",B1,"/",A1)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Each row should look something like this:

Fullscreen
1
rm -f ~/upload/262/f6376262-8663-11ef-99ae-0684a87b501c
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Copy the values from column C into a text file, and add #!/bin/bash as the first line of the file. Save it as a .sh file.


Step 6: Upload, move, and run your .sh file into the file system.

If you have direct file system access, you may already have a reliable method to copy files into the instance’s file system. However, I’ll share a trick I find handy.

Open the Notes module in the application, create a new note, attach your .sh file, and save it.

In the database, locate the new note's ID. Assuming the .sh file is named f6376262-8663-11ef-99ae-0684a87b501c, navigate to upload/262/ and run:

Fullscreen
1
cp f6376262-8663-11ef-99ae-0684a87b501c ../../ExampleDeletionList.sh
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

With the file in the root directory and the necessary permissions set, run:

Fullscreen
1
bash ./filename.sh
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The script will execute, iterating through each line to remove the targeted files.


Conclusion

I’ve provided a detailed, step-by-step guide for those with On-Site instances and file system access to complete the described process. If your instance is hosted on SugarCloud, you will need support to complete the final steps of moving and running your .sh file, but you can still follow most of the steps up to that point.

I genuinely hope this has been helpful!


Update January 31, 2025: More useful information and instructions about storage may be found on this post:  SugarBPM and Database Storage: Best Practices  

Parents Reply
  •  ,

    I was thinking about  's question about identifying orphaned files. If the file structure was arranged by module it would be much easier to determine if files are orphaned.

    Also, I never got an answer about what happens when you archive :

    What happens if we turn on auto-archiver (moving records to _archive table, not deleting them), will the attachments also move to an "uploads archive"?

    Thanks,

    Francesca

Children
  • Hi  ,

    I am so sorry I didn't get you a response when you originally asked. Thank you for drawing my attention to that.

    I have tested in 25.1. The attachment file is hard deleted when using the Data Archiver to move the record to the _archive table. I have filed Issue 97246 to represent the data quality concern that creates.

    I have filed Idea 97247 to represent the idea of organizing files in a directory structure that clarifies which module each file is attached to.

    Regards,
    Patrick McQueen

  • Much appreciated!
    I submitted two cases asking to have them linked to those ideas.
    FYI your second link here 97246 but it's 97247 (correct in the actual link  URL Slight smile ) and the file structure is no longer


    e.g. upload/1b8/732481b8-165d-11f0-9125-f709e98c1d43

    as stated in the Case but rather with the weird numbered directories and encoded file names that make it impossible to identify the records without the use of the Sugar decoder functions. Slight smile

    Francesca

  • Hi  ,

    Thanks! I have fixed the number in the response to show accurately.

    I want to make sure we're aligned about the file system directory path for attachments discussed in this post and in the linked Idea.

    The file path for an attachment is:
    upload/1b8/732481b8-165d-11f0-9125-f709e98c1d43

    To break that down:

    • The parent directory in the root of the instance that holds all attachments is upload/
    • The subdirectory under it ( 1b8/ ) corresponds to the three digits just to the left of the first, or leftmost dash in the file's name.
    • The file name is not the name of the file as seen in the application, but is the ID number of the row in the database representing the attachment.

    The row in the database might be a background note representing the attachment to a different note row, or an email. Or it could correspond to a row in document_revisions, or to a profile pic field value in users.

    In my experience across multiple versions, including my test today in 25.1, this is the file system's directory structure for attachments. This structure is the principle upon which the steps of this post are based.

    Does your experience differ from that?

    Regards,

  • Sorry, as always I'm overthinking and mixing things... it's not the file name that is encoded but the file itself - mixing things up at this late hour of the day...

    Improving the security of files uploaded to Sugar

    For some reason I was remembering that the file names themselves were also somehow encoded.

    Apologies! Will try to be more careful with my comments. Disappointed

    Francesca

  • Hi  ,

    No apology needed. I completely understand. I just wanted to make sure we were aligned on this.

    Regards,
    Patrick McQueen