Best practices in data cleansing when you have a large CRM

A long-time customer asked the other day how best to manage the "size of their CRM". They've been using Sugar for ages, and had become concerned about how large it had grown. At the same time, they wanted to make sure they were leveraging all of this data effectively to deliver great CX.

The following is a paraphrase of what I shared, but I'm really keen to hear what everyone else's take/experience on this is too - there are always better ideas out there.

In general, I see a handful of related topics here, which are sometimes in conflict - so there's not necessarily a one-size-fits-all approach.

  1. Potential value of insight increases with longer data sets

    Now more than ever, the more data you have, the more you can do with it. Take, for example, intelligent lead prioritisation in our platform. For folks like this customer who have been using Sugar for a long time, our AI capabilities can leverage the audit log to start making predictions faster, thanks to what we already know. Even manual scoring models can perform better with a larger data set. To access this value in aggregate, dashboards that do time-based comparisons (this year vs last year, quarter on quarter, month on month) are critical, so that you can determine whether key business metrics are trending in the right direction. In particular, if your visualisations are pointing out trends, they can also help identify deviations, i.e. when something is going wrong. Our Advanced Forecasting capability is an example of leveraging large amounts of historical data to help businesses make better decisions about where they are going.
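As a rough illustration of the time-based comparison idea, here is a minimal Python sketch that counts records per month and compares one month against the same month a year earlier. It works on plain dates (for example, an export of conversion dates); the function names and sample data are made up for illustration and are not part of Sugar.

```python
from collections import Counter
from datetime import date


def count_by_month(dates):
    """Count how many records fall in each (year, month)."""
    return Counter((d.year, d.month) for d in dates)


def year_over_year(counts, year, month):
    """Return (this_year, last_year, pct_change) for one calendar month.

    pct_change is None when there is no prior-year baseline to compare against.
    """
    current = counts.get((year, month), 0)
    previous = counts.get((year - 1, month), 0)
    change = None if previous == 0 else (current - previous) / previous * 100
    return current, previous, change


# Hypothetical lead conversion dates
converted = [
    date(2023, 3, 5), date(2023, 3, 20),
    date(2024, 3, 2), date(2024, 3, 11), date(2024, 3, 28),
]
counts = count_by_month(converted)
print(year_over_year(counts, 2024, 3))  # (3, 2, 50.0)
```

The longer your history, the more months have a prior-year baseline, which is exactly why older data keeps earning its place.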

  2. Risk of exceeding storage allocations as part of storing all of this data

    Many CRM vendors use size as part of their licensing model. Usually, it doesn't really matter if you have millions of rows of data - it is more about how much video/imagery you upload, often via your email integration. We detail Sugar's approach to this in this blog post: If you find you're approaching a size limit, first understand whether it's more about files or data. More often than not, it's about files. Just like when your phone is running out of space, it's important to focus on cleaning up the things that matter - deleting another 1000 old lead records is unlikely to make a dent in your storage usage.

  3. Speed of screen loads

    No matter how fast the tech, if there is more data to load, some screens will invariably take longer to load. As with storage, it is important to focus on what matters. If you have a million Lead records, there's no screen in Sugar which tries to load them all at once. We use pagination to break things up and ensure that lists load in the flashiest of flashes. Again - if you're about to delete a bunch of old Lead records thinking it's going to make your list view load faster, it probably won't.

    However, if you have a report on your dashboard which shows the number of lead conversions every month, and there's no date filter on that report, it is only going to get slower over time. It's also going to be distracting sometimes - do you really need to see what happened in November of 2014? If there are other similar visualisations on the same dashboard, then that whole page is going to get slower over time too. It is worth reviewing the filters on your reports first, to see if there are sensible boundary points you can set instead of selecting everything.
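To make the "sensible boundary" idea concrete, here is a small Python sketch of a rolling date window applied to a list of records. It is only an illustration of the filter logic, not how Sugar reports are built; the `converted_on` field name and the sample records are assumptions.

```python
from datetime import date, timedelta


def rolling_window(records, today, days=365, date_field="converted_on"):
    """Keep only records whose date falls within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [r for r in records if r[date_field] >= cutoff]


# Hypothetical conversion records
conversions = [
    {"name": "Old lead", "converted_on": date(2014, 11, 3)},
    {"name": "Recent lead", "converted_on": date(2024, 4, 18)},
]
recent = rolling_window(conversions, today=date(2024, 6, 1))
print([r["name"] for r in recent])  # ['Recent lead']
```

With a window like this, the amount of data the report has to chew through stays roughly constant, rather than growing every month.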

  4. Data retention policies (i.e. compliance)

    In many regulated industries, and even some public sector environments, there are rules around how long you can hold personally identifiable information, or in what scenarios you're allowed to hold onto it for longer. I am not a lawyer, but I know that GDPR put this front and centre for many organisations that do business with EU citizens. A client I used to work with had a rule that said they could hold onto someone's contact details provided they could prove (when audited) that they had had legitimate business contact with that person within the last 24 months. What constituted legitimate business contact? Well... that's up for debate, but Sugar needed to support this organisation's compliance with this policy. It did this through a combination of Business Process Management features and SugarLogic (the built-in formula engine in Studio).
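The date arithmetic behind a rule like that is simple enough to sketch. The Python below is a minimal illustration of a "contact within the last 24 months" check, not the client's actual Business Process Management or SugarLogic setup; the function names are hypothetical, and deciding which interactions count as legitimate contact is left entirely to you.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped so that e.g. 29 Feb maps to 28 Feb
    in a non-leap year.
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def may_retain(last_contact, today, months=24):
    """True if the last legitimate contact falls inside the retention window."""
    if last_contact is None:
        return False  # no recorded contact, so nothing to justify retention
    return last_contact >= months_before(today, months)


today = date(2024, 6, 15)
print(may_retain(date(2023, 1, 10), today))  # True
print(may_retain(date(2021, 12, 1), today))  # False
```

Records that fail the check are candidates for review or anonymisation, depending on what your policy actually requires.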

  5. Keeping relevant data through archiving

    Sometimes, end users are clicking around the CRM and run into records from forever ago. The question gets asked: "Why do we even have these records? Surely they could be archived?"

    It's true, you can archive old records if you feel that their presence is distracting to current users. But before I go into the 'how' of that, here are some things to consider (reasons to keep this old data):
     - really old leads/opportunities? This data can be very helpful to both sales and marketing. If a lead was looking to buy something from you on a subscription basis, the date they became lost/dead gives you a proxy for their renewal date with your competition. You can use Business Process Management to auto-create follow-up calls/tasks, or place the individual into a nurture stream in Sugar Market. Either way, old leads have a way of coming back to you.
     - really old cases related to bugs/features that aren't even relevant anymore? This kind of data can still be helpful to both R&D/product and customer experience teams in better understanding the history a customer has had with you, or in connecting new challenges with old ones. Even if the last time a problem occurred was >5 years ago, in a customer's eyes, it can still feel recent.

    Let's suppose you do have some data which can't be justified on the above basis, and it really is posing a challenge to end users. An example of this sometimes occurs when a user clicks into the 'Tasks' module, sees thousands of old Tasks there and asks "where's the reminder I set for myself? I can't find it!". This challenge of findability is a common driver for archiving data. Before archiving though, consider reviewing users' default dashboards, default list views etc. For example, nobody should click into the Tasks module and see the full list of all tasks. They likely should just see 'My Tasks', or better yet 'My Open Tasks', as a starting filter. Using this approach, users are confronted less with 'all' of the data in the CRM, and can just focus on what is important to them.
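For what it's worth, the logic behind a 'My Open Tasks' starting filter is tiny. The Python sketch below just illustrates the idea on a list of records; in Sugar you would set this up as a list view filter rather than in code, and the field names and status values here are assumptions.

```python
# Assumed set of "open" statuses - adjust to match your own Tasks setup
OPEN_STATUSES = {"Not Started", "In Progress", "Pending Input"}


def my_open_tasks(tasks, user_id):
    """Return only the open tasks assigned to the given user."""
    return [
        t for t in tasks
        if t["assigned_user_id"] == user_id and t["status"] in OPEN_STATUSES
    ]


# Hypothetical task records
tasks = [
    {"name": "Call back Acme", "assigned_user_id": "u1", "status": "Not Started"},
    {"name": "Send quote (2016)", "assigned_user_id": "u1", "status": "Completed"},
    {"name": "Renewal check-in", "assigned_user_id": "u2", "status": "In Progress"},
]
print([t["name"] for t in my_open_tasks(tasks, "u1")])  # ['Call back Acme']
```

Nothing has been deleted or archived here, yet the user only sees the one task they were looking for.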

OK, so enough reasons to try and talk you out of archiving. If you do want to do this, there are a couple of options available:

  1. Use the Data Archiver tool. Pretty much does what it says on the tin. You can pick criteria for records to be deleted, or simply moved into a hidden table.
  2. If you want to use Data Archiver, but you don't feel the filter is sophisticated enough for your needs, you can use Business Process Management and/or SugarLogic (the built-in formula engine in Studio) to handle far more complex scenarios and set a field like "archive_this_record_c" to true. The Data Archiver can then be based off that field.
  3. If you don't like Data Archiver, you could use Business Process Management to delete or archive the record. A good practice I've used here is to set the 'Team' of a record to a new team called "Archive", and put no end users into this new Team. That way, nothing is truly permanent about the archiving process.
  4. Custom logic - always a backup option, but never the recommended direction.
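To illustrate options 2 and 3, here is a minimal Python sketch of the two ideas: working out an archive flag from more complex criteria, and "soft archiving" a record by moving it to an Archive team. This is plain Python over dictionaries, not Sugar code or a Sugar API. Only "archive_this_record_c" and the "Archive" team name come from the options above; the other field names, the criteria and the 730-day threshold are assumptions for the sake of the example.

```python
from datetime import date

ARCHIVE_TEAM = "Archive"  # a team with no end users in it


def should_archive(record, today, max_idle_days=730):
    """Example criteria: closed, untouched for a long time, nothing open against it."""
    idle_days = (today - record["date_modified"]).days
    return (
        record["status"] == "Closed"
        and idle_days > max_idle_days
        and not record["has_open_activities"]
    )


def flag_for_archive(record, today):
    """Return a copy with the flag set, so a simple filter can pick it up later."""
    flagged = dict(record)
    flagged["archive_this_record_c"] = should_archive(record, today)
    return flagged


def soft_archive(record):
    """Return a copy moved to the Archive team, remembering the old team."""
    archived = dict(record)
    archived["previous_team"] = record["team"]
    archived["team"] = ARCHIVE_TEAM
    return archived


# Hypothetical case record
case = {
    "name": "Login bug in v6",
    "status": "Closed",
    "date_modified": date(2020, 1, 15),
    "has_open_activities": False,
    "team": "Support",
}
flagged = flag_for_archive(case, today=date(2024, 6, 1))
if flagged["archive_this_record_c"]:
    flagged = soft_archive(flagged)
print(flagged["team"], flagged["previous_team"])  # Archive Support
```

Keeping the previous team around is what makes the process reversible: moving the record back is just another team change.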

None of these thoughts/solutions are intended to be exhaustive - they are hopefully just a starting point for discussion! I am super keen on hearing what other things you've considered - please do share.