Cleaning the data that matters…and not all data matters!


In my previous post, I alluded to a list of 5 concepts that make data cleansing a bit easier (not fun, not easy-peasy, but easier). In this post, I am going to expand on the concepts of “Know your data” and “Classify your data”.

It's about half the battle

GI Joe talks about knowing

But before we get into the methodology and the doing, let’s talk about the tools used. We are actually only using two tools to build out the functionality found within this post: reports and formulas. However, because the methodologies discussed below are different from most organizations’ approach to cleaning data (Ocean…Boiling), there will be work on you to get folks bought into the idea of not just trying to clean everything. So, I guess if you want to get technical, a third tool is the soft grey matter inside your noggin!

First things first. To help me “Know” and “Classify” my data, I am going to write a report that has two bucket fields, “Customer” and “Pipeline”. The bucket fields look at two custom fields on the account, both rollups: one counts the number of booked opportunities and the other counts the number of open opportunities. These are my two primary classifications because I am going to use a combination of them to score the value of an account to my company.

1)      Non Customer, No Pipeline (Least Valuable)

2)      Non Customer, Pipeline

3)      Customer, No Pipeline

4)      Customer, Pipeline (Most Valuable)
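If it helps to see the four tiers as logic rather than as report buckets, here is a minimal sketch in Python. It is not Salesforce code; the function name and the two count arguments are made up for illustration, and they stand in for the two rollup fields (booked opportunities and open opportunities).

```python
def classify_account(booked_opportunities: int, open_opportunities: int) -> tuple:
    """Return (customer bucket, pipeline bucket, value rank from 1 to 4).

    Hypothetical helper: the two arguments stand in for the rollup
    counts of booked and open opportunities on the account.
    """
    is_customer = booked_opportunities > 0
    has_pipeline = open_opportunities > 0

    customer = "Customer" if is_customer else "Non Customer"
    pipeline = "Pipeline" if has_pipeline else "No Pipeline"

    # Being a customer outweighs having pipeline, so the ranks follow
    # the list above: 1 = least valuable, 4 = most valuable.
    rank = 1 + (2 if is_customer else 0) + (1 if has_pipeline else 0)
    return customer, pipeline, rank


if __name__ == "__main__":
    print(classify_account(0, 0))  # Non Customer, No Pipeline
    print(classify_account(5, 2))  # Customer, Pipeline
```

In the org itself, the two bucket fields on the report do this same job; the sketch is only a way to sanity check the ordering.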

My fictional org, “Kramerica”, wants all 481k of its accounts cleaned. Before jumping in and just starting to cleanse, I set up a report that breaks down accounts based on past purchases and pipeline. Just by using two bucket fields, I can see that 14,000 accounts (about 3%) are high value (Customer with Pipeline), 13,000 (3%) are medium value (Non Customer with Pipeline) and 54,000 (11%) are also medium value (Customer, No Pipeline). I have just reduced the pool of accounts that should be cleansed by roughly 83%.
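For anyone who wants to check the math, the reduction can be worked out in a few lines of Python. The counts are the Kramerica figures from this post, with the 54,000 bucket labelled as it is in the recap further down; the variable names are just for illustration.

```python
TOTAL_ACCOUNTS = 481_000

# Counts per bucket, taken from the Kramerica report in this post.
buckets = {
    "Customer / Pipeline": 14_000,
    "Non Customer / Pipeline": 13_000,
    "Customer / No Pipeline": 54_000,
}

valuable = sum(buckets.values())       # accounts worth cleaning
share = valuable / TOTAL_ACCOUNTS      # fraction of all accounts
reduction = 1 - share                  # fraction we get to skip for now

if __name__ == "__main__":
    for name, count in buckets.items():
        print(f"{name}: {count:,} ({count / TOTAL_ACCOUNTS:.0%})")
    print(f"Valuable accounts: {valuable:,} ({share:.1%})")
    print(f"Reduction in accounts to clean: {reduction:.1%}")
```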

Numbers don't lie

Dry those eyes, it is not as bad as it seems

Unfortunately, there is still a number that is not very friendly standing between us and Margaritaville.

Margaritaville is real, Google Maps told me!

Which is just outside of Dallas apparently.

So, we are going to take things up a notch and write a set of formulas that will score the data that is entered on our account records. The folks in charge of data management (and that might be you) decided that Address, Phone and Website were most important. Yeah, I didn’t put state / country, but that is because of the change making it a picklist field, and we will just assume Kramerica is using the picklists. I am going to end up creating four formula fields. Three formulas will each look at the data contained in one of the three fields. The fourth will sum the scores of the three fields and then, based on the total, grade the data “Good”, “Acceptable” or “Poor”. The formulas don’t have to be complex; even something basic like IF(LEN(FIELD) = 0, 0, 1), which scores a field 1 when it contains any data and 0 when it is blank, will do.
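Here is a rough sketch of the same scoring logic in Python, in case it is easier to follow than four separate formula fields. Two things in it are my assumptions, not something spelled out above: a populated field scores 1 and a blank one scores 0 (so a data score of zero means nothing is filled in), and the grade cutoffs are 3 = “Good”, 2 = “Acceptable”, 0 or 1 = “Poor”. Adjust both to whatever your data folks decide. The function names are made up for illustration.

```python
def field_score(value) -> int:
    """Score one field: 1 if it holds any data, 0 if it is blank."""
    return 1 if value is not None and str(value).strip() else 0


def data_score(address, phone, website) -> int:
    """Sum of the three field scores, from 0 to 3."""
    return field_score(address) + field_score(phone) + field_score(website)


def data_grade(score: int) -> str:
    """Turn the total into a grade. The cutoffs are assumed, not from the post."""
    if score >= 3:
        return "Good"
    if score == 2:
        return "Acceptable"
    return "Poor"


if __name__ == "__main__":
    score = data_score("129 West 81st Street", "", None)
    print(score, data_grade(score))
```

Checking only for the presence of data says nothing about whether the phone number or website is actually right, but it is a cheap first pass.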

Just the ones that matter

In this case, red is good because red = less work!

That was a fun diversion. Now, go back to the original report with the primary / secondary classifications and add in the data grading field. Now you can see how many of your most valuable accounts actually need the most help. In the case of Kramerica, we want to distil down that 14% (68k accounts) even further so we can focus on valuable accounts that have a data score of zero (no values in any of the fields) or one (only one of the fields has some data in it). Applying the formulas and the buckets to my data set reduces the number of accounts I need to look at from 54,000 to 18,000.

I think this deserves a quick, bullet pointed recap:

–        Initial data set, 481k accounts

–        Valuable Accounts:

o   Customer / Pipeline (Most) 14,000

o   Pipeline / Non Customer 13,000

o   Customer / No Pipeline 54,000

–        Data scoring of valuable accounts:

o   Zero data score = 5,000

o   One data score = 13,000

–        Reduced my “need to clean” by more than 90%

My SFDC admin is amazing

I get this way whenever I shake loose a bit more time in the day.

Yeah, that is pretty awesome. However, there is the question of what to do with all those “other” accounts. Here is where it goes from awesome to AWESOME (in a monster truck voice). Since you have already established what makes an account valuable, once an account meets a certain threshold (gets pipeline), you know that it then needs to be cleaned up…and of course, you know what needs to be cleaned up because you are already scoring it.
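As a sketch of that threshold idea, here is the check written out in Python. It assumes “gets pipeline” means at least one open opportunity and that a data score of zero or one is what counts as needing cleanup; the function name and the max_score parameter are hypothetical.

```python
def needs_cleanup(open_opportunities: int, data_score: int, max_score: int = 1) -> bool:
    """True once an account has pipeline but its data score is still low.

    Assumptions: pipeline means one or more open opportunities, and a
    data score at or below max_score (default 1) is worth cleaning.
    """
    return open_opportunities > 0 and data_score <= max_score


if __name__ == "__main__":
    # A previously ignored account just got its first open opportunity.
    print(needs_cleanup(open_opportunities=1, data_score=0))
    # Same account before it had pipeline: leave it alone for now.
    print(needs_cleanup(open_opportunities=0, data_score=0))
```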


PS – For bonus points, create a nice email alert telling the reps their data is bad, and make it so it sends them that notice every time they edit the account OR opportunity…just put on a timer so it only sends once per day!
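The once-per-day timer boils down to remembering when the last alert went out. Here is a minimal sketch of that logic in Python, not Salesforce automation; it assumes you keep a “last alert sent” timestamp somewhere on the account and that only “Poor” data triggers the email. All names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_send_alert(
    last_sent: Optional[datetime],
    now: datetime,
    data_grade: str,
    min_gap: timedelta = timedelta(days=1),
) -> bool:
    """Decide whether an edit should fire the bad-data email.

    Assumptions: last_sent is a stored timestamp of the previous alert
    (None if there has never been one), and only "Poor" data alerts.
    """
    if data_grade != "Poor":
        return False
    if last_sent is None:
        return True
    # Only send again once the minimum gap has passed.
    return now - last_sent >= min_gap


if __name__ == "__main__":
    first_edit = datetime(2014, 6, 1, 9, 0)
    print(should_send_alert(None, first_edit, "Poor"))
    print(should_send_alert(first_edit, first_edit + timedelta(hours=3), "Poor"))
```

Whatever sends the email would also need to update that timestamp each time an alert goes out, otherwise the timer never starts.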