SAP Licensing and Authorizations Managers: How do you know that your final report is not relying on corrupted data? Maybe you have a software tool that analyses the data for you – but is any data missing or corrupted to begin with? How do you know?
John Doe is Requesting Your Approval to View Invoices
Recently, some of our customers acknowledged that it’s not enough to have a superior tool for licensing or user authorization monitoring. As the saying goes, “the devil is in the details”, and they discovered that some of their details are inaccurate and incomplete. This is no secret, of course; you’ve heard it if you’ve been in the IT business for a while. But it’s an extremely critical concept when it comes to combining user data among different systems or granting user authorizations.
Imagine yourself trying to combine 10,000 user details among different applications without having the full email address for each one. Common names such as “John Smith” and “Zhang Wei” will probably appear in several records across systems. Do they belong to different people, or do they really represent a single person who should be joined into one employee? You’ll never know if you don’t have a complete and accurate email address.
Or how about when you get a request for additional user authorizations to view financial statements – the user account is “JOHNS”, the first name is “John” and there’s no last name or email address. How can you identify who’s really behind it?
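To make the two examples above concrete, here is a minimal Python sketch of combining accounts by email address. The record layout, the system names and the helper functions (`normalize_email`, `combine_accounts`) are hypothetical and chosen only for illustration; a real export from your systems will look different.

```python
from collections import defaultdict

# Hypothetical account records exported from different systems (illustrative only).
accounts = [
    {"system": "ERP", "user_id": "JSMITH", "first": "John", "last": "Smith",
     "email": "John.Smith@example.com"},
    {"system": "CRM", "user_id": "SMITHJ", "first": "John", "last": "Smith",
     "email": "john.smith@example.com "},
    {"system": "BW", "user_id": "JSMITH2", "first": "John", "last": "Smith",
     "email": "john.smith2@example.com"},
    {"system": "ERP", "user_id": "JOHNS", "first": "John", "last": "",
     "email": ""},
]


def normalize_email(raw):
    """Trim and lower-case an email; return None if it is empty or has no '@'."""
    email = (raw or "").strip().lower()
    return email if "@" in email else None


def combine_accounts(records):
    """Group accounts by normalized email; accounts without one stay unresolved."""
    by_email = defaultdict(list)
    unresolved = []
    for rec in records:
        key = normalize_email(rec.get("email"))
        if key is None:
            unresolved.append(rec)
        else:
            by_email[key].append(rec)
    return dict(by_email), unresolved


people, unresolved = combine_accounts(accounts)
for email, recs in people.items():
    print(email, [f"{r['system']}:{r['user_id']}" for r in recs])
print("unresolved:", [r["user_id"] for r in unresolved])
```

In this sketch, three accounts named “John Smith” collapse into two people because two of them share an email address, while “JOHNS” cannot be assigned to anyone and has to be completed by hand.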
Missing Data. Incorrect Data
From Google: “Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.”
In order to work well with data, the data needs to be accurate and complete. Here are four pieces of advice that we’ve collected from customers who’ve gone through the tedious process of data cleansing.
- Most important data goes first: You should identify which data fields are most important to your process and start there. For example, in order to have a successful licensing process that includes the step of combining user accounts among different SAP applications, you need data for user email addresses to begin with, and then first and last name. In order to identify who a person is in an authorization-related process, you need data on their department and position too. First identify what the most important data is, and then focus on completing it.
- Share the task, but be prepared to take over: You will probably need other people to help you complete missing data, especially if you need data about offsite locations. Take the time to recognize who the relevant people are that can assist you, but just know that some won’t get to it fast enough, and in this case, you’ll just have to do the work yourself. For example, if you need to complete email address information, you can ask the department managers for their employee data, but you’ll probably discover that they passed the task to another person, and that person doesn’t understand it or is too busy to even reply. Either way, it’s your responsibility, so if this happens, take over and do it yourself. Don’t forget that this is just the data cleansing step on the way to sound SAP licensing or an efficient authorization process.
- It’s not all-or-nothing, 90% is much better than 50%: Eliminating dirty data is a continuous challenge that will probably not end when your project does. Understanding this will make it easier for you to not be frustrated if you discover that there’s more data to clean, and it seems like a never-ending story. Make an effort to cleanse a good amount of data, but don’t forget to set time limits and a target. It is much more important to keep to the timeline and reach the goal (better licensing, implementation of workflow) than to have all of the data clean and accurate. After a successful first project, you can always plan the second one.
- Tools are great, but not always a must-have: If your organization is not so big, or the amount of data is not so huge, you can try to trust the human eye to spot missing and corrupted data. The human eye can easily scan 1,000 records of data, and even 10,000 data items is not beyond its scanning reach. Of course, there are very good tools that can spot corrupted data by using data-rule sets and duplicate-tracing; however, you might discover that your management doesn’t want to spend money on them. In this case, good practice is to narrow down the most important data objects and do the work manually. You will be surprised by the ability of the human eye.
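If you do have someone who can script, the first, third and fourth pieces of advice can be sketched in a few lines of Python: measure how complete the most important fields are, compare that with a target, and apply a simple data rule and a duplicate check. Everything here is an assumption made for illustration: the field names, the 90% target taken from the heading above, the deliberately simple email pattern and the helper functions. It is not a substitute for a dedicated data quality tool.

```python
import re
from collections import Counter

# Hypothetical user records; field names are illustrative only.
users = [
    {"user_id": "JSMITH", "email": "john.smith@example.com", "first": "John", "last": "Smith"},
    {"user_id": "ZWEI", "email": "zhang.wei@example", "first": "Zhang", "last": "Wei"},
    {"user_id": "JOHNS", "email": "", "first": "John", "last": ""},
    {"user_id": "SMITHJ", "email": "John.Smith@example.com", "first": "John", "last": "Smith"},
]

# Most important fields first (assumed order for a licensing project).
PRIORITY_FIELDS = ["email", "first", "last"]
TARGET = 0.90  # assumed completeness target, not a fixed standard

# A deliberately simple email rule (an assumption, not a full validator).
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, fields):
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if (r.get(f) or "").strip()) / total
        for f in fields
    }


def rule_violations(records):
    """Return user IDs whose email is present but does not match EMAIL_RULE."""
    return [
        r["user_id"]
        for r in records
        if r["email"].strip() and not EMAIL_RULE.match(r["email"].strip())
    ]


def duplicate_emails(records):
    """Return emails (trimmed, lower-cased) that occur on more than one record."""
    counts = Counter(
        r["email"].strip().lower() for r in records if r["email"].strip()
    )
    return sorted(e for e, n in counts.items() if n > 1)


shares = completeness(users, PRIORITY_FIELDS)
below_target = [f for f in PRIORITY_FIELDS if shares[f] < TARGET]
print("completeness:", shares)
print("below target:", below_target)
print("rule violations:", rule_violations(users))
print("duplicate emails:", duplicate_emails(users))
```

A duplicate email is not necessarily an error: it may simply be the same employee in two systems, which is exactly what you want to find when combining accounts. The fields listed as below target tell you where to spend your limited cleansing time first.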
A final note:
Missing and corrupted data can turn a potentially successful project into a nightmare and can create a delay in the estimated timetable. It’s important to understand this before implementing a tool that uses that data, but it is also important to focus on the final target and not focus too much on the data. An experienced project manager might make the difference in many cases.
For more in-depth information:
Please note: this is just a short summary of our customers’ practical concerns. For in-depth reading, please look at the following terms:
Data Profiling (http://en.wikipedia.org/wiki/Data_profiling),
Data Validation (http://en.wikipedia.org/wiki/Data_validation),
Data Quality Assessment (http://en.wikipedia.org/wiki/Data_quality_assessment),
Data Cleansing (http://en.wikipedia.org/wiki/Data_cleansing),
Dirty Data (http://en.wikipedia.org/wiki/Dirty_data).
Another good overview, although a few years old, can be found here: Data Cleaning: Problems and Current Approaches.
Finally, you can also consult with us about our experience in this field.