Data and Analytics

Your Bad Data Is Costing You BIG

An IBM study found that bad data cost the United States $3.1 trillion in 2016. You may be wondering how that number can be so large. Below are a few examples and their impact.


60% of data scientists admit that they spend most of their time cleaning and organizing data – CrowdFlower.


Glassdoor states the average salary of a data scientist in the United States is $115,141. For a team of ten, a company could easily spend $500,000 a year on less productive work.
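
The arithmetic behind that estimate can be sketched as follows. The salary and team size come from the text; the share of time spent on cleaning is an assumption (every team member spending half of their time, the low end of “most”), so treat the result as a rough illustration rather than a measured cost.

```python
# Back-of-the-envelope estimate of salary spent on cleaning data.
AVERAGE_SALARY = 115_141   # Glassdoor figure quoted above (USD per year)
TEAM_SIZE = 10             # team size used in the text
CLEANING_SHARE = 0.5       # assumption: each person spends half their time cleaning

def cleaning_cost(salary: float, team_size: int, share: float) -> float:
    """Annual payroll that goes to cleaning and organizing data."""
    return salary * team_size * share

payroll = AVERAGE_SALARY * TEAM_SIZE
cost = cleaning_cost(AVERAGE_SALARY, TEAM_SIZE, CLEANING_SHARE)
print(f"Team payroll:      ${payroll:,.0f}")
print(f"Spent on cleaning: ${cost:,.0f}")
```

Changing CLEANING_SHARE shows how sensitive the estimate is to that one assumption.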


It costs ten times as much to complete a unit of work when input data are defective as it does when they are perfect – The Rule of Ten, Harvard Business Review.
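
To see what the Rule of Ten implies for a batch of work, here is a minimal sketch. The unit cost of 1 and the count of 20 defective inputs out of 100 are hypothetical values chosen only to make the arithmetic easy to follow.

```python
RULE_OF_TEN = 10  # cost multiplier for defective input, from the rule quoted above

def cost_of_work(total_units: int, defective_units: int, unit_cost: float = 1.0) -> float:
    """Cost of completing all units when some arrive with defective data."""
    clean_units = total_units - defective_units
    return clean_units * unit_cost + defective_units * unit_cost * RULE_OF_TEN

perfect = cost_of_work(100, 0)      # every input is clean
realistic = cost_of_work(100, 20)   # hypothetical: 20 of 100 inputs are defective
print(f"All clean:    {perfect:.0f}")
print(f"20 defective: {realistic:.0f} ({realistic / perfect:.1f}x)")
```

Under these assumptions, a 20% defect rate nearly triples the cost of the same work.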


Defective data causes rework, lost revenue, and bad customer experiences. For example, sending a repair technician to the wrong address wastes the technician’s time and leaves the customer’s issue unresolved.


Experian’s 2019 global data management research found that organizations believe, on average, that 29% of their data is inaccurate, leading to a distinct lack of trust in data.


Working in an organization that doesn’t trust its data leads you to question decisions, spend extra time validating and verifying data, and miss opportunities.

Many more examples exist that prove the adage, “Garbage In, Garbage Out”. Clean data becomes even more important as companies deploy self-service solutions and move towards using artificial intelligence and machine learning. It is not glamorous, but ensuring your company has clean data will boost productivity, provide better insight, and create higher profits. Whether you want to improve business as usual, perform a systems integration, or start an analytics project, clean data is where you start.

Where should the clean data journey begin?

The critical points in the life of data are the point of creation and the end use. The rubber hits the road when the data manager can establish a data lineage from the point of creation to the end use. Most companies will have too much data to do that for every attribute, so it’s important to target your high-value data.
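
As a rough illustration, lineage for a single high-value attribute can be recorded as an ordered list of steps. The attribute, systems, and actions below are hypothetical, loosely based on the repair technician example; a real lineage would come from your own systems or a metadata tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageStep:
    system: str   # where the data lives at this step
    action: str   # what happens to it there

# Hypothetical lineage for one high-value attribute, "service_address".
service_address_lineage = [
    LineageStep("Call center CRM", "created by an agent during a customer call"),
    LineageStep("Nightly ETL job", "standardized and copied to the warehouse"),
    LineageStep("Dispatch report", "used to route repair technicians"),
]

def describe(attribute: str, steps: list[LineageStep]) -> str:
    """Render the path from point of creation to end use as one line."""
    path = " -> ".join(step.system for step in steps)
    return f"{attribute}: {path}"

print(describe("service_address", service_address_lineage))
```

Even a simple record like this makes it clear which systems to inspect when an address turns out to be wrong.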

Define Critical Data

Critical data “keeps the lights on”. Identify critical data by focusing on the reports used to make decisions, data that provides the best customer experience, or data with regulatory, legal, or compliance implications. Keep the focus narrow enough to make progress; the scope can be widened as time allows. Critical data aligns with your business strategy, so don’t worry about data that doesn’t drive that strategy.

Establish a Data “Owner”

Who owns the data and takes responsibility for its accuracy? The owner should understand the process for creating or acquiring the data. The owner is responsible for supporting the process that creates clean data, documenting metadata, and answering questions about the data.

Establish and Measure Data Quality

What is clean data? The dimensions of data quality listed below create a comprehensive view into the data being interrogated. Measuring these dimensions over time will tell you whether your data management processes are effective and help you focus on your next set of improvements.

  • Complete – Business- and system-required fields are populated, e.g., address information.
  • Unique – Degree to which duplication is managed, with a defined source of truth.
  • Accurate – Meets business rules, data standards, and validity parameters.
  • Timely – Data is available for use when it is needed.
  • Compliant – Data is used for its intended purpose, and data standards reflect statutory, regulatory, and legal requirements, as necessary.
  • Consistent – Data is the same in definition, format, and values across all systems.
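
Three of these dimensions (complete, unique, accurate) are straightforward to score. The sketch below is one possible approach, not a prescribed method: the customer records, field names, and the five-digit postal code rule are all hypothetical and stand in for your own critical data and business rules.

```python
import re

# Hypothetical customer records; field names are illustrative only.
records = [
    {"customer_id": "C001", "address": "12 Main St", "postal_code": "30301"},
    {"customer_id": "C002", "address": "",           "postal_code": "30302"},
    {"customer_id": "C002", "address": "9 Oak Ave",  "postal_code": "3030"},
    {"customer_id": "C003", "address": "4 Elm Rd",   "postal_code": "30305"},
]

REQUIRED_FIELDS = ("customer_id", "address", "postal_code")
POSTAL_CODE_RULE = re.compile(r"\d{5}")  # assumed business rule: exactly five digits

def completeness(rows, required):
    """Share of rows in which every required field is populated."""
    complete = sum(
        all(str(row.get(field) or "").strip() for field in required) for row in rows
    )
    return complete / len(rows)

def uniqueness(rows, key):
    """Ratio of distinct key values to rows (1.0 means no duplicates)."""
    return len({row[key] for row in rows}) / len(rows)

def accuracy(rows, field, rule):
    """Share of rows whose field fully matches the business rule."""
    valid = sum(bool(rule.fullmatch(str(row.get(field) or ""))) for row in rows)
    return valid / len(rows)

scores = {
    "complete": completeness(records, REQUIRED_FIELDS),
    "unique": uniqueness(records, "customer_id"),
    "accurate": accuracy(records, "postal_code", POSTAL_CODE_RULE),
}
for dimension, score in scores.items():
    print(f"{dimension:<9} {score:.0%}")
```

Recording these scores on a regular schedule gives the trend line that shows whether data management processes are improving.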

Evaluate Processes

Document the processes from creation to end use for critical data. Review the processes through the lens of data quality and data security. To find areas for improvement, ask the following questions:

  • Are the creation controls sufficient to create quality data?
  • Is data knowledge readily available?
  • Is data access/dissemination efficient and consistent?
  • Is the end-to-end process timely?

Data Strategy – Building the Map

The clean data journey sets the stage for a Data Strategy. A Data Strategy must come from an understanding of the data needs inherent in the business strategy: what data the organization needs, how it will get the data, how it will manage it and ensure its reliability over time, and how it will utilize it.1 That strategy will inform Data Management activities.

Data Management is the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles.2 Thought Logic defines the components of Data Management as:

[Figure: the components of Data Management, as defined by Thought Logic]

As seen in the figure above, Data Governance is central to Data Management. Data Governance (DG) is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.3 Data Governance is about continuous improvement. As your organization matures, its capabilities with data will grow. As frameworks are built and processes are optimized, the data under management can expand and drive deeper insights into customers, products, R&D, and overall company health.

Thought Logic’s approach to data management is designed to evaluate and address the processes, systems, and variables that affect the cleanliness and, ultimately, the value of data to an organization. Our methods help firms prioritize the work necessary to deliver meaningful improvements in data quality and achieve long-term strategic objectives.

1 “DAMA-DMBOK Data Management Body of Knowledge, 2nd Edition”, 2017, p. 32

2 “DAMA-DMBOK Data Management Body of Knowledge, 2nd Edition”, 2017, p. 17

3 “DAMA-DMBOK Data Management Body of Knowledge, 2nd Edition”, 2017, p. 67