Lies, Damned Lies, and Statistics


Bookmark : del.icio.us  Technorati  Digg This  Add To Furl  Add To YahooMyWeb  Add To Reddit  Add To NewsVine 

“Lies, damned lies, and statistics.” (attributed to Benjamin Disraeli)

Earlier this week, I was doing a little market research to determine the market share of operating systems and their various versions. I thought I could do a quick Google search and be done with it. Although there is a lot of information out there on operating system market share, most of it is either too broad to be useful (to me) or too specific.

The difficulty of obtaining the information I want likely has several causes, some intentional and others perhaps not. First, companies such as Gartner Group, IDC and Forrester make a lot of money selling this sort of research. I can’t really blame them for not wanting to give it away.

Second, vendors often spin the data to make themselves look better than they otherwise would. For example, HEADLINE: WhackyWhidgets is the leading supplier of widgets! BURIED IN THE SMALL PRINT: WhackyWhidgets is the leading supplier of widgets for businesses in the process manufacturing industry with revenues between $20 million and $23.5 million.

Third, it is easy to take data that is almost what I want and try to turn it into exactly what I want. The result is misleading or possibly outright wrong. Giving most of us the benefit of the doubt, I don’t think this is usually intentional. We might not have asked for, or received, all the information we really needed; in that case, the information simply doesn’t exist. Or we might not have organized the data in a way that makes it useful for the current need. Or the data might be stored in different databases that are impossible to combine in a meaningful and accurate way.

Obviously there are many more reasons why data might be misleading or wrong. It’s worth reviewing your data, especially when it is used to make strategic decisions or is needed for audits. Reviewing the quality of your data now will certainly take less time than it will when you need to do the analysis. By then, it might also be too late.
