An approach to statistics and data analysis


As information systems evolve, they become hungrier for both operational and strategic statistics and data analysis. This need is part of a natural evolution: the more data you have, the greater the potential for extracting information from it. In business environments that run on IT platforms, that is what analytics is really all about: getting useful information out of data that is usually bad. It turns out that analytical reporting is not as complex as it seems, but you definitely need a set of different skills, and often different people, to make it work.

There are tons of statistical approaches, methods and theories, but it turns out that for average business needs basic mathematics is enough, and the most complex operation you will usually meet is a logarithm. So, if it's so simple, where does the problem lie? Why do information systems so often lack analytical support that can be used for decision making?

In my opinion there are three main steps to consider when trying to produce useful statistics and data analysis, and ignoring or underestimating any one of them will make your reports suck.


Data is king. If you don't have the data, you might as well give up. If your data is bad or weak, you might consider rebuilding it. But you should know one thing: the better the structure of your data, the better your analysis will be. A flat data store such as a text file or an Excel spreadsheet gives you few analytical opportunities. Relational databases such as Access, MySQL or SQL Server offer cross-data querying and advanced reporting, but huge and complex calculations can take a lot of time. For those, a multidimensional OLAP database designed strictly for analysis becomes the practical option.
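
To make "cross-data querying" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns and all the rows are invented purely for illustration; the point is only that a relational structure lets you combine two data sets in one query, which a flat file can't do on its own.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 2, 80.0), (3, 3, 40.0), (4, 1, 60.0);
""")

# Cross-data query: join orders to customers and aggregate per region.
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(o.amount) AS revenue, COUNT(*) AS order_count
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
""").fetchall()

for region, revenue, order_count in revenue_by_region:
    print(f"{region}: {revenue:.2f} from {order_count} orders")

conn.close()
```

On a real system the same kind of query would run against whatever relational database you already have; SQLite is used here only because it needs no setup.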

Challenges in this step: Technical


The data discussed above defines the scope of the information you can potentially deliver. In this step, the main goal is simple: you need to know what you want to know. Business needs, process flow, strategic goals or just plain amusement are the main factors that need to be addressed. Having someone who is able to recognize these opportunities is crucial, because raw data is just numbers, while aggregated data, that is, information, is knowledge. Quite clearly, you won't be able to get something if you don't know what you want to get.
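
As a small illustration of raw numbers turning into information, the sketch below aggregates a handful of made-up daily view counts into per-section totals, daily averages and traffic shares. The rows, the section names and the summarize helper are all hypothetical; the question being asked ("which section draws the traffic?") is the part that matters, and it has to come before the code.

```python
from collections import defaultdict

# Hypothetical raw records (date, section, views); invented for illustration.
raw_rows = [
    ("2009-11-28", "blog", 120),
    ("2009-11-28", "photos", 45),
    ("2009-11-29", "blog", 150),
    ("2009-11-29", "photos", 30),
    ("2009-11-30", "blog", 90),
]

def summarize(rows):
    """Aggregate raw rows into per-section totals, daily averages and shares."""
    totals = defaultdict(int)
    days = defaultdict(int)
    for _date, section, views in rows:
        totals[section] += views
        days[section] += 1
    grand_total = sum(totals.values())
    return {
        section: {
            "total": total,
            "daily_average": total / days[section],
            "share": total / grand_total,
        }
        for section, total in totals.items()
    }

summary = summarize(raw_rows)
for section, stats in summary.items():
    print(f"{section}: {stats['total']} views, "
          f"{stats['daily_average']:.1f} per day, "
          f"{stats['share']:.0%} of all traffic")
```

Nothing here goes beyond sums and divisions, which fits the claim that basic mathematics covers most business needs.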

Challenges in this step: Analytical


A picture is worth a thousand words, and that goes a long way in data visualization. Even if you can't use charts, you can color information and use properties such as font size to represent another dimension of the data or a trend. Besides, always keep in mind that less is more, so you should put irrelevant information in the background and punchlines in the spotlight. Check out different chart types; each suits a different kind of representation, and experimenting with them can reveal things that don't seem to be there at first sight. Observe patterns. Try to imagine a playground where information satisfies your curiosity and, along the way, brings useful and valuable results.
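
Here is one possible way to turn a number into a font size, tag-cloud style. It is only a sketch: the font_size function, the 10 to 28 point range and the view counts are assumptions made for this example. It uses a logarithmic scale, so one very popular item doesn't shrink everything else to the minimum size.

```python
import math

def font_size(value, lowest, highest, min_pt=10, max_pt=28):
    """Map a positive value to a font size in points on a logarithmic scale."""
    if lowest == highest:
        # All values are equal, so use the middle of the range.
        return (min_pt + max_pt) / 2
    position = (math.log(value) - math.log(lowest)) / (
        math.log(highest) - math.log(lowest)
    )
    return round(min_pt + position * (max_pt - min_pt), 1)

# Hypothetical view counts per article; invented for illustration.
views = {"intro": 10, "howto": 100, "viral": 1000}
lowest, highest = min(views.values()), max(views.values())
sizes = {title: font_size(count, lowest, highest) for title, count in views.items()}

for title, size in sizes.items():
    print(f"{title}: {size}pt")
```

With a linear scale the middle article would land at roughly 11.6pt, barely distinguishable from the smallest one; on the logarithmic scale it sits halfway between the two extremes.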

Challenges in this step: Creative

If you have the will, you can do all sorts of crazy stuff with statistics and data analysis, but you should know it sometimes takes a lot of time. I'm proud that my chronolog already has two nice-looking children of these activities. The first is a simple recommendation engine used for content ranking, and the other is a set of reports that offer insight into the activity and interactions of the chronolog. What can I say, I like to play around, and it might as well be with any information system I can get my hands on. Give me the data and I'll give you information.

written 30.11.2009 21:32 CET on chronolog