Following up on my blog post 'Just Tell Me What I'm Doing', I'm starting a series of posts that define the key concepts and terms that make up my analytic world. Everything I do is coloured by my experience of actually doing analytics in commercial organisations. So while I believe these posts will present practical definitions that are actionable in the business world, I know that there are other worlds, in academia and science, where they are less relevant. At the very least, people in those areas will gain a better understanding of how business regards analytics.
Bennett's Analytica
A Practitioner's Guide To Analytics
Data, Information and Knowledge
Information is a collection of related data, often transformed and aggregated, about a topic. In business, that topic is often an operational area or a performance question on which insight is needed. In analytics, 'information' is often used interchangeably with 'data', but data is best thought of as something that carries little meaning on its own. The main differences between data, information and knowledge lie in the degree of meaning and the level of abstraction involved. To explain:
Degree Of Meaning
Data, information and knowledge all have some degree of meaning. Even data has meaning at some level. For example:
- data: 99.9 is a number (you can tell it is probably not text), although it could still be a code standing for a text string or some other value.
- information: 99.9 is the percent of transactions successfully processed by an application.
- knowledge: 99.9 is 0.05 below the acceptable level of successfully processed transactions with our customers (99.95), meaning failures are running at 0.1% against a tolerance of 0.05%.
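The three levels above can be sketched in code as the same value carrying progressively more context. This is a minimal illustration, not a prescribed model: the class names, the metric label and the 99.95 acceptable level (inferred from the knowledge example) are all hypothetical.

```python
from dataclasses import dataclass

# Data: a bare number. Without context we can only guess what it is.
raw_value = 99.9


@dataclass
class Information:
    """Data plus the context that interprets it (hypothetical fields)."""
    value: float
    metric: str  # what was measured
    unit: str


@dataclass
class Knowledge:
    """Information judged against a business expectation."""
    info: Information
    acceptable_level: float  # assumed target, in the same unit as info.value

    def shortfall(self) -> float:
        # Positive result: the measured value sits below the acceptable level.
        # Rounded to hide floating-point noise such as 0.04999999999.
        return round(self.acceptable_level - self.info.value, 2)

    def statement(self) -> str:
        gap = self.shortfall()
        if gap <= 0:
            return (f"{self.info.metric} meets the acceptable level of "
                    f"{self.acceptable_level}{self.info.unit}")
        return (f"{self.info.metric} is {self.info.value}{self.info.unit}, "
                f"{gap} below the acceptable level of "
                f"{self.acceptable_level}{self.info.unit}")


info = Information(raw_value, "Successfully processed transactions", "%")
knowledge = Knowledge(info, acceptable_level=99.95)
print(knowledge.statement())
```

The raw value never changes; only the context wrapped around it does, which is what moves it up from data to information to knowledge.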
Level Of Abstraction
Data is the lowest level of abstraction, information is the next level up, and knowledge is the highest of the three.
Be careful: abstraction is not the same as summarisation. A summary may be no more than the sum of individual pieces of data, and summing does not, in and of itself, turn data into information. An example:
The list of amounts 5, 8, 5, 2 sums to 20. Is 20 information?
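A small sketch of the point: summing produces another bare number, and it is only the interpretation attached to it that makes it useful. The customer, month and currency below are invented purely for illustration, and `interpret` is a hypothetical helper, not part of any library.

```python
# A summary: the sum of individual data items.
amounts = [5, 8, 5, 2]
total = sum(amounts)  # still a bare number; summing alone adds no meaning


def interpret(value: int, subject: str, period: str, unit: str) -> str:
    """Attach the context that lets a reader interpret the value."""
    return f"{subject}, {period}: {value} {unit}"


# Hypothetical context: customer C-001, January and GBP are assumptions.
info = interpret(total, "Total order value, customer C-001", "January", "GBP")
print(total)
print(info)
```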
Sources
In the business intelligence world, data is extracted from fixed sources, whether in batch or in real time. Sources are usually either transactional applications or reference data. All sources have meaning. Transactional data has meaning because:
- each transaction is stored in one or more records, and the record gives context to the individual data items within it.
- the source application is known, and knowing it gives the data additional meaning.
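As a rough sketch of those two points, a transaction record can be modelled so that every value travels with its sibling fields and its source application. The field names, the `OrderEntry` application and the `describe` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionRecord:
    """One transaction record extracted from a source application.

    Field names and the application name are hypothetical.
    """
    source_application: str
    transaction_id: str
    amount: float
    currency: str


def describe(record: TransactionRecord, field: str) -> str:
    # The record supplies context (the other fields);
    # the source application supplies provenance.
    value = getattr(record, field)
    return (f"{field}={value} (transaction {record.transaction_id} "
            f"from {record.source_application})")


record = TransactionRecord("OrderEntry", "T-1001", 5.0, "GBP")
print(describe(record, "amount"))
```

On its own, 5.0 is just data; read as the `amount` of transaction T-1001 from a known application, it starts to carry meaning.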
Reference data also has meaning, because the table or tables in which it is stored carry an internal meaning through the relationships between their rows. Typically this meaning is either hierarchical (for example, an organisational structure or products grouped into categories) or a simple grouping (for example, a list of product codes or currencies).
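The two kinds of reference data can be sketched as follows, assuming a simple product-to-category mapping for the hierarchy and a set of currency codes for the grouping. All codes and category names are hypothetical. The hierarchy lets you roll amounts up a level; the flat group mostly supports checking membership.

```python
from __future__ import annotations

# Hierarchical reference data: each product row points to its parent category.
# Product codes and categories are hypothetical.
PRODUCT_CATEGORY = {
    "P-100": "Beverages",
    "P-101": "Beverages",
    "P-200": "Snacks",
}

# Group reference data: a flat list of valid codes with no hierarchy between rows.
CURRENCIES = {"GBP", "EUR", "USD"}


def roll_up(sales: dict[str, float]) -> dict[str, float]:
    """Aggregate product-level amounts to category level via the hierarchy."""
    totals: dict[str, float] = {}
    for product, amount in sales.items():
        category = PRODUCT_CATEGORY[product]
        totals[category] = totals.get(category, 0.0) + amount
    return totals


def is_valid_currency(code: str) -> bool:
    """A flat group supports membership checks rather than roll-ups."""
    return code in CURRENCIES


print(roll_up({"P-100": 5.0, "P-101": 8.0, "P-200": 5.0}))
print(is_valid_currency("GBP"), is_valid_currency("XYZ"))
```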
In order for data to become information, it must be interpreted and take on a meaning.
Analytica Illustration
An example (courtesy of Wikipedia):
"The height of Mt. Everest is generally considered as 'data', a book on Mt. Everest geological characteristics may be considered as 'information', and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as 'knowledge'."
Related Terms and Concepts
Refer also to Data
Refer also to Metadata
Comments? Send feedback to [email protected]
Version 0.1 201002