ErsatzOwl’s Information Management Practice (IMP) 101 : Part 1
This article is the first installment of a personal exploration of the elements of information management practice. As I write this I claim no expertise on the topic beyond my ten years of using the various systems and tools that make use of such knowledge. I write this as a way to educate myself (and perhaps spark some feedback) on topics relating to information management as a means to better myself and others through such conversations. In a way, you could say that while this is not quite an exploration of the topic from first principles, it is an exercise in trying to build a fundamental understanding of the topic by discussing what I know and looking for evidence to support that understanding. In other words, this isn’t really a course, lecture, or other educational tool: it’s just me working things out on my own.
But if you find it useful please let me know.
In kicking off a bit of self-education on this topic, I considered the many places I could start. I could have dived into the technology, and attempted to work backwards from its functions. I could have pulled up a case study and piece by piece outlined the way I would have gone about taking a jumble of files, knowledge, and other information and organizing it in a professional, useful manner. Or I could have done what I did: taken the theoretical and philosophical approach to the study and tried to unravel what I’ve been practicing for many years around some solid concepts, definitions, and ideas. So, this is how it starts…
I start with a definition. Or a few, to be more exact.
I wanted to spend my first words on this subject exploring the fundamental building blocks of how we tend to think about information. In my syllabus I asked the question “What is the difference between information, data, and content?” and after jotting down my first few pages of notes I quickly realized that in my brevity I’d really done two things wrong. First, I’d put those three entities on a level playing field (as illustrated in the first diagram) and second, I’d missed two very important fundamental aspects of modern information management practice: records and metadata.
Now, if you are a seasoned information manager you might be stepping back and saying… “dude, you’re missing a lot of detail here!” But I’d just like to clarify that I’m not trying to be absolutely inclusive of every aspect of these systems, but rather attempting to reduce (at least from my experience) the fundamentals of the problem down to their very barest minimum of elements. And as such, I’ve decided to narrow the elements to a total of five important bits that I will cover in more detail as I progress through this so-called one-oh-one of information management.
What we are left with then is five pieces: information, data, content, records, and metadata, as illustrated in the second diagram. Though these are represented as five equal circles in that diagram, they are (a) anything but equal, and (b) not really even comparable concepts, in that some are mere abstractions while others are virtually tangible elements in the process. Thus, what we are left with (or perhaps I should say, what we are starting with) is not some deep theory of managing information, but simply the observation that, from what I could distill (from both experience and more reading), there are five fundamental parts to this whole question. Rephrased: What is the difference between (in no particular order) information, data, metadata, content, and records?
As someone who is so inclined, my first thought around these five elements was to try and define them, and in trying to define them (at least as far as one might define them in a practical, info management context) I attempted to put them in some sort of order of priority or purpose. What this amounted to was a few pages of scribbles in a notebook that at the very best resembled a Venn diagram of how they tended to overlap and envelop each other. (You might have called it a bit of information managing around the information management concepts.)
The third figure displayed is a (translated) representation of what I created on this first attempt at a Venn diagram.
My justification is far more complex.
At the very bottom, I was fairly certain that data is among the most granular of the elements with which we need be concerned. An analogy to science might be to call an element of data the “biological cell” or the “chemical atom” of the information management world. Yes, further study will find that these smallest of interesting pieces can be broken down into even more interesting pieces, but those smaller pieces are arguably both (a) the purview of other branches of study and (b) dependent on a solid understanding of this fundamental element.
When I talk about data, of course, a practical example might be something such as a number or a word, a note, or an indivisible collection of pixels or sound waves. I considered that the best way to understand this might be to think of it as anything that could be simply represented or referenced within a single database cell, identified by a specific intersection of a row and a column.
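To make the "single database cell" idea concrete, here is a minimal sketch in Python. The table, its column names, and the cell() helper are all hypothetical, invented purely for illustration; they are not part of any particular database system.

```python
# A hypothetical table: each row is a dict, each key is a column name.
table = [
    {"address": "12 Example Lane", "price": 325000},
    {"address": "48 Sample Street", "price": 410000},
]


def cell(table, row, column):
    """Return the single datum at the intersection of a row index and a column name."""
    return table[row][column]


# One element of data: the value found at row 1, column "price".
datum = cell(table, 1, "price")
```

In this sketch, a datum is simply whatever sits at one row-and-column intersection; everything larger (records, content) is built up from values like it.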
Directly encapsulating data, I wrapped in layers my two tag-along, nearly forgotten elements of this equation: metadata and records. Metadata I’ve long thought of as the abstract glue that binds data together. It is the conceptual part that adds value to the other parts, but is not wholly vital to calling that data complete. It is descriptors such as keywords, tags, or other descriptive bits that help us say data collection A is related to data collections B and C, without actually linking them together within our data management system. On the other hand, records serve a similar but more literal purpose: records are actual collections of similar elements of data, representing a larger concept that is described by the data within the record.
To think of a practical concept behind these ideas, let’s consider something like a real estate database. In this example, a record is simply, say, a single house listed inside that database, located at such and such an address. It is an individual entity, but an entity that has many pieces of data that describe it (size, colour, price, rooms, bathrooms, location, etc) and many pieces of metadata that connect it to other records (price range, time on market, agent, etc). Arguably, in a different system metadata and data might become one and the same, but this is a complexity I will address at a later time, and for the sake of simplicity, let’s leave it at that: they are subtly different.
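The real estate example can be sketched roughly in Python. This is only one way to picture it, under my own assumptions: the ListingRecord class, its field names, the sample values, and the related_by() helper are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ListingRecord:
    """One house in a hypothetical real estate database."""
    address: str
    # Data: values that describe the house itself.
    data: dict = field(default_factory=dict)
    # Metadata: descriptors that relate this record to other records.
    metadata: dict = field(default_factory=dict)


house = ListingRecord(
    address="12 Example Lane",
    data={"size_sqft": 1450, "colour": "blue", "price": 325000,
          "rooms": 6, "bathrooms": 2},
    metadata={"price_range": "300k-350k", "days_on_market": 21,
              "agent": "A. Agent"},
)


def related_by(records, key, value):
    """Find records that share a metadata value, without linking them explicitly."""
    return [r for r in records if r.metadata.get(key) == value]


same_agent = related_by([house], "agent", "A. Agent")
```

Note that the records are never linked to one another directly here; the shared metadata value is the only thing that relates them, which is the "abstract glue" role described above.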
Encircling the concept of records, I placed content, and then on an even broader scale, information itself. My rationale in this first draft was in thinking that content is really one’s collection of all records (and thus all data) on a given topic and located in a given place. It is thus the last concrete definition before we move out to the idea of information, that abstract concept that implies the holistically greater value derived from managing the content itself, yet something a bit more abstract and less tangible.
I can’t say that I was entirely satisfied with this description, so I tried again. This time, still a kind of Venn-like diagram, I elaborated on the above to come up with my fourth diagram.
Three things are significantly different in this fourth representation.
First, data becomes something far more numerous and granular in the system, represented as clusters wholly encapsulated by records.
Second, metadata finds a home in the spaces between the records as some part of the content overall, but both separate and yet important to the system.
And third, information is no longer something that fully encapsulates the content, but rather touches upon it and draws from it, while becoming mostly an independent (yet, still abstract) entity in the system.
Going back to our earlier real estate database example, then, the content is really the whole of, say, a database system with all record-based information stored and interlinked within a computer system. Information is not wholly encapsulated by the system, but is instead supplemented by the system, presumably because not all things knowable about our example properties and how to sell them could reasonably be put inside of such a system. Our imaginary real estate agent would be out of a job if every part of the information management chain occurred in a single computer database.
Yet, I was still a little unsatisfied at my “first stab” at defining all these elements. The result was the fifth (and for now, final) illustration of how I see things represented in this hierarchy of elements. There are two distinct changes:
First, and of lesser importance, was the inclusion of metadata within a record. The justification here was alluded to above, and is simply that in a thorough system, metadata could come from multiple places in the system, internal and external to the records, and really the definition of metadata can be broad enough to include both.
Second, I treated information very differently. This is perhaps where some may deeply disagree with my treatment of the system. I thought for a while on this system and came to the personal conclusion that information is not what is stored on the system. Rather, information is both an abstract concept that is derived from the system and the literal output of the mashing and grinding of the metaphorical gears of the same. Information is the answer received by asking questions of our data.
In our real estate example, information might take the form of a listing of the houses currently for sale, with their prices, in a specific price range and neighborhood. This is information, and is different from the rest of it because it has been extracted as a very tailored answer to a very specific question. The better the question, the better the information. The less defined the question, the more likely that the system will simply return data.
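As a rough sketch of that distinction, here is a small Python example using the standard sqlite3 module with an in-memory database. The listings table, its columns, the sample rows, and the houses_for_sale() helper are all hypothetical, invented only to illustrate the idea.

```python
import sqlite3

# A hypothetical in-memory listings table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings "
    "(address TEXT, neighborhood TEXT, price INTEGER, for_sale INTEGER)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?)",
    [
        ("12 Example Lane", "Riverside", 325000, 1),
        ("48 Sample Street", "Riverside", 410000, 1),
        ("7 Placeholder Road", "Hilltop", 340000, 1),
        ("3 Mock Court", "Riverside", 310000, 0),
    ],
)


def houses_for_sale(conn, neighborhood, low, high):
    """A specific question: what is for sale here, within this price range?"""
    cur = conn.execute(
        "SELECT address, price FROM listings "
        "WHERE for_sale = 1 AND neighborhood = ? AND price BETWEEN ? AND ? "
        "ORDER BY price",
        (neighborhood, low, high),
    )
    return cur.fetchall()


# A well-defined question yields a tailored answer: information.
information = houses_for_sale(conn, "Riverside", 300000, 350000)

# A poorly defined question just hands back everything: data.
everything = conn.execute("SELECT * FROM listings").fetchall()
```

The first query returns only what answers the question asked, while the second returns every stored row and leaves the asker to do the sorting out, which is roughly the difference between information and data as I am using the terms here.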
So, back to my initial question… or at least my modified question: What is the difference between (in no particular order) information, data, metadata, content, and records? Answer: These are the five fundamental (in my eyes) elements of a system of managing our knowledge about a specific topic, and while dependent upon each other are not really comparable: each can exist in some form without the others, but the most value is derived for each when the others are present and managed effectively so that information, the key purpose to it all, has the most value for those using it.
All diagrams are (c) 2010 by Me.