Tell the Story of Your Data
- Representations of various phenomena
- Contextual collections of things
- Translation of the world into human-defined objects
Today, we are surrounded by an unprecedented mass of data, yet people still hold distinctly different views on what "data" actually means. The list above is my attempt at a reasonable description of what data is, but I'm sure there are a million other ways to explain it.
Reading Representation and the Necessity of Interpretation by Laura Kurgan and Objectivity by Lorraine Daston and Peter Galison helped me develop a mental model of data that weaves together some of its complexities, such as objective truth and subjective interpretation.
I see data as frozen snippets of encrypted human stories.
They are encrypted because there are often hidden secrets that are impossible to discover unless we are "provided" the keys to unveil them. The encrypted secrets include:
- Who/what collected the data?
- Who chose to produce the data?
- When and where was the data captured?
While the general public enjoys the wide availability of open data nowadays (e.g., Google Earth imagery, Wikipedia, etc.), we should note that these datasets seldom come packaged with the secrets and stories of their origins. Every dataset should have a unique story behind it, because real "people" took deliberate efforts and actions to create it. Nonetheless, unless we actively hunt for the keys to these datasets (keys that might not even exist), we are left with partially encrypted data, subject to subjective inferences based on the relatively small portion of the story that has been decrypted for us.
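As a rough illustration of the kind of "keys" I mean, here is a minimal sketch that packages the provenance questions above into a small data structure. The field names and example values are hypothetical, not any standard metadata schema.

```python
# A hypothetical provenance record: the "keys" a dataset rarely ships with.
from dataclasses import dataclass
from datetime import date

@dataclass
class ProvenanceRecord:
    collector: str       # who or what collected the data
    commissioner: str    # who chose to produce the data
    captured_on: date    # when the data was captured
    captured_where: str  # where the data was captured

# Example values are made up purely for illustration.
record = ProvenanceRecord(
    collector="satellite imaging provider",
    commissioner="mapping platform operator",
    captured_on=date(2019, 6, 1),
    captured_where="low Earth orbit, over a single city block",
)
print(record)
```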
Thus, it is no surprise that people propose a wide spectrum of interpretations of the same dataset. Given limited knowledge, the directions of these interpretations are inevitably biased by individual prior knowledge, and we should acknowledge that the "trustworthiness" of such inferences applies only within the systems that provided the rules and standards for forming them. For example, a Bayesian thinker would argue that p-values calculated from a frequentist point of view are not trustworthy, while frequentists would argue the other way around; the sketch below shows both readings of the same data side by side.
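To make that concrete, here is a minimal sketch (my own illustration, not drawn from either camp's canonical texts) of one tiny dataset interpreted through two inferential systems. The coin-flip counts and the uniform prior are assumptions chosen purely for illustration.

```python
# The same dataset, two inferential systems, two different summaries.
from scipy.stats import binomtest, beta

heads, flips = 14, 20  # hypothetical dataset: 14 heads in 20 coin flips

# Frequentist reading: two-sided p-value against the null of a fair coin.
p_value = binomtest(heads, flips, p=0.5).pvalue

# Bayesian reading: posterior probability that the coin favors heads,
# assuming a uniform Beta(1, 1) prior on the probability of heads.
posterior = beta(1 + heads, 1 + flips - heads)
prob_biased = 1 - posterior.cdf(0.5)

print(f"frequentist p-value:                {p_value:.3f}")
print(f"Bayesian P(heads probability > 0.5): {prob_biased:.3f}")
```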
Moreover, I call data frozen snippets because we only ever see the frozen phase of data, which in fact goes through many phase transitions.
I imagine there is a big-bang phase when somebody decides to initiate a dataset. Next comes the condensation phase, in which a stream of phenomena and events piles up. Then, in the freezing phase, the collector freezes a moment of this piled-up representation of the world. Lastly, in the melting phase, the data is disassembled into various interpretations and applications by external forces. People can claim many things (that might be true) about the different phases of a dataset, but unfortunately there is no way to validate such claims unless we are provided with sufficient background information on each phase.
I feel like my metaphors have gone a little too far by this point, but thinking of data as a process and a flow of interesting stories certainly does help us better understand the complexities surrounding it.