A theoretical model for the data analysis process based on cognitive science

Module by: Garrett Grolemund.

Summary: Many people perform data analysis, but few have offered a theoretical model for the process. The descriptions that have been offered disagree with each other and appear to be based on personal intuition. This module examines whether data analysis can accurately be conceptualized as a sensemaking process, as described in the cognitive science literature. A review of 11 articles that feature data analysis tasks suggests that a sensemaking model of data analysis would be accurate. Future work will examine whether and how statistical data analysis safeguards itself against the sources of bias contained in the sensemaking process.

MOTIVATION

Data analysis is the process by which we glean understanding from data. While the origins of data analysis extend at least as far back as Francis Bacon, data analysis was first proposed as a field of academic study by John Tukey (1962).

Improvements in technology have increased both the amount of data that we can store and the speed with which we can analyze it (Friedman 1997). With each improvement, data analysis becomes more relevant. Modern commentators now claim we live in the midst of a “data deluge,” where we no longer have the cognitive power to understand all of the data available (Hey & Trefethen 2003). Further advances in data collection technology will require further advances in data analysis methods.

The fields of Machine Learning, Data Mining, InfoVis, and Visual Analytics are all attempts to improve upon data analysis to better meet our analytical needs. But even with the research already done in these areas, scientists claim that there is very little data analysis theory to build upon, and that the theory that is available is hard to access (Unwin 2001, Mallows 2006, Cox 2007). This lack of theoretical understanding stymies improvement in the field. Many academic disciplines create innovations by extending existing theory in new ways. Data analysis, by contrast, appears to proceed through trial and error.

Researchers have offered multiple suggestions to remedy this. Cox and Mallows propose reviewing data analysis case studies to induce a general pattern of analysis. Unwin suggests creating a pattern language of Data Analysis similar to the pattern language first proposed by architects Alexander, Ishikawa, and Silverstein (1977), and used successfully in the field of software engineering (Coplien 1996). While we are intrigued by Unwin’s proposition, we do not presently have the resources to define a complete pattern language. However, we begin our examination of data analysis by reviewing the data analysis case studies that exist in the literature of statistical consulting, as suggested by Cox and Mallows.

RESEARCH QUESTION

Can the sensemaking model of cognitive science provide a theoretical model for data analysis?

PREVIOUS MODELS OF DATA ANALYSIS

Past efforts to describe data analysis reveal a lack of consensus about the process. Below are three illustrations of the process provided by Box (1976), Box, Hunter, and Hunter (1978), and Wild and Pfannkuch (1999).

Figure 1 (graphics1.png)
Figure 2 (graphics2.png)
Figure 3 (graphics3.png)

While different, the three diagrams suggest some salient aspects of the data analysis process:

  • It is an iterative process
  • It uses observable data to adjust a mental model
  • It alternates between an inductive phase and a deductive phase
  • It aims to create understanding

Data analysis shares these features with a process that has been well studied by cognitive scientists: sensemaking.

SENSEMAKING

Sensemaking is an area of cognitive science that examines how the human brain creates understanding from its surroundings. It began in the 1970s as an extension of communication theory, but was then adopted by experimental and theoretical psychologists. According to sensemaking research, the human brain continuously scans its environment for data and builds this data into a mental model that explains its surroundings. Several sensemaking models exist to explain how this occurs (e.g., the cost structure model, the data-frame model), but each has the same basic components.

Figure 4 (graphics4.png)

The brain begins with a tentative theory, which is also called a model, a schema, or a frame. This theory suggests to the brain what is and what is not relevant data. The brain then constructs this data from the external stimuli it receives through the sense organs. An important facet of sensemaking is that the mind does not automatically accept all present stimuli as data. It instead decides which stimuli would be relevant, searches for them, and then synthesizes them into a piece of data.

The brain compares its currently held theory to the data it has collected. It confirms the theory if the theory accurately fits the data. Otherwise, it will modify the theory to better fit the data or completely reject the theory in favor of a new one. The process occurs continuously; the brain constantly refines existing theories against new data.
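The loop described above can be made concrete with a toy sketch. This is only an illustration under loud assumptions, not a model taken from the sensemaking literature: the `Theory` class, the `sensemaking_step` function, the numeric "theory" (observations cluster around a center), and the rule for choosing between modifying and rejecting are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Theory:
    """Hypothetical frame: 'observations cluster around `center`'."""
    center: float
    tolerance: float  # largest distance from center that still counts as a fit

    def is_relevant(self, stimulus: object) -> bool:
        # The theory decides which stimuli count as data at all.
        # In this toy, only plain numbers are relevant.
        return isinstance(stimulus, (int, float)) and not isinstance(stimulus, bool)

    def fits(self, datum: float) -> bool:
        return abs(datum - self.center) <= self.tolerance


def sensemaking_step(theory: Theory, stimuli: Iterable[object]) -> Tuple[Theory, str]:
    """One pass of the loop: select data, compare it to the theory, update."""
    data: List[float] = [s for s in stimuli if theory.is_relevant(s)]
    if not data:
        return theory, "no data"

    misfits = [d for d in data if not theory.fits(d)]
    if not misfits:
        # The theory accurately fits the data: confirm it.
        return theory, "confirmed"

    mean = sum(data) / len(data)
    if len(misfits) < len(data):
        # Some data still fit: modify the theory (assumed rule: move the
        # center halfway toward the mean of the new data).
        adjusted = Theory((theory.center + mean) / 2, theory.tolerance)
        return adjusted, "modified"

    # Nothing fits: reject the theory in favor of a new one built from the data.
    return Theory(mean, theory.tolerance), "rejected"


if __name__ == "__main__":
    theory = Theory(center=10.0, tolerance=2.0)
    for batch in ([9.5, "noise", 11.0], [9.0, 14.0], [30.0, 32.0]):
        theory, outcome = sensemaking_step(theory, batch)
        print(outcome, theory)
```

Calling `sensemaking_step` repeatedly, as in the loop at the bottom, mirrors the continuous refinement of theories against new data. The halfway adjustment and the all-or-nothing rejection rule are arbitrary choices made to keep the sketch short.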

A theory provides understanding by describing the relationships between data. These relationships assign meaning to the data points and also allow predictions of unobserved data from observed data. A theory also allows the mind to encode data more efficiently than just storing the raw bits. In this way, sensemaking resembles parametric modeling. The brain retains the theory instead of the raw data, but retains the information contained in the data in the parameters of the theory. Different types of theories can describe different types of relationships among data. Mental maps describe spatial relationships, stories describe temporal and causal relationships, scripts describe roles, plans describe an intended sequence of events, etc. (Klein et al. 2003).
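The analogy with parametric modeling can be illustrated with a small sketch. The text only draws the analogy; the choice of a straight-line model, the function names `fit_line` and `predict`, and the data values are assumptions made for illustration. The point is that two fitted parameters stand in for the raw observations and still allow prediction of an unobserved value.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(params: Tuple[float, float], x: float) -> float:
    """Use the retained 'theory' (the parameters) to predict unobserved data."""
    slope, intercept = params
    return slope * x + intercept


if __name__ == "__main__":
    # Illustrative observations (made up for this sketch).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.1, 4.9, 7.2, 8.8]

    params = fit_line(xs, ys)  # two numbers summarize ten raw values
    print("retained parameters:", params)
    print("prediction at x = 10:", predict(params, 10.0))
```

After fitting, the raw `xs` and `ys` could be discarded: the relationship among them survives in `params`, at the cost of whatever detail the line does not capture.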

WHY SENSEMAKING?

Sensemaking shares all of the salient features of data analysis noted above, but there are other reasons to suspect that cognitive science may offer a theoretical foundation for data analysis.

Almost all data analysis is conducted by humans in order to improve their understanding of the world. Hence, data analysis extends the sensemaking process. Moreover, data analysts may use their internal reasoning processes as a model for their data analysis.

As Velleman (1997) points out, data analysis is a revival of Francis Bacon’s scientific method and could be considered the modern incarnation of that method. The history of this method resembles a movement from an internal sensemaking process, which can often be subjective, to an external sensemaking process that tries to be objective. If so, we should expect data analysis to display a foundation based on sensemaking with added safeguards against the biases that sensemaking is vulnerable to.

PRELIMINARY RESULTS

I followed the suggestion of Cox and Mallows and compared data analysis case studies and recommendations available in the statistical literature to the sensemaking model. In all cases, most of the data analysis prescriptions fell into one of the four sensemaking steps. The remaining prescriptions were all “meta-steps” that dealt with the data analysis process itself (e.g., plan, understand the problem). These meta-steps may be evidence that data analysis has incorporated safeguards against the vulnerabilities of the internal sensemaking process. The figure below gives a visual description of how closely each of the 11 papers complies with the model:

Figure 5 (graphics5.png)

LOOKING FORWARD

This preliminary analysis supports the hypothesis that sensemaking may provide a theoretical model for data analysis. Further study must address the question, “How can we provide a rigorous demonstration that data analysis follows a sensemaking model?” As Cox points out, only a small number of data analysis case studies are available in the statistical literature. Future research may employ more direct methods such as observing actual data analyses or scouring computer code used to perform data analyses.

If a cognitive basis is demonstrated, cognitive science may provide opportunities to improve the activity of data analysis. Do current data analysis methods provide adequate safeguards against the well-documented list of sensemaking biases?

Finally, a firmly established model for data analysis can be used to expand the academic understanding of the field. The author originally embarked on this study to address the lack of well-defined objectives for data visualization techniques. A better definition of the purpose of data analysis methods may provide new opportunities to optimize data analysis techniques.

ACKNOWLEDGEMENTS

  • The National Science Foundation
  • Dr. Hadley Wickham

REFERENCES

Alexander, et al. (1977). A pattern language: towns, buildings, construction. Oxford University Press, USA.

Bailyn (1977). ‘Research as a cognitive process: Implications for data analysis’. Quality and Quantity 11(2):97–117.

Becker, et al. (1987). ‘Dynamic Graphics for Data Analysis’. Statistical Science 2(4):355–383.

Box (1976). ‘Science and Statistics’. Journal of the American Statistical Association 71(356):791–799.

Box, et al. (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. John Wiley & Sons.

Cabrera & McDougall (2002). Statistical consulting. Springer Verlag.

Chatfield (1995). Problem solving: a statistician’s guide. Chapman & Hall/CRC.

Coplien (1996). Software patterns. Citeseer.

Cox (2007). ‘Applied statistics: A review’. Annals of Applied Statistics 1(1):1–16.

Friedman (1997). ‘Data mining and statistics: what’s the connection?’. Computing Science and Statistics: Proceedings of the 29th Symposium on the Interface.

Hey & Trefethen (2003). ‘The Data Deluge: An e-Science Perspective’ pp. 809–824.

Klein, et al. (2003). ‘A Data/Frame Theory of Sense Making’. In Expertise out of context: proceedings of the sixth International Conference on Naturalistic Decision Making, pp. 113–155.

Mallows (2006). ‘Tukey’s Paper after 40 years (with discussion)’. Technometrics 48(3):319–325.

Pirolli & Card (2005). ‘The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis’. Proceedings of International Conference on Intelligence.

Ribarsky, et al. (2009). ‘Science of analytical reasoning’. Information Visualization 8(4):254–262.

Tukey & Wilk (1966). ‘Data analysis and statistics: an expository overview’. In Proceedings of the November 7-10, 1966, fall joint computer conference, pp. 695–709. ACM.

Tukey (1962). ‘The Future of Data Analysis’. The Annals of Mathematical Statistics 33(1):1–67.

Wild & Pfannkuch (1999). ‘Statistical thinking in empirical enquiry’. International Statistical Review/Revue Internationale de Statistique 67(3):223–248.

Velleman (1997). The Philosophical Past and the Digital Future of Data Analysis. Princeton University Press.
