16 Jan Statistical Graphics, Exploratory Data Analysis and Data Visualization
I was recently told that statistical graphics are commonly used throughout the modeling process and that the term “data visualization” appears frequently in conjunction with the term “analytics”. The questions that followed were: How are statistical graphics used in exploratory data analysis, and is there a difference between the terms “statistical graphics” and “data visualization”? So, I did some reading and thinking.
Statistical graphics are an important tool in exploratory data analysis because they let you familiarize yourself with the data visually. As Fox (2008, p. 27) writes in his book, “statistical graphs are central to effective data analysis, both in the early stages of an investigation and in statistical modeling”. They reveal structure in the data that CANNOT be made “visible” by summary statistics alone. Anscombe’s quartet is the classic case: four small datasets with nearly identical means, variances, and correlations that look completely different once plotted.
I would argue that statistical graphics complement summary statistics by showing the distribution of the data, its skewness, and its outliers, as well as correlations among two or more explanatory variables.
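To make the Anscombe point concrete, here is a minimal Python sketch that computes the usual summary statistics for the four datasets of the quartet. The data values are the published ones from Anscombe (1973); the helper names `pearson_r` and `summarize` are my own and only serve this illustration.

```python
import math
import statistics

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(xs, ys):
    """Summary statistics (hypothetical helper), rounded to two decimals."""
    return {
        "mean_x": round(statistics.mean(xs), 2),
        "mean_y": round(statistics.mean(ys), 2),
        "var_x": round(statistics.variance(xs), 2),  # sample variance
        "var_y": round(statistics.variance(ys), 2),
        "r": round(pearson_r(xs, ys), 2),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, stats)
```

All four datasets come out with a mean of 9 and a variance of 11 for x, a mean of about 7.5 for y, and a correlation of about 0.82. Only a scatterplot of each pair shows that one is roughly linear, one is curved, one is a straight line with a single outlier, and one has all but one point at the same x value.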
In Chapter 3, Fox (2008) presents a variety of data visualization techniques and their mathematical foundations. While I already knew most of them, a few were new to me, such as nonparametric density estimation, coded scatterplots, and conditioning plots.
Data visualization techniques:
- Univariate Displays
  - Nonparametric Density Estimation
  - Quantile-Comparison Plots
- Plotting Bivariate Data
  - Parallel Boxplots
- Plotting Multivariate Data
  - Scatterplot Matrices
  - Coded Scatterplots
  - Three-Dimensional Scatterplots
  - Conditioning Plots
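Since nonparametric density estimation was one of the techniques that was new to me, here is a rough sketch of the idea behind a kernel density estimate: every observation contributes a small "bump" (the kernel), and the bumps are averaged into a smooth curve. This is only an illustration under my own assumptions, namely a Gaussian kernel, a rule-of-thumb bandwidth of 0.9 · min(sd, IQR/1.349) · n^(-1/5), and made-up sample data. The function names are hypothetical and not taken from any library or from Fox's text.

```python
import math
import statistics


def rule_of_thumb_bandwidth(data):
    """Bandwidth h = 0.9 * min(sd, IQR / 1.349) * n^(-1/5) (an assumed choice)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.349)
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(data, h):
    """Return a function estimating the density at x with a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


# Made-up, right-skewed sample purely for illustration.
sample = [2.1, 2.4, 2.5, 3.0, 3.1, 3.3, 3.8, 4.0, 4.6, 5.2, 7.9, 8.3]
h = rule_of_thumb_bandwidth(sample)
density = gaussian_kde(sample, h)

# Evaluate the estimate on a grid that extends well beyond the data.
lo, hi = min(sample) - 5 * h, max(sample) + 5 * h
steps = 400
grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
values = [density(x) for x in grid]

# Sanity check: a density should integrate to roughly 1 (trapezoidal rule).
area = sum(
    (values[i] + values[i + 1]) / 2 * (grid[i + 1] - grid[i])
    for i in range(steps)
)
print(f"bandwidth = {h:.3f}, area under curve = {area:.3f}")
```

Plotting `values` against `grid` gives the smoothed counterpart of a histogram. A smaller bandwidth follows the data more closely but produces a bumpier curve, while a larger one smooths away detail.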
Another really good book, which I read as part of my master’s degree, is “Now You See It: Simple Visualization Techniques for Quantitative Analysis” by Stephen Few.
Regarding the question of whether there is a difference between the terms “data visualization” and “statistical graphics”, I would say it depends. While I believe that data visualization includes statistical graphics (as both present data visually), the discipline of data visualization applies to a much wider area than exploratory data analysis alone. For instance, the practice of data visualization plays an integral part in tools like MS Excel, MS PowerPoint, and Tableau, and in products such as presentations or infographics (like this one: http://visualizing.org/full-screen/432169).
I would argue that the important part of data visualization is to present only the information relevant to a particular audience and to remove all other data points and visual elements that may be perceived, knowingly or unknowingly, as distractions and that do not directly support the objective of the visualization. That being said, whoever creates data visualization products needs to know the objective that the visualization is meant to support. A mere request for a nice visualization, or for a report on which visualizations may be based (I get a fair share of them), is not very helpful without a proper understanding of the objectives and specific needs behind it.
- Few, S. (2009). Now you see it: Simple visualization techniques for quantitative analysis. Oakland, California, United States of America: Analytics Press.
- Fox, J. (2008). Applied regression analysis and generalized linear models (3rd ed.). Los Angeles, California, United States of America: Sage.