Looking at Data

A big part of information visualization is, of course, the data. This week I delve into examining datasets and how different types of visualizations may be better suited for different types of datasets. How should a certain dataset be visualized? What types of data shouldn’t be visualized?

Examining the Dataset

The first question to ask is: should this dataset be visualized? For small datasets, non-graphical representations, such as tables, may perform better at reporting the data.¹ Tables are better when viewers want to quickly find precise individual values.² Questions such as “What is the current unemployment rate?” can quickly be answered without visualizing the dataset.³ What are we trying to achieve by visualizing the dataset, and can that goal be achieved more effectively through other means?

Even when we have a dataset that we think should be visualized, the right visualization to use isn’t immediately apparent. It shouldn’t be! We should play around with different visualizations and try different angles of approaching the dataset. We have to explore the dataset visually.⁴ To aid in this exploration, we should also have a good understanding of what exactly the dataset is. To do so, we can think about different types of data.

The slides by Alexander Lex³ and Miriah Meyer⁵ provide a good vocabulary for describing data abstractly. Thinking about data in terms of items, attributes, links, positions, and grids will clarify what types of visualizations are better suited for the dataset we are working with. The slides explain and provide examples for each of these terms, as well as how different types of datasets contain different subsets of these components.
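To make the idea of "different subsets of these components" concrete, here is a minimal sketch in Python. The mapping from dataset type to components is my own assumption for illustration (tables as items plus attributes, networks adding links, and so on), not something copied from the slides, and the names `DATASET_COMPONENTS` and `dataset_types_with` are hypothetical.

```python
# Which abstract components each dataset type is built from.
# NOTE: this mapping is an illustrative assumption, not taken from the slides.
DATASET_COMPONENTS = {
    "table": {"items", "attributes"},
    "network": {"items", "links", "attributes"},
    "field": {"grids", "positions", "attributes"},
    "geometry": {"items", "positions"},
}


def dataset_types_with(component):
    """Return the dataset types (sorted by name) that contain a component."""
    return sorted(
        name for name, parts in DATASET_COMPONENTS.items() if component in parts
    )


if __name__ == "__main__":
    for component in ("items", "links", "positions"):
        print(component, "->", dataset_types_with(component))
```

Asking which components a dataset has (does it have links? positions?) is a quick way to rule visualization types in or out before sketching anything.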

Stephen Few’s white paper² provides more details on categorical scales as well as different types of relationships between numbers (e.g. time-series relationships, correlation relationships, ranking relationships). He gives examples and definitions of these relationships and guidelines for how to begin visualizing those types of datasets. He also briefly covers when to use points, lines, and bars. The paper contains a useful chart that suggests which type of graph is best suited for each type of data relationship.

“On the Theory of Scales of Measurement” by S. S. Stevens is a short and useful paper on different scales of measurement (nominal, ordinal, interval, and ratio). A table in the paper gives a succinct description of each scale, followed by a discussion of each type of scale.
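One way to internalize the four scales is to note that they are ordered: each scale permits everything the previous one does, plus more. The sketch below is a simplification under my own assumptions. It lists only one representative statistic per scale (mode, median, mean, coefficient of variation), and the names `Scale` and `permissible_statistics` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four scales of measurement, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# One representative statistic that becomes meaningful at each scale.
# This is a simplified, illustrative subset, not a full reproduction
# of the table in the paper.
STATISTIC_ADDED_AT = {
    Scale.NOMINAL: "mode",
    Scale.ORDINAL: "median",
    Scale.INTERVAL: "mean",
    Scale.RATIO: "coefficient of variation",
}


def permissible_statistics(scale):
    """Statistics are cumulative: a scale inherits those of weaker scales."""
    return [stat for level, stat in STATISTIC_ADDED_AT.items() if level <= scale]


if __name__ == "__main__":
    for scale in Scale:
        print(scale.name.lower(), "->", permissible_statistics(scale))
```

The same ordering is a useful check when choosing an encoding: a mean line over nominal categories, for instance, is a sign that the scale has been misread.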

Chapters 6-13 of Fundamentals of Data Visualization by Claus O. Wilke are very useful. Each chapter covers a different type of dataset (amounts, distributions, proportions, etc.) and provides an in-depth explanation of different ways of visualizing that data type, as well as the advantages and disadvantages of using one visualization type over another. For example, we can use a pie chart or a bar chart to visualize proportional data, and the proper choice depends heavily on the context, the dataset, and our visualization goal. Pie charts provide a better sense of a part compared to the whole, while bar charts provide a better sense of the parts compared to each other.

Ben Shneiderman’s paper “The Eyes Have It” describes seven data types (1D, 2D, 3D, temporal, multi-dimensional, tree, and network) as well as seven tasks (overview, zoom, filter, details-on-demand, relate, history, and extract). We should consider data types in the context of what we want viewers to be able to do with the data when it is presented. This paper is a good basis for thinking about how viewers will interact with our visualization.

While Shneiderman’s mantra (“overview first, zoom and filter, then details-on-demand”) seems specific to interactive visualizations, it raises important questions for static visualizations too. Which part of the data should we zoom in on, and what details should be filtered out? In a static visualization, these answers are decided by the designer, so the designer must be very conscious of the choices they make.
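To see what these tasks mean in terms of the data itself, here is a minimal sketch of four of them (overview, zoom, filter, details-on-demand) as plain Python functions. Everything in it is hypothetical: the `Item` record, the made-up values, and the function names are mine, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    year: int
    value: float


# Made-up records, purely for illustration.
ITEMS = [
    Item("a", 2001, 3.0),
    Item("b", 2001, 7.5),
    Item("c", 2002, 1.5),
    Item("d", 2003, 9.0),
]


def overview(items):
    """Overview: summarize the whole collection."""
    values = [item.value for item in items]
    return {"count": len(items), "min": min(values), "max": max(values)}


def zoom(items, first_year, last_year):
    """Zoom: narrow the view to a range of interest."""
    return [item for item in items if first_year <= item.year <= last_year]


def filter_out(items, is_uninteresting):
    """Filter: drop the items the viewer does not care about."""
    return [item for item in items if not is_uninteresting(item)]


def details_on_demand(items, name):
    """Details-on-demand: return the full record for one selected item."""
    return next((item for item in items if item.name == name), None)


if __name__ == "__main__":
    print(overview(ITEMS))
    print(zoom(ITEMS, 2001, 2001))
    print(filter_out(ITEMS, lambda item: item.value < 5))
    print(details_on_demand(ITEMS, "c"))
```

In an interactive visualization the viewer supplies the arguments to these functions; in a static one the designer fixes them ahead of time, which is exactly why those choices deserve attention.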

Misc / Resources

The visualization wheel⁶ is a conceptual device for thinking about an information visualization; the main idea is that a visualization must reach some sort of balance between the amount of information presented and its intelligibility. Cairo identifies several spectrums along which a visualization can fall (e.g. Density-Lightness and Abstraction-Figuration). These are useful guidelines to think about, especially in relation to the intended audience.

“The art and science of the scatterplot” by Sara Kehaulani Goo is a short article on scatterplot literacy. It doesn’t go into much depth, but the study it describes suggests that scatterplots are difficult to read without prior experience. Level of education seems to be a big factor in whether a viewer can correctly interpret a scatterplot.

References

1. The Visual Display of Quantitative Information by Edward Tufte
2. Effectively Communicating Numbers by Stephen Few
3. Slides by Alexander Lex
4. Data Visualization by Kieran Healy
5. Slides by Miriah Meyer
6. The Functional Art by Alberto Cairo, Chapter 3