The Fundamentals of Data Visualization
An overview of the fundamentals of data visualization
This chapter is constantly being updated by the analytics consultants at Supertype to stay current with the latest trends in enterprise data management. Please check back regularly for updates.
Key Principles of Data Visualization
Effective data visualization rests on a set of core principles that ensure visualizations communicate insights clearly to the intended audience.
Adhering to these principles makes visualizations more useful: the data is conveyed accurately, and it is also accessible and actionable for your audience.
Edward Tufte created a comparison table between friendly and unfriendly visualizations. This table serves as a useful guide for understanding the principles of effective data visualization and can help in crafting better visualizations.
Friendly | Unfriendly |
---|---|
words are spelled out, mysterious and elaborate encoding avoided | abbreviations abound, requiring the viewer to sort through text to decode abbreviations |
words run from left to right, the usual direction for reading occidental languages | words run vertically, particularly along the Y-axis; words run in several different directions |
little messages help explain data | graphic is cryptic, requires repeated references to scattered text |
elaborately encoded shadings, cross-hatching, and colors are avoided; instead, labels are placed on the graphic itself; no legend is required | obscure codings require going back and forth between legend and graphic |
graphic attracts viewer, provokes curiosity | graphic is repellent, filled with chartjunk |
colors, if used, are chosen so that the color-deficient and color-blind (5 to 10 percent of viewers) can make sense of the graphic (blue can be distinguished from other colors by most color-deficient people) | design insensitive to color-deficient viewers; red and green used for essential contrasts |
type is clear, precise, modest; lettering may be done by hand | type is clotted, overbearing |
type is upper-and-lower case, with serifs | type is all capitals, sans serif |
Matching Data to Visualization Types
Selecting the right chart or graph is an art in its own right. Charts are not just a way to make data visually appealing; they are powerful tools that can illuminate insights, reveal trends, and tell compelling stories. Understanding the strengths and purposes of various chart types will help you craft visuals that strengthen the delivery of your core message.
Imagine you’re a detective piecing together clues from a complex case. Each piece of evidence needs to be examined through the right lens to uncover the truth. Similarly, choosing the right visualization is about finding the perfect lens to view your data.
Type of Visualization
Bar Chart
Function: Comparing different categories or groups.
Purpose: Used for comparing values across categories.
Example: A company uses a bar chart to compare quarterly revenue across different regions. Each bar represents a region, and the height of the bar represents the revenue for that quarter.
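As a rough sketch of the bar chart example, the snippet below renders revenue by region as a text bar chart using only the Python standard library. The region names and revenue figures are invented for illustration, and `render_bar_chart` is a hypothetical helper rather than part of any plotting library. It assumes all values are positive.

```python
def render_bar_chart(values, width=40):
    """Return a text bar chart whose bar lengths are proportional to each value.

    Bars are measured from zero, so relative lengths match relative values.
    Assumes at least one value and that the largest value is positive.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical quarterly revenue per region, in thousands of USD.
quarterly_revenue = {"North": 120, "South": 90, "East": 150, "West": 60}
print(render_bar_chart(quarterly_revenue))
```

Because every bar starts at zero, a region with half the revenue gets a bar half as long, which is the property that makes bar charts easy to compare at a glance.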
Line Chart
Function: Showing trends over time.
Purpose: Ideal for illustrating trends and changes over time.
Example: A weather service uses a line chart to track temperature changes over a year. The x-axis represents months, and the y-axis represents temperature. The line connects the data points for each month, showing the trend.
Pie Chart
Function: Showing proportions and percentages of a whole.
Purpose: Used to illustrate the parts of a whole, showing percentages and proportional data.
Example: A non-profit organization uses a pie chart to show the percentage breakdown of their budget allocation, illustrating how funds are distributed across different programs.
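A pie chart is only as good as the shares behind it. As a minimal sketch, assuming a hypothetical budget dictionary (the program names and amounts are made up), the shares for each slice could be computed like this:

```python
def percentage_shares(amounts):
    """Convert absolute amounts into percentage shares of the total."""
    total = sum(amounts.values())
    return {name: round(100 * amount / total, 1) for name, amount in amounts.items()}


# Hypothetical budget allocation per program, in USD.
budget = {"Education": 50_000, "Health": 30_000, "Outreach": 15_000, "Admin": 5_000}
shares = percentage_shares(budget)
for program, share in shares.items():
    print(f"{program}: {share}%")
```

With other figures, rounding each share independently can leave the total slightly above or below 100, which is worth checking before labelling slices.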
Scatter Plot
Function: Showing relationships between two numerical variables.
Purpose: Useful for identifying correlations and patterns between two variables.
Example: A researcher uses a scatter plot to examine the relationship between study hours and test scores among students. Each dot represents a student, with study hours on the x-axis and test scores on the y-axis.
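The pattern a scatter plot shows visually can also be summarized numerically. The sketch below computes the Pearson correlation coefficient for a small, invented set of study hours and test scores; `pearson_r` is a hypothetical helper written out by hand so the calculation is visible. It assumes both lists have the same length and that neither is constant.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical data: one entry per student.
study_hours = [1, 2, 3, 4, 5, 6]
test_scores = [52, 58, 61, 70, 74, 83]
r = pearson_r(study_hours, test_scores)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 corresponds to dots that rise along a nearly straight line, and a value close to -1 to dots that fall along one. A correlation alone does not show that one variable causes the other.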
Histogram
Function: Displaying the distribution of a continuous data set.
Purpose: Shows the frequency distribution of a continuous dataset, helping to identify patterns and outliers.
Example: A fitness trainer uses a histogram to show the distribution of clients’ ages. The x-axis represents age ranges, and the y-axis represents the number of clients in each range.
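Behind every histogram is a binning step: each value is assigned to a range, and the ranges are counted. The following sketch illustrates that step for a made-up list of client ages, assuming whole-number ages and equal-width bins; `bin_counts` is a hypothetical helper.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count values per bin, where each bin covers [start, start + bin_width)."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical client ages.
client_ages = [22, 25, 31, 34, 35, 38, 41, 42, 47, 53, 58, 64]
bin_width = 10
for start, count in bin_counts(client_ages, bin_width).items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

The choice of bin width changes the shape of the histogram, so it is worth trying a few widths before settling on one.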
Box Plot (Box-and-Whisker Plot)
Function: Showing the distribution of a dataset based on a five-number summary.
Purpose: Displays the spread and central tendency of a dataset, highlighting the median, quartiles, and potential outliers.
Example: An automobile company uses a box plot to compare the highway mileage of cars across different car types. Each box represents a car type, displaying the spread and central tendency of highway mileage for that type.
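The five-number summary that a box plot draws can be computed directly. This is a minimal sketch using Python's `statistics` module on invented mileage figures; `five_number_summary` is a hypothetical helper. Quartile conventions differ between tools, and the `"inclusive"` method used here is only one of them, so a plotting library may report slightly different quartiles for the same data.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return the minimum, first quartile, median, third quartile, and maximum."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical highway mileage (miles per gallon) per car type.
highway_mpg = {
    "compact": [26, 28, 29, 30, 33],
    "suv": [17, 19, 20, 22, 25],
}
for car_type, mileage in highway_mpg.items():
    print(car_type, five_number_summary(mileage))
```

Each dictionary corresponds to one box: the box spans `q1` to `q3`, the line inside it marks the median, and the whiskers reach toward `min` and `max`.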
Area Chart
Function: Showing cumulative data trends over time.
Purpose: Similar to line charts but with the area under the line filled, useful for emphasizing the magnitude of changes over time.
Example: A music company uses an area chart to show cumulative music sales by format over the past several years. The x-axis represents years, and the y-axis represents sales in USD.
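The cumulative series that such an area chart plots is a running total of the per-period figures. As a small sketch with invented sales numbers per format (the format names and values are assumptions), the running totals can be built with `itertools.accumulate`:

```python
from itertools import accumulate

# Hypothetical sales per period, in thousands of USD, for each format.
sales_per_period = {
    "streaming": [10, 12, 15, 18],
    "vinyl": [3, 4, 4, 6],
}

# Running total per format: each entry is the sum of all periods so far.
cumulative = {fmt: list(accumulate(sales)) for fmt, sales in sales_per_period.items()}

# Top edge of a stacked area chart: cumulative totals summed across formats.
stacked_top = [sum(period) for period in zip(*cumulative.values())]

for fmt, totals in cumulative.items():
    print(fmt, totals)
print("stacked top", stacked_top)
```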
Heat Map
Function: Showing intensity of data across a spectrum.
Purpose: Displays data where individual values are represented by colors, useful for showing the intensity of data and identifying patterns.
Example: A data analyst uses a heat map to show website activity, where the intensity of color represents the number of clicks on different areas of the site.
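To make the heat map example concrete, the sketch below bins click coordinates into a coarse grid and prints each cell as a character whose density stands in for color intensity. The click positions, page dimensions, and the `click_grid` helper are all hypothetical.

```python
def click_grid(clicks, page_width, page_height, cols, rows):
    """Count clicks per grid cell; clicks are (x, y) pixel positions on the page."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in clicks:
        # Clamp so a click exactly on the right or bottom edge lands in the last cell.
        col = min(int(x / page_width * cols), cols - 1)
        row = min(int(y / page_height * rows), rows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical click positions on a 1000 x 800 pixel page.
clicks = [(50, 40), (60, 45), (70, 30), (900, 50), (500, 700), (520, 690), (55, 35)]
grid = click_grid(clicks, page_width=1000, page_height=800, cols=4, rows=2)

# Map each count to a character: denser characters mean more clicks.
shades = " .:*#"
peak = max(max(row) for row in grid)
for row in grid:
    print("".join(shades[round(count / peak * (len(shades) - 1))] for count in row))
```

A real heat map would swap the characters for a color scale, but the underlying step is the same: aggregate values into cells, then map each cell's value to an intensity.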
By understanding the functions and purposes of different chart types, whether you’re dealing with categorical, quantitative, or relational data, you can create visuals that not only capture attention but also enhance comprehension and insight.
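The chart types covered in this section can be condensed into a simple lookup from analytical goal to suggested chart. This is only a sketch: the goal phrases and the `suggest_chart` helper are hypothetical, and real choices also depend on the audience and the shape of the data.

```python
# Goal phrases are illustrative labels summarizing the chart types in this section.
CHART_FOR_GOAL = {
    "compare categories": "bar chart",
    "trend over time": "line chart",
    "parts of a whole": "pie chart",
    "relationship between two variables": "scatter plot",
    "distribution of continuous data": "histogram",
    "five-number summary": "box plot",
    "cumulative trend over time": "area chart",
    "intensity across a spectrum": "heat map",
}


def suggest_chart(goal):
    """Return the suggested chart type for a goal, ignoring case and outer spaces."""
    key = goal.strip().lower()
    if key not in CHART_FOR_GOAL:
        raise ValueError(f"No suggestion for goal: {goal!r}")
    return CHART_FOR_GOAL[key]


print(suggest_chart("Trend over time"))
print(suggest_chart("parts of a whole"))
```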
Effective Visualization Techniques
The effectiveness of a visualization hinges on aligning the type of chart with the nature of the data and the message you want to convey. We’ll also delve into real-world examples and case studies to illustrate how these principles are applied in practice.
Avoiding Common Pitfalls
When creating visualizations, it’s important to avoid common pitfalls to ensure your data is accurately represented and easily interpreted.
Misleading Visuals
- Scale Issues: Manipulating the scale of axes can distort the data. For instance, truncating the y-axis in a bar chart can exaggerate differences between bars.
  - Always use a consistent scale or start the axis at zero to avoid misleading viewers.
- Inappropriate Chart Types: Using a pie chart for data with too many categories can be confusing. Pie charts work best with a small number of categories.
  - For more categories, consider a bar chart instead.
- Cherry-Picking Data: Selecting only a subset of data to support a specific narrative can be misleading.
  - Ensure your visualization represents the full dataset to provide an accurate picture.
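The effect of a truncated y-axis can be quantified. In the sketch below, two invented revenue values differ by 10 percent, and the hypothetical helper `drawn_height_ratio` compares how tall the bars appear relative to each other when the axis starts at zero and when it starts at 90.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn height of bar B to bar A when the y-axis begins at axis_start.

    Assumes both values are greater than axis_start.
    """
    return (value_b - axis_start) / (value_a - axis_start)


# Hypothetical revenue figures: B is 10 percent larger than A.
revenue_a, revenue_b = 100.0, 110.0

honest = drawn_height_ratio(revenue_a, revenue_b)                      # axis starts at 0
truncated = drawn_height_ratio(revenue_a, revenue_b, axis_start=90.0)  # axis starts at 90

print(f"Axis from 0:  bar B is drawn {honest:.1f}x as tall as bar A")
print(f"Axis from 90: bar B is drawn {truncated:.1f}x as tall as bar A")
```

With the axis starting at 90, a 10 percent difference is drawn as a bar twice as tall, which is the exaggeration the scale-issues bullet warns about.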
Overcomplicating with Too Much Data
- Cluttered Visuals: Including too many data series or categories in a single chart can make it hard to interpret. Keep your visualizations focused and use additional charts if necessary to break down complex data.
- Excessive Details: Adding too many details (e.g., excessive grid lines or labels) can overwhelm the viewer. Use clean, simple designs and only include essential information to ensure clarity.
- Complexity Over Clarity: Sometimes, a simple chart (like a basic bar or line chart) can be more effective than a complex visualization (like a 3D chart or interactive dashboard). Prioritize clarity over complexity to ensure your audience can easily understand the data.
Data Storytelling
Traditional education often treats creative storytelling and technical analysis as separate skills, but today’s job market values professionals who excel in both. Data visualization is a prime example of where these two areas intersect, making it a highly sought-after skill in our data-driven world. Data storytelling involves merging solid data with compelling narratives to present insights in a way that resonates with audiences. This approach relies on three essential components: data, narrative, and visuals.
- Data: The raw numbers and facts that provide the foundation of the story.
- Narrative: The context and interpretation that give meaning to the data.
- Visuals: The charts, graphs, and other visual aids that illustrate the data and narrative.
By integrating these elements, data storytelling transforms complex data sets into understandable, memorable, and actionable insights. The goal is to make the data not just accessible, but also engaging and persuasive, turning dry statistics into a compelling story that drives home key points.
Example: Good Data Storytelling
A good data story is built in three steps:
1. Crafting a Data Story
2. Building the Narrative
3. Delivering the Stories
Words, graphics, and tables serve a single purpose: presenting information. As Edward R. Tufte wrote, “What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult - that is, the revelation of the complex.”
The Role of Data Storytelling in Decision-Making
Data storytelling plays a crucial role in decision-making processes across various fields and industries. Here’s how:
- Enhancing Understanding and Clarity
  - Simplifying Complexity: Data storytelling simplifies complex data sets and analytical results, making them more accessible to non-experts. This clarity ensures that decision-makers can easily grasp the insights without getting bogged down in technical details.
  - Contextualizing Data: By providing context, data storytelling helps audiences understand why the data matters and how it applies to their specific situation. This contextualization is essential for making informed decisions.
- Engaging and Persuading Audiences
  - Emotional Connection: Narratives can evoke emotions, making the data more relatable and memorable. This emotional connection can be crucial for persuading stakeholders to take action based on the data.
  - Visual Appeal: Effective visuals can capture attention and highlight key insights, making the data more compelling. Engaging visualizations help keep the audience interested and focused on the main points.
- Driving Action
  - Highlighting Key Insights: Data storytelling helps to pinpoint the most important insights, ensuring that decision-makers are focusing on the critical information. Clear, focused stories make it easier to identify what actions need to be taken.
  - Facilitating Communication: Well-crafted data stories bridge the gap between data analysts and decision-makers. They enable effective communication of data-driven insights across different levels of an organization, ensuring everyone is on the same page.
- Supporting Evidence-Based Decision-Making
  - Building Credibility: By grounding narratives in solid data, data storytelling builds credibility and trust. Decision-makers are more likely to trust and act on insights that are backed by data.
  - Mitigating Bias: Data storytelling can help counteract cognitive biases by presenting a balanced view of the data. It encourages decision-makers to rely on evidence rather than intuition or assumptions.
- Enabling Collaboration and Discussion
  - Shared Understanding: A well-told data story creates a common understanding among stakeholders, facilitating collaborative decision-making. It ensures that everyone involved is working with the same information and insights.
  - Stimulating Discussion: Data stories can stimulate discussions and debates, encouraging a deeper exploration of the data and potential implications. This collaborative approach often leads to more robust and well-rounded decisions.
Author
This chapter was authored by Gerald Bryan, an analytics consultant at Supertype with extensive experience in enterprise AI consulting in Indonesia, having worked with companies such as Adaro Group, Central Bank of Indonesia, Bursa Efek Indonesia, and Toyota Astra Motor. He also developed Sectors (a financial market intelligence platform), where he was responsible for data gathering and the ETL pipelines.
Gerald is a former Apple Developer Academy @Binus Scholar, with one user-centric product available on the App Store. He also holds the Microsoft Certified Data Analyst Associate certification, with a focus on using Power BI for data visualization and storytelling.
Contributors
- Evelyn Ong, project owner at Supertype