
My personal attitude towards data – ethics in data storytelling.

On September 26, 1983, in the middle of the Cold War, Soviet lieutenant colonel Stanislav Petrov was on duty at the command centre of the nuclear early-warning system. The system reported that five missiles had been fired from the US toward the USSR. Based on the information provided, Petrov had to decide whether the alarm was real or false, and whether to obey orders. After minutes that seemed like an eternity, Petrov judged that it was a false alarm and saved the world from a third world war, which would certainly have been nuclear. The later investigation revealed that the system had malfunctioned.

But what kind of world would we live in now if Petrov had not considered that the system's response might be wrong? With that historical event in mind, can we trust any information without a doubt?

As data analysts or data storytellers, we are like a nuclear early-warning system. We provide people with the information they need to make critical decisions and shape the future. It is a very responsible role.

Why is it so hard not to lie with data?

Does it sound controversial? I believe so. Does it sound realistic? For sure. Why do I think so? Are you confident that you know every aspect of the subject you want to present to others? Have you considered all possible options and looked at them from the perspective of every stakeholder involved? Are you sure that your data set covers a long enough time period and that its quality is high? There are more questions than answers. So tell me: which version of the truth are you holding in your visualizations?

I am not accusing anybody of misleading people on purpose. Most of the time, when we prepare data analyses and data visualizations to communicate information, we have pure intentions. The problem is that we hold biases and beliefs, our brains draw on previous experiences, and we constantly make unconscious assumptions. All of that influences our thoughts and perception.

Harmful data visualization

Let's do a mental exercise and think together about how harmful a data visualization can be. I'm currently reading an exciting book, "How Charts Lie", by Alberto Cairo, one of the most recognizable authors in the information visualization field. One of its chapters tells the story of white nationalist Dylann Roof, who murdered nine African Americans after being influenced by charts that presented crime figures by ethnicity. That shocked me and opened my eyes to the potential consequences of distributing misleading visual representations of data.

That warning applies mostly to data journalists and other people who work with data publicly, often to win votes or support, or to influence how an audience thinks. However, even in a business environment we must be careful not to make the same mistakes, because the results can be catastrophic and have a real impact on people. And all of us should remember this whenever we share data on social media or other websites.

Potential negative impacts of flawed analysis and poor data visualization:

  • Hundreds of people can lose their jobs,
  • A profitable business sector can be shut down,
  • A new product launch can miss its target,
  • Thousands or even billions of people can be put at risk by the release of a new drug.

This vulnerability is real because people who make decisions make history. There is always a human factor in any success or failure.

Do you feel like an influencer?

Some time ago, I had a lot of fun preparing and sharing data visualizations. Currently, I'm not so eager to do that. I don't have enough confidence in the available data, and I don't have enough time to dive into a specific subject, understand it, and carry out the analysis and investigation it requires.

In upcoming posts, I'll focus on ethics from a data visualization point of view. The first topic is data range.

Data range

Insights can differ greatly when the data scope changes. Anyone who owns shares on the stock market knows that, depending on the selected time range, they can see either a positive or a negative trend. We can create the same cognitive dissonance when presenting data within our organization. Maybe in the last two years we achieved tremendous revenue growth, but when we look at revenue over a longer period, it may turn out that we have only climbed back close to the results we had during the financial crisis (pick your favourite one as an example; they come and go periodically).

Figure 1 shows how differently an investor can understand and feel about the same data viewed over different ranges. The left chart suggests that results are declining, but the right one shows that over the longer term the trend is positive.

Figure 1

Of course, we can build our narrative around the last two years of growth, but we shouldn't hide the bigger picture. In such a case, the right approach is to show the bigger picture first, displaying a longer period of data, and then zoom in on the last two years to present the drivers of recent revenue growth.
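To make the range effect concrete, here is a minimal sketch that estimates the trend of the same series over the full history and over a zoomed-in window, so both numbers can be reported side by side. The revenue figures and the helper name `trend_slope` are hypothetical, chosen only to mirror the situation in Figure 1, and the trend is measured with an ordinary least-squares slope, which is just one reasonable choice.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    xs = range(n)
    mean_x = (n - 1) / 2            # mean of 0..n-1
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical yearly revenue (illustrative numbers only, not real data).
revenue = [10, 12, 14, 16, 18, 20, 22, 24, 23, 22]

full_range = trend_slope(revenue)        # the bigger picture: whole history
last_years = trend_slope(revenue[-3:])   # the zoomed-in window

print(f"Trend over {len(revenue)} years: {full_range:+.2f} per year")
print(f"Trend over the last 3 years: {last_years:+.2f} per year")
```

With these numbers the full history shows growth of about +1.55 per year, while the last three years alone show -1.00 per year. Reporting both values together, instead of picking the one that fits the story, is the code equivalent of showing the bigger picture before zooming in.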

Another example, notoriously common in reporting election results, is presenting public support for particular parties or candidates while treating only the people who voted as the full population. When I follow the news in the mass media, commentators often refer to election results without considering voter turnout. That narrative skews reality. Let's look at an example. Figure 2 shows the results of the latest presidential election in Poland. What will most people remember from the chart? That Duda won and had more than 50% of public support.

Figure 2

But this is not true! Once we take voter turnout into account, Duda's real public support was 34.49% of all eligible voters. Turnout in this election was 68.18%, which means that 31.82% of eligible Poles did not vote. I would love to see charts in the mass media that present the entire election result, including those who didn't vote. Then we would have the complete picture of people's political preferences. However, I still see a truncated data scope.

Figure 3
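The difference between the two views is simply a matter of which denominator we divide by. Below is a minimal sketch that computes both the headline view (share of valid votes) and the full view (share of all eligible voters, with invalid ballots and non-voters as their own categories). The electorate of 1,000 people and the vote counts are hypothetical, not real Polish election data, and the function names are made up for illustration.

```python
def share_of_valid_votes(valid_votes: dict[str, int]) -> dict[str, float]:
    """Percentage of valid votes per candidate: the view most headline charts show."""
    total_valid = sum(valid_votes.values())
    return {name: 100 * n / total_valid for name, n in valid_votes.items()}


def share_of_electorate(valid_votes: dict[str, int], invalid: int, eligible: int) -> dict[str, float]:
    """Percentage of ALL eligible voters, keeping invalid ballots and non-voters as categories."""
    cast = sum(valid_votes.values()) + invalid
    if cast > eligible:
        raise ValueError("more ballots cast than eligible voters")
    shares = {name: 100 * n / eligible for name, n in valid_votes.items()}
    shares["invalid ballots"] = 100 * invalid / eligible
    shares["did not vote"] = 100 * (eligible - cast) / eligible
    return shares


# Hypothetical electorate (illustrative numbers only, not real election data).
eligible = 1_000
valid_votes = {"Candidate A": 350, "Candidate B": 320}
invalid = 10

headline = share_of_valid_votes(valid_votes)
full_picture = share_of_electorate(valid_votes, invalid, eligible)

for name, pct in headline.items():
    print(f"{name}: {pct:.2f}% of valid votes")
for name, pct in full_picture.items():
    print(f"{name}: {pct:.2f}% of eligible voters")
```

In this made-up case, Candidate A "wins with 52.24%" in the headline view, yet is backed by only 35.00% of the electorate, while 32.00% did not vote at all. Because every eligible voter lands in exactly one category, the full-picture shares always add up to 100%.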

By manipulating the data range, whether the timeline or the categories we include or exclude, we tell different stories with the same data and evoke different understandings and feelings about the subject in our audience. Let's keep that in mind so that the most objective view possible doesn't get lost in translation.