
My personal attitude towards data – ethics in data storytelling.

On September 26, 1983, in the middle of the Cold War, Soviet lieutenant colonel Stanislav Petrov was on duty at the command centre of the nuclear early-warning system. The system reported that six missiles had been fired from the US toward the USSR. Based on the information provided, Petrov had to decide whether the alarm was true or false, and whether or not to obey orders. After minutes that seemed like an eternity, Petrov judged that it was a false alarm and saved the world from a third world war, almost certainly a nuclear one. A later investigation revealed that the system had malfunctioned.

But what kind of world would we live in now if Petrov had not considered other explanations for the system’s response? With that historical event in mind, can we trust any information without a doubt?

As data analysts or data storytellers, we are like a nuclear early-warning system. We provide people with the information they need to make critical decisions and shape the future. It is a very responsible role.

Why is it so hard not to lie with data?

Does it sound controversial? I believe so. Does it sound realistic? For sure. Why do I think so? Are you confident that you know all aspects of a subject that you want to present to others? Have you considered all possible options and looked at them from the perspective of every stakeholder involved? Are you sure that the data set covers a long enough time period and that the data quality is high? There are more questions than answers. So, tell me: which version of the truth are you holding in your visualizations?

I am not accusing anybody of misleading people on purpose. Most of the time when we prepare data analyses and data visualizations to communicate information, we have pure intentions. The point is that we hold biases and beliefs, and our brain draws on previous experiences and constantly makes unconscious assumptions. All of that influences our thoughts and perception.

Harmful data visualization

Let’s do a mental exercise and think together about how harmful data visualization can be. Currently, I’m reading an exciting book, “How Charts Lie”, by Alberto Cairo, one of the most recognizable authors in the information visualization domain. In one of the chapters, there is a story about the nationalist Dylann Roof, who killed several African Americans after being influenced by charts that presented crime numbers by ethnic origin. That shocked me and opened my eyes to the potential consequences of distributing misleading visual representations of data.

That warning is aimed mostly at data journalists and other people who juggle data publicly, often to get more votes or support, or to influence the way a particular audience thinks. However, even in the business environment, we must be cautious not to make the same mistakes, because the results can be catastrophic and have a real impact on people. All of us should also remember that when we share any data on social media or on other web pages.

The potential negative impact of flawed analysis and poor data visualizations:

  • Hundreds of people can lose their jobs,
  • A profitable business sector can be shut down,
  • The launch of a new product can miss the target,
  • Thousands or even billions of people can be put at risk by the release of a new drug.

This vulnerability is real because people who make decisions make history. There is always a human factor in any success or failure.

Do you feel like an influencer?

Some time ago I had a lot of fun preparing and sharing data visualizations. But currently, I’m not so eager to do that. I don’t have enough confidence in the data that is available, and I don’t have enough time to dive into a specific subject, understand it, and carry out analyses and investigations.

In upcoming posts, I’ll focus on ethics from a data visualization point of view. The first topic is data range.

Data range

Insights can differ greatly when the data scope changes. Anyone who holds shares on the stock market knows that, depending on the selected time range, he or she can observe positive or negative trends. We can run into the same cognitive dissonance when presenting data within our organization. Maybe in the last two years we achieved tremendous revenue growth, but looking at revenue from a longer perspective, it can turn out that we have only come closer to the results from the financial crisis (pick your favourite one as an example, they come and go periodically).

Figure 1 depicts the kind of understanding and feeling an investor can have when looking at the same data over different ranges. The left chart can indicate that results are declining, but when we look at the right one, we can see that over the longer perspective the trend is positive.

Figure 1
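To see how much the chosen range drives the message, here is a minimal sketch in Python. The revenue numbers are invented purely for illustration, and `percent_change` is a hypothetical helper, not part of any library.

```python
def percent_change(values):
    """Relative change from the first to the last value, in percent."""
    first, last = values[0], values[-1]
    return (last - first) / first * 100.0


# Hypothetical yearly results, invented only for this illustration.
revenue = [100, 120, 150, 140, 130]

full_range = percent_change(revenue)    # the whole history
recent = percent_change(revenue[-3:])   # only the last three data points

print(f"Full range:    {full_range:+.1f}%")  # prints +30.0%
print(f"Recent window: {recent:+.1f}%")      # prints -13.3%
```

With these made-up numbers, the same series tells a story of growth over the full range and a story of decline over the recent window, which is the tension between the two charts in Figure 1.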

Of course, our narrative can be built around the latest two years of growth, but we shouldn’t hide information from the bigger picture. The approach in such a case should be to display the bigger picture first, with a longer period of data, and then zoom in on the last two years to present the factors behind the recent revenue growth.

Another example, notoriously common when presenting voting results, is showing people’s support for particular parties while treating only the people who voted as the full population. When I listen to the news in the mass media, people often refer to the election results without considering the voter turnout. That narrative skews reality. Let’s look at the example below. Figure 2 shows the result of the latest presidential election in Poland. What will most people remember from the chart? That Duda won and had more than 50% of public support.

Figure 2

But this is not true! The real public support for Duda was 34.49% if we consider the voter turnout. The voter turnout in this election was 68.18%, which means that 31.82% of Poles didn’t vote. I would love to see charts in the mass media that present the entire election results, including those who didn’t vote. Then we would have the complete picture of people’s political preferences. However, I still see a truncated data scope.
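The arithmetic behind this correction can be sketched in a few lines of Python. The turnout is the figure quoted above. The 51.0% vote share is only a placeholder standing in for "more than 50%", not the official result, and `share_of_eligible` is a hypothetical helper. Because the placeholder is rounded, and because turnout figures typically also include invalid ballots, the result only approximates the 34.49% quoted above.

```python
def share_of_eligible(vote_share_pct, turnout_pct):
    """Approximate support among all eligible voters, in percent.

    vote_share_pct: share among people who voted
    turnout_pct:    share of eligible voters who took part
    """
    return vote_share_pct * turnout_pct / 100.0


turnout = 68.18               # voter turnout quoted in the text
non_voters = 100.0 - turnout  # share of eligible voters who stayed home

winner_share = 51.0           # ASSUMPTION: placeholder for "more than 50%"

print(f"Did not vote: {non_voters:.2f}%")
print(f"Support among all eligible voters: "
      f"{share_of_eligible(winner_share, turnout):.1f}%")
```

The point of the sketch is the denominator: as soon as non-voters are part of the population, a majority among voters becomes roughly a third of all eligible citizens.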

Figure 3

By manipulating the data range, whether as a timeline or as included/excluded categories, we tell different stories about data and evoke different understandings and feelings about the subject in our audience. Let’s keep that in mind, so that the most objective view possible does not get lost in translation.

Embrace diversity – how to design data visualizations for people with visual impairments.

Have you ever thought that it is possible to discriminate against people through data visualization design? Several years ago, it sounded strange to me too, but indeed, it can happen unconsciously if you are not aware of the topic.

Discrimination is most often associated with skin colour, gender, age, religious beliefs, or nationality. However, this negative social phenomenon can have a much broader spectrum. One of its forms, not at all intuitive, lies in data visualization practices. The topic is gaining importance as more and more data is used to explain global processes, and those with difficulties in that area are being left behind. It may not be simple, but the onus is on the data community and data visualization practitioners to develop new best practices for communicating data in a more democratic way, with those people in mind.

To make data visualization accessible to a wider audience, we can improve it along three dimensions: vision, cognitive and learning difficulties, and motor capabilities. The basic, obvious difficulty is related to vision impairments, but the degree of impairment is key. I will not discuss the most severe degree, which is blindness (this is a topic for a different post), but I will take a closer look at colour-blindness and low vision.

COLOUR BLINDNESS

In data visualization, colour is the most important communication channel. The ability to see and understand the meaning of colours helped our ancestors to survive in deep jungles or on savannas. Colour informed them about non-toxic food or allowed them to spot predators in the forest.

Today, we are still sensitive to colours, and these natural reactions are used in many ways. For instance, most warning signals use red, because we naturally associate it with danger or action (red is the colour of blood)[1]. Studies show that prolonged exposure to red can cause the heart rate to accelerate as a result of activating the “fight or flight” instinct[2]. In contrast, blue has a calming effect.

However, not everyone can see colours. Approximately 10% of the human population has trouble seeing colours correctly. If you would like to deepen your knowledge about types of colour-blindness, please check the website. There you can learn about the causes of colour-blindness, test yourself, and find a tool to check whether a prepared visualization is in line with best practices.

There are several basic principles that improve your colour palette and make a visualization readable for a broader audience. To understand them, we need to understand two important colour properties: hue and saturation. Hue defines colour in terms of pink, blue, yellow, or magenta. Saturation is nothing more than the intensity of the colour. By juggling these two properties we can improve or worsen the results of our work.

RED-GREEN

First of all, stop using the red-green palette, which is confusing or even unrecognizable to colour-blind people. This is my humble recommendation. For most people with colour vision difficulties, red and green look the same (see Picture 1).

Picture 1

Most modern data visualization tools, such as Tableau or Power BI, already offer colour palettes that address this issue. Both tools also have an option to create custom compositions and upload them to the application (custom colour palettes for Tableau and Power BI).

If you are wondering about the right colour palettes, check out the ones presented in Picture 2 and Picture 3. They are nice, clean, and fancy and will work for any report.

Picture 2 – Vivid & Energetic
Picture 3 – Elegant & Sophisticated

CONFUSING COLOUR PAIRS

Even when we avoid the red-green colour range, there are still other pairs that cause similar problems. In recent years I have been observing the dizzying career of the grey-blue duet. I like this combination as well; however, it is essential to match the two colours wisely (see Picture 4).

Picture 4

MONOCHROMATIC SCALE

Sometimes the best option is to simply stick with one colour and play with its saturation to differentiate specific categories or data ranges (see Picture 5). This approach can be used in most visualizations.

Picture 5
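A single-hue scale like this can be generated programmatically. The sketch below uses Python's standard `colorsys` module and the HSV colour model, which is an assumption on my part since the post does not name a specific colour model. `monochrome_scale` is a hypothetical helper, and the hue and brightness values are arbitrary.

```python
import colorsys


def monochrome_scale(hue, steps, value=0.9):
    """Return `steps` hex colours that share one hue and differ in saturation.

    hue and value are fractions between 0 and 1 (HSV colour model).
    The list runs from the palest colour to the fully saturated one.
    """
    colours = []
    for i in range(1, steps + 1):
        saturation = i / steps
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


# ASSUMPTION: hue 0.6 (a blue) and five steps are arbitrary example values.
palette = monochrome_scale(hue=0.6, steps=5)
print(palette)
```

Each colour in the list can then be assigned to one category or data range, from the lowest to the highest.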

You can find more practical colour ranges here, and if you would like to test your composition on specific charts, use this website.

SHAPE

Another interesting channel we can use to help visually impaired people easily distinguish between encoded data is to assign shapes to different data categories. A good example of how introducing shapes can make a difference is the well-known RAG.

RAG stands for RED-AMBER-GREEN and is widely used in the business environment to communicate performance, risks, or the status of activities. It is most commonly used in project management to report the status of tasks, but due to its simplicity, it is also used in data visualization to highlight, for instance, the performance of KPIs (key performance indicators). Red indicates underperformance, amber indicates that there is an issue that needs to be monitored, and green indicates that everything is fine.

But as you already know, RED-GREEN can be very confusing for colour-blind people. So, my suggestion is to use shape as another visual communication channel to make sure everyone is on the same page. Instead of a format with a coloured background, it would be better to introduce icons that have different shapes and are coloured red, amber, or green (see Picture 6).

Picture 6

But what about charts like line charts or bar charts? How can we improve the distinction between specific lines or bars? We can use different patterns to distinguish one bar from the rest or to present several lines on one chart (see Picture 7).

Picture 7

WRITTEN INSIGHTS

Written descriptions, recommendations, or insights can be tricky, especially when you want to use colour names to emphasise certain points, data categories, or issues. How can someone who does not see green (see Picture 1) understand the message “All departments represented by green bars have exceeded their sales targets this year”? This message must be rewritten as “Departments A, B, and C have exceeded their sales targets this year” to ensure that all stakeholders understand it.

LOW VISION

In addition to colour blindness, the most recognizable challenge in data visualization design, there is another one related to vision loss due to age, accidents, or genetics. For those who suffer from low vision, we must remember that the size and contrast of displayed text matter. This is especially true when we display materials on screens in conference rooms, but size matters even when you present something via communication tools such as Teams or Zoom. You can read more about the topic here.

SIZE

When it comes to font size, there is no single good recommendation. It depends on the purpose. If you are going to display materials in a large conference room, it is better not to use fonts smaller than 18 when describing axes or the legend, and to put less information on each slide. There is nothing wrong with having more slides that are readable rather than fewer that are cluttered.

A different approach can be taken when creating reports. I would say use a font size of 9 or 10 for axis or legend descriptions, but in no other case should you go lower than 12. In reports, the crucial thing is to group information together or display it in close proximity to make it easier to interpret and to make decisions. That is why optimizing space is so important. These screens can always be enlarged, and anyone can take advantage of that.

Picture 8

CONTRAST

The general rule is to maintain high contrast between background and foreground (e.g. white – black, black – white). A typical accessibility barrier for people with low contrast sensitivity is grey text or figures on a light background. However, for some people a lower-contrast combination works better, because a bright background is uncomfortable for them (e.g. they have to switch the screen background to a darker one to be able to read what is on the screen).

As you can see, there is no single best answer to this challenge. A good practice is to give people the option to change the display mode from bright (light background and dark foreground) to dark (dark background and light foreground).

Picture 9
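One way to put a number on contrast is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG). This post does not refer to WCAG, so treat it as an outside yardstick brought in for illustration. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio; the function names are my own, and the grey value is an arbitrary example.

```python
def _linearise(channel):
    """Convert one 8-bit sRGB channel (0-255) to its linear-light value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour as defined in WCAG 2.x."""
    r, g, b = (_linearise(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colours, from 1 (identical) up to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
LIGHT_GREY = (153, 153, 153)  # ASSUMPTION: arbitrary example of grey text

print(f"Black on white:      {contrast_ratio(BLACK, WHITE):.1f}:1")
print(f"Light grey on white: {contrast_ratio(LIGHT_GREY, WHITE):.1f}:1")
```

WCAG level AA asks for a ratio of at least 4.5:1 for normal-sized text. Black on white reaches the maximum of 21:1, while the light grey in this example falls below the 4.5:1 threshold. Because the ratio is symmetric, the same check works for the dark display mode described above.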

With these small changes, we bring a better user experience to our organizations, or to a wider public if we prepare data visualizations for the media or other public use.

[1] https://rochester.edu/news/show.php?id=3856

[2] https://journals.sagepub.com/doi/full/10.1177/2158244014525423

Time orientation

Time orientation is crucial in the modern world for understanding events and drawing the correct conclusions.

Pre-industrial cultures were not so tied to time, and most often people perceived time in cycles, such as the day cycle or the cycle of seasons. However, industrialization forced us to create precise time systems and turned the circular view of time into a linear one.

Currently, the majority of people live within time, and for most of us this time has one orientation, from left to right, and cannot be reversed. It is one of the human heuristics, a mental shortcut that helps us understand the world.

The example

Data visualization best practices tell us to display time on the x-axis with a left-to-right orientation (in most cultures, with exceptions such as Middle Eastern ones) and not to play with it, especially when charts are going to be displayed only briefly. At the end of August, Polish public TV presented a chart of unemployment rates (see image below) with all possible misleading characteristics. I cannot tell if it was intentional or not, and politics are not the topic of this post, but let’s have a closer look at how this chart is designed and why it is designed wrong.

I mentioned above that the human mind craves mental shortcuts. A quite possible scenario in this case is that the viewer reads only the label of the first bar from the left on the x-axis, and understands and remembers that the x-axis shows months of 2020, starting from July (Lipiec 2020). The automatic interpretation would be that the next two bars represent data for the two following months, that is, August 2020 and September 2020. Of course, someone could object here, “We don’t have data for September yet”, but my question is: what is the level of general data literacy and competency within society? I would go even further and ask whether it is ethical to show a data visualization for a short time without a proper explanation of the graph. But that is a topic for another post. Going back to our example, the conclusion a viewer can draw is that the unemployment rate has decreased, whereas the reality is exactly the opposite.

However, let’s play devil’s advocate and consider whether we can take a creative approach to presenting a timeline. As I mentioned above, human eyes are used to interpreting a timeline from left to right. Because of that, it is good to keep that order. Sometimes we are tempted to change it because, for example, we would like to compare the year-over-year change and use last year’s data as a benchmark. However, that way of presenting data will not be intuitive for viewers. We must be very careful when we are dealing with data associated with time.

How to fix it?

So how can we fix this visualisation?

First of all, let’s break the years into two separate columns and put the time in the proper order. By adding columns with years, we clearly indicate that we are dealing with two different time stamps. A title or a subtitle can help us emphasise that we are presenting a comparison between time points (July 2019 to July – June 2020), so don’t hesitate to include one. I also decluttered the visualisation by removing the background colour and 3D effects, which helps viewers focus only on the data. To highlight the most current bar, I changed its colour to orange.

All those changes made it possible to present the data story professionally and properly. Apart from all the aesthetic aspects, data visualisation designers need to remember about ethics. As in other professions, data visualisation designers have their code of conduct.