knitr::opts_chunk$set(message = FALSE, warning = FALSE) # echo = FALSE



Most people know the famous quote “I only believe in statistics that I doctored myself”. As the amount and availability of data increase tremendously, this statement becomes ever more relevant. The ability to derive useful information from data is both a key and an obstacle for business decision-makers, politicians and ordinary people. The trend is reinforced as social media competes heavily for our attention (e.g. Zuboff 2019) with pictures, facts and stories, while our resources for verification are usually very limited. The good news is that more evidence-based decisions are possible, especially as data become more available (see e.g. DataHub or Dataset Search) and software and hardware limitations become less of a problem. But how do we separate the wheat from the chaff? One step in the right direction is certainly proper data literacy and transparency.


Good readings on how (not) to do statistical analysis are provided, for example, by Reinhart (2015) and Spiegelhalter (2019). The work of Kahneman (2011) and Mousavi and Gigerenzer (2014) revolves around rational decision-making and people’s difficulties with probabilistic reasoning. Analytical tools such as the statistics software R allow powerful data analysis and, in conjunction with R Markdown or Quarto, present results together with the code used. Platforms like R-Bloggers and R Weekly facilitate developments in reproducible data analysis. Such possibilities, together with adequate habits of communication (e.g. Watzlawick 2018; Franconeri et al. 2021), can help cultivate a sound (public) discussion of data-driven decision-making. Being able to do the work yourself provides you with a range of skills, from exploring and visualizing data structures, through predicting likely outcomes for new instances (e.g. Kuhn and Johnson 2013), to assessing causal relationships (e.g. Pearl and Mackenzie 2018).


As a start, let’s see how to transform data into information with just a few lines of R code. The data are provided by the World Happiness Report 2021, whose publisher collects and summarizes data on individual “self-stated” happiness across many countries. We read the data

library(readxl) # provides read_excel()

# data from https://worldhappiness.report/ed/2021/
data_in <- read_excel("DataPanelWHR2021C2.xls")

# show the first rows of the data
head(data_in)

and manipulate it using the popular tidyverse framework for R to allow a regional as well as a developmental perspective on perceived happiness across the world. The design of the graph is inspired by Healy (2018). We select data from 2010 to 2020 and require that a country be observed in at least 5 of these years. The resulting graph, a map of the world’s happiness across space and time, is sorted by each country’s average happiness:

library(tidyverse) # dplyr, forcats, ggplot2
library(viridis)   # provides scale_fill_viridis()

data_in %>%
  filter(year %in% 2010:2020) %>% # look at data from 2010 to 2020
  group_by(`Country name`) %>%
  summarize(happiness_mean = mean(`Life Ladder`), n = n()) %>% # mean happiness by country
  filter(n >= 5) %>% # minimum 5 observations per country
  mutate(rank = rank(-happiness_mean)) %>% # country-specific happiness rank (1 = happiest)
  inner_join(data_in, by = "Country name") %>% # combine with yearly data
  filter(year %in% 2010:2020) %>% # the join brings back all years, so filter again
  mutate(year = as.factor(year),
         `Country name` = fct_reorder(factor(`Country name`), -rank)) %>% # order countries
  ggplot(aes(y = `Country name`, x = year, fill = `Life Ladder`)) + # create graph
  scale_fill_viridis(option = "inferno") +
  geom_tile(colour = "white") +
  theme_minimal(base_size = 8) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Hierarchy and development of happiness", fill = "Happiness score",
       y = "Country", x = "Year")
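To see what the summarize, filter, rank and join steps above do, it can help to run them on a tiny made-up data set. The following is only a sketch: the object `toy` and its values are invented for illustration and merely mimic the column names `Country name`, `year` and `Life Ladder` used in the pipeline.

```r
library(dplyr)

# Hypothetical toy data (not from the World Happiness Report)
toy <- tibble(
  `Country name` = c(rep("A", 5), rep("B", 5), rep("C", 2)),
  year           = c(2010:2014, 2010:2014, 2010:2011),
  `Life Ladder`  = c(7, 7, 7, 7, 7, 4, 5, 4, 5, 4, 6, 6)
)

# One row per country: mean happiness, number of observed years, rank
ranking <- toy %>%
  filter(year %in% 2010:2020) %>%
  group_by(`Country name`) %>%
  summarize(happiness_mean = mean(`Life Ladder`), n = n()) %>%
  filter(n >= 5) %>%                    # country C has only 2 years and drops out
  mutate(rank = rank(-happiness_mean))  # rank 1 = highest mean

# Joining back to the yearly data keeps only the rows of the remaining countries
toy_ranked <- inner_join(ranking, toy, by = "Country name")

ranking
nrow(toy_ranked)
```

In this toy case country C is excluded by the minimum-observation rule, and the join attaches each remaining country's rank to every one of its yearly rows, which is what allows the plot to order the countries.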

Not surprisingly, the Nordic countries are at the top of the list, whereas African and Oriental countries are at the bottom. We see that average happiness is, overall, a relatively stable phenomenon. Among others, the happiest country, Denmark, also saw a decrease in average happiness in 2020. Was this due to the COVID-19 pandemic? Looking at the bottom of the list, we find the second-to-last country, Afghanistan, where happiness declined steadily after a relatively good year in 2010. Looking at changes or extreme values is a potentially fruitful starting point when one is interested in the welfare of nations, for better or worse.
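One possible way to look for such changes is to compute, per country, the difference in the happiness score from one observed year to the next. The sketch below uses invented values in a hypothetical object `toy`. With the real file, `data_in` would presumably take its place, assuming it contains the columns `Country name`, `year` and `Life Ladder`.

```r
library(dplyr)

# Hypothetical toy data (not from the World Happiness Report)
toy <- tibble(
  `Country name` = rep(c("A", "B"), each = 3),
  year           = rep(2018:2020, times = 2),
  `Life Ladder`  = c(7.6, 7.7, 7.5, 3.0, 2.5, 2.4)
)

changes <- toy %>%
  arrange(`Country name`, year) %>%
  group_by(`Country name`) %>%
  # difference to the previous observed row of the same country;
  # if a year is missing, this is not a one-year change
  mutate(change = `Life Ladder` - lag(`Life Ladder`)) %>%
  ungroup() %>%
  filter(year == 2020) %>%
  arrange(change)                       # largest decrease first

changes
```

Such a table only flags where scores moved. Whether a decrease in 2020 was due to the pandemic is a causal question that a simple difference cannot answer.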

What are the reasons that some countries are better off than others? Whereas Diamond (1997) suggests a dynamic geographical determinism, economic theories following Adam Smith emphasize the importance of technological progress, productivity, trade, and functioning and inclusive institutions (Robinson and Acemoglu 2012) as driving forces of long-lasting growth and development. According to Harari (2014), political and religious ideas can have a large impact on the organization and distribution of resources as well. In addition, Nietzsche (1886) formulates a rather internal perspective on how complexes such as desire and fear can shape the development of mankind. In contrast, Seligman (2012) suggests a model of growth based on positive psychology. This vast number of ideas shows that successfully organizing human society is a demanding task. In an era of increased data availability, proper empirical analysis can help to make better-informed decisions, enhancing economic prosperity and allowing people to flourish and to live healthier and happier lives.


Diamond, Jared M. 1997. Guns, Germs, and Steel.
Franconeri, Steven L, Lace M Padilla, Priti Shah, Jeffrey M Zacks, and Jessica Hullman. 2021. “The Science of Visual Data Communication: What Works.” Psychological Science in the Public Interest 22 (3): 110–61.
Garnier, Simon, Noam Ross, Robert Rudis, et al. 2021. viridis: Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679424.
Harari, Yuval. 2014. Sapiens: A Brief History of Humankind.
Healy, Kieran. 2018. “Visualizing the Baby Boom.” Socius 4: 2378023118777324.
Kahneman, Daniel. 2011. Thinking, Fast and Slow. Macmillan.
Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive Modeling. Vol. 26. Springer.
Mousavi, Shabnam, and Gerd Gigerenzer. 2014. “Risk, Uncertainty, and Heuristics.” Journal of Business Research 67 (8): 1671–78.
Nietzsche, Friedrich. 1886. Beyond Good and Evil.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Reinhart, Alex. 2015. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press.
Robinson, James A, and Daron Acemoglu. 2012. Why Nations Fail: The Origins of Power, Prosperity and Poverty. Profile London.
Seligman, Martin EP. 2012. Flourish: A Visionary New Understanding of Happiness and Well-Being. Simon & Schuster.
Spiegelhalter, David. 2019. The Art of Statistics: Learning from Data. Penguin UK.
Watzlawick, Paul. 2018. Wie Wirklich Ist Die Wirklichkeit?: Wahn, Täuschung, Verstehen. Piper ebooks.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2019. readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Profile Books.