#Preliminaries:
knitr::opts_chunk$set(message = FALSE, warning = FALSE) # suppress messages and warnings; add echo = FALSE to hide the code as well
rm(list=ls())
library(readxl)
library(tidyverse)
library(viridis)
Most people know the famous quote “I only believe in statistics that I doctored myself”. As the amount and availability of data grow tremendously, this statement becomes ever more relevant. The ability to derive useful information from data is both a key and an obstacle for business decision-makers, politicians and ordinary people. The trend is reinforced as social media competes heavily for our attention (e.g. Zuboff 2019) with pictures, facts and stories, while our resources for verification are usually very limited. The good news is that more evidence-based decisions are possible, especially as data become easier to obtain (see e.g. DataHub or Dataset Search) and software and hardware limitations become less of a problem. But how do we separate the wheat from the chaff? One step in the right direction is certainly proper data literacy and transparency.
Good readings on how (not) to do statistical analysis are provided, for example, by Reinhart (2015) and Spiegelhalter (2019). The work of Kahneman (2011) and Mousavi and Gigerenzer (2014) revolves around rational decision-making and people’s difficulties with probabilistic reasoning. Analytical tools such as the statistical software R allow powerful data analysis and, in conjunction with R Markdown or Quarto, present results together with the code that produced them. Platforms like R-Bloggers and R Weekly foster developments in reproducible data analysis. Such possibilities, together with adequate habits of communication (e.g. Watzlawick 2018; Franconeri et al. 2021), can help cultivate a sound (public) discussion of data-driven decision-making. Being able to do the work yourself provides you with a range of skills, from exploring and visualizing data structures, through predicting likely outcomes for new instances (e.g. Kuhn and Johnson 2013), to assessing causal relationships (e.g. Pearl and Mackenzie 2018).
As a start, let’s see how to transform data into information with just a few lines of R code. The data are provided by the World Happiness Report 2021, whose publisher collects and summarizes data on individual “self-stated” happiness across many countries. We read the data
#data from https://worldhappiness.report/ed/2021/
data_in <- read_excel("DataPanelWHR2021C2.xls")
#show data:
head(data_in)
and manipulate it using the popular tidyverse framework to allow both a regional and a developmental perspective on perceived happiness across the world. The resulting graph is inspired by Healy (2018). We select data from 2010 to 2020 and require that a country be observed in at least 5 of those years. The graph, a map of the world’s happiness across space and time, is sorted by each country’s average happiness:
data_in %>%
  filter(year %in% 2010:2020) %>% # look at data from 2010 to 2020
  group_by(`Country name`) %>%
  summarize(happiness_mean = mean(`Life Ladder`), n = n()) %>% # mean happiness by country
  filter(n >= 5) %>% # minimum 5 observations per country
  mutate(rank = rank(-happiness_mean)) %>% # country-specific rank, 1 = happiest
  inner_join(data_in, by = "Country name") %>% # combine with yearly data (all years)
  mutate(year = as.factor(year),
         `Country name` = fct_reorder(factor(`Country name`), -rank)) %>% # order countries
  filter(year %in% 2010:2020) %>% # drop the years brought back by the join
  ggplot(aes(y = `Country name`, x = year, fill = `Life Ladder`)) + # create graph
  scale_fill_viridis(option = "inferno") +
  geom_tile(colour = "white") +
  theme_minimal(base_size = 8) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Hierarchy and development of happiness", fill = "Happiness score",
       y = "Country", x = "Year")
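To see what the aggregation part of the pipeline does in isolation, the following is a minimal sketch on a small made-up data frame. The object `toy`, the countries A, B and C, and all of their values are hypothetical and are not taken from the World Happiness Report; only the column names follow the data used above.

```r
library(dplyr)

# Made-up toy data (NOT actual report values), same column names as data_in
toy <- tibble(
  `Country name` = c(rep("A", 6), rep("B", 3), rep("C", 5)),
  year           = c(2010:2015, 2010:2012, 2014:2018),
  `Life Ladder`  = c(7.5, 7.6, 7.4, 7.5, 7.7, 7.6,
                     4.0, 4.2, 4.1,
                     5.0, 5.2, 5.1, 5.3, 5.4)
)

country_rank <- toy %>%
  filter(year %in% 2010:2020) %>%            # keep the period of interest
  group_by(`Country name`) %>%
  summarize(happiness_mean = mean(`Life Ladder`), n = n()) %>%
  filter(n >= 5) %>%                         # B has only 3 observations and is dropped
  mutate(rank = rank(-happiness_mean))       # rank 1 = highest mean happiness

country_rank
```

In this toy example, country B disappears because it is observed in fewer than 5 years, and the remaining countries are ranked by their mean score.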
Not unexpectedly, the Nordic countries are at the top of the list, whereas African and Oriental countries are at the bottom. Overall, average happiness appears to be a relatively stable phenomenon. Denmark, the happiest country in this ranking, is among those whose average happiness decreased in 2020. Was this due to the COVID-19 pandemic? Looking at the bottom of the list, we find Afghanistan in second-to-last place: after a relatively good year in 2010, its happiness decreased steadily. Examining changes or extreme values, both good and bad, is a potentially fruitful starting point when one is interested in the welfare of nations.
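One possible way to look for such changes is to compute the year-on-year difference in the happiness score within each country. The sketch below does this on made-up numbers: the object `toy`, the countries A and B, and their values are hypothetical and are not actual report values. It also assumes that every country has one row per consecutive year; with gaps in the data, the difference would span more than one year.

```r
library(dplyr)

# Made-up toy data (NOT actual report values), same column names as data_in
toy <- tibble(
  `Country name` = rep(c("A", "B"), each = 4),
  year           = rep(2017:2020, times = 2),
  `Life Ladder`  = c(7.6, 7.6, 7.7, 7.5,
                     3.5, 3.2, 2.9, 2.5)
)

changes <- toy %>%
  group_by(`Country name`) %>%
  arrange(year, .by_group = TRUE) %>%                          # sort years within each country
  mutate(change = `Life Ladder` - dplyr::lag(`Life Ladder`)) %>% # difference to the previous row
  ungroup()

# Largest year-on-year drops first (each country's first year has no change)
largest_drops <- changes %>%
  filter(!is.na(change)) %>%
  arrange(change) %>%
  head(3)

largest_drops
```

Applied to the real data, the same steps could replace `toy` with `data_in`, which would make it easier to check whether a drop such as the one in 2020 is unusual compared with earlier years.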
What are the reasons that some countries are better off than others? Whereas Diamond (1997) suggests a dynamic geographical determinism, economic theories following Adam Smith emphasize the importance of technological progress, productivity, trade, and functioning and inclusive institutions (Robinson and Acemoglu 2012) as driving forces of long-lasting growth and development. According to Harari (2014), political and religious ideas can also have a large impact on the organization and distribution of resources. In addition, Nietzsche (1886) formulates a rather internal perspective on how complexes such as desire and fear can shape the development of mankind. In contrast, Seligman (2012) suggests a model of growth based on positive psychology. This vast number of ideas shows that successfully organizing human society is a demanding task. In an era of increased data availability, proper empirical analysis can help us make better-informed decisions that enhance economic prosperity and allow people to flourish and live healthier and happier lives.