Recreating an Economist WDI Chart in R
Background
I really liked this blogpost by Peter Ellis that was recently brought to my attention by everyone’s favourite #rstats tweeter, Mara Averick:
code-through: “Inter-country inequality and the World Development Indicators” by @ellis2013nz https://t.co/zIjgqjPqKc #rstats #dataviz pic.twitter.com/h1sUfO2PPJ
— Mara Averick (@dataandme) July 22, 2017
In the post, Peter recreates some of the charts from Branko Milanovic’s highly acclaimed book ‘Global Inequality: A New Approach for the Age of Globalization’ using World Development Indicator data from the World Bank. It’s a great code-through, so go check-it before I inevitably wreck-it.
I think recreating some of your favourite charts is a great way to practise and develop your data vis skills, from finding the data all the way through to the final visual output. So I thought I’d try to recreate a WDI-based chart that featured in The Economist a few weeks ago…
The Economist
If you’re a fan of data visualisation then you’ve most likely appreciated a chart from The Economist in your time. It’s a brilliant publication that consistently churns out the sort of fact-based journalism we need more of in this era of alternative facts.
Their charts are always enlightening as well as aesthetically on-the-money (and often with hilariously witty titles), so do go and check out their Graphic Detail blog for a daily dose of graphic magic. A recent highlight is this chart of cherry-blossom peak-blooms in Kyoto, Japan with data going all the way back to the year 800!
Today I’m going to try and replicate a slightly more boring chart that caught my eye in a recent article focusing on Liberia and the somewhat paradoxical effect foreign aid investment can have on a developing country.
The chart (below) shows GDP figures for a selection of countries in the years after a significant economic collapse.
I liked this chart not for its aesthetics or novelty (it’s a fairly standard line chart) but for the way the data has been manipulated to allow effective comparison between GDP figures that span a very wide range and cover different periods of time. They achieved this by setting each country’s respective year of collapse as year 0, setting GDP in that year to a base of 100, then showing the changes in GDP from that year relative to the base GDP figure.
Clever stuff. So let’s see if we can do this ourselves using R.
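As a warm-up, the rebasing arithmetic can be sketched on a handful of numbers before touching the full dataset. The figures below are Liberia’s GDP values for 1988 to 1991 as they appear in the data pulled later in this post; the variable names are just for illustration.

```r
# Liberia's GDP (constant 2010 US$) for 1988-1991, the collapse year first
gdp <- c(2391982754, 1754078045, 858956847, 736768416)

# Rebase so that GDP in the collapse year equals 100
index <- gdp / gdp[1] * 100

# Years elapsed since the collapse year (0 for the collapse year itself)
years_since <- seq_along(gdp) - 1

round(index, 2)
```

Every value after the first reads directly as a percentage of collapse-year GDP, which is what makes countries of very different sizes comparable on one axis.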
As you can see from the chart, the data we’re after comes from the World Bank, and as is usually the case with R, we have a package at our disposal that is built for the precise task of calling WDI (World Development Indicators) data from the World Bank API. Thanks to package author Vincent Arel-Bundock for this great tool - check out the package details here.
Getting The Data
Let us begin.
library(tidyverse)
library(WDI)
library(hrbrthemes)
library(gganimate)
As per the WDI documentation, let’s ensure we have the most up to date data to work with, like so:
new_cache = WDIcache()
Now search for the data type we want using a bit of regex:
WDIsearch('gdp.*constant.*2010', cache=new_cache)
## indicator
## [1,] "NY.GDP.MKTP.KD"
## [2,] "NY.GDP.PCAP.KD"
## [3,] "NYGDPMKTPSAKN"
## [4,] "NYGDPMKTPSAKD"
## [5,] "NYGDPMKTPKDZ"
## name
## [1,] "GDP (constant 2010 US$)"
## [2,] "GDP per capita (constant 2010 US$)"
## [3,] "GDP,constant 2010 LCU,millions,seas. adj.,"
## [4,] "GDP,constant 2010 US$,millions,seas. adj.,"
## [5,] "Annual percentage growth rate of GDP at market prices based on constant 2010 US Dollars."
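To see what the search pattern is doing, here is a small base R sketch that applies the same regex to a few indicator names. Two names are copied from the output above; the third is added here only as a contrast that should not match. Case-insensitive matching is an assumption on my part, suggested by the lowercase 'gdp' in the pattern matching 'GDP' in the results.

```r
# Two names copied from the search output above, plus one
# current-dollar name included only as a non-matching contrast
indicator_names <- c(
  "GDP (constant 2010 US$)",
  "GDP,constant 2010 LCU,millions,seas. adj.,",
  "GDP (current US$)"
)

# ".*" allows any run of characters between the three keywords;
# ignore.case = TRUE is an assumption about how the search matches
matches <- grepl("gdp.*constant.*2010", indicator_names, ignore.case = TRUE)

indicator_names[matches]
```

The pattern only requires the three keywords to appear in that order, so it picks up both the headline US$ series and the seasonally adjusted variants.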
The indicator we need is NY.GDP.MKTP.KD. Now we can pull in the GDP data for the selection of countries required using their iso2c codes:
dat <- WDI(indicator = 'NY.GDP.MKTP.KD', country = c('LR', 'UA', 'RW', 'ID', 'ZW', 'GR'),
           start = 1988, end = 2016, cache = new_cache)
Wrangling The Data
We now have the data, but there’s still work to be done to have it ready for visualisation. As we need to filter the data into different time periods depending on the country, I’m going to split the data into separate dataframes, manipulate each as required, then bring it all back together ready for charting. Here’s how I did it:
# create two vectors of country labels and their corresponding year of economic collapse
countries <- c("Liberia", "Ukraine", "Rwanda", "Indonesia", "Zimbabwe", "Greece")
years <- c(1988, 1989, 1992, 1997, 2001, 2007)
# create separate df for each country with appropriate time periods
# add GDP indexes and years since columns then store in a list
dfs <- lapply(1:6, function(x) {
  dat %>% filter(country == countries[x], year >= years[x]) %>%
    arrange(year) %>%
    rename(GDP = NY.GDP.MKTP.KD) %>%
    mutate(index = GDP / (GDP[1] / 100),
           years_since = row_number() - 1,
           country = factor(country, levels = countries))
})
# rejoin all dfs in list into one df
final.dat <- bind_rows(dfs)
How does our data look now?
head(final.dat)
## iso2c country GDP year index years_since
## 1 LR Liberia 2391982754 1988 100.00000 0
## 2 LR Liberia 1754078045 1989 73.33155 1
## 3 LR Liberia 858956847 1990 35.90983 2
## 4 LR Liberia 736768416 1991 30.80158 3
## 5 LR Liberia 478268199 1992 19.99463 4
## 6 LR Liberia 320557388 1993 13.40133 5
Goal.
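The split-and-rebind step above could also be written as one grouped pipeline: join the collapse years onto the data, then let group_by() do the per-country work. This is only a sketch of an alternative, not a replacement for the code above. It runs on a toy data frame in which the Liberia figures come from the output above and the "Examplia" rows are made up for illustration; the names toy, collapse and collapse_year are likewise hypothetical.

```r
library(dplyr)

# Toy stand-in for `dat`: Liberia values are from the output above,
# "Examplia" is a made-up country with made-up numbers
toy <- data.frame(
  country = rep(c("Liberia", "Examplia"), each = 4),
  year    = c(1988:1991, 1999:2002),
  GDP     = c(2391982754, 1754078045, 858956847, 736768416,
              500, 400, 300, 360)
)

# Lookup table: one collapse year per country
collapse <- data.frame(
  country       = c("Liberia", "Examplia"),
  collapse_year = c(1988, 2000)
)

indexed <- toy %>%
  inner_join(collapse, by = "country") %>%
  filter(year >= collapse_year) %>%           # drop pre-collapse years
  arrange(country, year) %>%
  group_by(country) %>%
  mutate(index = GDP / first(GDP) * 100,      # collapse year = 100
         years_since = row_number() - 1) %>%  # 0, 1, 2, ... within country
  ungroup()

indexed
```

The lookup table takes the place of the two parallel vectors, so adding a country means adding one row rather than editing two vectors and the 1:6 range. The sketch leaves out the factor conversion that the original uses to control legend order.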
Visualising
Now we can plot with ggplot, adding the classic Economist blueish-hue background and white gridlines.
ggplot(final.dat, aes(years_since, index, group = country, colour = country)) +
  geom_hline(yintercept = 100, colour = "red", linetype = 1) +
  geom_line(size = 1) +
  geom_point(aes(x = 0, y = 100), colour = "black", size = 2) +
  scale_x_continuous(expand = c(0.02, 0)) +
  scale_color_brewer(palette = "Dark2",
                     labels = c("Liberia (1988)", "Ukraine (1989)", "Rwanda (1992)",
                                "Indonesia (1997)", "Zimbabwe (2001)", "Greece (2007)")) +
  theme_ipsum(base_family = "Iosevka", grid_col = "white") +
  labs(title = "Shock Therapy - Selected Economic Collapses Since 1988",
       subtitle = "100 = GDP prior to collapse (in constant 2010 $)",
       y = "", x = "Years since collapse",
       caption = "Source: World Bank",
       colour = "Country") +
  theme(plot.background = element_rect(fill = "#cddee7"))
You might’ve noticed that The Economist elected not to show the continued rise of Indonesia and Rwanda beyond the 100 index line - I guess due to it being a small chart in the published edition, they wanted to focus on the countries lagging below the index - but we don’t have that constraint so I’ve allowed Indonesia and Rwanda to roam free.
We can get the Economist version by simply filtering out any GDP indexes over 100. Let’s also add a bit of animation to the time-series because…tu solamente vivere una volta.
econ_chart <- final.dat %>% filter(index <= 100) %>%
  ggplot(aes(years_since, index, group = country, colour = country)) +
  geom_hline(yintercept = 100, colour = "red", linetype = 1) +
  geom_line(size = 1, aes(frame = years_since, cumulative = TRUE)) +
  geom_point(aes(x = 0, y = 100), colour = "black", size = 2) +
  scale_x_continuous(expand = c(0.02, 0)) +
  scale_color_brewer(palette = "Dark2",
                     labels = c("Liberia (1988)", "Ukraine (1989)", "Rwanda (1992)",
                                "Indonesia (1997)", "Zimbabwe (2001)", "Greece (2007)")) +
  theme_ipsum(base_family = "Iosevka", grid_col = "white") +
  labs(title = "Shock Therapy - Selected Economic Collapses Since 1988",
       subtitle = "100 = GDP prior to collapse (in constant 2010 $)",
       y = "", x = "Years since collapse",
       caption = "Source: World Bank",
       colour = "Country") +
  theme(plot.background = element_rect(fill = "#cddee7"))
gganimate(econ_chart, title_frame = FALSE, interval = .5)
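The frame aesthetic and the gganimate() call above belong to the gganimate API as it stood when this post was written. Later releases of the package replaced them with transition_*() functions and animate(). Below is a rough sketch of the same reveal-over-time effect under that newer API. It uses a tiny made-up data frame rather than final.dat, and the countries "A" and "B" and their values are purely illustrative.

```r
library(ggplot2)
library(gganimate)

# Made-up index values for two hypothetical countries, for illustration only
toy <- data.frame(
  country     = rep(c("A", "B"), each = 4),
  years_since = rep(0:3, times = 2),
  index       = c(100, 80, 60, 70, 100, 90, 95, 85)
)

anim <- ggplot(toy, aes(years_since, index, group = country, colour = country)) +
  geom_line() +
  # transition_reveal() draws each line progressively along years_since
  transition_reveal(years_since)

# Rendering needs a renderer package (for example gifski) to be installed:
# animate(anim, nframes = 40, fps = 10)
```

The styling layers from the static chart (theme, labels, colour scale) should carry over unchanged, since the transition is just another layer added to the plot.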
Fin
And that is that. Thanks to the Economist for the great journalism and inspiring charts, and of course to the R community for making tasks like these so refined and enjoyable. With 3 small sections of code we have managed to source the data, transform it to our specifications, and then visualise it appropriately. Molto elegante.
Get in touch if you have any questions. And if you know of a more efficient way to perform the data manipulations I did with lapply - let me know! I’ve yet to dive into the world of purrr and tidyeval so it would be interesting to see some alternatives.
Ciao.