Mapping the Number of Foreign Currency Deposit Accounts across Turkish Cities
A Comparison of 2010 and 2020
In recent years, the Turkish lira has depreciated significantly against foreign currencies. To hedge against this depreciation, as well as against rising inflation, many people in Turkey have converted their savings into other currencies, mainly the US dollar and the euro. Doing so means opening bank accounts denominated in foreign currencies. In this post, I will visualize this process by mapping the number of foreign currency deposit accounts by city, comparing 2010 and 2020. To this end, I will use the “Number of Foreign Currency Deposit Accounts” data from the Banks Association of Turkey. The geospatial data I will use in this post comes from GADM. Please note that the GADM data cannot be used for commercial purposes.
In the first stage, I will download data from both sources. After tidying up the Number of Foreign Currency Deposit Accounts data, I will merge it with the geospatial data. Finally, I will make some maps to demonstrate the change in foreign currency deposit accounts by city.
Introductory Steps
As a first step, I will download data from the Banks Association of Turkey. When you click on the highlighted link above, you will see three main selection fields: Geographical Regions and Provinces, Periods, and Parameters. For the sake of reproducibility, please choose Select All in the Geographical Regions and Provinces field and then exclude all rows up to Istanbul. In the Periods field, we can select everything rather than only 2010 and 2020, since we will clean the data before starting the map visualizations. Finally, in the Parameters field, please select Number of Foreign Currency Deposit Accounts, since this is the data we will use in this post. After that, click on Report, and on the new page, click the Excel icon to download the data to your computer.
In a similar manner, let’s move on to the GADM website. Please select Turkey from the country list and download the R (sp) - level 1 data. The trickiest part of this process is choosing the right level:

- Level 0: the boundaries of a country
- Level 1: the boundaries of the cities (provinces) in a given country
- Level 2: the boundaries of the cities’ counties in a given country

As we want to visualize the data by city, we need to choose Level 1. Also, please note that sp refers to spatial polygons, while sf stands for simple features. The main difference that matters for our purpose is that sp stores the point coordinates of each city’s boundary, which lets R draw the cities’ borders when we run our code; you can think of it as first making polygons and then turning them into a map. sf, on the other hand, uses a different coding structure for spatial analyses. For further information, you can look at its documentation.
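For completeness, here is a minimal sketch of what the sf route would look like (we will not use it in this post; the file name below assumes you downloaded the R (sf) - level 1 file from GADM instead, and the path is hypothetical):

```r
library(sf)
library(ggplot2)

# read the GADM level-1 file in sf format
# (adjust the path to wherever you saved the download)
tur_sf <- readRDS("~/Desktop/gadm36_TUR_1_sf.rds")

# with sf objects, ggplot2 can draw the polygons directly
# via geom_sf(), with no fortify() step needed
ggplot(tur_sf) +
  geom_sf() +
  theme_void()
```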
Now, let’s load the required packages and the data into our working space and start exploring the data.
Loading the Libraries and the Data into R
options(scipen=999) # to prevent scientific notation (I will explain it in detail later)
library(readxl) # to read excel files
library(dplyr) # for data analysis purposes
library(tidyverse) # for data analysis purposes
library(ggplot2) # to make visualisations
library(viridis) # to make visualisations
library(sp) # to read geospatial data
Now, let’s load the data into R and look at the head of the Number of Foreign Currency Deposit Accounts data.
setwd("~/Desktop") # make sure the downloaded files are in your R working directory; otherwise R cannot read them
foreign_currency_accounts <- read_excel("~/Desktop/PivotGrid-2.xlsx")
tur_spatial <- readRDS("~/Desktop/gadm36_TUR_1_sp.rds")
head(foreign_currency_accounts)
## # A tibble: 6 × 13
## ...1 ...2 `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `2018`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Number … İstan… 3.66e6 4.02e6 3.91e6 3.95e6 4.07e6 4.35e6 5.14e6 5.37e6 6.31e6
## 2 <NA> Adana 2.49e5 2.88e5 2.65e5 2.65e5 2.67e5 2.79e5 3.18e5 3.25e5 3.78e5
## 3 <NA> Adıya… 3.45e4 3.68e4 3.03e4 3.02e4 3.13e4 3.48e4 3.93e4 4.36e4 5.49e4
## 4 <NA> Afyon… 9.10e4 1.03e5 9.85e4 9.68e4 9.92e4 1.04e5 1.11e5 1.22e5 1.42e5
## 5 <NA> Ağrı 2.17e4 2.32e4 1.77e4 1.80e4 1.83e4 2.17e4 2.32e4 2.58e4 3.27e4
## 6 <NA> Aksar… 8.37e4 8.91e4 8.15e4 8.25e4 8.44e4 9.20e4 9.27e4 1.02e5 1.17e5
## # … with 2 more variables: 2019 <dbl>, 2020 <dbl>
colnames(foreign_currency_accounts)
## [1] "...1" "...2" "2010" "2011" "2012" "2013" "2014" "2015" "2016" "2017"
## [11] "2018" "2019" "2020"
We have the cities’ names stored in the second column, and each year’s values for each city stored in a separate column. This is not what we want for this kind of analysis: if the years were stored under a single year column and the values in another column, our analysis would be much easier. Therefore, we need to do some basic data wrangling. Let’s start by renaming the column where the cities’ names are stored.
colnames(foreign_currency_accounts)[2] <- "cities"
foreign_currency_accounts_2010_2020 <- foreign_currency_accounts %>%
dplyr::select("cities", "2010","2020") %>%
pivot_longer(!"cities", names_to = "years", values_to = "account_numbers")
head(foreign_currency_accounts_2010_2020)
## # A tibble: 6 × 3
## cities years account_numbers
## <chr> <chr> <dbl>
## 1 İstanbul 2010 3664251
## 2 İstanbul 2020 9218687
## 3 Adana 2010 249008
## 4 Adana 2020 582865
## 5 Adıyaman 2010 34532
## 6 Adıyaman 2020 85340
This is far better than the first version. Let us review what we have done so far. We first renamed the second column of our dataset to cities to avoid confusion later. Then, using the select function from the dplyr package, we picked the 3 columns crucial for our analysis: cities, 2010, and 2020. Following that, as we wanted the years stored in a new column named years and the corresponding values in a new column named account_numbers, we used the powerful pivot_longer function from the tidyr package. Writing !"cities" told R not to touch this column; names_to told R to store 2010 and 2020 in a new column named years; and values_to sent the corresponding values to the new account_numbers column. Before moving on, if you want to look further into the pivot_longer function, you can try this website.
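If you would like to see pivot_longer in action on a minimal example first, the following toy sketch (the numbers are made up, not part of our dataset) performs the same wide-to-long reshape:

```r
library(tidyr)

# a tiny wide-format table, analogous in shape to our accounts data
toy <- tibble::tibble(
  cities = c("İstanbul", "Adana"),
  `2010` = c(100, 10),
  `2020` = c(300, 30)
)

# every column except cities is gathered: the old column names
# go into "years", the cell values into "account_numbers"
toy_long <- pivot_longer(toy, !cities,
                         names_to = "years",
                         values_to = "account_numbers")
toy_long
# a 4-row tibble: one row per city-year pair
```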
Now, let’s focus on the tur_spatial data. When you run View(tur_spatial), you can see that the data contains different objects, such as polygons, data, and plotOrder. Without getting into the specifics of these objects, let’s run the following code chunk and see what we’re aiming for:
tur_spatial_fort <- fortify(tur_spatial)
head(tur_spatial_fort)
## long lat order hole piece id group
## 1 35.41454 36.58850 1 FALSE 1 1 1.1
## 2 35.41459 36.58820 2 FALSE 1 1 1.1
## 3 35.41434 36.58820 3 FALSE 1 1 1.1
## 4 35.41347 36.58820 4 FALSE 1 1 1.1
## 5 35.41347 36.58792 5 FALSE 1 1 1.1
## 6 35.41236 36.58792 6 FALSE 1 1 1.1
head(tur_spatial@data)
## GID_0 NAME_0 GID_1 NAME_1 VARNAME_1 NL_NAME_1 TYPE_1 ENGTYPE_1 CC_1
## 1 TUR Turkey TUR.1_1 Adana Seyhan <NA> Il Province <NA>
## 12 TUR Turkey TUR.2_1 Adiyaman Adıyaman <NA> Il Province <NA>
## 23 TUR Turkey TUR.3_1 Afyon Afyonkarahisar <NA> Il Province <NA>
## 34 TUR Turkey TUR.4_1 Agri Ağri|Karaköse <NA> Il Province <NA>
## 45 TUR Turkey TUR.5_1 Aksaray <NA> <NA> Il Province <NA>
## 56 TUR Turkey TUR.6_1 Amasya <NA> <NA> Il Province <NA>
## HASC_1
## 1 TR.AA
## 12 TR.AD
## 23 TR.AF
## 34 TR.AG
## 45 TR.AK
## 56 TR.AM
With the fortify function, we converted the spatial data into a data frame. In addition, with head(tur_spatial@data), we accessed the data object stored inside tur_spatial.
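As a side note, fortify() for sp objects has been soft-deprecated in more recent ggplot2 releases; a sketch of the equivalent conversion (assuming the broom package is installed, and run after tur_spatial has been loaded as above) would be:

```r
library(broom)

# tidy() produces the same long/lat data frame
# that fortify() gives us above
tur_spatial_fort <- tidy(tur_spatial)
head(tur_spatial_fort)
```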
Now, as the second step, I will merge the foreign_currency_accounts_2010_2020 data with tur_spatial so that we can finally move on to the last stage.
Merging Datasets
This step is another tricky part of our analysis, as it requires careful examination due to spelling differences in city names between foreign_currency_accounts_2010_2020 and tur_spatial. This problem arises from the special characters of the Turkish language and poses a serious obstacle, since we aim to merge the two datasets on the cities’ names; with careful examination, however, we can handle it. Let us first lowercase all city names in the foreign_currency_accounts_2010_2020 data and then capitalize the first letters, so that the differences are minimized. Following that, let us see which city names still differ between the two datasets with the following code chunk.
foreign_currency_accounts_2010_2020$cities <-
tolower(foreign_currency_accounts_2010_2020$cities)
foreign_currency_accounts_2010_2020$cities <-
str_to_title(foreign_currency_accounts_2010_2020$cities)
a <- unique(tur_spatial@data$NAME_1)
b <- unique(foreign_currency_accounts_2010_2020$cities)
setdiff(a,b)
## [1] "Adiyaman" "Afyon" "Agri" "Aydin" "Balikesir"
## [6] "Çankiri" "Diyarbakir" "Eskisehir" "Gümüshane" "K. Maras"
## [11] "Kinkkale" "Kirklareli" "Kirsehir" "Mersin" "Mugla"
## [16] "Mus" "Nigde" "Sanliurfa" "Sirnak" "Tekirdag"
## [21] "Usak" "Zinguldak"
As you can see from the output, 22 city names in our tur_spatial data differ from those in foreign_currency_accounts_2010_2020. Worse, the city names in the tur_spatial data contain some abbreviations and non-English characters that make it hard to handle these differences with a single function. Therefore, I will first do the easy part by converting non-English characters to English characters with a function I learned from a Stack Overflow post. Then, since the abbreviations require case-by-case treatment, I will handle the rest with different code.
to.plain <- function(s) {
  old1 <- "şŞığ"
  new1 <- "sSig"
  chartr(old1, new1, s)
}
s <- foreign_currency_accounts_2010_2020$cities
s1 <- to.plain(s)
s2 <- tur_spatial@data$NAME_1
s3 <- to.plain(s2)
foreign_currency_accounts_2010_2020$cities <- s1
tur_spatial@data$NAME_1 <- s3
a <- unique(tur_spatial@data$NAME_1)
b <- unique(foreign_currency_accounts_2010_2020$cities)
setdiff(a,b)
## [1] "Afyon" "K. Maras" "Kinkkale" "Mersin" "Zinguldak"
To give a quick review of this step: I wrote a function named to.plain that converts the non-English letters of the Turkish language, such as şŞığ, into English characters such as sSig. Nonetheless, there are still differences between the city names, mainly due to abbreviations, along with cases where a city has two names and the two datasets coded it differently. Thus, I will handle the rest by hand.
# Note: "Kinkkale" and "Zinguldak" are misspellings in the GADM data;
# we match them deliberately so that the join succeeds.
foreign_currency_accounts_2010_2020$cities[
which(foreign_currency_accounts_2010_2020$cities == "Afyonkarahisar")] <- "Afyon"
foreign_currency_accounts_2010_2020$cities[
which(foreign_currency_accounts_2010_2020$cities == "Kahramanmaras")] <- "K. Maras"
foreign_currency_accounts_2010_2020$cities[
which(foreign_currency_accounts_2010_2020$cities == "Kirikkale")] <- "Kinkkale"
foreign_currency_accounts_2010_2020$cities[
which(foreign_currency_accounts_2010_2020$cities == "Içel")] <- "Mersin"
foreign_currency_accounts_2010_2020$cities[
which(foreign_currency_accounts_2010_2020$cities == "Zonguldak")] <- "Zinguldak"
a <- unique(tur_spatial@data$NAME_1)
b <- unique(foreign_currency_accounts_2010_2020$cities)
setdiff(a,b)
## character(0)
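As an aside, the five manual replacements above could also be expressed with a single named lookup vector, which scales better if more mismatches turn up. A small sketch in base R (the recode_cities helper is my own, not part of any package):

```r
# named lookup vector: names are the old spellings, values the replacements
fix <- c(Afyonkarahisar = "Afyon",
         Kahramanmaras  = "K. Maras",
         Kirikkale      = "Kinkkale",
         "Içel"         = "Mersin",
         Zonguldak      = "Zinguldak")

# replace only the entries that appear in the lookup table
recode_cities <- function(x, lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- lookup[x[hit]]
  x
}

recode_cities(c("Adana", "Içel", "Kirikkale"), fix)
# → "Adana" "Mersin" "Kinkkale"
```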
Now that all the city names across the two datasets match, we can merge them with the following code chunk.
colnames(foreign_currency_accounts_2010_2020)[1] <- "NAME_1"
merged_raw <- tibble(id = rownames(tur_spatial@data),
NAME_1 = tur_spatial@data$NAME_1) %>%
left_join(foreign_currency_accounts_2010_2020, by = "NAME_1")
merged_raw %>% head()
## # A tibble: 6 × 4
## id NAME_1 years account_numbers
## <chr> <chr> <chr> <dbl>
## 1 1 Adana 2010 249008
## 2 1 Adana 2020 582865
## 3 12 Adiyaman 2010 34532
## 4 12 Adiyaman 2020 85340
## 5 23 Afyon 2010 91039
## 6 23 Afyon 2020 203901
merged <- left_join(tur_spatial_fort, merged_raw, by = "id")
merged %>% head()
## long lat order hole piece id group NAME_1 years account_numbers
## 1 35.41454 36.5885 1 FALSE 1 1 1.1 Adana 2010 249008
## 2 35.41454 36.5885 1 FALSE 1 1 1.1 Adana 2020 582865
## 3 35.41459 36.5882 2 FALSE 1 1 1.1 Adana 2010 249008
## 4 35.41459 36.5882 2 FALSE 1 1 1.1 Adana 2020 582865
## 5 35.41434 36.5882 3 FALSE 1 1 1.1 Adana 2010 249008
## 6 35.41434 36.5882 3 FALSE 1 1 1.1 Adana 2020 582865
To review what we have done in the last two steps, let’s go one by one. We initially renamed the first column of the foreign_currency_accounts_2010_2020 data to NAME_1 so that we could merge it with tur_spatial on the NAME_1 column.
However, merging alone is not enough for our purposes. If you pay closer attention to our merged data, you can see different columns named id and group. If you think of long and lat as points in a simple XY-plane, these columns help R identify which long and lat values belong together and therefore make up a particular city’s borders. The id column is thus crucial for us, and we need to match it to each corresponding city. That is what the merged_raw output shows: we took each city’s corresponding id number and, using the left_join function, matched it to the cities coded in the foreign_currency_accounts_2010_2020 data based on NAME_1. Following that, we merged merged_raw and tur_spatial_fort on the id column, which gave us the merged data that we will use in the visualization.

Although I have tried to explain these steps as simply as possible, if the explanations are not clear, I recommend looking at this website to understand how to use the left_join function. Now, as a last step in this section, I will add a periods column to our dataset so that we can produce a facetted graph.
merged$periods <- merged$years
merged$periods <- as.factor(merged$periods)
Visualizing the Data
Now, as the last step, we will visualize our data with the following code chunk:
quantile(merged$account_numbers, probs = seq(0, 1, 0.20), na.rm = FALSE)
## 0% 20% 40% 60% 80% 100%
## 6187 97911 200608 412031 759047 9218687
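As a side note, rather than hard-coding those quantile values as breaks in the scale below, we could compute them once and reuse them. A small sketch, assuming the merged data from above:

```r
# compute quintile breaks once, so the legend stays in sync
# with the data if it ever changes
brks <- unname(round(quantile(merged$account_numbers,
                              probs = seq(0, 1, 0.20), na.rm = TRUE)))
# ...then pass breaks = brks to scale_fill_viridis()
# instead of typing the numbers by hand
```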
p <- ggplot() +
geom_polygon(data = merged,
aes(fill = account_numbers, x = long, y = lat, group = group),
size = 0, alpha = 0.9) +
theme_void() +
scale_fill_viridis(trans = "log",
breaks = c(6187, 97911, 200608, 412031, 759047, 9218687),
name = "Total Account Number",
guide = guide_legend(keyheight = unit(3, units = "mm"),
keywidth = unit(12, units = "mm"),
label.position = "bottom",
title.position = "top",
nrow = 1)) +
labs(
title = "Number of Foreign Currency Deposit Accounts in Banks by Province in Turkey",
subtitle = "2010 vs. 2020",
caption = "Data: Banks Association of Turkey | Muhammet Ozkaraca"
) +
theme(
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
plot.title = element_text(size= 15, hjust=0.01, color = "#4e4d47", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
plot.subtitle = element_text(size= 12, hjust=0.01, color = "#4e4d47", margin = margin(b = -0.1, t = 0.43, l = 2, unit = "cm")),
plot.caption = element_text(size=9, color = "#4e4d47", margin = margin(b = 0.6, r=-1, unit = "mm") ),
legend.position = c(0.22, -0.15)
) +
coord_map() +
facet_wrap(~ periods, nrow = 1)
p
Voilà, here is our map comparing the number of foreign currency deposit accounts in Turkey between 2010 and 2020 at the city level. Before concluding, let us look at some of the specific arguments we used while making this output.
Our data is merged, which is what we want to plot. We want each city to be colored based on the number of foreign currency deposit accounts, so we write fill = account_numbers. The X-axis should be long and the Y-axis should be lat. Furthermore, as I explained above, we must tell R which long and lat coordinates belong together; the group = group argument does this. We set size = 0 because we do not need visible polygon borders, and with alpha = 0.9 we adjust the transparency of the colors in our map. theme_void provides a blank background. With scale_fill_viridis, we color our map. As the distribution of account_numbers is not normal (you can see this by drawing a box plot with boxplot(merged$account_numbers)), we use the trans = "log" argument to see the change between 2010 and 2020 more clearly. Without going into further details, I want to lastly point out that coord_map and facet_wrap are other important functions that improve our plot. If you want to look further into the specifics of these arguments, this website has wonderful tutorials for making visualisations in R that I am sure will satisfy your needs.
To conclude, in this tutorial I tried to make a visualization demonstrating the change in foreign currency deposit accounts across cities in Turkey between 2010 and 2020. I hope this post helps you in your own endeavors. If you have further questions or suggestions, please do not hesitate to reach me at muhammetozk@icloud.com.