Data Sources, Data Aggregation and Transformation

The data used for this analysis is obtained from the IMF's database. The two main datasets are the Coordinated Direct Investment Survey (CDIS) and the Coordinated Portfolio Investment Survey (CPIS). These datasets are part of the IMF's effort to record global investment patterns. The surveys are conducted on a yearly basis, and country participation is voluntary.

In the following table we outline the official IMF indicator names and codes that are used for building the networks that are represented in this web application.

The data cleaning and aggregation were performed in Python. The main aggregation step combines the CDIS and CPIS data. The indicators that pair most sensibly are CDIS Outward with CPIS Assets, and CDIS Inward with CPIS Liabilities.

The main data transformation computes the strength of the investment between countries in relative terms. Let fij be the investment amount between countries i and j. The investment indicator depends on the user's selection: if CDIS Inward is selected, then fij is Inward Direct Investment Positions, US Dollars, between countries i and j. By definition, this can be either a positive or a negative number. A negative number could occur when an investor from country i has invested an amount x in country j, but is at the same time withdrawing funds back to country i in amounts larger than the initial investment x. Negative values pose a problem for the weighting, so we use the absolute values of the investment amounts, |fij|. Secondly, the directionality principle implies that investments between countries are asymmetrical (i.e. fij ≠ fji). Lastly, since we want to analyze proximity and investment patterns between countries, we compute the weight of an investment as:

Φij = 1 / |fji|

where Φij is the weight of the investment from country i to country j. The result is a proximity matrix Φ that allows us to build a directed weighted graph.
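As a minimal sketch of this transformation, the snippet below computes Φ from a small table of made-up bilateral positions. The country codes, the amounts and the `proximity_weights` helper are illustrative assumptions, not the application's actual code. It follows the index order of the formula as written, and it assumes that a zero position simply produces no edge, since the weight would be undefined.

```python
def proximity_weights(flows):
    """Return {(i, j): 1 / |f_ji|} for every non-zero position f_ji."""
    weights = {}
    for (j, i), amount in flows.items():
        if amount == 0:
            continue  # assumption: no edge when there is no position (1/0 is undefined)
        weights[(i, j)] = 1.0 / abs(amount)
    return weights


# Hypothetical positions in US dollars, keyed by country pair.
flows = {
    ("US", "NL"): 4.0e9,
    ("NL", "US"): -2.0e9,  # negative positions enter through their absolute value
    ("US", "LU"): 0.0,
}

phi = proximity_weights(flows)
```

Larger positions give smaller weights, so a low Φij indicates a strong investment link.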

Network Properties

A social network is a system made up of actors (individuals, organizations, countries, etc.) and sets of bilateral ties that define a relationship between them (Wasserman & Faust, 1994). This provides a structure for network analysis, allowing us to identify central agents in complex local and global networks. Network science offers a unique set of tools and principles for studying complex relationships apparent in nature, technology and society (Jackson, 2008). They help us understand how diseases spread, patterns in product purchases, languages spoken, voting and educational decisions, etc. (Jackson, 2008).

A network is a system which is composed of two main elements: vertices/nodes and links/edges. Edges connect a pair of nodes and identify a relationship between them. In this case an edge represents an investment relationship between a pair of nodes or countries.


The figure above depicts three main types of networks: undirected, directed and weighted. Below each network we also show its adjacency matrix representation. The adjacency matrix is a square matrix in which an entry aij is zero if there is no relationship between vertices vi and vj, and non-zero if a relationship exists. In a weighted network, an entry aij represents the strength of the link between the nodes. In a directed network, a non-zero value represents both a link and the direction of the relationship; in a network drawing, the direction is most commonly shown with an arrow. For undirected networks, aij = aji. The diagonal entries represent self-relationships, that is, edges that start and end at the same node, and are commonly set to zero. Edges can also have additional attributes, although they are not used in the construction of the network; usually they allow for additional characterization of links or groups of links. Weights can represent either proximities or similarities; in this case the weights represent proximity.
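To make the adjacency matrix concrete, here is a small sketch that builds one from an edge list. The node labels, the weights and the `adjacency_matrix` helper are hypothetical and chosen only for illustration.

```python
def adjacency_matrix(nodes, edges, directed=True):
    """Build a square matrix where entry [i][j] holds the weight of edge i -> j."""
    index = {name: k for k, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for source, target, weight in edges:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # Undirected networks are symmetric: a_ij = a_ji.
            matrix[index[target]][index[source]] = weight
    return matrix


nodes = ["A", "B", "C"]
edges = [("A", "B", 0.5), ("B", "C", 2.0)]

directed_matrix = adjacency_matrix(nodes, edges, directed=True)
undirected_matrix = adjacency_matrix(nodes, edges, directed=False)
```

In the directed case only the entry for A to B is filled, while the undirected matrix mirrors it across the diagonal, which stays at zero in both.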

In order to identify the most central elements of a network it is necessary to use several proximity, centrality and connectivity measures. Here we identify the main measures that will be used for this purpose:

  • Betweenness Centrality – measures the importance of a node by counting how many times it acts as an intermediary on a path between other nodes in the network. The more often a node is an intermediary, the more important it is in the network.
  • Closeness Centrality – measures the importance of a node by how far away it is from the rest of the nodes in a given network. The closer a node is to the other nodes in the network, the more important it is.
  • In-degree Centrality – measures the importance of a node by the number of incoming relationships it holds. The more incoming relationships it holds, the more important it is.
  • Out-degree Centrality – measures the importance of a node by the number of outgoing relationships it holds. The more outgoing relationships it holds, the more important it is.
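The sketch below illustrates three of these measures in plain Python on a toy directed graph. It is not the application's code, which uses networkx. The graph, the helper names and the normalizations are assumptions: degrees are divided by the number of other nodes, and closeness is taken as the number of reachable nodes divided by the sum of outgoing shortest-path distances to them, which is only one of several conventions. Betweenness is left out to keep the example short.

```python
import heapq


def dijkstra(graph, source):
    """Shortest weighted distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def degree_centralities(graph):
    """In- and out-degree of each node, divided by the number of other nodes."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    scale = len(nodes) - 1
    out_deg = {n: len(graph.get(n, {})) / scale for n in nodes}
    in_deg = {n: sum(n in targets for targets in graph.values()) / scale
              for n in nodes}
    return in_deg, out_deg


def closeness(graph, node):
    """Reachable nodes divided by the total distance to them (0 if none)."""
    dist = dijkstra(graph, node)
    total = sum(d for target, d in dist.items() if target != node)
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical graph: graph[source][target] = edge weight (lower = closer).
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0},
    "C": {},
}

in_deg, out_deg = degree_centralities(graph)
closeness_a = closeness(graph, "A")
```

Here A reaches C more cheaply through B (total weight 2.0) than directly (4.0), which is what the closeness value reflects.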

These measures will be given more intuitive and user-friendly names on the network page of the application. Betweenness will become top intermediators, closeness will remain closeness, in-degree will become incoming investments, and out-degree will become outgoing investments.

Building the Global Network

The network calculations are also performed in Python, using the networkx library. All network computations are performed on a fully spanned weighted-directed graph. However, the fully spanned network is extremely dense when visualized, and its most important structural properties cannot be depicted. For that reason, a Minimum Spanning Tree (MST) is computed. This is a network containing the minimum number of edges needed to form a single connected network, chosen so that their total weight is as low as possible; every node (country) is connected to at least one other node. Since the MST is a very sparse structure, we add the strongest connections (lowest weights) until the visualized network reaches a threshold average degree of connectivity. The threshold varies case by case, from a minimum average degree of 3.4 up to a cap of 7.5. These transformations are done only for the purpose of visualization; the numbers shown in the tooltips and the position of each country in the network still depend on the fully spanned weighted-directed network.
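The thinning procedure can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: edges are treated as undirected, the MST is found with Kruskal's algorithm, the average degree is taken as 2 × edges / nodes, and the node names, weights and the `backbone` helper are made up. The toy target of 2.0 is used only because a four-node graph cannot reach the 3.4 to 7.5 range mentioned above.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm on undirected (weight, u, v) edges."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


def backbone(nodes, edges, target_avg_degree):
    """MST plus the lowest-weight remaining edges until the target is reached."""
    kept = minimum_spanning_tree(nodes, edges)
    remaining = sorted(e for e in edges if e not in kept)
    for edge in remaining:
        if 2 * len(kept) / len(nodes) >= target_avg_degree:
            break
        kept.append(edge)
    return kept


nodes = ["A", "B", "C", "D"]
edges = [(1.0, "A", "B"), (2.0, "B", "C"), (3.0, "C", "D"),
         (4.0, "A", "C"), (5.0, "A", "D"), (6.0, "B", "D")]

tree = minimum_spanning_tree(nodes, edges)
sparse = backbone(nodes, edges, target_avg_degree=2.0)
```

The tree keeps the three lightest edges that connect all four nodes; the backbone then adds the next strongest edge, A to C, to reach an average degree of 2.0.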

Rank Evolution Chart

The data used in building the rank evolution chart is computed on a fully spanned weighted-directed network. The purpose of this visualization is to depict the evolution of a country's rank in the network for the four main measures. At any time, the rank line shown for each country depends on the data selection (i.e. CDIS Inward, CPIS Assets, Combined Assets and Combined Liabilities), with the option to see the rank based on each of the four measures. The coloring of each line is fixed based on the rank of the given country in 2018. This gives the user a reference point for the country's position relative to 2018 (this year was chosen since it is the last full year of data available for each data source).
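As a rough illustration of how yearly scores turn into rank lines, the sketch below ranks countries within each year and then reads off the 2018 rank used for coloring. The scores, the country codes and the `ranks_by_year` helper are invented for the example, and it assumes that a higher score means a better (lower-numbered) rank.

```python
def ranks_by_year(scores):
    """Map {year: {country: score}} to {country: {year: rank}}; rank 1 is the top score."""
    evolution = {}
    for year, by_country in scores.items():
        ordered = sorted(by_country, key=by_country.get, reverse=True)
        for rank, country in enumerate(ordered, start=1):
            evolution.setdefault(country, {})[year] = rank
    return evolution


# Hypothetical centrality scores for one measure and one data selection.
scores = {
    2017: {"US": 0.90, "NL": 0.70, "LU": 0.80},
    2018: {"US": 0.90, "NL": 0.85, "LU": 0.80},
}

evolution = ranks_by_year(scores)

# The line color is keyed to the 2018 rank, whatever year is being displayed.
color_rank = {country: ranks[2018] for country, ranks in evolution.items()}
```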

Shortest Path Chart

The data used to compute the shortest path between any pair of countries comes from a fully spanned weighted graph. Here we want to show what the shortest route between any two countries would be in the already established network, even if they do not currently have a direct link. In this way, any investor country can use the established paths in the network to route its investments to the desired final destination. The user can choose the source and target countries and see how the shortest path between that pair changes over the years.
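A shortest path of this kind can be found with Dijkstra's algorithm. The version below is a self-contained sketch in plain Python, not the application's networkx-based code; the three-country graph, its weights and the `shortest_path` helper are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (total weight, [source, ..., target]), or None if unreachable."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            break  # the first time the target is popped its distance is final
        if dist > best[node]:
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])  # walk back towards the source
    return best[target], path[::-1]


# Hypothetical network: DE has no direct link to SG, only a route through NL.
graph = {
    "DE": {"NL": 1.0},
    "NL": {"SG": 1.5},
    "SG": {},
}

route = shortest_path(graph, "DE", "SG")
no_route = shortest_path(graph, "SG", "DE")
```

In this toy network DE reaches SG through NL, while no path exists in the opposite direction because the edges are directed.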

Treemap Visualization

The data used for the treemap is the raw aggregated data per country. The purpose of this visualization is to depict the relative magnitude of global investment, broken down by continent and country. Furthermore, clicking a country shows the breakdown of that country's total investment by counterpart country.

Top Intermediaries Chart

The data used for the top intermediary chart is based on the authors' calculations. First, all possible shortest paths in a fully spanned weighted-directed network are computed, and then the proportions per country are calculated. One variable shows the proportion of shortest paths a country participates in (in other words, the percentage of times a country intermediates any shortest path globally). The second variable shows, out of all shortest paths, how many times the specific country is the first intermediary.
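One possible reading of these two variables is sketched below. It is an assumption-laden illustration, not the authors' calculation: it takes a single shortest path per ordered country pair, counts a country as an intermediary when it lies strictly between source and target, and divides by the total number of paths found. The chain graph and the helper names are made up.

```python
import heapq
from collections import Counter


def paths_from(graph, source):
    """One shortest path from source to each reachable node (Dijkstra)."""
    best, previous = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node]:
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor], previous[neighbor] = candidate, node
                heapq.heappush(heap, (candidate, neighbor))
    paths = {}
    for target in previous:
        path = [target]
        while path[-1] != source:
            path.append(previous[path[-1]])
        paths[target] = path[::-1]
    return paths


def intermediary_shares(graph):
    """Share of shortest paths each node intermediates, overall and as first stop."""
    any_position, first_position, total = Counter(), Counter(), 0
    for source in graph:
        for path in paths_from(graph, source).values():
            total += 1
            any_position.update(path[1:-1])  # nodes strictly inside the path
            if len(path) > 2:
                first_position[path[1]] += 1
    return ({c: n / total for c, n in any_position.items()},
            {c: n / total for c, n in first_position.items()})


# Hypothetical chain A -> B -> C -> D; every node must appear as a key.
graph = {"A": {"B": 1.0}, "B": {"C": 1.0}, "C": {"D": 1.0}, "D": {}}

any_share, first_share = intermediary_shares(graph)
```

In this chain there are six shortest paths. B and C each sit inside two of them, but B is the first intermediary twice and C only once.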

Tabular Representation

This table shows the raw data used in drawing the network visualization and allows the user to download it.