In Praise of Value at Risk, Part I

VaR can misbehave,
Hiding dragons in the tail.
Many views reveal.

Propose value at risk (VaR) as a risk measure and you are the fool in the room. Peers roll their eyes and whisper behind your back “Don’t they know…not sub-additive?” Instinctively we reach for tail value at risk (tVaR) confident in its well-named coherence—would a rose risk measure smell as sweet? Non-actuaries have fewer qualms: VaR is alive and well in capital models from Solvency II, A. M. Best (both original and revised capital adequacy ratio) and Standard and Poor’s. The Swiss Solvency Test is an exception, using coherent tVaR.

This month’s Explorations column will begin to explore three questions:

  • When does VaR fail to be sub-additive in real applications? That is, when is the VaR of a sum greater than the sum of the VaRs?
  • How significant is its failure?
  • Is tVaR the only good alternative or are there others?

Insurance is based on diversification, and sub-additivity expresses that a risk measure respects diversification: the risk of a sum is no greater than the sum of the risks of the parts. “Less risky” can be measured in a number of ways, broadly classified into location, dispersion and tail measures. Insurers are often regulated and internally managed based on tail risk measures, which motivates our interest in VaR.
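In symbols, writing \(\rho\) for a generic risk measure (notation introduced here for illustration; it is not used elsewhere in this column), sub-additivity is the requirement

\[
\rho(X+Y) \le \rho(X) + \rho(Y),
\]

so VaR fails to be sub-additive at level \(\alpha\) exactly when

\[
\text{VaR}_\alpha(X+Y) > \text{VaR}_\alpha(X) + \text{VaR}_\alpha(Y).
\]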

Our experience with “tame” loss distributions and normal random variables leads us to expect that VaR should be sub-additive, and indeed this is the case for the family of elliptically contoured distributions that greatly generalizes the multivariate normal. But it is not always the case.

How can a portfolio possibly be more risky than the sum of its parts? A well-known risk management text (McNeil, Embrechts, and Frey 2005) lists three ways VaR can fail to be sub-additive:

  1. When the dependence structure is of a special, highly asymmetric form.
  2. When the marginals have a very skewed distribution.
  3. When the marginals are very heavy-tailed.

Case 1 is a circus trick: spectacular and alarming, but generally not a dragon. It can be controlled using tVaR, but also, as our opening haiku suggests, by using many views, i.e., considering VaR at several different return periods. It is, however, a very instructive trick to learn and an ever-present possibility.

Case 2 is where the dragons live. Dire consequences can follow if they pass unnoticed and the potential skewness of the marginals is ignored. Again, the risk can be controlled using tVaR or using many views.

Case 3 is where the really big dragons live. There is a complete breakdown of diversification: I don’t want to pool risk because I want to minimize the number of samples I draw. Glyn Holton offered a great mental picture: you have a choice of drinking from several wells but one of them is poisoned. You clearly won’t “diversify” your risk by mixing water from all the wells—you’ll try one and if you survive you’ll stick with it. For very thick tailed distributions tVaR is of no use: the distributions involved do not have a mean and therefore tVaR is not defined. However, many views will still ring alarm bells.

In this month’s article we will explore Case 1 in more detail. Subsequent articles will consider the other two cases.

TL;DR: using VaR at a range of return periods (“many views”) will slay all dragons, whereas tVaR fails in the face of particularly ferocious foes. Reporting VaR at a number of return periods has long been standard practice within reinsurance (if your broker or reinsurer isn’t showing you a range of return periods it is time for an RFP!) and A. M. Best has recently adopted the idea of assessing tail risk through many views in its stochastic BCAR. It is a theoretically sound approach that works in all circumstances, coherence be damned.

Case 1: failure of sub-additivity driven by dependence structure

Given two non-trivial marginal distributions \(X\) and \(Y\) and a confidence level \(\alpha\) it is always possible to find a particular form of dependence resulting in a failure of sub-additivity! This is very surprising: it shows that dependence trumps characteristics of the marginal distributions. We shall see the exact form of dependence has many unique characteristics.

To be concrete we will think of \(X\) and \(Y\) as samples from the underlying distributions. In cat model-speak they are samples from the yearly loss table. Suppose we have samples of 10000 draws from \(X\) and \(Y\) and that we are interested in the \(\alpha=0.99\) VaR. From the definition we can compute \(v_X = \text{VaR}_{0.99}(X)\) by sorting the \(X\) sample from largest to smallest and selecting the 100th observation, and similarly for \(Y\). (Generally we would select the \(10000 \times (1-\alpha)\)th largest observation.)
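The sort-and-select recipe can be sketched in a few lines of Python. This is a minimal illustration, not a production estimator: the function name `empirical_var`, the lognormal losses and the seed are all assumptions made for the example.

```python
import numpy as np

def empirical_var(sample, alpha):
    """Return the k-th largest observation, with k = n * (1 - alpha)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    k = max(int(round(n * (1 - alpha))), 1)  # k = 100 for n = 10000, alpha = 0.99
    descending = np.sort(sample)[::-1]       # largest to smallest
    return float(descending[k - 1])          # the k-th largest value

# Hypothetical loss sample: 10000 lognormal draws (illustrative only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
v_x = empirical_var(x, 0.99)
print(f"0.99 VaR of X: {v_x:.3f}")
```

With continuous losses there are no ties, so exactly 100 of the 10000 observations are at or above `v_x`.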

It is widely appreciated that positive dependence between variables increases the risk of their sum. Therefore a reasonable first guess for the “worst” possible dependence structure is when \(X\) and \(Y\) are comonotonic. Comonotonic means that we order the samples \(X\) and \(Y\) separately from highest to lowest and pair off the resulting elements: the largest value of \(X\) with the largest value of \(Y\), second largest of \(X\) with second largest of \(Y\) and so forth. In many senses this pairing, or dependence structure, does produce the most risky sum \(X+Y\): it has the greatest variance and worst tVaR characteristics, for example. However, it does not result in a failure of VaR sub-additivity at any threshold \(\alpha\)! In fact, it will result in VaR being exactly additive: the \(\alpha\) percentile of the sum is simply the sum of the \(\alpha\) percentiles of \(X\) and \(Y\). There is no diversification benefit, but there is also no failure of sub-additivity. The worst \(\alpha\)-VaR pairing of \(X\) and \(Y\) has a more subtle and surprising form.
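A small numerical check of the additivity claim, under assumed marginals (lognormal and gamma, chosen only for illustration; the helper `empirical_var` is likewise hypothetical). Sorting both samples and adding them element by element builds the comonotonic sum.

```python
import numpy as np

def empirical_var(sample, alpha):
    """k-th largest observation, k = n * (1 - alpha)."""
    n = len(sample)
    k = max(int(round(n * (1 - alpha))), 1)
    return float(np.sort(sample)[::-1][k - 1])

rng = np.random.default_rng(1)
x = rng.lognormal(0.0, 1.0, 10_000)   # assumed marginal for X
y = rng.gamma(2.0, 1.5, 10_000)       # assumed marginal for Y

# Comonotonic pairing: largest with largest, second largest with second largest, ...
comonotonic_sum = np.sort(x)[::-1] + np.sort(y)[::-1]

var_of_sum = empirical_var(comonotonic_sum, 0.99)
sum_of_vars = empirical_var(x, 0.99) + empirical_var(y, 0.99)
print(var_of_sum, sum_of_vars)        # the two values agree
```

The sum of two descending sequences is itself descending, so its 100th largest entry is simply the sum of the two 100th largest entries.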

To find a failure of sub-additivity let’s start by solving a more general problem: how should we combine observations from \(X\) and \(Y\) so that the value at risk of the sum is as large as possible? That is, given our samples \(x_i, y_i\), \(i=1,2,\dots,10000\) we want to form pairs \((x_i, y_{k(i)})\), which will define a bivariate distribution of \(X\) and \(Y\) so that the VaR of \(X+Y\), which has samples \(x_i+y_{k(i)}\), is as large as possible. The function \(k(i)\) defines a shuffle of \(\{1,2,\dots,10000\}\) as \(i\) varies. In other words we want the 100th largest observation of \(X+Y\) to be as big as possible.

The first thing to observe is that we should only pair the 100 largest observations of \(X\) with the 100 largest observations from \(Y\). If we have a candidate pairing that does not satisfy this condition we can make a better candidate by swapping any pairings using observations outside the “top 100” with unused top 100 entries.

Figure: Crossed (cyan) and uncrossed or comonotonic (gray) combinations of \((x_1, x_2)\) and \((y_1, y_2)\). The filled cyan circles represent the aggregate assuming crossed dependence and filled gray assuming uncrossed. The maximum minimum value is the lower cyan circle corresponding to the crossed arrangement.

We can now abstract the problem as follows: we have \(n=100\) points \(X\) and \(Y\) which we want to pair to maximize the minimum pairwise sum. How should we pair these \(n\) entries? An obvious contender is the crossed pairing: pair the largest value of \(X\) with the smallest of \(Y\), the second largest of \(X\) with the second smallest of \(Y\) and so forth, ending with a pairing of the smallest value of \(X\) with the largest of \(Y\). (Order tied elements arbitrarily.) The crossed pairing makes sense: it does not “waste” any large values by needlessly pairing them together.

It is easy to see that if we are just trying to pair \(n=2\) values from \(X\) and \(Y\) it is the right answer; see the two-point figure above. If the \(X\) values are \(x_1 < x_2\) and the \(Y\) values are \(y_1 < y_2\) then there are two possible pairings, the uncrossed pairing \(x_1\leftrightarrow y_1\), \(x_2\leftrightarrow y_2\) and the crossed pairing \(x_1\leftrightarrow y_2\), \(x_2\leftrightarrow y_1\). But clearly \(x_1+y_1 \le x_1+y_2\le x_2+y_2\) and \(x_1+y_1 \le x_2+y_1\le x_2+y_2\) and so the minimum value for the crossed pairing is greater than or equal to that for the uncrossed pairing. It turns out the crossed pairing is the optimal answer for any number of points \(n\ge 2\); see the Appendix for details.
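For small \(n\) the optimality of the crossed pairing can be checked by brute force over every possible pairing. The sketch below does this for six points drawn from an assumed uniform distribution; the function names and the sample are illustrative, not part of the original argument.

```python
from itertools import permutations
import numpy as np

def crossed_min_sum(x, y):
    """Minimum pairwise sum when smallest x meets largest y, and so on."""
    x_ascending = np.sort(x)
    y_descending = np.sort(y)[::-1]
    return float(np.min(x_ascending + y_descending))

def best_min_sum_brute_force(x, y):
    """Largest achievable minimum pairwise sum over all n! pairings."""
    return float(max(
        min(xi + yi for xi, yi in zip(x, y_shuffled))
        for y_shuffled in permutations(y)
    ))

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 6)   # hypothetical X values
y = rng.uniform(0, 10, 6)   # hypothetical Y values

crossed = crossed_min_sum(x, y)
brute = best_min_sum_brute_force(x, y)   # 6! = 720 pairings examined
print(crossed, brute)
```

Brute force is only practical for a handful of points, which is why the general statement needs the inductive proof.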

It is a general theorem, first proved independently by Makarov (1982) in 1981 and Rüschendorf (1982) in 1982, that an analog of the crossed arrangement gives the maximum VaR for any two distributions \(X\) and \(Y\), and not just for equally likely discrete samples. The proof relies on a famous paper by Strassen (1965). It is surprising this result was not known until the early 1980s.

Getting back to our original problem, note that the crossed pairing will violate sub-additivity if the samples of \(X\) at or above its \(\alpha\)-VaR are all distinct, and likewise for \(Y\). Every crossed pair combines two values that are each at least as large as the respective VaR, and at least one of them is strictly larger, so each term in the crossed pairing is greater than the sum of the individual VaRs! There are several important points to note about this failure of sub-additivity.
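The whole construction can be put together in a short sketch. The marginals (lognormal and gamma), the seed and the helper names are assumptions for illustration. Below the top 100 the pairing is left comonotonic, which is one arbitrary choice among many that keep every remaining sum below the crossed block.

```python
import numpy as np

def empirical_var(sample, alpha):
    """k-th largest observation, k = n * (1 - alpha)."""
    n = len(sample)
    k = max(int(round(n * (1 - alpha))), 1)
    return float(np.sort(sample)[::-1][k - 1])

def crossed_tail_pairing(x, y, alpha):
    """Pair x and y so the top k values are crossed; the rest stay comonotonic."""
    k = int(round(len(x) * (1 - alpha)))
    x_desc = np.sort(x)[::-1]
    y_desc = np.sort(y)[::-1]
    y_paired = y_desc.copy()
    y_paired[:k] = y_desc[:k][::-1]   # largest x with k-th largest y, and so on
    return x_desc, y_paired

rng = np.random.default_rng(3)
alpha = 0.99
x = rng.lognormal(0.0, 1.0, 10_000)   # assumed marginal for X
y = rng.gamma(2.0, 1.5, 10_000)       # assumed marginal for Y

x_paired, y_paired = crossed_tail_pairing(x, y, alpha)
var_of_sum = empirical_var(x_paired + y_paired, alpha)
sum_of_vars = empirical_var(x, alpha) + empirical_var(y, alpha)
print(f"VaR of sum {var_of_sum:.3f} vs sum of VaRs {sum_of_vars:.3f}")
```

The marginal samples are untouched, only their pairing changes, yet the VaR of the sum exceeds the sum of the VaRs.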

  • The dependence structure works for any non-trivial marginal distributions \(X\) and \(Y\)—it is universal.
  • The dependence structure is tailored to a specific value of \(\alpha\) and does not work for other \(\alpha\)s. It will actually produce relatively thinner tails for higher values of \(\alpha\) than either the comonotonic copula or independence. In this sense it is a peculiar example: it is not hiding dragons; in a way it creates a phantom dragon at a particular \(\alpha\).
  • The implied dependence structure only specifies how the larger values of \(X\) and \(Y\) are related; for values below the \(\alpha\)-VaRs of \(X\) and \(Y\) any dependence structure can be used.
  • The dependence structure does not have “right tail dependence”; in fact it is the exact opposite.
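The second bullet, that the construction is tailored to one \(\alpha\), can be illustrated by looking at the same crossed sample through several return periods. This is a sketch under assumed lognormal marginals. Only the two endpoint comparisons (the design level 0.99 and the sample maximum at 0.9999) are guaranteed for distinct samples; the intermediate levels depend on the sample drawn.

```python
import numpy as np

def empirical_var(sample, alpha):
    """k-th largest observation, k = n * (1 - alpha)."""
    n = len(sample)
    k = max(int(round(n * (1 - alpha))), 1)
    return float(np.sort(sample)[::-1][k - 1])

rng = np.random.default_rng(4)
n, design_alpha = 10_000, 0.99
x_desc = np.sort(rng.lognormal(0.0, 1.0, n))[::-1]   # assumed marginal for X
y_desc = np.sort(rng.lognormal(0.0, 1.0, n))[::-1]   # assumed marginal for Y

# Cross the top k = 100 values; keep the remainder comonotonic (arbitrary choice).
k = int(round(n * (1 - design_alpha)))
y_crossed = y_desc.copy()
y_crossed[:k] = y_desc[:k][::-1]
crossed_sum = x_desc + y_crossed

report = {}
for alpha in (0.99, 0.995, 0.999, 0.9999):
    stand_alone = empirical_var(x_desc, alpha) + empirical_var(y_desc, alpha)
    report[alpha] = (empirical_var(crossed_sum, alpha), stand_alone)
    print(f"alpha={alpha}: VaR of sum {report[alpha][0]:.2f}, "
          f"sum of VaRs {report[alpha][1]:.2f}")
```

At the design level the VaR of the sum exceeds the sum of the VaRs, while at the most extreme level it falls below it: many views expose the phantom.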

The crossed dependence is hard to generalize to three or more marginal distributions. Whereas it is easy to create maximal positive dependence for any number of variables (the comonotonic copula), it is much harder to create maximal negative dependence between three or more variables. The reason is that if \(X\) and \(Y\) are negatively correlated and \(Y\) and \(Z\) are negatively correlated then \(X\) and \(Z\) will tend to be positively correlated. Recently Embrechts, Puccetti, and Rüschendorf (2013) have shown that iteratively making each marginal crossed with the sum of the other marginal distributions gets close to the optimal solution, and it provides a usable algorithm to compute the worst VaR dependence structure for \(n\ge 3\) variables. Their method is called the Rearrangement Algorithm and it will be explained next month. Future columns will also explore skew and thick-tailed exceptions to sub-additivity.
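As a preview, here is a simplified sketch of the iteration just described: each column is repeatedly re-sorted to be crossed with the sum of the other columns. It is written from that one-sentence description, omits details of the published method, and is not guaranteed to reach the true optimum. The function name `rearrange`, the sweep limit and the sample tail values are assumptions.

```python
import numpy as np

def rearrange(tail, max_sweeps=100):
    """Cross each column against the sum of the others until nothing changes.

    tail: (k, d) array whose column j holds the k largest values of marginal j.
    """
    m = np.array(tail, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for j in range(m.shape[1]):
            others = m.sum(axis=1) - m[:, j]        # row sums of the other columns
            order = np.argsort(others)              # rows, smallest 'others' first
            new_col = np.empty_like(m[:, j])
            new_col[order] = np.sort(m[:, j])[::-1] # largest value meets smallest sum
            if not np.array_equal(new_col, m[:, j]):
                m[:, j] = new_col
                changed = True
        if not changed:
            break
    return m

# Hypothetical tail values for three marginals, starting comonotonic.
rng = np.random.default_rng(5)
tail = np.column_stack(
    [np.sort(rng.lognormal(0.0, 1.0, 100))[::-1] for _ in range(3)]
)
comonotonic_min = tail.sum(axis=1).min()   # equals the sum of the column minima
rearranged = rearrange(tail)
rearranged_min = rearranged.sum(axis=1).min()
print(comonotonic_min, rearranged_min)
```

Each column keeps the same values throughout; only their order changes, so the marginals are preserved while the smallest row sum is pushed up.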

Appendix (on-line only)

We show the crossed pairing provides the worst VaR pairing of \(X\) and \(Y\). The proof is by induction on the number of points \(n\) being paired. We have already seen it is correct for \(n=2\), so assume it is true for any \(n-1\) points. Suppose \(X\) and \(Y\) have \(n\) points and that we have an optimal pairing producing the maximum minimum pairwise sum value. If the largest value of \(X\) is paired with the smallest value of \(Y\) we can omit those two points, producing collections of \(n-1\) points where the optimal arrangement is crossed by induction. If the minimum paired sum of all \(n\) points in the optimal arrangement is the max of \(X\) plus min of \(Y\) then all \(n-1\) remaining pairings must be greater than this value, but by induction the pairwise sum of the crossed arrangement of these \(n-1\) points is at least as large and hence is also greater than the max of \(X\) plus min of \(Y\). Conversely, if the minimum paired sum of the original \(n\) points is a different pair then it will occur in the \(n-1\) remaining points and must equal the minimum of the crossed arrangement by induction. In either case the crossed arrangement is optimal.

Figure: Crossed (cyan) and uncrossed or comonotonic (gray) combinations for seven points. Again the filled cyan circles represent the aggregate assuming crossed dependence and gray assuming uncrossed. The maximum minimum value is the lowest cyan circle, which is greater than the lowest gray circle. The cyan circles have the lowest possible variance of any pairing and the highest minimum value. They are clearly more clustered than the uncrossed pairing.

On the other hand, suppose the largest value of \(X\) is not paired with the smallest value of \(Y\). Then we can find two pairs: the largest value of \(X\) paired with a value \(y\) which is at least the smallest value \(y_s\) of \(Y\), and a value \(x\) no larger than the largest value of \(X\) which is paired with \(y_s\). If we swap the \(Y\) values between these two pairs we produce an arrangement whose minimum is at least as great (compare the case \(n=2\)). Hence we can always convert an optimal arrangement into one, still optimal, in which the largest value of \(X\) is paired with the smallest value of \(Y\), and the previous argument applies. The worst VaR pairing for seven points is illustrated in the seven-point figure above.
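The swap step can be written out explicitly. Write \(x^*\) for the largest value of \(X\) (a symbol introduced here for illustration). Since \(x \le x^*\) and \(y_s \le y\), the two pairs before the swap satisfy

\[
\min(x^* + y,\; x + y_s) = x + y_s,
\]

while after the swap

\[
\min(x^* + y_s,\; x + y) \ge x + y_s,
\]

because both \(x^* + y_s \ge x + y_s\) and \(x + y \ge x + y_s\). All other pairs are unchanged, so the swap cannot lower the minimum pairwise sum.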


Embrechts, Paul, Giovanni Puccetti, and Ludger Rüschendorf. 2013. “Model uncertainty and VaR aggregation.” Journal of Banking and Finance 37 (8): 2750–64.
Makarov, G. D. 1982. “Estimates for the Distribution Function of a Sum of Two Random Variables When the Marginal Distributions are Fixed.” Theory of Probability & Its Applications 26 (4): 803–6.
McNeil, Alexander J., Paul Embrechts, and Rüdiger Frey. 2005. Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press.
Rüschendorf, Ludger. 1982. “Random variables with maximum sums.” Advances in Applied Probability 14 (3): 623–32.
Strassen, V. 1965. “The existence of probability measures with given marginals.” The Annals of Mathematical Statistics 36 (2): 423–39.

posted 2017-11-13 | tags: value at risk, risk, risk measure, writing
