Backtesting Value-at-Risk (VaR)
Value at Risk (VaR) is used to model risk. VaR models are used to approximate the changes in value that a portfolio would experience in response to changes in the underlying risk factors (for example, market volatility).
VaR model validation uses several methods to determine how close the model’s approximations are to actual changes in value. Model validation provides a way to decide what level of confidence to place in these models, and ultimately an opportunity to improve their accuracy.
Backtesting & Exceptions: An Introduction
Backtesting is the process of comparing losses predicted by a value at risk (VaR) model to those actually experienced over the testing period. It is an important tool for model validation, which is the process of determining whether a VaR model is adequate in its ability to predict. The main goal of backtesting is to check that actual losses exceed the VaR estimate no more often than the chosen confidence level implies.
Observations in which the actual loss exceeds the VaR estimate at a given confidence level are called exceptions. The proportion of exceptions should not exceed one minus the confidence level (also known as the “significance level”). For example, exceptions should occur no more than about 5% of the time if the confidence level is 95%.
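As a minimal sketch of how exceptions might be counted, the snippet below compares a daily profit and loss series with the VaR forecast for each day. The function name `count_exceptions`, the sign convention (losses negative, VaR quoted as a positive amount), and the five-day sample are all assumptions made for illustration.

```python
from typing import Sequence


def count_exceptions(pnl: Sequence[float], var: Sequence[float]) -> int:
    """Count days on which the realized loss exceeded the VaR forecast.

    pnl: daily profit and loss (assumed convention: losses are negative).
    var: VaR forecast for the same day, quoted as a positive loss amount.
    """
    if len(pnl) != len(var):
        raise ValueError("pnl and var must have the same length")
    # An exception occurs when the loss (-pnl) is larger than the VaR amount.
    return sum(1 for p, v in zip(pnl, var) if -p > v)


# Hypothetical five-day sample: only day 3 (loss 3.1 vs. VaR 2.5) is an exception.
pnl = [-1.2, 0.4, -3.1, 0.9, -0.2]
var = [2.0, 2.0, 2.5, 2.5, 2.5]

n_exceptions = count_exceptions(pnl, var)
exception_rate = n_exceptions / len(pnl)
```

In practice the sample would cover a much longer window (for example, 250 trading days), and the observed rate would be compared with one minus the VaR confidence level.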
Why is Backtesting VaR Important?
Backtesting is critical for risk managers and regulators to validate whether VaR models are properly calibrated or accurate. If the level of exceptions is too high, models should be recalibrated and risk managers should re-evaluate assumptions, parameters, and/or modeling processes. The Basel Committee allows banks to use internal VaR models to measure their risk levels, and backtesting provides a critical evaluation technique to test the adequacy of those internal VaR models. Bank regulators rely on backtesting to verify risk models and identify banks that are designing models that underestimate their risk. Banks with excessive exceptions (more than four exceptions in a sample size of 250) are penalized with higher capital requirements.
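The four-exception threshold can be expressed as a simple classification. The sketch below assumes the three-zone ("traffic light") scheme from the Basel Committee's backtesting framework for a 250-day sample at 99% confidence: green for up to four exceptions, yellow for five to nine, red for ten or more. The zone boundaries beyond the four-exception cutoff are not stated in the text above, and the function name `basel_zone` is hypothetical.

```python
def basel_zone(exceptions: int) -> str:
    """Classify a 250-day, 99% VaR backtest result into a Basel zone.

    Assumed boundaries: green 0-4, yellow 5-9, red 10 or more exceptions.
    """
    if exceptions < 0:
        raise ValueError("exceptions must be non-negative")
    if exceptions <= 4:
        return "green"   # no capital penalty
    if exceptions <= 9:
        return "yellow"  # penalty applies, increasing with the count
    return "red"         # model presumed inadequate


zones = {n: basel_zone(n) for n in (2, 5, 12)}
```

Only results outside the green zone attract the higher capital requirement mentioned above.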
Challenges in Backtesting VaR
VaR models are based on static portfolios, while actual portfolio compositions are constantly changing as relative prices change and positions are bought and sold. Multiple risk factors affect actual profit and loss, but they are not included in the VaR model. For example, the actual returns are complicated by intraday changes as well as profit and loss factors that result from commissions, fees, interest income, and bid-ask spreads. Such effects can be minimized by backtesting with a relatively short time horizon such as a daily holding period.
Another challenge with backtesting is that the sample backtested may not be representative of the true underlying risk. The backtesting period constitutes a limited sample, so we do not expect to find the predicted number of exceptions in every sample. At some level, we must reject the model, which suggests the need to find an acceptable level of exceptions.
Actual vs. Hypothetical vs. Cleaned Returns
Risk managers should track both actual and hypothetical returns that reflect VaR expectations. The VaR modeled returns are comparable to the hypothetical return that would be experienced had the portfolio remained constant for the holding period. Generally, we compare the VaR model returns to cleaned returns (i.e., actual returns adjusted to remove items that are not marked to market, such as funding costs and fee income). Both actual and hypothetical returns should be backtested to verify the validity of the VaR model, and the VaR modeling methodology should be adjusted if hypothetical returns fail when backtesting.
Using Failure Rates in VaR Model Validation
If a VaR model were completely accurate, we would expect VaR loss limits to be exceeded (each such instance is an exception) with the frequency implied by the confidence level used in the VaR model. For example, if we use a 95% confidence level, we expect to find exceptions in 5% of instances. Thus, backtesting is the process of systematically comparing actual losses with predicted loss levels and counting the exceptions.
The backtesting period constitutes a limited sample at a specific confidence level. We would not expect to find the predicted number of exceptions in every sample. How, then, do we determine if the actual number of exceptions is acceptable? If we expect five exceptions and find eight, is that too many? What about nine? At some level, we must reject the model, and we need to know that level.
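One way to put a number on "is eight too many?" is to ask how likely a correct model would be to produce at least that many exceptions. The sketch below treats each day as an independent trial with exception probability p, so the exception count follows a binomial distribution. The sample size of 500 days and p = 0.01 (giving five expected exceptions) are assumptions chosen to match the example; the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(n: int, T: int, p: float) -> float:
    """Probability that a correct model shows n or more exceptions in T days.

    Assumes independent days, each with exception probability p (binomial).
    Computed as 1 minus the probability of fewer than n exceptions.
    """
    below = sum(comb(T, k) * p**k * (1 - p) ** (T - k) for k in range(n))
    return 1.0 - below


# Hypothetical setup: 500 days at 99% VaR, so 5 exceptions are expected.
T, p = 500, 0.01
tail_8 = prob_at_least(8, T, p)  # chance of 8 or more exceptions
tail_9 = prob_at_least(9, T, p)  # chance of 9 or more exceptions
```

If the resulting probability is smaller than the significance level chosen for the test, the observed count is hard to attribute to sampling luck, and the model would be rejected.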
Failure rates define the percentage of times the VaR loss level is exceeded in a given sample. Under Basel rules, bank VaR models must use a 99% confidence level, which means a bank must report the VaR amount at the 1% left tail level for each of T days. The total number of exceptions, N, is the number of days on which the actual loss exceeded the previous day’s VaR amount.
The failure rate is the number of exceptions as a proportion of the number of observations. The probability of exception, p, equals one minus the confidence level (p = 1 − c), also known as the “significance level”. If we use N to represent the number of exceptions and T to represent the sample size, the failure rate is computed as N / T. The failure rate is an unbiased measure of p if N / T approaches p as the sample size increases. Non-parametric tests can then be used to see if the number of times a VaR model fails is acceptable or not.
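Testing that the model is correctly calibrated requires the calculation of a z-score, z = (x − pT) / √(p(1 − p)T), where x is the number of actual exceptions observed, p is the probability of exception, and T is the sample size. This is the normal approximation to the binomial distribution of the exception count. The z-score is then compared to the critical value at the chosen level of confidence (e.g., 1.96 for the 95% confidence level) to determine whether the VaR model is unbiased.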
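The z-score test can be sketched in a few lines. The snippet assumes the usual normal approximation to the binomial, z = (x − pT) / √(p(1 − p)T), with p the probability of exception and T the sample size. The sample values (8 exceptions in 250 days at 99% VaR) and the function name `exception_z_score` are hypothetical.

```python
from math import sqrt


def exception_z_score(x: int, T: int, p: float) -> float:
    """z-score of x observed exceptions against the p*T expected.

    Assumes the normal approximation to the binomial distribution.
    """
    return (x - p * T) / sqrt(p * (1 - p) * T)


# Hypothetical backtest: 8 exceptions in 250 days with 99% VaR (p = 0.01).
z = exception_z_score(x=8, T=250, p=0.01)

# Two-tailed test at 95% confidence: reject if |z| exceeds 1.96.
reject_model = abs(z) > 1.96
```

Here the expected count is 2.5, the z-score is roughly 3.5, and the model would be rejected at the 95% level. The approximation is rough when pT is small, so an exact binomial calculation may be preferable for short samples.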
Two confidence levels: don’t confuse them! Note that the confidence level at which we choose to reject or fail to reject a model is not related to the confidence level at which VaR was calculated. In evaluating the accuracy of the model, we are comparing the number of exceptions observed with the maximum number of exceptions that would be expected from a correct model at a given confidence level.
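To make the distinction concrete, the sketch below keeps the two levels as separate inputs: `var_confidence` sets the probability of exception, while `z_crit` reflects the confidence level of the test itself. It returns the range of exception counts for which the model would not be rejected, using the normal approximation to the binomial. The function name and the sample sizes are assumptions for illustration.

```python
from math import ceil, floor, sqrt


def non_rejection_range(T: int, var_confidence: float, z_crit: float) -> tuple:
    """Exception counts x with |z| <= z_crit under the normal approximation.

    var_confidence: confidence level used to compute VaR (e.g. 0.99).
    z_crit: critical value for the test's own confidence level
            (e.g. 1.96 for a two-tailed 95% test).
    """
    p = 1 - var_confidence           # probability of exception
    mean = p * T                     # expected number of exceptions
    half_width = z_crit * sqrt(p * (1 - p) * T)
    low = max(0, ceil(mean - half_width))
    high = min(T, floor(mean + half_width))
    return low, high


# 99% VaR, tested at 95% confidence, over two hypothetical sample sizes.
range_250 = non_rejection_range(250, 0.99, 1.96)    # (0, 5)
range_1000 = non_rejection_range(1000, 0.99, 1.96)  # (4, 16)
```

Changing `z_crit` widens or narrows the acceptable range without touching the VaR confidence level, which is the sense in which the two levels are independent.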
Posted by Bruce Haydon, 2020–2022