# Sharpe Ratio

## Distribution of Maximal Sharpe, Corrected Bonferroni

In a previous blog post we used the 'Polyhedral Inference' trick of Lee et al. to perform conditional inference on the asset with maximum Sharpe ratio. This is now a short paper on arxiv. I was somewhat disappointed to find, as noted in the paper, that polyhedral inference has lower power than a simple Bonferroni correction against alternatives where many assets have the same Signal-Noise ratio. (Though apparently it has higher power when one asset alone higher SNR.) The interpretation is that when there is no spread in the SNR, Bonferroni correction should have the same power as a single asset test, while conditional inference is sensitive to the conditioning information that you are testing a single asset which has Sharpe ratio perhaps near that of other assets. In the opposite case, Bonferroni suffers from having to 'pay' for a lot of irrelevant (for having low Sharpe) assets, while conditional inference does fine.

I also showed in the paper, as I demonstrated in a previous blog post, that the Bonferroni correction is conservative when asset returns are correlated. In a simple simulations under the null, I showed that the empirical type I rate goes to zero as common correlation $$\rho$$ goes to one. In this blog post I will describe a simple trick to correct for average positive correlation.

So let us suppose that we observe returns on $$p$$ assets over $$n$$ days, and that returns have correlation matrix $$R$$. Let $$\hat{\zeta}$$ be the vector of Sharpe ratios over this sample. In the paper I show that if returns are normal then the following approximation holds

$$\hat{\zeta}\approx\mathcal{N}\left(\zeta,\frac{1}{n}\left( R + \frac{1}{2}\operatorname{Diag}\left(\zeta\right)\left(R \odot R\right)\operatorname{Diag}\left(\zeta\right) \right)\right).$$

There is a more general form for Elliptically distributed returns. In the paper I find, via simulations, that for realistic SNRs and large sample sizes, the more general form does not add much accuracy. In fact, for the small SNRs one is likely to see in practice the simple approximation

$$\hat{\zeta}\approx\mathcal{N}\left(\zeta,\frac{1}{n}R\right)$$

will suffice.

Now note that, under the null hypothesis that $$\zeta = \zeta_0$$, one has

$$z = \sqrt{n} \left(R^{1/2}\right)^{-1} \left(\hat{\zeta} - \zeta_0\right) \approx\mathcal{N}\left(0,I\right),$$

where $$R^{1/2}$$ is a matrix square root of $$R$$. Testing the null hypothesis should proceed by computing (or estimating) the vector $$z$$, then comparing to normality, either by a Chi-square statistic, or performing Bonferroni-corrected normal inference on the largest element.

In the paper I used a simple rank-one model for correlation for simulations using

$$R = \left(1-\rho\right) I + \rho 1 1^{\top}.$$

This effectively models the influence of a common single 'latent' factor. Certainly this is more flexible for modeling real returns than assuming identity correlation, but is not terribly realistic.

Under this model of $$R$$ it is simple enough to compute the inverse-square-root of $$R$$. Namely

$$\left(R^{1/2}\right)^{-1} = \left(1-\rho\right)^{-1/2} I + \frac{1}{p}\left(\frac{1}{\sqrt{1-\rho+p\rho}} - \frac{1}{\sqrt{1-\rho}}\right)1 1^{\top}.$$

Let's just confirm with code:

p <- 4
rho <- 0.3
R <- (1-rho) * diag(p) + rho
ihR <- (1/sqrt(1-rho)) * diag(p) + (1/p) * ((1/sqrt(1-rho+p*rho)) - (1/sqrt(1-rho)))
hR <- solve(ihR)
R - hR %*% hR

##             [,1]        [,2]        [,3]        [,4]
## [1,] 4.44089e-16 1.66533e-16 1.11022e-16 1.66533e-16
## [2,] 1.66533e-16 0.00000e+00 5.55112e-17 5.55112e-17
## [3,] 1.66533e-16 1.66533e-16 0.00000e+00 1.11022e-16
## [4,] 1.11022e-16 1.11022e-16 1.11022e-16 2.22045e-16


So to test the null hypothesis, one computes

$$z = \sqrt{n} \left( \left(1-\rho\right)^{-1/2} I + \frac{1}{p}\left(\frac{1}{\sqrt{1-\rho+p\rho}} - \frac{1}{\sqrt{1-\rho}}\right)1 1^{\top} \right) \left(\hat{\zeta} - \zeta_0\right)$$

to test against normality. But note that our linear transformation is monotonic (indeed affine): if $$v_i \ge v_j$$ and $$w = \left(R^{1/2}\right)^{-1} v$$, then $$w_i \ge w_j$$. This means that the maximum element of $$z$$ has the same index as the maximum element of $$\hat{\zeta} - \zeta_0$$. To perform Bonferroni correction we need only transform the largest element of $$\hat{\zeta} - \zeta_0$$, by scaling it up, and shifting to accomodate the average. So if the largest element of $$\hat{\zeta} - \zeta_0$$ is $$y$$, and the average value is $$a = \frac{1}{p}1^{\top} \left(\hat{\zeta} - \zeta_0\right)$$, then the largest value of $$z$$ is

$$\frac{\sqrt{n} y}{\sqrt{1-\rho}} + a \sqrt{n} \left(\frac{1}{\sqrt{1-\rho+p\rho}} - \frac{1}{\sqrt{1-\rho}}\right)$$

Reject the null hypothesis if this is larger than $$\Phi\left(1 - \alpha/p\right)$$.

## Simulations

Here we perform simple simulations of Bonferroni and corrected Bonferroni. We will assume that returns are Gaussian, that the correlation follows our simple rank one form, that the correlation is known in order to perform the corrected test. We simulate two years of daily data on 100 assets. For each choice of $$\rho$$ we perform 10000 simulations under the null of zero SNR, computing the simple and 'improved' Bonferroni corrected hypothesis tests. We tabulate the empirical type I rate and plot against $$\rho$$.

suppressMessages({
library(dplyr)
library(tidyr)
library(doFuture)
})
# set up the functions
rawsim <- function(nday,nlatf,nsim=100,rho=0) {
R <- pmin(diag(nlatf) + rho,1)
mu <- rep(0,nlatf)

apart <- sqrt(nday)/sqrt(1-rho)
bpart <- sqrt(nday) * ((1/sqrt(1-rho+nlatf*rho)) - (1/sqrt(1-rho)))

mhtpvals <- replicate(nsim,{
X <- mvtnorm::rmvnorm(nday,mean=mu,sigma=R)
x <- colMeans(X) / apply(X,2,sd)
bonf_pval <- nlatf * SharpeR::psr(max(x),df=nday-1,zeta=0,ope=1,lower.tail=FALSE)
# do the correction
corr_stat <- apart * max(x) + bpart * mean(x)
corr_pval <- nlatf * pnorm(corr_stat,lower.tail=FALSE)

c(bonf_pval,corr_pval)
})
data_frame(bonf_pvals=as.numeric(mhtpvals[1,]),
corr_pvals=as.numeric(mhtpvals[2,]))
}
many_rawsim <- function(nday,nlatf,rho,nsim=1000L,nnodes=7) {
if ((nsim > 10*nnodes) && require(doFuture)) {
registerDoFuture()
plan(multiprocess)
nper <- as.numeric(table(1:nsim %% nnodes))
retv <- foreach(iii=1:nnodes,.export = c('nday','nlatf','rho','nper','rawsim')) %dopar% {
rawsim(nday=nday,nlatf=nlatf,rho=rho,nsim=nper[iii])
} %>%
bind_rows()
} else {
retv <- rawsim(nday=nday,nlatf=nlatf,rho=rho,nsim=nsim)
}
retv
}
mhtsim <- function(alpha=0.05,...) {
many_rawsim(...) %>%
tidyr::gather(key=method,value=pvalues) %>%
group_by(method) %>%
summarize(rej_rate=mean(pvalues < alpha)) %>%
ungroup() %>%
arrange(method)
}

# perform simulations
nsim <- 10000
nday <- 2*252
nlatf <- 100

params <- data_frame(rho=seq(0.01,0.99,length.out=7))

set.seed(123)
resu <- params %>%
group_by(rho) %>%
summarize(resu=list(mhtsim(nday=nday,nlatf=nlatf,rho=rho,nsim=nsim))) %>%
ungroup() %>%
unnest()

suppressMessages({
library(dplyr)
library(ggplot2)
})
# plot empirical rates
ph <- resu %>%
mutate(method=gsub('bonf_pvals','Plain Bonferroni',method)) %>%
mutate(method=gsub('corr_pvals','Corrected Bonferroni',method)) %>%
ggplot(aes(rho,rej_rate,color=method)) +
geom_line() + geom_point() +
geom_hline(yintercept=0.05,linetype=2,alpha=0.5) +
scale_y_sqrt() +
labs(title='Empirical type I rate at the 0.05 level',
x=expression(rho),y='type I rate',
color='test')
print(ph)


As desired, we maintain nominal coverage using the correction for $$\rho$$, while the naive Bonferroni is too conservative for large $$\rho$$. This is not yet a practical test, but could be used for rough estimation by plugging in the average sample correlation (or just SWAG'ing one). To my tastes a more interesting question is whether one can generalize this process to a rank $$k$$ approximation of $$R$$ while keeping the monotonicity property. (I have my doubts this is possible)

## Distribution of Maximal Sharpe, Truncated Normal

In a previous blog post we looked at a symmetric confidence intervals on the Signal-Noise ratio. That study was motivated by the "opportunistic strategy", wherein one observes the historical returns of an asset to determine whether to hold it long or short. Then, conditional on the sign of the trade, we were able to construct proper confidence intervals on the Signal-Noise ratio of the opportunistic strategy's returns.

I had hoped that one could generalize from the single asset opportunistic strategy to the case of $$p$$ assets, where one constructs the Markowitz portfolio based on observed returns. I have not had much luck finding that generalization. However, we can generalize the opportunistic strategy in a different way to what I call the "Winner Take All" strategy. Here one observes the historical returns of $$p$$ different assets, then chooses the one with the highest observed Sharpe ratio to hold long. (Let us hold off on an Opportunistic Winner Take All.)

Observe, however, this is just the problem of inferring the Signal Noise ratio (SNR) of the asset with the maximal Sharpe. We previously approached that problem using a Markowitz approximation, finding it somewhat lacking. That Markowitz approximation was an attempt to correct some deficiencies with what is apparently the state of the art in the field, Marcos Lopez de Prado's (now AQR's?) "Most Important Plot in All of Finance", which is a thin layer of Multiple Hypothesis Testing correction over the usual distribution of the Sharpe ratio. In a previous blog post, we found that Lopez de Prado's method would have lower than nominal type I rates as it ignored correlation of assets.

Moreover, a simple MHT correction will not, I think, deal very well with the case where there are great differences in the Signal Noise ratios of the assets. The 'stinker' assets with low SNR will simply spoil our inference, unlikely to have much influence on which asset shows the highest Sharpe ratio, and only causing us to increase our significance threshold.

With my superior googling skills I recently discovered a 2013 paper by Lee et al, titled Exact Post-Selection Inference, with Application to the Lasso. While aimed at the Lasso, this paper includes a procedure that essentially solves our problem, giving hypothesis tests or confidence intervals with nominal coverage on the asset with maximal Sharpe among a set of possibly correlated assets.

The Lee et al paper assumes one observes $$p$$-vector

$$y \sim \mathcal{N}\left(\mu,\Sigma\right).$$

Then conditional on $$y$$ falling in some polyhedron, $$Ay \le b$$, we wish to perform inference on $$\nu^{\top}y$$. In our case the polyhedron will be the union of all polyhedra with the same maximal element of $$y$$ as we observed. That is, assume that we have reordered the elements of $$y$$ such that $$y_1$$ is the largest element. Then $$A$$ will be a column of negative ones, cbinded to the $$p-1$$ identity matrix, and $$b$$ will be a $$p-1$$ vector of zeros. The test is defined on $$\nu=e_1$$ the vector with a single one in the first element and zero otherwise.

Their method works by decomposing the condition $$Ay \le b$$ into a condition on $$\nu^{\top}y$$ and a condition on some $$z$$ which is normal but independent of $$\nu^{\top}y$$. You can think of this as kind of inverting the transform by $$A$$. After this transform, the value of $$\nu^{\top}y$$ is restricted to a line segment, so we need only perform inference on a truncated normal.

The code to implement this is fairly straightforward, and given below. The procedure to compute the quantile function, which we will need to compute confidence intervals, is a bit trickier, due to numerical issues. We give a hacky version below.

# Lee et. al eqn (5.8)
F_fnc <- function(x,a,b,mu=0,sigmasq=1) {
sigma <- sqrt(sigmasq)
phis <- pnorm((c(x,a,b)-mu)/sigma)
(phis[1] - phis[2]) / (phis[3] - phis[2])
}
# Lee eqns (5.4), (5.5), (5.6)
Vfuncs <- function(z,A,b,ccc) {
Az <- A %*% z
Ac <- A %*% ccc
bres <- b - Az
brat <- bres
brat[Ac!=0] <- brat[Ac!=0] / Ac[Ac!=0]
Vminus <- max(brat[Ac < 0])
Vplus  <- min(brat[Ac > 0])
Vzero  <- min(bres[Ac == 0])
list(Vminus=Vminus,Vplus=Vplus,Vzero=Vzero)
}
# Lee et. al eqn (5.9)
ptn <- function(y,A,b,nu,mu,Sigma,numu=as.numeric(t(nu) %*% mu)) {
Signu <- Sigma %*% nu
nuSnu <- as.numeric(t(nu) %*% Signu)
ccc <- Signu / nuSnu  # eqn (5.3)
nuy <- as.numeric(t(nu) %*% y)
zzz <- y - ccc * nuy  # eqn (5.2)
Vfs <- Vfuncs(zzz,A,b,ccc)
F_fnc(x=nuy,a=Vfs$Vminus,b=Vfs$Vplus,mu=numu,sigmasq=nuSnu)
}
# invert the ptn function to find nu'mu at a given pval.
citn <- function(p,y,A,b,nu,Sigma) {
Signu <- Sigma %*% nu
nuSnu <- as.numeric(t(nu) %*% Signu)
ccc <- Signu / nuSnu  # eqn (5.3)
nuy <- as.numeric(t(nu) %*% y)
zzz <- y - ccc * nuy  # eqn (5.2)
Vfs <- Vfuncs(zzz,A,b,ccc)

# you want this, but there are numerical issues:
#f <- function(numu) { F_fnc(x=nuy,a=Vfs$Vminus,b=Vfs$Vplus,mu=numu,sigmasq=nuSnu) - p }
sigma <- sqrt(nuSnu)
f <- function(numu) {
phis <- pnorm((c(nuy,Vfs$Vminus,Vfs$Vplus)-numu)/sigma)
#(phis[1] - phis[2]) - p * (phis[3] - phis[2])
phis[1] - (1-p) * phis[2] - p * phis[3]
}
# this fails sometimes, so find a better interval
intvl <- c(-1,1)  # a hack.
# this is very unfortunate
trypnts <- seq(from=min(y),to=max(y),length.out=31)
ys <- sapply(trypnts,f)
dsy <- diff(sign(ys))
if (any(dsy < 0)) {
widx <- which(dsy < 0)
intvl <- trypnts[widx + c(0,1)]
} else {
maby <- 2 * (0.1 + max(abs(y)))
trypnts <- seq(from=-maby,to=maby,length.out=31)
ys <- sapply(trypnts,f)
dsy <- diff(sign(ys))
if (any(dsy < 0)) {
widx <- which(dsy < 0)
intvl <- trypnts[widx + c(0,1)]
}
}
uniroot(f=f,interval=intvl,extendInt='yes')$root }  ## Testing on normal data Here we test the code above on the problem considered in Theorem 5.2 of Lee et al. That is, we draw $$y \sim \mathcal{N}\left(\mu,\Sigma\right)$$, then observe the value of $$F$$ given in Theorem 5.2 when we plug in the actual population values of $$\mu$$ and $$\Sigma$$. This is several steps removed from our problem of inference on the SNR, but it is best to pause and make sure the implementation is correct first. We perform 5000 simulations, letting $$p=20$$, then creating a random $$\mu$$ and $$\Sigma$$, drawing a single $$y$$, observing which element is the maximum, creating $$A, b, \nu$$, then computing the $$F$$ function, resulting in a $$p$$-value which should be uniform. We Q-Q plot those empirical $$p$$-values against a uniform law, finding them on the $$y=x$$ line. gram <- function(x) { t(x) %*% x } rWish <- function(n,p=n,Sigma=diag(p)) { require(mvtnorm) gram(rmvnorm(p,sigma=Sigma)) } nsim <- 5000 p <- 20 A1 <- cbind(-1,diag(p-1)) set.seed(1234) pvals <- replicate(nsim,{ mu <- rnorm(p) Sigma <- rWish(n=2*p+5,p=p) y <- t(rmvnorm(1,mean=mu,sigma=Sigma) ) # collect the maximum, so reorder the A above yord <- order(y,decreasing=TRUE) revo <- seq_len(p) revo[yord] <- revo A <- A1[,revo] nu <- rep(0,p) nu[yord[1]] <- 1 b <- rep(0,p-1) foo <- ptn(y=y,A=A,b=b,nu=nu,mu=mu,Sigma=Sigma) }) # plot them library(dplyr) library(ggplot2) ph <- data_frame(pvals=pvals) %>% ggplot(aes(sample=pvals)) + geom_qq(distribution=stats::qunif) + geom_qq_line(distribution=stats::qunif) print(ph)  Now we attempt to use the confidence interval code. We construct a one-sided 95% confidence interval, and check how often it is violated by the $$\mu$$ of the element which shows the highest $$y$$. We will find that the empirical rate of violations of our confidence interval is indeed around 5%: nsim <- 5000 p <- 20 A1 <- cbind(-1,diag(p-1)) set.seed(1234) tgtval <- 0.95 viols <- replicate(nsim,{ mu <- rnorm(p) Sigma <- rWish(n=2*p+5,p=p) y <- t(rmvnorm(1,mean=mu,sigma=Sigma) ) # collect the maximum, so reorder the A above yord <- order(y,decreasing=TRUE) revo <- seq_len(p) revo[yord] <- revo A <- A1[,revo] nu <- rep(0,p) nu[yord[1]] <- 1 b <- rep(0,p-1) # mu is unknown to this guy foo <- citn(p=tgtval,y=y,A=A,b=b,nu=nu,Sigma=Sigma) violated <- mu[yord[1]] < foo }) print(sprintf('%.2f%%',100*mean(viols)))  ## [1] "5.04%"  ## Testing on the Sharpe ratio To use this machinery to perform inference on the SNR, we can either port the results to the multivariate $$t$$-distribution, which seems unlikely because uncorrelated marginals of a multivariate $$t$$ are not independent. Instead we lean on the normal approximation to the vector of Sharpe ratios. If the $$p$$-vector $$x$$ is normal with correlation matrix $$R$$, then $$\hat{\zeta}\approx\mathcal{N}\left(\zeta,\frac{1}{n}\left( R + \frac{1}{2}\operatorname{Diag}\left(\zeta\right)\left(R \odot R\right)\operatorname{Diag}\left(\zeta\right) \right)\right),$$ where $$\hat{\zeta}$$ is the $$p$$-vector of Sharpe ratios computed by observing $$n$$ independent draws of $$x$$, and $$\zeta$$ is the $$p$$-vector of Signal Noise ratios. Note how this generalizes the 'Lo' form of the standard error of a scalar Sharpe ratio, viz $$\sqrt{(1 + \zeta^2/2)/n}$$. Here we will check the uniformity of $$p$$ values resulting from using this normal approximation. This is closer to the actual inference we want to do, except we will cheat by using the actual $$R$$ and $$\zeta$$ to construct what is essentially the $$\Sigma$$ to Lee's formulation. We will set $$p=20$$ and draw 3 years of daily data. Again we plot the putative $$p$$-values against uniformity and find a good match. # let's test it! nsim <- 5000 p <- 20 ndays <- 3 * 252 A1 <- cbind(-1,diag(p-1)) set.seed(4321) pvals <- replicate(nsim,{ # population values here mu <- rnorm(p) Sigma <- rWish(n=2*p+5,p=p) RRR <- cov2cor(Sigma) zeta <- mu /sqrt(diag(Sigma)) Xrets <- rmvnorm(ndays,mean=mu,sigma=Sigma) srs <- colMeans(Xrets) / apply(Xrets,2,FUN=sd) y <- srs mymu <- zeta mySigma <- (1/ndays) * (RRR + (1/2) * diag(zeta) %*% (RRR * RRR) %*% diag(zeta)) # collect the maximum, so reorder the A above yord <- order(y,decreasing=TRUE) revo <- seq_len(p) revo[yord] <- revo A <- A1[,revo] nu <- rep(0,p) nu[yord[1]] <- 1 b <- rep(0,p-1) foo <- ptn(y=y,A=A,b=b,nu=nu,mu=mymu,Sigma=mySigma) }) # plot them library(dplyr) library(ggplot2) ph <- data_frame(pvals=pvals) %>% ggplot(aes(sample=pvals)) + geom_qq(distribution=stats::qunif) + geom_qq_line(distribution=stats::qunif) print(ph)  Lastly we make one more modification, filling in sample estimates for $$\zeta$$ and $$R$$ into the computation of the covariance. We compute one-sided 95% confidence intervals, and check how the empirical rate of violations. We find the rate to be around 5%. nsim <- 5000 p <- 20 ndays <- 3 * 252 A1 <- cbind(-1,diag(p-1)) set.seed(9873) # 5678 gives exactly 250 / 5000, which is eerie tgtval <- 0.95 viols <- replicate(nsim,{ # population values here mu <- rnorm(p) Sigma <- rWish(n=2*p+5,p=p) RRR <- cov2cor(Sigma) zeta <- mu /sqrt(diag(Sigma)) Xrets <- rmvnorm(ndays,mean=mu,sigma=Sigma) srs <- colMeans(Xrets) / apply(Xrets,2,FUN=sd) Sighat <- cov(Xrets) Rhat <- cov2cor(Sighat) y <- srs # now use the sample approximations. # you can compute this from the observed information. mySigma <- (1/ndays) * (Rhat + (1/2) * diag(srs) %*% (Rhat * Rhat) %*% diag(srs)) # collect the maximum, so reorder the A above yord <- order(y,decreasing=TRUE) revo <- seq_len(p) revo[yord] <- revo A <- A1[,revo] nu <- rep(0,p) nu[yord[1]] <- 1 b <- rep(0,p-1) # mu is unknown to this guy foo <- citn(p=tgtval,y=y,A=A,b=b,nu=nu,Sigma=mySigma) violated <- zeta[yord[1]] < foo }) print(sprintf('%.2f%%',100*mean(viols)))  ## [1] "5.14%"  ## Putting it together Lee's method appears to give nominal coverage for hypothesis tests and confidence intervals on the SNR of the asset with maximal Sharpe. In principle it should not be affected by correlation of the assets or by large differences in the SNRs of the assets. It should be applicable in the $$p > n$$ case, as we are not inverting the covariance matrix. On the negative side, requiring one estimate the correlation of assets for the computation will not scale with large $$p$$. We are guardedly optimistic that this method is not adversely affected by the normal approximation of the Sharpe ratio, although it would be ill-advised to use it for the case of small samples until more study is performed. Moreover the quantile function we hacked together here should be improved for stability and accuracy. Click to read and post comments #### Mar 17, 2019 ## Symmetric Confidence Intervals, and Choosing Sides Consider the problem of computing confidence intervals on the Signal-Noise ratio, which is the population quantity $$\zeta = \mu/\sigma$$, based on the observed Sharpe ratio $$\hat{\zeta} = \hat{\mu}/\hat{\sigma}$$. If returns are Gaussian, one can compute 'exact' confidence intervals by inverting the CDF of the non-central $$t$$ distribution with respect to its parameter. Typically instead one often uses an approximate standard error, using either the formula published by Johnson & Welch (and much later by Andrew Lo), or one using higher order moments given by Mertens, then constructs Wald-test confidence intervals. Using standard errors yields symmetric intervals of the form $$\hat{\zeta} \pm z_{\alpha/2} s,$$ where $$s$$ is the approximate standard error, and $$z_{\alpha/2}$$ is the normal $$\alpha/2$$ quantile. As typically constructed, the 'exact' confidence intervals based on the non-central $$t$$ distributionare not symmetric in general, but are very close, and can be made symmetric. The symmetry condition can be expressed as $$\mathcal{P}\left(|\zeta - \hat{\zeta}| \ge c\right) = \alpha,$$ where $$c$$ is some constant. ## Picking sides Usually I think of the Sharpe ratio as a tool to answer the question: Should I invest a predetermined amount of capital (long) in this asset? The Sharpe ratio can be used to construct confidence intervals on the Signal-Noise ratio to help answer that question. Pretend instead that you are more opportunistic: instead of considering a predetermined side to the trade, you will observe historical returns of the asset. Then if the Sharpe ratio is positive, you will consider investing in the asset, and if the Sharpe is negative, you will consider shorting the asset. Can we rely on our standard confidence intervals now? After all, we are now trying to perform inference on $$\operatorname{sign}\left(\hat{\zeta}\right) \zeta$$, which is not a population quantity. Rather it mixes up the population Signal-Noise ratio with information from the observed sample (the sign of the Sharpe). (Because of this mixing of a population quantity with information from the sample, real statisticians get a bit indignant when you try to call this a "confidence interval". So don't do that.) It turns out that you can easily adapt the symmetric confidence intervals to this problem. Because you can multiply the inside of $$\left|\zeta - \hat{\zeta}\right|$$ by $$\pm 1$$ without affecting the absolute value, we have $$\left|\zeta - \hat{\zeta}\right| \ge c \Leftrightarrow \left| \operatorname{sign}\left(\hat{\zeta}\right) \zeta - \left|\hat{\zeta}\right|\right| \ge c.$$ Thus $$\left|\hat{\zeta}\right| \pm z_{\alpha/2} s$$ are $$1-\alpha$$ confidence intervals on $$\operatorname{sign}\left(\hat{\zeta}\right) \zeta$$. Although the type I error rate is maintained, the 'violations' of the confidence interval can be asymmetric. When the Signal Noise ratio is large (in absolute value), type I errors tend to occur on both sides of the confidence interval equally, because the Sharpe is usually the same sign as the Signal-Noise ratio. When the Signal-Noise ratio is near zero, however, typically the type I errors occur only on the lower side. (This must be the case when the Signal-Noise ratio is exactly zero.) Of course, since the Signal-Noise ratio is the unknown population parameter, you do not know which situation you are in, although you have some hints from the observed Sharpe ratio. Before moving on, here we test the symmetric confidence intervals. We vary the Signal Noise ratio from 0 to 2.5 in 'annual units', draw two years of daily normal returns with that Signal-Noise ratio, pick a side of the trade based on the sign of the Sharpe ratio, then build symmetric confidence intervals using the standard error estimator $$\sqrt{(1 + \hat{\zeta}^2/2)/n}$$. We build the 95% confidence intervals, then note any breaches of the upper and lower confidence bounds. We repeat this 10000 times for each choice of SNR. We then plot the type I rate for the lower bound of the CI, the upper bound and the total type I rate, versus the Signal Noise ratio. We see that the total empirical type I rate is very near the nominal rate of 5%, and this is entirely attributable to violations of the lower bound up until a Signal Noise ratio of around 1.4 per square root year. At around 2.5 per square root year, the type I errors are observed in equal proportion on both sides of the CI. suppressMessages({ library(dplyr) library(tidyr) # https://cran.r-project.org/web/packages/doFuture/vignettes/doFuture.html library(doFuture) registerDoFuture() plan(multiprocess) }) # run one simulation of normal returns and CI violations onesim <- function(n,pzeta,zalpha=qnorm(0.025)) { x <- rnorm(n,mean=pzeta,sd=1) sr <- mean(x) / sd(x) se <- sqrt((1+0.5*sr^2)/n) cis <- abs(sr) + se * abs(zalpha) * c(-1,1) pquant <- sign(sr) * pzeta violations <- c(pquant < cis[1],pquant > cis[2]) } # do a bunch of sims, then sum the violations of low and high; repsim <- function(nrep,n,pzeta,zalpha) { jumble <- replicate(nrep,onesim(n=n,pzeta=pzeta,zalpha=zalpha)) retv <- t(jumble) colnames(retv) <- c('nlo','nhi') retv <- as.data.frame(retv) %>% summarize_all(.funs=sum) retv$nrep <- nrep
invisible(retv)
}
manysim <- function(nrep,n,pzeta,zalpha,nnodes=7) {
if (nrep > 2*nnodes) {
# do in parallel.
nper <- table(1 + ((0:(nrep-1) %% nnodes)))
retv <- foreach(i=1:nnodes,.export = c('n','pzeta','zalpha','onesim','repsim')) %dopar% {
repsim(nrep=nper[i],n=n,pzeta=pzeta,zalpha=zalpha)
} %>%
bind_rows() %>%
summarize_all(.funs=sum)
} else {
retv <- repsim(nrep=nrep,n=n,pzeta=pzeta,zalpha=zalpha)
}
# turn sums into means
retv %>%
mutate(vlo=nlo/nrep,vhi=nhi/nrep) %>%
dplyr::select(vlo,vhi)
}

# run a bunch
ope <- 252
nyr <- 2
alpha <- 0.05

# simulation params
params <- data_frame(zetayr=seq(0,2.5,by=0.0625)) %>%
mutate(pzeta=zetayr/sqrt(ope)) %>%
mutate(n=round(ope*nyr))

# run a bunch
nrep <- 100000
set.seed(4321)
system.time({
results <- params %>%
group_by(zetayr,pzeta,n) %>%
summarize(sims=list(manysim(nrep=nrep,nnodes=7,
pzeta=pzeta,n=n,zalpha=qnorm(alpha/2)))) %>%
ungroup() %>%
tidyr::unnest()
})

suppressMessages({
library(dplyr)
library(tidyr)
library(ggplot2)
})
ph <- results %>%
mutate(vtot=vlo+vhi) %>%
gather(key=series,value=violations,vlo,vhi,vtot) %>%
mutate(series=case_when(.$series=='vlo' ~ 'below lower CI', .$series=='vhi' ~ 'above upper CI',
.$series=='vtot' ~ 'outside CI', TRUE ~ 'error')) %>% ggplot(aes(zetayr, violations, colour=series)) + geom_line() + geom_point(alpha=0.5) + geom_hline(yintercept=c(alpha/2,alpha),linetype=2,alpha=0.5) + labs(x='SNR (per square root year)',y='type I rate', color='error type',title='rates of type I error when trade side is sign of Sharpe') print(ph)  ### A Bayesian Donut? Of course, this strategy seems a bit unrealistic: what's the point of constructing confidence intervals if you are going to trade the asset no matter what the evidence? Instead, consider a fund manager whose trading strategies are all above average: she/he observes the Sharpe ratio of a backtest, then only trades a strategy if $$|\hat{\zeta}| \ge c$$ for some sufficiently large $$c$$, and picks a side based on $$\operatorname{sign}\left(\hat{\zeta}\right)$$. This is a 'donut'. Conditional on observing $$|\hat{\zeta}| \ge c$$, can one construct a reliable confidence interval on $$\operatorname{sign}\left(\hat{\zeta}\right) \zeta$$? Perhaps our fund manager thinks there is no point in doing so if $$c$$ is sufficiently large. I think to do so you have to make some assumptions about the distribution of $$\zeta$$ and rely on Baye's law. We did not say what would happen if the junior quant at this shop developed a strategy where $$|\hat{\zeta}| < c$$, but presumably the junior quants were told to keep working until they beat the magic threshold. If the junior quants only produce strategies with small $$\zeta$$, one suspects that the $$c$$ threshold does very little to reject bad strategies, rather it just slows down their deployment. (In response the quants will surely beef up their backtesting infrastructure, or invent automatic strategy generation.) ## Generalizing to higher dimensions The real interesting question is what this looks like in higher dimensions. Now one observes $$p$$ assets, and is to construct a portfolio on those assets. Can we construct good confidence intervals on the Sharpe ratio of the chosen portfolio? In this setting we have many more possible choices, so a general purpose analysis seems unlikely. However, if we restrict ourselves to the Markowitz portfolio, I suspect some progress can be made. (Although I have been very slow to make it!) I hope to purse this in a followup blog post. Click to read and post comments #### Mar 03, 2019 ## A Sharper Sharpe: Its Biased. In a series of posts, we looked at Damien Challet's 'Sharper estimator' of the Signal-Noise Ratio, which computes the number of drawdowns in the returns series (plus some permutations thereof), then uses a spline function to infer the Signal-Noise Ratio from the drawdown statistic. The spline function is built from the results of Monte Carlo simulations. In the last post of the series, we looked at the apparent bias of this 'drawdown estimator'. We suggested, somewhat facetiously, that one could achieve similar properties to the drawdown estimator (reduced RMSE, bias, etc.) by taking the traditional moment-based Sharpe ratio and multiplying it by 0.8. I contacted Challet to present my concerns. I suspected that the spline function was trained with too narrow a range of population Signal-Noise ratios, which would result in this bias, and suggested he expand his simulations as a fix. I see that since that time the sharpeRratio package has gained a proper github page, and the version number was bumped to 1.2. (It has not been released to CRAN, so it is premature to call it "the" 1.2 release.) In this post, I hope to: 1. Demonstrate the bias of the drawdown estimator in a way that clearly illustrates why "Sharpe ratio times 0.8" (well, really 0.7) is a valid comparison. 2. Check whether the bias has been corrected in the 1.2 development version. (Spoiler alert: no.) 3. Provide further evidence that the issue is the spline function, and not in the estimation of $$\nu$$. In order to compare two versions of the same package in the same R session, I forked the github repo, and made a branch with a renamed package. I have called it sharpeRratioTwo because I do not expect it to be used by anyone, and because naming is still a hard problem in CS. To install the package to play along, one can: library(devtools) devtools::install_github('shabbychef/sharpeRratio',ref='astwo')  First, I perform some simulations. I draw 128 days of daily returns from a $$t$$ distribution with $$\nu=4$$ degrees of freedom. I then compute: the moment-based Sharpe ratio; the moment-based Sharpe ratio, but debiased using higher order moments; the drawdown estimator from the 1.1 version of the package, as installed from CRAN; the drawdown estimator from the 1.2 version of the package; the drawdown estimator from the 1.2 version of the package, but feeding $$\nu$$ to the estimator. I do this for 20000 draws of returns. I repeat for 256, and 512 days of data, and for the population Signal-Noise ratio varying from 0.25 to 1.5 in "annualized units" (per square root year), assuming 252 trading days per year. I use doFuture to run the simulations in parallel. suppressMessages({ library(dplyr) library(tidyr) library(tibble) library(SharpeR) library(sharpeRratio) library(sharpeRratioTwo) # https://cran.r-project.org/web/packages/doFuture/vignettes/doFuture.html library(doFuture) registerDoFuture() plan(multiprocess) }) # only works for scalar pzeta: onesim <- function(nday,pzeta=0.1,nu=4) { x <- pzeta + sqrt(1 - (2/nu)) * rt(nday,df=nu) srv <- SharpeR::as.sr(x,higher_order=TRUE) # mental note: this is much more awkward than it should be, # let's make it easier in SharpeR! ssr <- srv$sr
ssr_b <- ssr - SharpeR::sr_bias(snr=ssr,n=nday,cumulants=srv$cumulants) ssr <- mean(x) / sd(x) sim <- sharpeRratio::estimateSNR(x) twm <- sharpeRratioTwo::estimateSNR(x) # this cheats and gives the true nu to the estimator cht <- sharpeRratioTwo::estimateSNR(x,nu=nu) c(ssr,ssr_b,sim$SNR,twm$SNR,cht$SNR)
}
repsim <- function(nrep,nday,pzeta=0.1,nu=4) {
dummy <- invisible(capture.output(jumble <- replicate(nrep,onesim(nday=nday,pzeta=pzeta,nu=nu)),file='/dev/null'))
retv <- t(jumble)
colnames(retv) <- c('sr','sr_unbiased','ddown','ddown_two','ddown_cheat')
invisible(as.data.frame(retv))
}
manysim <- function(nrep,nday,pzeta,nu=4,nnodes=5) {
if (nrep > 2*nnodes) {
# do in parallel.
nper <- table(1 + ((0:(nrep-1) %% nnodes)))
retv <- foreach(i=1:nnodes,.export = c('nday','pzeta','nu')) %dopar% {
repsim(nrep=nper[i],nday=nday,pzeta=pzeta,nu=nu)
} %>%
bind_rows()
} else {
retv <- repsim(nrep=nrep,nday=nday,pzeta=pzeta,nu=nu)
}
retv
}
# summarizing function
sim_summary <- function(retv) {
retv %>%
tidyr::gather(key=metric,value=value,-pzeta,-nday) %>%
filter(!is.na(value)) %>%
group_by(pzeta,nday,metric) %>%
summarize(meanvalue=mean(value),
serr=sd(value) / sqrt(n()),
rmse=sqrt(mean((pzeta - value)^2)),
nsims=n()) %>%
ungroup() %>%
arrange(pzeta,nday,metric)
}

ope <- 252
pzeta <- seq(0.25,1.5,by=0.25) / sqrt(ope)

params <- tidyr::crossing(tibble::tribble(~nday,128,256,512),
tibble::tibble(pzeta=pzeta))

nrep <- 20000
# can do 2000 in ~20 minutes using 7 nodes.
set.seed(1234)
system.time({
results <- params %>%
group_by(nday,pzeta) %>%
summarize(sims=list(manysim(nrep=nrep,nnodes=7,pzeta=pzeta,nday=nday))) %>%
ungroup() %>%
tidyr::unnest()
})

     user    system   elapsed
79879.368    20.066 28291.224


(Don't trust those timings, it should only take 3 and a half hours on 7 cores, but I hibernated my laptop in the middle.)

I compute the mean of each estimator over the 20000 draws, divide that mean estimate by the true Signal-Noise Ratio, then plot versus the annualized SNR. I plot errobars at plus and minus one standard error around the mean. I believe this plot is more informative than previous versions, as it clearly shows the geometric bias of the drawdown estimator. As promised, we see that the drawdown estimator consistently estimates a value around 70% of the true value. This geometric bias appears constant across the range of SNR values we tested. Moreover, it is apparently not affected by sample size: we see about the same bias for 2 years of data as we do for half a year of data. The moment estimator, on the other hand, shows a slight positive bias which is decreasing in sample size, as described by Bao and Miller and Gehr. The higher order moment correction mitigates this bias somewhat, but does not appear to eliminate it entirely.

We also see that the bias of the drawdown estimator is not fixed in the most recent version of the package, and does not appear to due to estimation of the $$\nu$$ parameter.
On the contrary, the estimation of $$\nu$$ appears to make the bias worse. We conclude that the drawdown estimator is still biased, and we suggest that practicioners do not use this estimator until this issue is resolved.

ph <- results %>%
sim_summary() %>%
mutate(metric=case_when(.$metric=='ddown' ~ 'drawdown estimator v1.1', .$metric=='ddown_two' ~ 'drawdown estimator v1.2',
.$metric=='ddown_cheat' ~ 'drawdown estimator v1.2, nu given', .$metric=='sr_unbiased' ~ 'moment estimator, debiased',
.\$metric=='sr' ~ 'moment estimator (SR)',
TRUE ~ 'error')) %>%
mutate(bias = meanvalue / pzeta,
zeta_pa=sqrt(ope) * pzeta,
serr = serr / pzeta) %>%
ggplot(aes(zeta_pa,bias,color=metric,ymin=bias-serr,ymax=bias+serr)) +
geom_line() + geom_point() + geom_errorbar(alpha=0.5) +
geom_hline(yintercept=1,linetype=2,alpha=0.5) +
facet_wrap(~nday,labeller=label_both) +
scale_y_log10() +
labs(x='Signal-noise ratio (per square root year)',
y='empirical expected value of estimator, divided by actual value',
color='estimator',
title='geometric bias of SR estimators')
print(ph)