Efficient Frontier
William J. Bernstein
Do Fund Managers Exhibit Skill?
Of Money Managers, Major Leaguers, Heavy Hitters and Random Walkers
OK, you've done your homework. You've scoured the Morningstar database, culled out the funds with the best performance over the past several years, read the prospectuses, and more importantly, the annual reports, and figured out just how the funds chosen fit your target asset allocation. Is it really worth the effort?

There is a large body of academic finance literature concerning mutual fund persistence, i.e., just what does past performance tell you about future performance? The short answer is "not much." Burton Malkiel, who has extensively researched the problem, concludes in A Random Walk Down Wall Street that yes, the funds with the best past returns will outperform their peers by a slight amount, but will not beat an index fund. Unfortunately, the analytic techniques used are abstruse, highly complex, and unverifiable by the average investor.
I decided to investigate the problem myself. Morningstar Principia is a commercially available Windows-based product that allows individual investors to sort, search, and rank mutual funds. It can also export customized output to a spreadsheet; this capability enables even the smallest investors to perform very sophisticated analyses.
I settled on the following technique, the short version of which is: screen for Aggressive Growth, Growth, Growth & Income, Equity-Income, Small Co., and International funds with a >10.5-year track record, i.e., inception before 1/87. I know, I know, these are silly categorizations, but they're the best we have going back that far.
For each year, we calculate how much the fund return varies from the objective average; this is why I used the archaic classification system. For each fund, we now have 11 relative returns. (I included the first half of 1997 as a whole year.) A relative return of +2.0 means that the fund exceeded the objective average by 2%, and -4.5 means that in that year it fell below the objective average by 4.5%.
Using these data we can test the "null hypothesis" that the average return difference is zero.
We calculate the average relative return and the SD of the relative returns for each fund, from which we can calculate a z value as sqrt(11)*(avg/SD). Using a one-tailed t test with 10 degrees of freedom we can now calculate a p value. (For the purists among you, I used the population SD instead of the sample SD. This produces slightly lower p values, and thus militates slightly in favor of the funds.)
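For readers who want to reproduce this with their own Principia exports, here is a minimal sketch in Python of the calculation just described. (This is an illustration only, not the original analysis; the relative returns below are made up, and the numpy and scipy libraries are assumed.)

    import numpy as np
    from scipy import stats

    def skill_p(relative_returns):
        """One-tailed p value for the null hypothesis that the
        average relative return is zero."""
        r = np.asarray(relative_returns, dtype=float)
        n = len(r)                       # 11 annual relative returns
        sd = r.std(ddof=0)               # population SD, as described above
        z = np.sqrt(n) * r.mean() / sd   # z = sqrt(11)*(avg/SD)
        return stats.t.sf(z, df=n - 1)   # one-tailed t test, 10 degrees of freedom

    # A hypothetical fund's 11 relative returns (percent vs. objective average):
    rel = [2.0, -4.5, 3.1, 0.8, 5.2, -1.0, 2.4, 1.7, -0.3, 4.0, 2.2]
    print(skill_p(rel))   # roughly 0.05 for these made-up numbers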
The "p value" is simply the probability that the result may have occurred by chance. A p value of 1 indicates that the result occurred almost certainly by chance, whereas a p of 0.05 means that there was only a 5% probability of chance occurrence.
I ran this procedure by Paul Pudaite, chief statistician at Morningstar. Mr. Pudaite pointed out to me that we're still not done. Since we're looking only at the best funds after the fact, we have to guard against "data mining." We do this by calculating the adjusted p value as 1-(1-p)^n, where p is the unadjusted p value and n is the number of funds. Whew!
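In code, the correction is a one-liner; continuing the sketch above:

    def adjusted_p(p, n_funds):
        # Probability that at least one of n_funds would post an
        # unadjusted p this low by luck alone.
        return 1.0 - (1.0 - p) ** n_funds

    print(adjusted_p(0.007, 34))   # about 0.21 -- the AIM Constellation figure below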
Anyway, here are the results:
Aggressive Growth: 34 funds. The best was AIM Constellation I, with an unadjusted p of 0.007 but an adjusted p of 0.21. In other words, there is only a 21% probability that the good result occurred by chance. Not enough to satisfy a statistician, but good enough for me. Kaufmann? Unadjusted p of 0.13, adjusted p of 0.99! In other words, there was a 99% probability that the good result was due to chance, and only a 1% probability of its being due to skill. Yes, the "Tough Guys" exceeded the average fund by 6.6% annually, but the SD of the fund's relative return was 18.5%. In other words, the fund was so volatile that its excellent performance was most likely the result of random motion.
Growth: 200 funds. The best was AIM Value A, unadjusted p of 0.0009, adjusted p 0.17. Next best, Fidelity Destiny II, unadjusted p 0.003, adjusted p 0.47.
G&I: 118 funds. The best was Fido G&I, unadjusted p 0.003, adjusted 0.31. That's it -- out of 118 funds only one with a better than 50% chance of "skill" with >10.5 years of track record.
Equity Income: 20 funds. The best was United Income A, unadjusted p of 0.003, adjusted 0.075. Again, the only fund with adjusted p < 0.5.
Small Co.: 57 funds. The best was FPA Capital, unadjusted p of 0.01, adjusted p of 0.42. Again, only one fund with adjusted p < 0.5.
International: 27 funds. The best was EuroPacific Growth, unadjusted p of 0.001, adjusted to 0.035 -- the only one in the whole study which a statistician would accept as showing genuine outperformance. Also, Ivy International, unadjusted p of 0.009, adjusted to 0.22, and TRP International, unadjusted p of 0.02, adjusted to 0.42. My favorite, Harbor International, wasn't included in the analysis because of its later inception, but for the 10 return periods beginning 1/88 it has an unadjusted p of 0.00035 and an adjusted p of about 0.01. Not too shabby.
What is really striking is that the evidence of underperformance is much more solid -- 7 funds with an adjusted p of <0.05 for underperformance, versus only 1 for superior performance. For those of you who would like to view the output, it is available in .htm format here, and in .xls format here.

This method is fairly insensitive, and not particularly good at picking out individual funds. It tends to favor conventional funds with low benchmark tracking error, which produces low relative SDs, and thus high z values and low p values. It penalizes unconventional funds, which have high tracking errors, and thus low z values and high p values. For example, Scudder International is not a particularly distinguished fund, but over the past 10.5 years it has outperformed its peers by about 2%. Because it is very "conventional," it tracks its peers closely, with a relative SD of only about 2%, which produces a fairly respectable z value of about 1. On the other hand, Mutual Qualified outperforms its peers by a similar amount, but has a much larger tracking error -- about 8.5% -- so it has a much lower z value (0.23). In fact, by any conventional measure of risk, Mutual Qualified is a much less risky fund than Scudder International, and has much better risk-adjusted performance.
Nonetheless, these data are highly consistent with the academic findings: most exceptional fund performance is due to chance, not skill. To give you an idea of what the statistics of real skill look like, let's consider major league batters. I examined 11 annual batting averages from some famous, and not so famous, major league stars from the middle of their careers -- avoiding their rookie as well as their declining years. I assume a non-pitcher's mean batting average of .270, which has been remarkably constant over the decades.
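The same machinery applies: feed the routine each season's batting average, less the .270 mean. Here is an illustration, reusing the skill_p sketch from above, with made-up season averages that are not any real player's record:

    # Hypothetical 11-season record for a strong hitter (not actual data),
    # measured against the assumed non-pitcher mean of .270.
    seasons = [.342, .333, .356, .318, .327, .345, .369, .328, .336, .345, .388]
    print(skill_p([avg - 0.270 for avg in seasons]))   # on the order of 1e-7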
Let's look at arguably the greatest batter of modern times: Ted Williams. I picked the 11 years from the middle of his career, from 1946 to 1958. (I left out 1952-53, when he was flying Marine Corps jets in Korea for most of those seasons.) His unadjusted p value was an astonishing .0000001. Data mining? You bet. But correct it for, say, 1000 major league players and the value is still .0001. If we data mine as egregiously for mutual fund managers, we come up with the likes of Peter Lynch. If we look at the data for his heyday of 1976-86 (11 years), we find that his annual besting of the growth fund average by an astonishing 15.9% per year results in an unadjusted p of 0.00001. We've mined his fund from approximately 300 diversified domestic stock funds extant during the period, yielding an adjusted p of 0.0034. Very impressive, but still a few orders of magnitude less impressive than Mr. Williams.
Let's take a slight step down to Stan Musial. For 1948-58 his p values are .00000004 unadjusted and .00004 adjusted. His data actually look slightly better than Williams' because his averages were much more consistent over the years.
Let's take yet another step down. I'm of a certain age, and from Philadelphia, so Richie Ashburn sticks out in my memory. However, I doubt that even the most fanatical baseball buffs under 30 know who he is. From 1950 to 1960 his p values are .0002 unadjusted and 0.17 adjusted.
Let's eliminate the data mining problem entirely with the following construct: Take all of the NL batting champs for 1959-79, and look at the batting averages for those who played 11 more seasons after that. Six players qualify:
Our first example is Hank Aaron, who won the title in 1959. At the end of that season, we say, "Hmm, he just might have skill. Let's see how he does for the next 11 years." We cannot now be accused of data mining. From 1960-1970 his p value is .00001.
Roberto Clemente won the batting title in 1961, and for 1962-72 his p is .0000008.
Tommy Davis won the title in 1962, and even though his 1963-73 average was only .290, that still produces a p of .01.
Pete Rose won the title in 1968, and for 1969-79 his p was .000002.
The last two players who fit the criteria were the less memorable Bill Madlock and Dave Parker, but even they managed p values of 0.01 and 0.044, respectively.
So What's the Point of all This?
As one of our readers wrote of another piece, "All the math made my head hurt." Sorry about that -- I suspect that this piece falls into the same category. So here's the Cliff's Notes version: out of over 400 diversified funds studied during the 1987-97 period, by definition half showed above-average performance, but in almost all cases this seemed likely to be due to random variation, not skill. In only one case was there unequivocal statistical evidence of skill. When the same tests were applied to major league batters, abundant evidence of skill was found.
By way of comparison, consider the best-performing mutual fund for any given year. Such funds tend to do somewhat better than average the next year, but no better than average in following years. In contrast, in every case the National League batting champions demonstrated strong statistical evidence of skill in the 11-year period following their batting crowns. Put another way: batting performance persists; mutual fund performance does not.
Successful money managers are occasionally tagged as "heavy hitters." The above analysis suggests that they are much more likely random walkers. Is the selection of active money managers worth the effort? I doubt it.
Don't blame me, I'm only the messenger.
copyright (c) 1997, William J. Bernstein