For some time I’ve grown tired of seeing the seven (now eight) benchmark criteria credited to Bailey, Richards & Tierney (“Evaluating Portfolio Performance,” 2007). I first learned of them when I took the then-CGIPS (now CIPM(r)) Principles Exam, roughly 11 years ago:
- Unambiguous
- Investable
- Measurable
- Appropriate
- Reflective of current investment opinions
- Specified in advance
- Owned
These are considered attributes that benchmarks should possess. Interestingly, market indexes, peer groups, and absolute benchmarks, the three most commonly used benchmark types, fail at least one (market indexes) and as many as six (peer groups) of these criteria! We discuss them in our firm’s Fundamentals of Investment Performance Class, and I’ve suggested that perhaps they should be augmented with:
- Comparable
- Understood
- In common use
- Fair/reasonable
Enough, already, with these benchmark criteria!
But my concerns with them have steadily grown. And why? Well, despite the rather poor showing of these three most commonly used benchmark types, they (that is, market indexes, peer groups, and absolute benchmarks) remain very much in use, and so, what does that really say about these criteria? Might it be appropriate to accept that these three benchmark types will forever remain, and that perhaps what we need, rather than these broad seven (now eight), are focused criteria for each type that detail how it should best be handled?
For example, market indexes …
As GIPS(r) verifiers, we regularly review our clients’ composite benchmarks. And, at times we question their appropriateness.
As the GIPS standards make clear, “The benchmark must reflect the investment mandate, objective, or strategy of the composite.” At times, they do not. This should be a criterion to reference when evaluating market indexes.
For example, we may see a balanced strategy that is made up of five or more distinct markets or asset classes, but with an index that only represents two or three. Is that appropriate? Or, when someone has a growth strategy but is using a broader equity index: is that appropriate?
Absolute …
My biggest issue with absolute benchmarks is that we use them relative to time-weighted returns, which (to me) is less than ideal in the context of judging an asset owner’s portfolio. For example, if a pension fund wants to achieve a 4% return in 2018, getting a 4% TWR has very little relationship to this goal (unless there have been zero cash flows, and only then because it will be the same as the money-weighted return). Recall that you can have a positive TWR and lose money: how would THAT help the pension fund achieve its 4% goal? If you (an asset owner) use an absolute return as your benchmark, then your return needs to be the internal rate of return (IRR), or you should at least compare the IRR, as well as the TWR, against that target.
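To make that concrete, here’s a minimal sketch (in Python; every figure, along with the mid-year timing of the contribution, is made up purely for illustration) of a portfolio with a positive TWR that nonetheless loses money:

```python
# Hypothetical example: all dollar figures and return numbers are made up.
# Start with 100; sub-period 1 returns +100%, growing it to 200.
# The investor then contributes 1,000 (value now 1,200).
# Sub-period 2 returns -40%, so the portfolio ends the year at 720.

r1, r2 = 1.00, -0.40

# The time-weighted return chains the sub-period returns, ignoring the flow.
twr = (1 + r1) * (1 + r2) - 1
print(f"TWR:  {twr:+.1%}")  # +20.0%

# The money-weighted return (IRR) is the rate r at which the contributions,
# compounded for the time they were invested, equal the ending value.
# Assuming the 1,000 contribution arrived mid-year:
#     100 * (1 + r) + 1000 * (1 + r) ** 0.5 = 720
def excess(r):
    return 100 * (1 + r) + 1000 * (1 + r) ** 0.5 - 720

# excess(r) increases with r, so a simple bisection finds the root (the IRR).
lo, hi = -0.99, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if excess(mid) > 0:
        hi = mid
    else:
        lo = mid
irr = (lo + hi) / 2
print(f"IRR:  {irr:+.1%}")  # roughly -54.5%

# The investor put in 1,100 total and ended with 720: a loss of 380,
# consistent with the sharply negative IRR, not the +20% TWR.
print(f"Gain: {720 - (100 + 1000):+} dollars")  # -380
```

If the fund’s target was an absolute 4%, the IRR (here, deeply negative) is the figure that speaks to whether the money actually grew; the +20% TWR tells us nothing about it.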
And, of course I might add that this absolute return should have some relationship to the client’s true needs. This is, I believe, typically the case, as it is crafted by actuaries or other knowledgeable and skilled parties who know or can thoroughly assess the needs and objectives of the fund.
We should have different criteria if the absolute return is being used for a hedge fund, for example. Then, time-weighting DOES make sense.
And peer groups?
As for peer groups, might we want to confirm that the other members have some valid relationship to the portfolio being measured? We recognize that sometimes this relationship is merely that the parties are all pension funds. We should not necessarily dismiss this benchmark, knowing that each member will undoubtedly have distinct investment objectives, but rather suggest that these differences be acknowledged and understood as a limitation when seeking to judge how one fund has performed relative to the others.
What about custom indexes?
Interestingly, custom benchmarks pass all seven criteria with flying colors! However, do they really serve us well?
Yes, it’s great that the manager has crafted an index that aligns quite well with their strategy and all its uniqueness. But, can we compare any other manager with this index? Well, probably not. And are there other criteria, beyond these frequently cited seven, which would cause us to further question their appropriateness?
What to do, WHAT TO DO???
I will confess that while this is a topic that definitely interests me, my somewhat limited time has made it difficult for me to ponder it to any sufficient degree. It’s a subject that has been gnawing at me for quite some time.
That said, I do believe (as already noted above) that the seven (now eight) criteria that seem to be so widely accepted and promoted should perhaps either be set aside or at least be noted as serving a clearly articulated, limited, and qualified purpose.
Or, perhaps I’m mistaken, and someone will enlighten me.
Each time I teach our class and get to the benchmark section, I march through it rather dutifully, but somewhat begrudgingly as well. Or maybe it’s better to say that it’s done in an almost perfunctory fashion, lacking the passion and spirit I have for the other materials we include in our class. [I think I’ve met my adjective quota, so we can now move along]
I’m struggling here a bit, perhaps because I would like to give this more time and attention. This might be a good topic for a face-to-face discussion, but for now we’ll try it in an electronic fashion.
This is therefore merely a “toss it out and see what happens” kind of post, because I’m curious what others think. Please chime in! Correct my errors or misjudgments, and point me in the right direction, or acknowledge that perhaps we can do a better job. Thanks!