By Chris Chasteen, Ph.D.
The use of Microsoft Excel in the business compensation environment for performing quick and easy calculations has become ubiquitous, making the need to cite usage statistics pretty much meaningless. I often cut and paste data into a spreadsheet for some quick compensation data analyses, even when I have much more powerful statistical software packages at my disposal.
It was during one of these exercises that something odd in my Excel spreadsheet jumped out at me. I was reviewing a long list of salaries for Non-Profit Executive Directors, focusing on the 10th percentile, and the values reported by Excel were, in my opinion, well, illogical.
So, I tried an experiment comparing the results from Excel 2007 to my "powerful" statistics software (SAS 9.2). I then discovered the SAS software has five different algorithms for calculating percentiles, while Excel presents a single result.
Rather than start with a long list of salaries, I started small. I went to the Internet and grabbed the first 35 numbers in the Fibonacci Sequence. I ranked them in order and just took a guess at what the 10th percentile might be.
With 35 numbers, N/10 = 3.5, and I came up with two guesses at the 10th percentile; either the 4th number in the sequence (which is 2) or, if interpolated, halfway between the 3rd and 4th numbers (which is 1.5). I decided my best answer would be 2.
Let’s take a look at the actual results.
As seen in the graph, all five SAS results fall in line with my original guesses (between 1.5 and 2). However, Excel reports the percentile falling nearly halfway between the 4th and 5th numbers – the difference is up to 60% in this example!
What is going on? Well, Wikipedia notes that Excel uses an "alternate" method to calculate percentiles. While these numbers are small, just a 0.9 difference, think about it in terms of salary planning. Multiply these numbers by $10,000, and now that 10th percentile is off by an amazing $9,000!
Conclusions? Well, first, I will be very cautious using Excel to calculate percentiles. The fact that this discrepancy was obvious in a list of actual salaries does lead me to recommend when analyzing important data using Excel, take the time to ensure the results make sense. Attributing a $9,000 discrepancy to Excel’s quirky percentile algorithm may not get you very far.
Of broader concern is that Excel doesn’t provide any information in Help, or elsewhere, letting me know what choices it is making for me in regard to my data. I had assumed that it just knew what was best and, by not giving me any choices or even documenting the option it uses, I had no reason to question its authority. Now I do.
1 Wikipedia – Percentile. Retrieved 6 June, 2011 from http://en.wikipedia.org/wiki/Percentile