By Dr. Bob Uttl (February 17, 2025)
Mark Twain
It is better to keep your mouth shut and appear stupid than to open it and remove all doubt.
In elementary statistics textbooks, high school pupils are often asked to answer questions such as: What is the probability of rolling at least one 1 when you roll a die 10 times? Intuitively, the more times you roll a die, the more opportunities you get to roll a 1. The more times you roll, the higher the chance of getting a 1 at least once, and the higher the chance of getting many 1s. For example, if one rolls a die 500 times, the probability of getting at least one 1 is greater than .999, and one may expect about 83 of the 500 rolls to come up 1.
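The arithmetic behind the die example is the complement rule: the chance of no 1 in n independent rolls is (5/6)^n, so the chance of at least one 1 is 1 - (5/6)^n, and the expected number of 1s is n/6. A minimal Python sketch (the function names are illustrative):

```python
def p_at_least_one(n_rolls: int, p_event: float = 1 / 6) -> float:
    """Probability of at least one success in n independent trials."""
    # Complement rule: P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1 - (1 - p_event) ** n_rolls


def expected_count(n_rolls: int, p_event: float = 1 / 6) -> float:
    """Expected number of successes in n independent trials (binomial mean)."""
    return n_rolls * p_event


for n in (1, 10, 500):
    print(f"{n:>3} rolls: P(at least one 1) = {p_at_least_one(n):.4f}, "
          f"expected 1s = {expected_count(n):.1f}")
```

For 10 rolls this gives about .84; for 500 rolls the probability rounds to 1 at four decimal places and the expected count is about 83.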
Being tested with numerous psychological tests is, to some degree (depending on the tests' accuracy or reliability), similar to repeatedly rolling a die and reading off your test results from the die face after each roll. Why? No psychological test is 100% accurate, and all psychological test scores include some random error due to a variety of factors (some test scores are mostly random error). We say that psychological test scores are, to some degree, unreliable or imprecise; that is, the observed score is a mix of an examinee’s true ability and error. Accordingly, the more tests an examinee takes, the more low scores the examinee is likely to obtain purely by chance, even if the examinee is perfectly normal and has no deficits, no impairments, and no difficulties of any kind.
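The die analogy can be made more concrete with classical test theory, in which an observed score is a true score plus random error. The sketch below is an illustration under stated assumptions, not a model of any particular test: it assumes standardized (z) scores, a reliability r so that the error variance is 1 - r, the same true score and reliability on every test, independent errors across tests, and a cutoff of 1 SD below the mean (about the 16th percentile). The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def p_low_score(true_z: float, reliability: float, cutoff_z: float = -1.0) -> float:
    """Chance that one observed score falls below cutoff_z, given the true score.

    Assumes classical test theory on a z scale: X = T + E, with error
    variance 1 - reliability (reliability must be below 1).
    """
    error_sd = sqrt(1 - reliability)
    return normal_cdf((cutoff_z - true_z) / error_sd)


def low_score_summary(n_tests: int, true_z: float, reliability: float,
                      cutoff_z: float = -1.0) -> tuple[float, float]:
    """Expected number of low scores and P(at least one) over n_tests.

    Simplifying assumption: same reliability and true score on every test,
    and independent errors across tests.
    """
    p = p_low_score(true_z, reliability, cutoff_z)
    return n_tests * p, 1 - (1 - p) ** n_tests


# A perfectly average examinee (true z = 0) on tests with reliability .70
for n in (5, 15, 30):
    expected, at_least_one = low_score_summary(n, true_z=0.0, reliability=0.70)
    print(f"{n:>2} tests: expected low scores = {expected:.2f}, "
          f"P(at least one) = {at_least_one:.2f}")
```

Even for an examinee with no deficit at all, the expected number of low scores grows in proportion to the number of tests, and less reliable tests (smaller r) produce chance low scores more often.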
Dr. Mary Westcott, a mid-career clinical psychologist, does not understand this concept. Testifying under oath, questioned by the SD5 lawyer who presented her as “an expert witness”, Dr. Westcott removed all doubt and demonstrated to the world that she has no understanding of the concept of multivariate base rates of low scores, and thus that she is not competent to interpret psychological test scores and to conduct psychological assessments.
Fifteen Tests And 1/2 Thousand Scores to Cherry-Pick for “Low” Scores
Dr. Mary Westcott made her technicians administer 15 psychological tests/test batteries to Ms. T. Each test may result in numerous scores. Each test may also include many subtests. Each subtest may have numerous trials and one subtest may yield tens or even hundreds of scores. By the time Dr. Westcott chose to interpret what happened on the first trial of a subtest X of test Y, Dr. Westcott had over ½ thousand scores to cherry-pick for “low scores”, and to interpret as “impairments”, “difficulties”, and “deficits.”
On September 15, 2010, Dr. Westcott issued her September 15, 2010 Report (Dr. Westcott September 15, 2010 Report). Dr. Westcott’s September 15, 2010 Report is a remarkable showcase of incompetence: (a) the report was full of errors demonstrating Dr. Westcott’s inability to correctly score and present the test score results, (b) many of the interpretations in the report were plagiarized from a computerized interpretive report without any acknowledgment whatsoever (no quotation marks, no citation, no reference), (c) the report made no mention of the imprecision of the reported scores and included exactly zero 95% confidence intervals for the reported scores, and (d) the report made no mention of multivariate base rates of low scores and gave no indication that Dr. Westcott considered multivariate base rates of low scores in interpreting the test scores.
The More Tests Are Given, The More Low Scores Will Be Obtained Purely Due To Chance
On July 25, 2021, Dr. L criticized Dr. Westcott’s September 15, 2010 Report on a number of grounds, including Dr. Westcott’s failure to take into account multivariate base rates of low scores. Dr. L wrote:
The psychological assessment report by Dr. Mary Westcott, dated 2010 September 15, is relatively lengthy, and again has several issues. First, this report uses a large number of measures, but does not appear to take base rates into account, or integrate that information, for example in the context of a medical or psychological diagnosis. The more tests that are given, the more likely that one or more tests will show abnormal results, purely by chance, in the general population, with no identified pathology.[emphasis added] …
Dr. L, July 25, 2021
On August 25, 2021, Dr. Westcott responded that Dr. L was wrong; Dr. Westcott wrote:
Dr. L’s assertion that I used a large number of measures is without foundation. The number of measures I used (15) is well within the typical range in high-stakes assessments of this sort. He says that my report “does not appear to take into account base rates.” This is incorrect as I provided a full integration of all test results and am well aware of the issue of base rates. [emphasis added]
Dr. Mary Westcott, August 25, 2021
On September 26, 2021, Dr. L. responded:
First, my concern is that she administered a relatively large number of tests of executive functioning, without a way to consider overall performance, and account for the expectation that as more tests are administered, more low scores will be found, even in individuals with no neurological disorder. Specifically, she [Dr. Mary Westcott] administered the Delis-Kaplan Executive Function System (D-KEFS), which is composed of nine tests. She administered eight tests from this, which between them provide 35 primary scores, plus further optional scores. Again, in individuals with no pathology, the more scores are obtained, the more likely it is to obtain some scores below expectations. [emphasis added] For an estimate of how many abnormal scores would be expected from this collection of 35 scores, I attempted to enter the data into a computer program developed for this very purpose (Crawford et al., 2007). Unfortunately, this is not able to consider more than 20 tests, but I will note that with a mere 20 scores, over 78% of individuals will show at least one abnormally low score, I estimate that 26% of normal individuals would show about nine abnormally low scores. This is the base rate problem, that Dr. Westcott appears not to understand. The more tests are given to anyone, the more low scores will be found, and there needs to be some basis for determining if an abnormal score is unusual or not. [emphasis added]
Dr. L, September 26, 2021
In short, Dr. L pointed out that “Dr. Westcott appears not to understand” the base rate problem. Helpfully, Dr. L. explained one more time: “The more tests are given to anyone, the more low scores will be found, and there needs to be some basis for determining if an abnormal score is unusual or not.”
Dr. Mary Westcott Removed All Doubt About Her Ignorance of Base Rates of Low Scores
Fast forward three years.
One may think that three years is plenty of time for Dr. Mary Westcott to study up on the problem of the base rates of low scores. This is the problem that, as Dr. L noted, Dr. Westcott “appears not to understand”, yet it is something Dr. Westcott should have known before she even completed her PhD and before she obtained her “R.Psych” designation.
In 2024, the SD5 put Dr. Westcott on the witness stand as an “expert” witness. It did not occur to anyone in the SD5 to have Dr. Westcott’s “expert reports” reviewed by someone with demonstrated expertise in psychometrics and psychological testing. Dr. Westcott herself lacked the necessary insight into her condition (for example, her inability to correctly score and report test scores, and her lack of understanding of test reliability, measurement error, and the base rates of low scores) to withdraw her “expert” reports, apologize for the inconvenience and for wasting everyone’s time, and step down from the witness stand. Expert witnesses cannot be subpoenaed to testify; they are not only entitled to withdraw their reports and testimony but also have a duty to the court to withdraw flawed reports and testimony.
The SD5 lawyer tried valiantly to obtain an explanation of the base rates problem from Dr. Mary Westcott. Dr. Westcott resisted, became confused, spoke in generalities, served a word salad of psychological jargon, and admitted that she has “never been entirely clear on this argument of Dr. L…”.
SD5 COUNSEL: Okay. And then in the next paragraph, you say Dr. L’s assertion that I use a large number of measures is without foundation. I think we talked about that in Dr. L ‘s before.
DR. MARY WESTCOTT: We did, yeah.
SD5 COUNSEL: Yeah. It says the number of measures I use, 15, is well within, a typical range of high stakes assessments of this sort. He said that the report does not appear to take into account base rates. This is incorrect as I provided a full integration of all test results and I’m well aware of the issues of base rates. So let’s break that down a little bit. So the first one is the number of measures that were used. And when we talk about measures, what are we referring to?
DR. MARY WESTCOTT: These are the tests themselves when we looked at that list. If we were talking about the Wechsler adult intelligence scales, he talks about the DKEFS, and they’re the Wechsler memory scale. I administered a total of 15 tests. Again, not unusual in my training, not unusual with what was the standard in the firm in which I was working, not unusual within the context of vocational assessments being conducted in Alberta at the time, certainly was not unusual in other community and hospital settings that I’ve worked with outside of Mandel and Associates. I don’t know how else to say that, but this is, very typical test list. And in fact, was seen in that repeat assessment that he refers to elsewhere was very comparable in the number of measures that were administered.
…
SD5 COUNSEL: And then how do you talk about the base rates? He said, he says that my report does not appear to take into account base rates. So again, remind us what base rates are and.
DR. MARY WESTCOTT [serving irrelevant word salad of psychological jargon]: Well, the base rates in the general population, I think Dr. L is referring to a publication that he had provided and I think was talked about by another psychologist, you know, as Dr. Suffield became involved. So that was really a conversation between the two of them. You know, he was talking about the WAIS-III and how education level impacts Canadian norms. I didn’t use the WAIS-III. I used the WAIS-IV [CDN]. I did take in base rates and that the fact that these things can move and that’s, that’s bringing in that full integration again, bringing data from multiple sources. So taking into consideration some of these artifacts of testing that might occur, taking in data from behavioral observation, the client’s self-reporting, the report herself, taking into account multiple data sources. So we can minimize the effects of these sort of smaller things that can come up.
SD5 COUNSEL: So explain that to us a little bit further. So, so you’re saying, yeah, you mentioned that the base rates may shift. And then as, so as a result, you’re looking at other sources as well. Is that, is that what I understand what you’re saying?
DR. MARY WESTCOTT: You know, I’ve never been entirely clear on this argument of Dr. L and Dr. Suffield. I would, you know, defer to their, they can debate that. I think, you know, this seems to refer to his research in that area. Yeah. I’ve never really been clear on this, to be honest, and I’m sure this will come up with, you know, with the cross exam. I think this was an issue raised between Dr. L and Dr. Suffield. But looking at base rates and normative samples, range of error, those sorts of things are accounted for in the standard standardized administration, considering the prevalence… I, yeah, I must admit, I didn’t have an opportunity to review this and review those articles again in preparation for this question today. So I apologize that I can’t really elaborate on those.
SD5 COUNSEL: And that’s fine. And I don’t think we have to get into it. Just, just in terms of your response. In terms of what you are saying. So I just want to make sure that we’re clear. So, so you’re, you’re, you’re, when, when in response, the allegation that you don’t take into base, take into account base rates, you say I provide full integration of all test results.
DR. MARY WESTCOTT: Yeah.
Direct Questioning of Dr. Mary Westcott
The SD5 counsel moved on to another topic, and then a break in questioning was taken. Following the break, Dr. Westcott informed the Tribunal that she had had “a chance to look at the base rates question”, that is, to study it in 10 minutes. The SD5 counsel, presumably hoping that Dr. Westcott had refreshed her knowledge of the base rates problem, proceeded to question Dr. Westcott further about what she (the expert) had found out about the base rates problem during the 10-minute break.
Dr. MARY WESTCOTT: I had a chance to look at the base rates question. I found materials in the I know Uh I can offer some clarification there. Sorry about that I just catching up.
TRIBUNAL: Ok.
SD5 COUNSEL: Okay. Perhaps we should start. So you mentioned that you had a little bit more information on the base rates question.
DR. MARY WESTCOTT: Well, I think just looking through my response and what I was referring there, you know, for every test provided, you know, a base rate by definition is a percentage, right, in isolation of that test result. So describing the percentage of the norm of the sample for the standardized test that present with, you know, a similar, the probability that it was similar scores. And throughout my report, when I’m flipping through, essentially, I report percentiles for everything, including that IQ. I believe that’s what I’m responding to, that clearly, you know, I’ve reported percentiles throughout the assessment. And we’ve talked about that. And then looking, I have it, I didn’t review the entire article that L published on this during the break, but I was just looking at that. And just that was something that perhaps I’m referring to that, of course, I was aware because those percentiles are reported throughout.
SD5 COUNSEL: And what is, what does that percent, what do those percentiles have to do with the base rates? Could you explain that to us?
DR. MARY WESTCOTT: Well, for instance, well, the base rate, like an FSIQ of 86 in the normative sample, only 18% score that low. So that puts her, you know, in that bottom 18th percentile, if it was, you know, her premorbid was at 112, which I think is about the 78th percentile, then she’s performing better than 78 of the people within her age range in the normative sample, going by, you know, how the tests are designed. So, when he’s referring to base rates, those base rates are reported throughout my report based on the normative samples in isolation of those tests, right? Of course, we wouldn’t just go along with the test, but these are probabilities and percentages provided in comparison to that normative sample of each test in the absence of other information, right? If this was simply looking at those base rates. Yeah. I don’t know if that provides additional explanation or not.
SD5 COUNSEL: No, that’s helpful. Subject to [Tribunal] Member [name redacted] question, that’s it for my direct examination. Thank you.
And there it was. The base rates of multiple/multivariate low scores Dr. L was concerned about are percentile ranks, that is, “probabilities and percentages provided in comparison to that normative sample of each test in the absence of other information”, according to the “expert” Dr. Mary Westcott, Ph.D., R. Psych, the College of Alberta Psychologists’ psychologist. Unfortunately, Dr. Mary Westcott’s answer was false.
An undergraduate university student who gave this answer would earn an “F” (for “Fail”) in Psychometrics/Psychological Testing 101. The percentage of healthy people who obtain one or more low scores across multiple tests (the multivariate base rate) is not the percentile rank of a single score in the normative sample for that test.
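The difference can be shown in a few lines. A percentile rank answers a question about one score (what fraction of the normative sample scored below this score?), whereas a multivariate base rate answers a question about the whole battery (what fraction of healthy people obtain at least m low scores among k scores?). This sketch assumes normally distributed scores on the IQ metric (mean 100, SD 15) and, for the battery part, independent scores; real test scores are positively correlated, so published multivariate base rates are lower than this independent case. The 16th-percentile cutoff, the battery of 15 scores, and the function names are illustrative choices.

```python
from math import comb, erf, sqrt


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentage of a normal reference population scoring below `score`."""
    z = (score - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))


def base_rate_at_least(m_low: int, k_scores: int, p_low: float) -> float:
    """P(at least m_low of k_scores are low), assuming independent scores.

    This is a binomial tail probability, used here only as a simple
    stand-in for a battery-level (multivariate) base rate.
    """
    return sum(comb(k_scores, j) * p_low**j * (1 - p_low) ** (k_scores - j)
               for j in range(m_low, k_scores + 1))


# Univariate: the percentile rank of one score (e.g., an FSIQ of 86)
print(f"Percentile rank of 86: {percentile_rank(86):.1f}")

# Multivariate: chance that a healthy person has 1+ or 2+ scores at or
# below the 16th percentile among 15 scores (independence assumed)
for m in (1, 2):
    print(f"P({m}+ low of 15) = {base_rate_at_least(m, 15, 0.16):.2f}")
```

Under these assumptions an FSIQ of 86 sits at about the 17.5th percentile, while roughly .93 of healthy people would have at least one, and roughly .72 at least two, of 15 scores at or below the 16th percentile. The two numbers answer different questions.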
What Does Science Say About The Base Rates Of Low Scores
The science agrees with Dr. L. Here is a sampler of scientific commentaries on the problem of base rates:
However, when, as is almost always the case, multiple tests are administered, attention should also be paid to how common it will be to exhibit a given number of abnormal tests or test differences from among the overall set of tests administered. The ability to estimate such percentages helps guard against overinferring the presence of impairment. As was shown, when commonly used criteria for abnormality are used, substantial percentages of the healthy population are expected to exhibit one or more abnormal scores or score differences, even when the overall number of tests is relatively modest
Crawford, J. R., Garthwaite, P. H., & Gault, C. B. (2007). Estimating the Percentage of the Population With Abnormally Low Scores (or Abnormally Large Score Differences) on Standardized Neuropsychological Test Batteries: A Generic Method With Application. Neuropsychology, 21(4), 419-430. (see also John R Crawford)
Crawford et al. (2007) made it very easy for a clinician to estimate the percentage of healthy people expected to obtain a given number of low scores when multiple tests are administered. Unfortunately, as Dr. L pointed out, Crawford’s software cannot estimate the number of low scores among the set of 1/2 thousand scores Dr. Westcott obtained with her dragnet, “fishing-for-low-scores” neuropsychological battery. However, we can be certain that a large number of low scores will occur when a psychologist sifts through 1/2 thousand scores, many from tests with such poor reliability that they should never be used (e.g., some DKEFS tests).
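Crawford et al. (2007) work from the correlations among the tests in a battery. The sketch below is not their method or their software; it is a simpler Monte Carlo illustration that assumes every pair of standardized scores shares the same correlation rho (an equicorrelation assumption chosen for brevity, with 0 <= rho < 1). It simulates healthy "examinees", counts how many of their k scores fall at or below a cutoff, and estimates the proportion with at least m low scores. It requires NumPy; the function names, the battery size of 20, and the 5th-percentile cutoff are illustrative.

```python
import numpy as np


def simulate_low_score_counts(k_scores: int, rho: float, cutoff_z: float,
                              n_people: int = 200_000, seed: int = 0) -> np.ndarray:
    """Number of scores at or below cutoff_z for each simulated healthy person.

    Equicorrelated standard-normal scores come from a one-factor model:
    Z_i = sqrt(rho) * F + sqrt(1 - rho) * e_i, so corr(Z_i, Z_j) = rho.
    """
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n_people, 1))          # shared factor F
    unique = rng.standard_normal((n_people, k_scores))   # test-specific e_i
    scores = np.sqrt(rho) * common + np.sqrt(1 - rho) * unique
    return (scores <= cutoff_z).sum(axis=1)


def base_rate(counts: np.ndarray, m_low: int) -> float:
    """Proportion of simulated people with at least m_low low scores."""
    return float((counts >= m_low).mean())


cutoff = -1.645  # roughly the 5th percentile of a standard normal
for rho in (0.0, 0.3, 0.6):
    counts = simulate_low_score_counts(k_scores=20, rho=rho, cutoff_z=cutoff)
    print(f"rho = {rho:.1f}: P(1+ low of 20) = {base_rate(counts, 1):.2f}, "
          f"P(3+ low of 20) = {base_rate(counts, 3):.2f}")
```

Positive correlation lowers the chance of at least one low score relative to the independent case but spreads out the count distribution, so the chance of many low scores can go up. Either way, the rate is a property of the whole battery, not of any single score's percentile rank.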
When healthy people complete a battery of tests, a substantial minority will obtain one or more low scores. When considering a single WAIS–IV/WMS–IV subtest in isolation, only 9% of healthy adults will obtain a score of 6 or lower. However, when considering the 20 primary subtests from the WAIS–IV/WMS–IV simultaneously, 61% will have one or more scores of 6 or lower. Low scores in healthy adults and older adults might be attributable to measurement error (broadly defined), normative sample characteristics (i.e., having healthy people, rather than clinical groups, at the lower end of the distribution), long-standing weaknesses in certain areas, fluctuations in motivation and effort, psychological interference, and other situational factors such as inattentiveness, fatigue, or minor illness.[emphasis added] This chapter presents five psychometric principles to consider when interpreting multiple test scores. Low scores are common across all batteries of tests (Principle 1), the number of low scores depends on the cut-off score used (Principle 2), the number of low scores obtained increases with the number of tests administered and interpreted (Principle 3), the number of low scores can change depending on the demographic characteristics of a sample (Principle 4), and different levels of intellectual functioning are inherently going to have different rates of low scores (Principle 5). [emphasis added] Numerous look-up tables that illustrate the base rates of low scores in healthy adults and older adults across different combinations of WAIS–IV and WMS–IV indexes and subtests are provided. Multivariate base rates can be used to determine if a person’s profile (i.e., the number of low scores) is broadly normal or uncommon in healthy people. With this information, clinicians are better able to guard against over-interpreting an isolated low score. Using multivariate base rates improves the diagnostic accuracy and interpretation of the WAIS–IV and WMS–IV.
Brooks, L., Iverson, G. L., & Holdnack, J. A. (2013). Understanding and Using Multivariate Base Rates with the WAIS-IV/WMS-IV.
Brooks et al. say: “When healthy people complete a battery of tests, a substantial minority [sometimes a majority] will obtain one or more low scores.” That is a simple answer to the question of what the base rates are. Next, Brooks et al. succinctly (a) explain the problem of the base rates of low scores, (b) explain why low scores occur, including measurement error and “other situational factors such as inattentiveness, fatigue, or minor illness” (Brooks et al. did not mention “major illness” or vomiting, perhaps because it did not occur to them that some psychologists, such as Dr. Westcott, would proceed to interpret scores obtained while an examinee was ill and vomiting), and (c) state “five psychometric principles to consider when interpreting multiple test scores”.
Dr. Westcott had no idea what the multivariate base rates of low scores were, had no idea why they occur frequently in normal healthy people, and had no idea about the five psychometric principles detailed by Brooks et al. (note that the five psychometric principles are mathematically unavoidable consequences of using multiple test scores).
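Principles 2 and 3 can be seen directly in a small table. This sketch assumes, for simplicity, that scores are independent; for positively correlated, normally distributed scores this gives an upper bound on the chance of at least one low score, so published rates are lower. The percentile cutoffs and battery sizes are illustrative. As a check against the figures quoted above, independence would turn the 9% single-subtest rate into about 85% for 20 subtests, higher than the 61% Brooks et al. report, which is the expected direction when subtests are correlated.

```python
def p_any_low(k_scores: int, percentile_cutoff: float) -> float:
    """P(at least one of k independent scores is at or below the cutoff)."""
    p = percentile_cutoff / 100
    return 1 - (1 - p) ** k_scores


cutoffs = (2, 5, 9, 16, 25)       # illustrative percentile cutoffs
battery_sizes = (1, 5, 10, 20, 35)

# Rows: number of scores (Principle 3); columns: cutoff used (Principle 2)
print("k   " + "".join(f"{c:>7}th" for c in cutoffs))
for k in battery_sizes:
    print(f"{k:<4}" + "".join(f"{p_any_low(k, c):>9.2f}" for c in cutoffs))

# Brooks et al.'s single-subtest rate of 9% across 20 subtests, if independent
print(f"Independent upper bound: {p_any_low(20, 9):.2f}")
```

Reading across a row shows that a more lenient cutoff yields more low scores; reading down a column shows that more scores yield more low scores.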
Objective: Multivariate base rates allow for the simultaneous statistical interpretation of multiple test scores, quantifying the normal frequency of low scores on a test battery. This study provides multivariate base rates for the Delis-Kaplan Executive Function System (D-KEFS).
Method: The D-KEFS consists of 9 tests with 16 Total Achievement scores (i.e. primary indicators of executive function ability). Stratified by education and intelligence, multivariate base rates were derived for the full D-KEFS and an abbreviated four-test battery (i.e. Trail Making, Color-Word Interference, Verbal Fluency, and Tower Test) using the adult portion of the normative sample (ages 16-89).
Results: Multivariate base rates are provided for the full and four-test D-KEFS batteries, calculated using five low score cutoffs (i.e. ≤25th, 16th, 9th, 5th, and 2nd percentiles). Low scores occurred commonly among the D-KEFS normative sample, with 82.6 and 71.8% of participants obtaining at least one score ≤16th percentile for the full and four-test batteries, respectively. Intelligence and education were inversely related to low score frequency.
Conclusions: The base rates provided herein allow clinicians to interpret multiple D-KEFS scores simultaneously for the full D-KEFS and an abbreviated battery of commonly administered tests. The use of these base rates will support clinicians when differentiating between normal variations in cognitive performance and true executive function deficits.
Keywords: D-KEFS; Delis-Kaplan Executive Function System; Executive functions; assessment; multivariate base rates; norms/normative studies.
Karr, J. E., Garcia-Barrera, M. A., Holdnack, J. A., & Iverson, G. L. (2018). Advanced clinical interpretation of the Delis-Kaplan Executive Function System: Multivariate base rates of low scores. The Clinical Neuropsychologist, 32(1), 42-52.
Karr et al. (2018) state that “multivariate base rates” quantify “the normal frequency of low scores on a test battery.” Karr et al. focused on the multivariate base rates for the DKEFS, one of the tests Dr. Westcott’s technicians administered to Ms. T., and reported multivariate base rates for five different cut-offs for “low scores”. They found that “Low scores occurred commonly among the D-KEFS normative sample, with 82.6 and 71.8% of participants obtaining at least one score ≤16th percentile for the full and four-test batteries, respectively.” Not surprisingly, Ms. T obtained 2 “low” scores among nearly 30 primary scores, that is, two DKEFS scaled scores (SS; M = 10, SD = 3) of 7, each corresponding to 85 on the IQ scale (M = 100, SD = 15). Dr. Westcott, ignorant of the multivariate base rates, proceeded to interpret these two low scores as “difficulties” and speculated about their impact on Ms. T’s teaching. Yet approximately 60% of the normative sample obtained 2 or more scores of 7 or less. Analyzed with Crawford’s software for the D-KEFS, Ms. T’s Executive Index Score was 109, 95% CI [100, 118] (in percentile terms, 72.6, 95% CI [50.9, 88.0]).
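The score conversions in the paragraph above are linear rescalings through a z score. A small sketch, assuming normally distributed scores (integer scaled scores are only approximately normal, so percentile ranks in published tables can differ slightly); the function names are illustrative:

```python
from math import erf, sqrt


def to_z(score: float, mean: float, sd: float) -> float:
    """Standardize a score relative to its metric's mean and SD."""
    return (score - mean) / sd


def rescale(score: float, from_mean: float, from_sd: float,
            to_mean: float, to_sd: float) -> float:
    """Convert a score to another metric while keeping the same z score."""
    return to_mean + to_sd * to_z(score, from_mean, from_sd)


def percentile_from_z(z: float) -> float:
    """Percentile rank under a standard normal distribution."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))


# A scaled score of 7 (M = 10, SD = 3) on the IQ metric (M = 100, SD = 15)
ss = 7
iq_equivalent = rescale(ss, 10, 3, 100, 15)
pct = percentile_from_z(to_z(ss, 10, 3))
print(f"SS {ss} -> standard score {iq_equivalent:.0f}, about the {pct:.1f}th percentile")

# Executive Index Score of 109 (M = 100, SD = 15) as a percentile
print(f"Index 109 -> about the {percentile_from_z(to_z(109, 100, 15)):.1f}th percentile")
```

A scaled score of 7 is exactly 1 SD below the mean, so it maps to 85 and to roughly the 16th percentile, the same cutoff Karr et al. use for their headline figures; the index of 109 maps to about the 72.6th percentile under the normal assumption.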
Notably, different psychologists use different cut-offs for “low scores”, ranging from the 2nd to the 33rd percentile (Karr et al. considered the 2nd, 5th, 9th, 16th, and 25th percentiles), and they also use different labels for these “low scores”, such as “low average”, “impaired”, etc. Thus, independently of the base rates problem, whether an examinee is declared impaired or unimpaired depends on a given psychologist’s personal choice of cut-off and label. In turn, the assessment results depend on which psychologist one sees and, often, on who is paying the psychologist and what opinions they are paying for (see Edens et al., 2012, “Hired Guns”, “Charlatans”, and Their “Voodoo Psychobabble”: Case Law References to Various Forms of Perceived Bias Among Mental Health Expert Witnesses, Psychological Services, 9(3), 259-271).
Conclusions
Multivariate base rates of low scores are not percentile ranks.
Dr. Westcott’s under-oath testimony was false. Dr. Westcott’s demonstrated lack of knowledge of multivariate base rates, of what causes them, and of the basic principles of base rates shows Dr. Westcott’s incompetence and inability to interpret multiple assessment scores. Dr. Allan Mandel, President of Mandel & Associates Ltd, was involved with and provided “quality control” over Dr. Westcott’s September 15, 2010 Report. In turn, this indicates that Dr. Allan Mandel is similarly ignorant of the problem of multivariate base rates. Presumably, had Dr. Allan Mandel not been ignorant of it, he would not have approved Dr. Westcott’s September 15, 2010 Report.
It is somewhat disturbing that psychologists such as Dr. Allan Mandel, who lack an understanding of essential concepts in psychometrics and psychological assessment (e.g., multivariate base rates), are apparently “teaching psychological assessment techniques to graduate students”, that is, to future clinical psychologists, at the University of Calgary (as per Dr. Allan Mandel’s information retrieved Feb. 17, 2025).
Dr. Troy Janzen, the Complaints Director and Deputy Registrar, College of Alberta Psychologists, dismissed Ms. T’s complaints about Dr. Westcott’s incompetence, including Dr. Westcott’s failure to take into account the base rates as was also pointed out by Dr. L and Dr. G. In his decision letter dated September 27, 2022, Dr. Janzen did not address this allegation and did not even mention “base rate” or “base rates” a single time in his entire decision.