But almost everything considered infallible has proven extremely fallible when subjected to rigorous examination by zealous defense attorneys and, more importantly, courts willing to actually question the things said to them by law enforcement.
It’s the questioning that matters. When courts do it, precedent is created and guidelines for evidence submission are established. That’s why this decision [PDF] — handed down by the Maryland Supreme Court — matters. It not only questions the assumptions about ballistics evidence, it demands better evidence in the future, if cops want to use so-called bullet-matching to lock people up. (h/t Short Circuit)
Here’s the lead-in to the court’s 128-page decision:
At the trial of the petitioner, Kobina Ebo Abruquah, the Circuit Court for Prince George’s County permitted a firearms examiner to testify, without qualification, that bullets left at a murder scene were fired from a gun that Mr. Abruquah had acknowledged was his. Based on reports, studies, and testimony calling into question the reliability of firearms identification analysis, Mr. Abruquah contends that the circuit court abused its discretion in permitting the firearms examiner’s testimony. The State, relying on different studies and testimony, contends that the examiner’s opinion was properly admitted.
Applying the analysis required by Rochkind v. Stevenson, 471 Md. 1 (2020), we conclude that the examiner should not have been permitted to offer an unqualified opinion that the crime scene bullets were fired from Mr. Abruquah’s gun. The reports, studies, and testimony presented to the circuit court demonstrate that the firearms identification methodology employed in this case can support reliable conclusions that patterns and markings on bullets are consistent or inconsistent with those on bullets fired from a particular firearm. Those reports, studies, and testimony do not, however, demonstrate that that methodology can reliably support an unqualified conclusion that such bullets were fired from a particular firearm.
The court notes that most courts have not challenged this form of evidence, despite it being in regular use since 1906. The science (such as it is) goes like this: every gun barrel has certain distinct manufacturing imperfections. These imperfections mark the outside of bullets as they travel through the barrel. Thus, anyone with a decent microscope and access to the gun and the bullets can verify whether or not the bullets were fired from that particular gun.
For the most part, that quasi-scientific theory has done what the government has wanted it to do: secure convictions. It’s only in recent years that actual scientists (rather than the ones employed by law enforcement entities) have questioned the presumed uniqueness of marks left on bullets by gun barrels.
Leading off its criticism of the assumption that ballistics evidence is good evidence, the court cites a 2009 report by the National Research Council (NRC) of the National Academy of Sciences. That report said the standards created by the Association of Firearm and Tool Mark Examiners (AFTE) were faulty because so much of what was assumed to be scientifically sound was little more than examiners’ subjective interpretations of marks found on bullets.
With respect to firearms identification specifically, the NRC criticized the AFTE Theory as lacking specificity in its protocols; producing results that are not shown to be accurate, repeatable, and reproducible; lacking databases and imaging that could improve the method; having deficiencies in proficiency training; and requiring examiners to offer opinions based on their own experiences without articulated standards.
In particular, the lack of knowledge “about the variabilities among individual tools and guns” means that there is an inability of examiners “to specify how many points of similarity are necessary for a given level of confidence in the result.” Indeed, the NRC noted, the AFTE’s guidance, which is the “best . . . available for the field of toolmark identification, does not even consider, let alone address, questions regarding variability, reliability, repeatability, or the number of correlations needed to achieve a given degree of confidence.”
Not great. And that conclusion was presented to the scientific community (which includes cop labs) nearly 15 years ago. But the AFTE changed nothing, even as the federal government itself (via the President’s Council of Advisors on Science and Technology [PCAST]) openly questioned several law enforcement forensic techniques.
With respect to firearms identification specifically, PCAST described the AFTE Theory as a “circular” method that lacks “foundational validity” because appropriate studies had not confirmed its accuracy, repeatability, and reproducibility. PCAST concluded that the studies performed to that date, with one exception, were not properly designed, had severely underestimated the false positive and false negative error rates, or otherwise “differ[ed] in important ways from the problems faced in casework.” Among other things, PCAST noted design flaws in existing studies, including: (1) many were not “black-box” studies, and (2) many were closed-set studies, in which comparisons are dependent upon each other and there is always a “correct” answer within the set…
The government cites studies showing minuscule error rates among AFTE examiners, with one study showing a near-zero rate of false positives. But the court points out that these tests were controlled by the AFTE, examiners knew they were being tested, and every test set included a bullet fired by the test gun.
In “black box” studies (only two appear to meet this description), AFTE examiners fared much worse. Test sets did not always contain a match, and examiners didn’t know they were being tested. In one of those studies, more than a third of the would-be matches were declared “inconclusive.” In the other, positive results (i.e., supposed matches) varied by as much as 15% between sets of examiners. Negative results (non-matches) varied nearly as much: 13-14% between sets of examiners over two rounds of testing.
Is being right 74-80% of the time the evidentiary standard in the US criminal justice system? Obviously, it shouldn’t be. But it has been because examiners routinely overstated the confidence of their findings and the US court system basically never bothered to wonder if the experts might be wrong.
Having successfully challenged the expertise of the expert — in this case, the firearms examiner who tested the gun belonging to the suspect — the defense brought its own expert. Unsurprisingly, this expert agreed with actual scientists, rather than the ones cops rely on to generate evidence for criminal prosecutions.
Mr. Abruquah presented testimony and an extensive affidavit from David Faigman, Dean of the University of California Hastings College of Law, whom the court accepted as an expert in statistical and methodological bases for scientific evidence, including research design, scientific research, and methodology. Dean Faigman discussed several concerns with the validity of the AFTE Theory, which were principally premised on the subjective nature of the methodology, including: (1) the difference in error rates between closed- and open-set tests; (2) potential biases in testing that might skew the results in studies, including (a) the “Hawthorne effect,” which theorizes that participants in a test who know they are being observed will try harder; and (b) a bias toward selecting “inconclusive” responses in testing when examiners know it will not be counted against them, but that an incorrect “ground truth” response will; (3) an absence of pre-testing and control groups; (4) the “prior probability problem,” in which examiners expect a certain result and so are more likely to find it; and (5) the lack of repeatability and reproducibility effects.
Dean Faigman agreed with PCAST that the Ames I Study “generally . . . was the right approach to studying the subject.” He observed, however, that if inconclusives were counted as errors, the error rate from that study would “balloon” to over 30%. In discussing the Ames II Study, he similarly opined that inconclusive responses should be counted as errors. By not doing so, he contended, the researchers had artificially reduced their error rates and allowed test participants to boost their scores. By his calculation, when accounting for inconclusive answers, the overall error rate of the Ames II Study was 53% for bullet comparisons and 44% for cartridge case comparisons—essentially the same as “flipping a coin.”
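To see why the scoring convention matters so much, here is a minimal Python sketch using made-up numbers (these are not figures from Ames I, Ames II, or any other study). The same set of examiner responses yields a tiny error rate when inconclusives are set aside and a large one when they are counted as errors, which is the adjustment Faigman argued for. `StudyTally` and `error_rate` are hypothetical helpers, and dividing by all comparisons is an assumed convention; published studies report their rates in different ways.

```python
from dataclasses import dataclass


# Hypothetical tallies for illustration only; NOT data from any real study.
@dataclass
class StudyTally:
    correct: int       # conclusive calls that matched ground truth
    incorrect: int     # conclusive calls that contradicted ground truth
    inconclusive: int  # comparisons where the examiner declined to decide

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.inconclusive


def error_rate(t: StudyTally, inconclusive_is_error: bool) -> float:
    """Share of all comparisons scored as errors under the chosen convention."""
    errors = t.incorrect + (t.inconclusive if inconclusive_is_error else 0)
    return errors / t.total


tally = StudyTally(correct=660, incorrect=10, inconclusive=330)
print(f"inconclusives ignored:   {error_rate(tally, False):.1%}")  # 1.0%
print(f"inconclusives as errors: {error_rate(tally, True):.1%}")   # 34.0%
```

Nothing about the examiners’ actual answers changes between the two lines; only the bookkeeping does.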
From 75-80% certainty to a coin flip. That’s not evidence. That’s law enforcement agencies believing (and talking courts into believing) anything is science as long as it involves microscopes and decimal points. That’s not good enough, and another court in Maryland — in a case involving this same firearms expert — has already restricted the government from writing checks its pseudoscience can’t cash.
Following issuance of the PCAST Report, some courts have imposed yet more stringent limitations on testimony. One example of that evolution—notable because it involved the same judicial officer as Willock, Judge Grimm, as well as the same examiner as here, Mr. McVeigh—is in United States v. Medley, (D. Md. Apr. 24, 2018). In Medley, Judge Grimm thoroughly reviewed the state of knowledge at that time concerning firearms identification, including developments since his report and recommendation in Willock. Judge Grimm restricted Mr. McVeigh to testifying only “that the marks that were produced by the . . . cartridges are consistent with the marks that were found on the” recovered firearm, and precluded him from offering any opinion that the cartridges “were fired by the same gun” or expressing “any confidence level” in his opinion.
The court presents much more evidence undercutting the certainty of AFTE examiners before arriving at this conclusion, which makes it clear that what the government has done for more than a century is no longer acceptable in Maryland courts:
We conclude, however, for reasons discussed above, that although the studies and other information in the record support the use of the AFTE Theory to reliably identify whether patterns and lines on bullets of unknown origin are consistent with those known to have been fired from a particular firearm, they do not support the use of that methodology to reliably opine without qualification that the bullets of unknown origin were fired from the particular firearm.
That’s the upshot of the decision. The state can still bring in its firearms “experts.” However, they’ll be limited in what they can say in terms of conclusions. If they still wish to testify, they’ll have to acknowledge their work is somewhere between 15% wrong and a coin flip. And that’s what jurors will factor into their discussions about a person’s guilt or innocence.
It doesn’t actually create solid guidelines, but it does at least tell the government what isn’t tolerable. And it’s far more than any other court in the nation has done, even those that have questioned the veracity of supposed “expert” testimony. Hopefully, this reasoning will spread to other courts dealing with the same sort of junk science and finally force examiners to engage in actual science, rather than relying on subjective takes and foregone conclusions.