Thought you all might be interested in this, if you haven't already seen it. Feel free to ignore if not. I should start collecting papers like this ...

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics

Hi Folks:

Attached is a paper from Nature Neuroscience discussing some fundamental flaws in the way neuroscientists (but not really **just** neuroscientists) do statistical analysis. Statistically, this all falls into the familiar category of "overfitting." Examples range from the prosaic, such as using unadjusted P-values from stepwise multiple regression to choose "important" variables, or using the same (or at least non-independent) data both to fit a classifier and to estimate its accuracy (a short sketch of this follows at the end of the note), to the somewhat more subtle manifestations of the phenomenon discussed in this paper.

I would claim that the message of this paper is:

1. The malpractice is widespread (certainly beyond neuroscience), with potentially serious impact on the rigor and reliability of scientific publications across a broad range of disciplines, especially complex, inherently empirical ones like ecology, biology, and the social sciences, which do not have parsimonious mechanistic models to rely on (as physics or electrical engineering do);

2. It is especially associated with, and driven by, advances in technology (e.g. imaging, microarrays, multichannel sensors) that increase the dimensionality and volume of data;

3. It highlights a fundamental gap in the statistical understanding and training of scientists that compromises their ability to do good science.

Summarizing: this paper argues for skepticism toward claims that scientists base on their own versions of "appropriate" statistical analysis (at least in circumstances similar to those the paper describes). But one needs to balance such skepticism against the need to build on the results of others to do science. How does one strike that balance?

Anyway, enjoy. Feedback/comments/disagreement welcome, as always.

-- Bert
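
P.S. To make the train/test point above concrete, here is a minimal sketch, not taken from the paper; it assumes Python with numpy and scikit-learn available, and all names in it are purely illustrative. With pure-noise predictors and coin-flip labels, accuracy measured on the same data used to fit the classifier looks far better than chance, while accuracy on an independent held-out set does not.

    # Illustrative only: apparent vs. honest accuracy when the evaluation
    # data are not independent of the fitting data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p = 100, 200                      # few samples, many noise features
    X = rng.standard_normal((n, p))      # predictors carry no signal at all
    y = rng.integers(0, 2, size=n)       # labels are coin flips

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Accuracy on the fitting data is typically far above chance ...
    print("apparent (training) accuracy:", clf.score(X_train, y_train))
    # ... while accuracy on independent data hovers near 0.5, as it should.
    print("independent test accuracy:  ", clf.score(X_test, y_test))

Running something like this typically shows training accuracy near 1.0 and held-out accuracy near 0.5; the gap is exactly the inflation that non-independent test and training data produce.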