Thoughts About the Roles of AI for Statistics

Frank Harrell

Department of Biostatistics
Vanderbilt University School of Medicine

Main Roles for LLMs/AI for Statistics

  • Learning new focused statistical methods on demand
  • Critiquing your understanding of methods
  • Critiquing draft talks and papers
  • Mathematical statistics assistant
  • Developing statistical analysis plans (with trepidation)
  • Developing simulations to test performance of proposed methods
  • Coding

Coding

  • Claude Projects and Claude Code are too good not to use for coding
  • Claude can write long, complex code to do difficult tasks
  • It can re-write the code repeatedly to handle an increasing list of desired options
  • Best to assume the code contains a critical error
  • Claude can assist in finding errors
  • When finished, it can easily re-write in another language if better suited

Our Roles in Coding

  • Specifying task goals, inputs, and preferred coding language
  • AI produces first draft of code
  • Interact extensively with AI to develop code
  • Primary human roles: specification and extensive testing
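One way to act on "assume the code contains a critical error" is to check an AI-drafted function against a trusted reference on many random inputs. The following is a minimal sketch in Python, used here only for illustration. The function names `running_variance` and `check_against_reference` are hypothetical and stand in for whatever code the AI produced and whatever reference you trust.

```python
import math
import random
import statistics


def running_variance(values):
    """Candidate under test: one-pass (Welford) sample variance.

    Hypothetical stand-in for an AI-drafted function.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    if n < 2:
        raise ValueError("need at least two values")
    return m2 / (n - 1)


def check_against_reference(candidate, reference, n_cases=200, seed=42):
    """Compare candidate and reference on random data; return the mismatches."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        size = rng.randint(2, 50)
        data = [rng.gauss(0, 10) for _ in range(size)]
        got, want = candidate(data), reference(data)
        if not math.isclose(got, want, rel_tol=1e-9, abs_tol=1e-12):
            failures.append((data, got, want))
    return failures


if __name__ == "__main__":
    bad = check_against_reference(running_variance, statistics.variance)
    print(f"{len(bad)} mismatches out of 200 random cases")
```

An empty list of mismatches is evidence, not proof. Edge cases such as missing values, ties, and extreme magnitudes still need tests of their own.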

Coding Successes

  • Translating old Fortran code used in R packages to modern Fortran
  • Translating C++ code to Fortran
  • Creating general-purpose utility scripts
    • stitch_course, which produced this
    • prprint, which pretty-prints R code and symbol tables with markers for the start and end of {} blocks
    • Adding interactive search to an existing long web page
    • Creating an index file for a folder, with clickable links and the ability to sort entries by a date contained within the file bodies

Coding Successes, continued

  • AI processed an image in a published paper and wrote R code to recreate the image

Can We Be Lazy Programmers?

  • Becoming an R expert saves time in the long run
  • It also prepares you for being able to detect and fix errors in AI-generated code
  • Best approach for those who use R regularly every week
  • For casual users, AI may be relied upon
  • For beautiful concise syntaxes such as data.table we can use AI for first drafts

AI for Methods Development

  • AI is my math stat assistant, e.g., matrix calculus
  • Claude helpful in exploring theoretical effects of a model's lack of fit
  • Generate simulation code to check performance of a method
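As an illustration of the kind of simulation meant here, the sketch below estimates the actual coverage of two nominal 95% confidence intervals for a binomial proportion, the Wald and the Wilson intervals. It is written in Python purely for illustration. The choice of intervals, the settings (p = 0.1, n = 30), and all function names are assumptions made for this example and do not come from the talk.

```python
import random
from statistics import NormalDist


def wald_interval(x, n, z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


def wilson_interval(x, n, z):
    """Wilson score interval for a binomial proportion."""
    p_hat = x / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * (p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5 / denom
    return center - half, center + half


def coverage(interval, p, n, n_sim=5000, level=0.95, seed=1):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(n_sim):
        # Draw one binomial(n, p) count as a sum of Bernoulli trials
        x = sum(rng.random() < p for _ in range(n))
        lo, hi = interval(x, n, z)
        hits += lo <= p <= hi
    return hits / n_sim


if __name__ == "__main__":
    for name, fn in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
        print(name, coverage(fn, p=0.1, n=30))
```

In this setting the Wald interval should fall well short of its nominal 95% coverage, while the Wilson interval stays much closer to it. A simulation like this makes that kind of gap visible before a method is applied to real data.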

Learning Statistical Methods Fundamentals vs AI on Demand

  • Example: Regression Modeling Strategies book and 4-day course
  • Claude RMS skills
  • Difficult to have a complete perspective when learning piecemeal
  • You can bring your knowledge to a situation faster than you can interact with AI
  • Relying solely on AI makes it difficult to catch mistakes in statistical analysis choices

Is Using AI Better Than Trusting Your Memory?

  • For casual users of statistical models (and statistics in general), using AI to create analysis plans is not a good idea if you have a statistical professional as a collaborator
  • If you don't, and you don't have the time to invest in systematic learning, using AI is better than not using it

Big Picture

  • Analysts not engaged in lifelong learning are already replaceable by AI
  • Embrace AI to make yourself a better researcher and analyst
  • Develop more nuanced and detailed statistical analysis plans
  • Develop and use simulations far more often to validate performance of proposed approaches
  • Evaluate everything you do using
    • AI-assisted research
    • AI-built simulations

Summary

Use AI to raise the bar for productivity, efficiency, problem-solving, and reliability

More Information

  • hbiostat.org/LLM
  • github.com/harrelfe/skills
    • Example Claude chat: hbiostat.org/LLM/chat1
  • hbiostat.org/rflow/long#sec-long-ai
  • github.com/harrelfe/rscripts
  • fharrell.com/post/mle

Usage: marp --html ai.md