Tips for Biostatisticians Collaborating with Non-Biostatistician Medical Researchers

Frank Harrell

Department of Biostatistics
Vanderbilt University School of Medicine

2024-07-30

Consultation vs. Collaboration

Consultation

  • No prior relationship with investigator
  • Short-term time investment
  • Insufficient time to really understand measurements and medical background
  • Biostatistician is not usually a co-author
  • Investigator may be doing her own statistical analyses

Collaboration

  • Optimal when working with the investigator on multiple projects over a long time span, which builds your knowledge about the medical area
  • Long-term investment (months to decades)
  • Time to understand medical background and measurements
  • Time to build trust and optimize division of labor

Collaboration, continued

  • Carefully establish who is responsible for which type of thinking
  • Biostatistician needs to understand 11% of what the investigator knows about the problem
  • Investigator needs to understand 11% of the biostatistical methods used
  • Biostatistician is very often a co-author

Distinguishing Consultation from Collaboration

  • Offer frequent consultation hours as a research community service
  • At Vanderbilt we have offered daily biostat clinics since 2005
    • Each day has a different theme
  • Assist investigator teams in any way we can during the hour
  • Best usage: initial brainstorming about measurements & research ideas, critiquing survey drafts, abandoning futile projects
  • Redirect investigators' hallway consultation requests to a clinic
  • Let everyone know that everything else is collaboration

Optimal Collaboration

Bring Statistical Principles to Collaboration

  • Use methods that have been shown to work by simulation & theory
  • Understand uncertainty and account for it by including parameters for things you don't know
  • Design experiments to maximize information
  • Understand how measurements were made
  • Be more interested in questions and estimation than hypothesis testing
  • Verify that the sample size will support the intended analysis
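
One way to check whether a sample size supports an estimation goal is to compute the n needed for a target margin of error. The sketch below is illustrative only: it assumes the quantity of interest is a single proportion, uses the normal approximation, and takes the worst case p = 0.5. The function name `n_for_proportion_margin` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_proportion_margin(margin, confidence=0.95, p=0.5):
    """Smallest n so that the normal-approximation confidence interval for a
    proportion has half-width <= margin. p=0.5 is the worst case."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(n_for_proportion_margin(0.10))  # 97
print(n_for_proportion_margin(0.05))  # 385
```

Halving the margin of error roughly quadruples the required sample size, which is worth showing an investigator before any analysis is planned.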

Statistical Principles, continued

  • Use all the information in the raw data during analysis
  • Watch out for procedures that easily declare noise as signal
  • Present information in an intuitive way that maximizes information content and leads to correct perceptions
  • Make statistical analysis and reports 100% reproducible
    • Entire report (including tables and graphs) regenerated with a single command
    • No interactive statistical computing that guides later analyses or provides analytical results
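
A small simulation can make the "noise as signal" warning concrete. The sketch below is an assumed setup, not taken from the talk: the outcome and every candidate predictor are pure noise, and the procedure simply reports the candidate with the largest absolute correlation. Sample size, number of candidates, and the seed are arbitrary choices.

```python
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def best_noise_correlation(n_subjects, n_candidates, rng):
    """Largest |r| between a pure-noise outcome and pure-noise candidates."""
    y = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(x, y)))
    return best

rng = random.Random(2024)  # fixed seed so the run is reproducible
one_candidate = [best_noise_correlation(30, 1, rng) for _ in range(200)]
many_candidates = [best_noise_correlation(30, 100, rng) for _ in range(200)]

# Average "winning" correlation when screening 1 vs. 100 noise variables
print(round(statistics.fmean(one_candidate), 2))
print(round(statistics.fmean(many_candidates), 2))
```

With 30 subjects, the best of 100 noise candidates routinely shows a correlation that looks clinically interesting, even though no true association exists.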

Keys to Optimal Collaboration

  • Expect to be respected
  • Be surprised when you're not, and demand respect
  • If early-career, have backup of senior biostatisticians
  • Find tactful ways to tell investigators that you are here because of your biostatistics expertise and that you don't expect to make decisions about medical principles

Keys to Optimal Collaboration, continued

  • It is never acceptable to choose a statistical method because the investigator used it in the past or because the intended journal tends to use it
  • Statistical approaches are chosen on the basis of their being tailored to the type of data and goals of the research
  • Having both investigators and biostatisticians analyzing the data almost never works
    • Impossible to figure out who did what in a manuscript
    • Biostatisticians are responsible for the accuracy of all computed quantities, tables, and graphics

Keys to Optimal Collaboration, continued

  • Always question the endpoint
  • Endpoints must preserve information in the raw data
    • Don't dichotomize an ordinal or continuous Y (or X)
  • When a clinical investigator states that a certain categorization has been validated, it never has been
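
The cost of dichotomizing a continuous Y can be illustrated by simulation. The sketch below is a hedged example under assumed conditions: two groups of 50 with normal outcomes, a true mean difference of 0.5 SD, and an arbitrary cutoff of 0.25. It compares how often a large-sample z-test detects the difference using the continuous values versus the dichotomized ones. Power is used only as a convenient yardstick for information loss; the function names are hypothetical.

```python
import math
import random
import statistics

def z_test_means(a, b):
    """Two-sample z statistic for a difference in means (large-sample approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / se

def z_test_proportions(a, b, cutoff):
    """Pooled two-proportion z statistic after dichotomizing at `cutoff`."""
    pa = sum(v > cutoff for v in a) / len(a)
    pb = sum(v > cutoff for v in b) / len(b)
    pooled = (pa * len(a) + pb * len(b)) / (len(a) + len(b))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(a) + 1 / len(b)))
    return 0.0 if se == 0 else (pb - pa) / se

def simulate_power(n_per_group=50, effect=0.5, cutoff=0.25, n_sims=2000, seed=1):
    """Fraction of simulated trials with |z| > 1.96 for each analysis."""
    rng = random.Random(seed)
    hits_continuous = hits_dichotomized = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1) for _ in range(n_per_group)]
        hits_continuous += abs(z_test_means(control, treated)) > 1.96
        hits_dichotomized += abs(z_test_proportions(control, treated, cutoff)) > 1.96
    return hits_continuous / n_sims, hits_dichotomized / n_sims

power_continuous, power_dichotomized = simulate_power()
print(power_continuous, power_dichotomized)
```

Under these assumptions the dichotomized analysis detects the same true effect noticeably less often, which is equivalent to discarding a sizable share of the subjects.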

Examples of Bad Endpoints

  • Change from baseline and % change
  • Time until the first of several types of events
    • Especially when some events are recurrent or events have differing severities
  • Time to recovery
    • Ignores unrecovery and close calls, and can't handle interrupting events
  • Time until a lab value is in a normal or an abnormal range
  • Time to doubling of serum creatinine

Examples of Bad Endpoints, continued

  • Acute kidney injury (standard AKI definitions)
  • Ventilator-free days
  • Most ratios
  • BMI when it doesn't adequately summarize weight and height
  • Instead of Y=BMI, analyze weight, covariate-adjusted for initial weight, height, and age

General Considerations for Endpoints

  • Don't use Y that means different things to different subjects
    • E.g.: impact of time to doubling of SCr depends on initial SCr
    • Time to recovery must be shorter for minimally diseased patients
  • Instead of change from baseline, use the raw response and covariate-adjust for baseline
  • Treat longitudinal data as longitudinal
  • See hbiostat.org/endpoint
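
The advice to covariate-adjust for baseline rather than analyze change from baseline can be checked by simulation. The sketch below assumes a two-arm randomized trial with 50 subjects per arm, a baseline-to-follow-up correlation of 0.5, and a true treatment effect of 0.5. These values and the function names are illustrative assumptions, not figures from the talk. It compares the treatment effect estimated from change scores with the one from a linear model of the raw response on treatment and baseline.

```python
import random
import statistics

def centered_cross(u, v):
    """Sum of cross-products of u and v about their means."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def ancova_effect(treat, baseline, outcome):
    """OLS coefficient of treat in the model outcome ~ treat + baseline."""
    stt = centered_cross(treat, treat)
    sxx = centered_cross(baseline, baseline)
    stx = centered_cross(treat, baseline)
    sty = centered_cross(treat, outcome)
    sxy = centered_cross(baseline, outcome)
    return (sty * sxx - stx * sxy) / (stt * sxx - stx ** 2)

def change_score_effect(treat, baseline, outcome):
    """Difference between arms in mean change from baseline."""
    change = [y - x for x, y in zip(baseline, outcome)]
    treated = [c for c, t in zip(change, treat) if t == 1]
    control = [c for c, t in zip(change, treat) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

def simulate(n_per_group=50, effect=0.5, rho=0.5, n_sims=2000, seed=7):
    """Return treatment effect estimates from both analyses over many trials."""
    rng = random.Random(seed)
    treat = [0] * n_per_group + [1] * n_per_group
    noise_sd = (1 - rho ** 2) ** 0.5  # keeps the follow-up variance at 1
    ancova, change = [], []
    for _ in range(n_sims):
        baseline = [rng.gauss(0, 1) for _ in treat]
        outcome = [rho * x + effect * t + noise_sd * rng.gauss(0, 1)
                   for x, t in zip(baseline, treat)]
        ancova.append(ancova_effect(treat, baseline, outcome))
        change.append(change_score_effect(treat, baseline, outcome))
    return ancova, change

ancova_estimates, change_estimates = simulate()
print(statistics.fmean(ancova_estimates), statistics.stdev(ancova_estimates))
print(statistics.fmean(change_estimates), statistics.stdev(change_estimates))
```

Both estimators center on the true effect in this setup, but the covariate-adjusted estimate varies less from trial to trial, so the same data yield a more precise answer.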

Crowdsourcing Design & Analysis Planning

  • datamethods.org
  • Place where methodologists and subject matter investigators meet
  • Many clinical investigators, clinical trialists, and biomarker researchers have long discussions with biostatisticians, epidemiologists, health services researchers, etc.

Overall Key to Optimal Collaboration

  • Your job is not to give the investigator what she wants
  • Your job is to give her what she needs
  • Over a long collaboration you teach the collaborator to want what she needs
