12–17 Jul 2026
University of Graz
Europe/Vienna timezone

A Mathematical Framework for Per-Read and Per-Sequence Error Characterization in Single-Molecule Sequencing

14 Jul 2026, 18:30
2h
University of Graz

Poster
Numerical, Computational, and Data-Driven Methods
Poster Presentations

Speaker

Pranjal Srivastava (University of Michigan)

Description

We present a mathematical framework for quantifying basecalling error at multiple scales in single-molecule (nanopore) sequencing, from individual bases to whole-sequence classification.
We define hierarchical Phred-like quality scores (per-base, per-read, and per-sequence) and prove via Jensen's inequality that averaging in the Phred domain systematically overestimates accuracy relative to the Phred transform of the mean error. This bias, which follows from the concavity of the logarithm, has direct consequences for quality reporting. We address it with an alignment-based correctness score that combines predicted basecaller confidence and empirical accuracy into a single Phred-scale summary.
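The Jensen gap described above can be illustrated with a minimal sketch. It assumes the standard Phred transform Q = -10 log10(p); the function names and the two example error probabilities are hypothetical and are not taken from the poster.

```python
import math


def phred(p):
    """Standard Phred transform Q = -10 * log10(p) for an error probability p in (0, 1]."""
    return -10.0 * math.log10(p)


def mean_of_phred(error_probs):
    """Average the quality scores themselves (averaging in the Phred domain)."""
    return sum(phred(p) for p in error_probs) / len(error_probs)


def phred_of_mean(error_probs):
    """Average the error probabilities first, then apply the Phred transform."""
    return phred(sum(error_probs) / len(error_probs))


# Hypothetical per-base error probabilities: one Q10 base and one Q30 base.
errors = [0.1, 0.001]

naive = mean_of_phred(errors)    # arithmetic mean of the Q values
honest = phred_of_mean(errors)   # Phred score of the mean error rate
print(f"mean of Phred: {naive:.2f}, Phred of mean: {honest:.2f}")
```

With these two bases, the mean of the scores is Q20, while the mean error rate of 0.0505 corresponds to roughly Q13. Because -log10 is convex, the first quantity can never be smaller than the second.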
We then lift the analysis to sequences by constructing a confusion matrix indexed by true targets and basecaller assignments. Row-normalization produces a stochastic matrix of pairwise misclassification probabilities; its Frobenius distance from the identity yields a Phred-like scalar of classifier fidelity. Clopper–Pearson confidence intervals, derived from the multinomial row structure and paired with minimum read-count bounds, ensure reliable estimation of rare confusions. Per-base reliability zones provide a bridge between scales: under conditional independence, the product of position-specific error rates at discriminating sites predicts which off-diagonal entries dominate, so that sequence-level misclassification can be anticipated from per-base profiles.
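The sequence-level quantities above can be sketched in a few lines of standard-library Python. Several choices here are assumptions and not the authors' definitions: the normalization of the Frobenius distance by sqrt(2n) before the Phred transform, the entrywise (binomial) use of Clopper–Pearson within each row, the zero-count read bound, and the example counts.

```python
import math


def row_normalize(counts):
    """Turn a matrix of read counts (rows = true targets) into a row-stochastic matrix."""
    rows = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("every true target needs at least one read")
        rows.append([c / total for c in row])
    return rows


def frobenius_from_identity(P):
    """Frobenius norm of P - I."""
    n = len(P)
    return math.sqrt(sum((P[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))


def fidelity_phred(P):
    """Assumed Phred-like score: distance scaled by its maximum sqrt(2n), then -10*log10."""
    d = frobenius_from_identity(P) / math.sqrt(2 * len(P))
    return float("inf") if d == 0 else -10.0 * math.log10(d)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, iters=100):
    """Bisection on [0, 1] for a function that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided interval for one confusion entry: k reads out of n in the row."""
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


def min_reads_for_zero_count(eps, alpha=0.05):
    """Assumed bound: fewest reads so that zero observed confusions gives upper limit <= eps."""
    return math.ceil(math.log(alpha / 2) / math.log(1 - eps))


# Hypothetical counts for three targets, 100 reads each.
counts = [[97, 2, 1], [0, 100, 0], [5, 0, 95]]
P = row_normalize(counts)
print(frobenius_from_identity(P), fidelity_phred(P))
print(clopper_pearson(2, 100), min_reads_for_zero_count(0.01))
```

For the example counts the Frobenius distance from the identity is 0.08. A target pair with zero observed confusions in 100 reads still has a 95% upper limit of about 0.036, which is why a minimum read count matters for rare confusions.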

Author

Pranjal Srivastava (University of Michigan)

Co-authors

Brian Athey (University of Michigan)
Gregory Farnum (University of Michigan)
Haoran Li (University of Michigan)
Mingze Sun (University of Michigan)
