A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups

Bibliographic Details
Main Authors: John L. Spouge, Joseph M. Ziegelbauer, Mileidy Gonzalez
Format: Article
Language: English
Published: BMC 2020-09-01
Series: Algorithms for Molecular Biology
Online Access: http://link.springer.com/article/10.1186/s13015-020-00178-x
Description
Summary:
Background: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $$n$$ elements $$g_{0}, g_{1}, \ldots, g_{n-1}$$ in a set $$G$$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $$\bar{g}_{j} = g_{0} g_{1} \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$$ ($$0 \le j < n$$).
Results: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $$g_{[i,j)} = g_{i} g_{i+1} \cdots g_{j-1}$$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $$g_{j}$$ and its complement $$\bar{g}_{j}$$. The algorithm requires storage for $$2n$$ elements of $$G$$ and only about $$3n$$ products. In contrast, the standard segment tree algorithms require about $$n$$ products for construction and $$\log_{2} n$$ products for calculating each $$\bar{g}_{j}$$, i.e., about $$n \log_{2} n$$ products in total; and a naïve quadratic algorithm using $$n-2$$ element-by-element products to compute each $$\bar{g}_{j}$$ requires $$n(n-2)$$ products.
Conclusions: In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
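The two-phase idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `jackknife_products`, the array layout (an implicit binary tree with leaves at indices n to 2n-1), and the separate `comp` array are assumptions made for readability. The sketch stores about 4n elements rather than the 2n the summary reports, but it uses 3n - 5 products, in line with the "about 3n" figure. It never calls an inverse or an identity element, so it needs at least two inputs.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def jackknife_products(g: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Leave-one-out products under a commutative, associative `op`.

    Illustrative sketch only (hypothetical name and layout): no inverse
    and no identity element of the semigroup is ever used.
    """
    n = len(g)
    if n < 2:
        raise ValueError("need at least two elements: no identity is assumed")

    # Upward phase: implicit binary tree, node i has children 2i and 2i+1,
    # leaves live at indices n .. 2n-1. Index 0 is unused.
    tree: List = [None] * (2 * n)
    tree[n:] = list(g)
    for i in range(n - 1, 0, -1):
        tree[i] = op(tree[2 * i], tree[2 * i + 1])  # n - 1 products

    # Downward phase: comp[i] is the product of every leaf NOT below node i.
    # The root has an empty complement, so its two children are seeded
    # directly with each other's subtree product.
    comp: List = [None] * (2 * n)
    comp[2], comp[3] = tree[3], tree[2]
    for i in range(4, 2 * n):
        # Complement of a child = complement of its parent times its sibling.
        comp[i] = op(comp[i // 2], tree[i ^ 1])  # 2n - 4 products

    # The complement of leaf n + j is the jackknife product for g[j].
    return comp[n:]


# Usage: `max` and multiplication over integers including 0 are
# commutative and associative, and neither offers usable inverses.
print(jackknife_products([3, 1, 4, 1, 5], max))               # [5, 5, 5, 5, 4]
print(jackknife_products([2, 0, 3], lambda a, b: a * b))      # [0, 6, 0]
```

Because the product is commutative, the order in which the implicit tree combines leaves does not matter, which is why this layout works for any n >= 2 and not only for powers of two.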
ISSN:1748-7188