Genome arithmetics

The bnp.arithmetics module contains functions for analysing genomic intervals.

API documentation

bionumpy.arithmetics.forbes(chromosome_sizes: ChromosomeSize, intervals_a: Interval, intervals_b: Interval) -> float

Computes the Forbes similarity index for two sets of intervals.

Parameters

chromosome_sizes : ChromosomeSize

A ChromosomeSize, typically from reading a chromosome.sizes file with bnp.open(). As the example below shows, a dict mapping chromosome names to sizes can also be passed.

intervals_a : Interval

Must be sorted. Can be read using bnp.open on a BED file.

intervals_b : Interval

Must be sorted. Can be read using bnp.open on a BED file.

Returns

float

The Forbes similarity index.

Examples

>>> from bionumpy.arithmetics import forbes, sort_intervals
>>> from bionumpy.datatypes import Interval
>>> a = Interval.from_entry_tuples([("chr1", 10, 20), ("chr2", 20, 30)])
>>> b = Interval.from_entry_tuples([("chr2", 15, 25), ("chr1", 10, 40)])
>>> a_sorted = sort_intervals(a, sort_order=["chr1", "chr2"])
>>> b_sorted = sort_intervals(b, sort_order=["chr1", "chr2"])
>>> forbes({"chr1": 100, "chr2": 200}, a_sorted, b_sorted)
5.625
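
The value in the example above is consistent with the usual definition of the Forbes index, N * |A & B| / (|A| * |B|), where N is the total genome length and |.| counts covered bases: 300 * 15 / (20 * 40) = 5.625. The sketch below recomputes it with dense NumPy boolean masks, one per chromosome. It is only an illustration of the formula under that assumption, not how bionumpy computes it; `coverage_masks` and `forbes_sketch` are hypothetical helpers, and intervals are plain (chromosome, start, stop) tuples.

```python
import numpy as np


def coverage_masks(intervals, chromosome_sizes):
    # One boolean array per chromosome: True where at least one interval covers the base.
    masks = {name: np.zeros(size, dtype=bool) for name, size in chromosome_sizes.items()}
    for chrom, start, stop in intervals:
        masks[chrom][start:stop] = True
    return masks


def forbes_sketch(chromosome_sizes, a, b):
    # Forbes index = N * |A & B| / (|A| * |B|), with N the total genome length.
    mask_a = coverage_masks(a, chromosome_sizes)
    mask_b = coverage_masks(b, chromosome_sizes)
    genome_size = sum(chromosome_sizes.values())
    size_a = sum(int(m.sum()) for m in mask_a.values())
    size_b = sum(int(m.sum()) for m in mask_b.values())
    both = sum(int((mask_a[c] & mask_b[c]).sum()) for c in chromosome_sizes)
    return genome_size * both / (size_a * size_b)


sizes = {"chr1": 100, "chr2": 200}
a = [("chr1", 10, 20), ("chr2", 20, 30)]
b = [("chr2", 15, 25), ("chr1", 10, 40)]
print(forbes_sketch(sizes, a, b))  # 5.625
```

Dense masks cost memory proportional to genome length, which is why a real implementation working on sorted intervals or run-length encodings is preferable for whole genomes.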

bionumpy.arithmetics.jaccard(chromosome_sizes: ChromosomeSize, intervals_a: Interval, intervals_b: Interval) -> float

Computes the Jaccard similarity index for two sets of intervals.

Parameters

chromosome_sizes : ChromosomeSize

A ChromosomeSize, typically from reading a chromosome.sizes file with bnp.open(). As in the forbes example, a dict mapping chromosome names to sizes can also be passed.

intervals_a : Interval

Must be sorted. Can be read using bnp.open on a BED file.

intervals_b : Interval

Must be sorted. Can be read using bnp.open on a BED file.

Returns

float

The Jaccard similarity index.

Examples

See forbes for examples.
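As a rough illustration, the usual Jaccard index is the number of bases covered by both sets divided by the number of bases covered by either set, |A & B| / |A | B|. The sketch below applies that definition to the intervals from the forbes example using dense NumPy masks; it assumes bionumpy uses this standard definition and is not bionumpy's implementation. `coverage_masks` and `jaccard_sketch` are hypothetical helpers.

```python
import numpy as np


def coverage_masks(intervals, chromosome_sizes):
    # One boolean array per chromosome: True where at least one interval covers the base.
    masks = {name: np.zeros(size, dtype=bool) for name, size in chromosome_sizes.items()}
    for chrom, start, stop in intervals:
        masks[chrom][start:stop] = True
    return masks


def jaccard_sketch(chromosome_sizes, a, b):
    # Jaccard index = |A & B| / |A | B|, counted in covered bases.
    mask_a = coverage_masks(a, chromosome_sizes)
    mask_b = coverage_masks(b, chromosome_sizes)
    both = sum(int((mask_a[c] & mask_b[c]).sum()) for c in chromosome_sizes)
    either = sum(int((mask_a[c] | mask_b[c]).sum()) for c in chromosome_sizes)
    return both / either


sizes = {"chr1": 100, "chr2": 200}
a = [("chr1", 10, 20), ("chr2", 20, 30)]
b = [("chr2", 15, 25), ("chr1", 10, 40)]
print(jaccard_sketch(sizes, a, b))  # 15 / 45, i.e. about 0.333
```

Unlike the Forbes index, the Jaccard index does not depend on the chromosome sizes once all intervals fit inside them; the sizes only set the mask lengths here.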

bionumpy.arithmetics.sort_intervals(intervals: Interval, chromosome_key_function: callable = <function <lambda>>, sort_order: List[str] | None = None) -> Interval

Sort intervals by chromosome, then by start, then by stop.

Parameters

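intervals : Interval

Unsorted intervals.

sort_order : List[str] | None

Order in which the chromosomes should appear, e.g. ["chr1", "chr2"] as used in the forbes example.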

Returns

Interval

Sorted intervals
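The sort order described above (chromosome first, then start, then stop) can be sketched with NumPy's `np.lexsort`, which treats its last key as the primary one. This is a hypothetical helper working on plain lists, assuming chromosome order is given explicitly as with `sort_order`; it does not model `chromosome_key_function` and is not bionumpy's implementation.

```python
import numpy as np


def sort_intervals_sketch(chromosomes, starts, stops, sort_order):
    # Rank each chromosome name by its position in sort_order.
    rank = {name: i for i, name in enumerate(sort_order)}
    chrom_rank = np.array([rank[c] for c in chromosomes])
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    # np.lexsort sorts by the LAST key first: chromosome, then start, then stop.
    order = np.lexsort((stops, starts, chrom_rank))
    return [chromosomes[i] for i in order], starts[order], stops[order]


chroms, starts, stops = sort_intervals_sketch(
    ["chr2", "chr1", "chr1"], [15, 10, 10], [25, 40, 20], sort_order=["chr1", "chr2"]
)
print(chroms, starts, stops)  # ['chr1', 'chr1', 'chr2'] [10 10 15] [20 40 25]
```

The two chr1 intervals share a start, so their relative order is decided by stop.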

bionumpy.arithmetics.merge_intervals(*args, **kwargs)

bionumpy.arithmetics.get_pileup(intervals: Interval, chromosome_size: int) -> GenomicRunLengthArray

Get the number of intervals that overlap each position of the chromosome/contig.

This uses run-length encoded arrays to represent the sparse data produced by intervals compactly.

Parameters

intervals : Interval

Intervals on the same chromosome/contig.

chromosome_size : int

The size of the chromosome/contig.

Examples

>>> from bionumpy.datatypes import Interval
>>> from bionumpy.arithmetics import get_boolean_mask, get_pileup
>>> intervals = Interval(["chr1", "chr1", "chr1"], [3, 5, 10], [8, 7, 12])
>>> pileup = get_pileup(intervals, 20)
>>> print(pileup)
[0 0 0 1 1 2 2 1 0 0 1 1 0 0 0 0 0 0 0 0]
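
To illustrate what the pileup and its run-length encoding contain, the sketch below computes the same depth with a difference array (+1 at each start, -1 at each stop, then a cumulative sum) and then encodes it as run starts and run values. It builds a dense intermediate array, which a run-length based implementation such as the one described above avoids; `pileup_sketch` and `run_length_encode` are hypothetical helpers, not bionumpy functions.

```python
import numpy as np


def pileup_sketch(starts, stops, chromosome_size):
    # +1 where an interval starts, -1 where it ends; the running sum is the depth.
    # np.add.at accumulates correctly even when several intervals share a start or stop.
    delta = np.zeros(chromosome_size + 1, dtype=int)
    np.add.at(delta, starts, 1)
    np.add.at(delta, stops, -1)
    return np.cumsum(delta[:-1])


def run_length_encode(values):
    # A new run begins wherever the value differs from the previous position.
    change = np.flatnonzero(np.diff(values)) + 1
    run_starts = np.concatenate(([0], change))
    return run_starts, values[run_starts]


depth = pileup_sketch([3, 5, 10], [8, 7, 12], 20)
print(depth)  # [0 0 0 1 1 2 2 1 0 0 1 1 0 0 0 0 0 0 0 0]
run_starts, run_values = run_length_encode(depth)
print(run_starts)  # [ 0  3  5  7  8 10 12]
print(run_values)  # [0 1 2 1 0 1 0]
```

Seven runs describe all 20 positions here; the saving grows with chromosome length, since the number of runs depends only on the number of interval boundaries.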

bionumpy.arithmetics.get_boolean_mask(intervals: Interval, chromosome_size: int)

Get a boolean mask marking every position covered by at least one interval.

Uses run-length encoded binary arrays to represent the areas covered by any interval. The returned mask supports NumPy ufuncs, so you can combine masks with logical operators such as &, | and ~. It also supports NumPy indexing, so you can use it to filter positions and intervals.

Parameters

intervals : Interval

Intervals on the same chromosome/contig.

chromosome_size : int

The size of the chromosome/contig

Examples

>>> from bionumpy.datatypes import Interval
>>> from bionumpy.arithmetics import get_boolean_mask
>>> intervals = Interval(["chr1", "chr1", "chr1"], [3, 5, 10], [8, 7, 12])
>>> print(intervals)
Interval with 3 entries
               chromosome                    start                     stop
                     chr1                        3                        8
                     chr1                        5                        7
                     chr1                       10                       12
>>> mask = get_boolean_mask(intervals, 20)
>>> print(mask.astype(int))
[0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0]

Get the complement of the mask:

>>> complement = ~mask
>>> print(complement.astype(int))
[1 1 1 0 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1]

Get the intersection (&) and union (|) of the mask and another mask:

>>> other_mask = get_boolean_mask(Interval(["chr1"], [9], [15]), 20)
>>> intersection = mask & other_mask
>>> print(intersection.astype(int))
[0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0]
>>> union = mask | other_mask
>>> print(union.astype(int))
[0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0]

Find whether some positions overlap the mask:

>>> print(other_mask[intervals.start])
[False False  True]
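
The position lookup above can be understood with a small run-based sketch: merge the intervals into disjoint runs of True, then use `np.searchsorted` to find, for each position, the last run starting at or before it. This is an assumption-laden illustration of the idea behind indexing a run-length encoded mask, not bionumpy's implementation; `merged_runs` and `mask_lookup` are hypothetical helpers and expect at least one interval.

```python
import numpy as np


def merged_runs(starts, stops):
    # Merge overlapping or touching [start, stop) intervals into disjoint runs of True.
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    order = np.argsort(starts, kind="stable")
    merged = []
    for start, stop in zip(starts[order], stops[order]):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    runs = np.array(merged)
    return runs[:, 0], runs[:, 1]


def mask_lookup(run_starts, run_stops, positions):
    # Index of the last run starting at or before each position (-1 if there is none).
    positions = np.asarray(positions)
    idx = np.searchsorted(run_starts, positions, side="right") - 1
    # Covered if such a run exists and the position lies before its stop.
    # idx == -1 reads the last run's stop, but the idx >= 0 test masks that case out.
    return (idx >= 0) & (positions < run_stops[idx])


run_starts, run_stops = merged_runs([3, 5, 10], [8, 7, 12])
print(run_starts, run_stops)  # [ 3 10] [ 8 12]
print(mask_lookup(run_starts, run_stops, [3, 5, 10, 8, 0]))  # [ True  True  True False False]

other_starts, other_stops = merged_runs([9], [15])
print(mask_lookup(other_starts, other_stops, [3, 5, 10]))  # [False False  True]
```

Each lookup is a binary search over the runs, so its cost depends on the number of runs rather than on the chromosome length.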