Genome arithmetics
The bnp.arithmetics module contains functions for analysing genomic intervals.
API documentation
- bionumpy.arithmetics.forbes(chromosome_sizes: ChromosomeSize, intervals_a: Interval, intervals_b: Interval) -> float
Computes the Forbes similarity index for two sets of intervals.
- chromosome_sizes : ChromosomeSize
A ChromosomeSizes, typically from reading a chromosome.sizes file with
- intervals_a : Interval
Must be sorted. Can be read from a BED file.
- intervals_b : Interval
Must be sorted. Can be read from a BED file.
- float
The Forbes similarity index.
>>> from bionumpy.arithmetics import forbes, sort_intervals
>>> from bionumpy.datatypes import Interval
>>> a = Interval.from_entry_tuples([("chr1", 10, 20), ("chr2", 20, 30)])
>>> b = Interval.from_entry_tuples([("chr2", 15, 25), ("chr1", 10, 40)])
>>> a_sorted = sort_intervals(a, sort_order=["chr1", "chr2"])
>>> b_sorted = sort_intervals(b, sort_order=["chr1", "chr2"])
>>> forbes({"chr1": 100, "chr2": 200}, a_sorted, b_sorted)
5.625
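The value 5.625 in the example can be checked by hand with the usual definition of the Forbes index: genome size times the number of bases covered by both sets, divided by the product of the bases covered by each set. The following is a minimal sketch of that arithmetic using dense NumPy masks. It is not bionumpy's implementation; `coverage_mask` and `forbes_sketch` are hypothetical helper names, and intervals are passed as plain dicts of `(start, stop)` tuples for illustration only.

```python
import numpy as np

def coverage_mask(intervals, size):
    """Dense boolean array: True where at least one (start, stop) interval covers the position."""
    mask = np.zeros(size, dtype=bool)
    for start, stop in intervals:
        mask[start:stop] = True  # half-open interval [start, stop)
    return mask

def forbes_sketch(chromosome_sizes, intervals_a, intervals_b):
    """Forbes index, assumed here to be N * |A & B| / (|A| * |B|) counted in bases."""
    genome_size = sum(chromosome_sizes.values())
    covered_a = covered_b = covered_both = 0
    for chrom, size in chromosome_sizes.items():
        mask_a = coverage_mask(intervals_a.get(chrom, []), size)
        mask_b = coverage_mask(intervals_b.get(chrom, []), size)
        covered_a += int(mask_a.sum())
        covered_b += int(mask_b.sum())
        covered_both += int((mask_a & mask_b).sum())
    return genome_size * covered_both / (covered_a * covered_b)

# Same intervals as in the doctest above, keyed by chromosome.
a = {"chr1": [(10, 20)], "chr2": [(20, 30)]}
b = {"chr1": [(10, 40)], "chr2": [(15, 25)]}
forbes_value = forbes_sketch({"chr1": 100, "chr2": 200}, a, b)
print(forbes_value)  # 300 * 15 / (20 * 40) = 5.625
```

A value above 1 indicates more overlap than expected if the two sets were placed independently on a genome of that size.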
- bionumpy.arithmetics.jaccard(chromosome_sizes: ChromosomeSize, intervals_a: Interval, intervals_b: Interval) -> float
Computes the Jaccard similarity index for two sets of intervals.
- chromosome_sizes : ChromosomeSize
A ChromosomeSizes, typically from reading a chromosome.sizes file with
- intervals_a : Interval
Must be sorted. Can be read from a BED file.
- intervals_b : Interval
Must be sorted. Can be read from a BED file.
- float
The Jaccard similarity index.
See forbes for examples.
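For comparison with the Forbes example, the Jaccard index of two interval sets is commonly defined as the number of bases covered by both sets divided by the number of bases covered by either set. The sketch below applies that definition to the same intervals. It assumes this base-pair definition and is not taken from bionumpy's source; `coverage_mask` and `jaccard_sketch` are hypothetical helpers.

```python
import numpy as np

def coverage_mask(intervals, size):
    """Dense boolean array: True where at least one (start, stop) interval covers the position."""
    mask = np.zeros(size, dtype=bool)
    for start, stop in intervals:
        mask[start:stop] = True  # half-open interval [start, stop)
    return mask

def jaccard_sketch(chromosome_sizes, intervals_a, intervals_b):
    """Jaccard index, assumed here to be |A & B| / |A | B| counted in bases."""
    intersection = union = 0
    for chrom, size in chromosome_sizes.items():
        mask_a = coverage_mask(intervals_a.get(chrom, []), size)
        mask_b = coverage_mask(intervals_b.get(chrom, []), size)
        intersection += int((mask_a & mask_b).sum())
        union += int((mask_a | mask_b).sum())
    return intersection / union

a = {"chr1": [(10, 20)], "chr2": [(20, 30)]}
b = {"chr1": [(10, 40)], "chr2": [(15, 25)]}
# 15 shared bases out of 45 bases in the union.
jaccard_value = jaccard_sketch({"chr1": 100, "chr2": 200}, a, b)
```

Unlike the Forbes index, this ratio does not depend on the genome size and always lies between 0 and 1.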
- bionumpy.arithmetics.sort_intervals(intervals: Interval, chromosome_key_function: callable = <function <lambda>>, sort_order: List[str] = None) -> Interval
Sort intervals on “chromosome”, “start”, “stop”
- intervals : Interval
Unsorted intervals
- Interval
Sorted intervals
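The sort on "chromosome", "start", "stop" can be pictured as an ordinary multi-key sort. The sketch below works on plain tuples and assumes that `sort_order` gives the desired chromosome order, as the `forbes` example suggests; how bionumpy combines `sort_order` with `chromosome_key_function` is not covered here, and `sort_intervals_sketch` is a hypothetical name.

```python
def sort_intervals_sketch(entries, sort_order):
    """Sort (chromosome, start, stop) tuples by chromosome rank, then start, then stop."""
    # Map each chromosome name to its position in the requested order.
    rank = {name: i for i, name in enumerate(sort_order)}
    return sorted(entries, key=lambda entry: (rank[entry[0]], entry[1], entry[2]))

entries = [("chr2", 15, 25), ("chr1", 10, 40), ("chr1", 5, 8)]
sorted_entries = sort_intervals_sketch(entries, ["chr1", "chr2"])
print(sorted_entries)
# [('chr1', 5, 8), ('chr1', 10, 40), ('chr2', 15, 25)]
```

An explicit chromosome order avoids the surprises of plain string sorting, where "chr10" would come before "chr2".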
- bionumpy.arithmetics.merge_intervals(*args, **kwargs)
- bionumpy.arithmetics.get_pileup(intervals: Interval, chromosome_size: int) -> GenomicRunLengthArray
Get the number of intervals that overlap each position of the chromosome/contig.
This uses run-length-encoded arrays to handle the sparse data that we get from intervals.
- intervals : Interval
Intervals on the same chromosome/contig
- chromosome_size : int
Size of the chromosome/contig
>>> from bionumpy.datatypes import Interval
>>> from bionumpy.arithmetics import get_boolean_mask, get_pileup
>>> intervals = Interval(["chr1", "chr1", "chr1"], [3, 5, 10], [8, 7, 12])
>>> pileup = get_pileup(intervals, 20)
>>> print(pileup)
[0 0 0 1 1 2 2 1 0 0 1 1 0 0 0 0 0 0 0 0]
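The pileup values above can be reproduced with a dense difference array: add 1 at every start, subtract 1 at every stop, and take the cumulative sum. This is only a sketch of the idea on a dense NumPy array, whereas the library works on run-length-encoded arrays; `pileup_sketch` is a hypothetical helper, not part of bionumpy.

```python
import numpy as np

def pileup_sketch(starts, stops, chromosome_size):
    """Per-position count of overlapping half-open [start, stop) intervals."""
    # One extra slot so that an interval ending at chromosome_size stays in bounds.
    delta = np.zeros(chromosome_size + 1, dtype=int)
    np.add.at(delta, starts, 1)   # coverage goes up where an interval begins
    np.add.at(delta, stops, -1)   # and down where it ends
    return np.cumsum(delta)[:-1]

pileup = pileup_sketch([3, 5, 10], [8, 7, 12], 20)
print(pileup)
# [0 0 0 1 1 2 2 1 0 0 1 1 0 0 0 0 0 0 0 0]
```

`np.add.at` is used instead of fancy-index assignment so that several intervals sharing the same start or stop are all counted.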
- bionumpy.arithmetics.get_boolean_mask(intervals: Interval, chromosome_size: int)
Get a boolean mask representing the positions covered by any interval.
Uses run-length-encoded binary arrays to represent the areas covered by any interval. The returned mask supports NumPy ufuncs, so you can apply logical operations such as &, | and ~ to it. It also supports NumPy indexing, so you can use it to filter positions and intervals.
- intervals : Interval
Intervals on the same chromosome/contig
- chromosome_size : int
The size of the chromosome/contig
>>> intervals = Interval(["chr1", "chr1", "chr1"], [3, 5, 10], [8, 7, 12])
>>> print(intervals)
Interval with 3 entries
chromosome    start    stop
chr1              3       8
chr1              5       7
chr1             10      12
>>> mask = get_boolean_mask(intervals, 20)
>>> print(mask.astype(int))
[0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0]
Get the complement of the mask:
>>> complement = ~mask
>>> print(complement.astype(int))
[1 1 1 0 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1]
Get the intersection (&) and union (|) of the mask and another mask:
>>> other_mask = get_boolean_mask(Interval(["chr1"], [9], [15]), 20)
>>> intersection = mask & other_mask
>>> print(intersection.astype(int))
[0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0]
>>> union = mask | other_mask
>>> print(union.astype(int))
[0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0]
Find whether some positions overlap the mask:
>>> print(other_mask[intervals.start])
[False False True]
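To give an intuition for why run-length encoding suits interval masks, the sketch below compresses the dense mask from the example into (value, run length) pairs. It illustrates the general technique only and does not show bionumpy's internal representation; `run_length_encode` is a hypothetical helper and assumes a non-empty input.

```python
import numpy as np

def run_length_encode(dense):
    """Return (values, lengths) describing consecutive runs of equal values in a non-empty array."""
    dense = np.asarray(dense)
    # Indices where the value differs from the previous position.
    change_points = np.flatnonzero(dense[1:] != dense[:-1]) + 1
    run_starts = np.concatenate(([0], change_points))
    run_lengths = np.diff(np.concatenate((run_starts, [dense.size])))
    return dense[run_starts], run_lengths

# Dense version of the mask built from the intervals (3, 8), (5, 7), (10, 12).
mask = np.zeros(20, dtype=bool)
for start, stop in [(3, 8), (5, 7), (10, 12)]:
    mask[start:stop] = True

values, lengths = run_length_encode(mask)
print(values.astype(int), lengths)
# [0 1 0 1 0] [3 5 2 2 8]
```

Twenty positions collapse into five runs here; on a real chromosome with millions of positions and comparatively few intervals, the saving is far larger.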