Using NumPy functionality
-------------------------

This is a collection of short examples showing how NumPy concepts like indexing, broadcasting and reductions can be used effectively in BioNumPy:

.. code-block:: python

    import bionumpy as bnp
    import numpy as np
    import matplotlib.pyplot as plt
    
    #### Boolean indexing
    intervals = bnp.open("example_data/small_interval.bed").read()
    
    # Get intervals larger than 20
    mask = (intervals.end-intervals.start) > 20
    print(intervals[mask])
    
    
    ### slice-indexing
    # Skip first and last 10 intervals and extract every 2nd
    print(intervals[10:-10:2])
    
    
    ###  List indexing
    reference_sequence = bnp.open("example_data/small_genome.fa.fai")["1"]
    
    # Get indices of all Gs in reference
    indices = np.flatnonzero(reference_sequence[:-1] == "G")
    
    # Get all letters follwing a G
    next_letters = reference_sequence[indices+1]
    
    # Count the letters following a G
    counts = bnp.count_encoded(bnp.as_encoded_sequence_array(next_letters, bnp.DNAEncoding))
    print(counts)
    
    
    ### Broadcasting
    # Get a one_hot_encoding of a sequence
    one_hot_encoding = reference_sequence[:, np.newaxis] == "ACTG"
    print(one_hot_encoding)
    
    
    ### Reductions
    # Get reads from file
    reads = bnp.open("example_data/big.fq.gz").read()
    
    # Get proportion of G's in each sequence
    g_count = (reads.sequence == "G").mean(axis=-1)
    plt.hist(g_count); plt.show()