Working with BAM-files
Before following this tutorial, we assume you have already followed the introduction part of reading files (see Reading files).
The following example show how to handle bam files. It checks whether base pairs pairs that are softclipped in alignments differ from the base pairs in the rest of the reads. It reads in a .bam file and uses the cigar_op attribute to find softclipped reads.
import bionumpy as bnp
from npstructures import ragged_slice
def test_bamquality():
# Open the aligments file
alignments = bnp.open("example_data/test.bam").read()
# Extract the first cigar operation for each alignment
start_cigar = alignments.cigar_op[..., 0]
# Get aligments that start with soft-clip
start_clipped_alignments = alignments[start_cigar == "s"]
# Get the number of softclipped bases
n_clipped_bases = start_clipped_alignments.cigar_length[..., 0]
# Extract clipped bases
clipped_bases = ragged_slice(start_clipped_alignments.sequence,
ends=n_clipped_bases)
# Count bases in softclipped regions
print(bnp.count_encoded(clipped_bases.ravel()))
# Count bases in whole reads
print(bnp.count_encoded(alignments.sequence.ravel()))
if __name__ == "__main__":
test_bamquality()