Filtering FASTQ reads

This tutorial assumes you have already worked through the introduction to reading files (see Reading files).

The following small script filters FASTQ reads. It illustrates the use of multiple functions decorated with @streamable(). Each function is written to work on a single chunk, but with the streamable decorator we can pass in chunks from a file and BioNumPy handles the rest for us.

This example also illustrates how to chain multiple functions.
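The idea of writing a function for one chunk and letting a decorator apply it to a stream of chunks can be sketched in plain Python. This is only an illustration of the concept, not BioNumPy's implementation: `per_chunk`, `drop_low_min` and `drop_low_mean` are hypothetical names, and reads are modelled as plain lists of quality scores rather than BioNumPy objects.

```python
from functools import wraps


def per_chunk(func):
    """Hypothetical stand-in for a streaming decorator (not BioNumPy's code).

    The wrapped function is written for one chunk; the wrapper applies it
    lazily to every chunk of an iterable.
    """
    @wraps(func)
    def wrapper(chunks, *args, **kwargs):
        return (func(chunk, *args, **kwargs) for chunk in chunks)
    return wrapper


@per_chunk
def drop_low_min(chunk, threshold=1):
    # Keep reads whose lowest quality is above the threshold.
    return [read for read in chunk if min(read) > threshold]


@per_chunk
def drop_low_mean(chunk, threshold=10):
    # Keep reads whose mean quality is above the threshold.
    return [read for read in chunk if sum(read) / len(read) > threshold]


# Each "chunk" is a list of reads; each read is a list of quality scores.
chunks = [
    [[30, 30, 30], [1, 40, 40]],
    [[5, 5, 5], [20, 2, 20]],
]

# Chaining: the output stream of one function feeds the next.
filtered = list(drop_low_mean(drop_low_min(chunks)))
```

Because each wrapper returns a generator, no chunk is processed until the final `list(...)` consumes the chain, so only one chunk needs to be held in memory at a time.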

import bionumpy as bnp


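def test(file="example_data/big.fq.gz", out_filename="example_data/big_filtered.fq.gz"):
    with bnp.open(out_filename, 'w') as out_file:
        # Read the input in chunks so the whole file never has to fit in memory
        for reads in bnp.open(file).read_chunks():
            # Keep reads whose lowest base quality is above 1 ...
            min_quality_mask = reads.quality.min(axis=-1) > 1
            # ... and whose mean base quality is above 10
            mean_quality_mask = reads.quality.mean(axis=-1) > 10
            mask = min_quality_mask & mean_quality_mask
            print(f'Filtering reads: {len(reads)} -> {mask.sum()}')
            # Boolean indexing keeps only the reads where mask is True
            out_file.write(reads[mask])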


if __name__ == "__main__":
    test()
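To see what the two masks in the script select, the same criterion can be tried on a few hand-written quality lists without any input file. This is a minimal sketch in plain Python, assuming lists of integers in place of BioNumPy's quality array; `quality_mask` is a hypothetical helper, and the thresholds are the ones used in the script.

```python
from statistics import mean


def quality_mask(qualities, min_threshold=1, mean_threshold=10):
    # One boolean per read: True when the lowest quality is above
    # min_threshold AND the mean quality is above mean_threshold.
    return [
        min(q) > min_threshold and mean(q) > mean_threshold
        for q in qualities
    ]


# Toy reads of different lengths, each given as its per-base qualities.
qualities = [
    [30, 30, 30],      # passes both checks
    [1, 40, 40],       # fails: lowest quality is not above 1
    [5, 5, 5],         # fails: mean quality is not above 10
    [20, 2, 20, 14],   # passes: min is 2, mean is 14
]

mask = quality_mask(qualities)
kept = [q for q, keep in zip(qualities, mask) if keep]
print(f'Filtering reads: {len(qualities)} -> {sum(mask)}')
```

A read has to satisfy both conditions to be kept, which corresponds to combining the two masks with `&` in the script.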