Filtering FASTQ reads
This tutorial assumes you have already worked through the introduction to reading files (see Reading files).
The following small script filters FASTQ reads. The filtering logic is written to work on a single chunk of reads, and the script applies it to each chunk returned by read_chunks(), so the whole file never has to be held in memory at once. The same per-chunk logic can also be packaged as functions decorated with @streamable(), in which case we can pass in the chunks from a file and BioNumPy handles the rest for us.
This example also illustrates how to chain multiple filtering steps: two boolean masks are computed per chunk and combined with & before the reads are written out.
import bionumpy as bnp


def test(file="example_data/big.fq.gz", out_filename="example_data/big_filtered.fq.gz"):
    with bnp.open(out_filename, 'w') as out_file:
        # Process the input one chunk at a time to keep memory use bounded
        for reads in bnp.open(file).read_chunks():
            # Keep reads whose lowest base quality is above 1 ...
            min_quality_mask = reads.quality.min(axis=-1) > 1
            # ... and whose mean base quality is above 10
            mean_quality_mask = reads.quality.mean(axis=-1) > 10
            mask = min_quality_mask & mean_quality_mask
            print(f'Filtering reads: {len(reads)} -> {mask.sum()}')
            out_file.write(reads[mask])


if __name__ == "__main__":
    test()
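To make the mask logic concrete without needing a FASTQ file, here is a minimal pure-Python sketch of the same two filters applied to one toy chunk. The quality values are made up for illustration, and build_mask is a hypothetical helper, not part of BioNumPy. Plain lists stand in for the ragged quality array that BioNumPy provides.

```python
from statistics import mean

# Toy Phred quality scores for four reads of different lengths (made-up values).
qualities = [
    [30, 32, 35, 31],  # high quality throughout
    [12, 1, 14],       # contains a base with quality 1
    [5, 8, 6, 9, 7],   # low mean quality
    [11, 12, 10, 13],  # above both thresholds
]


def build_mask(chunk, min_threshold=1, mean_threshold=10):
    """Hypothetical helper: one boolean per read, True if it passes both filters."""
    min_mask = [min(q) > min_threshold for q in chunk]
    mean_mask = [mean(q) > mean_threshold for q in chunk]
    # Equivalent of `min_mask & mean_mask` on boolean arrays
    return [a and b for a, b in zip(min_mask, mean_mask)]


mask = build_mask(qualities)
kept = [q for q, keep in zip(qualities, mask) if keep]
print(f"Filtering reads: {len(qualities)} -> {sum(mask)}")
```

The second read fails the minimum-quality filter and the third fails the mean-quality filter, so only the first and last reads are kept. In the BioNumPy script the same comparisons run on whole arrays at once rather than in Python loops.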