Pandas Interoperability
The bionumpy package is designed to work well with pandas. This is useful when you want to use the powerful data manipulation tools provided by pandas on the data you read with bionumpy.
It is straightforward to convert a chunk of a file or a whole file to a Pandas DataFrame:
import bionumpy as bnp
f = bnp.open("example_data/big.fq.gz")
for chunk in f.read_chunks():
df = chunk.topandas()
print(df)
name sequence quality
0 2fa9ee19-5c51-4281-abdd-ea... CGGTAGCCAGCTGCGTTCAGTATGGA... [10, 5, 5, 12, 5, 4, 3, 4,...
1 1f9ca490-2f25-484a-8972-d6... GATGCATACTTCGTTCGATTTCGTTT... [5, 4, 5, 4, 6, 6, 5, 6, 1...
2 06936a64-6c08-40e9-8a10-0f... GTTTTGTCGCTGCGTTCAGTTTATGG... [3, 5, 6, 7, 7, 5, 4, 3, 3...
3 d6a555a1-d8dd-4e55-936f-ad... CGTATGCTTTGAGATTCATTCAGGAG... [2, 3, 4, 4, 4, 4, 6, 6, 7...
4 91ca9c6c-12fe-4255-83cc-96... CGGTGTACTTCGTTCCAGCTAGATTT... [4, 3, 5, 6, 3, 5, 6, 5, 5...
.. ... ... ...
995 2eef382a-21f7-4a5b-a8d8-64... CGTTTGCGCTGGTTCATTTTATCGGT... [2, 6, 2, 2, 4, 2, 3, 3, 4...
996 18949e40-d30d-49f7-8a1c-c2... GCGTACTTCGTTCAGTTTCGGAAGTG... [2, 2, 2, 3, 3, 3, 5, 7, 7...
997 f4aeadf5-174e-4974-aef1-8b... CAGTAATACTTCGTTCCAGTTCTGGG... [9, 6, 11, 10, 2, 3, 3, 3,...
998 6b3cb23e-3f71-435b-835f-78... CTGTTGTACTTCGATTCATTCAGGTG... [5, 3, 3, 5, 6, 4, 5, 3, 5...
999 d65b5418-65d5-4bf3-aac8-aa... CGGTGACGCTGGTTTAAATCTAACGG... [7, 3, 4, 3, 4, 2, 2, 3, 3...
[1000 rows x 3 columns]
Similarily, you can convert a pandas dataframe to a BnpDataclass object:
import bionumpy as bnp
import pandas as pd
df = pd.DataFrame({
"name": ["read1", "read2", "read3"],
"sequence": ["ACGT", "TGCA", "ACGT"],
})
print(bnp.datatypes.SequenceEntry.from_data_frame(df))
SequenceEntry with 3 entries
name sequence
read1 ACGT
read2 TGCA
read3 ACGT