Uneven reading time within a SRA file

I am downloading SRA files with prefetch and then reading them in C++ by iterating over ngs::ReadCollection. All calculations are running in the Cloud on AWS. I have found a small number of seeingly "pathological" SRA runs that take longer to read as the iteration progresses through the file. 

For example, the graph below shows the time required to read sequential, 0.1% chunks of ERR3212419 (where the x-axis is the cumulative number of reads read as a percentage of the total number of reads in the SRA run). As shown in the graph, the first 16% of reads can be read from disk relatively quickly (approximately 2 seconds per 0.1% chunk). However, the time to read the same number of reads then jumps to approximately 12 seconds, and then jumps again to over 100 seconds. (I stopped after loading 21% of the reads).

Is there a way to read this SRA record (and records like it), so that the time required to read different parts of the file is even? This is important because I would like to read SRA records (from disk) in parallel, and the uneven time-to-read makes for significant load imbalances. In this example, parallel workers reading near the beginning of the file finish much faster than the parallel worker reading near the end of the file.
 
<img width="515" alt="image" src="https://user-images.githubusercontent.com/36452131/85056356-520e5500-b15c-11ea-8336-ea94c8d96d72.png">


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uneven reading time within a SRA file #29

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uneven reading time within a SRA file #29

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions