Announcing Illuminate, a Python module for Illumina sequencing metrics
Yesterday my company, InVitae, allowed me to open-source a library I’ve been working on for the past few months: a Python module that provides programmatic access to the binary-formatted metrics output by sequencing runs on the Illumina HiSeq and MiSeq machines.
The aim was a metrics tool you could integrate into your automation pipeline. Up until now, there hasn’t been one for Illumina data: only Windows-based viewers where the data is stuck on the screen, online services requiring you to send your data over the wire, and a few tentative command-line efforts that feel more like proofs of concept than tools.
You can use Illuminate as a command-line tool to “illuminate” your MiSeq and HiSeq runs. There’s some support for writing output to a file, but nothing fancy. (I wrote the CLI using docopt, which made the process actually fun! I hope docopt becomes part of the Python standard library.)
But the real strength of Illuminate is its object-oriented encapsulation of the individual metrics and its standardized approach to data delivery. Each parser delivers both a raw data dictionary and a pandas DataFrame, giving researchers and bioinformaticians a familiar way to manipulate the data (pandas feels a lot like R).
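To give a feel for that "dictionary plus DataFrame" pattern, here is a minimal sketch. It is not Illuminate's actual API: the class name ToyMetricsParser, the column names, and the three-field record layout are all made up for illustration, and the real binary formats are more involved. It only shows the general idea of unpacking fixed-size binary records into a plain dictionary and offering a pandas view on top.

```python
import struct

# Hypothetical record layout, invented for this sketch (not Illumina's spec):
# little-endian uint16 lane, uint16 tile, float32 value.
RECORD = struct.Struct("<HHf")


class ToyMetricsParser:
    """Illustrative parser only; not part of Illuminate's real API."""

    def __init__(self, raw_bytes):
        # Raw data dictionary: one list per column.
        self.data = {"lane": [], "tile": [], "value": []}
        # iter_unpack walks the buffer one fixed-size record at a time.
        for lane, tile, value in RECORD.iter_unpack(raw_bytes):
            self.data["lane"].append(lane)
            self.data["tile"].append(tile)
            self.data["value"].append(value)

    @property
    def df(self):
        # Imported lazily so the dictionary view works without pandas installed.
        import pandas as pd
        return pd.DataFrame(self.data)


# Build two fake records in memory and parse them.
sample = RECORD.pack(1, 1101, 0.5) + RECORD.pack(2, 1102, 0.25)
parser = ToyMetricsParser(sample)
print(parser.data["lane"])
```

With pandas installed, parser.df gives the same columns as a DataFrame, so something like parser.df.groupby("lane")["value"].mean() works the way you'd expect. Keeping the dictionary as the source of truth means the DataFrame is just a convenience layer over it.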
When I post something so specialized to a general-purpose personal blog like this one, I can’t help but feel a bit like Rick in one of my favorite episodes of The Young Ones:
“But now after years of stagnation, the TV people have finally woken up to the need for locally-based minority programs! made by amateurs! and perhaps of interest to only 2 or 3 people! It’s IMPORTANT, right?! It’s NOW!”
But I figure the right folks will know what to do with this library.
There’s a detailed README on the Illuminate repository page, where you can find out what’s required to get this thing going. All feedback gratefully accepted.