compression – bit.summa

Speedy BED conversion tool: convert2bed

Finishing touches are in place for my convert2bed tool (GitHub site). This utility converts common genomics data formats (BAM, GFF, GTF, PSL, SAM, VCF, WIG) to lexicographically sorted UCSC BED format. It offers two benefits over alternatives: it runs about 3-10x as fast as the bedtools *ToBed equivalents, and it converts all input fields in as non-lossy a […]
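A rough sketch of two core steps in any GFF-to-BED conversion, assuming bash with awk and sort: shift GFF's 1-based, fully closed start coordinate to BED's 0-based, half-open convention, then sort lexicographically by chromosome and numerically by start and stop. `gff_to_bed_sketch` is a hypothetical name. This is not convert2bed's code, and unlike convert2bed it keeps only six columns.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT convert2bed: GFF -> sorted 6-column BED.
# GFF is 1-based and fully closed; BED is 0-based and half-open, so only the
# start coordinate changes (start - 1); the end coordinate stays the same.
gff_to_bed_sketch() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 9 { next }                   # skip headers, comments, short lines
         { print $1, $4 - 1, $5, $3, $6, $7 }' "$@" \
    | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k3,3n   # chrom (bytewise), then start, stop
}

# Usage: two GFF records, out of order.
printf 'chr2\tsrc\tgene\t11\t20\t.\t+\t.\tID=b\nchr1\tsrc\tgene\t1\t10\t.\t-\t.\tID=a\n' \
    | gff_to_bed_sketch
# Prints (tab-separated):
# chr1  0   10  gene  .  -
# chr2  10  20  gene  .  +
```

Setting `LC_ALL=C` makes the chromosome comparison bytewise, so the ordering does not change with the user's locale.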

Regression testing of SHA-1 signatures via command-line

I wrote a data extraction utility which uses PolarSSL to export a Base64-encoded SHA-1 digest of some internal metadata (a string of JSON-formatted data), to help validate archive integrity:

$ unstarch --sha1-signature .foo
7HkOxDUBJd2rU/CQ/zigR84MPTc=

So far, so good. But now I want to validate that the metadata are being digested correctly through some independent means, […]
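One possible independent check, sketched under the assumption of bash plus GNU coreutils' `sha1sum` and `base64` (on Mac OS X, `shasum -a 1` can stand in for `sha1sum`): hash the exact metadata bytes, turn the hex digest back into its 20 raw bytes, and Base64-encode those. `sha1_base64` is a hypothetical helper name. Its output can be compared against the string the extraction utility prints. If OpenSSL is installed, `openssl dgst -sha1 -binary | openssl base64` does the same job in a single pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Base64-encoded SHA-1 digest of a string, computed with
# standard command-line tools as an independent check on a library's output.
sha1_base64() {
    local hex
    # Hash the exact bytes of $1 (printf '%s' adds no trailing newline).
    # sha1sum prints "<40 hex digits>  -"; keep only the digest.
    hex=$(printf '%s' "$1" | sha1sum | cut -c1-40)
    # Rewrite "a9993e..." as "\xa9\x99\x3e..." so bash's printf emits the raw
    # 20 digest bytes; Base64-encode those bytes, not the hex text.
    printf "$(printf '%s' "$hex" | sed 's/../\\x&/g')" | base64
}

sha1_base64 'abc'   # prints qZk+NkcGgWq6PiVxeFDCbJzQ2J0=
```

The `printf '%s'` detail matters for regression testing: piping through `echo` would append a newline to the metadata and produce a different digest.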

A bash shell one-liner to strip the file extension

Here's a one-liner that converts jarch files to starch format, stripping each input file's extension so that it can be replaced with a new one:

$ for i in *.jarch; do echo "${i%.*}.starch"; gchr "$i" | starch - > "${i%.*}.starch"; done
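The extension stripping is done by the `${i%.*}` parameter expansion. Below is a small bash sketch that wraps it in a hypothetical `swap_ext` helper so the behavior is easy to check. It also shows the difference between `%` (shortest match, so only the last extension is removed) and `%%` (longest match, so everything from the first dot onward is removed).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a filename's last extension with a new one.
# ${name%.*} removes the shortest suffix matching ".*", i.e. the final
# ".ext"; a name with no dot is left unchanged before the new suffix is added.
swap_ext() {
    local name=$1 new_ext=$2
    printf '%s\n' "${name%.*}.${new_ext}"
}

swap_ext chr1.reads.jarch starch   # prints chr1.reads.starch
f=chr1.reads.jarch
printf '%s\n' "${f%%.*}"           # prints chr1: %% removes the longest match
```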

Playing with SHA-1 hashing in PolarSSL

PolarSSL is a C-based cryptography and SSL library released under the GPL, which makes it ideal for use with BEDOPS, where I plan to use it for quick SHA-1 hashes of metadata to help validate the integrity of the archive. I've been testing it on Mac OS X 10.8 and it […]

Hex words are magic

I plan to use a magic number in the second major release of our BEDOPS suite to uniquely identify starch-formatted archive files. Looking at the first few bytes of the archive will help us because I plan to move the metadata to the back of the archive file, and it would be expensive to seek […]
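Because a magic number sits at a fixed offset at the front of the file, identifying the format costs one small read no matter how large the archive is or where its metadata end up. A minimal bash sketch of such a check, assuming the standard `head` and `od` tools. `has_magic` is a hypothetical helper, and `0badf00d` is a made-up hex word for illustration, not the value BEDOPS actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical check: does FILE begin with the magic number given as lowercase
# hex digits (two per byte)? Only that many leading bytes are read.
has_magic() {
    local file=$1 expected=$2 actual
    # od -An -tx1 prints the bytes as space-separated lowercase hex pairs;
    # tr strips the spaces and line breaks so the result is one hex string.
    actual=$(head -c $(( ${#expected} / 2 )) -- "$file" | od -An -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}

# Hypothetical usage (0badf00d is illustrative only):
# has_magic some_archive.starch 0badf00d && echo "recognized archive"
```

A file shorter than the magic number yields fewer hex digits and so fails the comparison rather than matching by accident.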