SkyView, Bzip2 and the SDSS

A couple of years ago the Sloan Digital Sky Survey (SDSS) underwent a major reorganization and completely revamped its data. One of the changes in the transition from Data Release 7 (DR7) to DR8 was that new releases were compressed in BZIP2 format, which provides a better overall compression ratio.

Unfortunately, Java's standard I/O libraries do not support BZIP2 the way they support GZIP (through java.util.zip.GZIPInputStream), so we needed to make some changes to SkyView to accommodate the new compression. Our first idea was to incorporate a BZIP2 library into the SkyView JAR. The Apache Commons project provides one (there's a JAR file in the binary downloads), and we tried it. It worked, but we soon noticed that it was quite slow, much slower than we were used to with GZIP files. Also, although the Apache library is very liberally licensed, its license differs from those of the public-domain libraries we have incorporated into the JAR (notably ImageJ, which we use to generate quick-look images).

When we tried the bunzip2 command installed on our system, it appeared to be much faster: fast enough that decompression no longer dominated the time required to generate images. However, it was unclear how many users would have access to bunzip2.

Given this situation, we decided to provide two ways for users to read BZIP2 files. If a bunzip2 executable is available, you can set the environment variable BZIP_DECOMPRESSOR to the full path of that executable, and SkyView will spawn a separate process that runs it to do the decompression. SkyView does not search for bunzip2 on its own, so the variable must be set explicitly: we certainly don't know where it would be found on Windows or a Mac, though on Unix machines it is often in /usr/bin.
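A minimal sketch of setting this up from a POSIX shell (sh, bash, zsh), assuming bunzip2 is somewhere on your PATH. The helper name set_bzip_decompressor is hypothetical and not part of SkyView; it simply records the full path of bunzip2 in BZIP_DECOMPRESSOR and exports it so that a java process started from the same shell inherits it. In csh or tcsh the equivalent is setenv BZIP_DECOMPRESSOR /usr/bin/bunzip2 (adjusting the path to your system).

```shell
# Hypothetical helper, not part of SkyView: export BZIP_DECOMPRESSOR
# pointing at the bunzip2 found on the PATH, if there is one.
set_bzip_decompressor() {
    # 'command -v' prints the path of an external command, or fails if absent.
    bunzip2_path=$(command -v bunzip2 2>/dev/null) || bunzip2_path=""
    if [ -n "$bunzip2_path" ] && [ -x "$bunzip2_path" ]; then
        # Export it so that a java process started from this shell inherits it.
        BZIP_DECOMPRESSOR=$bunzip2_path
        export BZIP_DECOMPRESSOR
        echo "BZIP_DECOMPRESSOR=$BZIP_DECOMPRESSOR"
        return 0
    fi
    # No bunzip2 found: SkyView would need the Apache Commons classes instead.
    echo "bunzip2 not found on PATH" >&2
    return 1
}
```

Define the function in the same shell session that starts SkyView, then run, for example: set_bzip_decompressor && java -jar skyview.jar survey=SDSSg position=236.24709547,-0.4752636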

If no decompressor executable has been specified, SkyView looks for the Apache Commons classes instead. To minimize licensing issues, we did not include these in the SkyView JAR, so it's up to users to add them to the classpath. Normally users run SkyView as an executable JAR, and in that mode Java takes the classpath from the JAR's manifest, ignoring both the -cp option and the CLASSPATH environment variable. So if you have the Apache Commons JAR saved as bzip2.jar, SkyView needs to be run by naming its main class explicitly:
java -cp skyview.jar:bzip2.jar skyview.executive.Imager [regular arguments] …
to get the new compression capabilities onto the classpath. You can't just do
java -cp bzip2.jar -jar skyview.jar [regular arguments]
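Since the classpath has to be spelled out on every run, a small wrapper can save some typing. This is a sketch, not something shipped with SkyView: the function name run_skyview and the SKYVIEW_JAR and BZIP2_JAR variables are hypothetical, with defaults matching the file names used above. Java separates classpath entries with a colon on Unix and macOS and with a semicolon on Windows.

```shell
# Hypothetical wrapper, not part of SkyView. Set SKYVIEW_JAR and BZIP2_JAR
# to wherever you keep the jars; the defaults match the names used above.
SKYVIEW_JAR=${SKYVIEW_JAR:-skyview.jar}
BZIP2_JAR=${BZIP2_JAR:-bzip2.jar}

run_skyview() {
    # Name the main class explicitly instead of using 'java -jar', because
    # 'java -jar' ignores -cp and would never see the Apache Commons classes.
    # "$@" passes the usual SkyView arguments through unchanged.
    java -cp "$SKYVIEW_JAR:$BZIP2_JAR" skyview.executive.Imager "$@"
}
```

For example: run_skyview survey=SDSSg position=236.24709547,-0.4752636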

So you can use BZIP2 files within SkyView, though it requires a little work, and there may well be better ways to do this.

However, we do need to apologize to our users for not documenting these changes in the user's guide or on this blog until now. It doesn't do much good to add a capability if no one knows how to use it!

If you have any questions about BZIP2 files, please let us know.

This entry was posted in Discussion.

2 Responses to SkyView, Bzip2 and the SDSS

  1. rae says:

    Sorry, I don’t understand what you mean when you say “then the user can define a logical name BZIP_DECOMPRESSOR to point to that executable”
    This is how I did it: “$ java -jar skyview.jar survey=SDSSg sampler=null deedger=null Compress=T position=236.24709547,-0.4752636 BZIP_DECOMPRESSOR=/user/bin/bunzip2”
    But I still get a retrieval error.
    I don’t understand how to do it correctly. Can you give an example? Thanks.

  2. Laura says:

    Hello,
    Try setting the decompressor executable location in the shell environment before running the jar file. On the shell command line on my Mac (csh/tcsh syntax) I would run:
    setenv BZIP_DECOMPRESSOR /usr/bin/bunzip2

    and then
    java -jar skyview.jar survey=SDSSg sampler=null deedger=null Compress=T position=236.24709547,-0.4752636

    Let me know if this does not work for you.

    -Laura
