Tabbed layout for Xamin results added to Xamin version 5.0

Xamin version 5.0 introduces a tabbed layout for grid-formatted query results. This new output format replaces the previous multi-window grid output. All new query results are added to a single main panel, and each can be displayed by clicking its associated tab (figure 1). Tabs are labeled with the query number, and mousing over a tab displays the more descriptive table name. Grids can be separated from the panel with a drag gesture, typically by clicking on the associated tab and dragging it to the desktop (figure 2). Tabs cannot be dragged back to the main panel.

figure 1

figure 2

Query visualization introduced with Xamin version 5.0

An interactive visualization component using the AladinLite data visualizer, developed at the Centre de Données astronomiques de Strasbourg (CDS), has been added to Xamin grid queries. This new feature allows users to select regions visually and to display the locations of observations and targets against images of the HEASARC archive holdings. Users can also select an area within the Aladin window to transfer coordinates to the Xamin window for a new query.

Positional grid-formatted queries will trigger the associated region and query results to be displayed in the Aladin Lite window (figure 1). Results are shown as markers on the image.

Selecting a row in the results grid highlights the corresponding element in the Aladin window, and vice versa (1, 2). Users can immediately determine the identity of interesting objects, or pick out objects with special non-spatial characteristics. The bar below the image window (3) shows the title of the associated grid. Clicking the Aladin icon (4) on the desktop will redisplay the Aladin window and bring it to the forefront. Clicking the Update icon (5) will replace the markers in the Aladin window with the currently displayed query results. Clicking the Restore icon (6) will return the Aladin window to the initial state of the displayed query results.

Different background image data can be selected from a list of missions (figure 2).

A shift-click in the Aladin window (figure 3) will encircle the clicked position (1) and transfer the center coordinates and radius of the enclosed circle to the Xamin window (2) in quick preparation for a new query.

figure 1

AladinLite Window displayed in Xamin showing image base layer selections

figure 2

figure 3

The State of Xamin and Browse

You may have noticed that a few problems have plagued the Browse and Xamin services over the last week. We have resolved many issues but anticipate that a few may still pop up. We appreciate your patience as we implement this government-mandated transition from http to https. Feel free to contact us if you have problems via the Browse Feedback form or the Bug reports and Feature requests page under the Xamin Help menu.

Xamin Batch Interface issues

A new version of the jar file used for the Xamin command line interface has been released. The older version contains a URL that is no longer valid and queries using this jar file will fail.

There are two ways to fix this:

  • Users can download a new version of Xamin’s users.jar file
  • Users can keep the existing users.jar but must set the environment variable XAMIN_CLI_URL to point to https://heasarc.gsfc.nasa.gov/xamin/cli
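For scripted use, the second option can be sketched as follows. This is only an illustration (not an official recipe): the query arguments are made up for the example, and actually running the command requires Java and a local users.jar.

```python
import os
import subprocess

# Point the existing users.jar at the new HTTPS endpoint by overriding
# XAMIN_CLI_URL in the child process environment.
env = dict(os.environ)
env["XAMIN_CLI_URL"] = "https://heasarc.gsfc.nasa.gov/xamin/cli"

# Illustrative query arguments
cmd = ["java", "-jar", "users.jar", "table=suzamaster", "position=cen a"]
# subprocess.run(cmd, env=env)  # uncomment to run
```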

If you have questions please let us know. We apologize for the inconvenience.

Xamin update August 2016

For this Xamin update we have reinstated the Product Explorer for Grid queries and added bibliography table options.

  • The Product Explorer allows users to see and select the kinds of products associated with query results displayed as a grid. Look for the Product Explorer button in the top toolbar of grid table results. Documentation can be found in our Xamin Web Guide.
  • Xamin now accepts two new arguments at run time:

       bibbase=xxx

    can be used to point to a table you want to use as your base zzbib table. In principle this can be a user table (but you need to log in separately).

       bibext=xxx

    can be used to point to a table you want to use to supplement the zzbib table. For example, if you want to use mybib as your base bibliography table, use

    http://heasarc.gsfc.nasa.gov/xamin?bibbase=mybib

    This assumes (but does not check) that a table mybib exists.
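For illustration, such a URL can also be built programmatically. In this sketch, mybib and myext are hypothetical table names standing in for your own base and supplemental bibliography tables:

```python
from urllib.parse import urlencode

# Hypothetical table names; substitute your own base and supplemental tables
params = {"bibbase": "mybib", "bibext": "myext"}
url = "http://heasarc.gsfc.nasa.gov/xamin?" + urlencode(params)
# url == "http://heasarc.gsfc.nasa.gov/xamin?bibbase=mybib&bibext=myext"
```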

The Xamin Command Line Interface (CLI) and Data Products

We recently reviewed how data products are handled in the Xamin Command Line Interface and feel that some information on how to download data products might be helpful. (For help getting set up to use the CLI interface see our documentation.)

Let’s start with an example. The following command will return table results from the suzamaster table for object cen a:

java -jar users.jar table=suzamaster position="cen a" fields=name,ra,dec

The output is:

name |ra |dec |exposure
Cen A |13 25 28.68|-43 01 03.7|64675.90000
CENTAURUS_A|13 25 21.86|-43 04 19.2|62432.80000
CENTAURUS_A|13 25 19.73|-43 04 07.7|51278.70000
CENTAURUS_A|13 25 18.74|-43 03 53.6|55944.20000
CENTAURUS A|13 25 27.60|-43 01 09.0| -99.00000
CENTAURUS A|13 25 27.60|-43 01 09.0| -99.00000
Number of rows: 6
Number of columns: 4
—- Messages —-
Info: Search Radius
Search radius: 10′ from suzamaster

Info: Name resolution
Position cen a corresponds to J2000 decimal coordinates:201.365063 -43.019112
—————–

If you want to list the data products for each row add the parameter “products” to the command:

java -jar users.jar table=suzamaster position="cen a" fields=name,ra,dec products

The output for this command is:

name |ra |dec |exposure
Cen A |13 25 28.68|-43 01 03.7|64675.90000

# Product suzaku.obs: Observation Directory
> /FTP/suzaku/data/obs/1/100005010/

# Product link.suzaxislog: Suzaku XIS Configuration Log
> query?table=suzaxislog&constraint=obsid=100005010

CENTAURUS_A|13 25 21.86|-43 04 19.2|62432.80000

# Product suzaku.obs: Observation Directory
> /FTP/suzaku/data/obs/7/704018010/

# Product link.suzaxislog: Suzaku XIS Configuration Log
> query?table=suzaxislog&constraint=obsid=704018010

CENTAURUS_A|13 25 19.73|-43 04 07.7|51278.70000

# Product suzaku.obs: Observation Directory
> /FTP/suzaku/data/obs/7/704018020/

# Product link.suzaxislog: Suzaku XIS Configuration Log
> query?table=suzaxislog&constraint=obsid=704018020

CENTAURUS_A|13 25 18.74|-43 03 53.6|55944.20000

# Product suzaku.obs: Observation Directory
> /FTP/suzaku/data/obs/7/704018030/

# Product link.suzaxislog: Suzaku XIS Configuration Log
> query?table=suzaxislog&constraint=obsid=704018030

CENTAURUS A|13 25 27.60|-43 01 09.0| -99.00000
CENTAURUS A|13 25 27.60|-43 01 09.0| -99.00000

Number of rows: 6
Number of columns: 8

—- Messages —-
Info: Search Radius
Search radius: 10′ from suzamaster

Info: Name resolution
Position cen a corresponds to J2000 decimal coordinates:201.365063 -43.019112
—————–

As you can see, if data products are available then product names and partial URL strings are listed below each row of the table results. These URL strings can be used with a command line data transfer tool (e.g., wget, curl) to download data from the HEASARC server.

In this example there are two types of URLs: one type is used to download data products (e.g., > /FTP/suzaku/data/obs/7/704018030/) and the other type (e.g., > query?table=suzaxislog&constraint=obsid=100005010) is used to display data from related tables.
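A small helper can mechanize this distinction when post-processing CLI output. This is only a sketch, not part of the CLI: expand_product_url is a hypothetical function, and the /xamin/ prefix for query strings is inferred from the example URLs shown later in this post.

```python
HOST = "http://heasarc.gsfc.nasa.gov"

def expand_product_url(line):
    """Turn a '> ...' product line from the CLI output into a full URL.

    '/FTP/...' paths point at data-product directories; 'query?...' strings
    are Xamin queries against related tables (served under /xamin/).
    """
    s = line.lstrip("> ").strip()
    if s.startswith("/FTP/"):
        return ("download", HOST + s)
    if s.startswith("query?"):
        return ("related-table", HOST + "/xamin/" + s)
    return ("other", s)
```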

For data product downloads, an example of a wget command for Unix-type machines would be:

wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks --cut-dirs=1 http://heasarc.gsfc.nasa.gov/FTP/suzaku/data/obs/1//100005010/

If you are interested in what each of the wget options does take a look at this recent blog post.

If you would like to download data products at the same time that you run the java command the following Unix shell script example should work:

#!/bin/sh
java -jar users.jar table=swiftxrlog position=ngc4626 products | \
grep '>' | \
sort | \
uniq | \
sed -e "s/ > \/FTP/wget -q -nH --cut-dirs=6 -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks http\:\/\/heasarc.gsfc.nasa.gov\/FTP/" | \
sh
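For users without sed (or other Unix text tools), the same pipeline can be sketched in Python. products_to_wget is a hypothetical helper; the wget options simply mirror the sed substitution in the script above.

```python
# Rough Python equivalent of the shell pipeline above: extract unique
# /FTP product paths from the CLI output and emit wget commands.
WGET = ("wget -q -nH --cut-dirs=6 -r -l0 -c -N -np -R 'index*' "
        "-erobots=off --retr-symlinks http://heasarc.gsfc.nasa.gov")

def products_to_wget(cli_output):
    cmds = []
    for path in sorted({line.strip()[2:]          # drop the leading "> "
                        for line in cli_output.splitlines()
                        if line.strip().startswith("> /FTP")}):
        cmds.append(WGET + path)                  # mirrors the sed rewrite
    return cmds
```

Each returned string can then be passed to os.system or subprocess, or written out as a download script.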

The URL string for Link Products (e.g., link.suzaxislog) can be used to view Xamin catalog results for related tables. An example of this type of command is:

wget -O suzaxislog_results "http://heasarc.gsfc.nasa.gov/xamin/query?table=ascamaster&radius=60&position=22+02+43.2,42+16+48"

The output is a list of rows from the ascamaster table for the queried position.

We will be creating a post soon with download instructions for users who may not have access to all the commands (e.g., sed) used in our example shell script.

Xamin Data Products Download Commands

In addition to the option to download data products as a tar file, Xamin will display a list of separate download commands. Users can then transfer individual data files from the command line or create a download script.

These download commands are listed under the Wget Commands Tab of the Products Cart as seen in this screen grab:


wget is a very useful command with many, many options, and it took some time to determine which options were needed to traverse the complicated HEASARC data products directory structure to find and transfer the products.

The following is one of the wget commands listed in the Products Cart.

wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks --cut-dirs=7 http://heasarc.gsfc.nasa.gov/FTP/chandra/data/science/ao01/cat5//1828/primary/acisf01828_001N004_fov1.fits.gz

We thought some of our users might be interested in knowing exactly what these options do. We used GNU Wget online manual for most of our research.

-q: quiet mode; no output is displayed

-nH: directories created on the user's system will not be named using the HEASARC host name

-r -l0: directories will be searched recursively; the level 0 in -l0 means there is no limit on the recursion depth

-c: files only partially downloaded in a previous run will continue to be downloaded

-N: a file will not be downloaded if a local file of the same name exists and is newer

-np: the search will not ascend to the parent directory during recursion

-R 'index*': files matching 'index*' (the automatically generated directory listings) will not be kept

-erobots=off: do not observe directives from the robots.txt file that might prevent downloads

--retr-symlinks: symbolic links are traversed and the pointed-to files are retrieved

--cut-dirs=N: skip N components of the URL directory structure when the directory structure is recreated on the user's system
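To see how -nH and --cut-dirs interact, here is a rough sketch of where a retrieved file ends up locally. local_path is an illustrative approximation of wget's behavior, ignoring corner cases such as the doubled slash in the example URL:

```python
from urllib.parse import urlparse

def local_path(url, cut_dirs):
    # Approximates where wget -nH --cut-dirs=N places a file: the host
    # is dropped (-nH) and the first N directory components are skipped.
    parts = [p for p in urlparse(url).path.split("/") if p]
    return "/".join(parts[cut_dirs:])
```

For the Chandra URL above, --cut-dirs=7 strips FTP/chandra/data/science/ao01/cat5/1828 and leaves only primary/acisf01828_001N004_fov1.fits.gz on disk.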

ConeSearch usage filtering

For the past several weeks we have been seeing extremely heavy usage (>~ 1 request per second), with several distinct users sending tens of thousands of cone search requests in rapid succession. Unfortunately, they generally send many simultaneous requests until things back up, rather than running one or two streams and waiting for each request to finish before starting another. Generally we have been able to respond to these requests, but the response time has been awful and the system load has made the interactive interface largely unusable.

To address this we have added filtering of incoming cone search requests. If two requests are already running on a given server for the same table, a new request for that table will fail immediately. Since we have two servers, there may still be as many as four queries executing against the same table.
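The described policy amounts to a per-table counter with immediate-reject semantics. A minimal sketch follows; ConeSearchLimiter is purely illustrative, not HEASARC's actual code:

```python
from collections import defaultdict
from threading import Lock

MAX_PER_TABLE = 2  # per server; with two servers, up to four in total

class ConeSearchLimiter:
    """Illustrative per-table request limiter with immediate rejection."""

    def __init__(self, limit=MAX_PER_TABLE):
        self.limit = limit
        self.active = defaultdict(int)
        self.lock = Lock()

    def try_acquire(self, table):
        with self.lock:
            if self.active[table] >= self.limit:
                return False          # reject immediately, as described
            self.active[table] += 1
            return True

    def release(self, table):
        with self.lock:
            self.active[table] -= 1
```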

In the few hours since we’ve implemented this we’ve noted that the interactive interfaces are doing much better, though not as crisp as we might like. In the near future we will be moving the database to a much faster machine which should address these problems more definitively.

Positional matching in Postgres

Positional Queries

One of the fundamental operations for astronomical databases is finding objects by position. Two common scenarios are finding all of the objects near a specified location and a positional cross-correlation between two tables. There are many ways to do these, and there are some special extensions available for Postgres that provide spatial indices.

In this article we look at five different ways of doing positional searches in Postgres:

  1. Haversine: Using the Haversine formula with tables clustered on an index in declination.  A simple constraint on declination is included to make the index useful.
  2. Unit Vector: Using the dot product of unit vectors with tables clustered on an index on Dec.  A simple constraint on declination is included to make the index useful.
  3. PostGIS: Using ST_DWithin with tables where a PostGIS geography object has been added representing the RA/Dec of each row.  Tables are clustered on an index using the geography object.
  4. PGSphere: Using the includes operator (~) in PGSphere with tables where an spoint object has been added representing  the position of each row.  Tables are clustered on an index on the spoint object.
  5. Q3C: Using the q3c_join  function in Q3C with tables clustered on an index on the q3c_ang2ipix(ra,dec).  In this case no column was added to the table.

Tables Used

We created three tables optimized for each search scheme. These include stripped-down versions of the Swift observations table and the XMM Optical Monitor object catalog, with 76,301 and 2,880,728 rows respectively. Most of the columns other than positions have been removed, but a couple of id columns are retained. A copy of the HEASARC's master positions table was also made. This contains the table name, position and default search radius for each row in every table in the HEASARC database that contains a position, excluding tables with a default search radius larger than 1 degree. This table contains about 36,000,000 rows. Since some schemes could handle null positions and others could not, there was slight variation here.

After each table was created, a single index was created, the table was clustered on the index and a vacuum analyze was performed.

The Queries

We ran queries on multiple positions on our master positions tables looking for how many matches we’d have in the HEASARC tables for each scheme.  The positions table has fields table_name, default_search_radius, ra and dec. The queries look something like:

   select table_name,count(*) from positions where
   ... position is within 1 degree of the specified position ...
   and
   ... position is within the default search radius of the specified position...
   group by table_name order by table_name
 

Since the default search radius is less than 1 degree, the first constraint does an initial cut on the table. In the cases we examined, queries with only the second constraint, which uses the per-row search radius, cannot use the indexes efficiently.

For the Haversine and Unit vector approaches we also included a constraint like

 dec between dec0-1 and dec0+1 

where dec0 was the requested declination. This is the constraint that takes advantage of the index.

See the details below for the syntax used in each approach to do the positional constraint.

Note: Initially we used a positional grid with integer-valued grid points. This led to differences between the methods. Some of the data were at half-integer grid points and some table positions were exactly the default search radius away from the grid. Depending upon roundoff, which varied from method to method, the position might or might not be included in the query results. We added pi/20 to each grid coordinate to minimize this problem. Results of all five methods using this offset grid were identical.

A grid of 162 positions was queried. The times given below in the Tables Time columns are the times to run all 162 queries.

A correlation test did a cross-match of the Swift master table and the XMM Optical Monitor object table looking for pairs within 0.01 degrees. The times and resulting counts are given in the table.

All tests were run on Postgres 9.2.1 on Linux using a quad-core machine with 2.66 GHz cores. However, queries were run purely sequentially.

Results

Each test was run multiple times. In many cases the subsequent queries showed evidence of improvement due to caching of some elements of the query. In practice users often run the same query multiple times (e.g., asking for different formats) or very similar queries, so that the better cached performance is a real advantage.

We restarted the database before running the tests and we show times for the first query and then the average of three subsequent reruns. However we cycled through each of the 5 sets of tables first before rerunning any query.

Approach    | Ext. Lib. | Index                | Tables Time       | X-Corr Time       | X-Corr Count
            |           |                      | Initial | Subseq. | Initial | Subseq. |
Haversine   | No        | dec                  |  134    | 129     |  77.5   |  77.5   | 228,111
Unit Vector | No        | dec                  |   60.3  |  60.1   |  37.9   |  37.4   | 228,111
PostGIS     | Yes       | Point(ra,dec)        |   33.8  |   3.8   |  39.6   |   5.7   | 228,111
PGSphere    | Yes       | spoint(ra,dec)       |    1.3  |   1.3   |  34.5   |   9.7   | 228,114
Q3C         | Yes       | q3c_ang2ipix(ra,dec) |    3.7  |   3.6   |   7.9   |   2.1   | 228,111

The results are summarized in the figure. Note the logarithmic y-axis.

Just to give some perspective on these results, I tried to run the same correlation in TOPCAT. An attempt to duplicate the cross-match failed due to memory constraints.
A cross-match using similar criteria, but between Swift and a 900,000-row table, took about 10 seconds (not including the time to load the tables) on the same hardware. This suggests TOPCAT's performance might be comparable to the unit vector approach. However, we have not pursued this seriously.

Notes on the approaches

Haversine

We use the standard Haversine formula and add a simple constraint on declination which enables the index on dec to speed up the query.

SQL (for cross-correlation query):

select count(*) from swiftdec a, xmmdec b
    where
a.dec between b.dec - .01 and b.dec + .01 and
( sin(radians(b.dec-a.dec)/2)^2 + cos(radians(a.dec))*cos(radians(b.dec))*sin(radians(b.ra-a.ra)/2)^2
      <
  sin(radians(.01)/2)^2
)
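Because the haversine is monotonic in the separation, the SQL above can compare haversines directly rather than applying an inverse sine. The same test in Python (hav_sep_lt is an illustrative helper, not part of any library):

```python
from math import radians, sin, cos

def hav_sep_lt(ra1, dec1, ra2, dec2, r_deg):
    # Compare haversine(separation) to haversine(radius); since the
    # haversine is monotonic, no inverse trig is needed, mirroring the
    # SQL constraint above. All angles in degrees.
    h = (sin(radians(dec2 - dec1) / 2) ** 2
         + cos(radians(dec1)) * cos(radians(dec2))
           * sin(radians(ra2 - ra1) / 2) ** 2)
    return h < sin(radians(r_deg) / 2) ** 2
```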

Unit Vector

In this approach the Cartesian unit vector of the position on the unit sphere is stored with each row, which enables an exact constraint on distance without any trigonometric functions in the query.  The value on the right of the > is simply cos(.01 deg).  This requires no special library but adds three columns to the table.

SQL:

select count(*) from swiftunit a, xmmunit b
where 
   a.dec between b.dec -.01 and b.dec+.01 and
   a.__x*b.__x + a.__y*b.__y + a.__z*b.__z > 0.999999984769129
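The geometry behind the constraint: the dot product of two unit vectors equals the cosine of their angular separation, so the magic constant is just cos(0.01 deg). A sketch, where unit_vector mirrors what we assume the __x, __y, __z columns hold:

```python
from math import radians, sin, cos

def unit_vector(ra_deg, dec_deg):
    # The __x, __y, __z columns: the position as a point on the unit sphere
    ra, dec = radians(ra_deg), radians(dec_deg)
    return (cos(dec) * cos(ra), cos(dec) * sin(ra), sin(dec))

# Dot product of unit vectors = cos(separation), so the constant on the
# right of the > in the SQL is simply cos(0.01 degrees):
threshold = cos(radians(0.01))
```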

PostGIS

Since the PostGIS functions return results by default in meters on the spheroid, we set the last parameter of st_dwithin to false to do the calculations on a sphere. Apparently PostGIS uses an Earth radius of 6371.0087714 km.  We use that to convert .01 degrees to a distance in meters.  A user-defined spherical projection to the unit sphere might be easy to define and would get rid of this mysterious constant from our calculations.

PostGIS version 2.0.1 was used. PostGIS is the heaviest weight of the three libraries to install. It has a lot of features and extensive documentation, though it is unclear how many of the features are useful to us. It requires several other libraries to be downloaded and installed locally first.

SQL:

select count(*) from swiftgeog a, xmmgeog b 
where
    st_dwithin(a.geog,b.geog,1111.950792, false)
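The 1111.950792 passed to st_dwithin is just 0.01 degrees converted to metres of arc on a sphere of that radius. A quick check (deg_to_metres is an illustrative helper):

```python
from math import radians

# The mean Earth radius PostGIS apparently assumes, in metres
EARTH_RADIUS_M = 6371008.7714

def deg_to_metres(deg):
    # Arc length on the sphere: radius times the angle in radians
    return EARTH_RADIUS_M * radians(deg)
```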

PGSphere

In PGSphere the ~ operator indicates that the right-hand operand includes the left-hand operand. There are many possible ways to construct the constraint using PGSphere operators, but only some of them seem to use the index we had made on the pnt field.

The PGSphere cross-correlation gave a slightly different result than the other approaches.

It is unclear if PGSphere is being actively maintained. The last version (V1.1.1) is from early 2010. Some tweaking of the code was required to get it to run in Postgres 9.2. In particular the languages specified for various functions needed to be converted to lower case. PGSphere has the clearest documentation (though we found an error in the example of how to define indexes).

SQL:

select count(*) from swiftpgs a, xmmpgs b 
where 
    scircle(a.pnt, radians(.01))~b.pnt

Q3C

The tables include an index on q3c_ang2ipix(ra,dec). However examples given in the Q3C README file suggested creating the functional index without saving this as a value in the table, so we did not create a column with this value.

The latest version (1.4.13) of Q3C is from early 2012. No issues were found in compiling Q3C into Postgres, but it has limited documentation. The README file seems to be it.

SQL:

select count(*) from swiftq3c a, xmmq3c b 
where q3c_join(a.ra,a.dec,b.ra,b.dec,.01)

Sensitivity to Query Details

Some amount of tweaking of the queries was needed to get reasonable performance. It was easy to write queries in ways that took no advantage of the indices and required a full table scan. E.g., when doing our positional searches using Q3C, we got dramatically different performance depending on whether we wrote a constraint like
where q3c_join(187.25, 2.05, ra, dec, 1)
versus
where q3c_join(ra, dec, 187.25, 2.05, 1)
I had to try a few different syntaxes in PGSphere before I found one that engaged the indexes and sped up the query.

This suggests that supporting free-form user inputs may be incompatible with efficient querying of the database.

For me, Postgres’ explain functionality was essential in finding efficient queries.

Conclusions

Adding unit vector columns to the table cuts the query time roughly in half. At least in this scenario the simple trigonometric approach is CPU-limited, so adding additional fields to the table to reduce the complexity of the offset calculation is worth it.

Much greater reductions are possible using specialized indices. Of the three possibilities studied, Q3C has generally good performance, running about twice as fast as the PostGIS geography type.

The PGSphere library is very fast for point queries but a bit slower than both Q3C and PostGIS for cross-correlations. The slight disagreement in cross-correlation counts is a bit disconcerting, but it is possibly due to the kind of rounding issues we discovered in the positional queries.

For correlations, all of the methods using spherical indexing seem to have some startup/caching cost.  The second and subsequent iterations run about three times as fast as the first.  The implications for an operational system are unclear and presumably depend critically upon the query mix.

For the positional queries PostGIS still shows strong evidence of caching, but PGSphere and Q3C do not. Note that the results for the positional queries are for an aggregate 162 queries. The averaged times for individual queries ranged from about 10-300 milliseconds.

Although one should be cautious extrapolating from any small number of tests, it does appear that spatial indices substantially improve performance.  We see an improvement of somewhere between a factor of 2-15 in the time taken for queries.

Either Q3C or PostGIS seems a reasonable choice. Q3C gives better performance and has a more transparent syntax for astronomers. However, PostGIS has a much broader community and far greater certainty of continued support. PGSphere's performance on positional queries is remarkably good, but the lack of clear support and the variance in cross-correlation results are worrying.