Xamin Version 3.0

Yesterday we released a major update to our Xamin interface. We hope the new interface is more attractive and user friendly. It also includes a major upgrade to the underlying JavaScript library. Features in the new interface are described below. If you want to get back to the old style, that interface is still around. You can get back to it using the links provided, or you can start using the URL http://heasarc.gsfc.nasa.gov/xamin/xamin_old.jsp

Release notes for the new version:

  • The Application Console:
    The overall look for the Xamin Web interface is very different. It emulates a computer console. If you expand your browser to use the full screen the console emulation can be quite close. There are buttons along the bottom of the interface that allow you to display and hide each major pane active in your Xamin session. Buttons in the background can be used to activate the query, feedback, information and user panes.
  • Start Button:
    The ‘Start’ — named for its location more than its functionality — button (bottom left) allows users to get to standard windows, to change the background and to organize, hide, or delete all data windows.
  • Session configuration:
    The Session menu in the Query pane now has a submenu allowing you to set up a saved session configuration that will be automatically applied when you start an Xamin session. This uses a persistent cookie so that it does not require a user account. However since the amount of information that can be stored in the cookie is small, it is mostly useful for storing default menu configurations and the like. Adding a table may exceed the capacity of the cookie and the attempt to save the configuration will fail.
  • Output formats:
    In the Options menu in the Query Pane users now have many more output format options including text downloads and rendering text in other windows.
  • Theme option removed:
    There were three somewhat different themes allowed in the old version. These changed the color scheme (and sometimes scaling) of the display but had no other effect on functionality.
  • Options:
    The Options menu has been reorganized a bit to try to put things together more logically in the sub menus. E.g., the Output content submenu now controls the coordinate and time formats, whether data are rounded and which fields are shown by default.
  • Query start buttons:
    The buttons used to start queries have been simplified a bit. Instead of two buttons used to start index queries where we look for tables which have data matching your criteria, there is now a single button whose label text indicates if the query will be made for all HEASARC tables or look only within the set of tables you have specified.
  • Index and table query buttons switched:
    The relative positions of the index and table query buttons have been reversed. The table query buttons now come first.Similarly, there are now only three buttons to submit queries of tables. You use the same button to start a single table query or a correlation. You pick the button based upon whether you wish to see the results as a table, do a plot or save the results in your Xamin environment.
  • Bibcodes:
    Users can now explicitly enter bibcodes for discovery and table queries.
  • File uploads:
    You can immediately upload a file to use in queries in Xamin without having to first log in to a user account. The file can be used in a sequence of queries, but it will be uploaded each time. [We originally made this change in the development version, but moved it into the operational system early.]
  • Virtual Observatory Tables:
    Many more virtual observatory resources are available.
  • Keyword search results:
    The results for a keyword search of tables are now formatted where you can collapse and expand the results for HEASARC and VizieR to focus on the tables you are interested in. More information about VO table resources is now available through the information panel.
  • Keyword search results queries:
    You can now choose to add the list of HEASARC tables matching your keyword search into the selected tables. You can then do a search (typically an index query since you cannot do correlations on more than 4 tables) on those tables. Xamin V2 also had facilities to search these tables in an index query but did so using an inefficient query syntax. Results will now return much faster.
  • Selecting parameters:
    The parameter explorer has been substantially revised. Previously a red parameter bar was used to signal the end of the displayed parameters. Users could drag this bar to change what parameters were to be displayed, or move parameters around. It is still possible to move parameters (including groups of parameters) around, but there are now icons on each parameter to select/deselect the parameter. Each row also an icon for changing the boundary between the selected/deselected parameters
  • Printouts:
    The ability to print queries has been improved and expanded. Discovery queries (where a list of tables containing matching objects is retrieved) and plots can now also be printed. Any pane that can be printed has a printer tool on the right of the title bar.
  • Feedback:
    The same form is now used to both submit new feedback requests and display existing feedback.
  • No prompt on exit:
    Xamin no longer prompts for confirmation that you want to leave the service when you have more than one plot/table displayed.
  • Multiple plot control windows:
    Previously only a single plot control window was allowed. When a new plot request was made any existing plot control window was destroyed and replaced by the result from the new plot request. Since these are bulky windows, users may want to remove them as soon as they are finished with a particular plot. Plot control windows are cleaned up when users use the Clear Data option on the Start button.
  • Single box notice:
    When using the single box interfaces users will be notified immediately when the single box query initiates. Previously the system could appear inactive while Xamin was checking whether input parameters were target names.
  • Tutorial movies:
    The introductory tutorial in the Xamin Web Users Guide is now given as a set of movies rather than text.  We also have a Introduction to Xamin page.

Enhanced Bibcode Access In Xamin

The recently released version 2.1.6 of Xamin has some enhanced capabilities for handling bibcode queries. For some time the web interface has allowed users to ask whether any tables have rows that index observations which are referenced in a particular article using the syntax

http://heasarc.gsfc.nasa.gov/xamin/xamin.jsp?bibcode=xxxx

where xxxx is the 19 character bibcode. The new version gives a bit more feedback about which bibcode was used when such queries are made.

The zzbib table can be used to find tables and rows matching a particular bibcode in the batch interface with a syntax like

java -jar users.jar table=zzbib constraint=bibcode="'xxxx'"

E.g.,
java -jar users.jar table=zzbib constraint=bibcode="'2014Natur.505..378C'"
bibcode |table_name|id_fields|id
2014Natur.505..378C|rosmaster |seq_id |RP200984N00
2014Natur.505..378C|rosmaster |seq_id |RP201583N00
2014Natur.505..378C|rosmaster |seq_id |RP201584N00
2014Natur.505..378C|rosmaster |seq_id |RP201585N00
2014Natur.505..378C|rosmaster |seq_id |RP201586N00
2014Natur.505..378C|rosmaster |seq_id |RP201587N00

Number of rows: 6
Number of columns: 4

will find the rows in zzbib that reference a particular bibcode. The other fields in zzbib indicate the table, the column of the table used to id specific rows the table, and the value for that id. [Note that since the bibcode is a string parameter in the zzbib table, the bibcode must be enclosed in single quotes. The enclosing double quotes make sure that the single quotes are actually part of the parameter and not processed by the shell.]

In the new release of Xamin you can also add bibcode constraints to particular tables with the new bibcode parameter:

java -jar users.jar table=rosmaster bibcode=xxxx

We can repeat our example above but get the actual data for the ROSMASTER rows

java -jar users.jar table=rosmaster bibcode=2014Natur.505..378C
seq_id |instrument|exposure|ra |dec |name
RP201587N00|PSPCB | 7749|22 43 33.6|44 00 00|EV LAC
RP201586N00|PSPCB | 5928|22 43 07.2|44 10 12|EV LAC
RP201585N00|PSPCB | 4561|22 43 09.6|44 10 12|EV LAC
RP201584N00|PSPCB | 3854|22 43 07.2|44 12 36|EV LAC
RP201583N00|PSPCB | 3445|22 43 07.2|44 13 12|EV LAC
RP200984N00|PSPCB | 2673|22 46 50.4|44 19 48|GL 873

Number of rows: 6
Number of columns: 6


will find all the rows in rosmaster referred to in a particular bibcode. You can add additional constraints to the query if you like. In this context since we have special processing of the bibcode it need note be enclosed in quotes, but if the bibcode contains special characters like ampersand you may need to use quotes or some other mechanism to escape shell processing.

Xamin V2.2 Released

The major change to this version is that Xamin now generates preview image windows giving SkyView survey cutouts for queries that involve a single position. By default DSS and RASS cutouts are created, but this can be controlled in the Options/Query Control/Survey Img. option. The cutouts start out as 100×100 but can be resized and the image will be appropriately recomputed.

The internal release procedures have also been significantly updated but this will not affect users.

ConeSearch usage filtering

For the past several weeks we have been seeing extremely heavy usage (>~ 1 request per second) with several distinct users sending tens of thousands of cone search requests in rapid succession. Unfortunately, they generally seem to be sending a series of simultaneous requests until things back up, rather than doing one or two streams of requests waiting for one request to finish before starting another. Generally we’ve been able to respond to these requests, but the response time has been awful and the system load has made the interactive interface largely unusable.

To address this we’ve added in a filtering of input cone search requests. If we have more than two requests going on a given server for the same table, then the new request will immediately fail. Since we have two servers there may still be as many as four queries executing on the same table.

In the few hours since we’ve implemented this we’ve noted that the interactive interfaces are doing much better, though not as crisp as we might like. In the near future we will be moving the database to a much faster machine which should address these problems more definitively.

Heavy Usage

This morning we had a user making very heavy usage of Xamin sending multiple cone search requests per second. Requests piled up in the database and this slowed down the system considerably. Requests that normally took a second could take several minutes. Metadata queries in the interactive versions of Xamin would often time out so that the Web interface was not really working.
We have temporarily disabled access for this IP address.

We’ll be looking at how we can meter usage more effectively in the future and apologize to those adversely affected.

Xamin 1.7 released

A new information pane is now included in the main Xamin web interface. This pane shows information about groups of tables and the detailed table documentation. Previously this information had been displayed in the bottom of the main query pane. The new arrangement makes it much easier to browse table documentation.

Positional matching in Postgres

Positional Queries

One of the fundamental operations for astronomical databases is finding objects by position.  Two common scenarios are where we may want all of the objects near to a specified location or a positional cross-correlation between two tables.  There are many ways to do these and there are some special products available in Postgres for spatial indices. 

In this article we look at 5 different ways for doing positional searches in Postgres:

  1. Haversine: Using the Haversine formula with tables clustered on an index in declination.  A simple constraint on declination is included to make the index useful.
  2. Unit Vector: Using the dot product of unit vectors with tables clustered on an index on Dec.  A simple constraint on declination is included to make the index useful.
  3. PostGIS: Using ST_DWithin with tables where a PostGIS geography object has been added representing the RA/Dec of each row.  Tables are clustered on an index using the geography object.
  4. PGSphere: Using the includes operator (~) in PGSphere with tables where an spoint object has been added representing  the position of each row.  Tables are clustered on an index on the spoint object.
  5. Q3C: Using the q3c_join  function in Q3C with tables clustered on an index on the q3c_ang2ipix(ra,dec).  In this case no column was added to the table.

Tables Used

We created three tables optimized for each search scheme.  These include stripped down versions of the Swift observations table and XMM optical monitor object catalogs with 76,301 and 2,880,728 rows respectively.  Most of the columns other than positions have been removed, but a couple of id columns are retained.  A copy of one of the HEASARC’s master position table was also made. This contains the table name, position and default search radius for each row in every table in the HEASARC database that contains a position, excluding tables with a default search radius larger than 1 degree.  This table contains about 36,000,000 rows. Since some schemes could handle null positions and others could not, there was a slight variation here.

After each table was created, a single index was created, the table was clustered on the index and a vacuum analyze was performed.

The Queries

We ran queries on multiple positions on our master positions tables looking for how many matches we’d have in the HEASARC tables for each scheme.  The positions table has fields table_name, default_search_radius, ra and dec. The queries look something like:

   select table_name,count(*) from positions where
   ... position is within 1 degree of the specified position ...
   and
   ... position is within the default search radius of the specified position...
   group by table_name order by table_name
 

Since the default search radius is less than 1 degree, the first constraint does an initial limit to the table. Where I have looked, queries which have only the second constraint which uses the per row search radius cannot use the indexes efficiently.

For the Haversine and Unit vector approaches we also included a constraint like

 dec between dec0-1 and dec0+1 

where dec0 was the requested declination. This is the constraint that takes advantage of the index.

See the details below for the syntax used in each approach to do the positional constraint.

Note: Initially we used a positional grid with integer-valued grid points. This led to differences between the methods. Some of the data were at half-integer grid points and some table positions were exactly the default search radius away from the grid. Depending upon roundoff, which varied from method to method, the position might or might not be included in the query results. We added pi/20 to each grid coordinate to minimize this problem. Results of all five methods using this offset grid were identical.

A grid of 162 positions were queried. The times given below in the Tables Time are the times to do all 162 queries.

A correlation test did a cross-match of the Swift master table and the XMM Optical Monitor object table looking for pairs within 0.01 degrees. The times and resulting counts are given in the table.

All tests were run on Postgres 9.2.1 on in Linux using a Quad core machine with 2.66 GHz cores. However queries were run purely sequentially.

Results

Each test was run multiple times. In many cases the subsequent queries showed evidence of improvement due to caching of some elements of the query. In practice users often run the same query multiple times (e.g., asking for different formats) or very similar queries, so that the better cached performance is a real advantage.

We restarted the database before running the tests and we show times for the first query and then the average of three subsequent reruns. However we cycled through each of the 5 sets of tables first before rerunning any query.

Approach Ext. Lib. Index Tables Time X-Corr time X-Corr Count
Initial Subseq. Initial Subseq.
Haversine No dec 134 129 77.5 77.5 228,111
Unit Vector No dec 60.3 60.1 37.9 37.4 228,111
PostGis Yes Point(ra,dec) 33.8 3.8 39.6 5.7 228,111
PGSphere Yes spoint(ra,dec) 1.3 1.3 34.5 9.7 228,114
Q3C Yes q3c_ang2ipix(ra,dec) 3.7 3.6 7.9 2.1 228,111

The results are summarized in the figure. Note the logarithmic y-axis.

Just to give some perspective on these results I tried to run the same correlation in TOPCAT. An attempt to duplicate the crossmatch failed due to memory constraints.
A cross-match using similar criteria but between Swift and a 900,000 row table took about 10 seconds (not including time to load the tables) on the same hardware. It suggests TOPCAT’s performance might be comparable to the Unit vector approach. However we have not pursued this seriously.

Notes on the approaches

Haversine

We use the standard Haversine formula and add a simple constraint on declination which enables the index on dec to speed up the query.

SQL (for cross-correlation query):

select count(*) from swiftdec a, xmmdec b
    where
a.dec between b.dec - .01 and b.dec + .01 and
( sin(radians(b.dec-a.dec)/2)^2 + cos(radians(a.dec))*cos(radians(b.dec))*sin(radians(b.ra-a.ra)/2)^2
      <
  sin(radians(.01)/2)^2
)

Unit Vector

In this approach the vector of the position on the unit sphere is included which enables an exact constraint on distance to be done without any trigonometric functions used.  The value on the right of the > is simply cos(.01deg).  This requires no special library but adds three columns to the table.

SQL:

select count(*) from swiftunit a, xmmunit b
where 
   a.dec between b.dec -.01 and b.dec+.01 and
   a.__x*b.__x + a.__y*b.__y + a.__z*b.__z > 0.999999984769129

PostGIS

Since the PostGIS function return results by default in meters on the spheroid we set the last parameter of the st_dwithin to false to do calculations on a sphere. Apparently PostGIS uses an Earth radius of 6371.0087714 km.  We use that to convert .01 degrees to a distance in meters.  A user defined spherical projection to the unit sphere might be easy to define and would get rid of this mysterious constant from our calculations.

PostGIS version 2.0.1 was used. PostGIS is the heaviest weight of the three libraries to install. It has a lot of features and extensive documentation-though It is unclear how many of the features are useful to us. It requires several other libraries to be downloaded and installed locally first.

SQL:

select count(*) from swiftgeog a, xmmgeog b 
where
    st_dwithin(a.geog,b.geog,1111.950792, false)

PGSphere

In PGSphere the ~ operator indicates the the right hand operator includes the left hand operator. There are many possible ways to construct the constraint using PGSphere operators, but only some of them seem to use the index we had made on the pnt field.

The PGSphere cross-correlation gave a slightly different result than the other approaches.

It is unclear if PGSphere is being actively maintained. The last version (V1.1.1) is from early 2010. Some tweaking of the code was required to get it to run in Postgres 9.2. In particular the languages specified for various functions needed to be converted to lower case. PGSphere has the clearest documentation (though we found an error in the example of how to define indexes).

SQL:

select count(*) from swiftpgs a, xmmpgs b 
where 
    scircle(a.pnt, radians(.01))~b.pnt

Q3C

The tables include an index on q3c_ang2ipix(ra,dec). However examples given in the Q3C README file suggested creating the functional index without saving this as a value in the table, so we did not create a column with this value.

The latest version (1.4.13) of Q3C is from early 2012. No issues were found in compiling Q3C into Postgres, but it has limited documentation. The README file seems to be it.

SQL:

select count(*) from swiftq3c a, xmmq3c b 
where q3c_join(a.ra,a.dec,b.ra,b.dec,.01)

Sensitivity to Query Details

Some amount of tweaking of the queries was performed to ensure reasonable performance. It was easy to write queries in ways that did not take any advantage of the indices and required a full table scan. E.g., when doing our positional searches using Q3C, we got dramatic differences based upon whether specific a constraint like
where q3c_join(187.25, 2.05, ra,dec, 1)
versus
where q3c_join(ra,dec, 187.25, 2.05, 1)
I had to try a few different syntaxes in PGSphere before I found one that engaged the indexes and sped up the query.

This suggests that supporting free-form user inputs may be incompatible with efficient querying of the database.

For me, Postgres’ explain functionality was essential in finding efficient queries.

Conclusions

Adding unit vector columns to the table cuts the query time in two. At least in this scenario the simple trigonometric approach is CPU limited so that adding addition fields to the table which reduces the complexity of the offset calculation is worth it.

Much greater reductions are possible using specialized indices. Of the three possibilities studied, Q3C has generally good performance, running about twice as fast the the PostGIS geography type.

The PGSphere library is very fast in doing point queries and but a bit slower than both Q3C and PostGIS for cross-correlations. The slight disagreement in cross-corrleation counts is a bit disconcerting but it’s possible that it is due to the kind of rounding issues we discovered in the positional queries.

For correlations all of the methods using spherical indexing seem to have some startup/caching cost.  The second and subsequent iterations run about three times as fast as the first.  The implication on an operational system are unclear and presumably depend critically upon the query mix.

For the positional queries PostGIS still shows strong evidence of caching, but PGSphere and Q3C do not. Note that the results for the positional queries are for an aggregate 162 queries. The averaged times for individual queries ranged from about 10-300 milliseconds.

Although one should be cautious extrapolating from any small number of tests, it does appear that spatial indices substantially improve performance.  We see an improvement of somewhere between a factor of 2-15 in the time taken for queries.

Either Q3C or PostGIS seem like reasonable choices. Q3C gives the better performance and has a more transparent syntax for astronomers. However PostGIS has a much broader community and far greater certainty of continued support. PGSphere’s performance in positional queries is remarkably good but the lack of clear support and variance in results in the cross-correlation are worrying.

Batch Queries Moving to Xamin

On October 2, 2012 we will begin sending batch queries to our  Xamin interface.  Internal tests have shown Xamin about twice as fast as the current interface.

The switch will be accomplished by internally redirecting batch queries sent to the current URL to an Xamin URL.  Non-batch queries (used by our Browse interface) are not affected.  The redirection will be done internally within our site so that web clients will not see an HTTP 30x redirection response.

There are some small differences in the formats of data returned discussed in a previous post, but we hope that this transition should be largely invisible to users.  If you want to switch over early, you can query the URL

http://heasarc.gsfc.nasa.gov/xamin/BrowseBatch

rather than the

http://heasarc.gsfc.nasa.gov/cgi-bin/W3Browse/w3query.pl

URL currently invoked.  After next Tuesday batch requests to the w3query.pl URL will be transparently redirected to Xamin.  If you want to invoke the old behavior after next week you can switch to using

http://heasarc.gsfc.nasa.gov/cgi-bin/W3Browse/w3query_noredir.pl

for your batch requests.

Versions of our batch scripts that will point consistently to either Xamin or Browse are available at http://heasarc.gsfc.nasa.gov/FTP/software/web_batch.  We recommend that you use the wget versions of scripts if possible since the simple webquery.pl tool used as an alternative has no ability to negotiate firewalls or redirections.