Data format

Overview

The data format used in all scripts is based on the Astropy Table format. In the Build the table section, we show how these Astropy Tables are created from the DM butler. This work is automatically done for the deepCoadd_meas, deepCoadd_forced_src, and forced_src catalogs, when avalaible, and then saved together in one single hdf5 file. The procedure to write and read these files and work with the loaded tables are described in the Working with the table section.

Build the table

The main table is built from the LSST DM butler as shown in the diagram [1] below:

Data table construction

For each filter f, an Astropy Table is created for all available patches p1, p2, etc. Since we have the same number of patches for all filters, which contain the exact same number of sources, all table (1,p), (2,p), etc., created from a patch will be of the same size for all filter. The Astropy Tables created from all individual filters/patches set will then be vertically stacked. This means that if we respectively have N1, N2 and N3 sources for the patches 1, 2 and 3, the output table will contains N1 + N2 + N3 sources. After the first stack, we end up with one table per filter, all containing the same number of sources. These per-filter tables are finally stacked together to form the final table shown on the right-hand side of the diagram.


[1]Diagram created using https://www.jgraph.com/. Use the https://www.draw.io/?demo=1 application and the last xml file from this repository to update the diagram if needed.

WCS

The WCS computed during the data processing is also stored in the hdf5 file in the wcs path. If you load the data using the read_hdf5 function, the output will be a dictionary containing a wcs key, which refers to an astropy.wcs.WCS object. The skycoord_to_pixel and pixel_to_skycoord functions take this wcs object to convert the input coordinates into an output format (sky <-> pixel).

All the tables (correspondig to the catalogs listed previously) already contain three coordinates columns:

  • coord_ra and coord_dec: they are the (ra, dec) coordinates in radian;
  • coord_ra_deg and coord_dec: they are the (ra, dec) coordinates in degree;
  • x_Src and y_Src: they are the (x, y) position in pixel.

Working with the table

Note

The corresponding Jupyter notebook can be found here. You can also reproduce these results in ipython.

Astropy tables are great to work with, and can be used for all kind of analysis in the context of our cluster study. You can apply filters, group by column, concatenate them, etc. For a detailed review on Astropy tables, see there.

Load the table

The Astropy tables created by the clusters_data step are saved in an hdf5 file, and contains two main tables, corresponding to two output catalogs of the data processing using the DM stack. As an example, we will use here the deepCoadd_forced_src catalog, corresponding to the forced photometry processing (some details).

If you want to start an analysis with an existing hdf5 file containing catalogs, you can use the one we have created for MACSJ2243.3-0935, which is saved at CC-IN2P3 under:

/sps/lsst/data/clusters/MACSJ2243.3-0935/analysis/output_v1/MACSJ2243.3-0935_data.hdf5

To load the deepCoadd_forced_src catalog, do:

from Clusters import data
f = "/sps/lsst/data/clusters/MACSJ2243.3-0935/analysis/output_v1/MACSJ2243.3-0935_data.hdf5"
d = data.read_hdf5(f)
fc = d['deepCoadd_forced_src']

d is a dictionnary containing the ‘deepCoadd_forced_src’, the ‘deepCoadd_meas’ catalogs and the ‘wcs’ object.

print d.keys()
['deepCoadd_forced_src', 'wcs', 'deepCoadd_meas']

and fc is an astropy table

print fc
base_CircularApertureFlux_3_0_flag_sincCoeffsTruncated ... coord_dec_deg
                                                       ...      deg
------------------------------------------------------ ... --------------
                                                 False ... -9.50417299504
                                                  True ... -9.50631091083
                                                  True ... -9.50631273401
                                                  True ... -9.50632589495
                                                  True ...  -9.5063327395
                                                 False ...  -9.5062460577
                                                  True ... -9.50629874096
                                                  True ... -9.50635437897
                                                 False ... -9.50600120865
                                                 False ... -9.50549567214
                                                   ... ...            ...
                                                 False ... -9.73333093082
                                                 False ... -9.73388006895
                                                 False ...  -9.7302761071
                                                 False ... -9.73010079525
                                                 False ... -9.72701283749
                                                 False ...  -9.7273114286
                                                 False ... -9.91085559972
                                                 False ... -9.91084514606
                                                 False ...  -9.8851539436
                                                 False ... -9.88578472829
Length = 1050500 rows

As you can see, there are

N = len(fc)
print N, "rows"
1050500 rows

in this table. This number correspond to the number of sources (ns) times the number of filters (nf): N = ns x nf. In this table, we have the following number of filter:

filters = set(fc['filter'])
nf = len(filters)
print nf, "filters:", filters
5 filters: set(['i', 'r', 'u', 'z', 'g'])

The number of sources in this catalog if thus:

ns = N / nf
print ns, "sources"
210100 sources

The number of columns corresponding to the number of keys available in the catalog is:

print "%i columns" % len(fc.keys())
for k in sorted(fc.keys())[:10]:
    print k
207 columns
base_CircularApertureFlux_12_0_flag
base_CircularApertureFlux_12_0_flag_apertureTruncated
base_CircularApertureFlux_12_0_flux
base_CircularApertureFlux_12_0_fluxSigma
base_CircularApertureFlux_12_0_mag
base_CircularApertureFlux_12_0_magSigma
base_CircularApertureFlux_17_0_flag
base_CircularApertureFlux_17_0_flag_apertureTruncated
base_CircularApertureFlux_17_0_flux
base_CircularApertureFlux_17_0_fluxSigma

Apply filters

You can filter this table to, for example, only keep the i and r magnitude of the modelfit_CModel_mag for all sources:

magi = fc['modelfit_CModel_mag'][fc['filter'] == 'i']
magr = fc['modelfit_CModel_mag'][fc['filter'] == 'r']

and plot them against each other

# ignore the following line
%matplotlib inline
import pylab
pylab.scatter(magi, magr)
pylab.xlabel('i mag')
pylab.ylabel('r mag')
pylab.title('%i sources (galaxies+stars)' % len(magi))
<matplotlib.text.Text at 0x7f55453994d0>
_images/data_tuto_18_1.png

A few standard filters have been implemented in data and can be used directly to get a clean sample of galaxies:

# ignore the following line
import warnings; warnings.filterwarnings("ignore")
data_filtered = data.filter_table(d)
fc_filtered = data_filtered['deepCoadd_forced_src']

The same plot as in the above example now looks like

magi_filtered = fc_filtered['modelfit_CModel_mag'][fc_filtered['filter'] == 'i']
magr_filtered = fc_filtered['modelfit_CModel_mag'][fc_filtered['filter'] == 'r']
pylab.scatter(magi_filtered, magr_filtered)
pylab.xlabel('i mag')
pylab.ylabel('r mag')
pylab.title('%i sources (clean sample of galaxies)' % len(magi_filtered))
<matplotlib.text.Text at 0x7f55451f92d0>
_images/data_tuto_23_1.png

See the code for a few other examples on how to use filters.

Add a new column

You can also add a new column to the table (examples here)

from astropy.table import Column

Create a simple shifted magnitude array

shifted_mags = fc_filtered['modelfit_CModel_mag'] + 2

Add it to the initial table and plot it against the initial magnitude (for the i filter here)

fc_filtered.add_column(Column(name='shifted_mag', data=shifted_mags,
                              description='Shifted magnitude', unit='mag'))
magi_filtered = fc_filtered['modelfit_CModel_mag'][fc_filtered['filter'] == 'i']
magi_shifted =  fc_filtered['shifted_mag'][fc_filtered['filter'] == 'i']
pylab.scatter(magi_filtered, magi_filtered)
pylab.scatter(magi_filtered, magi_shifted, c='r')
pylab.xlabel('i mag')
pylab.ylabel('shifted i mag')
pylab.title('%i sources (clean sample of galaxies)' % len(magi_filtered))
<matplotlib.text.Text at 0x7f55449552d0>
_images/data_tuto_30_1.png

You can also add several columns using fc.add_columns([Columns(...), Columns(...), etc]).

Filter around the cluster center

If you only want to work on a sample of galaxies center around the cluster at a certain radius, do:

confile = '/sps/lsst/data/clusters/MACSJ2243.3-0935/analysis/output_v1/MACSJ2243.3-0935.yaml'
config = data.load_config(confile)
output = data.filter_around(fc_filtered, config, exclude_outer=20, exclude_inner=3, unit='arcmin', plot=True)
_images/data_tuto_32_0.png

The output of filter_around is a filtered data table. You can also choose a different unit:

output = data.filter_around(fc_filtered, config, exclude_outer=0.3, exclude_inner=0.1, unit='degree', plot=True)
_images/data_tuto_34_0.png