Sorting, Selecting, and Processing VAMPIRES Data

VAMPIRES data is most efficiently organized through SQL-esque filtering and grouping based on the FITS headers for each file. The following is a guide for sorting and selecting VAMPIRES data for various scenarios. The code snippets will be written using Python’s pandas library as well as sqlite-compatible SQL.

Primer: Scraping Headers

To begin, we’ll talk about how to scrape VAMPIRES data headers and quickly summarize them in preparation for the tutorials below. All these examples will assume you are working on scexao6.

First off we’re going to make a user data folder to copy our data to for more efficient processing. Canonically you should be using /mnt/userdata/<username>, however if this disk is full you can also use /mnt/tier1/<username>_userdata. Change directories into this folder and use the following helper script to parse headers from any FITS file:

sc6 $ scxkw-header-table /mnt/sdata/<date>/ARCHIVED/vgen2/VMP*.fits.fz

by default this will output a CSV file to header_table.csv, but you can add a custom filename with the -o/--output flag:

sc6 $ scxkw-header-table -o 20251004_table.csv /mnt/sdata/20251004/ARCHIVED/vgen2/VMP*.fits.fz

Now, let’s load the table into a pandas DataFrame

import pandas as pd

table = pd.read_csv("header_table.csv")

If you prefer using sqlite, you can load the CSV table into memory and launch the interpreter with

sc6 $ sqlite3

.mode csv
.import header_table.csv headers

Tip: Multiple Nights

In order to scrape multiple nights’ headers, simply concatenate the headers

sc6 $ for d in ("20251004", "20251005", "20251006); do scxkw-header-table -o $d_headers.csv $d; done
sc6 $ cat 202510*_headers.csv > 202510_combined_headers.csv

Primer: Objects and Data Types

To summarize the data, we can quickly group it all by data type, object name, and camera, as well as printing the total number of files for each grouping.

In pandas

table.value_counts(["OBJECT", "DATA-TYP", "OBS-MOD", "U_CAMERA"])

OBJECT    DATA-TYP  OBS-MOD   U_CAMERA
HR8206    OBJECT    IMAG_MBI  1           536
HIP99770  STANDARD  IMAG_MBI  1            44
BD254655  STANDARD  IMAG_MBI  1            38
HR8206    DARK      IMAG_MBI  1            21
BD254655  OBJECT    IMAG_MBI  1            13

Filter for given objects

If you just want all the data for a given list of objects (and you weren’t doing PDI), you can filter with DATA-TYP and OBJECT keywords

In pandas

sub_table = table.query("`DATA-TYP` in ('OBJECT', 'STANDARD') and OBJECT in ('HR8206', 'HIP99770')")

Filter for calibration files

To get all the calibration files for the night, just sort by DATA-TYP

Using pandas

calib_table = table.query("`DATA-TYP` in ('DARK', 'SKYFLAT', 'FLAT', 'COMPARISON')")

Filter data for PDI

Synchronized and deinterleaved data needs a little more filtering to discard the frames which could not be synchronized correctly.

Using pandas

pdi_table = table.query("`DATA-TYP` == 'OBJECT' and OBJECT in ('ABAUR', 'HD34700') and U_SYNC and U_FLC != 'D'")

Prepare filelist

Once you’ve filtered the data you should save the list of file paths to a text file.

Tip: Combining Tables

To combine two tables, say the subtable for your objects and the calibration files, merge them with

comb_table = pd.concat((sub_table, calib_table))

Using pandas

paths = "\n".join(str(p) for p in sub_table["path"])
with open("filelist.txt", "w") as fh:
    fh.write(paths); fh.write("\n")

Decompressing and sorting data

Now that you have a list of the files you want to process, activate the dpp conda environment

sc6 $ conda activate dpp

and now use dpp sort to copy and decompress the data read in from the filelist

sc6 (dpp) $ dpp sort --decompress --copy $(< filelist.txt)

Data processing

The rest of the data processing is explained in the VAMPIRES DPP documentation