Frequently Asked Questions (FAQ)

Pipeline and Configuration version mismatch

I can’t run the pipeline because it says my TOML file has a version mismatch. What should I do?

To help manage compatibility, every configuration file contains a version key. This version must be compatible (under semantic versioning, SemVer) with the installed version of vampires_dpp. There are two ways to fix a mismatch:

  1. (Recommended) Run dpp upgrade to try to upgrade your configuration automatically

  2. Downgrade vampires_dpp to match the version in your configuration
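The exact compatibility rule vampires_dpp applies is not spelled out here. As a rough illustration of what "compatible within SemVer" usually means, the sketch below assumes caret-style matching (the rule behind npm's ^ ranges and Cargo's default version requirements): the installed version must be at least the configuration's version, must share its major version, and for 0.x releases must also share its minor version. The function names parse_version and is_compatible are hypothetical and not part of vampires_dpp.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (missing parts default to 0; no pre-release tags)."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))  # pad e.g. "0.12" to (0, 12, 0)
    major, minor, patch = parts[:3]
    return major, minor, patch


def is_compatible(config_version: str, installed_version: str) -> bool:
    """Assumed caret-style SemVer check: does the installed version satisfy ^config_version?"""
    cfg = parse_version(config_version)
    inst = parse_version(installed_version)
    if inst < cfg:
        return False  # installed release is older than the configuration expects
    if cfg[0] > 0:
        return inst[0] == cfg[0]  # 1.x and above: same major version
    if cfg[1] > 0:
        return inst[:2] == cfg[:2]  # 0.x: minor bumps may break compatibility
    return inst == cfg  # 0.0.x: only an exact match is compatible
```

For example, under this assumed rule a configuration written for 0.12.0 works with 0.12.3 but not with 0.13.0. In practice, the installed version could be read with something like importlib.metadata.version("vampires_dpp"), and the configuration's version key with the standard-library tomllib module (Python 3.11+).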

I’m getting warnings about centroid files, help!

The blah blah explain it.

TODO

Performance

It’s slow. It’s so, so slow. Help.

Processing data at the volumes VAMPIRES produces is demanding, but a few practices can speed things up considerably.

  1. Use an SSD (over USB 3 or Thunderbolt)

Faster storage reduces the overhead of opening and closing files, which happens constantly throughout the pipeline.

Important

If you use a portable SSD or HDD, make sure both the cable and the port on your computer support high-speed transfers. Tools such as dd, lsusb, cyme, or CrystalDiskMark can verify the connection and measure read/write speeds to the drive.
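If none of those tools is handy, a rough throughput check can also be done from Python. The sketch below is a hypothetical helper (measure_throughput and its default sizes are illustrative, not part of vampires_dpp): it writes a temporary file of random data to a target directory, forces it onto the drive with os.fsync, then reads it back and reports MiB/s. The read figure may be inflated by the operating system's file cache, so treat it as an upper bound.

```python
import os
import tempfile
import time


def measure_throughput(directory, size_mb=256, chunk_mb=8):
    """Roughly estimate sequential (write, read) throughput in MiB/s for `directory`.

    Hypothetical helper for illustration; not part of vampires_dpp.
    """
    chunk_bytes = chunk_mb * 1024 * 1024
    chunk = os.urandom(chunk_bytes)  # random data so drive-level compression can't skew results
    n_chunks = max(1, size_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as fh:
            for _ in range(n_chunks):
                fh.write(chunk)
            fh.flush()
            os.fsync(fh.fileno())  # wait until data reaches the drive, not just the OS cache
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as fh:
            while fh.read(chunk_bytes):  # may be served from the OS cache
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    total_mb = n_chunks * chunk_mb
    return total_mb / write_s, total_mb / read_s
```

For example, measure_throughput("/Volumes/MySSD") (a hypothetical mount point) returns a (write, read) pair to compare against the drive's advertised speeds; numbers far below spec often point to a slow cable or port.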

  2. Don’t save intermediate files

Opening, writing, and closing every intermediate file adds significant overhead, on top of greatly increasing the volume of data written to disk.
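To make the cost concrete, here is a toy pipeline, not vampires_dpp itself: the step names in STEPS, the byte-reversal "processing", and the function run_toy_pipeline are all hypothetical. It pushes fake frames through three steps and either writes every intermediate result or only the final one, counting files and bytes written and timing each run.

```python
import os
import tempfile
import time

# Hypothetical step names; a real pipeline has its own steps and file formats.
STEPS = ("calibrated", "aligned", "collapsed")


def run_toy_pipeline(frames, directory, save_intermediates):
    """Push each frame through STEPS; return (files_written, bytes_written)."""
    files = 0
    nbytes = 0
    for i, frame in enumerate(frames):
        data = frame
        for step in STEPS:
            data = data[::-1]  # stand-in for real processing
            if save_intermediates or step == STEPS[-1]:
                path = os.path.join(directory, f"{i:05d}_{step}.bin")
                with open(path, "wb") as fh:  # open/write/close on every save
                    fh.write(data)
                files += 1
                nbytes += len(data)
    return files, nbytes


if __name__ == "__main__":
    frames = [os.urandom(128 * 1024) for _ in range(64)]  # 64 fake 128 KiB frames
    for save in (True, False):
        with tempfile.TemporaryDirectory() as tmp:
            start = time.perf_counter()
            files, nbytes = run_toy_pipeline(frames, tmp, save_intermediates=save)
            elapsed = time.perf_counter() - start
        print(f"save_intermediates={save}: {files} files, "
              f"{nbytes / 2**20:.0f} MiB, {elapsed:.3f} s")
```

With three steps, saving intermediates writes three times as many files and bytes (192 files and 24 MiB versus 64 files and 8 MiB here); the timings depend on your drive.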

  3. Use multiprocessing

Using more processes should speed up some parts of the pipeline, but don’t expect the speedup to scale with the number of processes, since most operations are limited by storage I/O speed.
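As a generic illustration of why extra processes help only up to a point, the sketch below uses Python's standard concurrent.futures.ProcessPoolExecutor to process a list of files in parallel. It is not how vampires_dpp implements multiprocessing; process_file, process_all, and the raw_data directory are hypothetical. Each task reads a whole file (I/O-bound) and then computes a checksum (CPU-bound); once the drive is saturated, adding workers mostly adds contention.

```python
import glob
import os
import zlib
from concurrent.futures import ProcessPoolExecutor


def process_file(path):
    """Hypothetical per-file task: read a file and return (path, size, CRC-32)."""
    with open(path, "rb") as fh:
        data = fh.read()  # I/O-bound: limited by the drive, not the CPU
    return path, len(data), zlib.crc32(data)  # CPU-bound part


def process_all(paths, max_workers=None):
    """Run process_file over `paths` in parallel, returning results in input order.

    max_workers=None lets ProcessPoolExecutor pick one worker per available CPU.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(process_file, paths))


if __name__ == "__main__":
    # Hypothetical input location; point this at your own data.
    paths = sorted(glob.glob(os.path.join("raw_data", "*.fits")))
    for path, nbytes, crc in process_all(paths, max_workers=4):
        print(f"{path}: {nbytes} bytes, crc32={crc:08x}")
```

The __main__ guard matters: on platforms that start worker processes with "spawn" (the default on macOS and Windows), each worker re-imports the script, and unguarded top-level code would run again in every worker.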

Semaphore Warnings

If you run the pipeline and see warnings like this:

UserWarning: resource_tracker: There appear to be 5 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

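that is okay. These warnings can appear when using multiprocessing; as the message itself says, the leaked semaphores are cleaned up at shutdown, and at the latest they clear when your computer restarts. The pipeline and the rest of your computer will run fine even if you see this.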