assembleCoadd throws "zip() argument 2 is shorter than argument 1" in up to 5% of patches in DC2 testing
Description
Alright, this is merged now. There is still an issue I don't quite understand: the number of datasets differs between running the bare pipetask and running through cm-tools. I'm leaving that for another ticket.
The reason I couldn't test it in the bps environment was a syntax error: a missing semicolon at the end of custom_lsst_setup, which should read: setup -j -r /special/branch/to/test/drp_tasks;
Thanks to @mgower for tracking this down. I can confirm that when I run with this branch’s changes, there are no more zip() errors.
Thanks again for the confirmation. There’s still an underlying issue related to the number of datasets being different, but that’s outside the scope of this ticket.
Jenkins is green: https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/1929/pipeline
Arun: Your fixes work, and you should merge them into main now. I am having trouble getting ‘custom_lsst_setup’ in bps submit to run with your changes, but I am able to run a pipetask ‘by itself’ (as in your tests) and get all the way through step3, including assembleCoadd, without error. I will take up the bps submit issue separately (or maybe I am configuring custom_lsst_setup incorrectly). Working runs are, for instance, /sdf/home/y/yanny/rubin-user/testit/testitfull, with its log in testitfull.logarun (group1 of the step3 rescue), and testitfullg0, with its log in testitfullg0.logarun (group0 of the step3 rescue). It is still running but has successfully made it past the assembleCoadd pipetask.
With the changes in this ticket, it cannot possibly fail with the zip error (at least not at the same place), since that code has been removed in favor of an alternative approach.
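The comment above does not say what replaced the zip() call. One generic way to make a positional length mismatch impossible is to pair inputs by a shared key (for example, a visit identifier) and report unmatched keys explicitly. The sketch below assumes the inputs can be keyed that way; `pair_by_key` and the sample keys are hypothetical, and this is not necessarily the approach taken in this ticket.

```python
"""Pair two collections by key instead of by position.

Hypothetical illustration only: the ticket does not describe the
alternative approach that replaced the zip() call.
"""

from collections.abc import Mapping
from typing import TypeVar

K = TypeVar("K")
A = TypeVar("A")
B = TypeVar("B")


def pair_by_key(
    first: Mapping[K, A], second: Mapping[K, B]
) -> tuple[dict[K, tuple[A, B]], set[K]]:
    """Return (matched pairs, keys present in only one input).

    Keys missing from either mapping are reported instead of raising,
    so a length mismatch cannot produce a zip() ValueError.
    """
    common = first.keys() & second.keys()  # keys present in both inputs
    matched = {key: (first[key], second[key]) for key in common}
    unmatched = set(first.keys() ^ second.keys())  # keys in exactly one input
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical inputs: visit_3 has a warp but no weight.
    warps = {"visit_1": "warp1", "visit_2": "warp2", "visit_3": "warp3"}
    weights = {"visit_1": 1.0, "visit_2": 0.8}
    matched, unmatched = pair_by_key(warps, weights)
    print(sorted(matched))  # ['visit_1', 'visit_2']
    print(unmatched)  # {'visit_3'}
```

Reporting the unmatched keys rather than silently dropping them keeps a mismatch visible, so it can be logged or checked downstream.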
The assembleCoadd pipetask, which runs during step3 processing, is throwing an error:
assembleCoadd:{band: 'i', skymap: 'DC2', tract: 3828, patch: 40})(singleQuantumExecutor.py:298) - Execution of task 'assembleCoadd' on quantum {band: 'i', skymap: 'DC2', tract: 3828, patch: 40} failed. Exception ValueError: zip() argument 2 is shorter than argument 1
This occurred in about 2.4% (7/297) to 4.8% (14/294) of patches for DC2 w_2024_28 processing. For instance, (tract, patch) = (3828, 40), (3829, 46), (3828, 11).
It is repeatable.
These collections can be used as input (a partial recipe for reproducing the failure):
These failures are enough to prevent step3 from completing consolidateObjectTable, so we will pause w_2024_28 testing.
This may be related to https://rubinobs.atlassian.net/browse/DM-45028, where another assembleCoadd error was addressed.