Investigate ApPipe TypeError in diaPipe
Description
is triggering
relates to
Issue Matrix
hideActivity
Sorry, I just realized that you did not approve dax_apdb PR after merging it (I just saw a green button, my bad). Let me know if you want me to revert it.
Thanks for review and comments. I agree that we need a better fix, and I made a new ticket for it. I don’t think that these changes make the situation worse, they at least should fix the weekly so that you can use it to make new APDB instances in postgres.
I’m vetoing (even though vetoes aren’t allowed by our review process) the ap_verify
changes; running against shared databases is exactly what ap_verify
shouldn’t be able to do, because it’s supposed to be a sandboxed system.
The dax_apdb
changes seem to me to be making the code more broken, not less – we’d still have to fix the actual timestamp handling in ap_association
, and then undo the compensatory changes in dax_apdb
at the same time. All that would change is that it would be harder to tell that the code is wrong.
I’ve posted standard comments on the dax_apdb
PR, but it seems to me that changing the four relevant lines in ap_association
would have been much simpler, and would actually solve the problem (no change is needed when using time_processed
, because everything’s in terms of astropy.Time
anyway).
@Krzysztof Findeisen, thanks for agreeing to look at my fix. I have checked that ap_verify now works (with Postgres) after this fix, and it did fail before. And I have added a `--namespace` option to ap_verify to be able to run it against shared postgres instance.
I have decided to make the simplest fix that restores previous behavior - Apdb again returns naive timestamps to clients. A better alternative is to make all clients to use aware timestamps consistently, but this is a larger project for the future.
I do not think this is issue is related to schema versioning. In my view, the interface that we currently define for Apdb leaves a lot of freedom w.r.t. types or even shape of the DataFrames returned from their methods. As an example, sqlite backend currently returns time_processed
as strings, and it works OK so far in a sense that no one actually tried to use it as datetime
instances (and pandas seems to be OK converting strings to datetime64
). If we want to make our interface more type-safe, we can require that timestamps should be returned as datetime64
(naive or aware). But that again does not mean that schema version has to change, versioning is mostly for the things at database level. And we can also replace pandas with a better format, but this again does not imply schema version changes.
One way to solve current issue is to switch back to returning naive timestamps from get
methods (and assume that everyone treats them as UTC). I do not like naive datetimes, but it’s up to you to decide what you prefer. And even if we agree on naive timestamps, I’d still strongly prefer to save them with correct timezone in database.
While running HSC on
w_2024_31
for https://rubinobs.atlassian.net/browse/DM-45634, I hit this error on several thousand quanta:Sample Error File:
/sdf/home/e/elhoward/u/repo-main-logs/DM-45634/submit/u/elhoward/DM-45634/w_2024_31/HSC/20240807T022734Z/jobs/diffim/19658/diffim_19658_68.234179.err
There were also several
ip_diffim_DipoleFit
warnings that preceded this.I will attempt to find a consistent repro pipetask run, but the last few dataIds I've tested have succeeded on an individual level.