Create MatchProbabilistic (Pipeline)Task

Description

Following DM-31781, wrap the ProbabilisticMatcher as a (Pipeline)Task generating DataFrames.

Linked work items

is FS-depended by

DM-33221

Create DiffMatchedTractCatalogTask to aggregate match statistics for metrics

Checklist

Lucidchart Diagrams

Issue Matrix

hide

Activity

Show:

Dan Taranu

December 6, 2021 at 4:40 PM

That already exists with the match_n_finite_min field in MatchProbabilisticConfig. For example, if you're matching on two spatial coordinates and three fluxes, you can set it to 4 to allow one NaN chi value (presumably a flux). For that matter, you can also choose to compute chi without using spatial coordinates at all, although I'm not sure why anyone would want to.

If someone wants more fine-grained control of which columns require measurements, like say requiring g + r fluxes but not u or izy, I think that would require more config fields, but I didn't see a need for that feature at the moment.

morrisoncb

December 4, 2021 at 12:48 AM

Okay. The only comment I have is comparing to data with NaN columns. From experience, you could have it that one of your bands has a NaN for a given value while all the rest of them are fine. If you want to add this functionality (assuming I've read the code correctly.) I'm happy to have it as another ticket.

Dan Taranu

December 3, 2021 at 2:11 AM

Passed Jenkins after yet another pipe_tasks merge.

Thanks for the feedback. I added more inline comments. I don't really see a good way to refactor the method, because most of it is setting up variables that need to stay in scope for the inner loop. See if you agree based on the comments I added.

morrisoncb

December 2, 2021 at 9:36 PM

Looks good mostly. My main suggest is breaking up some of the code in the match method into sub methods to make the calculations more clear. Also adding inline comments would help.

Dan Taranu

December 1, 2021 at 6:32 PM

(edited)

I experimented with using pydantic for configuring the matcher, but this proved to have no real benefits over pex_config, so I settled on the following structure:

meas_astrom:

matcher_probabilistic.py: MatchProbabilisticConfig and MatcherProbabilistic (a plain old class). The most generic matching functions with no stack dependencies, only pex_config, so this could be spun off at some point (I don't know if there's a good way now).
match_probabilistic.py: MatchProbabilisticTask, including all stack dependencies. Also has a single test case in tests/test_match_probabilistic_task.py. Could maybe use more to cover other config options.

pipe_tasks:

match_tract_catalog.py: MatchTractCatalogConfig/Task pipeline task with configurable abstract MatchTractCatalogSubConfig/Task classes, in case someone wants to use/implement another matcher.
match_tract_catalog_probabilistic.py: MatchTractCatalogProbabilisticConfig/Task classes that inherit from both MatchTractCatalogSubConfig/Task and MatchProbabilisticConfig/Task.

I haven't added any tests to pipe_tasks yet; I'm not sure if they're warranted. If I were to do so, I'd probably copypasta from the existing test_matchFakes.py and/or test_parquet.py since that is the intended use.

To verify that this works, I ran:

pipetask --long-log run -b /repo/dc2 --input 2.2i/runs/test-med-1/w_2021_40/DM-32024,2.2i/truth_summary --output "u/dtaranu/DM-32034/w_2021_40_match_0.5asec" -d "instrument = 'LSSTCam-imSim' and tract=3828 and skymap='DC2'" -p /project/dtaranu/dc2/match_catalog_pipe/match_tract_catalog_probabilistic.yaml --register-dataset-types

The associated yaml/config is valid for w40 and probably w44/6 and will eventually go into DRP pipelines, either after objectTable_tract generation, or as part of faro (TBD). In principle this could beusedto match against an observed reference catalog, not just a truth catalog (e.g. an HST-derived COSMOS catalog for HSC UDeep), but I haven't implemented using uncertainties in the reference catalog yet. I imagine I'd just add them in quadrature to each target source's errors.

Once the matcher is being run as part of DRP, I'll update the notebook from https://rubinobs.atlassian.net/browse/DM-31781#icft=DM-31781 to use the generated datasets to make DC2 vs truth plots.

Resize issue view side panel

Done

Pinned fields

Click on the next to a field label to start pinning.

Details

Assignee

Dan Taranu

Reporter

Dan Taranu

Labels

Reviewers

morrisoncb

Story Points

RubinTeam

Data Release Production

Priority

Undefined

Checklist

Created October 4, 2021 at 9:06 PM

Updated January 29, 2022 at 2:05 AM

Resolved December 6, 2021 at 6:24 PM