Create MatchProbabilistic (Pipeline)Task
Description
is FS-depended by
Checklist
Lucidchart Diagrams
Issue Matrix
hideActivity

That already exists with the match_n_finite_min
field in MatchProbabilisticConfig
. For example, if you're matching on two spatial coordinates and three fluxes, you can set it to 4 to allow one NaN chi value (presumably a flux). For that matter, you can also choose to compute chi without using spatial coordinates at all, although I'm not sure why anyone would want to.
If someone wants more fine-grained control of which columns require measurements, like say requiring g + r fluxes but not u or izy, I think that would require more config fields, but I didn't see a need for that feature at the moment.

Okay. The only comment I have is comparing to data with NaN columns. From experience, you could have it that one of your bands has a NaN for a given value while all the rest of them are fine. If you want to add this functionality (assuming I've read the code correctly.) I'm happy to have it as another ticket.

Passed Jenkins after yet another pipe_tasks
merge.
Thanks for the feedback. I added more inline comments. I don't really see a good way to refactor the method, because most of it is setting up variables that need to stay in scope for the inner loop. See if you agree based on the comments I added.

Looks good mostly. My main suggest is breaking up some of the code in the match method into sub methods to make the calculations more clear. Also adding inline comments would help.

I experimented with using pydantic for configuring the matcher, but this proved to have no real benefits over pex_config, so I settled on the following structure:
meas_astrom:
matcher_probabilistic.py: MatchProbabilisticConfig and MatcherProbabilistic (a plain old class). The most generic matching functions with no stack dependencies, only pex_config, so this could be spun off at some point (I don't know if there's a good way now).
match_probabilistic.py: MatchProbabilisticTask, including all stack dependencies. Also has a single test case in tests/test_match_probabilistic_task.py. Could maybe use more to cover other config options.
pipe_tasks:
match_tract_catalog.py: MatchTractCatalogConfig/Task pipeline task with configurable abstract MatchTractCatalogSubConfig/Task classes, in case someone wants to use/implement another matcher.
match_tract_catalog_probabilistic.py: MatchTractCatalogProbabilisticConfig/Task classes that inherit from both MatchTractCatalogSubConfig/Task and MatchProbabilisticConfig/Task.
I haven't added any tests to pipe_tasks yet; I'm not sure if they're warranted. If I were to do so, I'd probably copypasta from the existing test_matchFakes.py and/or test_parquet.py since that is the intended use.
To verify that this works, I ran:
pipetask --long-log run -b /repo/dc2 --input 2.2i/runs/test-med-1/w_2021_40/DM-32024,2.2i/truth_summary --output "u/dtaranu/DM-32034/w_2021_40_match_0.5asec" -d "instrument = 'LSSTCam-imSim' and tract=3828 and skymap='DC2'" -p /project/dtaranu/dc2/match_catalog_pipe/match_tract_catalog_probabilistic.yaml --register-dataset-types
The associated yaml/config is valid for w40 and probably w44/6 and will eventually go into DRP pipelines, either after objectTable_tract generation, or as part of faro (TBD). In principle this could beusedto match against an observed reference catalog, not just a truth catalog (e.g. an HST-derived COSMOS catalog for HSC UDeep), but I haven't implemented using uncertainties in the reference catalog yet. I imagine I'd just add them in quadrature to each target source's errors.
Once the matcher is being run as part of DRP, I'll update the notebook from https://rubinobs.atlassian.net/browse/DM-31781#icft=DM-31781 to use the generated datasets to make DC2 vs truth plots.
Details
Details
Assignee

Reporter

Labels
Reviewers

Following DM-31781, wrap the ProbabilisticMatcher as a (Pipeline)Task generating DataFrames.