Load entire scarlet model dataset locally when not specifying a blend_id parameter

Description

In https://rubinobs.atlassian.net/browse/DM-49537 we implemented storing scarlet models as zip files so that individual blends could be loaded quickly without loading the entire dataset. This has been causing I/O issues when the entire dataset is loaded (i.e., to populate an entire source catalog), as it makes thousands of separate I/O calls. This ticket fixes that by loading the entire dataset into memory when no blend IDs are specified.
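
A minimal sketch of the intended behavior, using only the Python standard library. The function name `load_blends`, the one-JSON-file-per-blend layout, and the `<blend_id>.json` member naming are assumptions made for illustration; they are not taken from the meas_extensions_scarlet code.

```python
import io
import json
import zipfile


def load_blends(path, blend_ids=None):
    """Return {blend_id (str): model dict} from a zip archive.

    Hypothetical layout: one JSON member per blend, named "<blend_id>.json".
    """
    if blend_ids is not None:
        # Targeted read: open the archive in place and pull only the
        # requested members, so the rest of the file is never read.
        with zipfile.ZipFile(path) as archive:
            return {
                str(blend_id): json.loads(archive.read(f"{blend_id}.json"))
                for blend_id in blend_ids
            }

    # Full read: fetch the whole file in a single I/O call, then unpack
    # every member from the in-memory buffer.
    with open(path, "rb") as stream:
        buffer = io.BytesIO(stream.read())
    with zipfile.ZipFile(buffer) as archive:
        return {
            name.removesuffix(".json"): json.loads(archive.read(name))
            for name in archive.namelist()
        }
```

With no `blend_ids`, the storage layer sees one large read instead of one small read per blend, which is the case that matters when populating an entire source catalog.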

Linked work items

relates to

DM-50506

Ensure HTTPResourcePath closes connection when doing partial reads

Activity

Tim Jenness

April 28, 2025 at 9:22 PM

The NotImplemented line is not being hit because the butler downloads to a local file and reads that if it knows the file is going to be cached, which is the default butler configuration. There is a test in obs_base that explicitly prevents caching, but it's probably not worth your while here. I think IN2P3 hit the problem because in batch we are more targeted in specifying which files should be cached.

Tim Jenness

April 28, 2025 at 9:09 PM

(sorry, pressed the wrong button first time)

Thanks for unifying the read code for stream and path.

Fred Moolekamp

April 24, 2025 at 7:41 PM

(edited)

Restarted Jenkins due to broken main: https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/4918/pipeline/

Done

Details

Assignee

Fred Moolekamp

Reporter

Fred Moolekamp

Reviewers

Tim Jenness

Story Points

RubinTeam

Data Release Production

Components

meas_extensions_scarlet

Checklist

Created April 24, 2025 at 6:36 PM

Updated April 30, 2025 at 2:14 AM

Resolved April 30, 2025 at 2:13 AM