Skip to content

Conversation

@BrianMichell
Copy link

Supersedes #264

Removes bool open_as_void = false default in C++ function declarations and definitions.

Uses a derived DataCache implementation similar to strategy used in neuroglancer_precomputed driver for UnshardedDataCache and ShardedDataCache

@laramiel
Copy link
Collaborator

laramiel commented Jan 6, 2026

Ok, I imported this locally and am looking at a few things.

  1. Please make sure that all the tests build and pass. By removing the open_as_void = true, the spec tests fail to build.

@BrianMichell
Copy link
Author

Ok, I imported this locally and am looking at a few things.

  1. Please make sure that all the tests build and pass. By removing the open_as_void = true, the spec tests fail to build.

My bad, I don't know how I missed the compilation issue for the spec test. I've fixed the issue and resolved your comments.

Thanks for taking a look and getting back to this so fast!

@BrianMichell
Copy link
Author

Sorry for the delay responding to the latest round of feedback. I believe everything raised so far has been addressed.

/// \param partial_metadata Constraints in the form of partial zarr metadata.
/// \param selected_field The field to which `schema` applies.
/// \param schema Schema constraints for the `selected_field`.
/// \param open_as_void If true, opens the array as raw bytes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selected_field and open_as_void are mutually exclusive in GetNewMetadata.
At least document that & add an error return/test case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for GetSpecRankAndFieldInfo

fill_value = (*metadata.fill_value)[field_index];
// For void access, synthesize a byte-level fill value
if (open_as_void) {
const Index nbytes = metadata.dtype->bytes_per_outer_element;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to get the fill_value from the metadata. More like:

size_t field_index = 0; // open_as_void has a single field.
if (!open_as_void) {
TENSORSTORE_ASSIGN_OR_RETURN(
field_index,
GetFieldIndex(*metadata.dtype, selected_field));
}
fill_value = (*metadata.fill_value)[field_index];

That might require CreateVoidMetadata to set a proper fill value?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes CreateVoidMetadata should set a proper fill value.

#include "absl/status/status.h"
#include "absl/strings/cord.h"
#include <nlohmann/json_fwd.hpp>
#include "riegeli/bytes/cord_reader.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing BUILD deps for these.

/// \file
/// Support for encoding/decoding zarr "dtype" specifications.
/// See: https://zarr.readthedocs.io/en/stable/spec/v2.html

Copy link
Collaborator

@laramiel laramiel Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is missing some includes (*it already was, but now it's missing more).

#include <optional>
#include <string>
#include <string_view>
#include <vector>

#include "absl/status/status.h"
#include "tensorstore/index.h"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the BUILD deps for the new non-std library includes as well.

friend void to_json(::nlohmann::json& out, // NOLINT
const ZarrDType& dtype);

/// Lazily-computed cache for GetVoidField().
Copy link
Collaborator

@laramiel laramiel Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need this cached? Seems like just returning a Field is reasonable, as the CreateVoidMetadata is already going to copy it into vector, and it's not expensive to recreate when opening a new zarr volume.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See jbms@ below; he's in favor of keeping this cached but guarding it.

// since we're treating the data as raw bytes regardless of the actual dtype.
// Shape is allowed to differ (handled by base class for resizing).
// Other fields like compressor, order, chunks must still match.
if (existing_metadata.dtype.bytes_per_outer_element !=
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just rely on the normal validate but applied to the void metadata.

// contain only the void field, allowing standard encode/decode to work.
// CreateVoidMetadata uses the same chunks and bytes_per_outer_element as
// the original validated metadata, so it should never fail.
initializer.metadata = CreateVoidMetadata(original_metadata).value();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably would be better to have the original metadata cache the pointer to the void metadata and initialize it on first access.

field_index,
GetFieldIndex(metadata.dtype, spec().selected_field));
}
TENSORSTORE_RETURN_IF_ERROR(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to make sure to validate the schema against the void metadata, not the regular metadata. Note: The partial_metadata validation is still against the regular metadata, though.


// Set fill_value for the single void field.
// Empty/null fill value is handled by GetChunkGridSpecification.
metadata->fill_value.resize(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to have a proper fill value here.

}

const ZarrDType::Field* ZarrDType::GetVoidField() const {
if (!void_field_cache_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache isn't thread safe which is a problem. However, instead of caching void_field instead could just cache the entire void metadata (and do it thread safely).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looking more closely I see that it will be inconvenient to not have this cache, since we need a stable pointer to the void field. So you should keep the cache but just make it thread safe, e.g. using absl::call_once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants