Skip to content

Conversation

@rowlesmr
Copy link
Collaborator

@rowlesmr rowlesmr commented Jun 24, 2025

From discussions on what a detector is. See:
#204 (comment)
#204 (comment)
#190

I though I'd pull it out into its own branch so we can have a look at what it would look like.

This is my own summary of what is going on.

The scope of this is "what do we need to define an 'instrument'?", with a specific focus on "What is a detector (in a pdCIF-sense)?"

To my mind, an instrument comprises:

  • source: singular, potentially multi-wavelength, but only one source.
  • incident optics: singular
  • specimen geometry: singular, size, orientation...
  • diffracted beam optics: at least one, potentially many
  • detector: at least one, potentially many

.

This is my own summary of what is going on.

It was kind of precipitated by my asking in #190 if we needed to redefine/extend DIFFRN_DETECTOR.

Initially, @jamesrhester was of the opinion:

I think we need:

  • _pd_instr_detector.instr_id - which instrument the detector is attached to, not a key data name
  • _pd_instr_detector.id - child of _diffrn_detector.id and also key to this category. Doing it this way is essentially the same as adding all of the _pd_instr_detector data names to diffrn_detector. These detectors may be composed of multiple elements as per imgCIF. (see discussion a couple of paragraphs below)
  • _pd_diffractogram.detector_id instead of _pd_diffractogram.instr_id (see important change below).

where the important change is

In the multi-analyser example, each scintillator will produce its own diffractogram. From this we deduce that each diffractogram is associated with a detector, not with an instrument. So we shouldn't have _pd_diffractogram.instr_id, instead we should have a _pd_diffractogram.detector_id. Otherwise we can't present raw data, only data after consolidation. This also mirrors the way in which imgCIF works, with the raw data associated with a detector element, not with a source or instrument.

but recall that _pd_meas.detector_id exists, and is a linked dataname to _pd_instr_detector.id; we can loop a (1D) diffractogram by detector as well.

From, #204 (comment), we want to preserve backwards compatibility, so we restrict PD_INSTR_DETECTOR, (or maybe only _pid.id?) to be a 1D detector, especially as the data names in PD_DATA can only represent a 1D dataset.

Apparantly all this taken as a whole means _pd_instr_detector.id != _diffrn_detector.id, and so _pd_instr_detector.id CANNOT BE a child of _diffrn_detector.id, contrary to James' initial opinion.

Questions from @jamesrhester :

  • We still need to track how data from a 2D detector (potentially defined in imgCIF) is turned into a _pd_proc.*, 1d diffractogram; perhaps we have a _diffrn_detector.diffractogram_id pointing to the diffractogram that has been derived from this detector?
  • Is it possible for more than one diffractogram to be derived from a single detector?

.

On the explicit restriction of PD_INSTR_DETECTOR to be 1D, @briantoby wrote in the PD chapter in Vol G:

The CIF syntax, as opposed to the STAR File syntax, is not well suited to han-
dling large multi-dimensional data sets. For some two-dimensional image formats,
this deficiency was addressed by the development of imgCIF (see Chapters 2.3 and
3.12). It is also true that the CIF syntax is not well suited to storing unprocessed
powder-diffraction measurements from the many instruments that use area detectors
(particularly for the case of the three-dimensional data structures needed for modern
TOF instruments). Even in these cases, however, diffraction intensities are commonly
reduced to simpler representations, such as might be input to a Rietveld refinement
program. The pdCIF definitions are intended for use with these reduced data sets.

It is nearly explicit here that (pdCIF) diffractograms are only one-dimensional, which implies 1D (pdCIF) detectors.

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 24, 2025

Low hanging fruit - need to add:

  • _pd_instr.radiation_id - which source the instrument is attached to, not a key data name
  • _pd_instr_detector.instr_id - which instrument the detector is attached to, not a key data name

Questions to think about:

  1. What are the implications of defining a diffractogram as to the detector whence it came? that is, _pd_diffractogram.detector_id instead of _pd_diffractogram.instr_id
  2. Are there any practical changes by defining PD_DIFFRACTOGRAM, PD_DATA, and PD_INSTR_DETECTOR to be 1D?
  3. Does the 1D definition and defining a diffractogram in terms of the detector it came from interact, in light of the existence of _pd_meas.detector_id? Note that there is currently no _pd_proc.detector_id.
  4. How do we link a set of _pd_proc.* data names to the original raw data?
  5. How do we link a _pd_instr_detector.id to the physical, potentially 1, 2, or 3D detector? should we?

@rowlesmr rowlesmr changed the title PD_INSTR & PD_INSTR_DETECTOR: what _is_ a detector in pdCIF? What makes an instrument? What _is_ a detector in pdCIF? Jun 24, 2025
jamesrhester
jamesrhester previously approved these changes Jun 25, 2025
@jamesrhester
Copy link
Contributor

Questions to think about:

  1. What are the implications of defining a diffractogram as to the detector whence it came? that is, _pd_diffractogram.detector_id instead of _pd_diffractogram.instr_id

As there is a direct link from pd_instr_detector to pd_instr, no information can be lost, so no consequences.

  1. Are there any practical changes by defining PD_DIFFRACTOGRAM, PD_DATA, and PD_INSTR_DETECTOR to be 1D?

As there appears to be no consideration of 2D data in the powder dictionary, I think we are just making explicit what was previously implicit (as @rowlesmr notes above).

  1. Does the 1D definition and defining a diffractogram in terms of the detector it came from interact, in light of the existence of _pd_meas.detector_id? Note that there is currently no _pd_proc.detector_id.

Very important question. Is a diffractogram the raw or processed data? Because if the former, there could be many detectors, but if the latter, then we are currently saying there is only one. There is a lot of untangling to do, because I've just realised that PD_DATA/PD_PROC/MEAS/CALC are all supposed to be aspects of one single category with identical key data names. I need to think about it.

  1. How do we link a set of _pd_proc.* data names to the original raw data?

If pd_meas is missing, there is no raw 1D data so it is not possible to link to the raw data. If 2D data, then I've suggested a _diffrn_detector.diffractogram_id.

  1. How do we link a _pd_instr_detector.id to the physical, potentially 1, 2, or 3D detector? should we?

I think this category should be limited to 1D detectors in the first instance. Although we could think about adding a link from diffrn_detector to pd_instr_detector if the data names make sense for an area detector (e.g. distances might need to be defined as along the normal to the detector and so on).

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 25, 2025

  1. What are the implications of defining a diffractogram as to the detector whence it came? that is, _pd_diffractogram.detector_id instead of _pd_diffractogram.instr_id

As there is a direct link from pd_instr_detector to pd_instr, no information can be lost, so no consequences.

I can see how this works where each detector can be said to contribute a diffractogram, such as in a multi-analyser setup, or combination of a mythen and area-detector. But what about the use of detector_id as a proxy of channel in an energy-dispersive detector?

_pd_instr_detector.id is defined as

A code which identifies the detector or channel number in a
position-sensitive, energy-dispersive or other multiple-detector
instrument for which the individual instrument geometry is being
defined. Where a single detector is used, this may be omitted.

and _pd_meas.detector_id acts in that stead to help define a diffractogram. Can we make up a new _pd_meas.channel (or element, or other fancy word) to talk about the channel within a physical detector (ie as an alias for _pd_meas.detector_id)? and redefine _pd_instr_detector.id to only talk about detectors as a "whole". How much legacy are we dealing with? @briantoby @vaitkus - do you know if it has been used a lot?

@rowlesmr
Copy link
Collaborator Author

2. Are there any practical changes by defining PD_DIFFRACTOGRAM, PD_DATA, and PD_INSTR_DETECTOR to be 1D?

As there appears to be no consideration of 2D data in the powder dictionary, I think we are just making explicit what was previously implicit (as @rowlesmr notes above).

My thoughts as well.

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 25, 2025

4. How do we link a set of _pd_proc.* data names to the original raw data?

If pd_meas is missing, there is no raw 1D data so it is not possible to link to the raw data. If 2D data, then I've suggested a _diffrn_detector.diffractogram_id

How does that track if you want to combine several detectors worth of diffractograms into a single one? A la a multi analyser? Or even what I did with my program convas, doing running averages of in situ data (which I don't think is possible to record outside of a text description and just providing proc data)

This may also bleed into your response to point 3.

A few potentially disconnected thoughts

I think of diffractograms in terms of ordered xy files I feed into an analysis/visualisation program; two files, two diffractograms, x is single-valued and always increasing (or decreasing) . They can be processed to combine them for further analysis, producing less diffractograms.

Also on tof. In the couple of examples I've seen, each bank of detectors produces a "diffractogram", and many of these are simultaneously analysed to cover a wide d-space range. Although I don't know the processing going on in the background to get that 1d dataset.

@rowlesmr
Copy link
Collaborator Author

5. How do we link a _pd_instr_detector.id to the physical, potentially 1, 2, or 3D detector? should we?

I think this category should be limited to 1D detectors in the first instance. Although we could think about adding a link from diffrn_detector to pd_instr_detector if the data names make sense for an area detector (e.g. distances might need to be defined as along the normal to the detector and so on).

My initial thoughts are I think that _pd_instr_detector.id should be linked to diffrn_detector, regardless of the detector dimension. My thought is that you should be able to trace the 1D processed to the original 2d data and detector (although that may already be possible through other data names). Yes, distance and such data items would need to be updated to deal with size issues like that.

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 25, 2025

  1. Does the 1D definition and defining a diffractogram in terms of the detector it came from interact, in light of the existence of _pd_meas.detector_id? Note that there is currently no _pd_proc.detector_id.

Very important question. Is a diffractogram the raw or processed data? Because if the former, there could be many detectors, but if the latter, then we are currently saying there is only one. There is a lot of untangling to do, because I've just realised that PD_DATA/PD_PROC/MEAS/CALC are all supposed to be aspects of one single category with identical key data names. I need to think about it.

.

Also, the definition of raw here is also up for discussion. Lab XRDs often have strip detectors, so the "raw" data you get off them is already processed to combine the few hundred detectors that exist in "the" detector. And large facilities aren't simpler.

@rowlesmr
Copy link
Collaborator Author

Holy edge cases, Batman!

https://pubmed.ncbi.nlm.nih.gov/34429720/

A single physical 2D detector, 9 analyser crystals, 9 2D-regions-of-interest on the physical detector as "virtual detectors". All for one (rather good) one-dimensional diffractogram.

@rowlesmr
Copy link
Collaborator Author

_pd_instr_detector.id should be linked to diffrn_detector

but if we do that in an imgCIF-sense, we get linked back to _diffrn.id? Why does changing the specimen temperature affect the detector? or are you trying to describe that data coming from it?

@jamesrhester
Copy link
Contributor

OK. I've gone back to basics to try and figure this out.

Axiom:

  • It must be possible for PD_MEAS, PD_CALC, and PD_PROC to be looped together or separately. This is a fundamental design requirement of the original dictionary (This is the same case as the U_aniso being allowed to loop together or separately from the atomic positions in the core dictionary.)

The deductions that follow:

  1. PD_MEAS/CALC/PROC are all child categories of PD_DATA (this official "child" relationship of Loop categories is what allows separate/together looping). This means those categories must share key data names, including _pd_diffractogram.id. The current dictionary fulfills this requirement.
  2. Therefore pd_diffractogram cannot be used exclusively to label raw/processed/calculated data. It is instead equivalent to pd_data_overall, providing information about the data as a whole.
  3. As multiple detectors can contribute to PD_MEAS, pd_diffractogram as overall information cannot always be associated with a single detector.
  4. So _pd_diffractogram.detector_id is ill-defined and should be removed.

Linking data to detectors and instruments

Propose retaining/adding:

  1. _pd_instr_detector.instr_id linking to pd_instr
  2. _pd_diffractogram.instr_id linking to pd_instr
  3. _pd_instr_detector.diffrn_detector_id linking 2D detector elements to post-sample optics.

How this plays out for linking instrument to data:

1D data

  1. If raw data are presented in PD_MEAS: pd_meas.detector_id -> pd_instr_detector -> pd_instr
  2. If processed data only are presented in PD_PROC: pd_proc -> pd_diffractogram -> pd_instr. If both (1) and (2) are present, then these values must agree. Likewise if only PD_MEAS is present both paths must agree (only one of _pd_meas.detector_id or _pd_diffractogram.instr_id is required).

2D data

  1. Raw 2D data are described using the usual imgCIF data names.
  2. The resulting diffractogram is given in PD_PROC.
  3. The instrument is identified as per 1D data, point (2) above.
  4. The detectors are identified as for 1D data with the extra pointer to the 2D detector

Notes

  1. The ID22 "area detector behind analyser crystals" is handled by having multiple pd_instr_detectors all pointing to the same diffrn_detector_element of a monolithic detector. There is not quite enough information to reduce the data as the relative positioning of the analyser crystals needs to be expressed and there are no data names for this. We don't have to figure that out now.
  2. If strip detectors like the Mythen produce 2D raw data, that's one route; if they produce 1D raw data (sum pixels axially internally), that's another. I think the point is that PD_PROC are the data that ready for fitting or calibratoin, PD_MEAS are the data as far back as you can go towards raw data.
  3. I've reversed the link direction for the diffrn_detector_element - pd_instr_detector link so that we aren't creating a new 2D element for every separate post-sample optical setup.

Have I missed anything?

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 26, 2025

  • We can link multiple _pd_instr_detector.id to a single _diffrn_detector.id, so we can continue to use _pid.id as a channel proxy*.
  • We can have multplie 1D detectors contributing to a single PD_MEAS diffractogram.
  • We can have a 2D detector giving us PD_PROC diffractogram directly. The details on how this data reduction is done is delegated to other things to record.

Questions:

  • Can we have multiple 2D detectors giving us a single PD_PROC diffractogram? - This has to be a yes. Refer to ID22. What does this look like in a CIF file? Does a link between the source 2D data and the _pd_proc.diffractogram_id need to made directly, or is the _diffrn_detector.id sufficient (diffractogram.id -> instr.id -> instr_detector.id -> dffrn_detector.id)?
  • Can we combine 1D and 2D detectors giving us a single PD_PROC diffractogram? - This has to be a yes. 1D data can be stored in PD_MEAS, 2D data has to be in imgCIF. _pd_proc.* data names just appear, and the proc process is described using other data names, and everything is both hunky and dory. Need to ensure _pd_data.point_id values are appropriately assigned.

To do:

  • add _pd_instr_detector.diffrn_detector_id, to allow pdCIF detectors to be linked to physical detectors.
  • redefine DIFFRN_DETECTOR to have a key data name, _dd.id, matching the type in imgCIF. Is this a multiblock thing or a pdCIF thing? I think the latter.
  • do the * thing, below
  • I propose we make _pd_meas.channel (or detector_element or some other word) to explicitly deal with channel, and leave detector_id to handle physical detectors. We redefine _pdi.id so that it should be the ID of a physical detector, while noting (and still allowing, but frowning upon) its use as a channel proxy. Make this an Assigned Real Number?

@jamesrhester
Copy link
Contributor

Can we have multiple 2D detectors giving us a single PD_PROC diffractogram? - This has to be a yes. Refer to ID22. What does this look like in a CIF file? Does a link between the source 2D data and the _pd_proc.diffractogram_id need to made directly, or is the _diffrn_detector.id sufficient (diffractogram.id -> instr.id -> instr_detector.id -> dffrn_detector.id)?

The diffrn_detector.id is sufficient, as you point out.

Can we combine 1D and 2D detectors giving us a single PD_PROC diffractogram? - This has to be a yes. 1D data can be stored in PD_MEAS, 2D data has to be in imgCIF. _pd_proc.* data names just appear, and the proc process is described using other data names, and everything is both hunky and dory. Need to ensure _pd_data.point_id values are appropriately assigned.

Yes, that looks right.

redefine DIFFRN_DETECTOR to have a key data name, _dd.id, matching the type in imgCIF. Is this a multiblock thing or a pdCIF thing? I think the latter.

We are duplicating what is already in imgCIF, so we should probably put it in multiblock as it is more universal than just powder.

I propose we make _pd_meas.channel (or detector_element or some other word) to explicitly deal with channel, and leave detector_id to handle physical detectors. We redefine _pdi.id so that it should be the ID of a physical detector, while noting (and still allowing, but frowning upon) its use as a channel proxy. Make this an Assigned Real Number?

That sounds reasonable.

@rowlesmr
Copy link
Collaborator Author

Need to accept COMCIFS/MultiBlock_Dictionary#31 before this one

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 26, 2025

Should _pd_meas.channel be a Real or Integer?

It is designed to refer to a channel or subsection of a detector to which we can apportion an intensity. That would imply integer, but I would like it to be as general as possible.

@rowlesmr rowlesmr requested a review from jamesrhester June 26, 2025 11:20
@rowlesmr rowlesmr dismissed jamesrhester’s stale review June 26, 2025 11:20

Other commits pushed after review

@rowlesmr rowlesmr force-pushed the PD_INSTR-and-PD_INSTR_DETECTOR-descriptions branch from 3a2298c to 3d7cc56 Compare June 26, 2025 13:16
@jamesrhester
Copy link
Contributor

Should _pd_meas.channel be a Real or Integer?

It is designed to refer to a channel or subsection of a detector to which we can apportion an intensity. That would imply integer, but I would like it to be as general as possible.

As a rule it is best for an identifier to never contain any other information (because eventually the information content rules it out as an identifier - hkl was used as an identifier, then we got incommensurate structures and multiple measurements. Oops!). So it should be an integer or even opaque code. A Real would also imply that the printed value was an approximation.

@rowlesmr
Copy link
Collaborator Author

I don't think it should an opaque code, as there is an ordering with channels, this one comes before that one. I'll go with Integer.

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jul 6, 2025

I think this is ready to go now, given that COMCIFS/MultiBlock_Dictionary#31 is merged.

What's in here?

  • Updated descriptions in PD_DATA, PD_CALC, PD_MEAS, and PD_PROC specifying that they are 1D datasets, and that higher dimensional datasets must be reduced before putting them into one of these.
  • Added _pd_meas.channel to act as a data name to hold channel number-like information, rather than making _pd_meas.detector_id and _pd_instr_detector.id act both as channel and detector identifiers.
  • Clarified that _pd_meas.detector_id and _pd_instr_detector.id should both be used to refer to detectors, rather than individual channels, although it doesn't outlaw this in order to maintain compatibility with previous behaviour.
  • Gave some more info on radiation sources in PD_INSTR description.
  • Updated distance descriptions in PD_INSTR and PD_INSTR_DETECTOR data names.
  • Added _pd_instr.radiation_id, linked to _diffrn_radiation.id, so the instrument knows which source is plugged in to it.
  • Added _pd_instr_detector.diffrn_detector_id linked to _diffrn_detector.id so we know information about the detector represented by _pid.id.
  • Added _pd_instr_detector.diffrn_id linked to _diffrn.id. This is there only to maintain compatibility with imgCIF and DIFFRN_DETECTOR. It shouldn't be used; if it were, there would be a _pd_instr_detector.id per _diffrn.id! If you want a _diffrn.id value, look up _pd_diffractogram.diffrn_id. Words to this effect are in the dataname description.
  • Added _pd_instr_detector.instr_id to link the detectors to the instrument (_pd_instr.id) they're attached to.

The PD_DIFFRACTOGRAM links to imgCIF are given in #234

Copy link
Contributor

@jamesrhester jamesrhester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. As noted in my review comments, there is some finessing going on regarding a change in the way of expressing raw TOF measurements, but it was always a stretch as Vol G 3.3.8.5 notes. Going forward we'll need to use imgCIF pointing to external images to handle the dataset size.

@jamesrhester
Copy link
Contributor

I would have made some edits myself but not sure how to make them appear as "suggested edits" rather than new commits to the branch.

From @jamesrhester 's suggestions
@rowlesmr rowlesmr merged commit 3343e65 into COMCIFS:master Jul 8, 2025
5 of 6 checks passed
@rowlesmr rowlesmr deleted the PD_INSTR-and-PD_INSTR_DETECTOR-descriptions branch July 11, 2025 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants