clustering detector hits by time as well #559
base: main
Conversation
Thanks for the initiative, I will look into this in more detail once I find a quiet moment.

1000 rows. These are the exact commands that I ran:

I see, thanks!
Would it be very cumbersome for you to split these two things into two commits, to see what exactly the changes to the clustering algorithm are? And whether the long run-time of 2 minutes results from the updates to the new …
And it also looks like there are no tests right now for this. You could add some lines to the tests:

```julia
# Cluster events by radius
@test_nowarn clustered_evts = SolidStateDetectors.cluster_detector_hits(evts, 10u"µm")
@test length(clustered_evts) == length(evts)
@test length(flatview(clustered_evts.pos)) <= length(flatview(evts.pos))

# Cluster events by time
@test_nowarn clustered_evts = SolidStateDetectors.cluster_detector_hits(evts, 10u"µm", 100u"s")
@test length(clustered_evts) == length(evts)
@test length(flatview(clustered_evts.pos)) <= length(flatview(evts.pos))
```

This will ensure that these methods get covered by the tests and will result in errors whenever something is changed that might break this implementation.
Force-pushed 4300699 to d370502.
I have split the changes into two commits as you asked, adding the tests as well. Benchmarks after the first commit (without time clustering): … Benchmarks after the second commit (with time clustering): … The runtime varies, but is usually between 5 and 10 minutes for the trials that I did today (1000 events).

A few important questions:
a) Add the old function for position-only clustering along with the new functions (it also has better performance), or
b) keep these parameters as kwargs, with some default value if the user doesn't specify the clustering radius. Example: if the user only wants position clustering, then `radius_time` is kept at infinity (this will have performance issues).
fhagemann
left a comment
Here are some comments.
I'm wondering here: would it maybe be more efficient to
- split all events by time (separate a row into multiple rows if the hits are more than `eps_time` apart). I expect most of the Geant4 hits to have identical/very similar `thit` entries if they resulted from the same nuclear decay.
- then, perform the spatial clustering like we used to before (or the more efficient implementation)?
Based on your experience in implementing this, do you think that this would be faster?
Or do you have some feeling for what is limiting the runtime of this clustering?
See my previous message:
In the end, there can be other methods:

```julia
# main method
function cluster_detector_hits(
        table::TypedTables.Table;
        cluster_radius::Union{<:Real, Unitful.Length} = NaN, # keyword argument
        cluster_time::Union{<:Real, Unitful.Time} = NaN,     # NaN = skip the clustering/splitting
    )
    @assert :pos in TypedTables.columnnames(table) "Table has no column `pos`"
    @assert :thit in TypedTables.columnnames(table) "Table has no column `thit`"
    @assert :edep in TypedTables.columnnames(table) "Table has no column `edep`"
    @assert :detno in TypedTables.columnnames(table) "Table has no column `detno`"
    clustered_nt = map(
        evt -> cluster_detector_hits(evt.detno, evt.edep, evt.pos, evt.thit, cluster_radius, cluster_time),
        table
    )
end

# convenience functions (allowing for the old method syntax)
cluster_detector_hits(table::TypedTables.Table, cluster_radius::Unitful.Length) = cluster_detector_hits(table; cluster_radius)
cluster_detector_hits(table::TypedTables.Table, cluster_time::Unitful.Time) = cluster_detector_hits(table; cluster_time)
cluster_detector_hits(table::TypedTables.Table, cluster_radius::Unitful.Length, cluster_time::Unitful.Time) = cluster_detector_hits(table; cluster_radius, cluster_time)
cluster_detector_hits(table::TypedTables.Table, cluster_time::Unitful.Time, cluster_radius::Unitful.Length) = cluster_detector_hits(table; cluster_radius, cluster_time)
```

But we can worry about the convenience functions in the very last step.
That would lead to incorrect clustering. Consider this toy example with `eps_time` = 10 s and hit 1 at 1 s and 1 m: clustering by time first, all hits end up in the same cluster (all consecutive gaps are below `eps_time`). (Earlier, hit 2 formed the common link between hit 1 and hit 3 for `eps_time`. But as hit 2 finally is not in the same cluster, hit 1 and hit 3 should not be kept in the same cluster.)
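To illustrate the chaining concern with hypothetical numbers (not the elided values from the toy example above): with `eps_time` = 10 s, hits at 1 s, 8 s and 15 s are chained into one time cluster, because hit 2 links hits 1 and 3, even though hits 1 and 3 are 14 s apart.

```julia
# Hypothetical numbers illustrating transitive (single-linkage) chaining:
# both consecutive gaps (7 s) are below eps_time = 10 s, so all three hits
# end up in one group, although the extremes are 14 s apart.
thit = [1.0, 8.0, 15.0]   # hit times in seconds
eps_time = 10.0
gaps = diff(sort(thit))   # [7.0, 7.0]
all(gaps .<= eps_time)    # true -> hits 1, 2, 3 chain into a single cluster
```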
I see what you mean. So, even if the hits were … and … Let's assume this toy model: … If I choose … Just to think out loud (this doesn't mean that it is the best way to go): the time grouping could either start with the first hit and combine everything that's within `eps_time`, or … The time clustering should be there to make sure that we only drift hits together if they actually coincide "enough" in time, whereas the space clustering makes sure that we do not simulate thousands of small charge clouds (which might be the output of Geant4), especially if they are all very close together.
As of …

```julia
using ArraysOfArrays

elem_ptr = deepcopy(evts.thit.elem_ptr)
kernel_size = deepcopy(evts.thit.kernel_size)
added = 0
for i in eachindex(evts.thit)
    thit = evts.thit[i]
    d = findall(diff(thit) .> zero(eltype(thit)))
    isempty(d) && continue
    splice!(elem_ptr, i + added, elem_ptr[i + added] .+ [0; d])
    append!(kernel_size, (() for _ in d))
    added += length(d)
end
evts_split = Table(merge(NamedTuple(
    c => begin
        if c == :evtno
            collect(eachindex(kernel_size))
        else
            f = getproperty(evts, c)
            VectorOfArrays(f.data, elem_ptr, kernel_size)
        end
    end
    for c in columnnames(evts))))
```
Thanks for clarifying the use-case of this algorithm. As we just want to bring together the hits that are "close enough" in space and time, we might not need the dbscan algorithm at all; this could be achieved with much simpler algorithms. For the 1:1000 hits example that you gave, dbscan would give the same cluster however we implement it: as long as the hits are dense in time (or space), it will keep on adding them, which would lead to huge clusters, and that is not what we need. Using the following algorithm (logic) would be better for our purpose: … The implementation for this would be straightforward as well, which means the runtime will be low.
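One way such a simpler time grouping could look, following the split-by-`eps_time` idea discussed above (a minimal sketch with a hypothetical helper name, not the PR's implementation): sort the hits by time and start a new group whenever the gap to the previous hit exceeds `eps_time`.

```julia
# Hypothetical sketch: group hits by time, starting a new group whenever
# two consecutive (sorted) hit times are more than eps_time apart.
function group_by_time_gaps(thit::AbstractVector{<:Real}, eps_time::Real)
    p = sortperm(thit)            # indices of hits in time order
    groups = [[p[1]]]
    for k in 2:length(p)
        if thit[p[k]] - thit[p[k-1]] > eps_time
            push!(groups, [p[k]]) # gap too large: start a new group
        else
            push!(last(groups), p[k])
        end
    end
    return groups                 # vector of index groups into the hit list
end

# Example: gaps of 2 s, 90 s and 5 s with eps_time = 10 s give two groups
group_by_time_gaps([1.0, 3.0, 93.0, 98.0], 10.0)  # [[1, 2], [3, 4]]
```

This runs in O(n log n) for the sort plus a single linear pass, which matches the "straightforward, low runtime" expectation stated above.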
I agree for the time grouping/splitting. However, I would still like to keep the dbscan for space clustering.

We may actually want to move away from DBSCAN as the default clustering algorithm. DBSCAN can group connected elongated/curved/... agglomerations of charges. We want something that prefers spherical clusters.

I've had a little chat with ChatGPT about this, exploring options and possible implementations: https://chatgpt.com/share/69108494-2248-800e-9d83-3b1b29157252
Force-pushed d370502 to 0217418.
I have used this approach to cluster by time first and then by radius. The `pos` column in the output table is now of `CartesianPoint` type. No new dependencies compared to the previously existing code.
fhagemann
left a comment
Thanks! I had a quick look, but didn't run any code.
Maybe add to the tests how you would expect this function to behave (e.g. what `eltype` the column `pos` has after running this function, etc.).
Force-pushed 0217418 to 6f956a9.
Force-pushed 6f956a9 to a3d71f4.
Ok, sorry for making this chaotic, but in PR #558 I added code that would make sure that the OLD implementation of … Could you build your changes on top of this, and see if the tests require changes?
One thing that I saw when running this: we might wanna split events that are more than …
Clarification on this: |
I would say we might want to keep the old numbers. Also, the original table might have `evtno` being a Vector of event numbers, so we might want to separate those arrays the same way we would separate the times, energies and positions.
Force-pushed c7d4da5 to 489ffb0.
After merging the latest changes from main, the geant4 table is giving unitless position values: … This is causing the …
Force-pushed 8dda4c5 to e329c88.
Yes, this is why it's preferred to use …
```julia
@assert is_detector_hits_table(table) "Table does not have the correct format"
```

```julia
# Collect all time-clustered results
all_results = []
```
We should avoid declaring Arrays with an unknown element type, as this will default to `Any` and not be type-stable.
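A small generic illustration of that point (not the PR's code): `[]` creates a `Vector{Any}`, while specifying a concrete element type keeps the container, and any code consuming it, type-stable.

```julia
# `[]` with no element type defaults to Vector{Any} (not type-stable)
untyped = []
push!(untyped, 1.0)

# declaring the element type keeps the container concrete
typed = Float64[]
push!(typed, 1.0)

eltype(untyped)  # Any
eltype(typed)    # Float64
```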
Force-pushed 3e5cb97 to 723af9d.
fhagemann
left a comment
We're almost there! :)
```julia
@assert !isempty(table) "Input table is empty"
firstrow = first(table)
```
We might not even need this, as the check `is_detector_hits_table` will verify that certain columns exist and that they have the correct type. Can you work with `table` instead of `firstrow` (i.e. inferring the eltypes from `table` directly)?
```julia
result_columns = (
    evtno = evtno_col,
    detno = VectorOfVectors([r.detno for r in all_results]),
    thit = VectorOfVectors([r.thit for r in all_results]),
    edep = VectorOfVectors([r.edep for r in all_results]),
    pos = VectorOfVectors([r.pos for r in all_results])
)
```
How do we deal with events that initially had more columns? Can we do the same thing we did for `evtno` and just clone the entries if a row was split into multiple rows by the clustering?
Maybe then, the result type can just be a similar type as the original `Table`(?)
You can check how we used to do it for the original `Table` (merging the "old" `Table` with the new one).
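A sketch of that cloning idea (hypothetical helper name, assuming a `counts` vector saying into how many rows each original row was split): each extra column's entry is repeated once per resulting row.

```julia
# Hypothetical sketch: replicate the entries of an extra column when
# original row i was split into counts[i] rows by the clustering.
clone_column(col::AbstractVector, counts::AbstractVector{<:Integer}) =
    reduce(vcat, [fill(col[i], counts[i]) for i in eachindex(col)])

clone_column(["a", "b"], [2, 1])  # ["a", "a", "b"]
```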
Force-pushed 9c37f3d to e7cf6d1.
```julia
evtno_col = Int[]
```

```julia
for (evtno, evt) in enumerate(table)
    @assert is_detector_hits_table(table)
```
I would like to keep the assertion message that we had before:
"Table does not have the correct format" (or also happy to discuss another message that's more descriptive)
```julia
detno = VectorOfVectors([r.detno for r in all_results]),
thit = VectorOfVectors([r.thit for r in all_results]),
edep = VectorOfVectors([r.edep for r in all_results]),
pos = VectorOfVectors([r.pos for r in all_results])
```
Do we need to convert the new columns into `VectorOfVectors`, or is that automatically done by the `similar` call when defining `col_vectors`? Do we test somewhere in the tests that the new column types are `VectorOfVectors`?
Force-pushed e7cf6d1 to 0a4b746.
I have added proper splitting of the extra columns that might be in the table. Thank you for your patience with me!
The more I think about it -- we might wanna do it a bit differently. This method should still receive the additional (keyword) argument (see SolidStateDetectors.jl/src/ChargeClustering/ChargeClustering.jl, lines 55 to 68 in 3daf777).
But instead of passing it down to the other `cluster_detector_hits`, we should do the time-splitting already in this method BEFORE passing the table columns further down to the other `cluster_detector_hits`.
This would allow us to let the other `cluster_detector_hits` (see SolidStateDetectors.jl/src/ChargeClustering/ChargeClustering.jl, lines 3 to 52 in 3daf777) …
PS: While browsing through old code and cleaning some up, I also found an old leftover of …
for issue #523
Clustering was supposed to be based on position and time simultaneously.
To achieve this, I implemented a Spatio-Temporal DBSCAN (ST-DBSCAN) algorithm, which includes a point in a cluster only when both the spatial and the temporal constraints are satisfied.
Traditional DBSCAN uses a Euclidean metric, which mixes the position and time dimensions. This could incorrectly group points that are spatially close but temporally far apart (for example, position inside but time outside the threshold, with the combined Euclidean distance still inside the threshold). The new approach avoids that issue by handling both constraints explicitly.
That is why I have used this new algorithm. No prior implementation of this existed in Julia libraries, so I have coded it from scratch.
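The dual-constraint neighborhood test can be sketched like this (a generic illustration with made-up names, not the PR's implementation): a hit only counts as a neighbor if both the spatial distance and the time difference are within their respective thresholds, instead of one mixed Euclidean distance.

```julia
# Generic ST-DBSCAN-style neighbor test (illustrative only):
# point j is a neighbor of i only if BOTH constraints hold.
st_neighbors(pos, t, i, eps_space, eps_time) =
    [j for j in eachindex(t) if j != i &&
        sqrt(sum(abs2, pos[j] .- pos[i])) <= eps_space &&
        abs(t[j] - t[i]) <= eps_time]

pos = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]
t   = [0.0, 2.0, 50.0]
# hit 3 is spatially close to hit 1 but temporally far away,
# so only hit 2 qualifies as a neighbor of hit 1:
st_neighbors(pos, t, 1, 2.0, 10.0)  # [2]
```

A plain Euclidean metric over (x, y, t) would have to trade the two thresholds off against each other, whereas this predicate enforces each one separately.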
A new dependency, NearestNeighbors.jl, was added to Project.toml for this algorithm.
```julia
tc = SolidStateDetectors.cluster_detector_hits(t, 20u"μm", 100u"s")
```

When this is run, clustering will happen with the time constraint as well. Cluster properties (`pos`, `edep`) are calculated as before, now with the addition of `thit`.
(This was run on the same case that you demonstrated to me in a previous comment.)
The problem is that it takes a lot of computing resources to compute the clusters. I tried using the best approaches (`KDTree` and `inrange` from NearestNeighbors.jl), but the issue arises from the complexity of the problem, as we have to simultaneously cluster based on position and time.
These are the computing resources typically used when clustering with the `cluster_detector_hits` function:

```
133.839229 seconds (35.63 M allocations: 1.751 GiB, 0.57% gc time, 99.71% compilation time)
```

(There might be a better approach or performance gains to be found. I tried my best and came up with this.)
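An aside on that measurement: 99.71% of the quoted time is compilation, so the first call dominates; timing a second call (or using BenchmarkTools.jl) gives a better picture of the steady-state runtime. A generic illustration, not the PR's code:

```julia
# The first call to a fresh Julia function includes JIT compilation;
# the second call shows the steady-state runtime.
f(x) = sum(abs2, x)
x = rand(10^6)
@time f(x)  # dominated by compilation on the first call
@time f(x)  # closer to the true runtime
```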
Open to any modifications that might be required for it to be integrated with the whole project.