
@WoutPaepenUcLL

Pull Request: Add SecondSpectrum Event Data Support

Description

This PR enhances kloppy to support loading and processing event data from SecondSpectrum. It adds new event types, improves the metadata handling in the deserializer, and includes comprehensive tests for event deserialization.
These changes are based on documentation provided by SecondSpectrum.

Changes

  • Added support for loading event data from SecondSpectrum format
  • Fixed metadata handling in SecondSpectrumDeserializer
  • Added DeflectionEvent and DeflectionResult classes
  • Updated the deserializer to properly handle game_id references
  • Improved handling of fps key in metadata (fallback to default of 25.0 when absent)
  • Added comprehensive tests for event deserialization

Implementation Details

  • Enhanced secondspectrum.py to handle the current metadata.json format
  • Added event data loading functionality with proper deserialization of various event types
  • Fixed nested metadata structure handling
  • Updated imports to reflect new event classes and functionality
  • Created sample event data (secondspectrum_fake_eventdata.jsonl) for testing
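
As a rough illustration of the metadata handling described above, here is a minimal sketch, not the code in this PR. It assumes the metadata has already been parsed into a dict, that newer files nest the pitch dimensions under "data" while older files keep them at the top level, and that the width key is called "pitchWidth". The helper name `read_pitch_and_fps` is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

DEFAULT_FPS = 25.0  # fallback named in this PR for files without an "fps" key


def read_pitch_and_fps(
    metadata: Dict[str, Any],
) -> Tuple[Optional[float], Optional[float], float]:
    """Return (pitch_length, pitch_width, fps) from a parsed metadata dict."""
    # Assumption: newer files nest the pitch dimensions under "data";
    # if that key is absent, read from the top level instead.
    source = metadata.get("data", metadata)
    pitch_length = source.get("pitchLength")
    pitch_width = source.get("pitchWidth")  # key name is an assumption
    # Look for "fps" in either place before using the default.
    fps = float(source.get("fps", metadata.get("fps", DEFAULT_FPS)))
    return pitch_length, pitch_width, fps
```

Reading through `.get()` with a default keeps one code path for both layouts instead of branching on a file version.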

Testing

Added comprehensive tests to verify:

  • Event data deserialization works correctly for different event types
  • Metadata handling functions properly with newer file formats
  • FPS key handling falls back gracefully when the key is missing

Future Work

  • The deserializer is far from perfect, and test coverage is poor because my event files do not contain every possible event type.
  • If any parts are incorrect, feel free to change them ;)
  • Tips and optimizations are always welcome.

This PR builds upon several smaller fixes and enhancements that were previously merged, consolidating them into a comprehensive solution for SecondSpectrum event data handling.

WoutPaepenUcLL and others added 17 commits February 21, 2025 14:26
Update the `secondspectrum.load` function to handle the absence of the 'fps' key in newer files.

* Modify `kloppy/infra/serializers/tracking/secondspectrum.py` to check for the presence of the 'fps' key in the metadata and use a default frame rate of 25.0 if the 'fps' key is absent.
* Add a test case in `kloppy/tests/test_secondspectrum.py` to verify that the `secondspectrum.load` function handles the absence of the 'fps' key correctly.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/WoutPaepenUcLL/kloppy?shareId=XXXX-XXXX-XXXX-XXXX).
Fix fps key handling in secondspectrum.load function
pitchLength and width are nested in "data"
Update secondspectrum load function to support current metadata.json format
metadata nested error fix
Add event data loading and deflection event classes
@UnravelSports
Contributor

Hi Wout,

Really cool PR! I haven't looked into it in-depth, but a couple quick things:

  • It looks like you've created a new tests folder outside the kloppy folder.
  • Any chance you could take this list of Events and mark which ones are supported, which are unsupported (the ones you know exist but have not implemented), and which you know are simply not available in SecondSpectrum event data? From looking at the test files, it seems there are also receival events, which should be listed too, alongside any other supported events not in the list.

@WoutPaepenUcLL
Author

WoutPaepenUcLL commented Mar 11, 2025

Event Support Status in SecondSpectrum Deserializer

Here is a table of all event types and whether each is supported by the deserializer:

| Event Type       | Status                                |
|------------------|---------------------------------------|
| generic          | Not Supported                         |
| pass             | Supported and Tested                  |
| shot             | Supported and Tested                  |
| take_on          | Supported and Not Tested              |
| carry            | Not Supported                         |
| clearance        | Supported and Tested                  |
| interception     | Supported and Not Tested              |
| duel             | Supported and Not Tested              |
| substitution     | Supported and Tested                  |
| card             | Supported and Not Tested              |
| player_on        | Not Supported (part of substitution)  |
| player_off       | Not Supported (part of substitution)  |
| recovery         | Not Supported                         |
| miscontrol       | Not Supported                         |
| ball_out         | Supported and Tested                  |
| foul_committed   | Supported and Tested                  |
| goalkeeper       | Supported and Tested                  |
| pressure         | Not Supported                         |
| formation_change | Not Supported                         |

@UnravelSports
Contributor

@WoutPaepenUcLL it looks like you have to run pip install black[jupyter] and then run black on some files to align formatting.

Also, you did not include "receival" in the above table, is that correct?

@WoutPaepenUcLL
Author

WoutPaepenUcLL commented Mar 13, 2025

I did not see a "receival" type in event_types, which is why it is not in the table. It is not implemented yet. I think I handled it incorrectly by only checking whether a pass is complete, so the deserializer simply skips the event when it is a "receival".

@UnravelSports
Contributor

Ah, no, that makes sense. Apologies for the confusion! I noticed the reception event in your test files and figured it would be included, which is why I asked.

@WoutPaepenUcLL
Author

Are there any more things I need to change?

@UnravelSports
Contributor

@WoutPaepenUcLL I'm trying to source some SecondSpectrum event data so that I can further and more accurately review the PR.

Hopefully I'll have an update soon.

@UnravelSports
Contributor

UnravelSports commented Apr 24, 2025

Hi @WoutPaepenUcLL thanks for your patience, I've finally sourced some SecondSpectrum data. Here is a list of feedback. I'm mostly following our new (yet to be released) contribution guide on Adding a New Event Data Provider.


Loading

  • Rename load_event_data to load_event to be consistent with other providers
  • Include event_factory and event_types in load_event
def load_event(
    event_data: FileLike,
    meta_data: FileLike,
    event_types: Optional[List[str]] = None,
    coordinates: Optional[str] = None,
    event_factory: Optional[EventFactory] = None,
) -> EventDataset:
  • Create load_tracking to deprecate load and add @deprecated("secondspectrum.load_tracking should be used") to the latter.
  • It looks like we have "dangling" additional_meta_data inside SecondSpectrumEventDataInputs that we simply set to None in the load_event_data function. If we don't use it, we should probably remove it.
  • Update the below accordingly:
from ._providers.secondspectrum import load, load_event, load_tracking

__all__ = ["load", "load_event", "load_tracking"]
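
To make the deprecation step above concrete, here is a minimal sketch, assuming a simple warning-based decorator. The `deprecated` helper below is a hypothetical stand-in written for this example, not the project's own decorator, and the `load_tracking` body is a placeholder.

```python
import functools
import warnings


def deprecated(message: str):
    """Hypothetical stand-in for the project's own @deprecated decorator."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def load_tracking(meta_data, raw_data, **kwargs):
    # Placeholder body: the real function would build a tracking dataset.
    return {"meta_data": meta_data, "raw_data": raw_data, **kwargs}


@deprecated("secondspectrum.load_tracking should be used")
def load(*args, **kwargs):
    # The old name only forwards, so both entry points behave identically.
    return load_tracking(*args, **kwargs)
```

Keeping `load` as a thin forwarder means existing callers keep working while being nudged toward the new name.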

Parsing

  • What is legacy_meta and why are we optionally setting frame_rate to 1000.0?
  • I think start_timestamp and end_timestamp are set incorrectly. Event data probably doesn't have a frame_rate or fps either. We're currently doing start_frame_id = int(period["startFrameClock"]) and then dividing that by frame_rate, but startFrameClock seems to be a unix timestamp. So we can probably set those more easily. Note: Period start times are always timedelta of 0, so end time would be timedelta of endFrameClock - startFrameClock.
  • It seems we are indeed actively skipping "reception" events, which is somewhat fine (more on this below). PassEvent objects have receive_timestamp, receiver_player and receiver_coordinates attributes, which are currently inferred from the pass events rather than taken from the reception events. I think we can grab the relevant reception event for each pass event by doing this:
with performance_logging("parse events", logger=logger):
    parsed_events = []

    # Materialize the items once so we can look ahead by index.
    items_list = list(raw_events.items())
    for i, (event_id, raw_event) in enumerate(items_list):
        # Receptions are not parsed on their own here; they are
        # attached to the preceding pass instead.
        if raw_event["type"] == "reception":
            continue

        # Default to None so the name is bound for every event.
        reception_event = None
        if raw_event["type"] == "pass" and i + 1 < len(items_list):
            next_id, next_event = items_list[i + 1]
            if next_event["type"] == "reception":
                reception_event = next_event

Then, we could pass reception_event into self._parse_event() and into self._parse_pass() as next_event and grab the actual receiver_player, receiver_coordinates and receive_timestamp. Maybe we can check if the receiver_id of the pass event and the reception event is the same before doing so. Note: I'm not sure this 100% covers all cases.
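
A minimal sketch of that receiver check, assuming raw events are plain dicts. The raw key names used here ("receiverId", "playerId", "location", "timestamp") and the helper name `receiver_details` are hypothetical and would need to be matched against the actual SecondSpectrum fields.

```python
from typing import Any, Dict, Optional


def receiver_details(
    pass_event: Dict[str, Any],
    reception_event: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Take receiver fields from a reception only if it matches the pass."""
    details: Dict[str, Any] = {
        "receiver_player_id": None,
        "receiver_coordinates": None,
        "receive_timestamp": None,
    }
    if reception_event is None:
        return details

    # Hypothetical raw keys: the pass names its intended receiver and the
    # reception names the player who actually received the ball.
    intended = pass_event.get("receiverId")
    if intended is None or intended != reception_event.get("playerId"):
        return details

    details["receiver_player_id"] = intended
    details["receiver_coordinates"] = reception_event.get("location")
    details["receive_timestamp"] = reception_event.get("timestamp")
    return details
```

Returning empty values when the IDs disagree leaves room for the existing inference from the pass event to act as a fallback.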

  • It looks like we're never using the next_event option in _parse_event() even though there is functionality there to look for "offside" etc. This should be fixed alongside the above reception option. We might also need to update the "receiver" options for "out" and "offside" and take them from "next_event" (not sure here).
  • For setting the starting_position we'll need to use a position_mapping. I think we can simply use the one from Tracking Deserializer. Remember to implement it as:
position_mapping.get(
    player_data["position"],
    PositionType.Unknown,
)
  • Maybe we need to move this position mapping away from the tracking data file it's currently in.
  • There is a dangling passDeflected = False in line 562
  • Timestamp in base_kwargs should be timestamp=raw_event.timestamp - period.start_timestamp, not the raw UTC timestamp.
  • Why are we setting base_kwargs player as the next() player?
  • base_kwargs "coordinates" should be a Point() not the raw list of coordinates. (You might be setting that though, in load_data)
  • The above also goes for other events, for example "result_coordinates": raw_event.get("goalmouthLocation") in line 621. And anywhere we have "location" as raw data instead of Point.
  • Not sure why for some stuff (like "location") we are overwriting the raw_event value to a Kloppy object like Point but then for other things like "period" or "teams" we seem to be leaving the actual raw events.
  • For event_type == "foul": build_foul_committed is not passed the event_kwargs and gets result=None, even though we set the result in event_kwargs, which we then never use.
  • Are we missing OWN GOAL?
  • It looks like we assert there is an "interception" event in the data; however, to me it looks like "reception" events can have "interception: true" instead. So we should probably still process these "reception" events and create an Interception event from them.
  • In the data I have I don't see "take_on", "duel" or "ball_recovery" events.
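
For the two timestamp points in the list above (period bounds and event timestamps relative to the period start), here is a minimal sketch. It assumes the clock values are Unix timestamps in milliseconds, which I have not verified, and the function names are hypothetical.

```python
from datetime import timedelta
from typing import Tuple


def period_bounds(start_clock_ms: int, end_clock_ms: int) -> Tuple[timedelta, timedelta]:
    """Period start is always zero; the end is the elapsed time."""
    # Assumption: both clocks are Unix timestamps in milliseconds.
    return timedelta(0), timedelta(milliseconds=end_clock_ms - start_clock_ms)


def event_timestamp(event_clock_ms: int, period_start_clock_ms: int) -> timedelta:
    """Event time relative to the start of its period, not raw UTC."""
    return timedelta(milliseconds=event_clock_ms - period_start_clock_ms)
```

No frame rate is involved in either calculation, which matches the point that event data probably does not carry one.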

General

  • There is a large commented-out section left in deserializer.py at around line 850
  • Run black on all files to align formatting

Documentation

Even though the docs are not updated yet, we should probably attempt to update event-spec.yml to align with the table you created above, after you've updated the PR. In this yml file we differentiate between "parsed", "not supported" (by the data provider), "not implemented" (in kloppy), "unknown" and "inferred".

@koenvo koenvo added this to the 3.19.0 milestone May 13, 2025

3 participants