Constraints that match everything

Let’s now see how to combine all of the accessor’s functionality to build useful (though complex) queries.

Since sel() now considers two different search spaces (the events DataFrame and the Dataset), a single call can carry constraints for both of them. This is a powerful feature of our accessor.

Suppose we want to perform a selection with the following specification:

  • An event of type pass.

  • The frames are strictly between 1728 and 2378.

Moreover, we want the result to be consistent across the Dataset and the events DataFrame. We can achieve this as follows:

import numpy as np
import pandas as pd
import xarray as xr
import xarray_events

ds = xr.Dataset(
    data_vars={
        'ball_trajectory': (
            ['frame', 'cartesian_coords'],
            np.exp(np.linspace((-6, -8), (3, 2), 2450))
        )
    },
    coords={
        'frame': np.arange(1, 2451),
        'cartesian_coords': ['x', 'y'],
        'player_id': [2, 3, 7, 19, 20, 21, 22, 28, 34, 79]
    },
    attrs={'match_id': 12, 'resolution_fps': 25}
)

events = pd.DataFrame(
    {
        'event_type':
            ['pass', 'goal', 'pass', 'pass', 'pass',
             'penalty', 'goal', 'pass', 'pass', 'penalty'],
        'start_frame': [1, 425, 600, 945, 1100, 1280, 1890, 2020, 2300, 2390],
        'end_frame': [424, 599, 944, 1099, 1279, 1889, 2019, 2299, 2389, 2450],
        'player_id': [79, 79, 19, 2, 3, 2, 3, 79, 2, 79]
    }
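)

(
    ds
    # Map the Dataset dimension 'frame' to the DataFrame columns
    # that hold the first and last frame of each event.
    .events.load(events, {'frame': ('start_frame', 'end_frame')})
    .events.sel(
        {
            # Dataset constraint: frames 1729 to 2377 inclusive.
            'frame': range(1729, 2378),
            # DataFrame constraints: events that lie entirely inside that window...
            'start_frame': lambda frame: frame > 1728,
            'end_frame': lambda frame: frame < 2378,
            # ...and that are of type 'pass'.
            'event_type': 'pass'
        }
    )
)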
<xarray.Dataset>
Dimensions:           (cartesian_coords: 2, frame: 649, player_id: 10)
Coordinates:
  * frame             (frame) int64 1729 1730 1731 1732 ... 2374 2375 2376 2377
  * cartesian_coords  (cartesian_coords) <U1 'x' 'y'
  * player_id         (player_id) int64 2 3 7 19 20 21 22 28 34 79
Data variables:
    ball_trajectory   (frame, cartesian_coords) float64 1.42 0.389 ... 5.484
Attributes:
    match_id:        12
    resolution_fps:  25
    _events:           event_type  start_frame  end_frame  player_id\n7      ...
    _ds_df_mapping:  {'frame': ('start_frame', 'end_frame')}

Internally, sel() filters both the events DataFrame and the Dataset, each according to the constraints that apply to it.

The resulting DataFrame looks like this:

ds.events.df
  event_type  start_frame  end_frame  player_id
7       pass         2020       2299         79
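The filtering described above can be approximated by hand with plain xarray and pandas. The following is a minimal sketch of an equivalent selection, not the accessor's actual implementation. The names ds_sel, mask and events_sel are ours, and the Dataset is a trimmed copy of the one in the example (no player_id coordinate, no attributes).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Trimmed copy of the example data.
ds = xr.Dataset(
    data_vars={
        'ball_trajectory': (
            ['frame', 'cartesian_coords'],
            np.exp(np.linspace((-6, -8), (3, 2), 2450))
        )
    },
    coords={
        'frame': np.arange(1, 2451),
        'cartesian_coords': ['x', 'y']
    }
)

events = pd.DataFrame(
    {
        'event_type':
            ['pass', 'goal', 'pass', 'pass', 'pass',
             'penalty', 'goal', 'pass', 'pass', 'penalty'],
        'start_frame': [1, 425, 600, 945, 1100, 1280, 1890, 2020, 2300, 2390],
        'end_frame': [424, 599, 944, 1099, 1279, 1889, 2019, 2299, 2389, 2450],
        'player_id': [79, 79, 19, 2, 3, 2, 3, 79, 2, 79]
    }
)

# Dataset side: label-based selection of frames 1729 to 2377 inclusive.
ds_sel = ds.sel(frame=np.arange(1729, 2378))

# DataFrame side: one boolean mask per constraint, combined with &.
mask = (
    (events['event_type'] == 'pass')
    & (events['start_frame'] > 1728)
    & (events['end_frame'] < 2378)
)
events_sel = events[mask]

print(ds_sel.sizes['frame'])
print(events_sel)
```

This leaves 649 frames in the Dataset and a single event (index 7) in the DataFrame, which agrees with the output of sel() shown above.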

We want to emphasize that the user stays in full control: the constraints have to be specified explicitly for both the Dataset and the DataFrame. sel() does not assume that a constraint on one of them should also apply to the other. Everything must be specified, and this provides great flexibility.
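To see why this explicitness matters, consider what happens on the DataFrame side when the frame-related constraints are left out. The sketch below uses plain pandas rather than the accessor, so it only illustrates the principle. The names passes and in_window are ours.

```python
import pandas as pd

events = pd.DataFrame(
    {
        'event_type':
            ['pass', 'goal', 'pass', 'pass', 'pass',
             'penalty', 'goal', 'pass', 'pass', 'penalty'],
        'start_frame': [1, 425, 600, 945, 1100, 1280, 1890, 2020, 2300, 2390],
        'end_frame': [424, 599, 944, 1099, 1279, 1889, 2019, 2299, 2389, 2450],
        'player_id': [79, 79, 19, 2, 3, 2, 3, 79, 2, 79]
    }
)

# Only the event-type constraint: nothing restricts the frames,
# so passes from the whole match are kept.
passes = events[events['event_type'] == 'pass']

# Adding the two frame constraints narrows the result to the window.
in_window = passes[
    (passes['start_frame'] > 1728) & (passes['end_frame'] < 2378)
]

print(len(passes), len(in_window))
```

Without the start_frame and end_frame constraints, six passes remain, most of them outside the frame window; only with them does the DataFrame agree with a Dataset restricted to frames 1729 to 2377.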