Constraints that match everything
Let’s now see how to combine all of the functionality of our accessor to build useful (though complex) queries.
Because sel() now considers two different search spaces (the events DataFrame and the Dataset), a single call can search in both at once. This is a powerful feature of our accessor.
Say we wish to perform a selection with the following specification:
An event of type pass.
The frames lie strictly between 1728 and 2378.
Moreover, we want the result to be consistent across the Dataset and the events DataFrame. We can achieve this as follows:
import numpy as np
import pandas as pd
import xarray as xr
import xarray_events

ds = xr.Dataset(
    data_vars={
        'ball_trajectory': (
            ['frame', 'cartesian_coords'],
            np.exp(np.linspace((-6, -8), (3, 2), 2450))
        )
    },
    coords={
        'frame': np.arange(1, 2451),
        'cartesian_coords': ['x', 'y'],
        'player_id': [2, 3, 7, 19, 20, 21, 22, 28, 34, 79]
    },
    attrs={'match_id': 12, 'resolution_fps': 25}
)

events = pd.DataFrame(
    {
        'event_type':
            ['pass', 'goal', 'pass', 'pass', 'pass',
             'penalty', 'goal', 'pass', 'pass', 'penalty'],
        'start_frame': [1, 425, 600, 945, 1100, 1280, 1890, 2020, 2300, 2390],
        'end_frame': [424, 599, 944, 1099, 1279, 1889, 2019, 2299, 2389, 2450],
        'player_id': [79, 79, 19, 2, 3, 2, 3, 79, 2, 79]
    }
)

(
    ds
    # Map the Dataset's frame dimension to the start/end columns of the events.
    .events.load(events, {'frame': ('start_frame', 'end_frame')})
    .events.sel(
        {
            # Constraint on the Dataset.
            'frame': range(1729, 2378),
            # Constraints on the events DataFrame.
            'start_frame': lambda frame: frame > 1728,
            'end_frame': lambda frame: frame < 2378,
            'event_type': 'pass'
        }
    )
)
<xarray.Dataset>
Dimensions:           (cartesian_coords: 2, frame: 649, player_id: 10)
Coordinates:
  * frame             (frame) int64 1729 1730 1731 1732 ... 2374 2375 2376 2377
  * cartesian_coords  (cartesian_coords) <U1 'x' 'y'
  * player_id         (player_id) int64 2 3 7 19 20 21 22 28 34 79
Data variables:
    ball_trajectory   (frame, cartesian_coords) float64 1.42 0.389 ... 5.484
Attributes:
    match_id:        12
    resolution_fps:  25
    _events:         event_type start_frame end_frame player_id\n7 ...
    _ds_df_mapping:  {'frame': ('start_frame', 'end_frame')}
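Internally, sel() filters both the events DataFrame and the Dataset, each with the constraints that apply to it: here, frame applies to the Dataset, while start_frame, end_frame and event_type apply to the events DataFrame.
With the result of the selection assigned back to ds, the resulting DataFrame looks like this: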
ds.events.df
| | event_type | start_frame | end_frame | player_id |
|---|---|---|---|---|
| 7 | pass | 2020 | 2299 | 79 |
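To make the two search spaces concrete, the same selection can be approximated by hand with plain pandas and xarray. This is only a sketch of the idea, not the accessor's actual implementation: it rebuilds a reduced copy of the example data, and the names mask, manual_events and manual_ds are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Reduced copy of the example data, rebuilt so the sketch runs on its own.
ds = xr.Dataset(
    data_vars={
        'ball_trajectory': (
            ['frame', 'cartesian_coords'],
            np.exp(np.linspace((-6, -8), (3, 2), 2450)),
        )
    },
    coords={
        'frame': np.arange(1, 2451),
        'cartesian_coords': ['x', 'y'],
    },
)
events = pd.DataFrame(
    {
        'event_type': ['pass', 'goal', 'pass', 'pass', 'pass',
                       'penalty', 'goal', 'pass', 'pass', 'penalty'],
        'start_frame': [1, 425, 600, 945, 1100, 1280, 1890, 2020, 2300, 2390],
        'end_frame': [424, 599, 944, 1099, 1279, 1889, 2019, 2299, 2389, 2450],
        'player_id': [79, 79, 19, 2, 3, 2, 3, 79, 2, 79],
    }
)

# Search space 1: the events DataFrame, filtered with a boolean mask.
mask = (
    (events['event_type'] == 'pass')
    & (events['start_frame'] > 1728)
    & (events['end_frame'] < 2378)
)
manual_events = events[mask]

# Search space 2: the Dataset, filtered by label along the frame dimension.
manual_ds = ds.sel(frame=list(range(1729, 2378)))

print(manual_events)
print(manual_ds.sizes['frame'])
```

The manual version keeps only the event with index 7 and 649 frames, which matches the output of sel() shown above. The accessor's contribution is that both filters are expressed in a single dictionary of constraints.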
We want to emphasize that this leaves the user in full control: the constraints have to be specified explicitly for both the Dataset and the DataFrame. sel() does not assume that the user wants to select in both search spaces; every constraint must be stated. This provides great flexibility.
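One way to picture this explicitness is to imagine each constraint being routed to exactly one search space according to its key. The helper below, split_constraints, is hypothetical and is not part of xarray-events; it is a minimal sketch, using a reduced set of names from the example, of why omitting the frame entry leaves the Dataset untouched.

```python
def split_constraints(constraints, dataset_names, dataframe_columns):
    """Hypothetical helper: route each constraint to one search space."""
    for_dataset, for_dataframe = {}, {}
    for key, value in constraints.items():
        if key in dataset_names:
            # Choice made by this sketch only: Dataset names are checked first.
            for_dataset[key] = value
        elif key in dataframe_columns:
            for_dataframe[key] = value
        else:
            raise KeyError(f'{key!r} matches neither search space')
    return for_dataset, for_dataframe


# Reduced set of names taken from the example (assumption of this sketch).
dataset_names = {'frame', 'cartesian_coords'}
dataframe_columns = {'event_type', 'start_frame', 'end_frame'}

# Constraints on the events only: nothing is routed to the Dataset.
only_events = {
    'start_frame': lambda frame: frame > 1728,
    'end_frame': lambda frame: frame < 2378,
    'event_type': 'pass',
}
for_dataset, for_dataframe = split_constraints(
    only_events, dataset_names, dataframe_columns
)
print(sorted(for_dataset))
print(sorted(for_dataframe))

# Adding an explicit 'frame' entry is what narrows the Dataset as well.
both = {'frame': range(1729, 2378), **only_events}
for_dataset_both, for_dataframe_both = split_constraints(
    both, dataset_names, dataframe_columns
)
print(sorted(for_dataset_both))
```

In this sketch the first call routes nothing to the Dataset, so its frames would stay as they are. Only the second call, which states the frame range explicitly, yields a constraint for the Dataset.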