`expand_to_match_ds()`: A closer look

In this section we take a closer look at the internals of expand_to_match_ds(). This method transforms the events DataFrame into a DataArray by applying a series of operations to it.

Recall from its signature that the arguments it takes are:

dimension_matching_col
fill_method
fill_value_col

The transformation occurs essentially with the following code snippet:

return xr.DataArray(
    self.df
    .sort_values(dimension_matching_col)
    .reset_index()
    .rename(columns={'index': fill_value_col}, errors='ignore')
    .set_index(dimension_matching_col, drop=False)
    [fill_value_col]
    .reindex(
        self._ds[self._get_ds_from_df(dimension_matching_col)],
        method=fill_method
    )
)
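To experiment with this chain outside the accessor, the same steps can be written against plain pandas. This is only a sketch, not the library's code: the function name `expand_events` and its `target_index` parameter (standing in for `self._ds[self._get_ds_from_df(dimension_matching_col)]`) are hypothetical, the default for `fill_value_col` is assumed, and the final `xr.DataArray(...)` wrapping is left out.

```python
import pandas as pd


def expand_events(df, target_index, dimension_matching_col,
                  fill_method=None, fill_value_col="event_index"):
    """Hypothetical stand-in for the chain above; returns a pandas Series."""
    return (
        df
        .sort_values(dimension_matching_col)
        .reset_index()
        .rename(columns={"index": fill_value_col}, errors="ignore")
        .set_index(dimension_matching_col, drop=False)
        [fill_value_col]
        .reindex(target_index, method=fill_method)
    )


# Toy data (not from the tutorial): two events over six frames.
toy_events = pd.DataFrame({"event_type": ["pass", "goal"], "start_frame": [1, 4]})
toy_frames = pd.Index(range(1, 7), name="frame")

# Without a fill method only the start frames carry an event index.
sparse = expand_events(toy_events, toy_frames, "start_frame")
# With forward fill every frame carries the index of the latest event.
filled = expand_events(toy_events, toy_frames, "start_frame", fill_method="ffill")
```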

Continuing with the tutorial, let’s see how the original DataFrame is progressively transformed.

The following code sets up the Dataset and loads the original DataFrame.

import numpy as np
import pandas as pd
import xarray as xr
import xarray_events

ds = xr.Dataset(
    data_vars={
        'ball_trajectory': (
            ['frame', 'cartesian_coords'],
            np.exp(np.linspace((-6, -8), (3, 2), 2450))
        )
    },
    coords={
        'frame': np.arange(1, 2451),
        'cartesian_coords': ['x', 'y'],
        'player_id': [2, 3, 7, 19, 20, 21, 22, 28, 34, 79]
    },
    attrs={'match_id': 12, 'resolution_fps': 25}
)

events = pd.DataFrame(
    {
        'event_type':
            ['pass', 'goal', 'pass', 'pass', 'pass',
             'penalty', 'goal', 'pass', 'pass', 'penalty'],
        'start_frame': [1, 425, 600, 945, 1100, 1280, 1890, 2020, 2300, 2390],
        'end_frame': [424, 599, 944, 1099, 1279, 1889, 2019, 2299, 2389, 2450],
        'player_id': [79, 79, 19, 2, 3, 2, 3, 79, 2, 79]
    }
)

(
    ds
    .events.load(events, {'frame': ('start_frame', 'end_frame')})
    .events.expand_to_match_ds('start_frame')
)

ds.events.df

	event_type	start_frame	end_frame	player_id
0	pass	1	424	79
1	goal	425	599	79
2	pass	600	944	19
3	pass	945	1099	2
4	pass	1100	1279	3
5	penalty	1280	1889	2
6	goal	1890	2019	3
7	pass	2020	2299	79
8	pass	2300	2389	2
9	penalty	2390	2450	79

The DataFrame is first sorted by the column dimension_matching_col, which is start_frame in this case.
```
.sort_values(dimension_matching_col)
```

It is already sorted, so nothing changes.
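The sort is not cosmetic once a fill method is involved: pandas only accepts `method` in `reindex` when the index being reindexed is monotonic. The sketch below uses a toy Series (hypothetical data, not from the tutorial) and sorts with `sort_index()` rather than `sort_values()` purely for brevity.

```python
import pandas as pd

target = pd.Index(range(1, 7), name="frame")

# Event indices keyed by start frame, deliberately out of order.
unsorted_events = pd.Series(
    [1, 0, 2], index=pd.Index([4, 1, 6], name="start_frame")
)

try:
    unsorted_events.reindex(target, method="ffill")
    raised = False
except ValueError:
    # pandas refuses to fill along a non-monotonic index.
    raised = True

# After sorting, the same call goes through.
filled = unsorted_events.sort_index().reindex(target, method="ffill")
```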

The index of the DataFrame gets reset.
```
.reset_index()
```

(
    ds.events.df
    .sort_values('start_frame')
    .reset_index()
)

	index	event_type	start_frame	end_frame	player_id
0	0	pass	1	424	79
1	1	goal	425	599	79
2	2	pass	600	944	19
3	3	pass	945	1099	2
4	4	pass	1100	1279	3
5	5	penalty	1280	1889	2
6	6	goal	1890	2019	3
7	7	pass	2020	2299	79
8	8	pass	2300	2389	2
9	9	penalty	2390	2450	79

The former index is now a regular column named index.

The column index gets renamed to fill_value_col, which is event_index in this case:
```
.rename(columns={'index': fill_value_col}, errors='ignore')
```

(
    ds.events.df
    .sort_values('start_frame')
    .reset_index()
    .rename(columns={'index': 'event_index'}, errors='ignore')
)

	event_index	event_type	start_frame	end_frame	player_id
0	0	pass	1	424	79
1	1	goal	425	599	79
2	2	pass	600	944	19
3	3	pass	945	1099	2
4	4	pass	1100	1279	3
5	5	penalty	1280	1889	2
6	6	goal	1890	2019	3
7	7	pass	2020	2299	79
8	8	pass	2300	2389	2
9	9	penalty	2390	2450	79
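Note that `reset_index()` only produces a column called `index` when the original index is unnamed. The sketch below (toy data, with a hypothetical index name `event_id`) shows that with a named index the rename has nothing to match, and `errors='ignore'` lets it pass silently. In that situation the later `[fill_value_col]` selection would presumably only work if `fill_value_col` matched the index name.

```python
import pandas as pd

unnamed = pd.DataFrame({"start_frame": [1, 425]})
named = unnamed.rename_axis("event_id")  # hypothetical index name

# Unnamed index: reset_index() creates 'index', which is then renamed.
renamed_unnamed = (
    unnamed
    .reset_index()
    .rename(columns={"index": "event_index"}, errors="ignore")
)

# Named index: reset_index() creates 'event_id', so the rename is a no-op.
renamed_named = (
    named
    .reset_index()
    .rename(columns={"index": "event_index"}, errors="ignore")
)
```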

The column dimension_matching_col is set as the new index of the DataFrame:
```
.set_index(dimension_matching_col, drop=False)
```

(
    ds.events.df
    .sort_values('start_frame')
    .reset_index()
    .rename(columns={'index': 'event_index'}, errors='ignore')
    .set_index('start_frame', drop=False)
)

	event_index	event_type	start_frame	end_frame	player_id
start_frame
1	0	pass	1	424	79
425	1	goal	425	599	79
600	2	pass	600	944	19
945	3	pass	945	1099	2
1100	4	pass	1100	1279	3
1280	5	penalty	1280	1889	2
1890	6	goal	1890	2019	3
2020	7	pass	2020	2299	79
2300	8	pass	2300	2389	2
2390	9	penalty	2390	2450	79
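The effect of `drop=False` can be checked on a toy DataFrame (hypothetical data): the column stays available after it becomes the index, whereas the default removes it. One consequence is that the following `[fill_value_col]` step would still work if `fill_value_col` happened to be the same column as `dimension_matching_col`.

```python
import pandas as pd

toy = pd.DataFrame({"event_index": [0, 1], "start_frame": [1, 425]})

# drop=False keeps 'start_frame' as a column in addition to the index.
kept = toy.set_index("start_frame", drop=False)

# The default (drop=True) moves the column into the index.
dropped = toy.set_index("start_frame")
```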

Only the column fill_value_col, which is event_index in this case, is selected. All other columns are dropped while the index is kept, so the result is a Series.
```
[fill_value_col]
```

(
    ds.events.df
    .sort_values('start_frame')
    .reset_index()
    .rename(columns={'index': 'event_index'}, errors='ignore')
    .set_index('start_frame', drop=False)
    ['event_index']
)

start_frame
1       0
425     1
600     2
945     3
1100    4
1280    5
1890    6
2020    7
2300    8
2390    9
Name: event_index, dtype: int64

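The Series is now reindexed to the Dataset coordinate or dimension that matches dimension_matching_col, which is frame in this case. Notice that no fill method is given here, so every frame that is not the start of an event becomes NaN and the values are converted to floats.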
```
.reindex(
    self._ds[self._get_ds_from_df(dimension_matching_col)],
    method=fill_method
)
```

(
    ds.events.df
    .sort_values('start_frame')
    .reset_index()
    .rename(columns={'index': 'event_index'}, errors='ignore')
    .set_index('start_frame', drop=False)
    ['event_index']
    .reindex(
        ds.events._ds[ds.events._get_ds_from_df('start_frame')]
    )
)

frame
1       0.0
2       NaN
3       NaN
4       NaN
5       NaN
       ... 
2446    NaN
2447    NaN
2448    NaN
2449    NaN
2450    NaN
Name: event_index, Length: 2450, dtype: float64
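To see what the fill_method argument changes at this step, here is a small sketch on toy data (three hypothetical events over eight frames). It relies only on the `method` parameter of pandas' `Series.reindex`; how the library validates or restricts fill_method is not covered here.

```python
import pandas as pd

event_index = pd.Series(
    [0, 1, 2],
    index=pd.Index([1, 4, 6], name="start_frame"),
    name="event_index",
)
frames = pd.Index(range(1, 9), name="frame")

# No fill method: only the start frames keep a value, the rest are NaN.
no_fill = event_index.reindex(frames)

# Forward fill: each frame takes the index of the most recent event start.
forward = event_index.reindex(frames, method="ffill")

# Backward fill: each frame takes the index of the next event start;
# frames after the last start have nothing to take and stay NaN.
backward = event_index.reindex(frames, method="bfill")
```

With forward fill and an event starting at the first frame, every frame is labelled with exactly one event.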

The Series is finally converted into a DataArray.
```
return xr.DataArray(
    ...
)
```

xr.DataArray(
    ds.events.df
    .sort_values('start_frame')
    .reset_index()
    .rename(columns={'index': 'event_index'}, errors='ignore')
    .set_index('start_frame', drop=False)
    ['event_index']
    .reindex(
        ds.events._ds[ds.events._get_ds_from_df('start_frame')]
    )
)

<xarray.DataArray 'event_index' (frame: 2450)>
array([ 0., nan, nan, ..., nan, nan, nan])
Coordinates:
  * frame    (frame) int64 1 2 3 4 5 6 7 ... 2444 2445 2446 2447 2448 2449 2450

This DataArray is useful on its own because it shows which values of the Dataset coordinate or dimension correspond to which event. It is also used to group the Dataset in groupby_events().
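To give an idea of how such an array can drive a grouping, the sketch below uses plain xarray on toy data. It is not the implementation of groupby_events(): the Dataset `toy_ds`, its `speed` variable and the forward fill are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

frames = pd.Index(np.arange(1, 9), name="frame")

# Toy Dataset with one value per frame.
toy_ds = xr.Dataset(
    {"speed": ("frame", np.arange(1.0, 9.0))},
    coords={"frame": frames.values},
)

# Three hypothetical events starting at frames 1, 4 and 6.
event_index = pd.Series(
    [0, 1, 2],
    index=pd.Index([1, 4, 6], name="start_frame"),
    name="event_index",
)

# Expand to one event index per frame, then wrap it as a DataArray.
expanded = xr.DataArray(event_index.reindex(frames, method="ffill"))

# Group the Dataset by event and reduce each group.
per_event_mean = toy_ds.groupby(expanded).mean()
```

The result has one entry per event along a new `event_index` dimension.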
