:meth:`expand_to_match_ds`: A closer look. ****************************************** In this section we shall take a closer look at the internals of :meth:`expand_to_match_ds`. This method transforms a :obj:`DataFrame` into a :obj:`DataArray` by performing a series of operations to it. Recall from :doc:`its signature <../api_reference/expand_to_match_ds>` that the arguments it takes are: - :attr:`dimension_matching_col` - :attr:`fill_method` - :attr:`fill_value_col` The transformation occurs essentially with the following code snippet: :: return xr.DataArray( self.df .sort_values(dimension_matching_col) .reset_index() .rename(columns={'index': fill_value_col}, errors='ignore') .set_index(dimension_matching_col, drop=False) [fill_value_col] .reindex( self._ds[self._get_ds_from_df(dimension_matching_col)], method=fill_method ) ) Continuing with the :doc:`tutorial <../tutorials/sports_data/expand_to_match_ds>`, let's see how the original :obj:`DataFrame` is progressively transformed. 0. This is the original :obj:`DataFrame`. .. jupyter-execute:: ../tutorials/sports_data/raw_data.py :hide-code: .. jupyter-execute:: :hide-code: :hide-output: ( ds .events.load(events, {'frame': ('start_frame', 'end_frame')}) .events.expand_to_match_ds('start_frame') ) .. jupyter-execute:: ds.events.df 1. The :obj:`DataFrame` gets sorted on the column :attr:`dimension_matching_col`, which is :attr:`start_frame` in this case. :: .sort_values(dimension_matching_col) It is already sorted, so nothing changes. 2. The index of the :obj:`DataFrame` gets reset. :: .reset_index() .. jupyter-execute:: :hide-code: ( ds.events.df .sort_values('start_frame') .reset_index() ) Now **index** is a column of its own. 3. The column **index** gets renamed to :attr:`fill_value_col`, which is :attr:`event_index` in this case: :: .rename(columns={'index': fill_value_col}, errors='ignore') .. jupyter-execute:: :hide-code: ( ds.events.df .sort_values('start_frame') .reset_index() .rename(columns={'index': 'event_index'}, errors='ignore') ) 4. The column :attr:`dimension_matching_col` is set as the new index of the :obj:`DataFrame`: :: .set_index(dimension_matching_col, drop=False) .. jupyter-execute:: :hide-code: ( ds.events.df .sort_values('start_frame') .reset_index() .rename(columns={'index': 'event_index'}, errors='ignore') .set_index('start_frame', drop=False) ) 5. All columns of the :obj:`DataFrame` except for :attr:`fill_value_col`, which is :attr:`event_index` in this case, and the index are dropped. :: [fill_value_col] .. jupyter-execute:: :hide-code: ( ds.events.df .sort_values('start_frame') .reset_index() .rename(columns={'index': 'event_index'}, errors='ignore') .set_index('start_frame', drop=False) ['event_index'] ) 6. The :obj:`DataFrame` is now reindexed to the :obj:`Dataset` coordinate or dimension that matches :attr:`dimension_matching_col`, which is :attr:`frame` in this case. Notice that there's **no fill method**. :: .reindex( self._ds[ds.events._get_ds_from_df(dimension_matching_col)], method=fill_method ) .. jupyter-execute:: :hide-code: ( ds.events.df .sort_values('start_frame') .reset_index() .rename(columns={'index': 'event_index'}, errors='ignore') .set_index('start_frame', drop=False) ['event_index'] .reindex( ds.events._ds[ds.events._get_ds_from_df('start_frame')] ) ) 7. The :obj:`DataFrame` is finally converted into a :obj:`DataArray`. :: return xr.DataArray( ... ) .. jupyter-execute:: :hide-code: xr.DataArray( ds.events.df .sort_values('start_frame') .reset_index() .rename(columns={'index': 'event_index'}, errors='ignore') .set_index('start_frame', drop=False) ['event_index'] .reindex( ds.events._ds[ds.events._get_ds_from_df('start_frame')] ) ) This :obj:`DataArray` is useful on its own because it allows us to see which values of the :obj:`Dataset` coordinate or dimension match with unique events. It is also used to group the :obj:`Dataset` in :meth:`groupby_events`.