linref.events package
Submodules
linref.events.collection module
Module featuring EventsCollection and EventsGroup object classes for the management of linear referencing events data and optimized performance of various events operations including dissolves, automated intersections and attribute retrievals, linear overlays, and more.
EventsCollection class instances represent complex events data sets with multiple groups of events which are distinguished by at least one set of keys (e.g., years of data or inventory categories). These collections can be used for a variety of linear referencing operations and events manipulations, such as dissolves based on a subset of events columns, returning a simplified data set with a selection of columns aggregated. Additionally, these collections can be used to perform automated merges and intersections with other EventsCollection class instances using the .merge() method, retrieving column data from another collection and relating it to the original collection’s events data.
EventsGroup class instances represent simple events data sets with a single group of contiguous events. These groups can be used for a variety of linear referencing operations such as overlays to determine portions of events overlapped by an input range, intersections to determine which events intersect with an input range, length-weighted averages of event column values based on an input range, and more.
EventsCollection class instances can be queried using the .get_subset() and .get_group() methods, returning a pared-down EventsCollection or a specific EventsGroup, respectively. The same queries can be made with square bracket indexing: passing a mixture of unique key values and valid slices of unique key values returns a subset of the collection as an EventsCollection instance, while passing only unique key values returns a single group as an EventsGroup instance.
Classes
EventsCollection, EventsGroup
Dependencies
pandas, geopandas, numpy, shapely, copy, warnings, rangel
Examples
Create an events collection for a sample roadway events dataframe with a unique route identifier represented by the ‘Route’ column and data for multiple years, represented by the ‘Year’ column. The begin and end mile points are defined by the ‘Begin’ and ‘End’ columns.
>>> ec = EventsCollection(df, keys=['Route','Year'], beg='Begin', end='End')
To select events from a specific route and a specific year, indexing for all keys can be used, producing an EventsGroup.
>>> eg = ec['Route 50', 2018]
To select events on all routes but only those from a specific year, indexing for only some keys can be used.
>>> ec_2018 = ec[:, 2018]
To retrieve information from another events collection and apply it to the events of this one, the merge() method can be used, passing the other collection.
>>> em = ec.merge(other)
To get all events which intersect with a numeric range, the intersecting() method can be used on an EventsGroup instance.
>>> df_intersecting = eg.intersecting(0.5, 1.5, closed='left_mod')
The intersecting() method can also be used for point locations by omitting the second location attribute.
>>> df_intersecting = eg.intersecting(0.75, closed='both')
The linearly weighted average of one or more attributes can be obtained using the overlay_average() method.
>>> df_overlay = eg.overlay_average(0.5, 1.5, cols=['Speed_Limit','Volume'])
If the events include information on the roadway speed limit and number of lanes, they can be dissolved on these attributes. During the dissolve, other attributes can be aggregated, providing a list of associated values or performing an aggregation function over these values.
>>> ec_dissolved = ec.dissolve(attr=['Speed_Limit','Lanes'], aggs=['County'])
Development
Developed by: Tariq Shihadah, tariq.shihadah@gmail.com
Created: 10/22/2019
Modified: 3/3/2021
- class linref.events.collection.EventsCollection(df, keys=None, beg=None, end=None, geom=None, closed=None, sort=False, missing_data='warn', **kwargs)
Bases: EventsFrame
User-level class for managing linear and point events data. This class is used for complex data sets with multiple groups of events, grouped by at least one key column (e.g., route ID). Data is managed using both the pandas tabular data package as well as the ranges range data package.
EventsCollection class instances represent complex events data sets with multiple groups of events which are distinguished by at least one set of keys (e.g., years of data or inventory categories). These collections can be used for a variety of linear referencing operations and events manipulations, such as dissolves based on a subset of events columns, returning a simplified data set with a selection of columns aggregated. Additionally, these collections can be used to perform automated merges and intersections with another EventsCollection class instance using the merge() method, retrieving column data from another collection and relating it to the original collection’s events data.
EventsCollection class instances can be queried using the get_subset() and get_group() methods, returning a pared down EventsCollection or a specific EventsGroup, respectively. Similarly, this can be done using object indexing, passing a mixture of unique values and valid slices of unique key values to return a subset of the collection as an EventsCollection instance, or just unique key values to return a unique group as an EventsGroup instance.
Parameters
- df : pd.DataFrame
Pandas dataframe which contains linear or point events data.
- keys : list or tuple
A list or tuple of dataframe column labels which define the unique groups of events within the events dataframe. Common examples include year or route ID columns which distinguish unrelated sets of events within the events dataframe.
- beg, end : str or label
Column labels within the events dataframe which represent the linearly referenced location of each event. For linear events both are required, defining the begin and end location of each event. For point events, only ‘beg’ is required, defining the exact location of each event (the ‘end’ property will automatically be set to be equal to the ‘beg’ property).
- geom : str or label, optional
Column label within the events dataframe which represents the shapely geometry associated with each event if available. If provided, certain additional class functionalities will be made available.
- closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, optional
Whether intervals are closed on the left-side, right-side, both or neither. If None, will default to ‘left_mod’ for linear events and ‘both’ for point events.
- left : ranges are always closed on the left and never closed on the right.
- left_mod : ranges are always closed on the left and only closed on the right when the next range is not consecutive.
- right : ranges are always closed on the right and never closed on the left.
- right_mod : ranges are always closed on the right and only closed on the left when the previous range is not consecutive.
- both : ranges are always closed on both sides.
- neither : ranges are never closed on either side.
- sort : bool, default False
Whether to sort the events dataframe by its keys and begin and end values upon its creation.
- missing_data : {‘ignore’, ‘drop’, ‘warn’, ‘raise’}, default ‘warn’
What to do when the input dataframe contains missing values in the target key, beg, and end columns.
- ignore : do nothing.
- drop : drop all records which contain any missing data in the target columns.
- warn : log a warning when records are missing data.
- raise : raise a ValueError when records are missing data.
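The closure options described above can be illustrated with a small, self-contained sketch. The `contains` helper and the sample events are hypothetical and are not part of linref; the sketch assumes events are sorted by begin location and that "consecutive" means the neighboring range shares a bound with the current one.

```python
def contains(events, i, x, closed="left_mod"):
    """Return True if location x falls within events[i] under the closure rule.

    events is a list of (beg, end) tuples sorted by beg.
    Hypothetical helper for illustration only; not the linref implementation.
    """
    beg, end = events[i]
    # A neighbor is treated as consecutive when it shares a bound with this range
    next_consecutive = i + 1 < len(events) and events[i + 1][0] == end
    prev_consecutive = i > 0 and events[i - 1][1] == beg
    if closed == "left":
        left, right = True, False
    elif closed == "left_mod":
        left, right = True, not next_consecutive
    elif closed == "right":
        left, right = False, True
    elif closed == "right_mod":
        left, right = not prev_consecutive, True
    elif closed == "both":
        left, right = True, True
    elif closed == "neither":
        left, right = False, False
    else:
        raise ValueError(f"Unknown closed option: {closed!r}")
    above = x >= beg if left else x > beg
    below = x <= end if right else x < end
    return above and below


# Two consecutive events followed by a gap and a third event
events = [(0.0, 1.0), (1.0, 2.0), (3.0, 4.0)]
shared_bound = [contains(events, i, 1.0, "left_mod") for i in range(3)]
gap_bound = contains(events, 1, 2.0, "left_mod")
```

With 'left_mod', the shared bound at 1.0 is assigned only to the second event, while the bound at 2.0 stays with the second event because no consecutive range follows it.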
- from_similar(df, **kwargs)
Create an EventsCollection from the input dataframe, assuming the same column labels and closed parameter as the calling collection. Additional constructor keyword arguments can be passed through **kwargs.
Parameters
- df : pd.DataFrame
Pandas dataframe which contains linear or point events data, formatted with standard labels. If multiple keys are detected, they will be assigned in the order in which they appear within the target dataframe. Only one of each begin and end option may be used. The geometry label is optional.
- **kwargs
Additional keyword arguments to be passed to the EventsCollection constructor.
- classmethod from_standard(df, require_end=False, **kwargs)
Create an EventsCollection from the input dataframe assuming standard column labels. These standard labels can be modified on the class directly by modifying the associated class attributes:
- default_keys
- default_beg
- default_end
- default_geom
Standard labels include:
- keys : ‘RID’, ‘YEAR’, ‘KEY’
- beg : ‘BMP’, ‘BEG’, ‘FROM’, ‘LOC’
- end : ‘EMP’, ‘END’, ‘TO’
- geom : ‘geometry’
Additional constructor keyword arguments can be passed through **kwargs.
Parameters
- df : pd.DataFrame
Pandas dataframe which contains linear or point events data, formatted with standard labels. If multiple keys are detected, they will be assigned in the order in which they appear within the target dataframe. Only one of each begin and end option may be used. The geometry label is optional.
- require_end : bool, default False
Whether to raise an error if no valid unique end column label is found. If False, no end label will be used when generating the collection.
- **kwargs
Additional keyword arguments to be passed to the EventsCollection constructor.
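As a rough illustration of the label detection described above, the following sketch scans a list of column names for the standard labels. The `detect_labels` function is hypothetical and not part of linref; the label lists are copied from the default class attributes documented on this page, and the handling of missing or ambiguous labels is an assumption based on the parameter descriptions.

```python
# Standard labels as listed in the documented class attributes
DEFAULT_KEYS = ["RID", "YEAR", "KEY"]
DEFAULT_BEG = ["BMP", "BEG", "FROM", "LOC"]
DEFAULT_END = ["EMP", "END", "TO"]


def detect_labels(columns, require_end=False):
    """Find standard key, begin, and end labels among dataframe columns.

    Hypothetical helper for illustration only; not the linref implementation.
    """
    # Keys are assigned in the order in which they appear in the dataframe
    keys = [c for c in columns if c in DEFAULT_KEYS]
    begs = [c for c in columns if c in DEFAULT_BEG]
    ends = [c for c in columns if c in DEFAULT_END]
    if len(begs) != 1:
        raise ValueError("Exactly one standard begin label is required.")
    if len(ends) != 1:
        if require_end:
            raise ValueError("No valid unique end label was found.")
        end = None  # assumed behavior: proceed without an end label
    else:
        end = ends[0]
    return {"keys": keys, "beg": begs[0], "end": end}


linear = detect_labels(["YEAR", "RID", "BMP", "EMP", "AADT"])
point = detect_labels(["RID", "LOC", "Crash_Type"])
```

Here `linear` resolves both a begin and an end label, while `point` resolves only a begin label, which corresponds to the point events case.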
- get_group(keys, empty=True, log_empty=True, **kwargs) → EventsGroup
Retrieve a unique group of events based on provided key values.
Parameters
- keys : key value, tuple of key values, or list of the same
If only one key column is defined within the collection, a single column value may be provided. Otherwise, a tuple of column values must be provided in the same order as they appear in self.keys.
- empty : bool, default True
Whether to allow for empty events groups to be returned when the provided keys are valid but are not associated with any actual events. If False, these cases will raise a KeyError.
- log_empty : bool, default True
Whether created empty events should be logged and stored within the collection to allow for quicker access. More memory intensive but may produce moderate performance improvements if empty keys will be accessed repeatedly.
- get_matching(other, **kwargs)
Retrieve a subset of the events collection based on the unique group values present in another provided events collection.
Parameters
- other : EventsCollection
Another events collection with matching keys which will be used to select a subset of this events collection based on its key values.
- get_subset(keys, reduce=True, **kwargs)
Retrieve a subset of the events collection based on the provided key values or slices. Returned events must satisfy all keys.
Parameters
- keys : list or tuple of slice, list, or other
A list of either (1) slices which can be used to slice the key values present in self.key_values for the associated key, (2) a list of values which reflect those in self.key_values, or (3) a single value which is present in self.key_values. Inputs must be provided in the same order as they appear in self.keys.
- reduce : bool, default True
Whether to simplify the resulting EventsCollection by removing any keys which are queried for a single value and become obsolete.
For example, if one key represents years of data and a single year is provided, that key will be removed from the resulting collection as it can no longer be queried further.
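The selection and reduction logic described above can be sketched with plain dictionaries. This is a minimal illustration, not the linref implementation: the `get_subset` function below is a hypothetical stand-in, groups are represented as a dict keyed by tuples of key values, and slices are assumed to apply positionally to the sorted unique values of each key.

```python
def get_subset(groups, key_names, selectors, reduce=True):
    """Select groups whose key values satisfy every selector.

    groups: dict mapping key tuples to lists of events.
    selectors: one per key, each a slice, a list of values, or a single value.
    Hypothetical helper for illustration only.
    """
    # Sorted unique values per key, analogous to a key_values lookup
    key_values = [sorted({k[i] for k in groups}) for i in range(len(key_names))]
    allowed, single = [], []
    for values, sel in zip(key_values, selectors):
        if isinstance(sel, slice):
            allowed.append(set(values[sel]))
            single.append(False)
        elif isinstance(sel, list):
            allowed.append(set(sel))
            single.append(False)
        else:
            allowed.append({sel})
            single.append(True)
    # With reduce=True, keys queried for a single value are dropped
    keep = [i for i, s in enumerate(single) if not (reduce and s)]
    subset = {}
    for key, events in groups.items():
        if all(k in a for k, a in zip(key, allowed)):
            subset[tuple(key[i] for i in keep)] = events
    return [key_names[i] for i in keep], subset


groups = {
    ("Route 50", 2018): ["a"],
    ("Route 50", 2019): ["b"],
    ("Route 66", 2018): ["c"],
}
names, subset = get_subset(groups, ["Route", "Year"], [slice(None), 2018])
names_full, subset_full = get_subset(
    groups, ["Route", "Year"], [slice(None), 2018], reduce=False
)
```

In the reduced result only the 'Route' key remains, since 'Year' was queried for a single value and can no longer be queried further.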
- property log
- merge(other)
Create an EventsMerge instance with this collection as the left and the other collection as the right. This can then be used to retrieve attributes from the other collection to be appended to this collection’s dataframe.
Parameters
- other : EventsCollection
Another events collection with similar keys which will be merged with this events collection, producing an EventsMerge instance which can be used to perform various overlay operations to retrieve attributes and more from the target collection.
- project_parallel(other, samples=3, buffer=100, match='all', choose=1, sort_locs=True, **kwargs)
Project an input geodataframe of linear geometries onto parallel events in the events dataframe, producing linearly referenced locations for all input geometries which are found to be parallel based on buffer and sampling parameters.
Parameters
- other : gpd.GeoDataFrame
Geodataframe containing linear geometry which will be projected onto the events dataframe.
- samples : int, default 3
The number of equidistant sample points to take along each geometry being projected to check for nearby geometry.
- buffer : float, default 100
The max distance to search for input geometries to project against the events’ geometries. Measured in terms of the geometries’ coordinate reference system.
- match : {‘all’, int}, default ‘all’
How many sample points must find a nearby target event to produce a positive match to that event, resulting in a projection.
- choose : {int, ‘all’}, default 1
How many target geometries to choose when more than one match occurs.
- sort_locs : bool, default True
Whether begin and end location values should be sorted, ensuring that all events are increasing and monotonic.
- **kwargs
Keyword arguments to be passed to the EventsCollection constructor upon completion of the projection.
- class linref.events.collection.EventsFrame(df, keys=None, beg=None, end=None, geom=None, route=None, closed=None, sort=False, **kwargs)
Bases: object
High-level class for managing linear events data. Users should instead use the EventsCollection class for complex data sets with multiple groups of events, grouped by at least one key column (e.g., route ID), or the EventsGroup class for simple data sets with only a single group of events.
- property beg
- property beg_loc
- property begs
- build_routes(label='route', errors='raise')
Build MLSRoute instances for each event based on available geometry and begin and end locations.
Parameters
- label : valid pandas column label
Column label to use for newly generated column populated with routes data.
- errors : {‘raise’, ‘ignore’}
How to address errors if they arise when producing routes. If errors are not raised, inviable records in the new column will be filled with np.nan.
- cast_gdf(inplace=False, **kwargs)
Convert the events dataframe to a geodataframe, passing the input keyword arguments, such as crs and geometry, to the gpd.GeoDataFrame constructor. See documentation for this constructor for more information.
- property closed
Collection parameter for whether event intervals are closed on the left-side, right-side, both or neither.
- property columns
A list of all columns within the events dataframe.
- copy(deep=False)
Create an exact copy of the events class instance.
Parameters
- deep : bool, default False
Whether the created copy should be a deep copy.
- default_beg = ['BMP', 'BEG', 'FROM', 'LOC']
- default_end = ['EMP', 'END', 'TO']
- default_geom = ['geometry']
- default_keys = ['RID', 'YEAR', 'KEY']
- property df
The collection’s events dataframe.
- dissolve(attr=None, aggs=None, agg_func=None, agg_suffix='_agg', agg_geometry=False, agg_routes=False, dropna=False, fillna=None, reorder=True, merge_lines=True)
Dissolve the events dataframe on a selection of event attributes.
Note: Data will be sorted by keys and begin/end columns prior to performing the dissolve.
Note: Missing data in selected attribute fields may cause problems with dissolving; please use df.fillna(…) or df.dropna(…) to avoid this problem.
Parameters
- attr : str or list
Which event attribute(s) within the events dataframe to dissolve on.
- aggs : str or list, default None
Which event attribute(s) within the events dataframe to aggregate during the dissolve. Attributes will be aggregated into a list and returned under the same attribute name.
- agg_func : callable function or list of callable functions, default None
A function or list of functions corresponding to the list of aggregation attributes which will be called on the list-aggregated contents of those attributes.
- agg_suffix : str or list, default ‘_agg’
A suffix to be added to the name of aggregated columns. If provided as a list, must correspond to the provided list of aggregation attributes.
- agg_geometry : bool, default False
Whether to create an aggregated geometries field, populated with aggregated shapely geometries based on those contained in the collection’s geometry field.
- agg_routes : bool, default False
Whether to create an aggregated routes field, populated with MLSRoute object class instances, created based on aggregated segment geometries and begin and end mile posts.
- dropna : bool, default False
Whether to drop records with empty values in the attribute fields. This parameter is passed to the df.groupby call.
- fillna : optional
A value or dictionary used to fill instances of np.nan in the target dataframe. Consistent with the DataFrame.fillna() method.
- reorder : bool, default True
Whether to reorder the resulting dataframe columns to match the order of the collection’s events dataframe.
- merge_lines : bool, default True
Whether to use shapely’s ops.linemerge function to combine contiguous linestrings when aggregating linear geometries. Only applicable when agg_geometry=True.
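The general idea of a dissolve can be sketched in plain Python for a single group of events. This is a simplified illustration rather than the linref implementation: the `dissolve` function below is hypothetical, it assumes that only events which touch end-to-begin and share all dissolve attribute values are merged, and it appends the documented default suffix '_agg' to aggregated columns.

```python
def dissolve(events, attr, aggs=()):
    """Merge consecutive events with equal values in the attr columns.

    events: list of dicts with 'beg' and 'end' plus attribute values.
    Values of the aggs columns are collected into lists.
    Hypothetical helper for illustration only.
    """
    # Sort by begin and end locations before dissolving
    ordered = sorted(events, key=lambda e: (e["beg"], e["end"]))
    out = []
    signatures = []
    for e in ordered:
        sig = tuple(e[a] for a in attr)
        if out and signatures[-1] == sig and out[-1]["end"] == e["beg"]:
            # Same attributes and contiguous: extend the previous record
            out[-1]["end"] = e["end"]
            for a in aggs:
                out[-1][a + "_agg"].append(e[a])
        else:
            rec = {"beg": e["beg"], "end": e["end"]}
            rec.update({a: e[a] for a in attr})
            rec.update({a + "_agg": [e[a]] for a in aggs})
            out.append(rec)
            signatures.append(sig)
    return out


events = [
    {"beg": 0.0, "end": 1.0, "Speed_Limit": 35, "Lanes": 2, "County": "A"},
    {"beg": 1.0, "end": 2.0, "Speed_Limit": 35, "Lanes": 2, "County": "B"},
    {"beg": 2.0, "end": 3.0, "Speed_Limit": 45, "Lanes": 2, "County": "B"},
]
dissolved = dissolve(events, ["Speed_Limit", "Lanes"], ["County"])
```

The first two events share a speed limit and lane count and are contiguous, so they collapse into one record spanning 0.0 to 2.0 with both county values retained.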
- property end
- property end_loc
- property ends
- property geom
- property geom_loc
- geometry_from_xy(x, y, col_name='geometry', crs=None, inplace=False)
Use X and Y coordinates in the events dataframe to generate point geometry.
- property group_keys
- property group_keys_unique
- property groups
The pandas GroupBy of the events dataframe, grouped by the collection’s key columns. This defines the basis for key queries.
- property is_point
Returns True if the collection’s beg and end columns are the same, implying that it is a collection of point events.
- iter_groups()
Return an iterator which will iterate through all groups in the collection, yielding each group’s key as well as the associated EventsGroup.
- property key_locs
- property key_values
A dictionary of valid values for each key column.
- property keys
The list of column names within the events dataframe which are queried to define specific events groups (e.g., events on a specific route).
- property num_keys
The number of key columns within self.keys.
- property others
A list of columns within the events dataframe which are not the begin, end, or key columns.
- parse_routes(col=None, inplace=False, errors='raise')
Parse MLSRoutes data in the provided column, which contains either MLSRoute objects, WKT data for MULTILINESTRINGs or LINESTRINGs with M-values, or a mixture of both.
Parameters
- col : label, optional
A valid column label within the events dataframe which contains the target MLSRoute data. If not provided, will attempt to retrieve a previously assigned column label from the self.route property.
- inplace : boolean, default False
Whether to perform the operation in place. If False, will return a modified copy of the events object.
- errors : {‘raise’, ‘ignore’}
How to address errors which arise when coercing MLSRoute data during processing. If ignored, errors will result in null values in the events dataframe where errors occurred.
- project(other, buffer=100, nearest=True, loc_label='LOC', dist_label='DISTANCE', build_routes=True, **kwargs)
Project an input geodataframe onto the events dataframe, producing linearly referenced point locations relative to events for all input geometries within a buffered search area.
Parameters
- other : gpd.GeoDataFrame
Geodataframe containing geometry which will be projected onto the events dataframe.
- buffer : float, default 100
The max distance to search for input geometries to project against the events’ geometries. Measured in terms of the geometries’ coordinate reference system.
- nearest : bool, default True
Whether to choose only the nearest match within the defined buffer. If False, all matches will be returned. If True, when multiple equidistant points exist, choose the first result that appears.
- loc_label, dist_label : label
Labels to be used for created columns for projected locations on target events groups and nearest point distances between target geometries and events geometries.
- build_routes : bool, default True
Whether to automatically build routes using the build_routes() method if routes are not already available.
- **kwargs
Keyword arguments to be passed to the EventsFrame constructor upon completion of the projection.
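To make the idea of projecting a point onto an event concrete, here is a minimal pure-Python sketch for a single point and a single event geometry. It is not the linref implementation: the `project_point` function is hypothetical, the event geometry is given as a list of (x, y) vertices, and the location is assumed to be interpolated linearly by length between the event's begin and end values.

```python
import math


def project_point(point, line, beg, end, buffer=100):
    """Return (location, distance) for the nearest position on the line.

    Returns None when the nearest position is farther away than buffer.
    Hypothetical helper for illustration only.
    """
    total = sum(math.dist(a, b) for a, b in zip(line, line[1:]))
    best = None
    walked = 0.0
    for a, b in zip(line, line[1:]):
        seg = math.dist(a, b)
        if seg == 0:
            continue
        # Fractional position of the perpendicular foot, clamped to the segment
        t = ((point[0] - a[0]) * (b[0] - a[0])
             + (point[1] - a[1]) * (b[1] - a[1])) / seg ** 2
        t = min(1.0, max(0.0, t))
        nearest = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        dist = math.dist(point, nearest)
        # Strict comparison keeps the first result among equidistant candidates
        if best is None or dist < best[1]:
            best = (walked + t * seg, dist)
        walked += seg
    if best is None or best[1] > buffer:
        return None
    along, dist = best
    # Convert the distance along the line to a linearly referenced location
    return beg + (end - beg) * along / total, dist


straight = project_point((5, 3), [(0, 0), (10, 0)], beg=2.0, end=4.0)
bent = project_point((12, 5), [(0, 0), (10, 0), (10, 10)], beg=0.0, end=20.0)
too_far = project_point((5, 300), [(0, 0), (10, 0)], beg=2.0, end=4.0)
```

A point outside the buffer distance yields no projection, which mirrors the role of the buffer parameter as a search limit.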
- property route
- property route_loc
- set_closed(closed=None, inplace=False)
Change whether ranges are closed on left, right, both, or neither side.
Parameters
- closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, optional
Whether intervals are closed on the left-side, right-side, both or neither. If None, will default to ‘left_mod’ for linear events and ‘both’ for point events.
- inplace : boolean, default False
Whether to perform the operation in place on the parent range collection, returning None.
- property shape
- property size
Return the size of the events dataframe.
- property targets
A list of begin, end, and key columns within the events dataframe.
- to_grid(dissolve=False, **kwargs)
Use the events dataframe to create a grid of zero-length, equidistant point events which span the bounds of each event.
Parameters
- length : numerical, default 1.0
A fixed distance between each point on the grid.
- fill : {‘none’, ‘cut’, ‘extend’, ‘right’, ‘balance’}, default ‘cut’
How to fill a gap at the end of an event’s range.
- none : no point will be generated at the end of the input range unless it falls directly on the defined grid distance.
- cut : a point will be generated at the very end of the input range, at a distance less than or equal to the defined grid distance.
- right : the final point will be generated at a distance equal to the defined grid distance, even if this extends beyond the full input range.
- extend : a point will be generated at the very end of the input range, at a distance greater than or equal to the defined grid distance.
- balance : if the final range is greater than or equal to half the target range length, perform the cut method; if it is less, perform the extend method.
- dissolve : bool, default False
Whether to dissolve the events dataframe before performing the transformation.
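One possible reading of the fill options for a single event range is sketched below. The `grid_points` function is hypothetical and not part of linref; in particular, treating 'extend' as replacing the last regular grid point with the end of the range is an interpretation of the description above and may differ from the library's behavior. The sketch assumes end is not less than beg.

```python
import math


def grid_points(beg, end, length=1.0, fill="cut"):
    """Generate grid locations along one event range.

    Hypothetical helper for illustration only; interpretation of the
    fill options is an assumption based on their descriptions.
    """
    # Regular grid points which fit within the range (small float tolerance)
    n = math.floor((end - beg) / length + 1e-9)
    points = [beg + i * length for i in range(n + 1)]
    gap = end - points[-1]
    if gap <= 1e-9 or fill == "none":
        return points
    if fill == "balance":
        # Cut when the leftover gap is at least half the grid distance
        fill = "cut" if gap >= length / 2 else "extend"
    if fill == "cut":
        return points + [end]
    if fill == "right":
        return points + [points[-1] + length]
    if fill == "extend":
        # Move the last regular point to the end of the range
        return (points[:-1] if len(points) > 1 else points) + [end]
    raise ValueError(f"Unknown fill option: {fill!r}")


grids = {f: grid_points(0.0, 2.5, 1.0, f)
         for f in ("none", "cut", "right", "extend", "balance")}
```

For a range from 0.0 to 2.5 with a grid distance of 1.0, the options differ only in how the final half-length gap is handled.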
- to_windows(dissolve=False, endpoint=False, **kwargs)
Use the events dataframe to create sliding window events of a fixed length and a fixed number of steps, and which fill the bounds of each event.
Parameters
- length : numerical, default 1.0
A fixed length for all windows being defined.
- steps : int, default 1
A number of steps per window length. The resulting step length will be equal to length / steps. For non-overlapped windows, use a steps value of 1.
- fill : {‘none’, ‘cut’, ‘extend’, ‘left’, ‘right’, ‘balance’}, default ‘cut’
How to fill a gap at the end of an event’s range.
- none : no window will be generated to fill the gap at the end of the input range.
- cut : a truncated window will be created to fill the gap with a length less than the full window length.
- extend : the final window will be anchored on the grid defined by the step value, extending beyond the window length to the right bound of the event.
- left : the final window will be anchored on the end of the input range and will extend the full window length to the left.
- right : the final window will be anchored on the grid defined by the step value, extending the full window length to the right, beyond the event’s end value.
- balance : if the final range is greater than or equal to half the target range length, perform the cut method; if it is less, perform the extend method.
- dissolve : bool, default False
Whether to dissolve the events dataframe before performing the transformation.
- endpoint : bool, default False
Add a point event at the end of each event range.
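The relationship between window length and steps can be sketched for a single event range as follows. The `windows` function is hypothetical and not part of linref; it covers only the 'none' and 'cut' fill options, and it assumes that 'cut' adds a single truncated window anchored on the next step position.

```python
def windows(beg, end, length=1.0, steps=1, fill="cut"):
    """Generate sliding windows of a fixed length along one event range.

    Hypothetical helper for illustration only; supports fill 'none' and 'cut'.
    """
    if fill not in ("none", "cut"):
        raise ValueError("This sketch only supports fill='none' or 'cut'.")
    # The step length is the window length divided by the number of steps
    step = length / steps
    out = []
    i = 0
    # Add full-length windows while they fit within the range
    while beg + i * step + length <= end + 1e-9:
        start = beg + i * step
        out.append((start, start + length))
        i += 1
    covered = out[-1][1] if out else beg
    if fill == "cut" and end - covered > 1e-9:
        # Truncated window anchored on the next step position
        out.append((beg + i * step, end))
    return out


non_overlapping = windows(0.0, 2.5, length=1.0, steps=1, fill="cut")
overlapping = windows(0.0, 2.0, length=1.0, steps=2, fill="none")
```

With steps=1 the windows sit end to end, while steps=2 produces windows that each overlap the previous one by half a window length.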
- class linref.events.collection.EventsGroup(df, beg=None, end=None, geom=None, closed=None, **kwargs)
Bases: EventsFrame
User-level class for managing linear and point events data. This class is used for simple data sets with only a single group of events. Data is managed using both the pandas tabular data package as well as the ranges range data package.
EventsGroup class instances can be used for a variety of linear referencing operations such as overlays to determine portions of events overlapped by an input range, intersections to determine which events intersect with an input range, length-weighted averages of event column values based on an input range, and more.
Parameters
- df : pd.DataFrame
Pandas dataframe which contains linear or point events data.
- beg, end : str or label
Column labels within the events dataframe which represent the linearly referenced location of each event. For linear events both are required, defining the begin and end location of each event. For point events, only ‘beg’ is required, defining the exact location of each event (the ‘end’ property will automatically be set to be equal to the ‘beg’ property).
- geom : str or label, optional
Column label within the events dataframe which represents the shapely geometry associated with each event if available. If provided, certain additional class functionalities will be made available.
- closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, optional
Whether intervals are closed on the left-side, right-side, both or neither. If None, will default to ‘left_mod’ for linear events and ‘both’ for point events.
- left : ranges are always closed on the left and never closed on the right.
- left_mod : ranges are always closed on the left and only closed on the right when the next range is not consecutive.
- right : ranges are always closed on the right and never closed on the left.
- right_mod : ranges are always closed on the right and only closed on the left when the previous range is not consecutive.
- both : ranges are always closed on both sides.
- neither : ranges are never closed on either side.
- property centers
Centers of all event ranges.
- intersecting(beg=None, end=None, other=None, closed='both', get_mask=False, **kwargs)
Retrieve a selection of records from the group of events based on provided begin and end locations.
Parameters
- beg, end : numerical or array-like, optional
The begin and end locations of the range or ranges to be tested. If a single range is to be tested, provide a numeric value. If multiple, provide an array-like with a single begin and end value for each range. If no end parameter is provided, point locations will be assumed and end will be set equal to beg. Not required if the other parameter is used.
- other : EventsGroup, optional
Other EventsGroup instance to be intersected with this one. Can be provided instead of beg, end, and closed parameters and will take precedence over other input.
- closed : str {‘left’, ‘right’, ‘both’, ‘neither’}, default ‘both’
Whether input interval is closed on the left-side, right-side, both or neither.
- left : ranges are always closed on the left and never closed on the right.
- right : ranges are always closed on the right and never closed on the left.
- both : ranges are always closed on both sides.
- neither : ranges are never closed on either side.
- get_mask : bool, default False
Whether to return a boolean mask for selecting from the events dataframe instead of the selection from the dataframe itself.
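A boolean mask of this kind can be sketched in plain Python. The `intersecting_mask` function below is hypothetical and not the linref implementation; for simplicity it treats the events themselves as closed on both sides and applies the closed option only to the input range.

```python
def intersecting_mask(events, beg, end=None, closed="both"):
    """Return one boolean per event, True where it intersects the input range.

    events is a list of (beg, end) tuples, treated as closed on both sides.
    Hypothetical helper for illustration only.
    """
    if end is None:
        end = beg  # a single location is treated as a point
    left_closed = closed in ("left", "both")
    right_closed = closed in ("right", "both")
    mask = []
    for ev_beg, ev_end in events:
        # The event must reach the input begin and start before the input end
        reaches_beg = ev_end >= beg if left_closed else ev_end > beg
        starts_before_end = ev_beg <= end if right_closed else ev_beg < end
        mask.append(reaches_beg and starts_before_end)
    return mask


events = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
mask_both = intersecting_mask(events, 1.0, 2.0, closed="both")
mask_neither = intersecting_mask(events, 1.0, 2.0, closed="neither")
mask_point = intersecting_mask(events, 0.75, closed="both")
```

Events that only touch the input range at a bound are included or excluded depending on whether that side of the input range is closed.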
- property lengths
Lengths of all event ranges.
- overlay(beg=None, end=None, other=None, **kwargs)
Compute overlap of the input bounds with respect to the events group.
Parameters
- beg, end : scalar or array of scalars
Begin and end locations of the overlaid range(s).
- other : EventsGroup, optional
Other EventsGroup instance to be overlaid with this one. Can be provided instead of beg and end parameters and will take precedence over other input.
- normalize : boolean, default True
Whether overlapping lengths should be normalized by range length to give a proportional result.
- how : {‘right’, ‘left’, ‘sum’}, default ‘right’
How overlapping lengths should be normalized. Only applied when normalize=True.
- right : Normalize overlaps by the length of each provided overlay range.
- left : Normalize overlaps by the length of each of the collection’s ranges being overlaid.
- sum : Normalize overlaps by the sum of the lengths of all overlaps for each provided overlay range. If there are gaps in the collection’s ranges or overlaps between the collection’s ranges, this will allow the sum of the overlaps to still equal 1.0, except where no overlaps occur.
- norm_zero : float, optional
A number to substitute for instances where the normalizing factor (denominator) is equal to zero, e.g., when the overlay range has a length of zero and how=’right’. If not provided, all instances of zero division will return float value 0.0.
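The three normalization modes can be compared with a short sketch for a single overlay range. The `overlay` function below is a hypothetical stand-in written in plain Python, not the linref implementation; it follows the parameter descriptions above, including the fallback of 0.0 when the normalizing factor is zero and norm_zero is not given.

```python
def overlay(events, beg, end, normalize=True, how="right", norm_zero=None):
    """Compute the overlap of the range [beg, end] with each event.

    events is a list of (beg, end) tuples.
    Hypothetical helper for illustration only.
    """
    # Raw overlap length between each event and the overlay range
    overlaps = [max(0.0, min(e_end, end) - max(e_beg, beg))
                for e_beg, e_end in events]
    if not normalize:
        return overlaps
    if how == "right":
        denoms = [end - beg] * len(events)
    elif how == "left":
        denoms = [e_end - e_beg for e_beg, e_end in events]
    elif how == "sum":
        denoms = [sum(overlaps)] * len(events)
    else:
        raise ValueError(f"Unknown how option: {how!r}")
    fallback = 0.0 if norm_zero is None else norm_zero
    return [o / d if d != 0 else fallback for o, d in zip(overlaps, denoms)]


# Events with a gap between 2.0 and 3.0
events = [(0.0, 1.0), (1.0, 2.0), (3.0, 4.0)]
raw = overlay(events, 0.5, 3.5, normalize=False)
by_right = overlay(events, 0.5, 3.5, how="right")
by_left = overlay(events, 0.5, 3.5, how="left")
by_sum = overlay(events, 0.5, 3.5, how="sum")
```

Because of the gap in the events, the 'right' weights sum to less than 1.0, whereas the 'sum' weights still add up to 1.0.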
- overlay_average(beg=None, end=None, cols=None, weighted=True, zeroweight=None, how='right', weights=None, suffix='_average', **kwargs)
Compute the weighted average of a selection of events columns based on the overlap of the input bounds with respect to linear events.
Parameters
- beg : float
Beginning milepost of the overlaid segment.
- end : float
Ending milepost of the overlaid segment.
- cols : list
List of column labels to aggregate.
- weighted : boolean, default True
Whether the computed average should be weighted. If False, an un-weighted average will be computed, giving all intersecting values an equal weight.
- zeroweight : default None
If weights sum to zero, how to compute average. If None, an un-weighted average will be computed. Else, no average will be computed and the input value will be returned instead.
- how : {‘right’, ‘left’, ‘sum’}, default ‘right’
How overlapping lengths should be normalized. Only applied when normalize=True.
- right : Normalize overlaps by the length of each provided overlay range.
- left : Normalize overlaps by the length of each of the collection’s event ranges.
- sum : Normalize overlaps by the sum of the lengths of all overlaps for each provided overlay range. If there are gaps in the collection’s event ranges or overlaps between the collection’s ranges, this will allow the sum of the overlaps to still equal 1.0, except where no overlaps occur.
- weights : np.ndarray
An array of length-normalized overlay weights; if excluded, weights will be computed based on given mileposts and parameters; if multiple overlay computations are being conducted, computing the weights separately and then inputting them directly into the aggregation functions will produce time savings.
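A length-weighted average for a single column can be sketched as follows. This is a simplified illustration and not the linref implementation: the `overlay_average` function is hypothetical, events are treated as closed on both sides when testing for intersection, and the weights are normalized by their own sum.

```python
def overlay_average(events, values, beg, end, weighted=True, zeroweight=None):
    """Average values of events intersecting [beg, end], weighted by overlap.

    events is a list of (beg, end) tuples and values holds one number per event.
    Hypothetical helper for illustration only.
    """
    pairs = []
    for (e_beg, e_end), value in zip(events, values):
        if e_end >= beg and e_beg <= end:
            overlap = max(0.0, min(e_end, end) - max(e_beg, beg))
            # Un-weighted averages give every intersecting value equal weight
            pairs.append((overlap if weighted else 1.0, value))
    if not pairs:
        return None
    total = sum(w for w, _ in pairs)
    if total == 0:
        if zeroweight is not None:
            return zeroweight
        # Fall back to an un-weighted average when weights sum to zero
        return sum(v for _, v in pairs) / len(pairs)
    return sum(w * v for w, v in pairs) / total


events = [(0.0, 1.0), (1.0, 2.0)]
speed_limits = [30, 50]
avg_weighted = overlay_average(events, speed_limits, 0.5, 2.0)
avg_plain = overlay_average(events, speed_limits, 0.5, 2.0, weighted=False)
avg_point = overlay_average(events, speed_limits, 1.0, 1.0)
```

The weighted result leans toward the second event because it contributes twice the overlap length of the first, and the zero-length query at 1.0 falls back to a plain average of the two touching events.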
- overlay_most(beg=None, end=None, cols=None, weights=None, suffix='_most', **kwargs)
Compute the most represented values of a selection of events columns based on the overlap of the input bounds with respect to route events.
Parameters
- beg : float
Beginning milepost of the overlaid segment.
- end : float
Ending milepost of the overlaid segment.
- cols : list
List of column labels to aggregate.
- weights : pd.Series
A series of length-normalized overlay weights; if excluded, weights will be computed based on given mileposts and parameters; if multiple overlay computations are being conducted, computing the weights separately and then inputting them directly into the aggregation functions will produce time savings.
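Selecting the most represented value amounts to summing overlap lengths per distinct value and taking the largest total. The sketch below shows this for a single column; the `overlay_most` function here is a hypothetical plain-Python stand-in and not the linref implementation.

```python
from collections import defaultdict


def overlay_most(events, values, beg, end):
    """Return the value covering the greatest overlapped length in [beg, end].

    events is a list of (beg, end) tuples and values holds one value per event.
    Hypothetical helper for illustration only.
    """
    totals = defaultdict(float)
    for (e_beg, e_end), value in zip(events, values):
        overlap = max(0.0, min(e_end, end) - max(e_beg, beg))
        if overlap > 0:
            # Accumulate overlap length per distinct value
            totals[value] += overlap
    if not totals:
        return None
    return max(totals, key=totals.get)


events = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
surface = ["A", "B", "A"]
most_wide = overlay_most(events, surface, 0.5, 3.0)
most_narrow = overlay_most(events, surface, 0.75, 2.0)
```

Over the wider range, value 'A' accumulates 1.5 units across two separate events and outweighs 'B', while over the narrower range 'B' dominates.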
- overlay_sum(beg=None, end=None, cols=None, weighted=True, weights=None, suffix='_sum', **kwargs)
Compute the weighted sum of a selection of events columns based on the overlap of the input bounds with respect to route events.
Parameters
- beg : float
Beginning milepost of the overlaid segment.
- end : float
Ending milepost of the overlaid segment.
- cols : list
List of column labels to aggregate.
- weighted : boolean, default True
Whether the computed sum should be weighted. If False, an un-weighted sum will be computed, giving all intersecting values an equal weight.
- weights : np.ndarray
An array of length-normalized overlay weights; if excluded, weights will be computed based on given mileposts and parameters; if multiple overlay computations are being conducted, computing the weights separately and then inputting them directly into the aggregation functions will produce time savings.
- property rng
- set_closed(closed, inplace=False)[source]
Change whether ranges are closed on left, right, both, or neither side.
Parameters
- closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, default ‘left’
Whether intervals are closed on the left-side, right-side, both or neither.
- left : ranges are always closed on the left and never closed on the right.
- left_mod : ranges are always closed on the left and only closed on the right when the next range is not consecutive.
- right : ranges are always closed on the right and never closed on the left.
- right_mod : ranges are always closed on the right and only closed on the left when the previous range is not consecutive.
- both : ranges are always closed on both sides.
- neither : ranges are never closed on either side.
- inplace : boolean, default False
Whether to perform the operation in place on the parent range collection, returning None.
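The six closed options can be made concrete with a small membership test. The sketch below is written in plain Python under stated assumptions: ranges are (beg, end) tuples sorted by beg, and two ranges count as "consecutive" when they share a boundary value. The function name is hypothetical and it is not part of linref.

```python
def contains(ranges, i, x, closed="left"):
    """Test whether location x falls within ranges[i] under a closed setting.

    ranges: list of (beg, end) tuples sorted by beg (hypothetical layout).
    """
    beg, end = ranges[i]
    # Assumption: a neighbour is consecutive when it shares the boundary value.
    next_consecutive = i + 1 < len(ranges) and ranges[i + 1][0] == end
    prev_consecutive = i > 0 and ranges[i - 1][1] == beg
    left_closed = {
        "left": True, "left_mod": True, "both": True,
        "right": False, "right_mod": not prev_consecutive, "neither": False,
    }[closed]
    right_closed = {
        "right": True, "right_mod": True, "both": True,
        "left": False, "left_mod": not next_consecutive, "neither": False,
    }[closed]
    above = x >= beg if left_closed else x > beg
    below = x <= end if right_closed else x < end
    return above and below


ranges = [(0.0, 1.0), (1.0, 2.0), (3.0, 4.0)]
```

With these ranges, location 2.0 belongs to the second range under ‘left_mod’ (the next range starts at 3.0, so it is not consecutive) but not under ‘left’.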
- property shape
- class linref.events.collection.EventsLog(**kwargs)[source]
Bases: object
High-level class for logging and managing child EventsGroups created within the context of a parent EventsCollection class instance.
- property data
- property keys
- linref.events.collection.check_compatibility(objs, errors='raise', **kwargs)[source]
Check if the input list of EventsCollections are all compatible for merging, unifying, or similar relational processes. Errors will be raised if objects are not found to be compatible with information about why they are not compatible. If requested, errors can be ignored, returning False instead. If all objects are compatible, the function will return True.
Parameters
- objs : list-like of EventsCollections
List of EventsCollection objects to be tested against each other.
- errors : {‘raise’, ‘ignore’}
How to respond to errors when they arise.
- linref.events.collection.from_standard(df, require_end=False, **kwargs)[source]
Create an EventsCollection from the input dataframe assuming standard column labels. These standard labels can be changed directly on the class by modifying the associated class attributes:
- default_keys
- default_beg
- default_end
- default_geom
Standard labels include:
- keys : ‘RID’, ‘YEAR’, ‘KEY’
- beg : ‘BMP’, ‘BEG’, ‘FROM’
- end : ‘EMP’, ‘END’, ‘TO’
- geom : ‘geometry’
Additional constructor keyword arguments can be passed through **kwargs.
Parameters
- df : pd.DataFrame
Pandas dataframe which contains linear or point events data, formatted with standard labels. If multiple keys are detected, they will be assigned in the order in which they appear within the target dataframe. Only one of each begin and end option may be used. The geometry label is optional.
- require_end : bool, default False
Whether to raise an error if no valid unique end column label is found. If False, no end label will be used when generating the collection.
- **kwargs
Additional keyword arguments to be passed to the EventsCollection constructor.
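The label detection described above can be sketched in plain Python. This is a rough approximation of the documented behavior, not the library's code: the helper name `detect_labels`, the `STANDARD` mapping, and the exception type and messages are assumptions.

```python
# Standard labels as listed in the documentation above.
STANDARD = {
    "keys": ["RID", "YEAR", "KEY"],
    "beg": ["BMP", "BEG", "FROM"],
    "end": ["EMP", "END", "TO"],
    "geom": ["geometry"],
}


def detect_labels(columns, require_end=False):
    """Pick standard key, begin, end and geometry labels from column names."""
    columns = list(columns)
    # Keys are assigned in the order they appear in the dataframe columns.
    keys = [c for c in columns if c in STANDARD["keys"]]
    begs = [c for c in columns if c in STANDARD["beg"]]
    ends = [c for c in columns if c in STANDARD["end"]]
    if len(begs) != 1:
        raise ValueError("exactly one begin label is required")
    if len(ends) != 1:
        if require_end:
            raise ValueError("no valid unique end label found")
        end = None  # assumption: fall back to point events without an end
    else:
        end = ends[0]
    geom = "geometry" if "geometry" in columns else None
    return {"keys": keys, "beg": begs[0], "end": end, "geom": geom}


labels = detect_labels(["YEAR", "RID", "BMP", "EMP", "AADT"])
```

In this example the keys come back as ['YEAR', 'RID'], following column order rather than the order of the standard label list.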
linref.events.merge module
Module featuring classes and functionality for merging events collections and summarizing/retrieving information from these merges. For ease of use, features in this module should be accessed through collection-level merging methods such as EventsCollection.merge instead of abstractly through the classes themselves.
Classes
EventsMerge, EventsMergeAttribute, EventsMergeTrace
Dependencies
pandas, numpy, rangel, copy, warnings, functools
Development
Developed by: Tariq Shihadah, tariq.shihadah@gmail.com
Created: 10/1/2021
Modified: 10/1/2021
- class linref.events.merge.EventsMerge(left, right)[source]
Bases: object
High-level object class for managing merges between two events collections and summarizing/retrieving information from these merges. Generated through collection-level merging methods such as EventsCollection.merge.
- any(**kwargs)[source]
Indicate whether each record intersects with at least one event.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- build(inplace=True)[source]
Perform intersects and overlays to produce EventsMergeTrace objects for aggregation.
- property columns
- copy(deep=False)[source]
Create an exact copy of the events class instance.
Parameters
- deep : bool, default False
Whether the created copy should be a deep copy.
- count(**kwargs)[source]
Count the number of intersecting events.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- cut(**kwargs)[source]
Cut intersecting event routes at the intersecting begin and end locations, returning the resulting route’s geometry or the route itself if requested.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there are no intersecting events and aggregation cannot be performed. If None, values will be filled with np.nan.
- return_mls : bool, default True
Whether to return the MultiLineString associated with each cut MLRRoute instead of the route itself.
- distribute(column=None, squeeze=True, **kwargs)[source]
Intersect and distribute events over the range collection, scaling their values relative to their indexed distance from their intersecting range location.
Parameters
- column : pandas column label or list of same, optional
The events dataframe column(s) containing the values associated with each event being analyzed. If not provided, all values will default to 1.
- blur_size : int, default 0
The number of pixels to blur events across based on the blur style.
- blur_style : str or callable, default ‘linear’
The scaling function to be called at each blurring step to scale original values. If a callable is provided, it must accept a single integer input for the zero-indexed pixel number, returning a single float scaling value. Predefined blurring functions can be called using the following labels:
- linear : linearly scale down values from the original value to zero at the first index outside the blurred pixel range
- norm_static : Tk
- norm_scale : Tk
- none : do not scale down original values
- length_normalize : bool, default True
Normalize the intersection scores by the length of the range to account for differing range lengths.
Created: 2022-10-04
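A custom blur_style callable takes a zero-indexed pixel number and returns a float scaling value. The sketch below builds one such callable with linear falloff. It is an assumption-heavy illustration: it treats pixel 0 as the first blurred pixel next to the original value and reaches zero at index blur_size, and it has not been checked against the library's predefined ‘linear’ style. The factory name is hypothetical.

```python
def make_linear_blur(blur_size):
    """Build a scaling callable with linear falloff (hypothetical helper).

    Assumption: pixel 0 is the first blurred pixel beside the original value,
    and the scale reaches zero at index blur_size, the first index outside
    the blurred pixel range.
    """
    def scale(pixel):
        return max(0.0, 1.0 - (pixel + 1) / (blur_size + 1))
    return scale


linear = make_linear_blur(3)
factors = [linear(i) for i in range(4)]
```

For a blur size of 3 this yields scaling factors of 0.75, 0.5 and 0.25 for the three blurred pixels and 0.0 for the first pixel beyond them.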
- interpolate(**kwargs)[source]
Interpolate along intersecting event routes at the intersecting location (or begin point for linear events), returning the resulting interpolated point geometry.
Parameters
- snap : {None, ‘near’, ‘left’, ‘right’}, default None
If the event location does not fall within any geometry, snap to the nearest match based on distance, choosing the closest location to the left, right, or the nearest side (‘near’). If None, a value error will be raised when no intersecting ranges are found.
- point : {‘begs’, ‘ends’, ‘centers’}, default ‘begs’
Where on the intersecting events the point should be made, at the begin, end, or center point of the range.
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there are no intersecting events and aggregation cannot be performed. If None, values will be filled with np.nan.
- property keys
- property left
- property num_keys
- property right
- property traces
- class linref.events.merge.EventsMergeAttribute(parent, column)[source]
Bases: object
- all(empty=None, **kwargs)[source]
Return all values from intersecting events in a list.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- any(empty=None, **kwargs)[source]
Indicate whether each record intersects with at least one event.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- property column
- count(empty=None)[source]
Return the count of all intersected event values.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- cut(empty=None, return_mls=True)[source]
Cut intersecting event routes at the intersecting begin and end locations, returning the resulting route’s geometry or the route itself if requested.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- return_mls : bool, default True
Whether to return the MultiLineString associated with each cut MLRRoute instead of the route itself.
- first(empty=None)[source]
Return the first event value according to the order of the provided collection’s events dataframe.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- interpolate(snap=None, point='begs', empty=None, **kwargs)[source]
Interpolate along intersecting event routes at the intersecting location (or begin point for linear events), returning the resulting interpolated point geometry.
Parameters
- snap : {None, ‘near’, ‘left’, ‘right’}, default None
If the event location does not fall within any geometry, snap to the nearest match based on distance, choosing the closest location to the left, right, or the nearest side (‘near’). If None, a value error will be raised when no intersecting ranges are found.
- point : {‘begs’, ‘ends’, ‘centers’}, default ‘begs’
Where on the intersecting events the point should be made, at the begin, end, or center point of the range.
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- last(empty=None)[source]
Return the last event value according to the order of the provided collection’s events dataframe.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- property loc
- mean(empty=None, weighted=True, dropna=False)[source]
Return an overlay length-weighted average of all event values. An unweighted simple average can also be computed if weighted=False.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- weighted : boolean, default True
Whether the computed average should be weighted. If False, an un-weighted average will be computed, giving all intersecting values an equal weight.
- dropna : boolean, default False
Whether to drop np.nan values before aggregating.
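The relationship between the weighted and un-weighted averages, and the effect of dropna, can be illustrated with a short stand-alone sketch. The function name and argument layout are hypothetical and the code does not call linref.

```python
import math


def overlay_mean(values, lengths, weighted=True, dropna=False):
    """Average values, optionally weighted by overlay length.

    values, lengths: equal-length sequences of floats (hypothetical layout).
    """
    pairs = list(zip(values, lengths))
    if dropna:
        # Discard missing values before aggregating.
        pairs = [(v, w) for v, w in pairs if not math.isnan(v)]
    if not weighted:
        # Give every intersecting value an equal weight.
        pairs = [(v, 1.0) for v, _ in pairs]
    total = sum(w for _, w in pairs)
    if total == 0:
        return math.nan
    return sum(v * w for v, w in pairs) / total


values = [40.0, 60.0]
lengths = [3.0, 1.0]
weighted_mean = overlay_mean(values, lengths)
simple_mean = overlay_mean(values, lengths, weighted=False)
```

The weighted result is (40 * 3 + 60 * 1) / 4 = 45.0, whereas the simple average of the two values is 50.0.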
- mode(empty=None)[source]
Return the most frequent unique event value.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- most(empty=None, dropna=True)[source]
Return the event value associated with the greatest total overlay length, ignoring missing values by default.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- dropna : boolean, default True
Whether to drop np.nan values in intersecting events before aggregating.
- property ncols
- property ndim
- property parent
- sum(empty=None, nansum=False)[source]
Return the sum of all intersected event values.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- sumproduct(empty=None, normalized=False, dropna=False)[source]
Return the sum of all event values multiplied by the weights of the intersecting events. If normalized=False, the event values will be multiplied by the actual overlapping length (e.g., multiplying a per-mile value by the miles of overlap). If normalized=True, the event values will be multiplied by the normalized overlapping length (e.g., multiplying a total value of an overlapped event by the proportion of the event which is overlapped).
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- normalized : boolean, default False
Whether the weights of the intersecting events being multiplied with the event values should be normalized by the total length of the events being intersected.
- dropna : boolean, default False
Whether to drop np.nan values before aggregating.
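The two modes of sumproduct correspond to two different kinds of input values, as the following sketch shows. It is a plain-Python illustration with a hypothetical function name and made-up numbers, not the library's implementation.

```python
def sumproduct_sketch(values, overlaps, event_lengths, normalized=False):
    """Sum of values times overlap weights.

    overlaps: overlapped length of each intersecting event.
    event_lengths: full length of each intersecting event.
    """
    total = 0.0
    for value, overlap, length in zip(values, overlaps, event_lengths):
        # Normalized weights are the overlapped proportion of each event.
        weight = overlap / length if normalized else overlap
        total += value * weight
    return total


overlaps = [0.5, 1.0]
event_lengths = [2.0, 1.0]

# Per-mile values multiplied by the miles of overlap.
per_mile = [2.0, 4.0]
expected_count = sumproduct_sketch(per_mile, overlaps, event_lengths)

# Event totals multiplied by the proportion of each event overlapped.
event_totals = [100.0, 40.0]
apportioned = sumproduct_sketch(event_totals, overlaps, event_lengths, normalized=True)
```

The first call gives 2 * 0.5 + 4 * 1.0 = 5.0 and the second gives 100 * 0.25 + 40 * 1.0 = 65.0.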
- property traces
- unique(empty=None, **kwargs)[source]
Return all unique values from intersecting events in a tuple.
Parameters
- empty : scalar, string, or other pd.Series-compatible value, optional
Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.
- class linref.events.merge.EventsMergeTrace(group_left=None, group_right=None, key=None, mask=None, weights=None, success=True)[source]
Bases: object
Object class for managing data on the relationship between two events collections that have been merged using the EventsMerge system. Traces contain a few main elements:
- group_left, group_right : pointers to the left and right events groups that are related. During aggregation, information in the right group will be aggregated and formed to the dataframe underlying the left group.
- key : the unique key associated with both events groups that produces their relationship.
- mask : a boolean array of shape (group_left.df.shape[0], group_right.df.shape[0]), i.e., a number of rows equal to the number of rows in the left events group and a number of columns equal to the number of rows in the right events group. This mask defines all instances where the left and right groups intersect based on their defined ranges and closed parameters.
- weights : a numeric array of shape (group_left.df.shape[0], group_right.df.shape[0]), i.e., a number of rows equal to the number of rows in the left events group and a number of columns equal to the number of rows in the right events group. This array defines the actual numeric length that is overlapped between the individual events in the left and right events groups.
- success : a boolean indicator of whether or not a valid relationship has been discovered between the left group and any right group. When False, no right group will be indicated.
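The shape and meaning of the mask and weights arrays can be illustrated with nested lists in plain Python. This sketch is a simplification under stated assumptions: it ignores the closed parameter and marks an intersection only where the overlap has positive length, so events that merely touch at a point are not flagged. The function name is hypothetical.

```python
def build_trace(left, right):
    """Build mask and weights matrices for two lists of (beg, end) ranges.

    Rows correspond to left ranges and columns to right ranges.
    Assumption: only overlaps with positive length count as intersections.
    """
    mask, weights = [], []
    for l_beg, l_end in left:
        mask_row, weight_row = [], []
        for r_beg, r_end in right:
            length = max(0.0, min(l_end, r_end) - max(l_beg, r_beg))
            weight_row.append(length)
            mask_row.append(length > 0)
        mask.append(mask_row)
        weights.append(weight_row)
    return mask, weights


left = [(0.0, 2.0), (2.0, 5.0)]
right = [(0.0, 1.0), (1.0, 3.0), (3.0, 4.0)]
mask, weights = build_trace(left, right)
```

With two left events and three right events, both results have two rows and three columns; the middle right event overlaps each left event by 1.0.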
- linref.events.merge.get_mode(arr)[source]
Select the item from the input array which appears most frequently.
Parameters
- arr : array-like
Array with target values
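A comparable mode selection can be written with the standard library alone. This is a stand-in sketch, not the function's actual code, and the source does not state how ties are resolved; here they go to the value encountered first.

```python
from collections import Counter


def get_mode_sketch(arr):
    """Return the most frequent item in arr.

    Assumption: ties go to the item encountered first, which is how
    Counter.most_common orders equal counts.
    """
    return Counter(arr).most_common(1)[0][0]


mode = get_mode_sketch([3, 1, 3, 2, 1, 3])
```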
- linref.events.merge.get_most(arr, weights)[source]
Select the item from the input 1D array which is associated with the highest total weight from each row in the 2D weights array. Scores are computed by summing the weights for each unique array value for each row of weights. When multiple values are tied, the first item in sorted order will be selected.
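The scoring described above can be sketched in plain Python as follows. The function name is hypothetical and nested lists stand in for the 1D and 2D arrays; this is an illustration of the described logic rather than the library's implementation.

```python
def get_most_sketch(arr, weights):
    """For each row of weights, pick the value of arr with the highest total.

    arr: sequence of n values; weights: rows of n weights each.
    """
    uniques = sorted(set(arr))
    result = []
    for row in weights:
        scores = {u: 0.0 for u in uniques}
        for value, weight in zip(arr, row):
            scores[value] += weight
        # max() keeps the first maximal item, so ties go to sorted order.
        result.append(max(uniques, key=lambda u: scores[u]))
    return result


arr = ["A", "B", "A"]
weights = [
    [0.2, 0.5, 0.4],    # A scores higher than B
    [0.1, 0.3, 0.1],    # B scores higher than A
    [0.25, 0.5, 0.25],  # A and B tie at 0.5
]
selected = get_most_sketch(arr, weights)
```

The third row shows the tie rule: "A" and "B" both total 0.5, and "A" is chosen because it comes first in sorted order.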
linref.events.spatial module
Module featuring classes and functionality for spatial analysis of events data including parallel projection.
Classes
ParallelProjector
Dependencies
pandas, numpy, copy, warnings, functools
Development
Developed by: Tariq Shihadah, tariq.shihadah@gmail.com
Created: 3/3/2022
Modified: 3/3/2022
- class linref.events.spatial.ParallelProjector(target, other, samples=3, buffer=100)[source]
Bases: object
Experimental class for performing projections of linear geometries onto linear events collections.
- property buffer
- property projectors
- property sample_locs
- property sample_points
- property samples
linref.events.union module
Module featuring classes and functionality for unifying events collections.
Classes
EventsUnion
Dependencies
pandas, numpy, copy, warnings, functools
Development
Developed by: Tariq Shihadah, tariq.shihadah@gmail.com
Created: 4/13/2022
Modified: 4/13/2022
- class linref.events.union.EventsUnion(objs, **kwargs)[source]
Bases: object
Parameters
- objs : list-like of EventsCollection instances
A selection of EventsCollection object instances to be combined into a single instance based on the input parameters.
- **kwargs
Keyword arguments to be passed to the initialization function for the new EventsCollection instance.
- get_groups(keys, empty=True)[source]
Retrieve unique groups of events from each related collection based on provided key values.
Parameters
- keys : key value, tuple of key values, or list of the same
If only one key column is defined within the collections, a single column value may be provided. Otherwise, a tuple of column values must be provided in the same order as they appear in self.keys. To get multiple groups, a list of key values or tuples may be provided.
- empty : bool, default True
Whether to allow for empty events groups to be returned when the provided keys are valid but are not associated with any actual events. If False, these cases will raise a KeyError.
- property group_keys_unique
- property num_keys
- property num_objs
- property objs
- union(fill_gaps=False, get_index=True, merge=False, suffixes=None, **kwargs)[source]
Combine multiple EventsCollection instances into a single instance, creating least common intervals among all collections and maintaining all event attributes. The resulting combined events will be used to create and return an EventsCollection modeled after the first indexed collection in self.objs.
Parameters
- fill_gaps : bool, default False
Whether to fill gaps in the merged collection with empty events. These events would not be associated with any parent collection and would not be populated with any events attributes.
- get_index : bool, default True
Whether to produce columns relating each new record to the index of the originating record in the input events dataframes. When this is not necessary, setting to False may produce significant time savings.
- merge : bool, default False
Whether to merge columns from each original dataframe to the newly created resegmented events collection dataframe. If not done during the union, it can be done later by merging on the new ‘index_i’ columns which correlate with the indices of the original dataframes. To perform this merge manually, the get_index parameter should be True.
- suffixes : list-like, default [‘_0’, …, ‘_n’]
Sequence of length equal to the number of events collections being unified, where each element is a string indicating the suffix to add to overlapping column names in each corresponding events dataframe. All entries must be unique.
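The notion of least common intervals can be illustrated with a minimal sketch for a single key. It is not the library's algorithm: it assumes the ranges within each collection do not overlap one another, uses plain (beg, end) tuples, and the function name is hypothetical. The per-collection index list loosely mirrors the role of the ‘index_i’ columns.

```python
def least_common_intervals(collections, fill_gaps=False):
    """Split several lists of (beg, end) ranges at every shared break point.

    Returns (beg, end, index) tuples, where index holds, per collection, the
    position of the covering range or None when the collection has no range
    there. Assumption: ranges within one collection do not overlap.
    """
    breaks = sorted({x for ranges in collections for rng in ranges for x in rng})
    result = []
    for beg, end in zip(breaks[:-1], breaks[1:]):
        index = [
            next((i for i, (b, e) in enumerate(ranges) if b <= beg and end <= e), None)
            for ranges in collections
        ]
        # Gaps covered by no collection are kept only when requested.
        if fill_gaps or any(i is not None for i in index):
            result.append((beg, end, index))
    return result


a = [(0.0, 2.0), (2.0, 4.0)]
b = [(1.0, 3.0), (5.0, 6.0)]
segments = least_common_intervals([a, b])
```

Here the interval from 4.0 to 5.0 is covered by neither collection, so it appears in the result only when fill_gaps=True.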