linref.events package

Submodules

linref.events.collection module


Module featuring EventsCollection and EventsGroup object classes for the management of linear referencing events data and optimized performance of various events operations including dissolves, automated intersections and attribute retrievals, linear overlays, and more.

EventsCollection class instances represent complex events data sets with multiple groups of events which are distinguished by at least one set of keys (e.g., years of data or inventory categories). These collections can be used for a variety of linear referencing operations and events manipulations, such as dissolves based on a subset of events columns, returning a simplified data set with a selection of columns aggregated. Additionally, these collections can be used to perform automated merges and intersections with other EventsCollection class instances using the .merge() method, retrieving column data from another collection and relating it to the original collection’s events data.

EventsGroup class instances represent simple events data sets with a single group of contiguous events. These groups can be used for a variety of linear referencing operations such as overlays to determine portions of events overlapped by an input range, intersections to determine which events intersect with an input range, length-weighted averages of event column values based on an input range, and more.

EventsCollection class instances can be queried using the .get_subset() and .get_group() methods, returning a pared-down EventsCollection or a specific EventsGroup, respectively. The same queries can be made using square bracket indexing: passing a mixture of unique key values and valid slices of key values returns a subset of the collection as an EventsCollection instance, while passing only unique key values returns a unique group as an EventsGroup instance.

Classes

EventsCollection, EventsGroup

Dependencies

pandas, geopandas, numpy, shapely, copy, warnings, rangel

Examples

Create an events collection for a sample roadway events dataframe with a unique route identifier in the ‘Route’ column and data for multiple years in the ‘Year’ column. The begin and end mile points are defined by the ‘Begin’ and ‘End’ columns.

>>> ec = EventsCollection(df, keys=['Route', 'Year'], beg='Begin', end='End')

To select events from a specific route and a specific year, index on all keys, producing an EventsGroup.

>>> eg = ec['Route 50', 2018]

To select events on all routes but only those from a specific year, index on only some of the keys, producing an EventsCollection.

>>> ec_2018 = ec[:, 2018]

To retrieve information from another events collection and apply it to the events of this one, use the merge() method, which produces an EventsMerge instance.

>>> ec.merge(other)

To get all events which intersect with a numeric range, use the intersecting() method on an EventsGroup instance.

>>> df_intersecting = eg.intersecting(0.5, 1.5, closed='left_mod')

The intersecting() method can also be used for point locations by omitting the end location argument.

>>> df_intersecting = eg.intersecting(0.75, closed='both')

The length-weighted average of one or more attributes can be obtained using the overlay_average() method.

>>> df_overlay = eg.overlay_average(0.5, 1.5, cols=['Speed_Limit', 'Volume'])

If the events include information on the roadway speed limit and number of lanes, they can be dissolved on these attributes. During the dissolve, other attributes can be aggregated, providing a list of associated values or performing an aggregation function over these values.

>>> ec_dissolved = ec.dissolve(attr=['Speed_Limit', 'Lanes'], aggs=['County'])

Development

Developed by: Tariq Shihadah, tariq.shihadah@gmail.com

Created: 10/22/2019

Modified: 3/3/2021


class linref.events.collection.EventsCollection(df, keys=None, beg=None, end=None, geom=None, closed=None, sort=False, missing_data='warn', **kwargs)

Bases: EventsFrame

User-level class for managing linear and point events data. This class is used for complex data sets with multiple groups of events, grouped by at least one key column (e.g., route ID). Data is managed using both the pandas tabular data package and the rangel range data package.

EventsCollection class instances represent complex events data sets with multiple groups of events which are distinguished by at least one set of keys (e.g., years of data or inventory categories). These collections can be used for a variety of linear referencing operations and events manipulations, such as dissolves based on a subset of events columns, returning a simplified data set with a selection of columns aggregated. Additionally, these collections can be used to perform automated intersections with another EventsCollection class instance using the merge() method, retrieving column data from another collection and relating it to the original collection’s events data.

EventsCollection class instances can be queried using the get_subset() and get_group() methods, returning a pared down EventsCollection or a specific EventsGroup, respectively. Similarly, this can be done using object indexing, passing a mixture of unique values and valid slices of unique key values to return a subset of the collection as an EventsCollection instance, or just unique key values to return a unique group as an EventsGroup instance.

Parameters

df : pd.DataFrame

Pandas dataframe which contains linear or point events data.

keys : list or tuple

A list or tuple of dataframe column labels which define the unique groups of events within the events dataframe. Common examples include year or route ID columns which distinguish unrelated sets of events within the events dataframe.

beg, end : str or label

Column labels within the events dataframe which represent the linearly referenced location of each event. For linear events both are required, defining the begin and end location of each event. For point events, only ‘beg’ is required, defining the exact location of each event (the ‘end’ property will automatically be set to be equal to the ‘beg’ property).

geom : str or label, optional

Column label within the events dataframe which represents the shapely geometry associated with each event if available. If provided, certain additional class functionalities will be made available.

closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, optional

Whether intervals are closed on the left side, right side, both, or neither. If None, will default to ‘left_mod’ for linear events and ‘both’ for point events.

left : ranges are always closed on the left and never closed on the right.
left_mod : ranges are always closed on the left and only closed on the right when the next range is not consecutive.
right : ranges are always closed on the right and never closed on the left.
right_mod : ranges are always closed on the right and only closed on the left when the previous range is not consecutive.
both : ranges are always closed on both sides.
neither : ranges are never closed on either side.

sort : bool, default False

Whether to sort the events dataframe by its keys and begin and end values upon its creation.

missing_data : {‘ignore’, ‘drop’, ‘warn’, ‘raise’}, default ‘warn’

What to do when the input dataframe contains missing values in the target key, beg, and end columns.

ignore : do nothing.
drop : drop all records which contain any missing data in the target columns.
warn : log a warning when records are missing data.
raise : raise a ValueError when records are missing data.
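The four missing_data options can be pictured with the minimal sketch below. It is not linref's implementation: it assumes records are plain dictionaries, treats None and float NaN as missing, and uses a hypothetical helper named handle_missing(). The 'warn' branch uses the standard warnings module, which the package lists among its dependencies.

```python
import math
import warnings


def is_missing(value):
    # Assumption for this sketch: None and float NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def handle_missing(records, targets, missing_data="warn"):
    """Hypothetical helper applying a missing_data option to a list of dict records."""
    if missing_data not in ("ignore", "drop", "warn", "raise"):
        raise ValueError(f"Invalid missing_data option: {missing_data!r}")
    # Flag each record that lacks a value in any target (key, beg, end) column.
    flags = [any(is_missing(r.get(c)) for c in targets) for r in records]
    n_missing = sum(flags)
    if n_missing == 0 or missing_data == "ignore":
        return list(records)
    if missing_data == "drop":
        return [r for r, flagged in zip(records, flags) if not flagged]
    message = f"{n_missing} record(s) missing data in columns {targets}"
    if missing_data == "raise":
        raise ValueError(message)
    warnings.warn(message)  # 'warn': keep every record but report the problem
    return list(records)
```

For example, with one complete record and one record whose ‘BMP’ value is NaN, ‘drop’ keeps only the first record, while ‘ignore’ and ‘warn’ keep both.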

from_similar(df, **kwargs)

Create an EventsCollection from the input dataframe, assuming the same column labels and closed parameter as the calling collection. Additional constructor keyword arguments can be passed through **kwargs.

Parameters

df : pd.DataFrame

Pandas dataframe which contains linear or point events data, formatted with the same column labels as the calling collection.

**kwargs

Additional keyword arguments to be passed to the EventsCollection constructor.

classmethod from_standard(df, require_end=False, **kwargs)

Create an EventsCollection from the input dataframe assuming standard column labels. These standard labels can be modified directly on the class by modifying the associated class attributes: default_keys, default_beg, default_end, and default_geom.

Standard labels include:

keys : ‘RID’, ‘YEAR’, ‘KEY’
beg : ‘BMP’, ‘BEG’, ‘FROM’, ‘LOC’
end : ‘EMP’, ‘END’, ‘TO’
geom : ‘geometry’

Additional constructor keyword arguments can be passed through **kwargs.

Parameters

df : pd.DataFrame

Pandas dataframe which contains linear or point events data, formatted with standard labels. If multiple keys are detected, they will be assigned in the order in which they appear within the target dataframe. Only one of each begin and end option may be used. The geometry label is optional.

require_end : bool, default False

Whether to raise an error if no valid unique end column label is found. If False, no end label will be used when generating the collection.

**kwargs

Additional keyword arguments to be passed to the EventsCollection constructor.
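The label-detection rules described for from_standard() can be sketched as follows. The label lists are copied from the default_keys, default_beg, default_end, and default_geom class attributes listed under EventsFrame; the helper name detect_standard_labels() and its exact error rules are assumptions made for illustration, not linref's code.

```python
# Standard labels copied from the EventsFrame default_* class attributes.
DEFAULT_KEYS = ["RID", "YEAR", "KEY"]
DEFAULT_BEG = ["BMP", "BEG", "FROM", "LOC"]
DEFAULT_END = ["EMP", "END", "TO"]
DEFAULT_GEOM = ["geometry"]


def detect_standard_labels(columns, require_end=False):
    """Hypothetical helper: pick keys, beg, end, and geom labels from column names."""
    columns = list(columns)
    # Keys keep the order in which they appear in the dataframe.
    keys = [c for c in columns if c in DEFAULT_KEYS]
    begs = [c for c in columns if c in DEFAULT_BEG]
    ends = [c for c in columns if c in DEFAULT_END]
    geoms = [c for c in columns if c in DEFAULT_GEOM]
    if len(begs) != 1:
        raise ValueError(f"Expected exactly one standard begin label, found {begs}")
    if len(ends) == 1:
        end = ends[0]
    elif require_end:
        raise ValueError(f"Expected exactly one standard end label, found {ends}")
    else:
        end = None  # no valid unique end label, so no end column is used
    return {"keys": keys, "beg": begs[0], "end": end, "geom": geoms[0] if geoms else None}
```

Note that keys follow dataframe column order: columns ordered ‘YEAR’, ‘RID’ produce keys ['YEAR', 'RID'], not the order of default_keys.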

get_group(keys, empty=True, log_empty=True, **kwargs) → EventsGroup

Retrieve a unique group of events based on provided key values.

Parameters

keys : key value, tuple of key values, or list of the same

If only one key column is defined within the collection, a single column value may be provided. Otherwise, a tuple of column values must be provided in the same order as they appear in self.keys.

empty : bool, default True

Whether to allow empty events groups to be returned when the provided keys are valid but are not associated with any actual events. If False, these cases will raise a KeyError.

log_empty : bool, default True

Whether empty events groups created on request should be logged and stored within the collection to allow for quicker access. This is more memory intensive but may produce moderate performance improvements if empty keys will be accessed repeatedly.
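The interplay of the empty and log_empty options can be illustrated with the small stand-in class below. GroupLookup is hypothetical: it stores plain dict records and returns lists rather than EventsGroup instances, and it handles only single values and tuples of key values, not lists of them.

```python
class GroupLookup:
    """Hypothetical stand-in illustrating the empty and log_empty options of get_group()."""

    def __init__(self, records, keys):
        self.keys = tuple(keys)
        # Valid values for each key column, similar in spirit to key_values.
        self.key_values = {k: {r[k] for r in records} for k in self.keys}
        self._groups = {}
        for record in records:
            self._groups.setdefault(tuple(record[k] for k in self.keys), []).append(record)
        self._empty_log = {}  # cache of empty groups, like log_empty=True

    def get_group(self, keys, empty=True, log_empty=True):
        if not isinstance(keys, tuple):
            keys = (keys,)  # a single value is allowed when there is only one key column
        if len(keys) != len(self.keys):
            raise KeyError(f"Expected {len(self.keys)} key values, got {len(keys)}")
        for label, value in zip(self.keys, keys):
            if value not in self.key_values[label]:
                raise KeyError(f"Invalid value {value!r} for key {label!r}")
        if keys in self._groups:
            return self._groups[keys]
        # Each key value is valid, but no events carry this combination.
        if not empty:
            raise KeyError(f"No events found for keys {keys!r}")
        if keys in self._empty_log:
            return self._empty_log[keys]  # reuse the cached empty group
        group = []
        if log_empty:
            self._empty_log[keys] = group
        return group
```

With route ‘A’ present only in 2018 and route ‘B’ only in 2019, requesting (‘A’, 2019) returns an empty group (cached for the next request), while requesting (‘C’, 2018) raises a KeyError because ‘C’ is not a valid key value.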

get_matching(other, **kwargs)

Retrieve a subset of the events collection based on the unique group values present in another provided events collection.

Parameters

other : EventsCollection

Another events collection with matching keys which will be used to select a subset of this events collection based on its key values.

get_subset(keys, reduce=True, **kwargs)

Retrieve a subset of the events collection based on the provided key values or slices. Returned events must satisfy all keys.

Parameters

keys : list or tuple of slice, list, or other

A list of either (1) slices which can be used to slice the key values present in self.key_values for the associated key, (2) a list of values which reflect those in self.key_values, or (3) a single value which is present in self.key_values. Inputs must be provided in the same order as they appear in self.keys.

reduce : bool, default True

Whether to simplify the resulting EventsCollection by removing any keys which are queried for a single value and become obsolete.

For example, if one key represents years of data and a single year is provided, that key will be removed from the resulting collection as it can no longer be queried further.
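A minimal sketch of the three selector types and the reduce option, using plain dict records and a hypothetical select_subset() function. It assumes slices are label-based and inclusive of both bounds (as with pandas .loc); the documentation above does not state whether linref's slices behave this way.

```python
def select_subset(records, keys, query, reduce=True):
    """Hypothetical helper: filter dict records with one selector per key column.

    Each selector may be a slice (assumed label-based and inclusive),
    a list of values, or a single value.
    """
    if len(query) != len(keys):
        raise ValueError("Provide one selector per key, in the same order as keys")
    allowed = {}
    for key, selector in zip(keys, query):
        values = sorted({r[key] for r in records})
        if isinstance(selector, slice):
            lo, hi = selector.start, selector.stop
            allowed[key] = {v for v in values
                            if (lo is None or v >= lo) and (hi is None or v <= hi)}
        elif isinstance(selector, list):
            allowed[key] = set(selector)
        else:
            allowed[key] = {selector}
    # Returned records must satisfy all selectors.
    subset = [r for r in records if all(r[k] in allowed[k] for k in keys)]
    # With reduce=True, keys queried for a single value are dropped from the result.
    single = {k for k, s in zip(keys, query) if not isinstance(s, (slice, list))}
    remaining = [k for k in keys if not (reduce and k in single)]
    return subset, remaining
```

Querying [slice(None), 2018] mirrors the ec[:, 2018] example: every route is kept, and the ‘Year’ key is dropped from the remaining keys because it was fixed to a single value.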

property log
merge(other)

Create an EventsMerge instance with this collection as the left and the other collection as the right. This can then be used to retrieve attributes from the other collection to be appended to this collection’s dataframe.

Parameters

other : EventsCollection

Another events collection with similar keys which will be merged with this events collection, producing an EventsMerge instance which can be used to perform various overlay operations to retrieve attributes and more from the target collection.

overlay_average(other, cols=None, **kwargs)
project_parallel(other, samples=3, buffer=100, match='all', choose=1, sort_locs=True, **kwargs)

Project an input geodataframe of linear geometries onto parallel events in the events dataframe, producing linearly referenced locations for all input geometries which are found to be parallel based on buffer and sampling parameters.

Parameters

other : gpd.GeoDataFrame

Geodataframe containing linear geometry which will be projected onto the events dataframe.

samples : int, default 3

The number of equidistant sample points to take along each geometry being projected to check for nearby geometry.

buffer : float, default 100

The max distance to search for input geometries to project against the events’ geometries. Measured in terms of the geometries’ coordinate reference system.

match : {‘all’, int}, default ‘all’

How many sample points must find a nearby target event to produce a positive match to that event, resulting in a projection.

choose : {int, ‘all’}, default 1

How many target geometries to choose when more than one match occurs.

sort_locs : bool, default True

Whether begin and end location values should be sorted, ensuring that all events are increasing and monotonic.

**kwargs

Keyword arguments to be passed to the EventsCollection constructor upon completion of the projection.

reset_log()

Reset the log of built events groups.

class linref.events.collection.EventsFrame(df, keys=None, beg=None, end=None, geom=None, route=None, closed=None, sort=False, **kwargs)

Bases: object

High-level class for managing linear events data. Users should instead use the EventsCollection class for complex data sets with multiple groups of events, grouped by at least one key column (e.g., route ID), or the EventsGroup class for simple data sets with only a single group of events.

property beg
property beg_loc
property begs
build_routes(label='route', errors='raise')

Build MLSRoute instances for each event based on available geometry and begin and end locations.

Parameters

label : valid pandas column label

Column label to use for newly generated column populated with routes data.

errors : {‘raise’, ‘ignore’}

How to address errors if they arise when producing routes. If errors are not raised, records for which a route cannot be built will be filled with np.nan in the new column.

cast_gdf(inplace=False, **kwargs)

Convert the events dataframe to a geodataframe, passing the input keyword arguments, such as crs and geometry, to the gpd.GeoDataFrame constructor. See documentation for this constructor for more information.

property closed

Collection parameter for whether event intervals are closed on the left-side, right-side, both or neither.

property columns

A list of all columns within the events dataframe.

copy(deep=False)

Create an exact copy of the events class instance.

Parameters

deep : bool, default False

Whether the created copy should be a deep copy.

default_beg = ['BMP', 'BEG', 'FROM', 'LOC']
default_end = ['EMP', 'END', 'TO']
default_geom = ['geometry']
default_keys = ['RID', 'YEAR', 'KEY']
property df

The collection’s events dataframe.

df_exportable()

Return a dataframe which is optimized for exporting.

dissolve(attr=None, aggs=None, agg_func=None, agg_suffix='_agg', agg_geometry=False, agg_routes=False, dropna=False, fillna=None, reorder=True, merge_lines=True)

Dissolve the events dataframe on a selection of event attributes.

Note: Data will be sorted by keys and begin/end columns prior to performing the dissolve.

Note: Missing data in selected attribute fields may cause problems with dissolving; please use df.fillna(…) or df.dropna(…) to avoid this problem.

Parameters

attr : str or list

Which event attribute(s) within the events dataframe to dissolve on.

aggs : str or list, default None

Which event attribute(s) within the events dataframe to aggregate during the dissolve. Attributes will be aggregated into a list and returned under the same attribute name.

agg_func : callable function or list of callable functions, default None

A function or list of functions corresponding to the list of aggregation attributes which will be called on the list-aggregated contents of those attributes.

agg_suffix : str or list, default ‘_agg’

A suffix to be added to the name of aggregated columns. If provided as a list, it must correspond to the provided list of aggregation attributes.

agg_geometry : bool, default False

Whether to create an aggregated geometries field, populated with aggregated shapely geometries based on those contained in the collection’s geometry field.

agg_routes : bool, default False

Whether to create an aggregated routes field, populated with MLSRoute object class instances, created based on aggregated segment geometries and begin and end mile posts.

dropna : bool, default False

Whether to drop records with empty values in the attribute fields. This parameter is passed to the df.groupby call.

fillna : optional

A value or dictionary used to fill instances of np.nan in the target dataframe. Consistent with the DataFrame.fillna() method.

reorder : bool, default True

Whether to reorder the resulting dataframe columns to match the order of the collection’s events dataframe.

merge_lines : bool, default True

Whether to use shapely’s ops.linemerge function to combine contiguous linestrings when aggregating linear geometries. Only applicable when agg_geometry=True.
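The core of a dissolve can be sketched on a single group of events stored as dicts. The sketch assumes that only contiguous events (one event's end equal to the next event's begin) with identical attr values are merged, and that aggregated columns receive agg_suffix; agg_func, geometry, routes, and key grouping are left out. dissolve_events() is a hypothetical name, not the method itself.

```python
def dissolve_events(events, attr, aggs=(), agg_suffix="_agg"):
    """Hypothetical sketch: merge contiguous events sharing the same attr values.

    events : list of dicts with 'beg' and 'end' plus attribute columns
    (a single group; a real collection also dissolves within each key group).
    """
    rows = sorted(events, key=lambda e: (e["beg"], e["end"]))  # sort before dissolving
    result = []
    for event in rows:
        values = tuple(event[a] for a in attr)
        last = result[-1] if result else None
        if last is not None and last["_values"] == values and last["end"] == event["beg"]:
            last["end"] = event["end"]  # contiguous with identical attributes: extend
        else:
            last = {"beg": event["beg"], "end": event["end"], "_values": values}
            last.update({a: event[a] for a in attr})
            last.update({a + agg_suffix: [] for a in aggs})
            result.append(last)
        for a in aggs:
            last[a + agg_suffix].append(event[a])  # list-aggregate the other attributes
    for row in result:
        del row["_values"]
    return result
```

Dissolving on ['Speed_Limit', 'Lanes'] with aggs=['County'], as in the module example, turns three one-mile events (55 mph, 55 mph, 45 mph) into two events, the first carrying County_agg == ['X', 'Y'].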

property end
property end_loc
property ends
property geom
property geom_loc
geometry_from_xy(x, y, col_name='geometry', crs=None, inplace=False)

Use X and Y coordinates in the events dataframe to generate point geometry.

property group_keys
property group_keys_unique
property groups

The pandas GroupBy of the events dataframe, grouped by the collection’s key columns. This defines the basis for key queries.

property is_point

Returns True if the collection’s beg and end columns are the same, implying that it is a collection of point events.

iter_groups()

Return an iterator which will iterate through all groups in the collection, yielding each group’s key as well as the associated EventsGroup.

property key_locs
property key_values

A dictionary of valid values for each key column.

property keys

The list of column names within the events dataframe which are queried to define specific events groups (e.g., events on a specific route).

property num_keys

The number of key columns within self.keys.

property others

A list of columns within the events dataframe which are not the begin, end, or key columns.

parse_routes(col=None, inplace=False, errors='raise')

Parse MLSRoutes data in the provided column, which contains either MLSRoute objects, WKT data for MULTILINESTRINGs or LINESTRINGs with M-values, or a mixture of both.

Parameters

col : label, optional

A valid column label within the events dataframe which contains the target MLSRoute data. If not provided, will attempt to retrieve a previously assigned column label from the self.route property.

inplace : boolean, default False

Whether to perform the operation in place. If False, will return a modified copy of the events object.

errors : {‘raise’, ‘ignore’}

How to address errors which arise when coercing MLSRoute data during processing. If ignored, errors will result in null values in the events dataframe where errors occurred.

project(other, buffer=100, nearest=True, loc_label='LOC', dist_label='DISTANCE', build_routes=True, **kwargs)

Project an input geodataframe onto the events dataframe, producing linearly referenced point locations relative to events for all input geometries within a buffered search area.

Parameters

other : gpd.GeoDataFrame

Geodataframe containing geometry which will be projected onto the events dataframe.

buffer : float, default 100

The max distance to search for input geometries to project against the events’ geometries. Measured in terms of the geometries’ coordinate reference system.

nearest : bool, default True

Whether to choose only the nearest match within the defined buffer. If False, all matches will be returned. If True, when multiple equidistant points exist, choose the first result that appears.

loc_label, dist_label : label

Labels to be used for created columns for projected locations on target events groups and nearest point distances between target geometries and events geometries.

build_routes : bool, default True

Whether to automatically build routes using the build_routes() method if routes are not already available.

**kwargs

Keyword arguments to be passed to the EventsFrame constructor upon completion of the projection.

property route
property route_loc
set_closed(closed=None, inplace=False)

Change whether ranges are closed on left, right, both, or neither side.

Parameters

closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, optional

Whether intervals are closed on the left-side, right-side, both or neither. If None, will default to ‘left_mod’ for linear events and ‘both’ for point events.

inplace : boolean, default False

Whether to perform the operation in place on the parent range collection, returning None.

set_df(obj, inplace=False)

Set a new events dataframe.

property shape
property size

Return the size of the events dataframe.

sort(inplace=False)

Sort the events dataframe based on target columns.

property targets

A list of begin, end, and key columns within the events dataframe.

to_grid(dissolve=False, **kwargs)

Use the events dataframe to create a grid of zero-length, equidistant point events which span the bounds of each event.

Parameters

length : numerical, default 1.0

A fixed distance between each point on the grid.

fill : {‘none’, ‘cut’, ‘extend’, ‘right’, ‘balance’}, default ‘cut’

How to fill a gap at the end of an event’s range.

none : no point will be generated at the end of the input range unless it falls directly on the defined grid distance.
cut : a point will be generated at the very end of the input range, at a distance less than or equal to the defined grid distance.
right : the final point will be generated at a distance equal to the defined grid distance, even if this extends beyond the full input range.
extend : a point will be generated at the very end of the input range, at a distance greater than or equal to the defined grid distance.
balance : if the final range is greater than or equal to half the target range length, perform the cut method; if it is less, perform the extend method.

dissolve : bool, default False

Whether to dissolve the events dataframe before performing the transformation.
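The fill options for to_grid() can be sketched for a single event range. grid_points() is a hypothetical helper; the reading of ‘extend’ as dropping the last grid point before the end (so the final spacing is at least one grid length) is an interpretation of the description above, and floating-point rounding in the remainder is ignored.

```python
import math


def grid_points(beg, end, length=1.0, fill="cut"):
    """Hypothetical sketch: point locations spaced `length` apart from beg toward end."""
    if length <= 0:
        raise ValueError("length must be positive")
    n_full = math.floor((end - beg) / length)  # number of full grid steps
    points = [beg + i * length for i in range(n_full + 1)]
    remainder = (end - beg) - n_full * length  # gap left after the last grid point
    if remainder == 0 or fill == "none":
        return points
    if fill == "balance":
        fill = "cut" if remainder >= length / 2 else "extend"
    if fill == "cut":
        points.append(end)  # final spacing is shorter than length
    elif fill == "extend":
        if len(points) > 1:
            points.pop()  # drop the last grid point so the final spacing exceeds length
        points.append(end)
    elif fill == "right":
        points.append(points[-1] + length)  # overshoot end by less than one grid length
    else:
        raise ValueError(f"Invalid fill option: {fill!r}")
    return points
```

For an event from 0.0 to 3.5 with length=1.0, ‘cut’ yields [0, 1, 2, 3, 3.5], ‘extend’ yields [0, 1, 2, 3.5], and ‘right’ yields [0, 1, 2, 3, 4].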

to_windows(dissolve=False, endpoint=False, **kwargs)

Use the events dataframe to create sliding window events of a fixed length and a fixed number of steps, and which fill the bounds of each event.

Parameters

length : numerical, default 1.0

A fixed length for all windows being defined.

steps : int, default 1

A number of steps per window length. The resulting step length will be equal to length / steps. For non-overlapped windows, use a steps value of 1.

fill : {‘none’, ‘cut’, ‘extend’, ‘left’, ‘right’, ‘balance’}, default ‘cut’

How to fill a gap at the end of an event’s range.

none : no window will be generated to fill the gap at the end of the input range.
cut : a truncated window will be created to fill the gap with a length less than the full window length.
extend : the final window will be anchored on the grid defined by the step value, extending beyond the window length to the right bound of the event.
left : the final window will be anchored on the end of the input range and will extend the full window length to the left.
right : the final window will be anchored on the grid defined by the step value, extending the full window length to the right, beyond the event’s end value.
balance : if the final range is greater than or equal to half the target range length, perform the cut method; if it is less, perform the extend method.

dissolve : bool, default False

Whether to dissolve the events dataframe before performing the transformation.

endpoint : bool, default False

Whether to add a point event at the end of each event range.
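The fill options for to_windows() can be sketched for one event range, restricted to steps=1 (non-overlapping windows) to keep it short. fixed_windows() is a hypothetical helper, the endpoint option is omitted, and the handling of a range shorter than one window is an assumption.

```python
import math


def fixed_windows(beg, end, length=1.0, fill="cut"):
    """Hypothetical sketch: split [beg, end] into windows of `length` (steps=1 only)."""
    n_full = math.floor((end - beg) / length)
    result = [(beg + i * length, beg + (i + 1) * length) for i in range(n_full)]
    last_end = beg + n_full * length
    remainder = end - last_end  # gap not covered by a full window
    if remainder == 0 or fill == "none":
        return result
    if fill == "balance":
        fill = "cut" if remainder >= length / 2 else "extend"
    if fill == "cut":
        result.append((last_end, end))  # truncated final window
    elif fill == "extend":
        if result:
            result[-1] = (result[-1][0], end)  # stretch the last full window to the end
        else:
            result.append((beg, end))  # assumption: a short range becomes one window
    elif fill == "left":
        result.append((end - length, end))  # full-length window anchored on the end
    elif fill == "right":
        result.append((last_end, last_end + length))  # full-length window past the end
    else:
        raise ValueError(f"Invalid fill option: {fill!r}")
    return result
```

For an event from 0.0 to 2.5 with length=1.0, ‘left’ adds the window (1.5, 2.5), which overlaps the preceding window, while ‘right’ adds (2.0, 3.0), which extends past the event’s end.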

class linref.events.collection.EventsGroup(df, beg=None, end=None, geom=None, closed=None, **kwargs)

Bases: EventsFrame

User-level class for managing linear and point events data. This class is used for simple data sets with only a single group of events. Data is managed using both the pandas tabular data package and the rangel range data package.

EventsGroup class instances can be used for a variety of linear referencing operations such as overlays to determine portions of events overlapped by an input range, intersections to determine which events intersect with an input range, length-weighted averages of event column values based on an input range, and more.

Parameters

df : pd.DataFrame

Pandas dataframe which contains linear or point events data.

beg, end : str or label

Column labels within the events dataframe which represent the linearly referenced location of each event. For linear events both are required, defining the begin and end location of each event. For point events, only ‘beg’ is required, defining the exact location of each event (the ‘end’ property will automatically be set to be equal to the ‘beg’ property).

geom : str or label, optional

Column label within the events dataframe which represents the shapely geometry associated with each event if available. If provided, certain additional class functionalities will be made available.

closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, optional

Whether intervals are closed on the left side, right side, both, or neither. If None, will default to ‘left_mod’ for linear events and ‘both’ for point events.

left : ranges are always closed on the left and never closed on the right.
left_mod : ranges are always closed on the left and only closed on the right when the next range is not consecutive.
right : ranges are always closed on the right and never closed on the left.
right_mod : ranges are always closed on the right and only closed on the left when the previous range is not consecutive.
both : ranges are always closed on both sides.
neither : ranges are never closed on either side.
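The difference between the closed options is easiest to see on a shared boundary. The sketch below is not the rangel implementation: range_contains() is a hypothetical helper that tests a single location against a sorted list of (beg, end) ranges, treating two ranges as consecutive when one ends exactly where the next begins.

```python
def range_contains(ranges, loc, closed="left_mod"):
    """Hypothetical sketch: does each sorted (beg, end) range contain loc?"""
    result = []
    for i, (beg, end) in enumerate(ranges):
        next_consecutive = i + 1 < len(ranges) and ranges[i + 1][0] == end
        prev_consecutive = i > 0 and ranges[i - 1][1] == beg
        # The *_mod options close a side only where there is no adjacent range.
        left_closed = closed in ("left", "left_mod", "both") or (
            closed == "right_mod" and not prev_consecutive)
        right_closed = closed in ("right", "right_mod", "both") or (
            closed == "left_mod" and not next_consecutive)
        above = loc >= beg if left_closed else loc > beg
        below = loc <= end if right_closed else loc < end
        result.append(above and below)
    return result
```

With ranges (0, 1), (1, 2), (3, 4), the shared boundary at 1 belongs to exactly one range under ‘left_mod’ or ‘right_mod’, and the location 2, at the end of a range followed by a gap, is contained under ‘left_mod’ but not under ‘left’.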

property centers

Centers of all event ranges.

intersecting(beg=None, end=None, other=None, closed='both', get_mask=False, **kwargs)

Retrieve a selection of records from the group of events based on provided begin and end locations.

Parameters

beg, end : numerical or array-like, optional

The begin and end locations of the range or ranges to be tested. If a single range is to be tested, provide a numeric value. If multiple, provide an array-like with a single begin and end value for each range. If no end parameter provided, point locations will be assumed and end will be set equal to beg. Not required if other parameter is used.

other : EventsGroup, optional

Other EventsGroup instance to be intersected with this one. Can be provided instead of beg, end, and closed parameters and will take precedence over other input.

closed : str {‘left’, ‘right’, ‘both’, ‘neither’}, default ‘both’

Whether the input interval is closed on the left side, right side, both, or neither.

left : ranges are always closed on the left and never closed on the right.
right : ranges are always closed on the right and never closed on the left.
both : ranges are always closed on both sides.
neither : ranges are never closed on either side.

get_mask : bool, default False

Whether to return a boolean mask for selecting from the events dataframe instead of the selection from the dataframe itself.

property lengths

Lengths of all event ranges.

overlay(beg=None, end=None, other=None, **kwargs)

Compute overlap of the input bounds with respect to the events group.

Parameters

beg, end : scalar or array of scalars

Begin and end locations of the overlaid range(s).

other : EventsGroup, optional

Other EventsGroup instance to be overlaid with this one. Can be provided instead of beg and end parameters and will take precedence over other input.

normalize : boolean, default True

Whether overlapping lengths should be normalized by range length to give a proportional result.

how : {‘right’, ‘left’, ‘sum’}, default ‘right’

How overlapping lengths should be normalized. Only applied when normalize=True.

right : Normalize overlaps by the length of each provided overlay range.
left : Normalize overlaps by the length of each of the collection’s ranges being overlaid.
sum : Normalize overlaps by the sum of the lengths of all overlaps for each provided overlay range. If there are gaps in the collection’s ranges or overlaps between the collection’s ranges, this will allow the sum of the overlaps to still equal 1.0, except where no overlaps occur.

norm_zero : float, optional

A number to substitute for instances where the normalizing factor (denominator) is equal to zero, e.g., when the overlay range has a length of zero and how=’right’. If not provided, all instances of zero division will return float value 0.0.
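The three normalization modes can be compared on a single overlay range with the hypothetical overlay_weights() helper below. It treats norm_zero as the value returned wherever the denominator is zero, which is one reading of the description above rather than confirmed linref behavior.

```python
def overlay_weights(ranges, beg, end, normalize=True, how="right", norm_zero=None):
    """Hypothetical sketch: overlap of [beg, end] with each (b, e) event range."""
    overlaps = [max(0.0, min(e, end) - max(b, beg)) for b, e in ranges]
    if not normalize:
        return overlaps
    if how == "right":
        denominators = [end - beg] * len(ranges)  # length of the overlay range
    elif how == "left":
        denominators = [e - b for b, e in ranges]  # length of each event range
    elif how == "sum":
        denominators = [sum(overlaps)] * len(ranges)  # total overlap across events
    else:
        raise ValueError(f"Invalid how option: {how!r}")
    # Assumption: norm_zero replaces the result wherever the denominator is zero.
    zero_value = 0.0 if norm_zero is None else norm_zero
    return [o / d if d != 0 else zero_value for o, d in zip(overlaps, denominators)]
```

Overlaying 0.5 to 3.5 on events (0, 1), (1, 2), and (3, 4) leaves the gap from 2 to 3 uncovered: with how='right' the weights sum to 2/3, while with how='sum' they are [0.25, 0.5, 0.25] and sum to 1.0, as described for the ‘sum’ option.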

overlay_average(beg=None, end=None, cols=None, weighted=True, zeroweight=None, how='right', weights=None, suffix='_average', **kwargs)

Compute the weighted average of a selection of events columns based on the overlap of the input bounds with respect to linear events.

Parameters

beg : float

Beginning milepost of the overlaid segment.

end : float

Ending milepost of the overlaid segment.

cols : list

List of column labels to aggregate.

weighted : boolean, default True

Whether the computed average should be weighted. If False, an un-weighted average will be computed, giving all intersecting values an equal weight.

zeroweight : default None

If weights sum to zero, how to compute average. If None, an un-weighted average will be computed. Else, no average will be computed and the input value will be returned instead.

how : {‘right’, ‘left’, ‘sum’}, default ‘right’

How overlapping lengths should be normalized. Only applied when normalize=True.

right : Normalize overlaps by the length of each provided overlay range.
left : Normalize overlaps by the length of each of the collection’s event ranges.
sum : Normalize overlaps by the sum of the lengths of all overlaps for each provided overlay range. If there are gaps in the collection’s event ranges or overlaps between the collection’s ranges, this will allow the sum of the overlaps to still equal 1.0, except where no overlaps occur.

weights : np.ndarray

An array of length-normalized overlay weights. If excluded, weights will be computed based on the given mileposts and parameters. If multiple overlay computations are being conducted, computing the weights once and passing them directly into the aggregation functions will save time.
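Given precomputed weights, the averaging step might look like the hypothetical weighted_average() below. Dividing by the weight total makes the result independent of which normalization produced the weights; falling back to a plain mean of all values when the weights sum to zero (with zeroweight=None) is an assumption about which values the un-weighted average uses.

```python
def weighted_average(values, weights, weighted=True, zeroweight=None):
    """Hypothetical sketch: average event values using overlay weights."""
    total = sum(weights)
    if total == 0:
        if zeroweight is not None:
            return zeroweight  # no average computed; the input value is returned
        return sum(values) / len(values)  # assumption: plain mean of all values
    if not weighted:
        hits = [v for v, w in zip(values, weights) if w > 0]  # intersecting events only
        return sum(hits) / len(hits)
    return sum(v * w for v, w in zip(values, weights)) / total
```

For speed limits [55, 45, 65] with weights [0.75, 0.25, 0.0], the weighted result is 52.5 and the un-weighted result is 50.0, since the third event does not intersect the overlay range.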

overlay_most(beg=None, end=None, cols=None, weights=None, suffix='_most', **kwargs)

Compute the most represented values of a selection of events columns based on the overlap of the input bounds with respect to route events.

Parameters

beg : float

Beginning milepost of the overlaid segment.

end : float

Ending milepost of the overlaid segment.

cols : list

List of column labels to aggregate.

weights : pd.Series

A series of length-normalized overlay weights. If excluded, weights will be computed based on the given mileposts and parameters. If multiple overlay computations are being conducted, computing the weights once and passing them directly into the aggregation functions will save time.
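“Most represented” can be read as the value with the largest total overlap weight, summed across all events sharing that value. The hypothetical most_represented() helper below sketches that reading; breaking ties by first appearance and returning None when nothing overlaps are assumptions.

```python
from collections import defaultdict


def most_represented(values, weights):
    """Hypothetical sketch: value with the largest total overlay weight."""
    if not any(w > 0 for w in weights):
        return None  # assumption: no overlapping events, no result
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight  # events sharing a value pool their weight
    # Dicts keep insertion order, so max() breaks ties by first appearance.
    return max(totals, key=totals.get)
```

With surfaces ['Asphalt', 'Concrete', 'Asphalt'] and weights [0.3, 0.4, 0.3], the result is ‘Asphalt’ (0.6 in total) even though the single largest weight belongs to ‘Concrete’.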

overlay_sum(beg=None, end=None, cols=None, weighted=True, weights=None, suffix='_sum', **kwargs)

Compute the weighted sum of a selection of events columns based on the overlap of the input bounds with respect to route events.

Parameters

beg : float

Beginning milepost of the overlaid segment.

end : float

Ending milepost of the overlaid segment.

cols : list

List of column labels to aggregate.

weighted : boolean, default True

Whether the computed sum should be weighted. If False, an un-weighted sum will be computed, giving all intersecting values an equal weight.

weightsnp.ndarray

An array of length-normalized overlay weights; if excluded, weights will be computed based on given mileposts and parameters; if multiple overlay computations are being conducted, computing the weights separately and then inputting them directly into the aggregation functions will produce time savings.
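This sketch contrasts the weighted and un-weighted sums described for overlay_sum. It assumes that the un-weighted case adds every intersecting value in full and treats an event as intersecting when its weight is greater than zero. The function name overlay_sum_sketch is hypothetical.

```python
def overlay_sum_sketch(values, weights, weighted=True):
    """Hypothetical sketch of a weighted or un-weighted overlay sum.

    values:  event column values, one per event in the group
    weights: length-normalized overlay weights, one per event (0.0 = no overlap)
    """
    if weighted:
        # Each value contributes in proportion to its overlay weight
        return sum(v * w for v, w in zip(values, weights))
    # Un-weighted: every intersecting event (weight > 0) counts in full
    return sum(v for v, w in zip(values, weights) if w > 0)


values = [10.0, 20.0, 30.0]
weights = [0.5, 1.0, 0.0]  # the third event is not overlapped
print(overlay_sum_sketch(values, weights))                  # 25.0
print(overlay_sum_sketch(values, weights, weighted=False))  # 30.0
```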

property rng
set_closed(closed, inplace=False)[source]

Change whether ranges are closed on left, right, both, or neither side.

Parameters

closed : str {‘left’, ‘left_mod’, ‘right’, ‘right_mod’, ‘both’, ‘neither’}, default ‘left’

Whether intervals are closed on the left-side, right-side, both or neither.

left : ranges are always closed on the left and never closed on the right.

left_mod : ranges are always closed on the left and only closed on the right when the next range is not consecutive.

right : ranges are always closed on the right and never closed on the left.

right_mod : ranges are always closed on the right and only closed on the left when the previous range is not consecutive.

both : ranges are always closed on both sides.

neither : ranges are never closed on either side.

inplace : boolean, default False

Whether to perform the operation in place on the parent range collection, returning None.
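To make the closed options concrete, the following hypothetical function tests whether a single location falls within one range. The next_consecutive and prev_consecutive flags are assumptions standing in for the neighbor checks used by the *_mod options. They are not linref parameters.

```python
def contains(loc, beg, end, closed="left", next_consecutive=True, prev_consecutive=True):
    """Hypothetical sketch: does loc fall within the range (beg, end) under `closed`?

    next_consecutive / prev_consecutive say whether the neighboring range starts
    exactly where this one ends (or ends where this one starts); only the
    *_mod options use them.
    """
    if closed == "left":
        left, right = True, False
    elif closed == "left_mod":
        left, right = True, not next_consecutive
    elif closed == "right":
        left, right = False, True
    elif closed == "right_mod":
        left, right = not prev_consecutive, True
    elif closed == "both":
        left, right = True, True
    elif closed == "neither":
        left, right = False, False
    else:
        raise ValueError(f"invalid closed option: {closed!r}")
    above = loc >= beg if left else loc > beg
    below = loc <= end if right else loc < end
    return above and below


print(contains(1.0, 0.0, 1.0, "left"))                                # False
print(contains(1.0, 0.0, 1.0, "left_mod", next_consecutive=False))    # True
print(contains(0.0, 0.0, 1.0, "right_mod", prev_consecutive=False))   # True
```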

property shape
class linref.events.collection.EventsLog(**kwargs)[source]

Bases: object

High-level class for logging and managing child EventsGroups created within the context of a parent EventsCollection class instance.

property data
property keys
log(key, obj, overwrite=True)[source]

Store the input events class instance within the log’s data under the provided key.

reset()[source]
linref.events.collection.check_compatibility(objs, errors='raise', **kwargs)[source]

Check whether the input EventsCollections are all compatible for merging, unifying, or similar relational processes. If the objects are not compatible, an error is raised explaining why. If requested, errors can be ignored, in which case False is returned instead. If all objects are compatible, the function returns True.

Parameters

objs : list-like of EventsCollections

List of EventsCollection objects to be tested against each other.

errors : {‘raise’, ‘ignore’}, default ‘raise’

How to respond to errors when they arise.

linref.events.collection.from_standard(df, require_end=False, **kwargs)[source]

Create an EventsCollection from the input dataframe assuming standard column labels. These standard labels can be changed directly on the class by modifying the associated class attributes:

- default_keys
- default_beg
- default_end
- default_geom

Standard labels include:

- keys : ‘RID’, ‘YEAR’, ‘KEY’
- beg : ‘BMP’, ‘BEG’, ‘FROM’
- end : ‘EMP’, ‘END’, ‘TO’
- geom : ‘geometry’

Additional constructor keyword arguments can be passed through **kwargs.

Parameters

df : pd.DataFrame

Pandas dataframe which contains linear or point events data, formatted with standard labels. If multiple keys are detected, they will be assigned in the order in which they appear within the target dataframe. Only one of the begin label options and one of the end label options may be used. The geometry label is optional.

require_end : bool, default False

Whether to raise an error if no valid unique end column label is found. If False, no end label will be used when generating the collection.

**kwargs

Additional keyword arguments to be passed to the EventsCollection constructor.
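The label-detection rules above can be sketched as follows. The module-level constants mirror the documented default labels but are hypothetical stand-ins for the class attributes, and detect_standard_labels is not a linref function. The sketch also assumes that exactly one begin label is required, which the documentation implies but does not state outright.

```python
# Hypothetical stand-ins for the default_keys, default_beg, default_end and
# default_geom class attributes
DEFAULT_KEYS = ["RID", "YEAR", "KEY"]
DEFAULT_BEG = ["BMP", "BEG", "FROM"]
DEFAULT_END = ["EMP", "END", "TO"]
DEFAULT_GEOM = "geometry"


def detect_standard_labels(columns, require_end=False):
    """Sketch: pick standard key, begin, end and geometry labels from column labels."""
    # Keys are assigned in the order in which they appear in the dataframe
    keys = [c for c in columns if c in DEFAULT_KEYS]
    begs = [c for c in columns if c in DEFAULT_BEG]
    ends = [c for c in columns if c in DEFAULT_END]
    if len(begs) != 1:
        raise ValueError(f"expected exactly one begin label, found {begs}")
    # Without exactly one valid end label, either fail or proceed without one
    end = ends[0] if len(ends) == 1 else None
    if end is None and require_end:
        raise ValueError(f"no valid unique end label found among {ends}")
    geom = DEFAULT_GEOM if DEFAULT_GEOM in columns else None
    return {"keys": keys, "beg": begs[0], "end": end, "geom": geom}


print(detect_standard_labels(["YEAR", "RID", "BMP", "EMP", "geometry", "AADT"]))
# {'keys': ['YEAR', 'RID'], 'beg': 'BMP', 'end': 'EMP', 'geom': 'geometry'}
```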

linref.events.merge module


Module featuring classes and functionality for merging events collections and summarizing/retrieving information from these merges. For ease of use, features in this module should be accessed through collection-level merging methods such as EventsCollection.merge instead of abstractly through the classes themselves.

Classes

EventsMerge, EventsMergeAttribute, EventsMergeTrace

Dependencies

pandas, numpy, rangel, copy, warnings, functools

Development

Developed by: Tariq Shihadah, tariq.shihadah@gmail.com

Created: 10/1/2021

Modified: 10/1/2021


class linref.events.merge.EventsMerge(left, right)[source]

Bases: object

High-level object class for managing merges between two events collections and summarizing/retrieving information from these merges. Generated through collection-level merging methods such as EventsCollection.merge.

any(**kwargs)[source]

Indicate whether each record intersects with at least one event.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

build(inplace=True)[source]

Perform intersects and overlays to produce EventsMergeTrace objects for aggregation.

property columns
copy(deep=False)[source]

Create an exact copy of the events class instance.

Parameters

deep : bool, default False

Whether the created copy should be a deep copy.

count(**kwargs)[source]

Count the number of intersecting events.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

cut(**kwargs)[source]

Cut intersecting event routes at the intersecting begin and end locations, returning the resulting route’s geometry or the route itself if requested.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there are no intersecting events and aggregation cannot be performed. If None, values will be filled with np.nan.

return_mls : bool, default True

Whether to return the MultiLineString associated with each cut MLRRoute instead of the route itself.

distribute(column=None, squeeze=True, **kwargs)[source]

Intersect and distribute events over the range collection, scaling their values relative to their indexed distance from their intersecting range location.

Parameters

column : pandas column label or list of same, optional

The events dataframe column(s) containing the values associated with each event being analyzed. If not provided, all values will default to 1.

blur_size : int, default 0

The number of pixels to blur events across based on the blur style.

blur_style : str or callable, default ‘linear’

The scaling function to be called at each blurring step to scale original values. If a callable is provided, it must accept a single integer input for the zero-indexed pixel number, returning a single float scaling value. Predefined blurring functions can be called using the following labels:

linear : linearly scale down values from the original value to zero at the first index outside the blurred pixel range

norm_static : Tk

norm_scale : Tk

none : do not scale down original values

length_normalize : bool, default True

Normalize the intersection scores by the length of the range to account for differing range lengths.

Created: 2022-10-04
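The sketch below shows how a linear blur_style might spread event values across neighboring pixels. It assumes that events sit at integer pixel positions, that pixel 0 is the event's own pixel, and that the linear scale reaches zero at pixel blur_size + 1. The names make_linear_blur and blur_events are hypothetical. The custom callable follows the documented contract of taking a zero-indexed pixel number and returning a float scale.

```python
def make_linear_blur(blur_size):
    """Hypothetical factory for a 'linear'-style blur callable.

    Pixel 0 (the event's own pixel) scales by 1.0; the scale falls linearly and
    would reach 0.0 at pixel blur_size + 1, the first pixel outside the blur.
    """
    def scale(pixel):
        return max(0.0, 1.0 - pixel / (blur_size + 1))
    return scale


def blur_events(positions, values, n_pixels, blur_size=0, blur_style=None):
    """Spread each event value over neighboring pixels using a blur callable."""
    blur_style = blur_style or make_linear_blur(blur_size)
    out = [0.0] * n_pixels
    for pos, val in zip(positions, values):
        for offset in range(-blur_size, blur_size + 1):
            idx = pos + offset
            if 0 <= idx < n_pixels:
                out[idx] += val * blur_style(abs(offset))
    return out


print(blur_events([2], [1.0], 5, blur_size=1))  # [0.0, 0.5, 1.0, 0.5, 0.0]
# A custom callable halving the value at each pixel step
print(blur_events([0], [4.0], 3, blur_size=2, blur_style=lambda p: 0.5 ** p))  # [4.0, 2.0, 1.0]
```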

interpolate(**kwargs)[source]

Interpolate along intersecting event routes at the intersecting location (or begin point for linear events), returning the resulting interpolated point geometry.

Parameters

snap : {None, ‘near’, ‘left’, ‘right’}, default None

If the event location does not fall within any geometry, snap to the nearest match based on distance, choosing the closest location to the left, right, or the nearest side (‘near’). If None, a value error will be raised when no intersecting ranges are found.

point : {‘begs’, ‘ends’, ‘centers’}, default ‘begs’

Where on the intersecting events the point should be made, at the begin, end, or center point of the range.

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there are no intersecting events and aggregation cannot be performed. If None, values will be filled with np.nan.
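The snap options can be illustrated with a hypothetical locate function that finds the range containing a location or snaps to the nearest range endpoint on the requested side. It assumes sorted, non-overlapping (beg, end) ranges. When two candidates are equally distant under ‘near’, ties resolve to the lower range index. That tie-breaking rule is an assumption, because the documentation does not specify it.

```python
def locate(loc, ranges, snap=None):
    """Hypothetical sketch: find the range containing loc, optionally snapping.

    ranges: list of (beg, end) tuples, assumed sorted and non-overlapping
    Returns (range_index, location_used).
    """
    for i, (beg, end) in enumerate(ranges):
        if beg <= loc <= end:
            return i, loc
    if snap is None:
        raise ValueError(f"no intersecting range found for location {loc}")
    # Candidates are (distance, range_index, snapped_location)
    left = [(loc - end, i, end) for i, (beg, end) in enumerate(ranges) if end < loc]
    right = [(beg - loc, i, beg) for i, (beg, end) in enumerate(ranges) if beg > loc]
    candidates = {"left": left, "right": right, "near": left + right}.get(snap)
    if candidates is None:
        raise ValueError(f"invalid snap option: {snap!r}")
    if not candidates:
        raise ValueError(f"nothing to snap to on the {snap!r} side of {loc}")
    _, index, snapped = min(candidates)
    return index, snapped


ranges = [(0, 1), (2, 3), (5, 6)]
print(locate(0.5, ranges))           # (0, 0.5)
print(locate(4.0, ranges, "right"))  # (2, 5)
print(locate(4.0, ranges, "near"))   # (1, 3): equidistant, lower index wins
```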

property keys
property left
property num_keys
property right
property traces
class linref.events.merge.EventsMergeAttribute(parent, column)[source]

Bases: object

all(empty=None, **kwargs)[source]

Return all values from intersecting events in a list.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

any(empty=None, **kwargs)[source]

Indicate whether each record intersects with at least one event.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

property column
count(empty=None)[source]

Return the count of all intersected event values.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

cut(empty=None, return_mls=True)[source]

Cut intersecting event routes at the intersecting begin and end locations, returning the resulting route’s geometry or the route itself if requested.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

return_mls : bool, default True

Whether to return the MultiLineString associated with each cut MLRRoute instead of the route itself.

first(empty=None)[source]

Return the first event value according to the order of the provided collection’s events dataframe.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

interpolate(snap=None, point='begs', empty=None, **kwargs)[source]

Interpolate along intersecting event routes at the intersecting location (or begin point for linear events), returning the resulting interpolated point geometry.

Parameters

snap : {None, ‘near’, ‘left’, ‘right’}, default None

If the event location does not fall within any geometry, snap to the nearest match based on distance, choosing the closest location to the left, right, or the nearest side (‘near’). If None, a value error will be raised when no intersecting ranges are found.

point : {‘begs’, ‘ends’, ‘centers’}, default ‘begs’

Where on the intersecting events the point should be made, at the begin, end, or center point of the range.

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

last(empty=None)[source]

Return the last event value according to the order of the provided collection’s events dataframe.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

property loc
mean(empty=None, weighted=True, dropna=False)[source]

Return an overlay length-weighted average of all event values. An unweighted simple average can also be computed if weighted=False.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

weighted : boolean, default True

Whether the computed average should be weighted. If False, an un-weighted average will be computed, giving all intersecting values an equal weight.

dropna : boolean, default False

Whether to drop np.nan values before aggregating.
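The following sketch reproduces the weighted, unweighted, and dropna behaviors described for mean. It operates on plain lists of intersecting values and their overlap lengths and does not use linref's merge traces. The function name merge_mean is hypothetical. Note that a missing value propagates to the result unless dropna=True.

```python
import math


def merge_mean(values, weights, weighted=True, dropna=False):
    """Hypothetical sketch of an overlay length-weighted (or simple) mean.

    values:  values of the intersecting events
    weights: overlap length of each intersecting event
    """
    pairs = list(zip(values, weights))
    if dropna:
        # Discard missing values before aggregating
        pairs = [(v, w) for v, w in pairs if not math.isnan(v)]
    if not pairs:
        return math.nan
    if weighted:
        total = sum(w for _, w in pairs)
        return sum(v * w for v, w in pairs) / total if total else math.nan
    return sum(v for v, _ in pairs) / len(pairs)


print(merge_mean([10.0, 20.0], [1.0, 3.0]))                  # 17.5
print(merge_mean([10.0, 20.0], [1.0, 3.0], weighted=False))  # 15.0
print(merge_mean([10.0, math.nan], [1.0, 1.0], dropna=True)) # 10.0
```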

mode(empty=None)[source]

Return the most frequent unique event value.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

most(empty=None, dropna=True)[source]

Return the event value associated with the greatest total overlay length, ignoring missing values by default.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

dropna : boolean, default True

Whether to drop np.nan values in intersecting events before aggregating.

property ncols
property ndim
property parent
sum(empty=None, nansum=False)[source]

Return the sum of all intersected event values.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

sumproduct(empty=None, normalized=False, dropna=False)[source]

Return the sum of all event values multiplied by the weights of the intersecting events. If normalized=False, the event values will be multiplied by the actual overlapping length (e.g., multiplying a per-mile value by the miles of overlap). If normalized=True, the event values will be multiplied by the normalized overlapping length (e.g., multiplying a total value of an overlapped event by the proportion of the event which is overlapped).

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

normalized : boolean, default False

Whether the weights of the intersecting events being multiplied with the event values should be normalized by the total length of the events being intersected.

dropna : boolean, default False

Whether to drop np.nan values before aggregating.
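The two sumproduct modes correspond to the two examples in the description: per-mile values multiplied by miles of overlap, and event totals multiplied by the overlapped proportion of each event. The hypothetical sumproduct_sketch below takes overlap lengths and event lengths directly and leaves out dropna handling.

```python
def sumproduct_sketch(values, overlaps, event_lengths, normalized=False):
    """Hypothetical sketch of sumproduct with raw or normalized overlap weights.

    values:        event values (per-unit rates if normalized=False, totals if True)
    overlaps:      overlapping length of each intersecting event
    event_lengths: full length of each intersecting event
    """
    if normalized:
        # Proportion of each event that is overlapped
        weights = [o / length for o, length in zip(overlaps, event_lengths)]
    else:
        # Actual overlapping length
        weights = overlaps
    return sum(v * w for v, w in zip(values, weights))


# Per-mile values times miles of overlap: 2.0 * 0.5 + 4.0 * 1.0
print(sumproduct_sketch([2.0, 4.0], [0.5, 1.0], [1.0, 4.0]))  # 5.0
# Event totals times overlapped proportion: 100.0 * 0.5 + 40.0 * 0.25
print(sumproduct_sketch([100.0, 40.0], [0.5, 1.0], [1.0, 4.0], normalized=True))  # 60.0
```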

property traces
unique(empty=None, **kwargs)[source]

Return all unique values from intersecting events in a tuple.

Parameters

empty : scalar, string, or other pd.Series-compatible value, optional

Value to use to fill when there is no matching events group and aggregation cannot be performed. If None, values will be filled with np.nan.

value_counts(expand=True, dropna=True)[source]

Return a dataframe of all unique intersecting event values and their occurrence counts.

Parameters

expand : bool, default True

Whether to automatically expand the value counts data to a dataframe when a single column is being analyzed.

class linref.events.merge.EventsMergeTrace(group_left=None, group_right=None, key=None, mask=None, weights=None, success=True)[source]

Bases: object

Object class for managing data on the relationship between two events collections that have been merged using the EventsMerge system. Traces contain a few main elements:

group_left, group_right : pointers to the left and right events groups that are related. During aggregation, information in the right group will be aggregated and conformed to the dataframe underlying the left group.

key : the unique key associated with both events groups that produces their relationship.

mask : a boolean array of shape (group_left.df.shape[0], group_right.df.shape[0]), i.e., a number of rows equal to the number of rows in the left events group and a number of columns equal to the number of rows in the right events group. This mask defines all instances where the left and right groups intersect based on their defined ranges and closed parameters.

weights : a numeric array of shape (group_left.df.shape[0], group_right.df.shape[0]), i.e., a number of rows equal to the number of rows in the left events group and a number of columns equal to the number of rows in the right events group. This array defines the actual numeric length that is overlapped between the individual events in the left and right events groups.

success : a boolean indicator of whether or not a valid relationship has been discovered between the left group and any right group. When False, no right group will be indicated.
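The mask and weights arrays of a trace can be sketched with nested lists: rows correspond to left events and columns to right events. For simplicity, this hypothetical build_trace treats two ranges as intersecting only when they share a positive length. It therefore ignores the closed-endpoint options that the real mask takes into account.

```python
def build_trace(left_ranges, right_ranges):
    """Hypothetical sketch of a trace's mask and weights arrays.

    Rows follow the left events and columns follow the right events. Ranges
    intersect here only when they share a positive length (closed options ignored).
    """
    mask, weights = [], []
    for l_beg, l_end in left_ranges:
        mask_row, weight_row = [], []
        for r_beg, r_end in right_ranges:
            overlap = min(l_end, r_end) - max(l_beg, r_beg)
            mask_row.append(overlap > 0)
            weight_row.append(max(0.0, overlap))
        mask.append(mask_row)
        weights.append(weight_row)
    return mask, weights


left = [(0.0, 2.0), (2.0, 4.0)]
right = [(1.0, 3.0), (5.0, 6.0)]
mask, weights = build_trace(left, right)
print(mask)     # [[True, False], [True, False]]
print(weights)  # [[1.0, 0.0], [1.0, 0.0]]
# A success-like flag: does any left event intersect any right event?
print(any(any(row) for row in mask))  # True
```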

linref.events.merge.get_mode(arr)[source]

Select the item from the input array which appears most frequently.

Parameters

arr : array-like

Array with target values

linref.events.merge.get_most(arr, weights)[source]

Select the item from the input 1D array which is associated with the highest total weight from each row in the 2D weights array. Scores are computed by summing the weights for each unique array value for each row of weights. When multiple values are tied, the first item in sorted order will be selected.
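The following sketch mirrors the behavior described for get_mode and get_most using the standard library. The function names are hypothetical. The sketch breaks get_mode ties by first occurrence, following Counter.most_common, and the documentation does not say how the real function breaks ties. get_most_sketch follows the documented rule of selecting the first tied value in sorted order.

```python
from collections import Counter


def get_mode_sketch(arr):
    """Return the most frequent item; ties go to the item encountered first."""
    return Counter(arr).most_common(1)[0][0]


def get_most_sketch(arr, weights):
    """For each row of weights, return the arr value with the highest total weight.

    arr:     1D sequence of values, one per column of weights
    weights: 2D sequence (list of rows) of numeric weights
    Ties are broken by taking the first value in sorted order.
    """
    results = []
    for row in weights:
        scores = {}
        for value, weight in zip(arr, row):
            scores[value] = scores.get(value, 0.0) + weight
        # max() returns the first maximal item, so iterate in sorted order
        results.append(max(sorted(scores), key=scores.get))
    return results


arr = ["b", "a", "a", "c"]
weights = [
    [2.0, 1.0, 1.0, 0.0],  # a: 2.0, b: 2.0, tie resolved to "a" in sorted order
    [0.0, 1.0, 0.0, 3.0],  # c: 3.0
]
print(get_most_sketch(arr, weights))     # ['a', 'c']
print(get_mode_sketch(["x", "y", "y"]))  # y
```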

linref.events.spatial module


Module featuring classes and functionality for spatial analysis of events data including parallel projection.

Classes

ParallelProjector

Dependencies

pandas, numpy, copy, warnings, functools

Development

Developed by: Tariq Shihadah, tariq.shihadah@gmail.com

Created: 3/3/2022

Modified: 3/3/2022


class linref.events.spatial.ParallelProjector(target, other, samples=3, buffer=100)[source]

Bases: object

Experimental class for performing projections of linear geometries onto linear events collections.

property buffer
match(match='all', choose=1, sort_locs=True)[source]
property projectors
property sample_locs
property sample_points
property samples

linref.events.union module


Module featuring classes and functionality for unifying events collections.

Classes

EventsUnion

Dependencies

pandas, numpy, copy, warnings, functools

Development

Developed by: Tariq Shihadah, tariq.shihadah@gmail.com

Created: 4/13/2022

Modified: 4/13/2022


class linref.events.union.EventsUnion(objs, **kwargs)[source]

Bases: object

Parameters

objs : list-like of EventsCollection instances

A selection of EventsCollection object instances to be combined into a single instance based on the input parameters.

**kwargs

Keyword arguments to be passed to the initialization function for the new EventsCollection instance.

get_groups(keys, empty=True)[source]

Retrieve unique groups of events from each related collection based on provided key values.

Parameters

keys : key value, tuple of key values, or list of the same

If only one key column is defined within the collections, a single column value may be provided. Otherwise, a tuple of column values must be provided in the same order as they appear in self.keys. To get multiple groups, a list of key values or tuples may be provided.

empty : bool, default True

Whether to allow for empty events groups to be returned when the provided keys are valid but are not associated with any actual events. If False, these cases will raise a KeyError.
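The accepted forms of keys can be normalized to a list of tuples, as in the hypothetical helper below. It is only a sketch of the input rules stated above. The error type and messages are assumptions, and the helper does not reflect how EventsUnion parses keys internally.

```python
def normalize_group_keys(keys, num_keys):
    """Hypothetical sketch: coerce the accepted forms of `keys` into a list of tuples."""
    # A list requests multiple groups; anything else is a single group
    items = keys if isinstance(keys, list) else [keys]
    normalized = []
    for item in items:
        if not isinstance(item, tuple):
            # A bare value is only allowed when a single key column is defined
            if num_keys != 1:
                raise ValueError("a tuple of key values is required for multiple key columns")
            item = (item,)
        if len(item) != num_keys:
            raise ValueError(f"expected {num_keys} key values, got {len(item)}")
        normalized.append(item)
    return normalized


print(normalize_group_keys(2021, 1))                             # [(2021,)]
print(normalize_group_keys([("A", 2021), ("B", 2022)], 2))       # [('A', 2021), ('B', 2022)]
```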

property group_keys_unique
property num_keys
property num_objs
property objs
union(fill_gaps=False, get_index=True, merge=False, suffixes=None, **kwargs)[source]

Combine multiple EventsCollection instances into a single instance, creating least common intervals among all collections and maintaining all event attributes. The resulting combined events will be used to create and return an EventsCollection modeled after the first indexed collection in self.objs.

Parameters

fill_gaps : bool, default False

Whether to fill gaps in the merged collection with empty events. These events would not be associated with any parent collection and would not be populated with any events attributes.

get_index : bool, default True

Whether to produce columns relating each new record to the index of the originating record in the input events dataframes. When this is not necessary, setting to False may produce significant time savings.

merge : bool, default False

Whether to merge columns from each original dataframe to the newly created resegmented events collection dataframe. If not done during the union, it can be done later by merging on the new ‘index_i’ columns which correlate with the indices of the original dataframes. To perform this merge manually, the get_index parameter should be True.

suffixes : list-like, default [‘_0’, …, ‘_n’]

Sequence of length equal to the number of events collections being unified, where each element is a string indicating the suffix to add to overlapping column names in each corresponding events dataframe. All entries must be unique.
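"Least common intervals" can be illustrated by splitting every collection at all begin and end points, then recording which event (if any) from each collection covers each resulting interval. This is roughly what the index_i columns encode. The function least_common_intervals is a hypothetical single-key sketch. It assumes that ranges within each collection do not overlap, and it does not reproduce the EventsUnion implementation.

```python
def least_common_intervals(collections, fill_gaps=False):
    """Hypothetical sketch of resegmenting several collections' ranges.

    collections: list of lists of (beg, end) tuples, non-overlapping within each list
    Returns (beg, end, indices) tuples, where indices[k] is the index of the
    event in collection k covering the interval, or None.
    """
    # Every begin and end point from every collection becomes a breakpoint
    points = sorted({p for ranges in collections for rng in ranges for p in rng})
    result = []
    for beg, end in zip(points[:-1], points[1:]):
        indices = []
        for ranges in collections:
            match = None
            for i, (r_beg, r_end) in enumerate(ranges):
                if r_beg <= beg and end <= r_end:
                    match = i
                    break
            indices.append(match)
        # Intervals covered by no collection are gaps; keep them only if requested
        if fill_gaps or any(i is not None for i in indices):
            result.append((beg, end, indices))
    return result


a = [(0.0, 2.0), (3.0, 4.0)]
b = [(1.0, 3.5)]
for row in least_common_intervals([a, b]):
    print(row)
# (0.0, 1.0, [0, None])
# (1.0, 2.0, [0, 0])
# (2.0, 3.0, [None, 0])
# (3.0, 3.5, [1, 0])
# (3.5, 4.0, [1, None])
```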

Module contents