Burr, Wesley S
Application of Data Science to Paramedic Data
Paramedic data has significant potential for research. Paramedics see many patients every year and collect a wide variety of crucial data at each encounter. This data is rarely used, and for good reason: it is messy and hard to work with. But like the underdog character in a classic movie, with a little bit of work and a lot of understanding, paramedic data has significant potential to change the world of medical research. Paramedics throughout the world are involved in research every day, but most of this research uses purpose-built data structures and never takes advantage of the existing data that paramedics create as part of their everyday work. Through a project-based approach grounded in developing a better understanding of the opioid crisis, this thesis will examine the quantity and structure of the existing paramedic data, the complexities of its current design, the steps necessary to access it, and the processes necessary to clean existing data to a point where it can be easily modelled. Once we have our dataset, we will explore the challenges of choosing key metrics by examining the effectiveness of metrics currently employed to monitor the opioid crisis and the influences public health programs and changing policies have had on these metrics. Next, we will explore the temporal distributions of opioid and other intoxicant use with an eye to providing data to support public health in their harm reduction efforts. Lastly, we will look at the effect of fixed- and floating-point temporal influences on intoxicant-related calls with an eye to how these temporal points can affect call volumes. By using this exploration of the opioid crisis, this thesis will show that with a more thorough understanding of what paramedic data is, what data points are available, and the processes needed to transform it, paramedic data has the potential to greatly expand the limits of health care data science into a more precise and more all-encompassing discipline.
Author Keywords: Ambulance, Data Science, Opioid, Overdose, Paramedic, Pre-hospital
Particulate Matter Component Analyses in Relation to Public Health in Canada
This thesis explores the short-term relationship between exposure to ambient air pollution and human health through metrics such as mortality and hospitalization in Canada. We begin by detailing the organization and interpolation of air pollution data from its partially quality-controlled source form. Analyses of seasonal, regional and temporal trends of all major components of PM2.5 were performed, showing a seasonal variation across most regions and validating the dataset.
A one-pollutant statistical Generalized Additive Model was applied to the data, estimating the health risk associated with exposure to thirteen different components of PM2.5. The components were selected as those that comprised the majority of the mass, and included sulphate, nitrate, zinc, silicon, iron, nickel, vanadium, potassium, organic carbon, organic matter, elemental carbon, and total carbon. Trends based on annual estimates of the association for PM2.5 and its constituents were compared, showing that carbonaceous compounds, sulphate and nitrate had similar estimates of association. Many estimates, as is common in population ecologic epidemiology, were statistically indistinguishable from zero, but showed clear features of interest, including evident differences between cold and warm season associations in Canada's temperate climate.
A method to model two correlated pollutants (in this case, PM2.5 and O3) was developed using thin plate splines. In this approach, the location of the response surface (after accounting for the temperature, a smooth function of time and day of week) that corresponds to the average pollutant concentration and the average plus one unit was used as the estimate of the joint contribution of pollutants due to a unit increase. The estimates from the thin plate spline (TPS) approach were compared to the single pollutant models, with large increases and decreases in PM2.5 and O3 being captured in the TPS estimates. However, this approach indicated significantly larger error in the estimates than would be expected, indicating a possible future area for refinement.
Author Keywords: Air pollution, Environmental Epidemiology, Generalized Additive Models, Human Health, Multivariate Models, Thin Plate Splines
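The joint-contribution estimate described in this abstract can be written out as a rough sketch. The notation is an assumption for illustration and is not taken from the thesis: Y_t is the daily health outcome count, x_{1,t} and x_{2,t} are the PM2.5 and O3 concentrations, f is the thin plate spline surface, s(·) denotes a smooth function, and the log link is assumed because it is the usual choice for count outcomes in this field.

```latex
% Two-pollutant model with a thin plate spline surface f (log link assumed)
\log \mathbb{E}[Y_t] = f(x_{1,t}, x_{2,t}) + s(\mathrm{temp}_t) + s(t) + \mathrm{DOW}_t

% Joint contribution of a one-unit increase in both pollutants,
% read off the fitted surface at the averages and at the averages plus one unit
\hat{\delta} = \hat{f}(\bar{x}_1 + 1,\, \bar{x}_2 + 1) - \hat{f}(\bar{x}_1,\, \bar{x}_2)
```

Under this reading, the estimate is a difference between two points on the fitted surface, which is comparable in interpretation to a single-pollutant coefficient for a unit increase.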
Historic Magnetogram Digitization
The conversion of historical analog images to time series data was performed by using deconvolution for pre-processing, followed by the use of custom-built digitization algorithms. These algorithms have been developed to be user-friendly, with the objective of aiding in the creation of a data set from decades of mechanical observations collected from the Agincourt and Toronto geomagnetic observatories beginning in the 1840s. The algorithms follow a structure that begins with pre-processing, followed by tracing and pattern detection. Each digitized magnetogram was then visually inspected and the algorithm performance verified, both to ensure accuracy and to allow the data to later be connected to create a long-running time series.
Author Keywords: Magnetograms
A Framework for Testing Time Series Interpolators
The spectrum of a given time series is a characteristic function describing its frequency properties. Spectrum estimation methods require time series data to be contiguous in order for robust estimators to retain their performance. This poses a fundamental challenge, especially when considering real-world scientific data that is often plagued by missing values and/or irregularly recorded measurements. One area of research devoted to this problem seeks to repair the original time series through interpolation. There are several algorithms that have proven successful for the interpolation of considerably large gaps of missing data, but most are only valid for use on stationary time series: processes whose statistical properties are time-invariant, which is not a common property of real-world data. The Hybrid Wiener Interpolator is a method that was designed for repairing nonstationary data, rendering it suitable for spectrum estimation. This thesis work presents a computational framework designed for conducting systematic testing on the statistical performance of this method in light of changes to gap structure and departures from the stationarity assumption. A comprehensive audit of the Hybrid Wiener Interpolator against other state-of-the-art algorithms is also conducted.
Author Keywords: applied statistics, hybrid wiener interpolator, imputation, interpolation, R statistical software, time series
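The general shape of such a testing framework (simulate a series, impose gaps, interpolate, score) can be sketched as follows. This is a minimal illustration and not the framework from the thesis: all function names are hypothetical, linear interpolation stands in for the Hybrid Wiener Interpolator, and the toy series (trend plus sinusoid plus noise) is an assumed test signal.

```python
import numpy as np

def simulate_series(n, rng):
    """Toy nonstationary series (assumed for illustration): trend + sinusoid + noise."""
    t = np.arange(n)
    return 0.01 * t + np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(n)

def impose_gaps(x, prop_missing, gap_width, rng):
    """Copy x and blank out fixed-width gaps (NaN) at random interior positions."""
    y = x.copy()
    n = len(x)
    n_gaps = max(1, int(prop_missing * n / gap_width))
    # Start positions keep the first and last points observed; gaps may overlap.
    starts = rng.choice(np.arange(1, n - gap_width - 1), size=n_gaps, replace=False)
    for s in starts:
        y[s:s + gap_width] = np.nan
    return y

def linear_interpolator(y):
    """Stand-in interpolator: fill NaN values by linear interpolation."""
    t = np.arange(len(y))
    observed = ~np.isnan(y)
    return np.interp(t, t[observed], y[observed])

def rmse_on_gaps(x_true, y_gappy, x_filled):
    """Root mean squared error, evaluated only at the points that were removed."""
    missing = np.isnan(y_gappy)
    return float(np.sqrt(np.mean((x_true[missing] - x_filled[missing]) ** 2)))

def run_experiment(interpolator, gap_widths, n=500, prop_missing=0.1, n_reps=20, seed=1):
    """Average gap RMSE of an interpolator for each gap width over repeated simulations."""
    rng = np.random.default_rng(seed)
    results = {}
    for width in gap_widths:
        errors = []
        for _ in range(n_reps):
            x = simulate_series(n, rng)
            y = impose_gaps(x, prop_missing, width, rng)
            errors.append(rmse_on_gaps(x, y, interpolator(y)))
        results[width] = float(np.mean(errors))
    return results

results = run_experiment(linear_interpolator, gap_widths=[1, 5, 20])
for width, err in results.items():
    print(f"gap width {width:>2}: mean RMSE {err:.3f}")
```

Any function that maps a gappy array to a filled array can be passed in place of `linear_interpolator`, which is what allows several algorithms to be compared under the same gap structures.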
Population-Level Ambient Pollution Exposure Proxies
The Air Health Trend Indicator (AHTI) is a joint Health Canada / Environment and Climate Change Canada initiative that seeks to model the Canadian national population health risk due to acute exposure to ambient air pollution. The common model in the field uses averages of local ambient air pollution monitors to produce a population-level exposure proxy variable. This method is applied to ozone, nitrogen dioxide, particulate matter, and other similar air pollutants.
We examine the representative nature of these proxy averages on a large-scale Canadian data set, representing hundreds of monitors and dozens of city-level populations. The careful determination of temporal and spatial correlations between the disparate monitors allows for more precise estimation of population-level exposure, taking inspiration from the land-use regression models commonly used in geography. We conclude this work with an examination of the risk estimation differences between the original, simplistic population exposure metric and our new, revised metric.
Author Keywords: Air Pollution, Population Health Risk, Spatial Process, Spatio-Temporal, Temporal Process, Time Series
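The common monitor-averaging proxy described in this abstract can be sketched in a few lines. This is an illustration only: the function name is hypothetical, the readings are made-up numbers rather than data from the thesis, and skipping missing readings is an assumed convention.

```python
import numpy as np

def exposure_proxy(monitor_readings):
    """Daily population-level proxy: mean across monitors (columns), skipping NaN."""
    readings = np.asarray(monitor_readings, dtype=float)
    return np.nanmean(readings, axis=1)

# Made-up daily ozone readings (ppb) for one city: rows are days, columns are monitors.
readings = np.array([
    [30.0, 34.0, np.nan],
    [28.0, np.nan, 35.0],
    [40.0, 42.0, 44.0],
])

proxy = exposure_proxy(readings)
print(proxy)
```

Every available monitor receives equal weight here, regardless of where it sits relative to the population or how strongly it correlates with its neighbours, which is the simplification the abstract sets out to examine.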