Digital Library

960 results for Discrete Data Models

Comparison of Multivariate and Univariate Models for Genetic Evaluation of Milk Yield based on Test Day Data

Relevance:

40.00%

Abstract:

2000 Mathematics Subject Classification: 62H12, 62P99

Development of prediction models for freeway incident durations using data mining techniques

Relevance:

40.00%

Abstract:

The nation's freeway systems are becoming increasingly congested. A major contribution to traffic congestion on freeways is due to traffic incidents. Traffic incidents are non-recurring events such as accidents or stranded vehicles that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help the process of identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success.

This dissertation research attempts to improve on this previous effort by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid in the decision making of traffic diversion in the event of an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. The multiple linear regression analysis and decision tree based method were applied to develop the offline models, and the rule-based method and a tree algorithm called M5P were used to develop the online models.

The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications.

Improving resource management in virtualized data centers using application performance models

Relevance:

40.00%

Abstract:

The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level-Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity.

We specifically addressed two crucial data center operations. First, we precisely estimated capacity requirements of client virtual machines (VMs) while renting server space in a cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are multifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients will pay exactly for the performance they are actually experiencing; on the other hand, administrators will be able to maximize their total revenue by utilizing application performance models and SLAs.

This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Network and Support Vector Machine, to accurately model the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing by employing the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm which maximizes the SLA-generated revenue for a data center.

Learning Data-Driven Models of Non-Verbal Behaviors for Building Rapport Using an Intelligent Virtual Agent

Relevance:

40.00%

Abstract:

There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side though, these behavioral health issues (and associated possible diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight with a diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness. Evidence-based patient-centered Brief Motivational Interviewing (BMI) interventions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long term. Lack of locally available personnel well-trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge. Success of the CBIs, however, critically relies on ensuring engagement and retention of CBI users so that they remain motivated to use these systems and come back to use them over the long term as necessary. Because of their text-only interfaces, current CBIs can therefore only express limited empathy and rapport, which are the most important factors of health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities.
Virtual characters interact using humans’ innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems. To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent’s social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].

Surface velocity fields, digital elevation models, ice front positions and grounding line derived from remote sensing data at Dinsmoor-Bombardier-Edgeworth glacier system, Antarctic Peninsula (1992-2014)

Relevance:

40.00%

Abstract:

The northern Antarctic Peninsula is one of the fastest changing regions on Earth. The disintegration of the Larsen-A Ice Shelf in 1995 caused tributary glaciers to adjust by speeding up, surface lowering, and overall increased ice-mass discharge. In this study, we investigate the temporal variation of these changes at the Dinsmoor-Bombardier-Edgeworth glacier system by analyzing dense time series from various spaceborne and airborne Earth observation missions. Precollapse ice shelf conditions and subsequent adjustments through 2014 were covered. Our results show a response of the glacier system some months after the breakup, reaching maximum surface velocities at the glacier front of up to 8.8 m/d in 1999 and a subsequent decrease to ~1.5 m/d in 2014. Using a dense time series of interferometrically derived TanDEM-X digital elevation models and photogrammetric data, an exponential function was fitted for the decrease in surface elevation. Elevation changes in areas below 1000 m a.s.l. amounted to at least 130±15 m between 1995 and 2014, with change rates of ~3.15 m/a between 2003 and 2008. Current change rates (2010-2014) are in the range of 1.7 m/a. Mass imbalances were computed with different scenarios of boundary conditions. The most plausible results amount to -40.7±3.9 Gt. The contribution to sea level rise was estimated to be 18.8±1.8 Gt, corresponding to a 0.052±0.005 mm sea level equivalent, for the period 1995-2014. Our analysis and scenario considerations revealed that major uncertainties still exist due to insufficiently accurate ice-thickness information. The second largest uncertainty in the computations was the glacier surface mass balance, which is still poorly known. Our time series analysis facilitates an improved comparison with GRACE data and as input to modeling of glacio-isostatic uplift in this region.
The study contributed to a better understanding of how glacier systems adjust to ice shelf disintegration.
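
The sea level figure quoted in this abstract (18.8 Gt corresponding to about 0.052 mm) can be checked with a short calculation. The sketch below is only an illustration: it assumes a global ocean area of roughly 3.618 × 10^8 km² and a water density of 1000 kg/m³, neither of which is stated in the abstract, and the function name `gt_to_mm_sle` is hypothetical.

```python
# Convert an ice-mass contribution in gigatonnes (Gt) to millimetres of
# global mean sea level equivalent (SLE).
# Assumed constants (not taken from the abstract):
OCEAN_AREA_M2 = 3.618e14      # ~3.618e8 km^2 expressed in m^2
WATER_DENSITY_KG_M3 = 1000.0  # density used to turn mass into water volume


def gt_to_mm_sle(mass_gt: float) -> float:
    """Return the sea level equivalent in mm for a mass given in Gt."""
    mass_kg = mass_gt * 1e12                    # 1 Gt = 1e12 kg
    volume_m3 = mass_kg / WATER_DENSITY_KG_M3   # equivalent water volume
    rise_m = volume_m3 / OCEAN_AREA_M2          # spread evenly over the ocean
    return rise_m * 1000.0                      # metres -> millimetres


if __name__ == "__main__":
    # Central estimate and uncertainty from the abstract, rounded to 3 decimals.
    print(round(gt_to_mm_sle(18.8), 3), round(gt_to_mm_sle(1.8), 3))
```

Under these assumptions about 362 Gt of water corresponds to 1 mm of sea level, which is consistent with the values reported in the abstract.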

Multilevel structural equation models for longitudinal data where predictors are measured more frequently than outcomes: an application to the effects of stress on the cognitive function of nurses

Relevance:

40.00%

Abstract:

Funded by the Chief Scientist Office, Scotland (Grant Number: CZH/4/394), and by an Economic and Social Research Council grant as part of the National Centre for Research Methods (Grant Number: RES-576-25-0032).

Using individual tracking data to validate the predictions of species distribution models

Relevance:

40.00%

Abstract:

The authors would like to thank the College of Life Sciences of Aberdeen University and Marine Scotland Science which funded CP's PhD project. Skate tagging experiments were undertaken as part of Scottish Government project SP004. We thank Ian Burrett for help in catching the fish and the other fishermen and anglers who returned tags. We thank José Manuel Gonzalez-Irusta for extracting and making available the environmental layers used as environmental covariates in the environmental suitability modelling procedure. We also thank Jason Matthiopoulos for insightful suggestions on habitat utilization metrics as well as Stephen C.F. Palmer, and three anonymous reviewers for useful suggestions to improve the clarity and quality of the manuscript.

Models and empirical data for the production of referring expressions

Relevance:

40.00%

Abstract:

Article accepted date: 29 May 2014. Acknowledgements: The authors gratefully acknowledge the support of the Cognitive Science Society for the organisation of the Workshop on Production of Referring Expressions: Bridging the Gap between Cognitive and Computational Approaches to Reference, from which this special issue originated. Funding: Emiel Krahmer and Albert Gatt thank The Netherlands Organisation for Scientific Research (NWO) for VICI grant Bridging the Gap between Computational Linguistics and Psycholinguistics: The Case of Referring Expressions (grant number 277-70-007).

Data to Decision in a Dynamic Ocean: Robust Species Distribution Models and Spatial Decision Frameworks

Relevance:

40.00%

Abstract:

Human use of the oceans is increasingly in conflict with conservation of endangered species. Methods for managing the spatial and temporal placement of industries such as military, fishing, transportation and offshore energy, have historically been post hoc; i.e. the time and place of human activity is often already determined before assessment of environmental impacts. In this dissertation, I build robust species distribution models in two case study areas, US Atlantic (Best et al. 2012) and British Columbia (Best et al. 2015), predicting presence and abundance respectively, from scientific surveys. These models are then applied to novel decision frameworks for preemptively suggesting optimal placement of human activities in space and time to minimize ecological impacts: siting for offshore wind energy development, and routing ships to minimize risk of striking whales. Both decision frameworks relate the tradeoff between conservation risk and industry profit with synchronized variable and map views as online spatial decision support systems.

For siting offshore wind energy development (OWED) in the U.S. Atlantic (chapter 4), bird density maps are combined across species with weights of OWED sensitivity to collision and displacement, and 10 km² sites are compared against OWED profitability based on average annual wind speed at 90 m hub heights and distance to transmission grid. A spatial decision support system enables toggling between the map and tradeoff plot views by site. A selected site can be inspected for sensitivity to cetaceans throughout the year, so as to capture months of the year which minimize episodic impacts of pre-operational activities such as seismic airgun surveying and pile driving.

Routing ships to avoid whale strikes (chapter 5) can be similarly viewed as a tradeoff, but is a different problem spatially. A cumulative cost surface is generated from density surface maps and conservation status of cetaceans, before applying as a resistance surface to calculate least-cost routes between start and end locations, i.e. ports and entrance locations to study areas. Varying a multiplier to the cost surface enables calculation of multiple routes with different costs to conservation of cetaceans versus cost to transportation industry, measured as distance. Similar to the siting chapter, a spatial decision support system enables toggling between the map and tradeoff plot view of proposed routes. The user can also input arbitrary start and end locations to calculate the tradeoff on the fly.
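
The routing tradeoff described in this paragraph can be sketched on a small grid. The example below is a minimal illustration and not the dissertation's implementation: the grid values, the 4-connected neighbourhood, the unit step distance, and the names `least_cost_route`, `cost_surface`, and `multiplier` are all assumptions. Each step costs one unit of distance plus the multiplier times the conservation cost of the cell entered, and Dijkstra's algorithm finds the cheapest route.

```python
import heapq

# Hypothetical conservation cost surface: the middle row is high-density habitat.
cost_surface = [
    [0, 0, 0, 0, 0],
    [0, 5, 5, 5, 0],
    [0, 0, 0, 0, 0],
]


def least_cost_route(conservation, start, goal, multiplier):
    """Dijkstra on a 4-connected grid.

    Entering a cell costs 1 (distance) plus multiplier * conservation[r][c].
    Returns (path, distance_in_steps, summed_conservation_cost).
    Assumes the goal is reachable from the start.
    """
    rows, cols = len(conservation), len(conservation[0])
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + 1.0 + multiplier * conservation[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    # Walk back from the goal to the start to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    distance = len(path) - 1
    risk = sum(conservation[r][c] for r, c in path[1:])
    return path, distance, risk


if __name__ == "__main__":
    for m in (0.0, 1.0):
        _, distance, risk = least_cost_route(cost_surface, (1, 0), (1, 4), m)
        print(f"multiplier={m}: distance={distance}, conservation cost={risk}")
```

With a multiplier of 0 the route is simply the shortest one. Raising the multiplier makes the route detour around high-cost cells at the price of extra distance, which yields the points of a tradeoff curve.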

Essential to the input of these decision frameworks are distributions of the species. The two preceding chapters comprise species distribution models from two case study areas, U.S. Atlantic (chapter 2) and British Columbia (chapter 3), predicting presence and density, respectively. Although density is preferred to estimate potential biological removal, per Marine Mammal Protection Act requirements in the U.S., all the necessary parameters, especially distance and angle of observation, are less readily available across publicly mined datasets.

In the case of predicting cetacean presence in the U.S. Atlantic (chapter 2), I extracted datasets from the online OBIS-SEAMAP geo-database, and integrated scientific surveys conducted by ship (n=36) and aircraft (n=16), weighting a Generalized Additive Model by minutes surveyed within space-time grid cells to harmonize effort between the two survey platforms. For each of 16 cetacean species guilds, I predicted the probability of occurrence from static environmental variables (water depth, distance to shore, distance to continental shelf break) and time-varying conditions (monthly sea-surface temperature). To generate maps of presence vs. absence, Receiver Operator Characteristic (ROC) curves were used to define the optimal threshold that minimizes false positive and false negative error rates. I integrated model outputs, including tables (species in guilds, input surveys) and plots (fit of environmental variables, ROC curve), into an online spatial decision support system, allowing for easy navigation of models by taxon, region, season, and data provider.
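
The threshold selection step mentioned in this paragraph can be illustrated with a few lines of code. The abstract does not state the exact criterion, so this sketch assumes that the "optimal" threshold is the one minimizing the sum of the false positive rate and the false negative rate (equivalent to maximizing Youden's J statistic). The function name and the sample data are hypothetical.

```python
def optimal_threshold(probabilities, observed):
    """Pick the cut-off minimizing false positive rate + false negative rate.

    probabilities: predicted probabilities of occurrence.
    observed: 1 for an observed presence, 0 for an observed absence.
    Assumes both classes occur at least once.
    """
    positives = sum(observed)
    negatives = len(observed) - positives
    best_threshold, best_error = None, float("inf")
    # Each distinct predicted probability is a candidate cut-off.
    for threshold in sorted(set(probabilities)):
        predicted = [p >= threshold for p in probabilities]
        false_pos = sum(1 for pred, obs in zip(predicted, observed) if pred and not obs)
        false_neg = sum(1 for pred, obs in zip(predicted, observed) if not pred and obs)
        error = false_pos / negatives + false_neg / positives
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


if __name__ == "__main__":
    probs = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9]
    obs = [0, 0, 0, 1, 0, 1, 1]
    print(optimal_threshold(probs, obs))
```

Cells with a predicted probability at or above the returned threshold would be mapped as presence, the rest as absence.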

For predicting cetacean density within the inner waters of British Columbia (chapter 3), I calculated density from systematic, line-transect marine mammal surveys over multiple years and seasons (summer 2004, 2005, 2008, and spring/autumn 2007) conducted by Raincoast Conservation Foundation. Abundance estimates were calculated using two different methods: Conventional Distance Sampling (CDS) and Density Surface Modelling (DSM). CDS generates a single density estimate for each stratum, whereas DSM explicitly models spatial variation and offers potential for greater precision by incorporating environmental predictors. Although DSM yields a more relevant product for the purposes of marine spatial planning, CDS has proven to be useful in cases where there are fewer observations available for seasonal and inter-annual comparison, particularly for the scarcely observed elephant seal. Abundance estimates are provided on a stratum-specific basis. Steller sea lions and harbour seals are further differentiated by ‘hauled out’ and ‘in water’. This analysis updates previous estimates (Williams & Thomas 2007) by including additional years of effort, providing greater spatial precision with the DSM method over CDS, novel reporting for spring and autumn seasons (rather than summer alone), and providing new abundance estimates for Steller sea lion and northern elephant seal. In addition to providing a baseline of marine mammal abundance and distribution, against which future changes can be compared, this information offers the opportunity to assess the risks posed to marine mammals by existing and emerging threats, such as fisheries bycatch, ship strikes, and increased oil spill and ocean noise issues associated with increases of container ship and oil tanker traffic in British Columbia’s continental shelf waters.

Starting with marine animal observations at specific coordinates and times, I combine these data with environmental data, often satellite derived, to produce seascape predictions generalizable in space and time. These habitat-based models enable prediction of encounter rates and, in the case of density surface models, abundance that can then be applied to management scenarios. Specific human activities, OWED and shipping, are then compared within a tradeoff decision support framework, enabling interchangeable map and tradeoff plot views. These products make complex processes transparent for gaming conservation, industry and stakeholders towards optimal marine spatial management, fundamental to the tenets of marine spatial planning, ecosystem-based management and dynamic ocean management.

On the Advancement of Probabilistic Models of Decompression Sickness

Relevance:

40.00%

Abstract:

The work presented in this dissertation is focused on applying engineering methods to develop and explore probabilistic survival models for the prediction of decompression sickness in US NAVY divers. Mathematical modeling, computational model development, and numerical optimization techniques were employed to formulate and evaluate the predictive quality of models fitted to empirical data. In Chapters 1 and 2 we present general background information relevant to the development of probabilistic models applied to predicting the incidence of decompression sickness. The remainder of the dissertation introduces techniques developed in an effort to improve the predictive quality of probabilistic decompression models and to reduce the difficulty of model parameter optimization.

The first project explored seventeen variations of the hazard function using a well-perfused parallel compartment model. Models were parametrically optimized using the maximum likelihood technique. Model performance was evaluated using both classical statistical methods and model selection techniques based on information theory. Optimized model parameters were overall similar to those of previously published work. Results indicated a novel hazard function definition that included both ambient pressure scaling and individually fitted compartment exponent scaling terms.

We developed ten pharmacokinetic compartmental models that included explicit delay mechanics to determine if predictive quality could be improved through the inclusion of material transfer lags. A fitted discrete delay parameter augmented the inflow to the compartment systems from the environment. Based on the observation that symptoms are often reported after risk accumulation begins for many of our models, we hypothesized that the inclusion of delays might improve correlation between the model predictions and observed data. Model selection techniques identified two models as having the best overall performance, but comparison to the best performing model without delay and model selection using our best identified no delay pharmacokinetic model both indicated that the delay mechanism was not statistically justified and did not substantially improve model predictions.

Our final investigation explored parameter bounding techniques to identify parameter regions for which statistical model failure will not occur. When a model predicts zero probability of a diver experiencing decompression sickness for an exposure that is known to produce symptoms, statistical model failure occurs. Using a metric related to the instantaneous risk, we successfully identify regions where model failure will not occur and identify the boundaries of the region using a root bounding technique. Several models are used to demonstrate the techniques, which may be employed to reduce the difficulty of model optimization for future investigations.
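
Why a zero predicted probability counts as failure can be seen from the likelihood itself. The sketch below is an illustration only and does not reproduce the dissertation's models or its bounding metric: it uses the standard survival-analysis relation P = 1 - exp(-R), where R is the accumulated risk (integrated hazard), and the function names and sample values are hypothetical.

```python
import math


def probability_of_dcs(accumulated_risk):
    """Survival-model probability of an event given the integrated hazard."""
    return 1.0 - math.exp(-accumulated_risk)


def log_likelihood(accumulated_risks, outcomes):
    """Binary log-likelihood over exposures.

    outcomes: 1 if symptoms were observed for the exposure, 0 otherwise.
    Returns -inf when an observed outcome is assigned zero probability,
    which is the failure condition described in the text.
    """
    total = 0.0
    for risk, outcome in zip(accumulated_risks, outcomes):
        p = probability_of_dcs(risk)
        likelihood = p if outcome else 1.0 - p
        if likelihood == 0.0:
            return -math.inf
        total += math.log(likelihood)
    return total


if __name__ == "__main__":
    # Second exposure produced symptoms but accumulates no risk: failure.
    print(log_likelihood([0.02, 0.0], [0, 1]))
    # Same outcomes with a small positive risk: finite log-likelihood.
    print(log_likelihood([0.02, 0.01], [0, 1]))
```

A maximum likelihood optimizer cannot make progress from a parameter set that yields negative infinity, which is why restricting the search to regions where every symptomatic exposure accumulates some risk eases the optimization.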

Holocene precipitation change in different monsoon sub-regions (time-slices and transient data) simulated by different global climate models, with links to model results

Relevance:

40.00%

Abstract:

The recently proposed global monsoon hypothesis interprets monsoon systems as part of one global-scale atmospheric overturning circulation, implying a connection between the regional monsoon systems and an in-phase behaviour of all northern hemispheric monsoons on annual timescales (Trenberth et al., 2000). Whether this concept can be applied to past climates and variability on longer timescales is still under debate, because the monsoon systems exhibit different regional characteristics such as different seasonality (i.e. onset, peak, and withdrawal). To investigate the interconnection of different monsoon systems during the pre-industrial Holocene, five transient global climate model simulations have been analysed with respect to the rainfall trend and variability in different sub-domains of the Afro-Asian monsoon region. Our analysis suggests that on millennial timescales with varying orbital forcing, the monsoons do not behave as a tightly connected global system. According to the models, the Indian and North African monsoons are coupled, showing similar rainfall trend and moderate correlation in rainfall variability in all models. The East Asian monsoon changes independently during the Holocene. The dissimilarities in the seasonality of the monsoon sub-systems lead to a stronger response of the North African and Indian monsoon systems to the Holocene insolation forcing than of the East Asian monsoon and affect the seasonal distribution of Holocene rainfall variations. Within the Indian and North African monsoon domain, precipitation solely changes during the summer months, showing a decreasing Holocene precipitation trend. In the East Asian monsoon region, the precipitation signal is determined by an increasing precipitation trend during spring and a decreasing precipitation change during summer, partly balancing each other. 
A synthesis of reconstructions and the model results does not reveal an impact of the different seasonality on the timing of the Holocene rainfall optimum in the different sub-monsoon systems. Rather, it indicates locally inhomogeneous rainfall changes and shows that single palaeo-records should not be used to characterise the rainfall change and monsoon evolution for entire monsoon sub-systems.

Joint Modelling of Longitudinal and Survival Data: A comparison of Joint and Independent Models

Relevance:

40.00%

Design and Development of data mining Models for the Prediction of Manpower Placement in the Technical Domain

Relevance:

40.00%

Structure learning of context-specific graphical models

Relevance:

40.00%

Abstract:

The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.
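
A small conditional probability table makes the idea of context-specific independence concrete. The sketch below is a toy illustration, not the thesis's model class or score function: the table values and the helper name `independent_in_context` are hypothetical. Here Y is independent of X in the context Z = 0, but depends on X in the context Z = 1.

```python
# Toy conditional probability table P(Y | X, Z) for binary variables.
# Keys are (x, z); values are (P(Y=0 | x, z), P(Y=1 | x, z)).
cpt = {
    (0, 0): (0.7, 0.3),
    (1, 0): (0.7, 0.3),  # same distribution as x=0: X is irrelevant when Z=0
    (0, 1): (0.9, 0.1),
    (1, 1): (0.2, 0.8),  # differs from x=0: X matters when Z=1
}


def independent_in_context(table, z, tol=1e-12):
    """True if P(Y | X=x, Z=z) is identical for every x, i.e. Y is
    independent of X in the context Z=z."""
    rows = [dist for (x, zz), dist in table.items() if zz == z]
    first = rows[0]
    return all(
        abs(a - b) <= tol
        for row in rows[1:]
        for a, b in zip(first, row)
    )


if __name__ == "__main__":
    print(independent_in_context(cpt, 0), independent_in_context(cpt, 1))
```

An ordinary conditional independence statement would require the rows to match for every value of Z. Allowing them to match in only some contexts lets the two identical rows share one parameter while the remaining rows stay free.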

Dynamic Factor Models for Bivariate Count Data: an application to fire activity

Relevance:

40.00%

Abstract:

The study of forest fire activity, in its several aspects, is essential to understand the phenomenon and to prevent environmental public catastrophes. In this context, the analysis of the monthly number of fires over several years is one aspect to take into account in order to better comprehend this topic. The goal of this work is to analyze the monthly number of forest fires in the neighboring districts of Aveiro and Coimbra, Portugal, through dynamic factor models for bivariate count series. We use a Bayesian approach, through MCMC methods, to estimate the model parameters as well as the latent factor common to both series.
