337 results for Outliers


Relevance:

10.00%

Publisher:

Abstract:

Hydrophobicity, as measured by Log P, is an important molecular property related to toxicity and carcinogenicity. With increasing public health concern over the effects of Disinfection By-Products (DBPs), there are considerable benefits in developing Quantitative Structure-Activity Relationship (QSAR) models capable of accurately predicting Log P. In this research, Log P values of 173 DBP compounds in 6 functional classes were used to develop QSAR models by Multiple Linear Regression (MLR) analysis, applying 3 molecular descriptors: the Energy of the Lowest Unoccupied Molecular Orbital (ELUMO), Number of Chlorine atoms (NCl) and Number of Carbon atoms (NC). The QSAR models developed were validated based on the Organization for Economic Co-operation and Development (OECD) principles, and the model Applicability Domain (AD) and mechanistic interpretation were explored. Considering the very complex nature of DBPs, the established QSAR models performed very well with respect to goodness-of-fit, robustness and predictability, with coefficients of determination (R2) ranging from 81% to 98%. The Leverage Approach via the Williams Plot was applied to detect and remove outliers, consequently increasing R2 by approximately 2% to 13% for the different DBP classes. The developed QSAR models were statistically validated for their predictive power by the Leave-One-Out (LOO) and Leave-Many-Out (LMO) cross-validation methods. Finally, Monte Carlo simulation was used to assess the variations and inherent uncertainties in the QSAR models and to determine the most influential parameters in Log P prediction.
The QSAR models developed in this dissertation have a broad applicability domain because the research data set covered six of the eight common DBP classes: halogenated alkanes, alkenes, aromatics, aldehydes, ketones and carboxylic acids, which have drawn the attention of regulatory agencies in recent years. Furthermore, the QSAR models are suitable for predicting similar DBP compounds within the same applicability domain. The selection and integration of the various methodologies developed in this research may also benefit future research in similar fields.
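As a hedged illustration of the workflow described above, the following Python sketch fits an MLR model on synthetic descriptor data (the ELUMO, NCl and NC values are invented, not the dissertation's data set) and flags high-leverage points using the h* = 3p/n cutoff commonly used with Williams plots:

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit; returns coefficients and the design matrix."""
    Xd = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta, Xd

def leverages(Xd):
    """Diagonal of the hat matrix H = X (X'X)^-1 X'."""
    return np.diag(Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T)

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([rng.normal(-1.0, 0.5, n),             # "ELUMO" (invented values)
                     rng.integers(0, 4, n).astype(float),  # "NCl"
                     rng.integers(1, 8, n).astype(float)]) # "NC"
y = 0.5 - 0.8 * X[:, 0] + 0.4 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.1, n)

beta, Xd = fit_mlr(X, y)
h = leverages(Xd)
h_star = 3 * Xd.shape[1] / n        # Williams-plot leverage cutoff h* = 3p/n
high_leverage = np.where(h > h_star)[0]
```

In a Williams plot, points with leverage above h* (or with large standardized residuals) are the candidates for removal before refitting.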

Relevance:

10.00%

Publisher:

Abstract:

Quantitative Structure-Activity Relationship (QSAR) modelling has been applied extensively in predicting the toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, the acute and chronic toxicities of DBPs have been widely used in health risk assessment. These toxicities are correlated with molecular properties, which in turn are usually correlated with molecular descriptors. The primary goals of this thesis are: (1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as the energy of the lowest unoccupied molecular orbital (ELUMO) via QSAR modelling and analysis; (2) to validate the models using internal and external cross-validation techniques; and (3) to quantify the model uncertainties through the Taylor method and Monte Carlo Simulation. One of the most important ways to predict molecular properties such as ELUMO is QSAR analysis. In this study, the number of chlorine atoms (NCl) and number of carbon atoms (NC), as well as the energy of the highest occupied molecular orbital (EHOMO), are used as molecular descriptors. Three approaches are typically used in QSAR model development: (1) Linear or Multi-linear Regression (MLR); (2) Partial Least Squares (PLS); and (3) Principal Component Regression (PCR). In QSAR analysis, a very critical step is model validation, after the QSAR models are established and before applying them to toxicity prediction. The DBPs studied include five chemical classes, among them chlorinated alkanes, alkenes and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups of DBP chemicals (i.e., chloro-alkanes and aromatic compounds with a nitro or cyano group) to three types of organisms (fish, T. pyriformis and P. phosphoreum), based on experimental toxicity data from the literature.
The results show that: (1) QSAR models to predict molecular properties built by MLR, PLS or PCR can be used either to select valid data points or to eliminate outliers; (2) the Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models; however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; (3) ELUMO is shown to correlate strongly with NCl for several classes of DBPs; and (4) according to uncertainty analysis using the Taylor method, the uncertainty of the QSAR models derives mostly from NCl for all DBP classes.
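The Leave-One-Out validation step discussed in the results can be sketched as a cross-validated Q2 (1 - PRESS/TSS) computation; the data below are synthetic, not the thesis's models:

```python
import numpy as np

def press_loo(X, y):
    """Leave-one-out PRESS for an OLS model (naive refit per left-out point)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
        errs.append(y[i] - Xd[i] @ beta)
    return float(np.sum(np.square(errs)))

def q2(X, y):
    """Cross-validated Q2 = 1 - PRESS / total sum of squares."""
    return 1.0 - press_loo(X, y) / float(np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.2, 40)
q2_loo = q2(X, y)        # near 1 for a well-specified model
```

As the abstract notes, a high LOO Q2 alone is not sufficient evidence of predictivity; the same PRESS machinery extends to K-fold (leave-many-out) splits and to an external test set.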

Relevance:

10.00%

Publisher:

Abstract:

Geo-referenced catch and fishing effort data for the bigeye tuna fisheries in the Indian Ocean over 1952-2014 were analysed and standardized to facilitate population dynamics modelling studies. During this sixty-two-year historical period of exploitation, many changes occurred in both fishing techniques and the monitoring of activity. This study comprises a series of processing steps used for the standardization of spatial resolution, conversion and standardization of catch and effort units, raising of geo-referenced catch to nominal catch level, screening and correction of outliers, and detection of major catchability changes over long time series of fishing data, in particular for the Japanese longline fleet operating in the tropical Indian Ocean. A total of thirty fisheries were finally determined from the longline, purse seine and other-gears data sets, of which ten longline and four purse seine fisheries represented 96% of the whole historical catch. The geo-referenced records consist of catch, fishing effort and associated length-frequency samples for all fisheries.
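The outlier-screening step can be illustrated with a simple Tukey-fence filter; the catch-per-unit-effort values below are invented for illustration, not data from this study:

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

cpue = np.array([1.2, 1.1, 0.9, 1.3, 1.0, 9.5, 1.2, 0.8])  # hypothetical CPUE series
flags = iqr_outliers(cpue)                                  # only the 9.5 is flagged
```

Flagged records would then be inspected and either corrected (e.g., a unit-conversion error) or excluded before raising geo-referenced catch to nominal level.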

Relevance:

10.00%

Publisher:

Abstract:

This research analyses the components of the organizational structure of UFRN (the Federal University of Rio Grande do Norte) and the extent to which they affect organizational performance. The study, classified as exploratory and descriptive, was conducted in two phases: a pilot test to refine the research instrument and identify the latent components of the organizational structure, and a second phase to characterize these components and thereby establish relationships with organizational performance. In the first phase, the research was conducted in 20 UFRN organizational units with the participation of 84 technical-administrative staff and teachers, after accounting for missing values and outliers. The second phase occurred in two steps: one conducted with 279 valid cases, consisting of technical-administrative staff and teachers from 37 UFRN units, and another with 112 managers of the institution in the 49 units identified in this research. The instrument adopted in the first phase was composed of 36 indicators of organizational structure, six extracted and adapted from the instrument developed by Medeiros (2003) and 30 prepared from the literature review, drawing on Mintzberg (2012), Hall (1984), Vasconcellos and Hemsley (1997) and Seiffert and Costa (2007), plus 7 performance indicators adapted from Fleury and Mills (2006), Vieira and Vieira (2003) and Kaplan and Norton (1997) and from the self-assessment instrument in use at the university. At this stage the data were analyzed using factor analysis and reliability analysis by means of Cronbach's alpha, aiming to extract the factors representing the components of the organizational structure.
In step 1 of the second phase, the instrument refined and reduced in the previous phase, with 24 variables of organizational structure and 6 of performance, was used, while in step 2 a semi-structured interview guide, organized into nine organizational structure elements, was adopted to gather information for understanding the relationship of structure to performance at UFRN. The techniques used in the second phase as a whole were factor analysis and reliability analysis, to characterize the components extracted in the previous phase and to validate the performance variables, and correlation analysis, regression and content analysis, to establish and understand the relationship between structure and performance. The results showed, in the two steps, six latent components of organizational structure in the context under study: training and internalization, communication, hierarchy, decentralization, formalization and departmentalization, all with high Cronbach's alpha values, which can thereby be characterized as components of the UFRN structure. Six performance indicators were validated in this study, proving efficient and highly reliable. Finally, it was found that the formalization, communication, decentralization, and training and internalization components positively affect UFRN performance, while departmentalization has an adverse effect and hierarchy did not show a significant relationship. The results achieved in this work are important for future studies to support the development of a structural model that represents the specifics of the university.
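The reliability analysis via Cronbach's alpha mentioned above can be sketched in a few lines of Python; the item responses here are simulated around a single latent construct, not UFRN survey data:

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) matrix of scale scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(4)
factor = rng.normal(size=(200, 1))                     # one latent construct
items = factor + rng.normal(0, 0.5, size=(200, 6))     # six noisy indicators
alpha = cronbach_alpha(items)
```

High alpha values like those reported for the six structure components indicate that the items within each component covary strongly enough to be treated as one scale.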

Relevance:

10.00%

Publisher:

Abstract:

This work proposes a modified control chart incorporating concepts of time series analysis. Specifically, we consider Gaussian mixture transition distribution (GMTD) models. The GMTD models are a more general class than the autoregressive (AR) family, in the sense that the autocorrelated processes may present flat stretches, bursts or outliers. In this scenario, traditional Shewhart charts are no longer appropriate tools for monitoring such processes; Vasilopoulos and Stamboulis (1978) therefore proposed a modified version of those charts, with proper control limits based on the autocorrelated process. An analytical expression for the process variance, as well as control limits, was developed for a particular GMTD model. In order to evaluate the efficiency of the proposed technique, a comparison was made between a traditional Shewhart chart (which ignores the autocorrelation structure of the process), an AR(1) Shewhart control chart and a GMTD Shewhart control chart, using the Average Run Length (ARL) as the criterion of efficiency. The comparison was based on series generated according to a GMTD model. The results indicate that the modified Shewhart GMTD charts perform better than both the AR(1) Shewhart and the traditional Shewhart charts.
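The ARL criterion can be estimated by simulation. The sketch below is a simplified illustration rather than the paper's GMTD setup: it measures in-control run lengths of a 3-sigma chart whose limits use the stationary variance of an AR(1) process:

```python
import numpy as np

def ar1_series(n, phi, sigma, rng):
    """Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return x

def run_length(x, lcl, ucl):
    """Number of points plotted until the first out-of-control signal."""
    out = np.where((x < lcl) | (x > ucl))[0]
    return int(out[0]) + 1 if out.size else len(x)

phi, sigma = 0.7, 1.0
sd_proc = sigma / np.sqrt(1.0 - phi ** 2)   # stationary sd of the AR(1) process
rng = np.random.default_rng(42)
rls = [run_length(ar1_series(2000, phi, sigma, rng), -3 * sd_proc, 3 * sd_proc)
       for _ in range(200)]
arl = float(np.mean(rls))                   # estimated in-control ARL
```

A chart that ignored the autocorrelation would use the innovation sigma instead of sd_proc, yielding much tighter limits and frequent false alarms; comparing ARLs across limit choices is exactly the kind of comparison the paper performs for the GMTD case.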

Relevance:

10.00%

Publisher:

Abstract:

In the context of climate change over South America (SA), it has been observed that combinations of high temperatures with either more or less rainfall cause various impacts, such as extreme precipitation events, conditions favorable to fires, and droughts. As a result, these regions face a growing threat of local or generalized water shortage; water availability in Brazil thus depends largely on the weather and its variations on different time scales. The main objective of this research is therefore to study the moisture budget through regional climate models (RCMs) from the project Regional Climate Change Assessments for La Plata Basin (CLARIS-LPB), and to combine these RCMs through two statistical techniques in an attempt to improve prediction over three areas of SA: the Amazon (AMZ), Northeast Brazil (NEB) and the La Plata Basin (LPB), in the past (1961-1990) and future (2071-2100) climates. Moisture transport over SA was investigated through the vertically integrated moisture fluxes. The main results showed that the mean water vapour fluxes in the tropics (AMZ and NEB) are higher across the eastern and northern edges, indicating that the contributions of the North and South Atlantic trade winds are equally important for moisture inflow during the months of JJA and DJF. This configuration was observed in all models and climates. Comparing the climates, the convergence of the moisture flux was found to be smaller in the future climate than in the past in various regions and seasons. Similarly, most of the models simulate, for the future climate, reduced precipitation in the tropical regions (AMZ and NEB) and an increase over the LPB region. The second phase of this research was to combine the RCMs to predict precipitation more accurately, through multiple regression on principal components (C.RPC) and convex combination (C.EQM) techniques, and then to analyze and compare the resulting combinations of RCMs (ensembles).
The results indicated that the C.RPC combination better represented observed precipitation in both climates: besides producing values close to those observed, this technique obtained correlation coefficients of moderate to strong magnitude in almost every month across the different climates and regions, as well as lower data dispersion (RMSE). A significant advantage of the combination methods was their ability to capture extreme events (outliers) in the study regions. In general, C.EQM captures more wet extremes, while C.RPC captures more dry extremes, in both climates and in the three regions studied.
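The two combination strategies can be contrasted with a minimal sketch. The "observations" and "model outputs" below are synthetic, and the least-squares weighting stands in for the principal-components regression only in spirit:

```python
import numpy as np

rng = np.random.default_rng(3)
obs = rng.gamma(2.0, 2.0, 120)                         # synthetic "observed" precipitation
models = np.column_stack([obs + rng.normal(0, 1.5, 120) for _ in range(3)])

eqw = models.mean(axis=1)                              # equal-weight ensemble mean

A = np.column_stack([np.ones(len(obs)), models])       # regression-weighted ensemble
w, *_ = np.linalg.lstsq(A, obs, rcond=None)
reg = A @ w

def rmse(pred):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))
```

Because the fitted weights minimize the in-sample squared error over all affine combinations, the regression ensemble can never do worse in-sample than the equal-weight mean, and both typically beat any single model.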

Relevance:

10.00%

Publisher:

Abstract:

The great interest in nonlinear system identification is mainly due to the fact that a large number of real systems are complex and need to have their nonlinearities considered so that their models can be successfully used in applications of control, prediction, inference, among others. This work evaluates the application of Fuzzy Wavelet Neural Networks (FWNN) to identify nonlinear dynamical systems subjected to noise and outliers. Generally, these elements have negative effects on the identification procedure, resulting in erroneous interpretations of the dynamical behavior of the system. The FWNN combines, in a single structure, the ability of fuzzy logic to deal with uncertainties, the multiresolution characteristics of wavelet theory, and the learning and generalization abilities of artificial neural networks. Usually, the learning procedure of these neural networks is realized by a gradient-based method which uses the mean squared error as its cost function. This work proposes replacing this traditional function with an Information Theoretic Learning similarity measure called correntropy. With this similarity measure, higher-order statistics can be considered during the FWNN training process. For this reason, the measure is more suitable for non-Gaussian error distributions and makes the training less sensitive to the presence of outliers. To evaluate this replacement, FWNN models are obtained in two identification case studies: a real nonlinear system, consisting of a multisection tank, and a simulated system based on a model of the human knee joint. The results demonstrate that using correntropy as the cost function of the error backpropagation algorithm makes the identification procedure using FWNN models more robust to outliers. However, this is only achieved if the Gaussian kernel width of the correntropy is properly adjusted.
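The contrast between the MSE and a correntropy-based cost can be seen in a small sketch with illustrative error values (not the FWNN experiments); the Gaussian kernel saturates for large errors, so an outlier barely moves the cost:

```python
import numpy as np

def correntropy_loss(e, sigma=1.0):
    """Negative mean Gaussian-kernel similarity between errors and zero (MCC).
    Large errors saturate the kernel, so outliers barely move the cost,
    unlike the mean squared error."""
    return -np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2)))

errors = np.array([0.1, -0.2, 0.05, 8.0])   # the last residual is an outlier
clean = np.array([0.1, -0.2, 0.05, 0.0])

mse_out = np.mean(errors ** 2)              # dominated by the outlier
mcc_out = correntropy_loss(errors)
mcc_clean = correntropy_loss(clean)
```

Here the outlier inflates the MSE by roughly its squared value, while it shifts the correntropy cost by at most 1/n; the kernel width sigma controls where this saturation sets in, which is why, as noted above, it must be tuned carefully.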

Relevance:

10.00%

Publisher:

Abstract:

This article is protected by copyright. All rights reserved.

Acknowledgements: This study was funded by a BBSRC studentship (MAW) and NERC grants NE/H00775X/1 and NE/D000602/1 (SBP). The authors are grateful to Mario Röder and Keliya Bai for fieldwork assistance, and all estate owners, factors and keepers for access to field sites, most particularly MJ Taylor and Mike Nisbet (Airlie), Neil Brown (Allargue), RR Gledson and David Scrimgeour (Delnadamph), Andrew Salvesen and John Hay (Dinnet), Stuart Young and Derek Calder (Edinglassie), Kirsty Donald and David Busfield (Glen Dye), Neil Hogbin and Ab Taylor (Glen Muick), Alistair Mitchell (Glenlivet), Simon Blackett, Jim Davidson and Liam Donald (Invercauld), Richard Cooke and Fred Taylor† (Invermark), Shaila Rao and Christopher Murphy (Mar Lodge), and Ralph Peters and Philip Astor (Tillypronie).

Data accessibility:
• Genotype data (DataDryad: doi:10.5061/dryad.4t7jk)
• Metadata (information on sampling sites, phenotypes and medication regimen) (DataDryad: doi:10.5061/dryad.4t7jk)

Relevance:

10.00%

Publisher:

Abstract:

Sound is a key sensory modality for Hawaiian spinner dolphins. Like many other marine animals, these dolphins rely on sound and their acoustic environment for many aspects of their daily lives, making it essential to understand the soundscape in areas that are critical to their survival. Hawaiian spinner dolphins rest during the day in shallow coastal areas and forage offshore at night. In my dissertation I focus on the soundscape of the bays where Hawaiian spinner dolphins rest, taking a soundscape ecology approach. I primarily relied on passive acoustic monitoring using four DSG-Ocean acoustic loggers in four Hawaiian spinner dolphin resting bays on the Kona Coast of Hawai‛i Island. 30-second recordings were made every four minutes in each of the bays for 20 to 27 months between January 8, 2011 and March 30, 2013. I also utilized concomitant vessel-based visual surveys in the four bays to provide context for these recordings. In my first chapter I used the dolphins' contributions to the soundscape to monitor their presence in the bays and found that the degree of presence varied greatly, from less than 40% to nearly 90% of days monitored with dolphins present. Having established these bays as important to the animals, in my second chapter I explored the many components of their resting bay soundscape and evaluated the influence of natural and human events on it. I characterized the overall soundscape in each of the four bays, used the tsunami event of March 2011 to approximate a natural soundscape, and identified all loud daytime outliers. Overall, sound levels were consistently louder at night and quieter during the daytime owing to the sounds from snapping shrimp; in fact, peak Hawaiian spinner dolphin resting time co-occurs with the quietest part of the day. However, I also found that humans drastically alter this daytime soundscape with sound from offshore aquaculture, vessel sound and military mid-frequency active sonar.
During one recorded mid-frequency active sonar event in August 2011, sound pressure levels in the 3.15 kHz one-third-octave band were as high as 45.8 dB above median ambient noise levels. Human activity both inside (vessels) and outside (sonar and aquaculture) the bays significantly altered the resting bay soundscape. Inside the bays there are high levels of human activity, including vessel-based tourism directly targeting the dolphins. The interactions between humans and dolphins in their resting bays are of concern; therefore, my third chapter aimed to assess the acoustic response of the dolphins to human activity. Using days where acoustic recordings overlapped with visual surveys, I found the greatest response in a bay with dolphin-centric activities, not in the bay with the most vessel activity, indicating that it is not the magnitude of activity that elicits a response but its focus. In my fourth chapter I summarize the key results from the first three chapters to illustrate the power of a multiple-site design to prioritize action to protect Hawaiian spinner dolphins in their resting bays, a chapter I hope will be useful for managers should they take further action to protect the dolphins.
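The exceedance-over-median-ambient measurement used above can be sketched as follows; the band levels are invented for illustration, not measurements from the dissertation:

```python
import numpy as np

# Hypothetical one-third-octave band SPL time series (dB) for a single band;
# one sample contains a loud transient event.
spl = np.array([92.0, 93.5, 91.8, 92.4, 130.1, 93.0, 92.2])

median_ambient = np.median(spl)        # robust estimate of the ambient level
exceedance = spl - median_ambient      # dB above typical ambient
loud = exceedance > 20.0               # flag loud outliers well above ambient
```

Using the median rather than the mean keeps the ambient estimate itself insensitive to the loud outliers being measured against it.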


Relevance:

10.00%

Publisher:

Abstract:

Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers, which can strongly affect the least-squares estimator in linear regression. Instead of modeling the mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and covariates. QR can therefore be widely applied to problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental designs and observational studies. In ordinary linear regression, sample size may be determined by either precision analysis or power analysis, using closed-form formulas. There are also methods that calculate sample size for QR based on precision analysis, such as that of Jennen-Steinmetz and Wellek (2005), and a method based on power analysis was proposed by Shao and Wang (2009). In this project, a new method is proposed to calculate sample size based on power analysis under a hypothesis test of covariate effects. Even though an error distribution assumption is not necessary for QR analysis itself, researchers have to make assumptions about the error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. Both parametric and nonparametric methods are provided here to estimate the error distribution: since the proposed method is implemented in R, the user can choose either a parametric distribution or nonparametric kernel density estimation. The user also needs to specify the covariate structure and effect size to carry out the sample size and power calculation. The performance of the proposed method is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers close to the nominal power level, for example, 80%.
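A quantile-regression fit under heavy-tailed errors, the setting where QR's robustness pays off, can be sketched by minimizing the pinball loss directly. This is a generic illustration in Python (the project itself is implemented in R), not the proposed sample-size method:

```python
import numpy as np
from scipy.optimize import minimize

def pinball(params, x, y, tau):
    """Quantile (pinball) loss for the linear model y = a + b*x at quantile tau."""
    r = y - (params[0] + params[1] * x)
    return np.mean(np.where(r >= 0, tau * r, (tau - 1.0) * r))

rng = np.random.default_rng(7)
x = rng.uniform(0, 5, 300)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=300)   # heavy-tailed errors

res = minimize(pinball, x0=[0.0, 1.0], args=(x, y, 0.5), method="Nelder-Mead")
intercept, slope = res.x                             # median-regression fit
```

A Monte Carlo power calculation of the kind proposed would repeat such fits over many simulated data sets of a candidate size n, recording how often the covariate-effect test rejects.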

Relevance:

10.00%

Publisher:

Abstract:

Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners' performance, comprehension and interaction in given educational scenarios. The specific character of the available data, such as missing values, extreme values or outliers, makes it challenging to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on the SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative for classifying learners. Furthermore, the monitoring activities performed created a strong basis for generating automatic feedback to learners on their course participation, relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners at risk of failing the course, enabling tutors to take timely actions.
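A minimal k-means-with-SSE sketch, using plain NumPy on synthetic two-cluster data rather than our analytics tool or platform data, illustrates the clustering step:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm returning labels, centers and the SSE metric."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    sse = float(np.sum((X - centers[labels]) ** 2))
    return labels, centers, sse

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])
labels, centers, sse = kmeans(X, 2)
```

In practice one would run this for several values of k and compare quality metrics (SSE, silhouette, Dunn, Xie-Beni) to choose the number of learner groups.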

Relevance:

10.00%

Publisher:

Abstract:

The complexity of modern geochemical data sets is increasing in several respects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc.), so it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). The analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a log-ratio transformation) and the way the covariance matrix between components is estimated (classical vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat-covered areas are detected as outliers by a robust analysis and can then be filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).
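The log-ratio PCA setting can be sketched with a centred log-ratio (clr) transform followed by an SVD-based PCA; the compositions below are simulated, not Tellus data:

```python
import numpy as np

def clr(comp):
    """Centred log-ratio transform of compositional rows (all parts > 0)."""
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)

def pca(Z):
    """PCA via SVD of the column-centred data; returns scores and loadings."""
    Zc = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    return U * s, Vt.T

rng = np.random.default_rng(11)
comp = rng.dirichlet([5.0, 3.0, 2.0, 1.0], size=100)   # synthetic 4-part compositions
scores, loadings = pca(clr(comp))
```

Plotting the first two score columns against the corresponding loading vectors gives the biplot; a robust variant would replace the classical covariance (implicit in the SVD of centred data) with a robust estimate such as MCD.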

Relevance:

10.00%

Publisher:

Abstract:

Robust joint modelling is an emerging field of research. Through advancements in electronic patient healthcare records, the popularity of joint modelling approaches, which provide simultaneous analysis of longitudinal and survival data, has grown rapidly in recent years. This research advances previous work through the development of a novel robust joint modelling methodology for one of the most common types of standard joint model, that which links a linear mixed model with a Cox proportional hazards model. Through t-distributional assumptions, longitudinal outliers are accommodated, with their detrimental impact down-weighted, thus providing more efficient and reliable estimates. The robust joint modelling technique and its major benefits are showcased through the analysis of Northern Irish end-stage renal disease patients. With an ageing population and the growing prevalence of chronic kidney disease within the United Kingdom, there is a pressing demand to investigate the detrimental relationship between the changing haemoglobin levels of haemodialysis patients and their survival. As outliers within the NI renal data were found to have significantly worse survival, the identification of outlying individuals through robust joint modelling may aid nephrologists in improving patients' survival. A simulation study was also undertaken to explore the difference between robust and standard joint models in the presence of increasing proportions and extremity of longitudinal outliers. More efficient and reliable estimates were obtained by the robust joint models, with increasing contrast between the robust and standard models as a greater proportion of more extreme outliers was present. By illustrating the gains in efficiency and reliability of parameter estimates when outliers exist, the potential of robust joint modelling is evident.
The research presented in this thesis highlights the benefits of, and stresses the need for, a more robust approach to joint modelling in the presence of longitudinal outliers.
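The down-weighting induced by t-distributional assumptions can be illustrated with the classic EM-style weights w_i = (nu + 1) / (nu + r_i^2); the residuals below are invented, not NI renal data:

```python
import numpy as np

def t_weights(resid, nu=4.0, scale=1.0):
    """EM-style observation weights under a t(nu) error model:
    w_i = (nu + 1) / (nu + (r_i / scale)^2); large residuals get small weights."""
    r2 = (resid / scale) ** 2
    return (nu + 1.0) / (nu + r2)

resid = np.array([0.1, -0.3, 0.2, 6.0])   # the last residual is a longitudinal outlier
w = t_weights(resid)                       # the outlier's weight is much smaller
```

Under a Gaussian model every observation would receive weight 1; here the outlying measurement contributes only a fraction of that, which is the mechanism behind the more efficient and reliable estimates reported above.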

Relevance:

10.00%

Publisher:

Abstract:

In recent years, depth cameras have been widely utilized in camera tracking for augmented and mixed reality. Many studies focus on methods that generate the reference model simultaneously with the tracking and thus allow operation in unprepared environments. However, methods that rely on predefined CAD models have their advantages: the measurement errors are not accumulated into the model, they are tolerant of inaccurate initialization, and the tracking is always performed directly in the reference model's coordinate system. In this paper, we present a method for tracking a depth camera with existing CAD models and the Iterative Closest Point (ICP) algorithm. In our approach, we render the CAD model using the latest pose estimate and construct a point cloud from the corresponding depth map. We construct another point cloud from the currently captured depth frame, and find the incremental change in the camera pose by aligning the two point clouds. We utilize a GPGPU-based implementation of the ICP which efficiently uses all the depth data in the process. The method runs in real time, is robust to outliers, and does not require any preprocessing of the CAD models. We evaluated the approach using the Kinect depth sensor, and compared the results to a 2D edge-based method, to a depth-based SLAM method, and to the ground truth. The results show that the approach is more stable than the edge-based method and suffers less from drift than the depth-based SLAM.
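A minimal point-to-point ICP in 2D illustrates the alignment step; this is a toy sketch with exact correspondences and synthetic points, not the paper's GPGPU depth-map pipeline:

```python
import numpy as np

def best_rigid(P, Q):
    """Least-squares rotation R and translation t mapping P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cq - R @ cp

def icp(P, Q, iters=20):
    """Point-to-point ICP: nearest-neighbour matching then rigid re-alignment."""
    R_tot, t_tot = np.eye(2), np.zeros(2)
    for _ in range(iters):
        d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
        match = Q[d.argmin(axis=1)]            # nearest neighbour of each point
        R, t = best_rigid(P, match)
        P = P @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
    return R_tot, t_tot

rng = np.random.default_rng(2)
Q = rng.uniform(-1, 1, (80, 2))                # "model" point cloud
ang = 0.05                                     # small pose perturbation
R_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
P = Q @ R_true.T + np.array([0.03, -0.02])     # "captured" cloud, slightly off
R_est, t_est = icp(P, Q)
aligned = P @ R_est.T + t_est                  # P brought back onto Q
```

The incremental pose found per frame corresponds to the accumulated (R_tot, t_tot); in the paper's setting P would come from the captured depth frame and Q from the depth map rendered from the CAD model.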