901 results for: least-squares fit to flow-through data
Abstract:
Fractal and multifractal concepts have grown increasingly popular in soil analysis in recent years, along with the development of fractal models. A common step is to calculate the slope of a linear fit, usually by the least squares method. This should not be a special problem; however, with experimental data the researcher often has to select the range of scales to work with, neglecting the remaining points, in order to achieve the linearity that this type of analysis requires. Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. With this method we do not have to assume that an outlier is simply an extreme observation drawn from the tail of a normal distribution that does not compromise the validity of the regression results. In this work we evaluate the capacity of robust regression to select the points used from the experimental data, trying to avoid subjective choices. Based on this analysis we have developed a new methodology with two basic steps: (1) evaluation of the improvement of the linear fit when consecutive points are eliminated, based on the R p-value, so that the implications of reducing the number of points are taken into account; (2) evaluation of the significance of the difference in slope between the fit using the two extreme points and the fit using all available points. We compare the results of this methodology with those of the commonly used least squares method. The data selected for these comparisons come from experimental soil roughness transects and from transects simulated with the midpoint displacement method, with added trends and noise. The results are discussed, indicating the advantages and disadvantages of each methodology.
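A minimal Python sketch of why a robust slope estimate can reduce the need to hand-pick points. It contrasts the ordinary least-squares slope with the Theil-Sen estimator (the median of all pairwise slopes). Theil-Sen is used here only as an illustration; the abstract does not say which robust regression method the authors applied, and the data below are invented.

```python
from statistics import mean, median

def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def theil_sen_slope(x, y):
    """Robust slope: median of the slopes between every pair of points."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    return median(slopes)

# Hypothetical log-log data: true slope 2, with one outlier at the last scale.
log_scale = [0, 1, 2, 3, 4, 5, 6, 7]
log_measure = [2 * s + 1 for s in log_scale]
log_measure[-1] = 40  # outlier

slope_ols = ols_slope(log_scale, log_measure)
slope_robust = theil_sen_slope(log_scale, log_measure)
```

In this toy case the single outlier pulls the least-squares slope well away from 2, while the median of pairwise slopes stays at the value set by the remaining points.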
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new, unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one, or more than one), and as stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we mainly used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data.
They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy and in the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged.
CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective at detecting different kinds of drift in partially labeled data streams and achieves good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
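The drift monitoring mentioned above can be sketched with a textbook form of the Page-Hinkley test. This is a hedged illustration in Python, not the thesis implementation: the function name, the `delta` and `threshold` values, and the synthetic stream are assumptions. The version below flags an increase in the mean of the monitored score, so a falling average log-likelihood would be fed in with its sign flipped.

```python
def page_hinkley(stream, delta=0.005, threshold=5.0):
    """Return the 0-based index at which an upward mean shift is flagged, or None.

    delta     -- tolerated magnitude of change (assumed value)
    threshold -- alarm level, often written lambda (assumed value)
    """
    running_mean = 0.0
    cumulative = 0.0      # sum of (x_t - running_mean_t - delta)
    cumulative_min = 0.0  # smallest value of the cumulative sum so far
    for t, x in enumerate(stream, start=1):
        running_mean += (x - running_mean) / t
        cumulative += x - running_mean - delta
        cumulative_min = min(cumulative_min, cumulative)
        if cumulative - cumulative_min > threshold:
            return t - 1
    return None

# Synthetic score: stable for 50 steps, then the mean jumps from 0 to 1.
scores = [0.0] * 50 + [1.0] * 50
drift_index = page_hinkley(scores)
```

A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay, which is the usual trade-off when tuning this test.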
Abstract:
The Jones-Wilkins-Lee (JWL) equation of state parameters for ANFO and emulsion-type explosives have been obtained from cylinder test expansion measurements. The calculation method comprises a new radial expansion function, with a non-zero initial velocity at the onset of the expansion in order to comply with a positive Gurney energy at unit relative volume, as the isentropic expansion from the CJ state predicts. The equations reflecting the CJ state conditions and the measured expansion energy were solved for the JWL parameters by a non-linear least squares scheme. The JWL parameters of thirteen ANFO and emulsion type explosives have been determined in this way from their cylinder test expansion data. The results were evaluated through numerical modelling of the tests with the LS-DYNA hydrocode; the expansion histories from the modelling were compared with the measured ones, and excellent agreement was found.
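As a rough illustration of least-squares fitting with the JWL form, the Python sketch below uses the standard JWL isentrope, p_s(V) = A exp(-R1 V) + B exp(-R2 V) + C V^-(1+omega), and solves only for the linear coefficients A, B and C with R1, R2 and omega held fixed. This is a deliberate simplification: the study solves CJ-state and expansion-energy equations with a non-linear scheme, which is not reproduced here. All numerical values are synthetic and the function names are hypothetical.

```python
import numpy as np

def jwl_isentrope(v, a, b, c, r1, r2, omega):
    """Pressure on the JWL isentrope at relative volume v."""
    return a * np.exp(-r1 * v) + b * np.exp(-r2 * v) + c * v ** (-(1.0 + omega))

def fit_linear_jwl(v, p, r1, r2, omega):
    """Least-squares estimate of A, B, C for fixed (assumed) r1, r2, omega."""
    design = np.column_stack(
        [np.exp(-r1 * v), np.exp(-r2 * v), v ** (-(1.0 + omega))]
    )
    coeffs, *_ = np.linalg.lstsq(design, p, rcond=None)
    return coeffs

# Synthetic, illustrative values only (not data from the study).
v = np.linspace(1.0, 7.0, 40)
p = jwl_isentrope(v, 500.0, 10.0, 1.0, r1=4.5, r2=1.2, omega=0.3)
a_fit, b_fit, c_fit = fit_linear_jwl(v, p, r1=4.5, r2=1.2, omega=0.3)
```

A full non-linear fit would wrap this linear solve in an outer search over R1, R2 and omega, or pass all six parameters to a general non-linear least-squares solver.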
Abstract:
Traffic flow time series data are usually high dimensional and very complex. They are also sometimes imprecise and distorted because of malfunctioning data collection sensors. Additionally, events such as congestion caused by traffic accidents add more uncertainty to real-time traffic conditions, making traffic flow forecasting a complicated task. This article presents a new data preprocessing method targeting multidimensional time series with a very high number of dimensions and shows its application to real traffic flow time series from the California Department of Transportation (PEMS web site). The proposed method consists of three main steps. First, based on mTESL, a language for defining events in multidimensional time series, we identify a number of event types in the time series that correspond to either incorrect data or data with interference. Second, each event type is restored using an original method that combines real observations, locally forecasted values and historical data. Third, an exponential smoothing procedure is applied globally to eliminate noise interference and other random errors, so as to provide good-quality source data for future work.
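The third step can be illustrated with simple (single) exponential smoothing. This is a minimal Python sketch; the abstract does not state which variant or smoothing factor the authors used, so `alpha` and the sample counts are assumptions.

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # initialise with the first observation
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical 5-minute vehicle counts with one noisy spike.
counts = [100, 102, 98, 160, 101, 99]
smooth_counts = exponential_smoothing(counts, alpha=0.3)
```

A smaller `alpha` suppresses more noise but reacts more slowly to genuine changes in flow.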
Abstract:
The main objective of this study was to analyze the relationship between Transformational Leadership, Knowledge Conversion and Organizational Effectiveness. The theoretical assumptions were established concepts on these themes, together with recent research carried out in other countries and organizational contexts. On this basis, a potential study of a model relating these three concepts was identified. It is assumed that organizations that seek Competitive Advantage and adopt the Knowledge-Based View can differentiate themselves from their competitors. In this context, knowledge gains prominence and a leading role in these organizations. Creating knowledge through their employees thus becomes one of the challenges of these organizations, as it suggests an improvement in their Economic, Social, Systemic and Political indicators, which is what defines Organizational Effectiveness. The modes of knowledge conversion in organizations are therefore relevant, since knowledge is created and converted through the interaction between the existing knowledge of employees. This knowledge conversion, or SECI model, has four modes: Socialization, Externalization, Combination and Internalization. From this perspective, leadership in organizations appears as an element capable of influencing employees, making the SECI model of knowledge conversion more dynamic. Transformational leadership is identified as having characteristics that can influence employees, and it is understood that the relationship between Transformational Leadership and Knowledge Conversion may have a positive influence on the indicators of Organizational Effectiveness. This research therefore sought to analyze a model exploring the relationship between Transformational leadership, Knowledge Conversion (SECI) and Organizational Effectiveness.
This research was quantitative, with data collected through a survey that obtained a total of 230 valid respondents from different organizations. The data collection instrument consisted of statements related to the relationship model under study, with a total of 44 items. Respondents were concentrated in the 30 to 39 age range, predominantly from private organizations and from the IT/Telecom, Teaching and Human Resources departments, in that order. The data were analyzed using Exploratory Factor Analysis and Structural Equation Modeling via Partial Least Squares Path Modeling (PLS-PM). As a result of the analysis, the hypotheses were confirmed, leading to the conclusion that Transformational Leadership has a positive influence on the modes of Knowledge Conversion and that Knowledge Conversion positively influences Organizational Effectiveness. It was also concluded that respondents' perception of the model does not differ between those who hold a leadership role and those who do not.
Abstract:
Some of the factors affecting colonisation of a colonisation sampler, the Standard Aufwuchs Unit (S. Auf. U.), were investigated, namely immersion period, whether the unit was anchored on the bottom or suspended, and the influence of riffles. It was concluded that a four-week immersion period was best. S. Auf. U. anchored on the bottom collected both more taxa and more individuals than suspended ones. Fewer taxa but more individuals colonised S. Auf. U. in the potamon zone than in the rhithron zone, with a consequent reduction in the values of pollution and diversity indexes. It was concluded that a completely different scoring system was necessary for lowland rivers. Macroinvertebrates colonising S. Auf. U. in simulated streams, lowland rivers and the R. Churnet reflected water quality. A variety of pollution and diversity indexes were applied to results from lowland river sites. Instead of these, it was recommended that an abbreviated species-relative abundance list be used to summarise biological data for use in lowland river surveillance. An intensive study of gastropod populations was made in simulated streams. Lymnaea peregra increased in abundance whereas Potamopyrgus jenkinsi decreased with increasing sewage effluent concentration. No clear-cut differences in reproduction were observed. The presence/absence of eight gastropod taxa was compared with concentrations of various pollutants in lowland rivers. On the basis of all field work it appeared that ammonia, nitrite, copper and zinc were the toxicants most likely to be detrimental to gastropods and that P. jenkinsi and Theodoxus fluviatilis were the least tolerant taxa. 96 h acute toxicity tests of P. jenkinsi using ammonia and copper were carried out in a flow-through system after a variety of static range-finding tests. P. jenkinsi was intolerant of both toxicants compared to reports on other taxa, and the results suggested that these toxicants would affect the distribution of this species in the field.
Abstract:
A combination of experimental methods was applied at a clogged, horizontal subsurface flow (HSSF) municipal wastewater tertiary treatment wetland (TW) in the UK, to quantify the extent of surface and subsurface clogging which had resulted in undesirable surface flow. The three-dimensional hydraulic conductivity profile was determined using a purpose-made device which recreates the constant head permeameter test in situ. The hydrodynamic pathways were investigated by performing dye tracing tests with Rhodamine WT and a novel multi-channel, data-logging, flow-through fluorimeter which allows synchronous measurements to be taken from a matrix of sampling points. Hydraulic conductivity varied in all planes, with the lowest measurement of 0.1 m/d corresponding to the surface layer at the inlet, and the maximum measurement of 1550 m/d located at a depth of 0.4 m at the outlet. According to dye tracing results, the region where the overland flow ceased received five times the average flow, which then vertically short-circuited below the rhizosphere. The tracer break-through curve obtained from the outlet showed that this preferential flow-path accounted for approximately 80% of the flow overall and arrived 8 h before a distinctly separate secondary flow-path. The overall volumetric efficiency of the clogged system was 71% and the hydrology was simulated using a dual-path, dead-zone storage model. It is concluded that uneven inlet distribution, continuous surface loading and high rhizosphere resistance are responsible for the clog formation observed in this system. The average inlet hydraulic conductivity was 2 m/d, suggesting that current European design guidelines, which predict that the system will reach an equilibrium hydraulic conductivity of 86 m/d, do not adequately describe the hydrology of mature systems.
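For orientation, the textbook constant head permeameter relation follows from Darcy's law: K = Q L / (A Δh). The Python sketch below evaluates it. It assumes the laboratory form of the test; the in-situ device described above may apply geometry-specific corrections that the abstract does not give, and the input numbers are invented.

```python
def hydraulic_conductivity(flow_rate, sample_length, area, head_difference):
    """Darcy's law for a constant head test: K = Q * L / (A * dh).

    With Q in m^3/d, L and dh in m, and A in m^2, the result is in m/d.
    """
    if area <= 0 or head_difference <= 0:
        raise ValueError("area and head_difference must be positive")
    return flow_rate * sample_length / (area * head_difference)

# Hypothetical reading: 2 m^3/d through a 0.5 m column of 0.1 m^2 cross-section
# under a 0.1 m head difference.
k = hydraulic_conductivity(flow_rate=2.0, sample_length=0.5, area=0.1, head_difference=0.1)
```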
Abstract:
Digital systems can generate left and right audio channels that create the effect of virtual sound source placement (spatialization) by processing an audio signal through pairs of Head-Related Transfer Functions (HRTFs) or, equivalently, Head-Related Impulse Responses (HRIRs). The spatialization effect is better when individually-measured HRTFs or HRIRs are used than when generic ones (e.g., from a mannequin) are used. However, the measurement process is not available to the majority of users. There is ongoing interest to find mechanisms to customize HRTFs or HRIRs to a specific user, in order to achieve an improved spatialization effect for that subject. Unfortunately, the current models used for HRTFs and HRIRs contain over a hundred parameters and none of those parameters can be easily related to the characteristics of the subject. This dissertation proposes an alternative model for the representation of HRTFs, which contains at most 30 parameters, all of which have a defined functional significance. It also presents methods to obtain the value of parameters in the model to make it approximately equivalent to an individually-measured HRTF. This conversion is achieved by the systematic deconstruction of HRIR sequences through an augmented version of the Hankel Total Least Squares (HTLS) decomposition approach. An average 95% match (fit) was observed between the original HRIRs and those re-constructed from the Damped and Delayed Sinusoids (DDSs) found by the decomposition process, for ipsilateral source locations. The dissertation also introduces and evaluates an HRIR customization procedure, based on a multilinear model implemented through a 3-mode tensor, for mapping of anatomical data from the subjects to the HRIR sequences at different sound source locations. 
This model uses the Higher-Order Singular Value Decomposition (HOSVD) method to represent the HRIRs and is capable of generating customized HRIRs from easily attainable anatomical measurements of a new intended user of the system. Listening tests were performed to compare the spatialization performance of customized, generic and individually-measured HRIRs when they are used for synthesized spatial audio. Statistical analysis of the results confirms that the type of HRIRs used for spatialization is a significant factor in the spatialization success, with the customized HRIRs yielding better results than generic HRIRs.
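The decomposition step can be sketched as a plain Higher-Order SVD of a 3-mode tensor using NumPy. This is only an illustration of HOSVD itself: the tensor layout (subjects × source directions × time samples) and the random data are assumptions, and the mapping from anatomical measurements to customized HRIRs is not reproduced.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize the tensor so that the given mode indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply the tensor by a matrix along one mode."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = [
        np.linalg.svd(unfold(tensor, k), full_matrices=False)[0]
        for k in range(tensor.ndim)
    ]
    core = tensor
    for k, u in enumerate(factors):
        core = mode_product(core, u.T, k)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    tensor = core
    for k, u in enumerate(factors):
        tensor = mode_product(tensor, u, k)
    return tensor

# Hypothetical data: 4 subjects x 3 source directions x 5 time samples.
rng = np.random.default_rng(0)
hrirs = rng.standard_normal((4, 3, 5))
core, factors = hosvd(hrirs)
rebuilt = reconstruct(core, factors)
```

Truncating the columns of the factor matrices gives a compressed multilinear model, at the cost of an approximate rather than exact reconstruction.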
Abstract:
The standard highway assignment model in the Florida Standard Urban Transportation Modeling Structure (FSUTMS) is based on the equilibrium traffic assignment method. This method involves running several iterations of all-or-nothing capacity-restraint assignment with an adjustment of travel time to reflect delays encountered in the associated iteration. The iterative link time adjustment process is accomplished through the Bureau of Public Roads (BPR) volume-delay equation. Since FSUTMS' traffic assignment procedure outputs daily volumes, and the input capacities are given in hourly volumes, it is necessary to convert the hourly capacities to their daily equivalents when computing the volume-to-capacity ratios used in the BPR function. The conversion is accomplished by dividing the hourly capacity by a factor called the peak-to-daily ratio, referred to as CONFAC in FSUTMS. The ratio is computed as the highest hourly volume of a day divided by the corresponding total daily volume. While several studies have indicated that CONFAC is a decreasing function of the level of congestion, a constant value is used for each facility type in the current version of FSUTMS. This ignores the different congestion level associated with each roadway and is believed to be one of the causes of traffic assignment errors. Traffic count data from across the state of Florida were used to calibrate CONFACs as a function of a congestion measure using the weighted least squares method. The calibrated functions were then implemented in FSUTMS through a procedure that takes advantage of the iterative nature of FSUTMS' equilibrium assignment method. The assignment results based on constant and variable CONFACs were then compared against the ground counts for three selected networks. It was found that the accuracy of the two assignments was not significantly different; that is, the hypothesized improvement in assignment results from the variable CONFAC model was not empirically evident.
It was recognized that many other factors beyond the scope and control of this study could contribute to this finding. It was recommended that further studies focus on the use of the variable CONFAC model with recalibrated parameters for the BPR function and/or with other forms of volume-delay functions.
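The capacity conversion and the volume-delay calculation described above can be sketched in Python as follows. The BPR form t = t0 (1 + alpha (v/c)^beta) is standard; the coefficients alpha = 0.15 and beta = 4 are the classic defaults and are assumed here, since the abstract does not give the values used in FSUTMS. The link data are invented.

```python
def daily_capacity(hourly_capacity, confac):
    """Convert an hourly capacity to its daily equivalent by dividing by CONFAC."""
    return hourly_capacity / confac

def bpr_travel_time(free_flow_time, daily_volume, hourly_capacity, confac,
                    alpha=0.15, beta=4.0):
    """BPR volume-delay function applied to a daily volume-to-capacity ratio."""
    vc_ratio = daily_volume / daily_capacity(hourly_capacity, confac)
    return free_flow_time * (1.0 + alpha * vc_ratio ** beta)

# Hypothetical link: 10 min free-flow time, 2000 veh/h capacity, CONFAC = 0.10,
# assigned 20000 veh/day, so the volume-to-capacity ratio is 1.
congested_time = bpr_travel_time(10.0, 20000.0, 2000.0, 0.10)
```

Because CONFAC sits in the denominator of the capacity, lowering it for a congested link raises the daily capacity and reduces the computed delay, which is how a variable CONFAC would feed back into each assignment iteration.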
Abstract:
Quantitative Structure-Activity Relationship (QSAR) has been applied extensively in predicting the toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, acute and chronic toxicities of DBPs have been widely used in health risk assessment of DBPs. These toxicities are correlated with molecular properties, which are usually correlated with molecular descriptors. The primary goals of this thesis are: (1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as the energy of the lowest unoccupied molecular orbital (ELUMO) via QSAR modelling and analysis; (2) to validate the models by using internal and external cross-validation techniques; (3) to quantify the model uncertainties through the Taylor method and Monte Carlo simulation. QSAR analysis is one of the most important ways to predict molecular properties such as ELUMO. In this study, the number of chlorine atoms (NCl) and the number of carbon atoms (NC), as well as the energy of the highest occupied molecular orbital (EHOMO), are used as molecular descriptors. There are typically three approaches used in QSAR model development: (1) Linear or Multi-linear Regression (MLR); (2) Partial Least Squares (PLS); and (3) Principal Component Regression (PCR). In QSAR analysis, a critical step is model validation, after QSAR models are established and before they are applied to toxicity prediction. The DBPs to be studied include five chemical classes: chlorinated alkanes, alkenes, and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups (i.e., chloro-alkane and aromatic compounds with a nitro- or cyano group) of DBP chemicals to three types of organisms (e.g., Fish, T. pyriformis, and P. phosphoreum) based on experimental toxicity data from the literature.
The results show that: (1) QSAR models to predict molecular properties built by MLR, PLS or PCR can be used either to select valid data points or to eliminate outliers; (2) the Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models; however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; (3) ELUMO is shown to correlate highly with NCl for several classes of DBPs; and (4) according to the uncertainty analysis using the Taylor method, most of the uncertainty of the QSAR models comes from NCl for all DBP classes.
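Leave-One-Out cross-validation for a one-descriptor linear model can be sketched in Python as below. The cross-validated coefficient is computed as q² = 1 − PRESS / SS_tot using the overall mean of the response, which is one common convention. The descriptor and property values are invented for illustration and are not data from the thesis.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(x)):
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (intercept + slope * x[i])) ** 2
    y_mean = mean(y)
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss_tot

# Hypothetical descriptor (number of chlorine atoms) and property (ELUMO, eV).
n_cl = [1, 2, 3, 4, 5, 6]
e_lumo = [1.10, 0.62, 0.25, -0.31, -0.70, -1.22]
q2 = loo_q2(n_cl, e_lumo)
```

Replacing the single held-out point with a held-out block of points gives the Leave-Many-Out or K-fold variant that the results recommend combining with external validation.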
Abstract:
The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the development of this data compilation in the five years since its first description by Nisumaa et al. (2010). Most of the study sites from which data are archived are still in the Northern Hemisphere, and the amount of archived data from studies in the Southern Hemisphere and polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance, with considerably more data archived on calcification and primary production than on other processes, has been reduced. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, to help develop standard vocabularies describing the variables, and to define best practices for archiving ocean acidification data.
Abstract:
This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to the calibration of geochemical data and to multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in ‘raw-format’, requiring data processing to remove instrument bias, as well as informed sequence interpretation. The applicability of conventional calibration equations to core-scanning XRF data is further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences, focusing mainly on energy dispersive-XRF (ED-XRF) core-scanning. This study has applied high-resolution core-scanning XRF to Holocene sedimentary sequences from the tidal-dominated Indian Sundarbans (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
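As a rough illustration of reduced major axis (RMA) regression on log-ratio transformed data, the sketch below fits a straight line between the log-ratio of two scanner intensities and the log-ratio of the corresponding concentrations. It is not the study's calibration code: the data are synthetic, the element pair is unnamed, the coefficients `alpha` and `beta` are invented, and `rma_fit` is a hypothetical helper. The RMA slope is taken as sign(r) times the ratio of the standard deviations, so both variables are treated as carrying error.

```python
import numpy as np

def rma_fit(x, y):
    """Reduced major axis regression of y on x.

    slope = sign(r) * sd(y) / sd(x); the line passes through the means.
    """
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

# Synthetic core-scanner intensities for two elements (assumed values).
rng = np.random.default_rng(0)
n = 40
intensity_a = rng.lognormal(mean=7.0, sigma=0.6, size=n)
intensity_b = rng.lognormal(mean=7.0, sigma=0.6, size=n)

# Log-ratio transform of the intensities.
x = np.log(intensity_a / intensity_b)

# Synthetic log-ratio of reference concentrations, generated from an
# assumed linear relation plus noise (alpha and beta are illustrative).
alpha, beta = 0.8, 0.3
y = alpha * x + beta + rng.normal(0.0, 0.05, size=n)

slope, intercept = rma_fit(x, y)
print(f"RMA slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Unlike ordinary least squares, the RMA slope does not depend on which variable is treated as the predictor, apart from inversion. That symmetry is one reason it suits calibration problems where both measurements are uncertain.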
Abstract:
The purpose of this study was to examine relationships between multiple characteristics of maternal employment, parenting practices, and adolescents’ transition outcomes to young adulthood. The research addressed four main research questions. First, are the characteristics of maternal work (i.e., hours worked, multiple jobs held, work schedules, earnings, and occupation) related to adolescents’ enrollment in post-secondary education, employment, or involvement in neither of these types of activities as young adults? Second, are the work characteristics related to parental involvement and monitoring, and are the parenting practices related to adolescents’ transition outcomes? Third, do parental involvement and monitoring mediate any relationships between the characteristics of maternal employment and adolescents’ transition outcomes? Finally, do any associations between characteristics of maternal employment and parenting practices and adolescents’ transition outcomes vary by poverty status, race/ethnicity, or gender? To address these research questions, secondary data analysis was conducted, using data from the National Longitudinal Survey of Youth (NLSY) from 1998 through 2004. The study sample consisted of 849 youths who were 15 through 17 years of age in either 1998 or 2000, and were 19 through 21 years of age when their transition outcomes in young adulthood were measured four years later. Multinomial logistic and ordinary least squares regression models were estimated to answer the research questions. Study findings indicated that of the maternal work characteristics, mothers’ multiple jobs held, occupation, and work schedule were significantly related to the youths’ transition outcomes. When mothers held multiple jobs for 1 to 25 weeks per year, and when mothers held jobs involving lower levels of occupational complexity, their youths were more likely to experience employment rather than post-secondary education. 
Adolescents whose mothers worked a standard work schedule were less likely to experience transitions other than post-secondary education. With regard to the effects of maternal employment on parenting practices, none of the maternal work variables were related to parental involvement, and only one variable, mothers working fewer than 40 hours per week, was negatively related to parental monitoring. In addition, when parents were more involved with their youths’ education, the youths were less likely to transition into employment or other types of transitions rather than post-secondary education. The parenting practices did not mediate the relation between the significant work variables (holding multiple jobs, work schedule, and occupation) and youths’ transition outcomes. Finally, none of the interactions between maternal work characteristics and poverty status, race/ethnicity, or gender met the criteria for determining significance, but in a series of sub-group analyses, some differences according to poverty status and gender were found. Despite the lack of mediation and moderation, the findings of this study have important implications for social policy and social work intervention. Based on the findings, suggestions are made in these areas to improve working mothers’ lives and their adolescents’ development and successful transition to adulthood. Finally, directions for future research are discussed.