937 resultados para Statistical methodologies


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Water covers over 70% of the Earth's surface, and is vital for all known forms of life. But only 3% of the Earth's water is fresh water, and less than 0.3% of all freshwater is in rivers, lakes, reservoirs and the atmosphere. However, rivers and lakes are an important part of fresh surface water, amounting to about 89%. In this Master Thesis dissertation, the focus is on three types of water bodies – rivers, lakes and reservoirs, and their water quality issues in Asian countries. The surface water quality in a region is largely determined both by the natural processes such as climate or geographic conditions, and the anthropogenic influences such as industrial and agricultural activities or land use conversion. The quality of the water can be affected by pollutants discharge from a specific point through a sewer pipe and also by extensive drainage from agriculture/urban areas and within basin. Hence, water pollutant sources can be divided into two categories: Point source pollution and Non-point source (NPS) pollution. Seasonal variations in precipitation and surface run-off have a strong effect on river discharge and the concentration of pollutants in water bodies. For example, in the rainy season, heavy and persistent rain wash off the ground, the runoff flow increases and may contain various kinds of pollutants and, eventually, enters the water bodies. In some cases, especially in confined water bodies, the quality may be positive related with rainfall in the wet season, because this confined type of fresh water systems allows high dilution of pollutants, decreasing their possible impacts. During the dry season, the quality of water is largely related to industrialization and urbanization pollution. The aim of this study is to identify the most common water quality problems in Asian countries and to enumerate and analyze the methodologies used for assessment of water quality conditions of both rivers and confined water bodies (lakes and reservoirs). Based on the evaluation of a sample of 57 papers, dated between 2000 and 2012, it was found that over the past decade, the water quality of rivers, lakes, and reservoirs in developing countries is being degraded. Water pollution and destruction of aquatic ecosystems have caused massive damage to the functions and integrity of water resources. The most widespread NPS in Asian countries and those which have the greatest spatial impacts are urban runoff and agriculture. Locally, mine waste runoff and rice paddy are serious NPS problems. The most relevant point pollution sources are the effluents from factories, sewage treatment plant, and public or household facilities. It was found that the most used methodology was unquestionably the monitoring activity, used in 49 of analyzed studies, accounting for 86%. Sometimes, data from historical databases were used as well. It can be seen that taking samples from the water body and then carry on laboratory work (chemical analyses) is important because it can give an understanding of the water quality. 6 papers (11%) used a method that combined monitoring data and modeling. 6 papers (11%) just applied a model to estimate the quality of water. Modeling is a useful resource when there is limited budget since some models are of free download and use. In particular, several of used models come from the U.S.A, but they have their own purposes and features, meaning that a careful application of the models to other countries and a critical discussion of the results are crucial. 5 papers (9%) focus on a method combining monitoring data and statistical analysis. When there is a huge data matrix, the researchers need an efficient way of interpretation of the information which is provided by statistics. 3 papers (5%) used a method combining monitoring data, statistical analysis and modeling. These different methods are all valuable to evaluate the water quality. It was also found that the evaluation of water quality was made as well by using other types of sampling different than water itself, and they also provide useful information to understand the condition of the water body. These additional monitoring activities are: Air sampling, sediment sampling, phytoplankton sampling and aquatic animal tissues sampling. Despite considerable progress in developing and applying control regulations to point and NPS pollution, the pollution status of rivers, lakes, and reservoirs in Asian countries is not improving. In fact, this reflects the slow pace of investment in new infrastructure for pollution control and growing population pressures. Water laws or regulations and public involvement in enforcement can play a constructive and indispensable role in environmental protection. In the near future, in order to protect water from further contamination, rapid action is highly needed to control the various kinds of effluents in one region. Environmental remediation and treatment of industrial effluent and municipal wastewaters is essential. It is also important to prevent the direct input of agricultural and mine site runoff. Finally, stricter environmental regulation for water quality is required to support protection and management strategies. It would have been possible to get further information based in the 57 sample of papers. For instance, it would have been interesting to compare the level of concentrations of some pollutants in the diferente Asian countries. However the limit of three months duration for this study prevented further work to take place. In spite of this, the study objectives were achieved: the work provided an overview of the most relevant water quality problems in rivers, lakes and reservoirs in Asian countries, and also listed and analyzed the most common methodologies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces(PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave origin to similar results. The use of supervised learning techniques allowed an improvement in the results. They were improved to 77% of correct assignments when an ensemble of ten FFNNs were used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications ranging from the computer validation of classification systems, genome-scale reconstruction (or comparison) of metabolic pathways, to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and for the automatic assignment of EC numbers to reactions still not officially classified. In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level,respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs - EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP / SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) allowed to map and perceive chemical similarities between metabolic pathways even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. A remarkable ability of the NNs models to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations is shown. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.) a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This is possible to be seen in some of the recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or refered in some approaches. In the particular case of geostatistical data applications it is necessary, besides to geo-reference all the data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. In the case of the great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) despite the fact they do not require in most cases the assumption of normal distribution, they however need a proper and rigorous strategy for its utilization. In this work, some reflections about these methodologies and, in particular, about the main constraints that often occur during the information collecting process and about the various linking possibilities of these different techniques will be presented. At the end, illustrations of some particular cases of the applications of these statistical methods will also be presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Artigo elaborador no âmbito dos trabalhos decorrentes à dissertação de Mestrado do Aluno David Leite, no Mestrado em Gestão Integrada Qualidade, Ambiente e Segurança, ESTGF-IPP, Oreintados pelos Professores Luís Fonseca (ISEP-IPP) e Vanda Lima (ESTGF-IPP).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Over the last decade, the development of statistical models in support of forensic fingerprint identification has been the subject of increasing research attention, spurned on recently by commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. Such models are increasingly seen as useful tools in support of the fingerprint identification process within or in addition to the ACE-V framework. This paper provides a critical review of recent statistical models from both a practical and theoretical perspective. This includes analysis of models of two different methodologies: Probability of Random Correspondence (PRC) models that focus on calculating probabilities of the occurrence of fingerprint configurations for a given population, and Likelihood Ratio (LR) models which use analysis of corresponding features of fingerprints to derive a likelihood value representing the evidential weighting for a potential source.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The identification of compositional changes in fumarolic gases of active and quiescent volcanoes is one of the mostimportant targets in monitoring programs. From a general point of view, many systematic (often cyclic) and randomprocesses control the chemistry of gas discharges, making difficult to produce a convincing mathematical-statisticalmodelling.Changes in the chemical composition of volcanic gases sampled at Vulcano Island (Aeolian Arc, Sicily, Italy) fromeight different fumaroles located in the northern sector of the summit crater (La Fossa) have been analysed byconsidering their dependence from time in the period 2000-2007. Each intermediate chemical composition has beenconsidered as potentially derived from the contribution of the two temporal extremes represented by the 2000 and 2007samples, respectively, by using inverse modelling methodologies for compositional data. Data pertaining to fumarolesF5 and F27, located on the rim and in the inner part of La Fossa crater, respectively, have been used to achieve theproposed aim. The statistical approach has allowed us to highlight the presence of random and not random fluctuations,features useful to understand how the volcanic system works, opening new perspectives in sampling strategies and inthe evaluation of the natural risk related to a quiescent volcano

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents and discusses the use of Bayesian procedures - introduced through the use of Bayesian networks in Part I of this series of papers - for 'learning' probabilities from data. The discussion will relate to a set of real data on characteristics of black toners commonly used in printing and copying devices. Particular attention is drawn to the incorporation of the proposed procedures as an integral part in probabilistic inference schemes (notably in the form of Bayesian networks) that are intended to address uncertainties related to particular propositions of interest (e.g., whether or not a sample originates from a particular source). The conceptual tenets of the proposed methodologies are presented along with aspects of their practical implementation using currently available Bayesian network software.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

El principal objectiu del projecte era desenvolupar millores conceptuals i metodològiques que permetessin una millor predicció dels canvis en la distribució de les espècies (a una escala de paisatge) derivats de canvis ambientals en un context dominat per pertorbacions. En un primer estudi, vàrem comparar l'eficàcia de diferents models dinàmics per a predir la distribució de l'hortolà (Emberiza hortulana). Els nostres resultats indiquen que un model híbrid que combini canvis en la qualitat de l'hàbitat, derivats de canvis en el paisatge, amb un model poblacional espacialment explícit és una aproximació adequada per abordar canvis en la distribució d'espècies en contextos de dinàmica ambiental elevada i una capacitat de dispersió limitada de l'espècie objectiu. En un segon estudi abordarem la calibració mitjançant dades de seguiment de models de distribució dinàmics per a 12 espècies amb preferència per hàbitats oberts. Entre les conclusions extretes destaquem: (1) la necessitat de que les dades de seguiment abarquin aquelles àrees on es produeixen els canvis de qualitat; (2) el biaix que es produeix en la estimació dels paràmetres del model d'ocupació quan la hipòtesi de canvi de paisatge o el model de qualitat d'hàbitat són incorrectes. En el darrer treball estudiarem el possible impacte en 67 espècies d’ocells de diferents règims d’incendis, definits a partir de combinacions de nivells de canvi climàtic (portant a un augment esperat de la mida i freqüència d’incendis forestals), i eficiència d’extinció per part dels bombers. Segons els resultats dels nostres models, la combinació de factors antropogènics del regim d’incendis, tals com l’abandonament rural i l’extinció, poden ser més determinants per als canvis de distribució que els efectes derivats del canvi climàtic. Els productes generats inclouen tres publicacions científiques, una pàgina web amb resultats del projecte i una llibreria per a l'entorn estadístic R.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Pearson correlation coefficients were applied for the objective comparison of 30 black gel pen inks analysed by laser desorption ionization mass spectrometry (LDI-MS). The mass spectra were obtained for ink lines directly on paper using positive and negative ion modes at several laser intensities. This methodology has the advantage of taking into account the reproducibility of the results as well as the variability between spectra of different pens. A differentiation threshold could thus be selected in order to avoid the risk of false differentiation. Combining results from positive and negative mode yielded a discriminating power up to 85%, which was better than the one obtained previously with other optical comparison methodologies. The technique also allowed discriminating between pens from the same brand.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The identification of compositional changes in fumarolic gases of active and quiescent volcanoes is one of the most important targets in monitoring programs. From a general point of view, many systematic (often cyclic) and random processes control the chemistry of gas discharges, making difficult to produce a convincing mathematical-statistical modelling. Changes in the chemical composition of volcanic gases sampled at Vulcano Island (Aeolian Arc, Sicily, Italy) from eight different fumaroles located in the northern sector of the summit crater (La Fossa) have been analysed by considering their dependence from time in the period 2000-2007. Each intermediate chemical composition has been considered as potentially derived from the contribution of the two temporal extremes represented by the 2000 and 2007 samples, respectively, by using inverse modelling methodologies for compositional data. Data pertaining to fumaroles F5 and F27, located on the rim and in the inner part of La Fossa crater, respectively, have been used to achieve the proposed aim. The statistical approach has allowed us to highlight the presence of random and not random fluctuations, features useful to understand how the volcanic system works, opening new perspectives in sampling strategies and in the evaluation of the natural risk related to a quiescent volcano

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Technical efficiency is estimated and examined for a cross-section of Australian dairy farms using various frontier methodologies; Bayesian and Classical stochastic frontiers, and Data Envelopment Analysis. The results indicate technical inefficiency is present in the sample data. Also identified are statistical differences between the point estimates of technical efficiency generated by the various methodologies. However, the rank of farm level technical efficiency is statistically invariant to the estimation technique employed. Finally, when confidence/credible intervals of technical efficiency are compared significant overlap is found for many of the farms' intervals for all frontier methods employed. The results indicate that the choice of estimation methodology may matter, but the explanatory power of all frontier methods is significantly weaker when interval estimate of technical efficiency is examined.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dysregulation of lipid and glucose metabolism in the postprandial state are recognised as important risk factors for the development of cardiovascular disease and type 2 diabetes. Our objective was to create a comprehensive, standardised database of postprandial studies to provide insights into the physiological factors that influence postprandial lipid and glucose responses. Data were collated from subjects (n = 467) taking part in single and sequential meal postprandial studies conducted by researchers at the University of Reading, to form the DISRUPT (DIetary Studies: Reading Unilever Postprandial Trials) database. Subject attributes including age, gender, genotype, menopausal status, body mass index, blood pressure and a fasting biochemical profile, together with postprandial measurements of triacylglycerol (TAG), non-esterified fatty acids, glucose, insulin and TAG-rich lipoprotein composition are recorded. A particular strength of the studies is the frequency of blood sampling, with on average 10-13 blood samples taken during each postprandial assessment, and the fact that identical test meal protocols were used in a number of studies, allowing pooling of data to increase statistical power. The DISRUPT database is the most comprehensive postprandial metabolism database that exists worldwide and preliminary analysis of the pooled sequential meal postprandial dataset has revealed both confirmatory and novel observations with respect to the impact of gender and age on the postprandial TAG response. Further analysis of the dataset using conventional statistical techniques along with integrated mathematical models and clustering analysis will provide a unique opportunity to greatly expand current knowledge of the aetiology of inter-individual variability in postprandial lipid and glucose responses.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The computers and network services became presence guaranteed in several places. These characteristics resulted in the growth of illicit events and therefore the computers and networks security has become an essential point in any computing environment. Many methodologies were created to identify these events; however, with increasing of users and services on the Internet, many difficulties are found in trying to monitor a large network environment. This paper proposes a methodology for events detection in large-scale networks. The proposal approaches the anomaly detection using the NetFlow protocol, statistical methods and monitoring the environment in a best time for the application. © 2010 Springer-Verlag Berlin Heidelberg.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hybrid vehicles represent the future for automakers, since they allow to improve the fuel economy and to reduce the pollutant emissions. A key component of the hybrid powertrain is the Energy Storage System, that determines the ability of the vehicle to store and reuse energy. Though electrified Energy Storage Systems (ESS), based on batteries and ultracapacitors, are a proven technology, Alternative Energy Storage Systems (AESS), based on mechanical, hydraulic and pneumatic devices, are gaining interest because they give the possibility of realizing low-cost mild-hybrid vehicles. Currently, most literature of design methodologies focuses on electric ESS, which are not suitable for AESS design. In this contest, The Ohio State University has developed an Alternative Energy Storage System design methodology. This work focuses on the development of driving cycle analysis methodology that is a key component of Alternative Energy Storage System design procedure. The proposed methodology is based on a statistical approach to analyzing driving schedules that represent the vehicle typical use. Driving data are broken up into power events sequence, namely traction and braking events, and for each of them, energy-related and dynamic metrics are calculated. By means of a clustering process and statistical synthesis methods, statistically-relevant metrics are determined. These metrics define cycle representative braking events. By using these events as inputs for the Alternative Energy Storage System design methodology, different system designs are obtained. Each of them is characterized by attributes, namely system volume and weight. In the last part the work, the designs are evaluated in simulation by introducing and calculating a metric related to the energy conversion efficiency. Finally, the designs are compared accounting for attributes and efficiency values. In order to automate the driving data extraction and synthesis process, a specific script Matlab based has been developed. Results show that the driving cycle analysis methodology, based on the statistical approach, allows to extract and synthesize cycle representative data. The designs based on cycle statistically-relevant metrics are properly sized and have satisfying efficiency values with respect to the expectations. An exception is the design based on the cycle worst-case scenario, corresponding to same approach adopted by the conventional electric ESS design methodologies. In this case, a heavy system with poor efficiency is produced. The proposed new methodology seems to be a valid and consistent support for Alternative Energy Storage System design.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As the development of genotyping and next-generation sequencing technologies, multi-marker testing in genome-wide association study and rare variant association study became active research areas in statistical genetics. This dissertation contains three methodologies for association study by exploring different genetic data features and demonstrates how to use those methods to test genetic association hypothesis. The methods can be categorized into in three scenarios: 1) multi-marker testing for strong Linkage Disequilibrium regions, 2) multi-marker testing for family-based association studies, 3) multi-marker testing for rare variant association study. I also discussed the advantage of using these methods and demonstrated its power by simulation studies and applications to real genetic data.