933 results for rainfall-runoff empirical statistical model
Abstract:
A visualization plot of a molecular data set is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance, but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
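To make the kind of evaluation described above concrete, here is a minimal sketch (not the paper's LTM/GTM code) that projects binary fingerprint vectors to two dimensions with PCA and scores local clustering in the map with a k-nearest-neighbor classifier; the fingerprints and activity labels below are random placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# placeholder binary "fingerprints": 500 molecules x 166 bits, with random activity labels
X = rng.integers(0, 2, size=(500, 166)).astype(float)
y = rng.integers(0, 2, size=500)

# 2-D visualization coordinates from a linear projection (PCA)
coords = PCA(n_components=2).fit_transform(X)

# objective check of local clustering: k-NN accuracy in the 2-D map
knn = KNeighborsClassifier(n_neighbors=5)
print("k-NN score in projection:", cross_val_score(knn, coords, y, cv=5).mean())
```

With random labels the score hovers near chance; on real fingerprint data the same measure quantifies how well activity classes cluster in the projected map.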
Abstract:
The goal of no-reference (NR) image quality assessment (IQA) is to establish a computational model that predicts the visual quality of an image. A prominent existing method is based on natural scene statistics (NSS) and uses the joint and marginal distributions of wavelet coefficients for IQA; however, it is only applicable to JPEG2000-compressed images. Because the wavelet transform fails to capture the directional information in images, an improved NSS model is established here using contourlets. In this paper, the contourlet transform is applied to the NSS of images, and the relationships among contourlet coefficients are represented by their joint distribution. The statistics of contourlet coefficients are suitable for indicating variations in image quality. In addition, an image-dependent threshold is adopted to reduce the effect of image content on the statistical model. Finally, image quality is evaluated by nonlinearly combining the features extracted in each subband. Our algorithm is trained and tested on the LIVE database II. Experimental results demonstrate that the proposed algorithm is superior to the conventional NSS model and can be applied to different distortion types. © 2009 Elsevier B.V. All rights reserved.
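The NSS feature pipeline described above can be sketched as follows. Contourlet implementations are not commonly packaged, so this illustration substitutes a wavelet decomposition from PyWavelets as a stand-in to show the general idea of extracting per-subband coefficient statistics; the paper's actual contourlet features, image-dependent thresholding, and nonlinear combination are not reproduced.

```python
import numpy as np
import pywt
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
img = rng.random((256, 256))  # placeholder image; real use would load a distorted test image

features = []
coeffs = pywt.wavedec2(img, "db2", level=3)
for level in coeffs[1:]:                          # skip the coarse approximation band
    for band in level:                            # horizontal, vertical, diagonal detail subbands
        c = band.ravel()
        features.extend([np.std(c), kurtosis(c)])  # simple NSS-style subband statistics
print(len(features), "features:", np.round(features[:4], 3))
```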
Abstract:
2000 Mathematics Subject Classification: 62P10, 62J12.
Abstract:
2010 Mathematics Subject Classification: 94A17.
Abstract:
Our aim was to approach an important and readily investigable phenomenon, tied to a relatively simple but real field situation, in such a way that the results of field observations could be compared directly with the predictions of a simulation model system built on a simple mathematical apparatus, while at the same time obtaining a hypothesis system that creates the theoretical opportunity for a later series of experimental studies. As the phenomenon of study, we chose the seasonal coenological changes of an aquatic and semiaquatic Heteroptera community. Based on the observed data, we developed an ecological model system that generates realistic patterns closely resembling the observed temporal patterns, that can give predictions for climatic circumstances not experienced before (e.g., climate change), and that can, furthermore, simulate experimental conditions. The stable coenological state-plane, constructed on the principle of indirect ordination, is suitable for the unified handling of monitoring and simulation data series and for their comparison. On the state-plane, deviations between empirical and model-generated data can be observed and analysed that would otherwise remain hidden.
Abstract:
This study explores factors related to prompt difficulty in automated essay scoring. The sample was composed of 6,924 students. Each student wrote 1-4 essays across 20 different writing prompts, for a total of 20,243 essays. The e-rater® v.2 essay scoring engine developed by the Educational Testing Service was used to score the essays. The scoring engine employs a statistical model incorporating 10 predictors associated with writing characteristics, of which 8 were used. Rasch partial credit analysis was applied to the scores to determine the difficulty levels of the prompts. In addition, the scores were used as outcomes in a series of hierarchical linear models (HLM) in which students and prompts constituted the cross-classification levels; this methodology was used to explore the partitioning of the essay score variance. The results indicated significant differences in prompt difficulty due to genre: descriptive prompts, as a group, were more difficult than persuasive prompts. In addition, the essay score variance was partitioned between students and prompts. The amount of essay score variance lying between prompts was found to be relatively small (4 to 7 percent). When essay-level, student-level, and prompt-level predictors were included, the model was able to explain almost all of the variance lying between prompts. Since most high-stakes writing assessments use only 1-2 prompts per student, the essay score variance that lies between prompts represents an undesirable, or "noise", variation. Identifying factors associated with this "noise" variance may prove important for prompt writing and for constructing automated essay scoring mechanisms that weight prompt difficulty when assigning essay scores.
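As a rough illustration of the variance partitioning described above (a toy simulation, not the e-rater data or the fitted cross-classified HLM), one can simulate scores with student and prompt effects and compute the share of variance lying between prompts:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_students, n_prompts = 500, 20
student_eff = rng.normal(0, 1.0, n_students)   # between-student variability
prompt_eff = rng.normal(0, 0.25, n_prompts)    # between-prompt variability (small, by assumption)

rows = []
for s in range(n_students):
    for p in rng.choice(n_prompts, size=2, replace=False):  # each student answers 2 prompts
        score = 3.0 + student_eff[s] + prompt_eff[p] + rng.normal(0, 0.8)
        rows.append((s, p, score))
df = pd.DataFrame(rows, columns=["student", "prompt", "score"])

# crude variance partition: variance of prompt means relative to total score variance
prompt_var = df.groupby("prompt")["score"].mean().var(ddof=1)
total_var = df["score"].var(ddof=1)
print(f"share of score variance between prompts ≈ {prompt_var / total_var:.2%}")
```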
Abstract:
We investigated controls on the water chemistry of a South Ecuadorian cloud forest catchment which is partly pristine and partly converted to extensive pasture. From April 2007 to May 2008, water samples were taken weekly to biweekly at nine different subcatchments and screened for differences in electrical conductivity, pH, and anion and element composition. A principal component analysis was conducted to reduce the dimensionality of the data set and define the major factors explaining variation in the data. Three main factors, characterized by a subset of 10 elements (Ca2+, Ce, Gd, K+, Mg2+, Na+, Nd, Rb, Sr, Y), explained around 90% of the variation in the data. Land use was the major factor controlling and changing the water chemistry of the subcatchments. A second factor was associated with the concentration of rare earth elements in water, presumably highlighting other anthropogenic influences such as gravel excavation or road construction. Around 12% of the variation was explained by the third component, which was defined by the occurrence of Rb and K and represents the influence of vegetation dynamics on element accumulation and wash-out. Comparison of baseflow and fast-flow concentrations led to the assumption that a significant portion of soil water from around 30 cm depth contributes to storm flow, as revealed by increased rare earth element concentrations in fast-flow samples. Our findings demonstrate the utility of multi-tracer principal component analysis for studying tropical headwater streams and emphasize the need for effective land management in cloud forest catchments.
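A minimal sketch of the multi-tracer PCA workflow described above, using scikit-learn on placeholder concentration data (the element names are from the abstract; the values are random):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
elements = ["Ca", "Ce", "Gd", "K", "Mg", "Na", "Nd", "Rb", "Sr", "Y"]
X = rng.lognormal(mean=0.0, sigma=1.0, size=(60, len(elements)))  # placeholder weekly samples

pca = PCA(n_components=3).fit(StandardScaler().fit_transform(X))
for i, (ratio, load) in enumerate(zip(pca.explained_variance_ratio_, pca.components_), 1):
    top = [elements[j] for j in np.argsort(np.abs(load))[::-1][:3]]
    print(f"PC{i}: {ratio:.1%} of variance, highest loadings: {top}")
```

On real tracer data, the loadings of each component are what support the interpretation of factors such as land use or rare earth element inputs.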
Abstract:
Solar activity indicators, such as sunspot numbers, sunspot area, and flares, are not considered to be symmetric between the northern and southern hemispheres of the Sun's photosphere. This behavior is known as the North-South asymmetry of the different solar indices. Among the conclusions reached by several authors, we can note that the N-S asymmetry is a real and systematic phenomenon and is not due to random variability. In the present work, the probability distributions from the Marshall Space Flight Center (MSFC) database are investigated using a statistical tool arising from the well-known non-extensive statistical mechanics proposed by C. Tsallis in 1988. We present our results and discuss their physical implications with the help of a theoretical model and observations. We find a strong dependence between the nonextensive entropic parameter q and the long-term solar variability present in the sunspot area data. Among the most important results, we highlight that the asymmetry index q reveals the dominance of the North over the South. This behavior has been discussed and confirmed by several authors, but it had not previously been attributed to a property of a statistical model. We therefore conclude that this parameter can be considered an effective measure for diagnosing long-term variations of the solar dynamo. Finally, our dissertation opens a new approach for investigating time series in astrophysics from the perspective of non-extensivity.
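For orientation, a toy fit of a Tsallis q-Gaussian to synthetic heavy-tailed data is sketched below; it is not the MSFC sunspot-area analysis, but it shows how an entropic index q can be estimated from an empirical distribution.

```python
import numpy as np
from scipy.optimize import curve_fit

def q_gaussian(x, a, beta, q):
    """Tsallis q-Gaussian shape; reduces to a Gaussian in the limit q -> 1."""
    base = np.maximum(1.0 - (1.0 - q) * beta * x**2, 1e-12)
    return a * base ** (1.0 / (1.0 - q))

rng = np.random.default_rng(0)
data = rng.standard_t(df=5, size=20000)          # heavy-tailed placeholder data
hist, edges = np.histogram(data, bins=80, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

popt, _ = curve_fit(q_gaussian, centers, hist, p0=(0.4, 0.5, 1.3), maxfev=20000)
print("fitted entropic index q ≈", round(popt[2], 3))
```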
Abstract:
An abstract of a thesis devoted to using helix-coil models to study unfolded states.
Research on polypeptide unfolded states has received much more attention in the last decade or so than it has in the past. Unfolded states are thought to be implicated in various misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be much more difficult than the now well-established practice of determining the structures of folded proteins, largely because many core assumptions underlying folded-structure determination methods are invalid for unfolded states. This has led to a dearth of knowledge concerning the nature of unfolded-state conformational distributions. While many aspects of unfolded-state structure are not well known, there exists a significant body of work, stretching back half a century, focused on the structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental data and biophysical models associated with describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted specifically to improving our understanding of helix-coil equilibria, which is arguably the best characterized of the various conformational equilibria that likely contribute to unfolded-state conformational distributions. This thesis provides a deeper investigation of helix-coil equilibria using modern statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably know very well about protein unfolded states.
Chapter 1 gives an overview of recent and historical work on protein unfolded states. The study of helix-coil equilibria is placed in the context of the general field of unfolded-state research, and the basics of helix-coil models are introduced.
Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State-of-the-art statistical techniques are employed to estimate the energies of the various physical interactions that influence helix-coil equilibria. A new Bayesian model selection approach is used to test many long-standing hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid, and the new model exhibits greatly improved predictive performance relative to its predecessor.
Chapter 3 introduces a new statistical model for interpreting amide exchange measurements. Because amide exchange can serve as a probe of the residue-specific properties of helix-coil ensembles, the new model provides a novel and robust way to use these measurements to characterize helix-coil ensembles experimentally and to test the position-specific predictions of helix-coil models. The statistical model is shown to perform far better than the most commonly used method for interpreting amide exchange data. The model estimates obtained from amide exchange measurements on an example helical peptide also show remarkable consistency with the predictions of the helix-coil model.
Chapter 4 studies helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles, this chapter also introduces a new method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could allow helix-coil models to move into application domains that were previously inaccessible and reserved for the other types of unfolded-state models introduced in Chapter 1.
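As background for the helix-coil formalism discussed throughout, the sketch below implements the classic Zimm-Bragg transfer-matrix calculation of the partition function and mean helicity from propagation (s) and nucleation (sigma) parameters; the thesis's model is considerably more sophisticated (position-specific energies, Bayesian estimation), so this is only a minimal reference implementation of the underlying idea.

```python
import numpy as np

def zimm_bragg_logZ(n, s, sigma):
    """log partition function of the Zimm-Bragg helix-coil model via transfer matrices."""
    M = np.array([[s, 1.0], [sigma * s, 1.0]])   # states ordered (helix, coil)
    v = np.array([sigma * s, 1.0])               # statistical weights of the first residue
    for _ in range(n - 1):
        v = v @ M
    return np.log(v.sum())

def mean_helicity(n, s, sigma, ds=1e-6):
    # fractional helicity = (1/N) d lnZ / d ln s, by central finite difference
    up = zimm_bragg_logZ(n, s * (1 + ds), sigma)
    dn = zimm_bragg_logZ(n, s * (1 - ds), sigma)
    return (up - dn) / (2 * ds) / n

for s in (0.8, 1.0, 1.2, 1.5):
    print(f"s = {s}: helicity ≈ {mean_helicity(20, s, sigma=0.003):.3f}")
```

The sharpness of the transition as s increases, controlled by the small nucleation parameter sigma, is the cooperativity that more elaborate helix-coil models refine with sequence- and position-specific terms.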
Abstract:
Global niobium production is presently dominated by three operations: Araxá and Catalão (Brazil) and Niobec (Canada). Although Brazil accounts for over 90% of the world’s niobium production, a number of high-grade niobium deposits exist worldwide. The advancement of these deposits depends largely on the development of operable beneficiation flowsheets. Pyrochlore, the primary niobium mineral, is typically upgraded by flotation with amine collectors at acidic pH, following a complicated flowsheet with significant losses of niobium. This research compares the typical two-stage flotation flowsheet to a direct flotation process (i.e., elimination of gangue pre-flotation) with the objective of circuit simplification. In addition, the use of a chelating reagent (benzohydroxamic acid, BHA) was studied as an alternative collector for fine-grained, highly disseminated pyrochlore. For the amine-based reagent system, results showed that, while the two routes were comparable at the laboratory scale, the direct flotation process suffered from circuit instability when scaled up to the pilot level because high quantities of dissolved calcium in the process water, arising from stream recirculation and fine calcite dissolution, ultimately depressed pyrochlore. This scale-up issue was not observed in pilot plant operation of the two-stage flotation process, since a portion of the highly reactive carbonate minerals was removed prior to acid addition. A statistical model was developed for batch flotation using BHA on a carbonatite ore (0.25% Nb2O5) that could not be effectively upgraded using the conventional amine reagent scheme. Results showed that it was possible to produce a concentrate containing 1.54% Nb2O5 with 93% Nb recovery in ~15% of the original mass. Fundamental studies undertaken included FT-IR and XPS, which showed the adsorption of both the protonated and the neutral amine onto the pyrochlore surface (possibly at niobium sites, as indicated by detected shifts in the Nb 3d binding energy). The results suggest that the preferential flotation of pyrochlore over quartz with amines at low pH can be attributed to a difference in critical hemimicelle concentration (CHC) values for the two minerals. BHA was found to adsorb onto pyrochlore surfaces by a mechanism similar to that of alkyl hydroxamic acid. It is hoped that this work will assist in improving the operability of existing pyrochlore flotation circuits and help promote the development of niobium deposits globally. Future studies should focus on specific gangue mineral depressants and on inadvertent activation phenomena related to BHA flotation of gangue minerals.
Abstract:
The article analyzes electoral change in León, Guanajuato, tracing how the Institutional Revolutionary Party (PRI) came to be displaced by the National Action Party (PAN) in this municipality from 1988 up to the shift in the balance of forces in 2012. This provides the basis for analyzing the scenarios that could characterize this year's upcoming elections. To this end, a statistical model is proposed for the study: the Dirichlet regression model, which makes it possible to account for the nature of electoral data.
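A minimal sketch of Dirichlet regression by maximum likelihood is given below, using a log-link on each concentration parameter and simulated vote shares; the article's own model specification and software may differ, and the covariate and coefficients here are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
n, k = 200, 3                                           # 3 vote shares (e.g., PAN, PRI, others) - hypothetical
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
true_beta = np.array([[1.0, 0.5], [1.2, -0.4], [0.8, 0.0]])  # one coefficient row per share
alpha = np.exp(X @ true_beta.T)
Y = np.array([rng.dirichlet(a) for a in alpha])          # simulated compositional outcomes

def neg_loglik(beta_flat):
    beta = beta_flat.reshape(k, X.shape[1])
    a = np.exp(X @ beta.T)                               # alpha_i = exp(X_i beta), keeps alphas positive
    return -np.sum(gammaln(a.sum(axis=1)) - gammaln(a).sum(axis=1)
                   + ((a - 1.0) * np.log(Y)).sum(axis=1))

fit = minimize(neg_loglik, np.zeros(k * X.shape[1]), method="L-BFGS-B")
print("estimated coefficients:\n", fit.x.reshape(k, X.shape[1]).round(2))
```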
Abstract:
Land-use change is a major source of greenhouse gas emissions. The conversion of ecosystems with permanent natural vegetation to cropland with temporarily bare soil (e.g., after tillage before sowing) frequently leads to increased greenhouse gas emissions and reduced carbon sequestration. Worldwide, cropping is expanding in both smallholder and agro-industrial systems, often into neighbouring semi-arid to sub-humid rangeland ecosystems. This thesis examines trends of land-use change in the Borana rangelands of southern Ethiopia. Population growth, land privatisation and the associated fencing, changing land-use policies, and increasing climate variability are driving rapid changes in the traditionally livestock-based pastoral systems. Based on a literature review of case studies in East African rangelands, a schematic model of the interactions between land use, greenhouse gas emissions, and carbon sequestration was developed. Using satellite data and household survey data, the type and extent of land-use and vegetation change at five study sites (Darito/Yabelo district; Soda, Samaro, Haralo, and Did Mega/all Dire district) were analysed for the period 1985 to 2011. In Darito, cropland expanded by 12%, mainly at the expense of bushland. At the remaining sites cropland area remained relatively constant, but grassland vegetation increased by between 16 and 28% while bushland declined by between 23 and 31%. Only at Haralo did bare, vegetation-free land also increase, by 13%. The factors driving cropland expansion were examined in more detail at Darito. GPS data and cropping-history data for 108 fields on 54 farms were overlaid in a geographic information system (GIS) with thematic soil, rainfall, and slope maps and a digital elevation model. Multiple linear regression identified slope and elevation as significant explanatory variables for the expansion of cropping into lower-lying areas, whereas soil type, distance to the seasonal river course, and rainfall were not significant. The low coefficient of determination (R² = 0.154) indicates that there are further explanatory variables, not captured here, for the direction of the spatial expansion of cropland. Scatter plots of field size and years of cultivation against elevation show, since 2000, an expansion of cropping into areas below 1620 m a.s.l. and an increase in field size (>3 ha). Analysis of the phenological development of crops over the year, combined with rainfall data and normalized difference vegetation index (NDVI) time series, served to identify periods of particularly high (green-up before harvest) or low (after tillage) plant biomass on cropland, so that cropland and its expansion could be distinguished from other vegetation types by remote sensing. Based on the NDVI spectral profiles, cropland could be distinguished well from forest but less well from grassland and bushland. The coarse resolution (250 m) of the Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI data caused a mixed-pixel effect, i.e., the area of one pixel often contained different vegetation types in varying proportions, which hampered their discrimination.
Developing a real-time monitoring system for cropland expansion would require higher-resolution NDVI data (e.g., multispectral bands from the Hyperion EO-1 sensor) in order to achieve better small-scale discrimination between cropland and natural rangeland vegetation. Developing and applying such methods as decision-support tools for land- and resource-use planning could help reconcile the production and development goals of Borana land users with national efforts to mitigate climate change by increasing carbon sequestration in rangelands.
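To illustrate the form of the multiple linear regression described above, here is a hedged sketch with statsmodels; the thesis does not state the exact response variable, so the response ("year a field was first cultivated") and all field attributes below are hypothetical placeholders, not the GIS overlays themselves.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 108                                            # number of surveyed fields, as in the abstract
# hypothetical field attributes standing in for the GIS overlays
df = pd.DataFrame({
    "elevation_m":  rng.uniform(1500, 1750, n),
    "slope_deg":    rng.uniform(0, 15, n),
    "rain_mm":      rng.uniform(450, 700, n),
    "dist_river_m": rng.uniform(50, 3000, n),
})
# hypothetical response: year a field was first cultivated
df["first_cultivated"] = (2011 - 0.02 * (df["elevation_m"] - 1500)
                          - 0.1 * df["slope_deg"] + rng.normal(0, 4, n)).round()

model = smf.ols("first_cultivated ~ elevation_m + slope_deg + rain_mm + dist_river_m", df).fit()
print(model.summary().tables[1])
print("R-squared:", round(model.rsquared, 3))
```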
Abstract:
Máster Universitario en Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)
Abstract:
In this research project, the deposition of amorphous carbon thin films (generally known as DLC, for diamond-like carbon) by plasma-enhanced chemical vapor deposition (PECVD) was studied using optical emission spectroscopy (OES) and partial least squares regression (PLSR) analysis. The objective of this thesis is to establish a statistical model to predict the properties of DLC coatings from the deposition process parameters or from the data acquired by OES. Two series of PLSR analyses were carried out. The first examines the correlation between the process parameters and the plasma characteristics in order to gain a better understanding of the deposition process. The second series shows the potential of OES as a tool for monitoring the process and predicting the properties of the deposited film. The results show that the prediction of DLC coating properties, until now based on the process parameters (pressure, power, and plasma mode), becomes feasible from the information obtained by OES of the plasma (particularly indices related to the concentrations of species in the plasma). Indeed, the OES data can be used to monitor the deposition process directly rather than carrying out a complete study of the effect of the process parameters, which are strictly tied to the plasma reactor and vary from one laboratory to another. The prospect of applying a PLSR model incorporating OES data is also demonstrated in this research, with the aim of designing and monitoring a deposit with a graded structure.
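A minimal PLSR sketch with scikit-learn is shown below, on placeholder OES-derived features and coating responses; the thesis's actual feature set, measured properties, and software are not reproduced.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
# placeholder data: 40 deposition runs, 12 OES-derived features (e.g., line-intensity ratios)
X = rng.normal(size=(40, 12))
# placeholder coating responses (two properties), linearly tied to a few features plus noise
Y = X[:, :2] @ np.array([[5.0, 0.1], [1.0, 0.05]]) + rng.normal(scale=0.3, size=(40, 2))

# fit on the first 30 runs, check predictive quality on the remaining 10
pls = PLSRegression(n_components=3).fit(X[:30], Y[:30])
print("held-out R^2:", round(pls.score(X[30:], Y[30:]), 2))
```

The number of latent components plays the same role as in the thesis: it trades off how much of the OES variation is used against overfitting to a particular reactor's conditions.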
Abstract:
Finding rare events in multidimensional data is an important detection problem with applications in many fields, such as risk estimation in the insurance industry, finance, flood prediction, medical diagnosis, quality assurance, security, and safety in transportation. The occurrence of such anomalies is so infrequent that there is usually not enough training data to learn an accurate statistical model of the anomaly class. In some cases, such events may never have been observed, so the only information available is a set of normal samples and an assumed pairwise similarity function. Such a metric may be known only up to a certain number of unspecified parameters, which would either need to be learned from training data or fixed by a domain expert. Sometimes the anomalous condition can be formulated algebraically, such as a measure exceeding a predefined threshold, but nuisance variables may complicate the estimation of that measure. Change detection methods used in time series analysis are not easily extendable to the multidimensional case, where discontinuities are not localized to a single point. On the other hand, in higher dimensions data exhibit more complex interdependencies, and there is redundancy that can be exploited to adaptively model the normal data. In the first part of this dissertation, we review the theoretical framework for anomaly detection in images and previous anomaly detection work done in the context of crack detection and detection of anomalous components in railway tracks. In the second part, we propose new anomaly detection algorithms. The fact that curvilinear discontinuities in images are sparse with respect to a frame of shearlets allows us to cast this anomaly detection problem as basis pursuit optimization: we pose the detection of curvilinear anomalies in noisy textured images as a blind source separation problem under sparsity constraints and propose an iterative shrinkage algorithm to solve it. Taking advantage of the parallel nature of this algorithm, we describe how the method can be accelerated using graphics processing units (GPUs). We then propose a new method for finding defective components on railway tracks using cameras mounted on a train, describing how to extract features and use a combination of classifiers to solve this problem. Next, we scale anomaly detection to bigger datasets with complex interdependencies. We show that the anomaly detection problem fits naturally into the multitask learning framework: the first task consists of learning a compact representation of the good samples, while the second task consists of learning the anomaly detector. Using deep convolutional neural networks, we show that it is possible to train a deep model with a limited number of anomalous examples. In sequential detection problems, the presence of time-variant nuisance parameters affects detection performance. In the last part of this dissertation, we present a method for adaptively estimating the threshold of sequential detectors using extreme value theory within a Bayesian framework. Finally, conclusions on the results obtained are provided, followed by a discussion of possible future work.
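As a concrete reference for the sparsity-constrained formulation mentioned above, the sketch below implements a generic iterative shrinkage-thresholding algorithm (ISTA) for an l1-regularized least-squares problem on synthetic data; the dissertation's shearlet-based, GPU-accelerated blind source separation is not reproduced here.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Iterative shrinkage-thresholding for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 300))                     # synthetic overcomplete dictionary
x_true = np.zeros(300)
x_true[rng.choice(300, 8, replace=False)] = 3 * rng.normal(size=8)   # sparse ground truth
y = A @ x_true + 0.01 * rng.normal(size=100)

x_hat = ista(A, y, lam=0.1)
print("recovered support size:", np.count_nonzero(np.abs(x_hat) > 1e-3))
```

The per-iteration operations are matrix-vector products and an elementwise shrinkage, which is what makes this class of algorithms straightforward to parallelize on GPUs.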