Biblioteca Digital

933 resultados para non-trivial data structures

Censored Linear Regression Models For Irregularly Observed Longitudinal Data Using The Multivariate-t Distribution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to an Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.

Development and evaluation of neural network freeway incident detection models using field data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways. (C) 1997 Elsevier Science Ltd. All rights reserved.

The effects of normalisation on the ability of business end-users to detect data anomalies: An experimental evaluation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the proliferation of relational database programs for PC's and other platforms, many business end-users are creating, maintaining, and querying their own databases. More importantly, business end-users use the output of these queries as the basis for operational, tactical, and strategic decisions. Inaccurate data reduce the expected quality of these decisions. Implementing various input validation controls, including higher levels of normalisation, can reduce the number of data anomalies entering the databases. Even in well-maintained databases, however, data anomalies will still accumulate. To improve the quality of data, databases can be queried periodically to locate and correct anomalies. This paper reports the results of two experiments that investigated the effects of different data structures on business end-users' abilities to detect data anomalies in a relational database. The results demonstrate that both unnormalised and higher levels of normalisation lower the effectiveness and efficiency of queries relative to the first normal form. First normal form databases appear to provide the most effective and efficient data structure for business end-users formulating queries to detect data anomalies.

Slicing for architectural analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current software development often relies on non-trivial coordination logic for combining autonomous services, eventually running on different platforms. As a rule, however, such a coordination layer is strongly woven within the application at source code level. Therefore, its precise identification becomes a major methodological (and technical) problem and a challenge to any program understanding or refactoring process. The approach introduced in this paper resorts to slicing techniques to extract coordination data from source code. Such data are captured in a specific dependency graph structure from which a coordination model can be recovered either in the form of an Orc specification or as a collection of code fragments corresponding to the identification of typical coordination patterns in the system. Tool support is also discussed

On the Suitability of Suffix Arrays for Lempel-Ziv Data Compression

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.

Automatic learning for the classification of chemical reactions and in statistical thermodynamics

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces(PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave origin to similar results. The use of supervised learning techniques allowed an improvement in the results. They were improved to 77% of correct assignments when an ensemble of ten FFNNs were used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications ranging from the computer validation of classification systems, genome-scale reconstruction (or comparison) of metabolic pathways, to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and for the automatic assignment of EC numbers to reactions still not officially classified. In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level,respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs - EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP / SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) allowed to map and perceive chemical similarities between metabolic pathways even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. A remarkable ability of the NNs models to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations is shown. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.

Calculation of fractional derivatives of noisy data with genetic algorithms

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the calculation of derivatives of fractional order for non-smooth data. The noise is avoided by adopting an optimization formulation using genetic algorithms (GA). Given the flexibility of the evolutionary schemes, a hierarchical GA composed by a series of two GAs, each one with a distinct fitness function, is established.

Estudo de viabilidade de paralelização de códigos de análise de dados em PROOF

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de mestrado em Engenharia Informática

Population Dynamics Modeling of Arapaima gigas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pirarucu (Arapaima gigas) has been of the most important natural fishing resources of the Amazon region. Due to its economic importance, and the necessity to preserve the species hand, field research concerning the habits and behavior of the pirarucu has been increasing for the last 20 years. The aim of this paper is to present a mathematical model for the pirarucu population dynamics considering the species peculiarities, particularly the male parental care over the offspring. The solution of the dynamical systems indicates three possible equilibrium points for the population. The first corresponds to extinction; the third corresponds to a stable population close to the environmental carrying capacity. The second corresponds to an unstable equilibrium located between extinction and full use of the carrying capacity. It is shown that lack of males’ parental care closes the gap between the point corresponding to the unstable equilibrium and the point of stable non-trivial equilibrium. If guarding failure reaches a critical point the two points coincide and the population tends irreversibly to extinction. If some event tends to destabilize the population equilibrium, as for instance inadequate parental care, the model responds in such a way as to restore the trajectory towards the stable equilibrium point avoiding the route to extinction. The parameters introduced to solve the system of equations are partially derived from limited but reliable field data collected at the Mamirauá Sustainable Development Reserve (MSDR) in the Brazilian Amazonian Region.

From software extensions to product lines of dataflow programs

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Dataflow programs are widely used. Each program is a directed graph where nodes are computations and edges indicate the flow of data. In prior work, we reverse-engineered legacy dataflow programs by deriving their optimized implementations from a simple specification graph using graph transformations called refinements and optimizations. In MDE-speak, our derivations were PIM-to-PSM mappings. In this paper, we show how extensions complement refinements, optimizations, and PIM-to-PSM derivations to make the process of reverse engineering complex legacy dataflow programs tractable. We explain how optional functionality in transformations can be encoded, thereby enabling us to encode product lines of transformations as well as product lines of dataflow programs. We describe the implementation of extensions in the ReFlO tool and present two non-trivial case studies as evidence of our work’s generality

Concise server-wide causality management for eventually consistent data stores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Large scale distributed data stores rely on optimistic replication to scale and remain highly available in the face of net work partitions. Managing data without coordination results in eventually consistent data stores that allow for concurrent data updates. These systems often use anti-entropy mechanisms (like Merkle Trees) to detect and repair divergent data versions across nodes. However, in practice hash-based data structures are too expensive for large amounts of data and create too many false conﬂicts. Another aspect of eventual consistency is detecting write conﬂicts. Logical clocks are often used to track data causality, necessary to detect causally concurrent writes on the same key. However, there is a nonnegligible metadata overhead per key, which also keeps growing with time, proportional with the node churn rate. Another challenge is deleting keys while respecting causality: while the values can be deleted, perkey metadata cannot be permanently removed without coordination. Weintroduceanewcausalitymanagementframeworkforeventuallyconsistentdatastores,thatleveragesnodelogicalclocks(BitmappedVersion Vectors) and a new key logical clock (Dotted Causal Container) to provides advantages on multiple fronts: 1) a new eﬃcient and lightweight anti-entropy mechanism; 2) greatly reduced per-key causality metadata size; 3) accurate key deletes without permanent metadata.

Development of an intelligent earthwork optimization system

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tese de Doutoramento em Engenharia Civil.

An Ontology for Open Rubric Exchange on the Web

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While the Internet has given educators access to a steady supply of Open Educational Resources, the educational rubrics commonly shared on the Web are generally in the form of static, non-semantic presentational documents or in the proprietary data structures of commercial content and learning management systems.With the advent of Semantic Web Standards, producers of online resources have a new framework to support the open exchange of software-readable datasets. Despite these advances, the state of the art of digital representation of rubrics as sharable documents has not progressed.This paper proposes an ontological model for digital rubrics. This model is built upon the Semantic Web Standards of the World Wide Web Consortium (W3C), principally the Resource Description Framework (RDF) and Web Ontology Language (OWL).

KnotProt: a database of proteins with knots and slipknots.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints.

Real wage rigidities and the new Keynesian model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most central banks perceive a trade-off between stabilizing inflation and stabilizing the gap between output and desired output. However, the standard new Keynesian framework implies no such trade-off. In that framework, stabilizing inflation is equivalent to stabilizing the welfare-relevant output gap. In this paper, we argue that this property of the new Keynesian framework, which we call the divine coincidence, is due to a special feature of the model: the absence of non trivial real imperfections.We focus on one such real imperfection, namely, real wage rigidities. When the baseline new Keynesian model is extended to allow for real wage rigidities, the divine coincidence disappears, and central banks indeed face a trade-off between stabilizing inflation and stabilizing the welfare-relevant output gap. We show that not only does the extended model have more realistic normative implications, but it also has appealing positive properties. In particular, it provides a natural interpretation for the dynamic inflation-unemployment relation found in the data.

«
1
2
3
4
5
6
7
8
...
62
63
»