57 results for Data recovery (Computer science)
Abstract:
In this and a preceding paper, we provide an introduction to the Fujitsu VPP range of vector-parallel supercomputers and to some of the computational chemistry software available for the VPP. Here, we consider the implementation and performance of seven popular chemistry application packages. The codes discussed range from classical molecular dynamics to semiempirical and ab initio quantum chemistry. All have evolved from sequential codes, and have typically been parallelised using a replicated data approach. As such they are well suited to the large-memory/fast-processor architecture of the VPP. For one code, CASTEP, a distributed-memory data-driven parallelisation scheme is presented. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
Abstract:
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
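The down-weighting that makes a t mixture robust can be shown in a minimal sketch. It assumes univariate data and holds the degrees of freedom ν fixed. The paper's ECM algorithm also estimates ν; that step is omitted here, which makes this plain EM. The E-step computes the usual responsibilities τ. For each point it also computes a weight u = (ν + 1)/(ν + d), where d is the squared standardized distance to the component. Outlying points receive a small u and so pull little on the component means. The function names are illustrative.

```python
import math

def t_pdf(x, mu, sigma2, nu):
    """Density of a univariate t distribution (location mu, scale sigma2, nu d.o.f.)."""
    d = (x - mu) ** 2 / sigma2
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi * sigma2))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(d / nu))

def fit_t_mixture(xs, mus, sigma2s, pis, nu=4.0, iters=100):
    """EM for a univariate t mixture with nu fixed (ECM would also update nu)."""
    mus, sigma2s, pis = list(mus), list(sigma2s), list(pis)
    k, n = len(mus), len(xs)
    for _ in range(iters):
        # E-step: responsibilities tau[i][j] and latent scale weights u[i][j]
        tau, u = [], []
        for x in xs:
            dens = [pis[j] * t_pdf(x, mus[j], sigma2s[j], nu) for j in range(k)]
            total = sum(dens)
            tau.append([dj / total for dj in dens])
            u.append([(nu + 1) / (nu + (x - mus[j]) ** 2 / sigma2s[j])
                      for j in range(k)])
        # M-step: weighted updates; a small u down-weights an outlying point
        for j in range(k):
            tau_sum = sum(tau[i][j] for i in range(n))
            w = [tau[i][j] * u[i][j] for i in range(n)]
            pis[j] = tau_sum / n
            mus[j] = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
            sigma2s[j] = sum(wi * (x - mus[j]) ** 2 for wi, x in zip(w, xs)) / tau_sum
    return mus, sigma2s, pis
```

Suppose a single far outlier is added to two tight clusters. The fitted means stay close to the cluster centres, whereas an unweighted mean of the outlier's cluster would be dragged towards it.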
Abstract:
Continuous-valued recurrent neural networks can learn mechanisms for processing context-free languages. The dynamics of such networks are usually based on damped oscillation around fixed points in state space and require that the dynamical components be arranged in certain ways. It is shown that qualitatively similar dynamics, with similar constraints, hold for a^n b^n c^n, a context-sensitive language. The additional difficulty with a^n b^n c^n, compared with the context-free language a^n b^n, lies in 'counting up' and 'counting down' letters simultaneously. The network's solution is to oscillate in two principal dimensions, one for counting up and one for counting down. This study focuses on the dynamics employed by the sequential cascaded network, in contrast to the simple recurrent network, and on the use of backpropagation through time. The solutions found generalize well beyond the training data; however, learning is not reliable. The contribution of this study lies in demonstrating how the dynamics that recurrent neural networks use to process context-free languages can also be employed to process some context-sensitive languages (traditionally thought of as requiring additional computational resources). This continuity of mechanism between language classes contributes to our understanding of neural networks in modelling language learning and processing.
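A small symbolic recognizer makes concrete the 'counting up' and 'counting down' that a^n b^n c^n demands. This is a sketch using two explicit integer counters, not the network's learned dynamics. The function name accepts_anbncn is illustrative. While the b's are read, one counter falls as the other rises, which is the simultaneous counting described above.

```python
def accepts_anbncn(s: str) -> bool:
    """Recognise a^n b^n c^n (n >= 1) with two counters.

    count_a goes up on 'a' and down on 'b'; count_b goes up on 'b' and
    down on 'c'. During the b's one counter counts down while the other
    counts up at the same time.
    """
    phase = "a"
    count_a = 0  # a's not yet matched by b's
    count_b = 0  # b's not yet matched by c's
    for ch in s:
        if ch == "a" and phase == "a":
            count_a += 1
        elif ch == "b" and phase in ("a", "b"):
            phase = "b"
            count_a -= 1   # counting down the a's ...
            count_b += 1   # ... while counting up the b's
            if count_a < 0:
                return False
        elif ch == "c" and phase in ("b", "c"):
            phase = "c"
            count_b -= 1
            if count_b < 0:
                return False
        else:
            return False  # symbol out of order or unknown
    return phase == "c" and count_a == 0 and count_b == 0
```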
Abstract:
The movement of chemicals through the soil to groundwater, or their discharge to surface waters, degrades these resources. In many cases, this form of pollution has serious implications for human and livestock health. The chemicals of interest include nutrients, pesticides, salts, and industrial wastes. Recent studies have shown that current models and methods do not adequately describe the leaching of nutrients through soil. They often underestimate the risk of groundwater contamination by surface-applied chemicals and overestimate the concentration of resident solutes. This inaccuracy results primarily from ignoring soil structure and nonequilibrium between soil constituents, water, and solutes. A multiple sample percolation system (MSPS), consisting of 25 individual collection wells, was constructed to study the effects of localized soil heterogeneities on the transport of nutrients (NO₃⁻, Cl⁻, PO₄³⁻) in the vadose zone of a clay-dominated agricultural soil. Highly significant variations in drainage patterns across a small spatial scale were observed (one-way ANOVA, p < 0.001), indicating considerable heterogeneity in water flow patterns and nutrient leaching. Using data collected from the multiple sample percolation experiments, this paper compares the performance of two mathematical models for predicting solute transport. The first is the advective-dispersion model with a reaction term (ADR). The second is a two-region preferential flow model (TRM) suitable for modelling nonequilibrium transport. These results have implications for modelling solute transport and predicting nutrient loading on a larger scale. (C) 2001 Elsevier Science Ltd. All rights reserved.
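For reference, common one-dimensional textbook forms of the two model families are given below. The abstract does not state the exact parameterization used in the paper, so these forms are assumptions: the retardation and first-order decay terms in the ADR, and the mobile-immobile form of the TRM. C is solute concentration and R a retardation factor. D is the dispersion coefficient, v the pore-water velocity, and μ a first-order reaction rate. θ_m and θ_im are the mobile and immobile water contents, D_m and v_m the dispersion coefficient and velocity in the mobile region, and α a first-order mass-transfer coefficient between the two regions.

```latex
\begin{aligned}
\text{ADR:}\quad & R\,\frac{\partial C}{\partial t} = D\,\frac{\partial^2 C}{\partial x^2} - v\,\frac{\partial C}{\partial x} - \mu C \\
\text{TRM:}\quad & \theta_m \frac{\partial C_m}{\partial t} + \theta_{im} \frac{\partial C_{im}}{\partial t} = \theta_m D_m \frac{\partial^2 C_m}{\partial x^2} - \theta_m v_m \frac{\partial C_m}{\partial x} \\
& \theta_{im} \frac{\partial C_{im}}{\partial t} = \alpha\,(C_m - C_{im})
\end{aligned}
```

The exchange term α(C_m − C_im) is what lets the TRM represent nonequilibrium transport. Solute in the immobile region lags behind the mobile region instead of equilibrating instantly, as the single-region ADR assumes.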
Abstract:
The World Wide Web (WWW) is useful for distributing scientific data. Most existing web data resources organize their information either in structured flat files or in relational databases with basic retrieval capabilities. For databases with one or a few simple relations, these approaches are successful, but they can be cumbersome when the data model involves multiple relations between complex data. We believe that knowledge-based resources offer a solution in these cases. Knowledge bases have explicit declarations of the concepts in the domain, along with the relations between them. They are usually organized hierarchically and provide a global data model with a controlled vocabulary. We have created the OWEB architecture for building online scientific data resources using knowledge bases. OWEB provides a shell for structuring data, providing secure and shared access, and creating computational modules for processing and displaying data. In this paper, we describe the translation of the online immunological database MHCPEP into an OWEB system called MHCWeb. This effort involved building a conceptual model for the data, creating a controlled terminology for the legal values of different types of data, and then translating the original data into the new structure. The OWEB environment allows flexible access to the data by both users and computer programs.
Abstract:
Summary: Plant biologists in the fields of ecology, evolution, genetics and breeding frequently use multivariate methods. This paper illustrates Principal Component Analysis (PCA) and Gabriel's biplot as applied to microarray expression data from plant pathology experiments. Availability: An example program in the publicly distributed statistical language R is available from the web site (www.tpp.uq.edu.au) and by e-mail from the contact. Contact: scott.chapman@csiro.au.
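The paper's example program is in R. The Python sketch below is not that program, only an assumed minimal illustration of the same idea using NumPy. It centres the data matrix (rows are samples such as arrays, columns are variables such as genes) and takes its singular value decomposition. It then forms Gabriel biplot markers G = U S^alpha for rows and H = V S^(1 - alpha) for columns, so that G Hᵀ is the rank-k approximation of the centred data. The function name pca_biplot and this alpha convention are assumptions for illustration.

```python
import numpy as np

def pca_biplot(X, alpha=1.0, k=2):
    """Rank-k biplot markers for an (n_samples x n_variables) matrix X.

    Returns (G, H, ratio): row markers G, column markers H, and the share
    of total variance captured by each of the first k components.
    alpha=1 makes G the principal component scores; alpha=0 puts the
    singular values on the variable markers instead.
    """
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    G = U[:, :k] * s[:k] ** alpha                # sample (row) markers
    H = Vt[:k].T * s[:k] ** (1.0 - alpha)        # variable (column) markers
    ratio = s ** 2 / np.sum(s ** 2)              # variance explained per PC
    return G, H, ratio[:k]
```

In a plot, G is drawn as points and H as arrows from the origin. With alpha = 1 the point coordinates are the ordinary PCA scores.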
Abstract:
This paper is concerned with the use of scientific visualization methods for the analysis of feedforward neural networks (NNs). Inevitably, the kinds of data associated with the design and implementation of neural networks are of very high dimensionality, presenting a major challenge for visualization. A method is described using the well-known statistical technique of principal component analysis (PCA). This is found to be an effective and useful method of visualizing the learning trajectories of many learning algorithms such as back-propagation and can also be used to provide insight into the learning process and the nature of the error surface.
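A minimal sketch of the visualization idea, assuming NumPy, works in two steps. First, record the weight vector after every training epoch. Then project the resulting trajectory onto its first two principal components for plotting. Here a single linear unit trained by batch gradient descent on squared error stands in for a feedforward network trained by back-propagation. Both function names are hypothetical.

```python
import numpy as np

def train_and_record(X, y, lr=0.1, epochs=50):
    """Batch gradient descent on one linear unit (mean squared error).

    Returns an (epochs + 1) x n_weights array: the weights before training
    and after each epoch, i.e. the learning trajectory.
    """
    w = np.zeros(X.shape[1])
    history = [w.copy()]
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
        history.append(w.copy())
    return np.array(history)

def project_trajectory(W, k=2):
    """Project a (steps x n_weights) trajectory onto its first k principal components."""
    Wc = W - W.mean(axis=0)
    _, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    return Wc @ Vt[:k].T, s ** 2 / np.sum(s ** 2)
```

The projected points can be plotted in order to show the path taken through weight space. The variance ratios indicate how faithfully two dimensions represent it.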
Abstract:
While multimedia data, image data in particular, is an integral part of most websites and web documents, our quest for information so far is still restricted to text-based search. To explore the World Wide Web more effectively, especially its rich repository of truly multimedia information, we face a number of challenging problems. Firstly, we face the ambiguous and highly subjective nature of defining image semantics and similarity. Secondly, multimedia data may come from highly diversified sources, as a result of automatic image capturing and generation processes. Finally, multimedia information exists in decentralised sources over the Web, making it difficult to use conventional content-based image retrieval (CBIR) techniques for effective and efficient search. In this special issue, we present a collection of five papers on visual and multimedia information management and retrieval topics, addressing some aspects of these challenges. These papers have been selected from the conference proceedings (Kluwer Academic Publishers, ISBN: 1-4020-7060-8) of the Sixth IFIP 2.6 Working Conference on Visual Database Systems (VDB6), held in Brisbane, Australia, on 29–31 May 2002.
Abstract:
Taking functional programming to its extremes in search of simplicity still requires integration with other development methods (e.g. formal methods). Induction is the key to deriving and verifying functional programs, but it can be simplified by packaging proofs with functions, particularly folds, on data (structures). Totally Functional Programming (TFP) avoids the complexities of interpretation by directly representing data (structures) as platonic combinators: the functions characteristic to the data. The link between the two simplifications is that platonic combinators are a kind of partially applied fold. Platonic combinators therefore inherit fold-theoretic properties, with some apparent simplifications due to the platonic combinator representation. Some behaviour observable within functional programming suggests that TFP is widely applicable. However, significant work remains before TFP as such could be widely adopted.
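The claim that platonic combinators are partially applied folds can be illustrated in Python, with lambdas standing in for combinators. The paper's own setting is a functional language, so this is only an analogy. Here a list is represented by the function that folds over it. nil returns the base value, and cons(x, xs) applies the step function to x and to the fold of xs. from_list obtains the same kind of function by fixing the data argument of an ordinary right fold. All names are illustrative.

```python
from functools import reduce

# A list represented as its own fold ("the function characteristic to the data"):
#   nil         = \c n -> n
#   cons x xs   = \c n -> c x (xs c n)
def nil(c, n):
    return n

def cons(x, xs):
    return lambda c, n: c(x, xs(c, n))

def foldr(c, n, py_list):
    """Ordinary right fold over a Python list, for comparison."""
    return reduce(lambda acc, x: c(x, acc), reversed(py_list), n)

def from_list(py_list):
    """Partially apply foldr: fix the data, leave (c, n) to be supplied later."""
    return lambda c, n: foldr(c, n, py_list)

xs = cons(1, cons(2, cons(3, nil)))
total = xs(lambda x, acc: x + acc, 0)          # sum of the elements
length = xs(lambda _x, acc: acc + 1, 0)        # number of elements
as_list = xs(lambda x, acc: [x] + acc, [])     # convert back to a Python list
```

Every operation on xs is just a choice of (c, n), which is why fold-based properties carry over to this representation.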
Abstract:
Spatial data is now used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved because of performance issues related to the large size and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing in which the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that server-side processing cost and network traffic can be reduced when the level of resolution required by an application is low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine, so the developer of spatial Web applications need not be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.
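The abstract does not describe the paper's multiresolution structures. The sketch below therefore swaps in a standard technique, Douglas-Peucker polyline simplification, to show what choosing data at the right resolution level might look like. It keeps one simplified copy of a line per tolerance, and a server could return the coarsest level an application accepts. The function names and the per-tolerance dictionary are assumptions, not the authors' design.

```python
import math

def _seg_dist(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tol):
    """Simplify a polyline, keeping only vertices that deviate more than tol."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _seg_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tol:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], tol)
    right = douglas_peucker(points[idx:], tol)
    return left[:-1] + right  # drop the shared split vertex once

def build_levels(points, tolerances):
    """Precompute one representation per resolution level (larger tol = coarser)."""
    return {tol: douglas_peucker(points, tol) for tol in tolerances}
```

A larger tolerance keeps fewer vertices, so a request that needs only low resolution transfers less data.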
Abstract:
PHWAT is a new model that couples a geochemical reaction model (PHREEQC-2) with a density-dependent groundwater flow and solute transport model (SEAWAT) using the split-operator approach. PHWAT was developed to simulate multi-component reactive transport in variable-density groundwater flow. Fluid density in PHWAT depends not only on the concentration of a single species, as in SEAWAT, but also on the concentrations of other dissolved chemicals that can be subject to reactive processes. Simulation results of PHWAT and PHREEQC-2 were compared in their predictions of effluent concentration from a column experiment. Both models produced identical results, showing that PHWAT has correctly coupled the sub-packages. PHWAT was then applied to the simulation of a tank experiment in which seawater intrusion was accompanied by cation exchange. The density dependence of the intrusion and the snow-plough effect in the breakthrough curves were reflected in the model simulations, which were in good agreement with the measured breakthrough data. Comparison simulations that, in turn, excluded density effects and reactions allowed us to quantify the marked effect of ignoring these processes. Next, we explored numerical issues involved in the practical application of PHWAT using the example of a dense plume flowing into a tank containing fresh water. It was shown that PHWAT could model physically unstable flow and that numerical instabilities were suppressed. Physical instability developed in the model as the modified Rayleigh number for density-dependent flow increased, in agreement with previous research. (c) 2004 Elsevier Ltd. All rights reserved.
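The split-operator approach alternates a transport step and a reaction step within each time step. The toy sketch below is not PHWAT. A one-dimensional explicit upwind advection step stands in for SEAWAT, and first-order decay stands in for PHREEQC-2 chemistry. All function names and parameters (Courant number, decay rate k, time step dt) are hypothetical. The sketch treats them as independent inputs, although in a real model the Courant number is itself v·dt/dx.

```python
import math

def advect_upwind(c, courant, c_in):
    """One explicit upwind advection step for flow in +x (stable for 0 < courant <= 1)."""
    new = c[:]
    new[0] = c[0] - courant * (c[0] - c_in)  # inflow boundary at the left
    for i in range(1, len(c)):
        new[i] = c[i] - courant * (c[i] - c[i - 1])
    return new

def react_decay(c, k, dt):
    """Exact first-order decay over dt, applied independently in each cell."""
    f = math.exp(-k * dt)
    return [ci * f for ci in c]

def split_operator(c, steps, courant, c_in, k, dt):
    """Sequential (non-iterative) operator splitting: transport, then reaction."""
    for _ in range(steps):
        c = advect_upwind(c, courant, c_in)  # transport sub-step
        c = react_decay(c, k, dt)            # reaction sub-step
    return c
```

Keeping the two sub-steps as separate functions mirrors the reason for splitting: an existing transport code and an existing geochemical code can each be reused unchanged.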
Abstract:
Test templates and a test template framework are introduced as useful concepts in specification-based testing. The framework can be defined using any model-based specification notation and used to derive tests from model-based specifications; in this paper, it is demonstrated using the Z notation. The framework formally defines test data sets and their relation to the operations in a specification and to other test data sets, providing structure to the testing process. Flexibility is preserved, so that many testing strategies can be used. Important application areas of the framework are discussed, including refinement of test data, regression testing, and test oracles.
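A sketch of the template idea assumes a hypothetical Withdraw operation with inputs (balance, amount) and precondition 0 < amount <= balance. Python predicates stand in for Z schema predicates. The valid input space is the root template. Each child template conjoins an extra constraint, so it denotes a subset of its parent. A concrete test datum is then drawn from each leaf. The class and method names are illustrative only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Input = Tuple[int, int]  # (balance, amount) for a hypothetical Withdraw operation

@dataclass(frozen=True)
class TestTemplate:
    """A named set of inputs, described by a predicate."""
    name: str
    pred: Callable[[Input], bool]

    def refine(self, name: str, extra: Callable[[Input], bool]) -> "TestTemplate":
        # A child template is the parent constrained further (set intersection).
        return TestTemplate(f"{self.name}/{name}", lambda i: self.pred(i) and extra(i))

    def instance(self, domain: Iterable[Input]) -> Optional[Input]:
        # Pick one concrete test datum satisfying the template, if any exists.
        return next((i for i in domain if self.pred(i)), None)

# Root template: the valid input space given by the precondition.
valid = TestTemplate("VIS", lambda i: 0 < i[1] <= i[0])
leaves = [
    valid.refine("partial", lambda i: i[1] < i[0]),  # amount below balance
    valid.refine("exact", lambda i: i[1] == i[0]),   # boundary: withdraw everything
]
domain = list(product(range(5), repeat=2))
tests = {t.name: t.instance(domain) for t in leaves}
```

Because each datum comes from a named leaf, a regression suite can be regenerated from the template hierarchy rather than stored as loose values.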
Abstract:
In order to separate the effects of experience from other characteristics of word frequency (e.g., orthographic distinctiveness), computer science and psychology students rated their experience with computer science technical items and nontechnical items from a wide range of word frequencies prior to being tested for recognition memory of the rated items. For nontechnical items, there was a curvilinear relationship between recognition accuracy and word frequency for both groups of students. The usual superiority of low-frequency words was demonstrated and high-frequency words were recognized least well. For technical items, a similar curvilinear relationship was evident for the psychology students, but for the computer science students, recognition accuracy was inversely related to word frequency. The ratings data showed that subjective experience rather than background word frequency was the better predictor of recognition accuracy.
Abstract:
An important feature of some conceptual modelling grammars is the set of constructs they provide to allow database designers to show that real-world things may or may not possess a particular attribute or relationship. In the entity-relationship model, for example, the fact that a thing may not possess an attribute can be represented by using a special symbol to indicate that the attribute is optional. Similarly, the fact that a thing may or may not be involved in a relationship can be represented by showing the minimum cardinality of the relationship as zero. Whether these practices should be followed, however, is a contentious issue. An alternative approach is to eliminate optional attributes and relationships from conceptual schema diagrams by using subtypes that have only mandatory attributes and relationships. In this paper, we first present a theory that led us to predict that optional attributes and relationships should be used in conceptual schema diagrams only when users of the diagrams require a surface-level understanding of the domain being represented. When users require a deep-level understanding, however, optional attributes and relationships should not be used because they undermine users' abilities to grasp important domain semantics. We then describe three experiments that we undertook to test our predictions. The results of the experiments support our predictions.
Abstract:
This paper examines the effects of information request ambiguity and construct incongruence on end users' ability to develop SQL queries with an interactive relational database query language. In this experiment, ambiguity in information requests adversely affected accuracy and efficiency. Incongruities among the information request, the query syntax, and the data representation adversely affected accuracy, efficiency, and confidence. The results for ambiguity suggest that organizations might elicit better query development if end users were sensitized to the nature of ambiguities that could arise in their business contexts. End users could translate natural language queries into pseudo-SQL that could be examined for precision before the queries were developed. The results for incongruence suggest that better query development might ensue if semantic distances could be reduced by giving users data representations and database views that maximize construct congruence for the kinds of queries in typical domains. (C) 2001 Elsevier Science B.V. All rights reserved.