73 resultados para Data structures (Computer science)
Resumo:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
Resumo:
The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Obtaining quickly the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
Resumo:
With mixed feature data, problems are induced in modeling the gating network of normalized Gaussian (NG) networks as the assumption of multivariate Gaussian becomes invalid. In this paper, we propose an independence model to handle mixed feature data within the framework of NG networks. The method is illustrated using a real example of breast cancer data.
Resumo:
beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns have shown broad binding capacity to multiple different receptors, for example benzodiazepines. beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi 2, psi 2, phi 3 and psi 3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb and Type IV which represents unclassified beta-turns). Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions, as defined by C-alpha-C-beta vectors, of these turns have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cluster 90% of the data, and the average intra-cluster RMSD of the four C-alpha-C-beta vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.
Resumo:
Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.
Resumo:
This article presents various novel and conventional planar electromagnetic bandgap (EBG)-assisted transmission lines. Both microstrip lines and coplanar waveguides (CPWs) are designed with circular, rectangular, annular, plus-sign and fractal-patterned EBGs and dumbbell-shaped defected ground structure (DGS). The dispersion characteristics and the slow-wave factors of the design are investigated. (c) 2006 Wiley Periodicals, Inc.
Resumo:
In this paper, we consider how refinements between state-based specifications (e.g., written in Z) can be checked by use of a model checker. Specifically, we are interested in the verification of downward and upward simulations which are the standard approach to verifying refinements in state-based notations. We show how downward and upward simulations can be checked using existing temporal logic model checkers. In particular, we show how the branching time temporal logic CTL can be used to encode the standard simulation conditions. We do this for both a blocking, or guarded, interpretation of operations (often used when specifying reactive systems) as well as the more common non-blocking interpretation of operations used in many state-based specification languages (for modelling sequential systems). The approach is general enough to use with any state-based specification language, and we illustrate how refinements between Z specifications can be checked using the SAL CTL model checker using a small example.
Resumo:
Even when data repositories exhibit near perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise from either problems understanding of the data models that represent the real world systems, or their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map the information request and the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. The results indicate for the hypotheses associated with different representations but equivalent semantics that parsimonious data model participants performed better for component level tasks but that ontologically clearer data model participants performed better for record and aggregate level tasks.
Resumo:
This paper describes a formal component language, used to support automated component-based program development. The components, referred to as templates, are machine processable, meaning that appropriate tool support, such as retrieval support, can be developed. The templates are highly adaptable, meaning that they can be applied to a wide range of problems. Some of the main features of the language are described, including: higher-order parameters; state variable declarations; specification statements and conditionals; applicability conditions and theories; meta-level place holders; and abstract data structures.
Resumo:
Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binder-a were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotheraaeutic peptides.
Resumo:
The moving finite element collocation method proposed by Kill et al. (1995) Chem. Engng Sci. 51 (4), 2793-2799 for solution of problems with steep gradients is further developed to solve transient problems arising in the field of adsorption. The technique is applied to a model of adsorption in solids with bidisperse pore structures. Numerical solutions were found to match the analytical solution when it exists (i.e. when the adsorption isotherm is linear). The method is simple yet sufficiently accurate for use in adsorption problems, where global collocation methods fail. (C) 1998 Elsevier Science Ltd. All rights reserved.
Resumo:
A set of five tasks was designed to examine dynamic aspects of visual attention: selective attention to color, selective attention to pattern, dividing and switching attention between color and pattern, and selective attention to pattern with changing target. These varieties of visual attention were examined using the same set of stimuli under different instruction sets; thus differences between tasks cannot be attributed to differences in the perceptual features of the stimuli. ERP data are presented for each of these tasks. A within-task analysis of different stimulus types varying in similarity to the attended target feature revealed that an early frontal selection positivity (FSP) was evident in selective attention tasks, regardless of whether color was the attended feature. The scalp distribution of a later posterior selection negativity (SN) was affected by whether the attended feature was color or pattern. The SN was largely unaffected by dividing attention across color and pattern. A large widespread positivity was evident in most conditions, consisting of at least three subcomponents which were differentially affected by the attention conditions. These findings are discussed in relation to prior research and the time course of visual attention processes in the brain. (C) 1999 Elsevier Science B.V. All rights reserved.
Resumo:
CXTANNEAL is a program for analysing contaminant transport in soils. The code, written in Fortran 77, is a modified version of CXTFIT, a commonly used package for estimating solute transport parameters in soils. The improvement of the present code is that it includes simulated annealing as the optimization technique for curve fitting. Tests with hypothetical data show that CXTANNEAL performs better than the original code in searching for optimal parameter estimates. To reduce the computational time, a parallel version of CXTANNEAL (CXTANNEAL_P) was also developed. (C) 1999 Elsevier Science Ltd. All rights reserved.