Biblioteca Digital

57 resultados para Data recovery (Computer science)

Model-based clustering in gene expression microarrays: An application to breast cancer data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Confirmation: Increasing resource availability for transactional workflows

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The notion of compensation is widely used in advanced transaction models as means of recovery from a failure. Similar concepts are adopted for providing transaction-like behaviour for long business processes supported by workflows technology. In general, it is not trivial to design compensating tasks for tasks in the context of a workflow. Actually, a task in a workflow process does not have to be compensatable in the sense that the forcibility of reverse operations of the task is not always guaranteed by the application semantics. In addition, the isolation requirement on data resources may make a task difficult to compensate. In this paper, we first look into the requirements that a compensating task has to satisfy. Then we introduce a new concept called confirmation. With the help of confirmation, we are able to modify most non-compensatable tasks so that they become compensatable. This can substantially increase the availability of shared resources and greatly improve backward recovery for workflow applications in case of failures. To effectively incorporate confirmation and compensation into a workflow management environment, a three level bottom-up workflow design method is introduced. The implementation issues of this design are also discussed. (C) 2003 Elsevier Science Inc. All rights reserved.

Data-driven fuzzy analysis in quantitative mineral resource assessment

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The integration of geo-information from multiple sources and of diverse nature in developing mineral favourability indexes (MFIs) is a well-known problem in mineral exploration and mineral resource assessment. Fuzzy set theory provides a convenient framework to combine and analyse qualitative and quantitative data independently of their source or characteristics. A novel, data-driven formulation for calculating MFIs based on fuzzy analysis is developed in this paper. Different geo-variables are considered fuzzy sets and their appropriate membership functions are defined and modelled. A new weighted average-type aggregation operator is then introduced to generate a new fuzzy set representing mineral favourability. The membership grades of the new fuzzy set are considered as the MFI. The weights for the aggregation operation combine the individual membership functions of the geo-variables, and are derived using information from training areas and L, regression. The technique is demonstrated in a case study of skarn tin deposits and is used to integrate geological, geochemical and magnetic data. The study area covers a total of 22.5 km(2) and is divided into 349 cells, which include nine control cells. Nine geo-variables are considered in this study. Depending on the nature of the various geo-variables, four different types of membership functions are used to model the fuzzy membership of the geo-variables involved. (C) 2002 Elsevier Science Ltd. All rights reserved.

Ex ante evaluations of alternate data structures for end user queries: Theory and experimental test

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The data structure of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. This research develops a methodology for evaluating, ex ante, the relative desirability of alternative data structures for end user queries. This research theorizes that the data structure that yields the lowest weighted average complexity for a representative sample of information requests is the most desirable data structure for end user queries. The theory was tested in an experiment that compared queries from two different relational database schemas. As theorized, end users querying the data structure associated with the less complex queries performed better Complexity was measured using three different Halstead metrics. Each of the three metrics provided excellent predictions of end user performance. This research supplies strong evidence that organizations can use complexity metrics to evaluate, ex ante, the desirability of alternate data structures. Organizations can use these evaluations to enhance the efficient and effective retrieval of information by creating data structures that minimize end user query complexity.

Cluster analysis of high-dimensional data: A case study

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.

Selecting optimal instantiations of data models - Theory and validation of an ex ante approach

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Obtaining quickly the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.

Normalized Gaussian Networks with Mixed Feature Data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With mixed feature data, problems are induced in modeling the gating network of normalized Gaussian (NG) networks as the assumption of multivariate Gaussian becomes invalid. In this paper, we propose an independence model to handle mixed feature data within the framework of NG networks. The method is illustrated using a real example of breast cancer data.

Data mining and simulation: a grey relationship demonstration

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.

Verifying data refinements using a model checker

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we consider how refinements between state-based specifications (e.g., written in Z) can be checked by use of a model checker. Specifically, we are interested in the verification of downward and upward simulations which are the standard approach to verifying refinements in state-based notations. We show how downward and upward simulations can be checked using existing temporal logic model checkers. In particular, we show how the branching time temporal logic CTL can be used to encode the standard simulation conditions. We do this for both a blocking, or guarded, interpretation of operations (often used when specifying reactive systems) as well as the more common non-blocking interpretation of operations used in many state-based specification languages (for modelling sequential systems). The approach is general enough to use with any state-based specification language, and we illustrate how refinements between Z specifications can be checked using the SAL CTL model checker using a small example.

A bang-bang PLL employing dynamic gain control for low jitter and fast lock times

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bang-bang phase detector based PLLs are simple to design, suffer no systematic phase error, and can run at the highest speed a process can make a working flip-flop. For these reasons designers are employing them in the design of very high speed Clock Data Recovery (CDR) architectures. The major drawback of this class of PLL is the inherent jitter due to quantized phase and frequency corrections. Reducing loop gain can proportionally improve jitter performance, but also reduces locking time and pull-in range. This paper presents a novel PLL design that dynamically scales its gain in order to achieve fast lock times while improving fitter performance in lock. Under certain circumstances the design also demonstrates improved capture range. This paper also analyses the behaviour of a bang-bang type PLL when far from lock, and demonstrates that the pull-in range is proportional to the square root of the PLL loop gain.

Materialized view maintenance in data warehouses

Relevância:

100.00% 100.00%

Publicador:

Incident detection on arterials using neural network data fusion of simulated probe vehicle and loop detector data

Relevância:

100.00% 100.00%

Publicador:

Prediction of MHC class II-binding peptides using an evolutionary algorithm and artificial neural network

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binder-a were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotheraaeutic peptides.

An exploration of varieties of visual attention: ERP findings

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A set of five tasks was designed to examine dynamic aspects of visual attention: selective attention to color, selective attention to pattern, dividing and switching attention between color and pattern, and selective attention to pattern with changing target. These varieties of visual attention were examined using the same set of stimuli under different instruction sets; thus differences between tasks cannot be attributed to differences in the perceptual features of the stimuli. ERP data are presented for each of these tasks. A within-task analysis of different stimulus types varying in similarity to the attended target feature revealed that an early frontal selection positivity (FSP) was evident in selective attention tasks, regardless of whether color was the attended feature. The scalp distribution of a later posterior selection negativity (SN) was affected by whether the attended feature was color or pattern. The SN was largely unaffected by dividing attention across color and pattern. A large widespread positivity was evident in most conditions, consisting of at least three subcomponents which were differentially affected by the attention conditions. These findings are discussed in relation to prior research and the time course of visual attention processes in the brain. (C) 1999 Elsevier Science B.V. All rights reserved.

CXTANNEAL: an improved program for estimating solute transport parameters

Relevância:

100.00% 100.00%

Publicador:

Resumo:

CXTANNEAL is a program for analysing contaminant transport in soils. The code, written in Fortran 77, is a modified version of CXTFIT, a commonly used package for estimating solute transport parameters in soils. The improvement of the present code is that it includes simulated annealing as the optimization technique for curve fitting. Tests with hypothetical data show that CXTANNEAL performs better than the original code in searching for optimal parameter estimates. To reduce the computational time, a parallel version of CXTANNEAL (CXTANNEAL_P) was also developed. (C) 1999 Elsevier Science Ltd. All rights reserved.

«
1
2
3
4
»