836 resultados para Data fusion applications
Resumo:
Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e. g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point the best solution for each application. Results: The intent of this work is to provide an open-source multiplataform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes ( targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software effectiveness in two distinct types of biological problems. Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.
Resumo:
This paper proposes a novel computer vision approach that processes video sequences of people walking and then recognises those people by their gait. Human motion carries different information that can be analysed in various ways. The skeleton carries motion information about human joints, and the silhouette carries information about boundary motion of the human body. Moreover, binary and gray-level images contain different information about human movements. This work proposes to recover these different kinds of information to interpret the global motion of the human body based on four different segmented image models, using a fusion model to improve classification. Our proposed method considers the set of the segmented frames of each individual as a distinct class and each frame as an object of this class. The methodology applies background extraction using the Gaussian Mixture Model (GMM), a scale reduction based on the Wavelet Transform (WT) and feature extraction by Principal Component Analysis (PCA). We propose four new schemas for motion information capture: the Silhouette-Gray-Wavelet model (SGW) captures motion based on grey level variations; the Silhouette-Binary-Wavelet model (SBW) captures motion based on binary information; the Silhouette-Edge-Binary model (SEW) captures motion based on edge information and the Silhouette Skeleton Wavelet model (SSW) captures motion based on skeleton movement. The classification rates obtained separately from these four different models are then merged using a new proposed fusion technique. The results suggest excellent performance in terms of recognising people by their gait.
Resumo:
Thermodynamic properties of bread dough (fusion enthalpy, apparent specific heat, initial freezing point and unfreezable water) were measured at temperatures from -40 degrees C to 35 degrees C using differential scanning calorimetry. The initial freezing point was also calculated based on the water activity of dough. The apparent specific heat varied as a function of temperature: specific heat in the freezing region varied from (1.7-23.1) J g(-1) degrees C(-1), and was constant at temperatures above freezing (2.7 J g(-1) degrees C(-1)). Unfreezable water content varied from (0.174-0.182) g/g of total product. Values of heat capacity as a function of temperature were correlated using thermodynamic models. A modification for low-moisture foodstuffs (such as bread dough) was successfully applied to the experimental data. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
For the first time, we introduce and study some mathematical properties of the Kumaraswamy Weibull distribution that is a quite flexible model in analyzing positive data. It contains as special sub-models the exponentiated Weibull, exponentiated Rayleigh, exponentiated exponential, Weibull and also the new Kumaraswamy exponential distribution. We provide explicit expressions for the moments and moment generating function. We examine the asymptotic distributions of the extreme values. Explicit expressions are derived for the mean deviations, Bonferroni and Lorenz curves, reliability and Renyi entropy. The moments of the order statistics are calculated. We also discuss the estimation of the parameters by maximum likelihood. We obtain the expected information matrix. We provide applications involving two real data sets on failure times. Finally, some multivariate generalizations of the Kumaraswamy Weibull distribution are discussed. (C) 2010 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
Resumo:
This paper proposes a regression model considering the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and Jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and we also present some ways to perform global influence. Besides, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and a model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped in k intervals so that ties are eliminated. Thus, the data modeling is performed by considering the discrete models of lifetime regression. The model parameters are estimated by using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, diagnostic measures based on case deletion, which are denominated global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed and compared to the performance of the four link functions of the regression models for grouped survival data for different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A four-parameter extension of the generalized gamma distribution capable of modelling a bathtub-shaped hazard rate function is defined and studied. The beauty and importance of this distribution lies in its ability to model monotone and non-monotone failure rate functions, which are quite common in lifetime data analysis and reliability. The new distribution has a number of well-known lifetime special sub-models, such as the exponentiated Weibull, exponentiated generalized half-normal, exponentiated gamma and generalized Rayleigh, among others. We derive two infinite sum representations for its moments. We calculate the density of the order statistics and two expansions for their moments. The method of maximum likelihood is used for estimating the model parameters and the observed information matrix is obtained. Finally, a real data set from the medical area is analysed.
Resumo:
Two major factors are likely to impact the utilisation of remotely sensed data in the near future: (1)an increase in the number and availability of commercial and non-commercial image data sets with a range of spatial, spectral and temporal dimensions, and (2) increased access to image display and analysis software through GIS. A framework was developed to provide an objective approach to selecting remotely sensed data sets for specific environmental monitoring problems. Preliminary applications of the framework have provided successful approaches for monitoring disturbed and restored wetlands in southern California.
Resumo:
Recent structural studies of proteins mediating membrane fusion reveal intriguing similarities between diverse viral and mammalian systems. Particularly striking is the close similarity between the transmembrane envelope glycoproteins from the retrovirus HTLV-1 and the filovirus Ebola. These similarities suggest similar mechanisms of membrane fusion. The model that fits most currently available data suggests fusion activation in viral systems is driven by a symmetrical conformational change triggered by an activation event such as receptor binding or a pH change. The mammalian vesicle fusion mediated by the SNARE protein complex most likely occurs by a similar mechanism but without symmetry constraints.
Resumo:
The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.
Resumo:
In this and a preceding paper, we provide an introduction to the Fujitsu VPP range of vector-parallel supercomputers and to some of the computational chemistry software available for the VPP. Here, we consider the implementation and performance of seven popular chemistry application packages. The codes discussed range from classical molecular dynamics to semiempirical and ab initio quantum chemistry. All have evolved from sequential codes, and have typically been parallelised using a replicated data approach. As such they are well suited to the large-memory/fast-processor architecture of the VPP. For one code, CASTEP, a distributed-memory data-driven parallelisation scheme is presented. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
Resumo:
Spatial data has now been used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved due to performance issues related to the large sizes and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic can be reduced when the level of resolution required by applications are low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine. That is, the developer of spatial Web applications needs not to be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.
Resumo:
Computational models complement laboratory experimentation for efficient identification of MHC-binding peptides and T-cell epitopes. Methods for prediction of MHC-binding peptides include binding motifs, quantitative matrices, artificial neural networks, hidden Markov models, and molecular modelling. Models derived by these methods have been successfully used for prediction of T-cell epitopes in cancer, autoimmunity, infectious disease, and allergy. For maximum benefit, the use of computer models must be treated as experiments analogous to standard laboratory procedures and performed according to strict standards. This requires careful selection of data for model building, and adequate testing and validation. A range of web-based databases and MHC-binding prediction programs are available. Although some available prediction programs for particular MHC alleles have reasonable accuracy, there is no guarantee that all models produce good quality predictions. In this article, we present and discuss a framework for modelling, testing, and applications of computational methods used in predictions of T-cell epitopes. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
The field of protein crystallography inspires and enthrals, whether it be for the beauty and symmetry of a perfectly formed protein crystal, the unlocked secrets of a novel protein fold, or the precise atomic-level detail yielded from a protein-ligand complex. Since 1958, when the first protein structure was solved, there have been tremendous advances in all aspects of protein crystallography, from protein preparation and crystallisation through to diffraction data measurement and structure refinement. These advances have significantly reduced the time required to solve protein crystal structures, while at the same time substantially improving the quality and resolution of the resulting structures. Moreover, the technological developments have induced researchers to tackle ever more complex systems, including ribosomes and intact membrane-bound proteins, with a reasonable expectation of success. In this review, the steps involved in determining a protein crystal structure are described and the impact of recent methodological advances identified. Protein crystal structures have proved to be extraordinarily useful in medicinal chemistry research, particularly with respect to inhibitor design. The precise interaction between a drug and its receptor can be visualised at the molecular level using protein crystal structures, and this information then used to improve the complementarity and thus increase the potency and selectivity of an inhibitor. The use of protein crystal structures in receptor-based drug design is highlighted by (i) HIV protease, (ii) influenza virus neuraminidase and (iii) prostaglandin H-2-synthetase. These represent, respectively, examples of protein crystal structures that (i) influenced the design of drugs currently approved for use in the treatment of HIV infection, (ii) led to the design of compounds currently in clinical trials for the treatment of influenza infection and (iii) could enable the design of highly specific non-steroidal anti-inflammatory drugs that lack the common side-effects of this drug class.
Wavelet correlation between subjects: A time-scale data driven analysis for brain mapping using fMRI
Resumo:
Functional magnetic resonance imaging (fMRI) based on BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach which predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). In cases when the task is cognitively complex or in cases of diseases, variations in shape and/or delay may reduce the reliability of results. A novel exploratory method using fMRI data, which attempts to discriminate between neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors, is introduced in this paper. This new method is based on the fusion between correlation analysis and the discrete wavelet transform, to identify similarities in the time course of the BOLD signal in a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized human face pictures expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.