924 resultados para Computational methods
Resumo:
Nonlinear analysis tools for studying and characterizing the dynamics of physiological signals have gained popularity, mainly because tracking sudden alterations of the inherent complexity of biological processes might be an indicator of altered physiological states. Typically, in order to perform an analysis with such tools, the physiological variables that describe the biological process under study are used to reconstruct the underlying dynamics of the biological processes. For that goal, a procedure called time-delay or uniform embedding is usually employed. Nonetheless, there is evidence of its inability for dealing with non-stationary signals, as those recorded from many physiological processes. To handle with such a drawback, this paper evaluates the utility of non-conventional time series reconstruction procedures based on non uniform embedding, applying them to automatic pattern recognition tasks. The paper compares a state of the art non uniform approach with a novel scheme which fuses embedding and feature selection at once, searching for better reconstructions of the dynamics of the system. Moreover, results are also compared with two classic uniform embedding techniques. Thus, the goal is comparing uniform and non uniform reconstruction techniques, including the one proposed in this work, for pattern recognition in biomedical signal processing tasks. Once the state space is reconstructed, the scheme followed characterizes with three classic nonlinear dynamic features (Largest Lyapunov Exponent, Correlation Dimension and Recurrence Period Density Entropy), while classification is carried out by means of a simple k-nn classifier. In order to test its generalization capabilities, the approach was tested with three different physiological databases (Speech Pathologies, Epilepsy and Heart Murmurs). In terms of the accuracy obtained to automatically detect the presence of pathologies, and for the three types of biosignals analyzed, the non uniform techniques used in this work lightly outperformed the results obtained using the uniform methods, suggesting their usefulness to characterize non-stationary biomedical signals in pattern recognition applications. On the other hand, in view of the results obtained and its low computational load, the proposed technique suggests its applicability for the applications under study.
Resumo:
Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as “which particular data was input to a particular workflow to test a particular hypothesis?”, and “which particular conclusions were drawn from a particular workflow?”. Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment, allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.
Resumo:
Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance – at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
Resumo:
The operating theatres are the engine of the hospitals; proper management of the operating rooms and its staff represents a great challenge for managers and its results impact directly in the budget of the hospital. This work presents a MILP model for the efficient schedule of multiple surgeries in Operating Rooms (ORs) during a working day. This model considers multiple surgeons and ORs and different types of surgeries. Stochastic strategies are also implemented for taking into account the uncertain in surgery durations (pre-incision, incision, post-incision times). In addition, a heuristic-based methods and a MILP decomposition approach is proposed for solving large-scale ORs scheduling problems in computational efficient way. All these computer-aided strategies has been implemented in AIMMS, as an advanced modeling and optimization software, developing a user friendly solution tool for the operating room management under uncertainty.
Resumo:
In this paper three p-adaptation strategies based on the minimization of the truncation error are presented for high order discontinuous Galerkin methods. The truncation error is approximated by means of a ? -estimation procedure and enables the identification of mesh regions that require adaptation. Three adaptation strategies are developed and termed a posteriori, quasi-a priori and quasi-a priori corrected. All strategies require fine solutions, which are obtained by enriching the polynomial order, but while the former needs time converged solutions, the last two rely on non-converged solutions, which lead to faster computations. In addition, the high order method permits the spatial decoupling for the estimated errors and enables anisotropic p-adaptation. These strategies are verified and compared in terms of accuracy and computational cost for the Euler and the compressible Navier?Stokes equations. It is shown that the two quasi- a priori methods achieve a significant reduction in computational cost when compared to a uniform polynomial enrichment. Namely, for a viscous boundary layer flow, we obtain a speedup of 6.6 and 7.6 for the quasi-a priori and quasi-a priori corrected approaches, respectively.
Resumo:
In this work a p-adaptation (modification of the polynomial order) strategy based on the minimization of the truncation error is developed for high order discontinuous Galerkin methods. The truncation error is approximated by means of a truncation error estimation procedure and enables the identification of mesh regions that require adaptation. Three truncation error estimation approaches are developed and termed a posteriori, quasi-a priori and quasi-a priori corrected. Fine solutions, which are obtained by enriching the polynomial order, are required to solve the numerical problem with adequate accuracy. For the three truncation error estimation methods the former needs time converged solutions, while the last two rely on non-converged solutions, which lead to faster computations. Based on these truncation error estimation methods, algorithms for mesh adaptation were designed and tested. Firstly, an isotropic adaptation approach is presented, which leads to equally distributed polynomial orders in different coordinate directions. This first implementation is improved by incorporating a method to extrapolate the truncation error. This results in a significant reduction of computational cost. Secondly, the employed high order method permits the spatial decoupling of the estimated errors and enables anisotropic p-adaptation. The incorporation of anisotropic features leads to meshes with different polynomial orders in the different coordinate directions such that flow-features related to the geometry are resolved in a better manner. These adaptations result in a significant reduction of degrees of freedom and computational cost, while the amount of improvement depends on the test-case. Finally, this anisotropic approach is extended by using error extrapolation which leads to an even higher reduction in computational cost. These strategies are verified and compared in terms of accuracy and computational cost for the Euler and the compressible Navier-Stokes equations. The main result is that the two quasi-a priori methods achieve a significant reduction in computational cost when compared to a uniform polynomial enrichment. Namely, for a viscous boundary layer flow, we obtain a speedup of a factor of 6.6 and 7.6 for the quasi-a priori and quasi-a priori corrected approaches, respectively. RESUMEN En este trabajo se ha desarrollado una estrategia de adaptación-p (modificación del orden polinómico) para métodos Galerkin discontinuo de alto orden basada en la minimización del error de truncación. El error de truncación se estima utilizando el método tau-estimation. El estimador permite la identificación de zonas de la malla que requieren adaptación. Se distinguen tres técnicas de estimación: a posteriori, quasi a priori y quasi a priori con correción. Todas las estrategias requieren una solución obtenida en una malla fina, la cual es obtenida aumentando de manera uniforme el orden polinómico. Sin embargo, mientras que el primero requiere que esta solución esté convergida temporalmente, el resto utiliza soluciones no convergidas, lo que se traduce en un menor coste computacional. En este trabajo se han diseñado y probado algoritmos de adaptación de malla basados en métodos tau-estimation. En primer lugar, se presenta un algoritmo de adaptacin isótropo, que conduce a discretizaciones con el mismo orden polinómico en todas las direcciones espaciales. Esta primera implementación se mejora incluyendo un método para extrapolar el error de truncación. Esto resulta en una reducción significativa del coste computacional. En segundo lugar, el método de alto orden permite el desacoplamiento espacial de los errores estimados, permitiendo la adaptación anisotropica. Las mallas obtenidas mediante esta técnica tienen distintos órdenes polinómicos en cada una de las direcciones espaciales. La malla final tiene una distribución óptima de órdenes polinómicos, los cuales guardan relación con las características del flujo que, a su vez, depenen de la geometría. Estas técnicas de adaptación reducen de manera significativa los grados de libertad y el coste computacional. Por último, esta aproximación anisotropica se extiende usando extrapolación del error de truncación, lo que conlleva un coste computational aún menor. Las estrategias se verifican y se comparan en téminors de precisión y coste computacional utilizando las ecuaciones de Euler y Navier Stokes. Los dos métodos quasi a priori consiguen una reducción significativa del coste computacional en comparación con aumento uniforme del orden polinómico. En concreto, para una capa límite viscosa, obtenemos una mejora en tiempo de computación de 6.6 y 7.6 respectivamente, para las aproximaciones quasi-a priori y quasi-a priori con corrección.
Resumo:
We consider quasi-Newton methods for generalized equations in Banach spaces under metric regularity and give a sufficient condition for q-linear convergence. Then we show that the well-known Broyden update satisfies this sufficient condition in Hilbert spaces. We also establish various modes of q-superlinear convergence of the Broyden update under strong metric subregularity, metric regularity and strong metric regularity. In particular, we show that the Broyden update applied to a generalized equation in Hilbert spaces satisfies the Dennis–Moré condition for q-superlinear convergence. Simple numerical examples illustrate the results.
Resumo:
The Iterative Closest Point algorithm (ICP) is commonly used in engineering applications to solve the rigid registration problem of partially overlapped point sets which are pre-aligned with a coarse estimate of their relative positions. This iterative algorithm is applied in many areas such as the medicine for volumetric reconstruction of tomography data, in robotics to reconstruct surfaces or scenes using range sensor information, in industrial systems for quality control of manufactured objects or even in biology to study the structure and folding of proteins. One of the algorithm’s main problems is its high computational complexity (quadratic in the number of points with the non-optimized original variant) in a context where high density point sets, acquired by high resolution scanners, are processed. Many variants have been proposed in the literature whose goal is the performance improvement either by reducing the number of points or the required iterations or even enhancing the complexity of the most expensive phase: the closest neighbor search. In spite of decreasing its complexity, some of the variants tend to have a negative impact on the final registration precision or the convergence domain thus limiting the possible application scenarios. The goal of this work is the improvement of the algorithm’s computational cost so that a wider range of computationally demanding problems from among the ones described before can be addressed. For that purpose, an experimental and mathematical convergence analysis and validation of point-to-point distance metrics has been performed taking into account those distances with lower computational cost than the Euclidean one, which is used as the de facto standard for the algorithm’s implementations in the literature. In that analysis, the functioning of the algorithm in diverse topological spaces, characterized by different metrics, has been studied to check the convergence, efficacy and cost of the method in order to determine the one which offers the best results. Given that the distance calculation represents a significant part of the whole set of computations performed by the algorithm, it is expected that any reduction of that operation affects significantly and positively the overall performance of the method. As a result, a performance improvement has been achieved by the application of those reduced cost metrics whose quality in terms of convergence and error has been analyzed and validated experimentally as comparable with respect to the Euclidean distance using a heterogeneous set of objects, scenarios and initial situations.
Resumo:
In this work, a modified version of the elastic bunch graph matching (EBGM) algorithm for face recognition is introduced. First, faces are detected by using a fuzzy skin detector based on the RGB color space. Then, the fiducial points for the facial graph are extracted automatically by adjusting a grid of points to the result of an edge detector. After that, the position of the nodes, their relation with their neighbors and their Gabor jets are calculated in order to obtain the feature vector defining each face. A self-organizing map (SOM) framework is shown afterwards. Thus, the calculation of the winning neuron and the recognition process are performed by using a similarity function that takes into account both the geometric and texture information of the facial graph. The set of experiments carried out for our SOM-EBGM method shows the accuracy of our proposal when compared with other state-of the-art methods.
Resumo:
This paper gives a review of recent progress in the design of numerical methods for computing the trajectories (sample paths) of solutions to stochastic differential equations. We give a brief survey of the area focusing on a number of application areas where approximations to strong solutions are important, with a particular focus on computational biology applications, and give the necessary analytical tools for understanding some of the important concepts associated with stochastic processes. We present the stochastic Taylor series expansion as the fundamental mechanism for constructing effective numerical methods, give general results that relate local and global order of convergence and mention the Magnus expansion as a mechanism for designing methods that preserve the underlying structure of the problem. We also present various classes of explicit and implicit methods for strong solutions, based on the underlying structure of the problem. Finally, we discuss implementation issues relating to maintaining the Brownian path, efficient simulation of stochastic integrals and variable-step-size implementations based on various types of control.
Resumo:
MULTIPRED is a web-based computational system for the prediction of peptide binding to multiple molecules ( proteins) belonging to human leukocyte antigens (HLA) class I A2, A3 and class II DR supertypes. It uses hidden Markov models and artificial neural network methods as predictive engines. A novel data representation method enables MULTIPRED to predict peptides that promiscuously bind multiple HLA alleles within one HLA supertype. Extensive testing was performed for validation of the prediction models. Testing results show that MULTIPRED is both sensitive and specific and it has good predictive ability ( area under the receiver operating characteristic curve A(ROC) > 0.80). MULTIPRED can be used for the mapping of promiscuous T-cell epitopes as well as the regions of high concentration of these targets termed T-cell epitope hotspots. MULTIPRED is available at http:// antigen.i2r.a-star.edu.sg/ multipred/.
Resumo:
Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between different employment states or occupations (for example). In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed form solution and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data and are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects model in terms of accuracy, efficiency and computing time. The computational time will have significant implications as to the preferred approach by researchers. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modeling employment status of women using a GLMM, over three waves of the HILDA survey.
Resumo:
Biologists are increasingly conscious of the critical role that noise plays in cellular functions such as genetic regulation, often in connection with fluctuations in small numbers of key regulatory molecules. This has inspired the development of models that capture this fundamentally discrete and stochastic nature of cellular biology - most notably the Gillespie stochastic simulation algorithm (SSA). The SSA simulates a temporally homogeneous, discrete-state, continuous-time Markov process, and of course the corresponding probabilities and numbers of each molecular species must all remain positive. While accurately serving this purpose, the SSA can be computationally inefficient due to very small time stepping so faster approximations such as the Poisson and Binomial τ-leap methods have been suggested. This work places these leap methods in the context of numerical methods for the solution of stochastic differential equations (SDEs) driven by Poisson noise. This allows analogues of Euler-Maruyuma, Milstein and even higher order methods to be developed through the Itô-Taylor expansions as well as similar derivative-free Runge-Kutta approaches. Numerical results demonstrate that these novel methods compare favourably with existing techniques for simulating biochemical reactions by more accurately capturing crucial properties such as the mean and variance than existing methods.
Resumo:
Background: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. Results: In order to perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE with a bias towards the subcellular localization underrepresented in SwissProt was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. Conclusion: No individual method had a sufficient level of sensitivity across both evaluation sets that would enable reliable application to hypothetical proteins. All methods showed lower performance on the LOCATE dataset and variable performance on individual subcellular localizations was observed. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
Resumo:
A major problem in modern probabilistic modeling is the huge computational complexity involved in typical calculations with multivariate probability distributions when the number of random variables is large. Because exact computations are infeasible in such cases and Monte Carlo sampling techniques may reach their limits, there is a need for methods that allow for efficient approximate computations. One of the simplest approximations is based on the mean field method, which has a long history in statistical physics. The method is widely used, particularly in the growing field of graphical models. Researchers from disciplines such as statistical physics, computer science, and mathematical statistics are studying ways to improve this and related methods and are exploring novel application areas. Leading approaches include the variational approach, which goes beyond factorizable distributions to achieve systematic improvements; the TAP (Thouless-Anderson-Palmer) approach, which incorporates correlations by including effective reaction terms in the mean field theory; and the more general methods of graphical models. Bringing together ideas and techniques from these diverse disciplines, this book covers the theoretical foundations of advanced mean field methods, explores the relation between the different approaches, examines the quality of the approximation obtained, and demonstrates their application to various areas of probabilistic modeling.