937 results for Markov chains, uniformization, inexact methods, relaxed matrix-vector
Abstract:
This dissertation established a state-of-the-art programming tool for designing and training artificial neural networks (ANNs) and showed its applicability to brain research. The developed tool, called NeuralStudio, allows users without programming skills to conduct ANN-based studies through a powerful and user-friendly interface. A series of unique features has been implemented in NeuralStudio, such as ROC analysis, cross-validation, network averaging, topology optimization, and optimization of the activation function’s slopes. It also includes a Support Vector Machines module for comparison purposes. Once the tool was fully developed, it was applied to two studies in brain research. In the first study, the goal was to create and train an ANN to detect epileptic seizures from subdural EEG. This analysis involved extracting features from the spectral power in the gamma frequencies. In the second application, a unique method was devised to link EEG recordings to epileptic and non-epileptic subjects. The contribution of this method consisted of developing a descriptor matrix that can be used to represent any EEG file regarding its duration and the number of electrodes. The first study showed that the inter-electrode mean of the spectral power in the gamma frequencies and its duration above a specific threshold perform better than the other frequencies in seizure detection, exhibiting an accuracy of 95.90%, a sensitivity of 92.59%, and a specificity of 96.84%. The second study showed that Hjorth’s activity parameter is sufficient to accurately relate EEG to epileptic and non-epileptic subjects. After testing, the accuracy, sensitivity, and specificity of the classifier were all above 0.9667. Statistical tests measured the superiority of activity at over 99.99% certainty.
It was demonstrated that 1) the spectral power in the gamma frequencies is highly effective in locating seizures from EEG and 2) activity can be used to link EEG recordings to epileptic and non-epileptic subjects. These two studies required high computational load and could be addressed thanks to NeuralStudio. From a medical perspective, both methods proved the merits of NeuralStudio in brain research applications. For its outstanding features, NeuralStudio has been recently awarded a patent (US patent No. 7502763).
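The Hjorth activity mentioned above is, by its standard definition, the variance of the signal; mobility and complexity are the two companion parameters. The following is a minimal sketch of those textbook definitions, not the NeuralStudio implementation: the function name `hjorth_parameters` is hypothetical, and first differences are assumed as a stand-in for the time derivative.

```python
from math import sqrt
from statistics import pvariance


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for one EEG channel.

    Assumes `signal` is a sequence of at least three samples and uses
    first differences as a discrete approximation of the derivative.
    """
    if len(signal) < 3:
        raise ValueError("need at least three samples")
    d1 = [b - a for a, b in zip(signal, signal[1:])]   # first difference
    d2 = [b - a for a, b in zip(d1, d1[1:])]           # second difference
    activity = pvariance(signal)                       # variance of the signal
    var_d1 = pvariance(d1)
    if activity == 0 or var_d1 == 0:
        raise ValueError("parameters are undefined for a constant or linear signal")
    mobility = sqrt(var_d1 / activity)
    # Complexity is the mobility of the derivative divided by the mobility of the signal.
    complexity = sqrt(pvariance(d2) / var_d1) / mobility
    return activity, mobility, complexity
```

Applying the function to each electrode in turn would give one activity value per channel, which is one plausible way to populate a per-electrode descriptor.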
Abstract:
Service supply chains (SSCs) have attracted growing attention from academia and industry. Although extensive product-based supply chain management models and methods exist, they are not applicable to the SSC because of the differences between services and products. Moreover, the existing supply chain management models and methods share some common deficiencies. For these reasons, this paper develops a novel value-oriented model for the management of SSCs using the E3-value and Use Case Maps (UCMs) modeling methods. This model not only resolves the problems of applicability and effectiveness of the existing supply chain management models and methods, but also answers the questions ‘why does the management model take this form?’ and ‘how can the potential profitability of the supply chains be quantified?’. Meanwhile, the service business processes of the SSC system can be established using its logic procedure. In addition, the model can determine the value and benefit distribution of the entire service value chain and optimize the operations management performance of the service supply chain.
Abstract:
Background - Clostridium difficile is a bacterial healthcare-associated infection that may be transferred by houseflies (Musca domestica) due to their close ecological association with humans and cosmopolitan nature. Aim - To determine the ability of M. domestica to transfer C. difficile both mechanically and following ingestion. Methods - M. domestica were exposed to independent suspensions of vegetative cells and spores of C. difficile, then sampled on to selective agar plates immediately postexposure and at 1-h intervals to assess the mechanical transfer of C. difficile. Fly excreta were cultured and alimentary canals were dissected to determine internalization of cells and spores. Findings - M. domestica exposed to vegetative cell suspensions and spore suspensions of C. difficile were able to transfer the bacteria mechanically for up to 4 h upon subsequent contact with surfaces. The greatest numbers of colony-forming units (CFUs) per fly were transferred immediately following exposure (mean CFUs 123.8 ± 66.9 for vegetative cell suspension and 288.2 ± 83.2 for spore suspension). After 1 h, this had reduced (21.2 ± 11.4 for vegetative cell suspension and 19.9 ± 9 for spores). The mean number of C. difficile CFUs isolated from the M. domestica alimentary canal was 35 ± 6.5, and the mean number of C. difficile CFUs per faecal spot was 1.04 ± 0.58. C. difficile could be recovered from fly excreta for up to 96 h. Conclusion - This study describes the potential for M. domestica to contribute to environmental persistence and spread of C. difficile in hospitals, highlighting flies as realistic vectors of this micro-organism in clinical areas.
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true when analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, could be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent multivariate vector with a Gaussian distribution. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are assumed to follow a Dirichlet distribution. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.
Abstract:
A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A 3D object reconstructed from single-axis, limited-angle tilt-series cryo-electron tomography contains missing regions; this is known as the missing wedge problem. Because of the missing regions in the reconstructed images, analyzing their 3D structures is challenging. Existing methods overcome this problem by aligning and averaging several similarly shaped objects. These schemes work well if the objects are symmetric and several objects with nearly identical shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties of these bacterial inclusions, such as volume, symmetry, aspect ratio, and polyhedral structure, in the presence of missing data. These methods work with deformed, non-symmetric objects of varied shapes and do not require multiple objects to handle the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named the polyhedron profile statistic, and (e) classifiers based on the Bayes classifier, linear discriminant analysis, and support vector machines for supervised classification of incomplete polyhedral shapes. Finally, the predicted 3D shapes of these bacterial microstructures belong to the Johnson solids family, and these shapes, along with their other geometric properties, are important for a better understanding of their chemical and biological characteristics.
Abstract:
Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease in which the heart muscle is partially thickened and blood flow is obstructed, potentially fatally. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests because of the cost and time involved in having an expert cardiologist interpret the results of these tests. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia, as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients to two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature set. We address these by proposing imputation and feature-selection methods. We develop heartbeat classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM.
The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
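The record-level rule described above, which labels a full 12-lead ECG from the proportion of its heartbeats classified as HCM, can be sketched as follows. This is an illustrative sketch only: the function name `classify_record`, the label strings, and the 0.5 default threshold are assumptions, not values taken from the dissertation.

```python
def classify_record(beat_labels, threshold=0.5):
    """Label a whole ECG record from its per-heartbeat labels.

    `beat_labels` holds the output of any heartbeat classifier, one
    "HCM" or "non-HCM" string per beat. The threshold is a hypothetical
    tuning parameter.
    """
    if not beat_labels:
        raise ValueError("record contains no classified heartbeats")
    proportion = sum(1 for label in beat_labels if label == "HCM") / len(beat_labels)
    record_label = "HCM" if proportion >= threshold else "non-HCM"
    return record_label, proportion
```

In practice the threshold would be chosen on validation data, since it trades sensitivity against specificity at the patient level.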
Abstract:
Spectral unmixing (SU) is a technique for characterizing the mixed pixels of hyperspectral images measured by remote sensors. Most existing spectral unmixing algorithms are based on linear mixing models. Since the number of endmembers/materials present in each mixed pixel is normally small compared with the total number of endmembers (the dimension of the spectral library), the problem becomes sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model in two different scenarios. In the first scenario, the library of spectral signatures is assumed to be known, and the main problem is to find the minimum number of endmembers subject to a reasonably small approximation error. Mathematically, the corresponding problem is called the $\ell_0$-norm problem, which is NP-hard. The main study in the first part of the thesis is to find more accurate and reliable approximations of the $\ell_0$-norm term and to propose sparse unmixing methods based on such approximations. The resulting methods show considerable improvements over state-of-the-art methods in reconstructing the fractional abundances of endmembers, for example lower reconstruction errors. In the second part of the thesis, the first scenario (i.e., the dictionary-aided semiblind unmixing scheme) is generalized to the blind unmixing scenario, in which the library of spectral signatures is also estimated. We apply the nonnegative matrix factorization (NMF) method to propose new unmixing methods because of its notable advantages, such as accounting for the nonnegativity constraints on the two decomposed matrices. Furthermore, we introduce new cost functions based on statistical and physical features of the spectral signatures of materials (SSoM) and of hyperspectral pixels, such as the collaborative property of hyperspectral pixels and the mathematical representation of the concentrated energy of SSoM in the first few subbands.
Finally, we introduce sparse unmixing methods for the blind scenario and evaluate the efficiency of the proposed methods via simulations on synthetic and real hyperspectral data sets. The results show considerable improvements in estimating the spectral library of materials and their fractional abundances, reflected in smaller values of spectral angle distance (SAD) and abundance angle distance (AAD).
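The spectral angle distance used for evaluation is commonly defined as the angle between a reference spectral signature and its estimate; the abundance angle distance applies the same formula to abundance vectors. A minimal sketch of that common definition is given here, with the function name `spectral_angle_distance` chosen for illustration; the thesis may average the measure over endmembers or pixels in a way not shown.

```python
from math import acos, sqrt


def spectral_angle_distance(reference, estimate):
    """Angle in radians between two equal-length spectral vectors."""
    if len(reference) != len(estimate):
        raise ValueError("vectors must have the same length")
    dot = sum(r * e for r, e in zip(reference, estimate))
    norm_ref = sqrt(sum(r * r for r in reference))
    norm_est = sqrt(sum(e * e for e in estimate))
    if norm_ref == 0 or norm_est == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_ref * norm_est)))
    return acos(cosine)
```

Because the measure depends only on direction, a signature and any positively scaled copy of it have a distance of approximately zero, which makes the metric insensitive to overall brightness.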
Abstract:
Major food adulteration and contamination events occur with alarming regularity and are known to be episodic, with the question being not if but when another large-scale food safety/integrity incident will occur. Indeed, the challenges of maintaining food security are now internationally recognised. The ever-increasing scale and complexity of food supply networks can lead to them becoming significantly more vulnerable to fraud and contamination, and potentially dysfunctional. This can make the task of deciding which analytical methods are more suitable to collect and analyse (bio)chemical data within complex food supply chains, at targeted points of vulnerability, that much more challenging. It is evident that those working within and associated with the food industry are seeking rapid, user-friendly methods to detect food fraud and contamination, and rapid/high-throughput screening methods for the analysis of food in general. In addition to being robust and reproducible, these methods should be portable, ideally handheld and/or remote sensor devices that can be taken to, or positioned on/at-line at, points of vulnerability along complex food supply networks, and that require a minimum amount of background training to acquire information-rich data rapidly (ergo point-and-shoot). Here we briefly discuss a range of spectrometry- and spectroscopy-based approaches, many of which are commercially available, as well as other methods currently under development. We discuss a future perspective of how this range of detection methods in the growing sensor portfolio, along with developments in computational and information sciences such as predictive computing and the Internet of Things, will together form systems- and technology-based approaches that significantly reduce the areas of vulnerability to food crime within food supply chains. Food fraud is a problem of systems and therefore requires systems-level solutions and thinking.
Abstract:
In this work we explore the validity of employing a modified version of the nonrelativistic structure code civ3 for heavy, highly charged systems, using Na-like tungsten as a simple benchmark. Consequently, we present radiative and subsequent collisional atomic data compared with corresponding results from a fully relativistic structure and collisional model. Our motivation for this line of study is to benchmark civ3 against the relativistic grasp0 structure code. This is an important study as civ3 wave functions in nonrelativistic R-matrix calculations are computationally less expensive than their Dirac counterparts. There are very few existing data for the W LXIV ion in the literature with which we can compare except for an incomplete set of energy levels available from the NIST database. The overall accuracy of the present results is thus determined by the comparison between the civ3 and grasp0 structure codes alongside collisional atomic data computed by the R-matrix Breit-Pauli and Dirac codes. It is found that the electron-impact collision strengths and effective collision strengths computed by these differing methods are in good general agreement for the majority of the transitions considered, across a broad range of electron temperatures.
Abstract:
Mass spectrometry measures the mass of ions according to their mass-to-charge ratio. This technique is used in many fields and can analyze complex mixtures. Imaging mass spectrometry (IMS), a branch of mass spectrometry, enables the analysis of ions on a surface while preserving the spatial organization of the detected ions. To date, the samples most studied by IMS are plant or animal tissue sections. Among the molecules commonly analyzed by IMS, lipids have attracted considerable interest. Lipids are involved in disease and in the normal functioning of cells; they form the cell membrane and have several roles, such as regulating cellular events. Considering the involvement of lipids in biology and the ability of MALDI IMS to analyze them, we developed analytical strategies for sample handling and for the analysis of large lipid data sets. Lipid degradation is very important in the food industry. Likewise, the lipids in tissue sections are at risk of degrading. Their degradation products can therefore introduce artifacts into the IMS analysis, and the loss of lipid species can impair the accuracy of abundance measurements. Since oxidized lipids are also important mediators in the development of several diseases, their faithful preservation becomes critical. In multi-institutional studies, where samples are often transported from one site to another, adapted and validated protocols and measures of degradation are necessary.
Our main results are as follows: an increase over time of oxidized phospholipids and lysophospholipids under ambient conditions, a decrease in the presence of lipids with unsaturated fatty acids, and an inhibitory effect on these phenomena when sections are stored cold under N2. At ambient temperature and atmosphere, phospholipids are oxidized on a time scale typical of a normal IMS preparation (~30 minutes). Phospholipids are also broken down into lysophospholipids on a time scale of several days. Validating a sample handling method is all the more important when a larger number of samples is to be analyzed. Atherosclerosis is a cardiovascular disease induced by the accumulation of cellular material on the arterial wall. Since atherosclerosis is a three-dimensional (3D) phenomenon, serial 3D IMS is useful, on the one hand because it can localize molecules along the full length of an atheromatous plaque and, on the other hand, because it can identify molecular mechanisms of plaque development or rupture. Serial 3D IMS faces certain specific challenges, many of which relate simply to 3D reconstruction and to the interpretation of the molecular reconstruction in real time. With these objectives in mind, and using lipid IMS to study atherosclerotic plaques from a human carotid artery and from a mouse model of atherosclerosis, we developed open-source methods for the 3D reconstruction of IMS data. Our methodology provides a means of obtaining high-quality visualizations and demonstrates a strategy for the rapid interpretation of 3D IMS data through multivariate segmentation. The analysis of aortas from a mouse model was the starting point for method development because these are better-controlled samples.
By correlating data acquired in positive and negative ionization modes, 3D IMS demonstrated an accumulation of phospholipids in the aortic sinuses. In addition, AgLDI IMS revealed a differential localization of free fatty acids, cholesterol, cholesterol esters, and triglycerides. Multivariate segmentation of the lipid signals from the IMS analysis of a human carotid artery reveals a molecular histology correlated with the degree of stenosis of the artery. This research helps to better understand the biological complexity of atherosclerosis and may possibly predict the development of certain clinical cases. Colorectal cancer liver metastasis (CRCLM) is the metastatic disease of primary colorectal cancer, one of the most common cancers in the world. The assessment and prognosis of CRCLM tumors are performed by histopathology, with a margin of error. We used lipid IMS to identify the histological compartments of CRCLM and to extract their lipid signatures. By exploiting these molecular signatures, we were able to determine a quantitative and objective histopathological score that correlates with prognosis. Moreover, by dissecting the lipid signatures, we identified individual lipid species that discriminate between the different CRCLM histologies and that can potentially be used as biomarkers for determining the response to therapy. More specifically, we found a series of plasmalogens and sphingolipids that distinguish two different types of necrosis (infarct-like necrosis and usual necrosis, ILN and UN, respectively). ILN is associated with the response to chemotherapy treatments, whereas UN is associated with the normal functioning of the tumor.
Abstract:
Abstract not available
Abstract:
Ubiquitylation, the covalent attachment of ubiquitin (Ub) to a variety of substrate proteins in cells, is a versatile post-translational modification involved in the regulation of numerous cellular processes. The distinct messages that polyubiquitylation encodes are attributed to the multitude of conformations made possible by attaching ubiquitin monomers within a polyubiquitin chain via a specific lysine residue. The hypothesis is thus that linkage defines polyubiquitin conformation, which in turn determines specific recognition by cellular receptors. Ubiquitylation of membrane surface receptor proteins plays a very important role in regulating receptor-mediated endocytosis as well as endosomal sorting for lysosomal degradation. Epsin1 is an endocytic adaptor protein with three tandem UIMs (Ubiquitin Interacting Motifs, tUIMs), which are responsible for the highly specific interaction between epsin and ubiquitylated receptors. Epsin1 is also an oncogenic protein, and its expression is upregulated in some types of cancer. Recently it has been shown that novel K11 and K63 mixed-linkage polyubiquitin chains serve as an internalization signal for the MHC I (Major Histocompatibility Complex I) molecule through their association with the tUIMs of epsin1. However, the molecular mode of action and the structural details of the interaction between polyubiquitin chains on receptors and the tUIMs of epsin1 are yet to be determined. This information is crucial for the development of anticancer therapeutics targeting epsin1. The molecular basis for the linkage-specific recognition of K11 and K63 mixed-linkage polyubiquitin chains by the tandem UIMs of the endocytic adaptor protein epsin1 is investigated using a combination of NMR methods.
Abstract:
One challenge in data assimilation (DA) methods is how the error covariance for the model state is computed. Ensemble methods have been proposed for producing error covariance estimates, as the error is propagated in time using the non-linear model. Variational methods, on the other hand, use the concepts of control theory, whereby the state estimate is optimized from both the background and the measurements. Numerical optimization schemes are applied which solve the problems of memory storage and huge matrix inversion that arise in classical Kalman filter methods. The Variational Ensemble Kalman filter (VEnKF), a method inspired by the Variational Kalman Filter (VKF), enjoys the benefits of both ensemble methods and variational methods. It avoids filter inbreeding problems, which emerge when the ensemble spread underestimates the true error covariance. In VEnKF this is tackled by resampling the ensemble every time measurements are available. One advantage of VEnKF over VKF is that it needs neither tangent linear code nor adjoint code. In this thesis, VEnKF has been applied to a two-dimensional shallow water model simulating a dam-break experiment. The model is a public code, with water height measurements recorded at seven stations along the mid-line of the 21.2 m long, 1.4 m wide flume. Because the data were too sparse to assimilate the 30 171-element model state vector, we chose to interpolate the data both in time and in space. The results of the assimilation were compared with those of a pure simulation. We found that the results produced by the VEnKF were more realistic, without the numerical artifacts present in the pure simulation. Creating a wrapper code for a model and a DA scheme can be challenging, especially when the two were designed independently or are poorly documented. In this thesis we present a non-intrusive approach to coupling the model and a DA scheme. An external program is used to send and receive information between the model and the DA procedure using files.
The advantage of this method is that the model code changes needed are minimal, only a few lines which facilitate input and output. Apart from being simple to couple, the approach can be employed even if the two were written in different programming languages, because the communication is not through code. The non-intrusive approach accommodates parallel computing by simply telling the control program to wait until all the processes have ended before the DA procedure is invoked. It is worth mentioning the overhead increase caused by the approach, as at every assimilation cycle both the model and the DA procedure have to be initialized. Nonetheless, the method can be an ideal approach for a benchmark platform for testing DA methods. The non-intrusive VEnKF has been applied to the multi-purpose hydrodynamic model COHERENS to assimilate Total Suspended Matter (TSM) in lake Säkylän Pyhäjärvi. The lake has an area of 154 km² with an average depth of 5.4 m. Turbidity and chlorophyll-a concentrations from MERIS satellite images for 7 days between May 16 and July 6, 2009 were available. The effect of the organic matter has been computationally eliminated to obtain TSM data. Because of the computational demands of both COHERENS and VEnKF, we chose to use a 1 km grid resolution. The results of the VEnKF have been compared with the measurements recorded at an automatic station located in the north-western part of the lake. However, due to TSM data sparsity in both time and space, the two could not be well matched. The use of multiple automatic stations with real-time data is important to overcome the time sparsity problem. Combined with DA, this will help, for instance, in better understanding environmental hazard variables. We found that using a very large ensemble size does not necessarily improve the results, because there is a limit beyond which additional ensemble members add very little to the performance.
The successful implementation of the non-intrusive VEnKF and the observed limit on useful ensemble size point toward the emerging area of Reduced Order Modeling (ROM). To save computational resources, ROM avoids running the full-blown model. When ROM is combined with the non-intrusive DA approach, it might yield a cheaper algorithm that eases the computational challenges in the field of modelling and DA.
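The file-based, non-intrusive coupling described in this abstract can be illustrated with a toy control loop. This is a sketch under loud assumptions and not the thesis code: `run_model` and `run_assimilation` are hypothetical stand-ins written as Python functions, the JSON file layout is invented for illustration, and the update is a simple nudging step rather than VEnKF. In a real setup each stand-in would be an external executable, and the control program would wait for all model processes to exit before invoking the DA procedure.

```python
import json
import tempfile
from pathlib import Path


def run_model(state_file: Path, forecast_file: Path) -> None:
    """Hypothetical stand-in for the external model: read state, write forecast."""
    state = json.loads(state_file.read_text())
    forecast = [x + 1.0 for x in state]          # toy dynamics, not a real model
    forecast_file.write_text(json.dumps(forecast))


def run_assimilation(forecast_file: Path, obs, analysis_file: Path, gain=0.5) -> None:
    """Hypothetical stand-in for the DA step: nudge the forecast toward the data."""
    forecast = json.loads(forecast_file.read_text())
    analysis = [f + gain * (o - f) for f, o in zip(forecast, obs)]
    analysis_file.write_text(json.dumps(analysis))


def control_loop(initial_state, observations, workdir: Path):
    """Alternate model and DA steps, exchanging information only through files."""
    state_file = workdir / "state.json"
    forecast_file = workdir / "forecast.json"
    state_file.write_text(json.dumps(initial_state))
    for obs in observations:
        run_model(state_file, forecast_file)               # forecast step
        run_assimilation(forecast_file, obs, state_file)   # analysis overwrites state
    return json.loads(state_file.read_text())
```

Because the two components only read and write files, either one could be replaced by a program in another language without changing the control loop, at the price of re-initializing both on every cycle.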
Abstract:
The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand the biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from the immense amounts of genomic data is a challenging task. Therefore, effective and efficient computational techniques are needed to analyze and interpret genomic data. In this thesis, novel computational methods are proposed to address such challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework integrates the Bayesian network and the Gaussian mixture model. Based on the proposed framework, in conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarray data in which biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions over the network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type-specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering provide evidence of consistency with brain anatomical structure.