912 resultados para sequencing error
Resumo:
Contrast enhancement is an image processing technique where the objective is to preprocess the image so that relevant information can be either seen or further processed more reliably. These techniques are typically applied when the image itself or the device used for image reproduction provides poor visibility and distinguishability of different regions of interest inthe image. In most studies, the emphasis is on the visualization of image data,but this human observer biased goal often results to images which are not optimal for automated processing. The main contribution of this study is to express the contrast enhancement as a mapping from N-channel image data to 1-channel gray-level image, and to devise a projection method which results to an image with minimal error to the correct contrast image. The projection, the minimum-error contrast image, possess the optimal contrast between the regions of interest in the image. The method is based on estimation of the probability density distributions of the region values, and it employs Bayesian inference to establish the minimum error projection.
Resumo:
The market place of the twenty-first century will demand that manufacturing assumes a crucial role in a new competitive field. Two potential resources in the area of manufacturing are advanced manufacturing technology (AMT) and empowered employees. Surveys in Finland have shown the need to invest in the new AMT in the Finnish sheet metal industry in the 1990's. In this run the focus has been on hard technology and less attention is paid to the utilization of human resources. In manymanufacturing companies an appreciable portion of the profit within reach is wasted due to poor quality of planning and workmanship. The production flow production error distribution of the sheet metal part based constructions is inspectedin this thesis. The objective of the thesis is to analyze the origins of production errors in the production flow of sheet metal based constructions. Also the employee empowerment is investigated in theory and the meaning of the employee empowerment in reducing the overall production error amount is discussed in this thesis. This study is most relevant to the sheet metal part fabricating industrywhich produces sheet metal part based constructions for electronics and telecommunication industry. This study concentrates on the manufacturing function of a company and is based on a field study carried out in five Finnish case factories. In each studied case factory the most delicate work phases for production errors were detected. It can be assumed that most of the production errors are caused in manually operated work phases and in mass production work phases. However, no common theme in collected production error data for production error distribution in the production flow can be found. Most important finding was still that most of the production errors in each case factory studied belong to the 'human activity based errors-category'. This result indicates that most of the problemsin the production flow are related to employees or work organization. Development activities must therefore be focused to the development of employee skills orto the development of work organization. Employee empowerment gives the right tools and methods to achieve this.
Resumo:
To permit the tracking of turbulent flow structures in an Eulerian frame from single-point measurements, we make use of a generalization of conventional two-dimensional quadrant analysis to three-dimensional octants. We characterize flow structures using the sequences of these octants and show how significance may be attached to particular sequences using statistical mull models. We analyze an example experiment and show how a particular dominant flow structure can be identified from the conditional probability of octant sequences. The frequency of this structure corresponds to the dominant peak in the velocity spectra and exerts a high proportion of the total shear stress. We link this structure explicitly to the propensity for sediment entrainment and show that greater insight into sediment entrainment can be obtained by disaggregating those octants that occur within the identified macroturbulence structure from those that do not. Hence, this work goes beyond critiques of Reynolds stress approaches to bed load entrainment that highlight the importance of outward interactions, to identifying and prioritizing the quadrants/octants that define particular flow structures. Key Points <list list-type=''bulleted'' id=''jgrf20196-list-0001''> <list-item id=''jgrf20196-li-0001''>A new method for analysing single point velocity data is presented <list-item id=''jgrf20196-li-0002''>Flow structures are identified by a sequence of flow states (termed octants) <list-item id=''jgrf20196-li-0003''>The identified structure exerts high stresses and causes bed-load entrainment
Resumo:
Hydrograph convolution is a product of tributary inputs from across the watershed. The time-space distribution of precipitation, the biophysical processes that control the conversion of precipitation to runoff and channel flow conveyance processes, are heterogeneous and different areas respond to rainfall in different ways. We take a subwatershed approach to this and account for tributary flow magnitude, relative timing, and sequencing. We hypothesize that as the scale of the watershed increases so we may start to see systematic differences in subwatershed hydrological response. We test this hypothesis for a large flood (T >100 years) in a large watershed in northern England. We undertake a sensitivity analysis of the effects of changing subwatershed hydrological response using a hydraulic model. Delaying upstream tributary peak flow timing to make them asynchronous from downstream subwatersheds reduced flood magnitude. However, significant hydrograph adjustment in any one subwatershed was needed for meaningful reductions in stage downstream, although smaller adjustments in multiple tributaries resulted in comparable impacts. For larger hydrograph adjustments, the effect of changing the timing of two tributaries together was lower than the effect of changing each one separately. For smaller adjustments synergy between two subwatersheds meant the effect of changing them together could be greater than the sum of the parts. Thus, this work shows that while the effects of modifying biophysical catchment properties diminishes with scale due to dilution effects, their impact on relative timing of tributaries may, if applied in the right locations, be an important element of flood management.
Resumo:
The past decade has seen the emergence of next-generation sequencing (NGS) technologies, which have revolutionized the field of human molecular genetics. With NGS, significant portions of the human genome can now be assessed by direct sequence analysis, highlighting normal and pathological variants of our DNA. Recent advances have also allowed the sequencing of complete genomes, by a method referred to as whole genome sequencing (WGS). In this work, we review the use of WGS in medical genetics, with specific emphasis on the benefits and the disadvantages of this technique for detecting genomic alterations leading to Mendelian human diseases and to cancer.
Resumo:
Location information is becoming increasingly necessary as every new smartphone incorporates a GPS (Global Positioning System) which allows the development of various applications based on it. However, it is not possible to properly receive the GPS signal in indoor environments. For this reason, new indoor positioning systems are being developed. As indoors is a very challenging scenario, it is necessary to study the precision of the obtained location information in order to determine if these new positioning techniques are suitable for indoor positioning.
Resumo:
Approximate models (proxies) can be employed to reduce the computational costs of estimating uncertainty. The price to pay is that the approximations introduced by the proxy model can lead to a biased estimation. To avoid this problem and ensure a reliable uncertainty quantification, we propose to combine functional data analysis and machine learning to build error models that allow us to obtain an accurate prediction of the exact response without solving the exact model for all realizations. We build the relationship between proxy and exact model on a learning set of geostatistical realizations for which both exact and approximate solvers are run. Functional principal components analysis (FPCA) is used to investigate the variability in the two sets of curves and reduce the dimensionality of the problem while maximizing the retained information. Once obtained, the error model can be used to predict the exact response of any realization on the basis of the sole proxy response. This methodology is purpose-oriented as the error model is constructed directly for the quantity of interest, rather than for the state of the system. Also, the dimensionality reduction performed by FPCA allows a diagnostic of the quality of the error model to assess the informativeness of the learning set and the fidelity of the proxy to the exact model. The possibility of obtaining a prediction of the exact response for any newly generated realization suggests that the methodology can be effectively used beyond the context of uncertainty quantification, in particular for Bayesian inference and optimization.
Resumo:
Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1092 1000G samples and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates and discuss the outcomes of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using of NGS data at other genomic regions of high diversity.
Resumo:
Mitochondrial DNA (mtDNA), a maternally inherited 16.6-Kb molecule crucial for energy production, is implicated in numerous human traits and disorders. It has been hypothesized that the presence of mutations in the mtDNA may contribute to the complex genetic basis of schizophreniadisease, due to the evidence of maternal inheritance and the presence of schizophrenia symptoms in patients affected of a mitochondrial disorder related to a mtDNA mutation. The present project aims to study the association of variants of mitochondrial DNA (mtDNA), and an increased risk of schizophrenia in a cohort of patients and controls from the same population. The entire mtDNA of 55 schizophrenia patients with an apparent maternal transmission of the disease and 38 controls was sequenced by Next Generation Sequencing (Ion Torrent PGM, Life Technologies) and compared to the reference sequence. The current method for establishing mtDNA haplotypes is Sanger sequencing, which is laborious, timeconsuming, and expensive. With the emergence of Next Generation Sequencing technologies, this sequencing process can be much more quickly and cost-efficiently. We have identified 14 variants that have not been previously reported. Two of them were missense variants: MTATP6 p.V113M and MTND5 p.F334L ,and also three variants encoding rRNA and one variant encoding tRNA. Not significant differences have been found in the number of variants between the two groups. We found that the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of the bioinformatics analysis and annotation step would be desirable to facilitate the application of NGS in mtDNA analysis.
Resumo:
This thesis studies evaluation of software development practices through an error analysis. The work presents software development process, software testing, software errors, error classification and software process improvement methods. The practical part of the work presents results from the error analysis of one software process. It also gives improvement ideas for the project. It was noticed that the classification of the error data was inadequate in the project. Because of this it was impossible to use the error data effectively. With the error analysis we were able to show that there were deficiencies in design and analyzing phases, implementation phase and in testing phase. The work gives ideas for improving error classification and for software development practices.
Resumo:
En el sector suroriental de la Cuenca del Ebro, la inclinación paleomagnética obtenida en las sucesiones aluviales oligocenas es considerablemente menor que la esperable, si se considera la paleolatitud de referencia calculada para esa región durante el Oligoceno. Este error de inclinación puede deberse a diversos factores, como el control hidrodinámica de las partículas magnéticas en el medio deposicional, la compactación diferencial del sedimento durante el enterramiento, o bien a la deformación tectónica. Este trabajo se ha centrado en su estudio en dos sucesiones dominantemente aluviales, donde previamente se había establecido su magnetoestratigrafia. Las litofacies aluviales y lacustres estudiadas se han agrupado en cinco grupos: areniscas grises, areniscas rojas y versicolores, limos rojos, lutitas rojas y calizas. Se ha demostrado la existencia de una correlación entre la abundancia de filosilicatos y el error de inclinación. De esta manera, las litofacies con un bajo porcentaje de filosilicatos (calizas y areniscas grises) presentan errores de unos 5', estadisticarnente no significativos, con respecto a la inclinación de referencia. Por el contrario, en materiales con un porcentaje más elevado de filosilicatos (limos y arcillas) el error puede llegar a los 25'. Este hecho no tiene repercusión en la interpretación de las polaridades magnéticas, pero si en las reconstmcciones palinspásticas y paleogeográficas basadas en los cálculos de paleolatitudes a partir de las paleoinclinaciones. Los resultados obtenidos demuestran la necesidad de cautela en la propuesta de conclusiones basadas exclusivamente en este tipo de información.
Resumo:
Notre consommation en eau souterraine, en particulier comme eau potable ou pour l'irrigation, a considérablement augmenté au cours des années. De nombreux problèmes font alors leur apparition, allant de la prospection de nouvelles ressources à la remédiation des aquifères pollués. Indépendamment du problème hydrogéologique considéré, le principal défi reste la caractérisation des propriétés du sous-sol. Une approche stochastique est alors nécessaire afin de représenter cette incertitude en considérant de multiples scénarios géologiques et en générant un grand nombre de réalisations géostatistiques. Nous rencontrons alors la principale limitation de ces approches qui est le coût de calcul dû à la simulation des processus d'écoulements complexes pour chacune de ces réalisations. Dans la première partie de la thèse, ce problème est investigué dans le contexte de propagation de l'incertitude, oú un ensemble de réalisations est identifié comme représentant les propriétés du sous-sol. Afin de propager cette incertitude à la quantité d'intérêt tout en limitant le coût de calcul, les méthodes actuelles font appel à des modèles d'écoulement approximés. Cela permet l'identification d'un sous-ensemble de réalisations représentant la variabilité de l'ensemble initial. Le modèle complexe d'écoulement est alors évalué uniquement pour ce sousensemble, et, sur la base de ces réponses complexes, l'inférence est faite. Notre objectif est d'améliorer la performance de cette approche en utilisant toute l'information à disposition. Pour cela, le sous-ensemble de réponses approximées et exactes est utilisé afin de construire un modèle d'erreur, qui sert ensuite à corriger le reste des réponses approximées et prédire la réponse du modèle complexe. Cette méthode permet de maximiser l'utilisation de l'information à disposition sans augmentation perceptible du temps de calcul. La propagation de l'incertitude est alors plus précise et plus robuste. La stratégie explorée dans le premier chapitre consiste à apprendre d'un sous-ensemble de réalisations la relation entre les modèles d'écoulement approximé et complexe. Dans la seconde partie de la thèse, cette méthodologie est formalisée mathématiquement en introduisant un modèle de régression entre les réponses fonctionnelles. Comme ce problème est mal posé, il est nécessaire d'en réduire la dimensionnalité. Dans cette optique, l'innovation du travail présenté provient de l'utilisation de l'analyse en composantes principales fonctionnelles (ACPF), qui non seulement effectue la réduction de dimensionnalités tout en maximisant l'information retenue, mais permet aussi de diagnostiquer la qualité du modèle d'erreur dans cet espace fonctionnel. La méthodologie proposée est appliquée à un problème de pollution par une phase liquide nonaqueuse et les résultats obtenus montrent que le modèle d'erreur permet une forte réduction du temps de calcul tout en estimant correctement l'incertitude. De plus, pour chaque réponse approximée, une prédiction de la réponse complexe est fournie par le modèle d'erreur. Le concept de modèle d'erreur fonctionnel est donc pertinent pour la propagation de l'incertitude, mais aussi pour les problèmes d'inférence bayésienne. Les méthodes de Monte Carlo par chaîne de Markov (MCMC) sont les algorithmes les plus communément utilisés afin de générer des réalisations géostatistiques en accord avec les observations. Cependant, ces méthodes souffrent d'un taux d'acceptation très bas pour les problèmes de grande dimensionnalité, résultant en un grand nombre de simulations d'écoulement gaspillées. Une approche en deux temps, le "MCMC en deux étapes", a été introduite afin d'éviter les simulations du modèle complexe inutiles par une évaluation préliminaire de la réalisation. Dans la troisième partie de la thèse, le modèle d'écoulement approximé couplé à un modèle d'erreur sert d'évaluation préliminaire pour le "MCMC en deux étapes". Nous démontrons une augmentation du taux d'acceptation par un facteur de 1.5 à 3 en comparaison avec une implémentation classique de MCMC. Une question reste sans réponse : comment choisir la taille de l'ensemble d'entrainement et comment identifier les réalisations permettant d'optimiser la construction du modèle d'erreur. Cela requiert une stratégie itérative afin que, à chaque nouvelle simulation d'écoulement, le modèle d'erreur soit amélioré en incorporant les nouvelles informations. Ceci est développé dans la quatrième partie de la thèse, oú cette méthodologie est appliquée à un problème d'intrusion saline dans un aquifère côtier. -- Our consumption of groundwater, in particular as drinking water and for irrigation, has considerably increased over the years and groundwater is becoming an increasingly scarce and endangered resource. Nofadays, we are facing many problems ranging from water prospection to sustainable management and remediation of polluted aquifers. Independently of the hydrogeological problem, the main challenge remains dealing with the incomplete knofledge of the underground properties. Stochastic approaches have been developed to represent this uncertainty by considering multiple geological scenarios and generating a large number of realizations. The main limitation of this approach is the computational cost associated with performing complex of simulations in each realization. In the first part of the thesis, we explore this issue in the context of uncertainty propagation, where an ensemble of geostatistical realizations is identified as representative of the subsurface uncertainty. To propagate this lack of knofledge to the quantity of interest (e.g., the concentration of pollutant in extracted water), it is necessary to evaluate the of response of each realization. Due to computational constraints, state-of-the-art methods make use of approximate of simulation, to identify a subset of realizations that represents the variability of the ensemble. The complex and computationally heavy of model is then run for this subset based on which inference is made. Our objective is to increase the performance of this approach by using all of the available information and not solely the subset of exact responses. Two error models are proposed to correct the approximate responses follofing a machine learning approach. For the subset identified by a classical approach (here the distance kernel method) both the approximate and the exact responses are knofn. This information is used to construct an error model and correct the ensemble of approximate responses to predict the "expected" responses of the exact model. The proposed methodology makes use of all the available information without perceptible additional computational costs and leads to an increase in accuracy and robustness of the uncertainty propagation. The strategy explored in the first chapter consists in learning from a subset of realizations the relationship between proxy and exact curves. In the second part of this thesis, the strategy is formalized in a rigorous mathematical framework by defining a regression model between functions. As this problem is ill-posed, it is necessary to reduce its dimensionality. The novelty of the work comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also allofs a diagnostic of the quality of the error model in the functional space. The proposed methodology is applied to a pollution problem by a non-aqueous phase-liquid. The error model allofs a strong reduction of the computational cost while providing a good estimate of the uncertainty. The individual correction of the proxy response by the error model leads to an excellent prediction of the exact response, opening the door to many applications. The concept of functional error model is useful not only in the context of uncertainty propagation, but also, and maybe even more so, to perform Bayesian inference. Monte Carlo Markov Chain (MCMC) algorithms are the most common choice to ensure that the generated realizations are sampled in accordance with the observations. Hofever, this approach suffers from lof acceptance rate in high dimensional problems, resulting in a large number of wasted of simulations. This led to the introduction of two-stage MCMC, where the computational cost is decreased by avoiding unnecessary simulation of the exact of thanks to a preliminary evaluation of the proposal. In the third part of the thesis, a proxy is coupled to an error model to provide an approximate response for the two-stage MCMC set-up. We demonstrate an increase in acceptance rate by a factor three with respect to one-stage MCMC results. An open question remains: hof do we choose the size of the learning set and identify the realizations to optimize the construction of the error model. This requires devising an iterative strategy to construct the error model, such that, as new of simulations are performed, the error model is iteratively improved by incorporating the new information. This is discussed in the fourth part of the thesis, in which we apply this methodology to a problem of saline intrusion in a coastal aquifer.
Resumo:
Most fishes produce free-living embryos that are exposed to environmental stressors immediately following fertilization, including pathogenic microorganisms. Initial immune protection of embryos involves the chorion, as a protective barrier, and maternally-allocated antimicrobial compounds. At later developmental stages, host-genetic effects influence susceptibility and tolerance, suggesting a direct interaction between embryo genes and pathogens. So far, only a few host genes could be identified that correlate with embryonic survival under pathogen stress in salmonids. Here, we utilized high-throughput RNA-sequencing in order to describe the transcriptional response of a non-model fish, the Alpine whitefish Coregonus palaea, to infection, both in terms of host genes that are likely manipulated by the pathogen, and those involved in an early putative immune response. Embryos were produced in vitro, raised individually, and exposed at the late-eyed stage to a virulent strain of the opportunistic fish pathogen Pseudomonas fluorescens. The pseudomonad increased embryonic mortality and affected gene expression substantially. For example, essential, upregulated metabolic pathways in embryos under pathogen stress included ion binding pathways, aminoacyl-tRNA-biosynthesis, and the production of arginine and proline, most probably mediated by the pathogen for its proliferation. Most prominently downregulated transcripts comprised the biosynthesis of unsaturated fatty acids, the citrate cycle, and various isoforms of b-cell transcription factors. These factors have been shown to play a significant role in host blood cell differentiation and renewal. With regard to specific immune functions, differentially expressed transcripts mapped to the complement cascade, MHC class I and II, TNF-alpha, and T-cell differentiation proteins. The results of this study reveal insights into how P. fluorescens impairs the development of whitefish embryos and set a foundation for future studies investigating host pathogen interactions in fish embryos.