917 resultados para Statistical Robustness
Resumo:
We describe a series of experiments in which we start with English to French and English to Japanese versions of an Open Source rule-based speech translation system for a medical domain, and bootstrap correspondign statistical systems. Comparative evaluation reveals that the rule-based systems are still significantly better than the statistical ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; also, a hybrid system only marginally improved recall at the cost of a los in precision. The result suggests that rule-based architectures may still be preferable to statistical ones for safety-critical speech translation tasks.
Resumo:
Abstract One of the most important issues in molecular biology is to understand regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins, called transcription factors which bind to short (5 to 20 base pairs),degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors is laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcriptionfactor binding site analysis in addressing quantitative modelling, where probabilitic models are built to represent binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database(HTPSELEX) was created, holding high quality binding sequences for two eukaryotic families of transcription factors namely CTF/NF1 and LEFT/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, that allows generation of large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiments we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols.The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation(EM) algorithm is one the frequently used methods to estimate probabilistic models to represent the sequence specificity of transcription factors. We present computer simulations in order to estimate the precision of EM estimated models as a function of data set parameters(like length of initial sequences, number of initial sequences, percentage of nonbinding sequences). We observed a remarkable robustness of the EM algorithm with regard to length of training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, where a statistical framework called hidden Markov model has been developed to represent sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experiment data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities were in good correlation.
Resumo:
A general criterion for the design of adaptive systemsin digital communications called the statistical reference criterionis proposed. The criterion is based on imposition of the probabilitydensity function of the signal of interest at the outputof the adaptive system, with its application to the scenario ofhighly powerful interferers being the main focus of this paper.The knowledge of the pdf of the wanted signal is used as adiscriminator between signals so that interferers with differingdistributions are rejected by the algorithm. Its performance isstudied over a range of scenarios. Equations for gradient-basedcoefficient updates are derived, and the relationship with otherexisting algorithms like the minimum variance and the Wienercriterion are examined.
Resumo:
The general objective of this study was to conduct astatistical analysis on the variation of the weld profiles and their influence on the fatigue strength of the joint. Weld quality with respect to its fatigue strength is of importance which is the main concept behind this thesis. The intention of this study was to establish the influence of weld geometric parameters on the weld quality and fatigue strength. The effect of local geometrical variations of non-load carrying cruciform fillet welded joint under tensile loading wasstudied in this thesis work. Linear Elastic Fracture Mechanics was used to calculate fatigue strength of the cruciform fillet welded joints in as-welded condition and under cyclic tensile loading, for a range of weld geometries. With extreme value statistical analysis and LEFM, an attempt was made to relate the variation of the cruciform weld profiles such as weld angle and weld toe radius to respective FAT classes.
Resumo:
A maximum entropy statistical treatment of an inverse problem concerning frame theory is presented. The problem arises from the fact that a frame is an overcomplete set of vectors that defines a mapping with no unique inverse. Although any vector in the concomitant space can be expressed as a linear combination of frame elements, the coefficients of the expansion are not unique. Frame theory guarantees the existence of a set of coefficients which is “optimal” in a minimum norm sense. We show here that these coefficients are also “optimal” from a maximum entropy viewpoint.
Resumo:
Within a developing organism, cells require information on where they are in order to differentiate into the correct cell-type. Pattern formation is the process by which cells acquire and process positional cues and thus determine their fate. This can be achieved by the production and release of a diffusible signaling molecule, called a morphogen, which forms a concentration gradient: exposure to different morphogen levels leads to the activation of specific signaling pathways. Thus, in response to the morphogen gradient, cells start to express different sets of genes, forming domains characterized by a unique combination of differentially expressed genes. As a result, a pattern of cell fates and specification emerges.Though morphogens have been known for decades, it is not yet clear how these gradients form and are interpreted in order to yield highly robust patterns of gene expression. During my PhD thesis, I investigated the properties of Bicoid (Bcd) and Decapentaplegic (Dpp), two morphogens involved in the patterning of the anterior-posterior axis of Drosophila embryo and wing primordium, respectively. In particular, I have been interested in understanding how the pattern proportions are maintained across embryos of different sizes or within a growing tissue. This property is commonly referred to as scaling and is essential for yielding functional organs or organisms. In order to tackle these questions, I analysed fluorescence images showing the pattern of gene expression domains in the early embryo and wing imaginal disc. After characterizing the extent of these domains in a quantitative and systematic manner, I introduced and applied a new scaling measure in order to assess how well proportions are maintained. I found that scaling emerged as a universal property both in early embryos (at least far away from the Bcd source) and in wing imaginal discs (across different developmental stages). Since we were also interested in understanding the mechanisms underlying scaling and how it is transmitted from the morphogen to the target genes down in the signaling cascade, I also quantified scaling in mutant flies where this property could be disrupted. While scaling is largely conserved in embryos with altered bcd dosage, my modeling suggests that Bcd trapping by the nuclei as well as pre-steady state decoding of the morphogen gradient are essential to ensure precise and scaled patterning of the Bcd signaling cascade. In the wing imaginal disc, it appears that as the disc grows, the Dpp response expands and scales with the tissue size. Interestingly, scaling is not perfect at all positions in the field. The scaling of the target gene domains is best where they have a function; Spalt, for example, scales best at the position in the anterior compartment where it helps to form one of the anterior veins of the wing. Analysis of mutants for pentagone, a transcriptional target of Dpp that encodes a secreted feedback regulator of the pathway, indicates that Pentagone plays a key role in scaling the Dpp gradient activity.
Resumo:
To enable a mathematically and physically sound execution of the fatigue test and a correct interpretation of its results, statistical evaluation methods are used to assist in the analysis of fatigue testing data. The main objective of this work is to develop step-by-stepinstructions for statistical analysis of the laboratory fatigue data. The scopeof this project is to provide practical cases about answering the several questions raised in the treatment of test data with application of the methods and formulae in the document IIW-XIII-2138-06 (Best Practice Guide on the Statistical Analysis of Fatigue Data). Generally, the questions in the data sheets involve some aspects: estimation of necessary sample size, verification of the statistical equivalence of the collated sets of data, and determination of characteristic curves in different cases. The series of comprehensive examples which are given in this thesis serve as a demonstration of the various statistical methods to develop a sound procedure to create reliable calculation rules for the fatigue analysis.
Resumo:
Occupational exposure modeling is widely used in the context of the E.U. regulation on the registration, evaluation, authorization, and restriction of chemicals (REACH). First tier tools, such as European Centre for Ecotoxicology and TOxicology of Chemicals (ECETOC) targeted risk assessment (TRA) or Stoffenmanager, are used to screen a wide range of substances. Those of concern are investigated further using second tier tools, e.g., Advanced REACH Tool (ART). Local sensitivity analysis (SA) methods are used here to determine dominant factors for three models commonly used within the REACH framework: ECETOC TRA v3, Stoffenmanager 4.5, and ART 1.5. Based on the results of the SA, the robustness of the models is assessed. For ECETOC, the process category (PROC) is the most important factor. A failure to identify the correct PROC has severe consequences for the exposure estimate. Stoffenmanager is the most balanced model and decision making uncertainties in one modifying factor are less severe in Stoffenmanager. ART requires a careful evaluation of the decisions in the source compartment since it constitutes ∼75% of the total exposure range, which corresponds to an exposure estimate of 20-22 orders of magnitude. Our results indicate that there is a trade off between accuracy and precision of the models. Previous studies suggested that ART may lead to more accurate results in well-documented exposure situations. However, the choice of the adequate model should ultimately be determined by the quality of the available exposure data: if the practitioner is uncertain concerning two or more decisions in the entry parameters, Stoffenmanager may be more robust than ART.
Resumo:
The goal of this work is to try to create a statistical model, based only on easily computable parameters from the CSP problem to predict runtime behaviour of the solving algorithms, and let us choose the best algorithm to solve the problem. Although it seems that the obvious choice should be MAC, experimental results obtained so far show, that with big numbers of variables, other algorithms perfom much better, specially for hard problems in the transition phase.
Resumo:
The effect of the heat flux on the rate of chemical reaction in dilute gases is shown to be important for reactions characterized by high activation energies and in the presence of very large temperature gradients. This effect, obtained from the second-order terms in the distribution function (similar to those obtained in the Burnett approximation to the solution of the Boltzmann equation), is derived on the basis of information theory. It is shown that the analytical results describing the effect are simpler if the kinetic definition for the nonequilibrium temperature is introduced than if the thermodynamic definition is introduced. The numerical results are nearly the same for both definitions
Resumo:
Background. Although peer review is widely considered to be the most credible way of selecting manuscripts and improving the quality of accepted papers in scientific journals, there is little evidence to support its use. Our aim was to estimate the effects on manuscript quality of either adding a statistical peer reviewer or suggesting the use of checklists such as CONSORT or STARD to clinical reviewers or both. Methodology and Principal Findings. Interventions were defined as 1) the addition of a statistical reviewer to the clinical peer review process, and 2) suggesting reporting guidelines to reviewers; with"no statistical expert" and"no checklist" as controls. The two interventions were crossed in a 262 balanced factorial design including original research articles consecutively selected, between May 2004 and March 2005, by the Medicina Clinica (Barc) editorial committee. We randomized manuscripts to minimize differences in terms of baseline quality and type of study (intervention, longitudinal, cross-sectional, others). Sample-size calculations indicated that 100 papers provide an 80% power to test a 55% standardized difference. We specified the main outcome as the increment in quality of papers as measured on the Goodman Scale. Two blinded evaluators rated the quality of manuscripts at initial submission and final post peer review version. Of the 327 manuscripts submitted to the journal, 131 were accepted for further review, and 129 were randomized. Of those, 14 that were lost to follow-up showed no differences in initial quality to the followed-up papers. Hence, 115 were included in the main analysis, with 16 rejected for publication after peer review. 21 (18.3%) of the 115 included papers were interventions, 46 (40.0%) were longitudinal designs, 28 (24.3%) cross-sectional and 20 (17.4%) others. The 16 (13.9%) rejected papers had a significantly lower initial score on the overall Goodman scale than accepted papers (difference 15.0, 95% CI: 4.6- 24.4). The effect of suggesting a guideline to the reviewers had no effect on change in overall quality as measured by the Goodman scale (0.9, 95% CI: 20.3+2.1). The estimated effect of adding a statistical reviewer was 5.5 (95% CI: 4.3-6.7), showing a significant improvement in quality. Conclusions and Significance. This prospective randomized study shows the positive effect of adding a statistical reviewer to the field-expert peers in improving manuscript quality. We did not find a statistically significant positive effect by suggesting reviewers use reporting guidelines.
Resumo:
PURPOSE: Proper delineation of ocular anatomy in 3-dimensional (3D) imaging is a big challenge, particularly when developing treatment plans for ocular diseases. Magnetic resonance imaging (MRI) is presently used in clinical practice for diagnosis confirmation and treatment planning for treatment of retinoblastoma in infants, where it serves as a source of information, complementary to the fundus or ultrasonographic imaging. Here we present a framework to fully automatically segment the eye anatomy for MRI based on 3D active shape models (ASM), and we validate the results and present a proof of concept to automatically segment pathological eyes. METHODS AND MATERIALS: Manual and automatic segmentation were performed in 24 images of healthy children's eyes (3.29 ± 2.15 years of age). Imaging was performed using a 3-T MRI scanner. The ASM consists of the lens, the vitreous humor, the sclera, and the cornea. The model was fitted by first automatically detecting the position of the eye center, the lens, and the optic nerve, and then aligning the model and fitting it to the patient. We validated our segmentation method by using a leave-one-out cross-validation. The segmentation results were evaluated by measuring the overlap, using the Dice similarity coefficient (DSC) and the mean distance error. RESULTS: We obtained a DSC of 94.90 ± 2.12% for the sclera and the cornea, 94.72 ± 1.89% for the vitreous humor, and 85.16 ± 4.91% for the lens. The mean distance error was 0.26 ± 0.09 mm. The entire process took 14 seconds on average per eye. CONCLUSION: We provide a reliable and accurate tool that enables clinicians to automatically segment the sclera, the cornea, the vitreous humor, and the lens, using MRI. We additionally present a proof of concept for fully automatically segmenting eye pathology. This tool reduces the time needed for eye shape delineation and thus can help clinicians when planning eye treatment and confirming the extent of the tumor.
Resumo:
Background: The repertoire of statistical methods dealing with the descriptive analysis of the burden of a disease has been expanded and implemented in statistical software packages during the last years. The purpose of this paper is to present a web-based tool, REGSTATTOOLS http://regstattools.net intended to provide analysis for the burden of cancer, or other group of disease registry data. Three software applications are included in REGSTATTOOLS: SART (analysis of disease"s rates and its time trends), RiskDiff (analysis of percent changes in the rates due to demographic factors and risk of developing or dying from a disease) and WAERS (relative survival analysis). Results: We show a real-data application through the assessment of the burden of tobacco-related cancer incidence in two Spanish regions in the period 1995-2004. Making use of SART we show that lung cancer is the most common cancer among those cancers, with rising trends in incidence among women. We compared 2000-2004 data with that of 1995-1999 to assess percent changes in the number of cases as well as relative survival using RiskDiff and WAERS, respectively. We show that the net change increase in lung cancer cases among women was mainly attributable to an increased risk of developing lung cancer, whereas in men it is attributable to the increase in population size. Among men, lung cancer relative survival was higher in 2000-2004 than in 1995-1999, whereas it was similar among women when these time periods were compared. Conclusions: Unlike other similar applications, REGSTATTOOLS does not require local software installation and it is simple to use, fast and easy to interpret. It is a set of web-based statistical tools intended for automated calculation of population indicators that any professional in health or social sciences may require.