14 results for Statistical Method

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

60.00%

Publisher:

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology, which allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization of a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes; these samples are hybridized to microarrays in an effort to find a small number of genes strongly correlated with the grouping of individuals. Even though analysis methods are today well developed and close to reaching a standard organization (through the effort of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to encounter a clinician's question for which no compelling statistical method is available. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. Chapter 1, starting from a necessary biological introduction, reviews microarray technologies and all the important steps involved in an experiment, from the production of the array, through quality control, to the preprocessing steps used in the data analysis in the rest of the dissertation. Chapter 2 provides a critical review of standard analysis methods, stressing the open problems they leave. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, the experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes; however, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence as differentially expressed across the 1,000 lists. The performance of MultiSAM was compared to that of SAM and LIMMA [3] over two simulated data sets generated from beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe and SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering.
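
The core of the MultiSAM resampling scheme can be sketched in a few lines. The snippet below only illustrates the reiterated subsampling-and-scoring idea: a Welch t-test stands in for the SAM statistic used in the thesis, and the threshold `alpha`, the variable names and the data layout are assumptions.

```python
import numpy as np
from scipy import stats

def multisam_scores(expr_lpc, expr_mpc, n_iter=1000, alpha=0.05, rng=None):
    """Score each probe by how often it is called differentially expressed
    when the less populated class (LPC) is compared against random
    subsamples of the more populated class (MPC) of the same size.

    expr_lpc : (n_probes, n_lpc) expression matrix of the LPC
    expr_mpc : (n_probes, n_mpc) expression matrix of the MPC
    Returns an integer score per probe in [0, n_iter].
    """
    rng = np.random.default_rng(rng)
    n_probes, n_lpc = expr_lpc.shape
    scores = np.zeros(n_probes, dtype=int)
    for _ in range(n_iter):
        cols = rng.choice(expr_mpc.shape[1], size=n_lpc, replace=False)
        sub = expr_mpc[:, cols]
        # A Welch t-test per probe stands in for the SAM statistic here.
        _, p = stats.ttest_ind(expr_lpc, sub, axis=1, equal_var=False)
        scores += (p < alpha).astype(int)
    return scores

# Probes with score > 300 would then be retained, as in the thesis.
```
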
We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address similarity evaluation in a three-class problem by means of the Relevance Vector Machine [4]. Indeed, looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only quantity with a crucial role: in some cases similarities can provide useful and sometimes even more important information. Given three classes, the goal could be to establish, with a certain level of confidence, whether the third is more similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [4] can be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages over, for example, its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership is the key feature for addressing the similarity issue; this is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a tumor-grade three-class problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain, for each G2 sample, the probability of belonging to class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
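
The three-class similarity evaluation reduces to training a probabilistic classifier on G1 vs. G3 and reading the posterior class probabilities on G2. The sketch below illustrates this workflow with scikit-learn's logistic regression as a stand-in for the RVM used in the thesis; matrix names and shapes are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def grade2_similarity(X_g1, X_g3, X_g2):
    """Train a probabilistic classifier on grade-1 vs grade-3 samples and
    report, for each grade-2 sample, the posterior probability of belonging
    to the grade-1 class.  Logistic regression stands in for the RVM used in
    the thesis; feature matrices are (n_samples, n_genes)."""
    X = np.vstack([X_g1, X_g3])
    y = np.r_[np.zeros(len(X_g1)), np.ones(len(X_g3))]   # 0 = G1, 1 = G3
    clf = LogisticRegression(max_iter=5000).fit(X, y)
    p_g1 = clf.predict_proba(X_g2)[:, 0]                 # P(class = G1 | x)
    return p_g1   # values > 0.5 mean "more G1-like"
```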

Relevance:

60.00%

Publisher:

Abstract:

Widespread occurrence of pharmaceutical residues has been reported in aquatic ecosystems. However, their toxic effects on aquatic biota remain unclear. Generally, acute toxicity has been assessed in laboratory experiments, while chronic toxicity studies have rarely been performed. The assessment of mixture effects is also important, since pharmaceuticals never occur in waters alone. The aim of the present work is to evaluate the acute and chronic toxic response of the crustacean Daphnia magna exposed to single pharmaceuticals and to their mixtures. We tested fluoxetine, an SSRI widely prescribed as an antidepressant, and propranolol, a non-selective β-adrenergic receptor-blocking agent used to treat hypertension. Acute immobilization and chronic reproduction tests were performed according to OECD guidelines 202 and 211, respectively. Single chemicals were first tested separately. The toxicity of binary mixtures was then assessed using a fixed-ratio experimental design with concentrations based on Toxic Units. The conceptual model of Concentration Addition (CA) was adopted in this study, as we assumed that the mixture effect mirrors the sum of the single substances for compounds with a similar mode of action. The MixTox statistical method was applied to analyze the experimental results. The results showed a significant deviation from the CA model, indicating antagonism between the chemicals in both the acute and the chronic mixture tests. The study was complemented by assessing the effects of fluoxetine on a battery of biomarkers, in order to evaluate the biological vulnerability of the organism caused by the low concentrations of pharmaceuticals occurring in the aquatic environment. We assessed acetylcholinesterase and glutathione S-transferase enzymatic activities and malondialdehyde production. No treatment induced significant alterations of the biomarkers with respect to the control. Biological assays and the MixTox model application proved to be useful tools for pharmaceutical risk assessment. Although promising, the application of biomarkers in Daphnia magna needs further elucidation.
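
For reference, a minimal sketch of the Toxic Unit and Concentration Addition bookkeeping used in fixed-ratio mixture designs; the numbers in the example are purely illustrative and are not the thesis data, and the full MixTox deviation analysis is not reproduced here.

```python
def toxic_units(concentrations, ec50s):
    """Toxic units of each mixture component: TU_i = c_i / EC50_i."""
    return [c / e for c, e in zip(concentrations, ec50s)]

def ca_predicted_ec50(fractions, ec50s):
    """Mixture EC50 predicted by Concentration Addition for a fixed-ratio
    design with concentration fractions p_i (summing to 1):
        EC50_mix = 1 / sum_i(p_i / EC50_i)."""
    return 1.0 / sum(p / e for p, e in zip(fractions, ec50s))

# Two hypothetical compounds with single-substance EC50s of 1.0 and 5.0 mg/L,
# mixed 50:50 by concentration: the CA-predicted mixture EC50 is ~1.67 mg/L.
print(ca_predicted_ec50([0.5, 0.5], [1.0, 5.0]))
```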

Relevance:

60.00%

Publisher:

Abstract:

Aggregate masonry buildings have been generated over the years, allowing the interaction of the aggregated structural units under seismic action. The first part of this work focuses on the seismic vulnerability and fragility assessment of clay brick masonry buildings in Bologna (Italy), with reference, at first, to single isolated structural units, by means of the Response Surface statistical method, taking into account several variabilities and uncertainties involved in the problem. The seismic action was defined by a group of selected recorded accelerograms, in order to analyse the effect of the variability of the earthquakes. Identical and different structural units, chosen from the simulations generated by the Response Surface method, were then aggregated in a row, in order to compare the collapse PGA of the isolated structural unit with that of the aggregate structure. The second part focuses on the seismic vulnerability and fragility assessment of stone masonry structures in Seixal (Portugal), applying a methodology similar to that used for the buildings in Bologna. Given the availability of several sources of information, the analyses involved the assessment of the most prevalent structural typologies in the area, considering the variability of a set of structural and geometrical parameters. The results highlighted the importance of statistical procedures as methods able to account for the variabilities and uncertainties involved in the fragility of unreinforced masonry structures in the absence of accurate investigations of the structural typologies, as in the Seixal case study. Furthermore, it was shown that the structural units along unreinforced clay brick or stone masonry aggregates cannot be analysed as isolated, since they are affected by the aggregation with adjacent structural units, depending on the direction of the seismic action considered and on their position along the row aggregate.
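
As an illustration of the fragility-assessment step, the sketch below fits a lognormal fragility curve (probability of collapse versus PGA) to binary collapse outcomes by maximum likelihood. This is a generic fragility-fitting recipe, not the Response Surface procedure developed in the thesis; function and variable names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_lognormal_fragility(pga, collapsed):
    """Fit P(collapse | PGA = a) = Phi((ln a - ln theta) / beta) by maximum
    likelihood from binary collapse observations (1 = collapse, 0 = no)."""
    pga = np.asarray(pga, float)
    y = np.asarray(collapsed, float)

    def nll(params):
        ln_theta, ln_beta = params
        p = norm.cdf((np.log(pga) - ln_theta) / np.exp(ln_beta))
        p = np.clip(p, 1e-9, 1 - 1e-9)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    res = minimize(nll, x0=[np.log(np.median(pga)), np.log(0.4)],
                   method="Nelder-Mead")
    ln_theta, ln_beta = res.x
    return np.exp(ln_theta), np.exp(ln_beta)   # median collapse PGA, dispersion
```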

Relevance:

60.00%

Publisher:

Abstract:

The research activities described in this thesis focused on two main topics: the study of shaft-hub joint performance, with particular regard to interference-fitted and adhesively bonded connections, and the fatigue characterization of additively manufactured metal alloys. The research on interference-fitted shaft-hub joints included several studies in the field of fretting fatigue. Rotating bending fatigue tests were performed on different materials using non-conventional specimens to determine the fatigue properties of interference-fitted joints and to investigate the fretting fatigue phenomenon, leading to novel and original results. In adhesively bonded and interference-fitted shaft-hub connections (so-called hybrid joints), the synergic effect of anaerobic adhesive and interference can improve the joint strength. However, the adhesive contribution depends on several factors, so its behaviour was investigated for different coupling pressures, coupling procedures, operating temperatures and joint designs. The study on additively manufactured metal alloys deals with rotating bending fatigue tests. AlSi10Mg and Maraging Stainless Steel CX were included in the campaign for their wide applicability in the automotive sector. Build direction, heat treatments and surface treatments were considered as input parameters. Fatigue results were interpreted by statistical methods and microscopy analyses in order to determine the effectiveness and the beneficial or detrimental effects of the considered factors. Fracture mode and microstructure were investigated by fractographic and micrographic analyses.
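
As an example of the kind of statistical interpretation mentioned above, the sketch below fits a Basquin S-N curve to rotating-bending fatigue data by least squares in log-log coordinates. The thesis does not state which statistical model was used, so this is only an illustrative choice; variable names are assumptions.

```python
import numpy as np

def fit_basquin(stress_amplitude, cycles_to_failure):
    """Fit the Basquin relation S = A * N**b by least squares in log-log
    coordinates, a common way to summarize rotating-bending fatigue data."""
    logS = np.log10(stress_amplitude)
    logN = np.log10(cycles_to_failure)
    b, logA = np.polyfit(logN, logS, 1)   # slope b, intercept log10(A)
    return 10 ** logA, b
```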

Relevance:

60.00%

Publisher:

Abstract:

Multiple Myeloma (MM) is a hematologic cancer with a heterogeneous and complex genomic landscape, in which Copy Number Alterations (CNAs) play a key role in the disease's pathogenesis and prognosis. It is of biological and clinical interest to study the temporal occurrence of early alterations, as they play a "driver" function by deregulating key tumor pathways. This study presents an innovative suite of bioinformatic tools created to harmonize CNAs and trace their origin throughout the evolutionary history of MM. To this aim, large cohorts of newly diagnosed MM (NDMM, N=1582) and smoldering MM (SMM, N=282) were aggregated. The tools developed in this study enable the harmonization of CNAs obtained from different genomic platforms in such a way that high statistical power can be achieved. The large size of these cohorts was thus harnessed for the identification of novel "driver" genes (NFKB2, NOTCH2, MAX, EVI5 and the MYC-ME2-enhancer) and for the generation of an innovative timing model, implemented with a statistical method that introduces confidence intervals in the CNA calls. By applying this model to both the NDMM and SMM cohorts, it was possible to identify specific CNAs (1q(CKS1B)amp, 13q(RB1)del, 11q(CCND1)amp and 14q(MAX)del) and categorize them as "early"/"driver" events, with a high level of precision guaranteed by the narrow confidence intervals of the timing estimates. These CNAs were proposed as critical MM alterations that play a foundational role in the evolutionary history of both SMM and NDMM. Finally, a multivariate survival model identified the independent genomic alterations with the greatest effect on patients' survival, including RB1-del, CKS1B-amp, MYC-amp, NOTCH2-amp and TRAF3-del/mut. In conclusion, the alterations identified both as "early drivers" and as correlated with patients' survival were proposed as biomarkers that, if included in wider survival models, could provide a better disease stratification and an improved prognosis definition.
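
As a rough illustration of attaching confidence intervals to CNA calls, the sketch below computes a percentile bootstrap interval for the prevalence of an alteration across a cohort. It is not the timing model developed in the thesis; names and defaults are assumptions.

```python
import numpy as np

def bootstrap_prevalence_ci(calls, n_boot=10000, level=0.95, rng=None):
    """Percentile bootstrap confidence interval for the prevalence of a CNA
    across a cohort; `calls` is a 0/1 array (alteration present / absent)."""
    rng = np.random.default_rng(rng)
    calls = np.asarray(calls)
    boots = rng.choice(calls, size=(n_boot, calls.size), replace=True).mean(axis=1)
    lo, hi = np.percentile(boots, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return calls.mean(), (lo, hi)
```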

Relevance:

30.00%

Publisher:

Abstract:

In this thesis two major topics inherent to medical ultrasound images are addressed: deconvolution and segmentation. In the first case, a deconvolution algorithm is described that allows statistically consistent maximum a posteriori estimates of the tissue reflectivity to be restored. These estimates are proven to provide a reliable source of information for achieving an accurate characterization of biological tissues through the ultrasound echo. The second topic involves the definition of a semi-automatic algorithm for myocardium segmentation in 2D echocardiographic images. The results show that the proposed method can reduce inter- and intra-observer variability in the delineation of myocardial contours and is feasible and accurate even on clinical data.
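
As a pointer to what a reflectivity deconvolution looks like in practice, the sketch below applies a frequency-domain Wiener filter, which coincides with the MAP estimate under Gaussian assumptions. It is a generic textbook recipe, not the estimator developed in the thesis; parameter names are assumptions.

```python
import numpy as np

def wiener_deconvolve(rf_signal, psf, snr_db=20.0):
    """Frequency-domain Wiener deconvolution of an ultrasound RF line by the
    system point spread function; under Gaussian signal and noise assumptions
    this is the MAP/MMSE estimate of the reflectivity."""
    n = len(rf_signal)
    H = np.fft.fft(psf, n)
    Y = np.fft.fft(rf_signal, n)
    nsr = 10 ** (-snr_db / 10)                     # noise-to-signal power ratio
    X = np.conj(H) * Y / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft(X))
```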

Relevance:

30.00%

Publisher:

Abstract:

Bioinformatics has, in the last few decades, played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem of knowing as much as possible about its coding regions becomes crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. The present thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called "The Bologna Annotation Resource Plus" (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available), by means of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+-based applications have been developed during my doctorate, including the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, as a web server for functional and structural protein sequence annotation, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0.
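
A common way to implement the kind of statistical validation mentioned above is a hypergeometric over-representation test of a GO or Pfam term within a cluster. The sketch below shows that generic test; it is not claimed to be the exact BAR+ validation procedure, and the threshold is left to the caller.

```python
from scipy.stats import hypergeom

def term_enrichment_p(cluster_with_term, cluster_size, db_with_term, db_size):
    """One-sided p-value that a GO/Pfam term is over-represented in a cluster,
    using a hypergeometric test.  A term passing a multiple-testing-corrected
    threshold could then be transferred to the unannotated cluster members.

    cluster_with_term : annotated cluster members carrying the term
    cluster_size      : total annotated members in the cluster
    db_with_term      : sequences carrying the term in the whole database
    db_size           : total annotated sequences in the database
    """
    return hypergeom.sf(cluster_with_term - 1, db_size, db_with_term, cluster_size)
```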

Relevance:

30.00%

Publisher:

Abstract:

Schroeder's backward integration method is the most widely used method to extract the decay curve of an acoustic impulse response and to calculate the reverberation time from this curve. The limits and possible improvements of this method are widely discussed in the literature. In this work a new method is proposed for the evaluation of the energy decay curve. The new method has been implemented in a Matlab toolbox, and its performance has been tested against the most established method in the literature. The values of EDT and reverberation time extracted from the energy decay curves calculated with the two methods have been compared, both in terms of the values themselves and in terms of their statistical representativeness. The main case study consists of nine Italian historical theatres in which acoustical measurements were performed. The comparison of the two extraction methods has also been applied to a critical case, i.e. the structural impulse responses of some building elements. The comparison shows that both methods return comparable values of T30. As the evaluation range decreases, they reveal increasing differences; in particular, the main differences are in the first part of the decay, where the EDT is evaluated. This is a consequence of the fact that the new method returns a "locally" defined energy decay curve, whereas Schroeder's method accumulates energy from the tail to the beginning of the impulse response. Another characteristic of the new extraction method is its independence from the background noise estimation. Finally, a statistical analysis is performed on the T30 and EDT values calculated from the impulse response measurements in the Italian historical theatres, with the aim of establishing whether a subset of measurements could be considered representative for a complete characterization of these opera houses.
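
For reference, the classical Schroeder backward integration and the linear-fit extraction of T30 and EDT described above can be sketched as follows; `fs` is the sampling rate and the fitting ranges follow the usual -5/-35 dB and 0/-10 dB conventions. The "local" decay-curve method proposed in the thesis is not reproduced here.

```python
import numpy as np

def schroeder_decay_db(ir):
    """Schroeder backward integration: cumulative energy of the squared
    impulse response accumulated from the tail backwards, in dB relative to
    the total energy."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]
    energy = np.maximum(energy, np.finfo(float).tiny)
    return 10 * np.log10(energy / energy[0])

def decay_time(decay_db, fs, top_db, bottom_db):
    """Decay-time estimate: linear fit of the curve between `top_db` and
    `bottom_db` (e.g. -5/-35 dB for T30, 0/-10 dB for EDT), extrapolated to
    the time needed for a 60 dB decay."""
    idx = np.where((decay_db <= top_db) & (decay_db >= bottom_db))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, decay_db[idx], 1)   # dB per second (negative)
    return -60.0 / slope

# curve = schroeder_decay_db(ir)
# t30 = decay_time(curve, fs, -5, -35)
# edt = decay_time(curve, fs, 0, -10)
```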

Relevance:

30.00%

Publisher:

Abstract:

Coastal sand dunes represent a valuable resource, first of all in terms of defense against sea storm waves and saltwater intrusion; moreover, these morphological elements constitute a unique ecosystem of transition between the sea and the land environment. Research on dune systems has been a strong part of coastal science since the last century. Nowadays this branch has assumed even more importance for two reasons: on the one hand, brand new technologies, especially those related to remote sensing, have increased the possibilities available to researchers; on the other, the intense urbanization of recent decades has strongly limited the dunes' possibilities of development and fragmented what remained from the last century. This is particularly true in the Ravenna area, where industrialization, together with the tourist economy and intense subsidence, has left only a few residual dune ridges still active. In this work three different foredune ridges along the Ravenna coast have been studied with laser scanner technology. The research was not limited to analysing volume or spatial differences, but also tried to find new ways and new features to monitor this environment. Moreover, the author planned a series of tests to validate data from a Terrestrial Laser Scanner (TLS), with the additional aim of finalizing a methodology to test 3D survey accuracy. Data acquired by TLS were then used, on one hand, to test some brand new applications, such as the Digital Shoreline Analysis System (DSAS) and Computational Fluid Dynamics (CFD), to prove their efficacy in this field; on the other hand, the author used TLS data to look for correlations with meteorological indexes (forcing factors) linked to sea and wind (Fryberger's method), applying statistical tools such as Principal Component Analysis (PCA).
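
As an illustration of the last step, the sketch below projects standardized forcing-factor indexes onto their principal components and correlates each component with the measured dune volume change; variable names and the choice of two components are assumptions, not the thesis's actual data or settings.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_forcing_correlation(forcing_matrix, dune_volume_change, n_components=2):
    """Standardize the forcing-factor indexes (one row per survey period, one
    column per index, e.g. wind/wave parameters), project them onto their
    principal components and correlate each component with the measured dune
    volume change."""
    X = np.asarray(forcing_matrix, float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)
    corr = [np.corrcoef(scores[:, k], dune_volume_change)[0, 1]
            for k in range(n_components)]
    return pca.explained_variance_ratio_, corr
```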

Relevance:

30.00%

Publisher:

Abstract:

The uncertainty in the determination of the stratigraphic profile of natural soils is one of the main problems in geotechnics, in particular for landslide characterization and modeling. This study deals with a new approach to geotechnical modeling which relies on the stochastic generation of different soil layer distributions following a Boolean logic; the method has thus been called BoSG (Boolean Stochastic Generation). In this way it is possible to randomize the presence of a specific material interdigitated in a uniform matrix. In building a geotechnical model it is common to discard some stratigraphic data in order to simplify the model itself, assuming that the significance of the results of the modeling procedure is not affected. With the proposed technique it is possible to quantify the error associated with this simplification. Moreover, the technique can be used to determine the most significant zones, where further investigations and surveys would be most effective for building the geotechnical model of the slope. The commercial software FLAC was used for the 2D and 3D geotechnical models. The distribution of the materials was randomized through a specifically coded MatLab program that automatically generates text files, each of them representing a specific soil configuration. In addition, a routine was designed to automate the FLAC computations over the different data files in order to maximize the number of samples. The methodology is applied to a simplified slope in 2D, a simplified slope in 3D and an actual landslide, namely the Mortisa mudslide (Cortina d'Ampezzo, BL, Italy). However, it could be extended to numerous other cases, especially hydrogeological analyses and landslide stability assessments in different geological and geomorphological contexts.
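
A minimal sketch of one stochastic realization in the spirit of BoSG is shown below: rectangular lenses of a second material are dropped at random into a uniform matrix on a regular grid, and each realization could then be exported as a FLAC input file. Lens shape, sizes and all parameter names are illustrative assumptions, not the thesis's actual generator.

```python
import numpy as np

def bosg_realization(nx, ny, n_lenses, lens_size, rng=None):
    """Generate one stochastic realization of a 2D soil grid: a uniform matrix
    (material 0) with `n_lenses` rectangular patches of a second material
    (material 1) placed at random positions."""
    rng = np.random.default_rng(rng)
    grid = np.zeros((ny, nx), dtype=int)
    ly, lx = lens_size
    for _ in range(n_lenses):
        i = rng.integers(0, ny - ly + 1)
        j = rng.integers(0, nx - lx + 1)
        grid[i:i + ly, j:j + lx] = 1
    return grid

# Each `grid` could be written to a text file and fed to the FLAC run queue.
```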

Relevance:

30.00%

Publisher:

Abstract:

In this thesis we will see that the DNA sequence is constantly shaped by interactions with its environment at multiple levels, showing footprints of DNA methylation, of its 3D organization and, in the case of bacteria, of the interaction with the host organisms. In the first chapter we will see that, by analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence, we can detect epigenetic and structural footprints. In particular, we will see that the CG distance distribution allows organisms of different biological complexity to be distinguished, depending on how much CG sites are involved in DNA methylation. Moreover, we will see that CG and TA can be described by the same fitting function, suggesting a relationship between the two. We also provide an interpretation of the observed trend, simulating a positioning process guided by the presence or absence of memory. Finally, we focus on the TA distance distribution, characterizing deviations from the trend predicted by the best fitting function and identifying specific patterns that might be related to peculiar mechanical properties of the DNA as well as to epigenetic and structural processes. In the second chapter we will see how the 3D structure of the DNA can be mapped onto its sequence. In particular, we devised a network-based algorithm that produces a genome assembly starting from its 3D configuration, using Hi-C contact maps as input. Specifically, we will see how the different chromosomes can be identified and their sequences reconstructed by exploiting the spectral properties of the Laplacian operator of a network. In the third chapter we will see a novel method for source clustering and source attribution, based on a network approach, that allows host-bacteria interactions to be identified starting from the detection of Single-Nucleotide Polymorphisms along the sequences of bacterial genomes.
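
The basic quantity analyzed in the first chapter, the distribution of distances between consecutive dinucleotides of the same type, can be computed as in the sketch below; function and variable names are illustrative.

```python
import numpy as np

def dinucleotide_distances(sequence, dinucleotide="CG"):
    """Distances (in bp) between consecutive occurrences of the same
    dinucleotide along a DNA sequence; their empirical distribution is the
    quantity analyzed for CG and TA footprints."""
    seq = sequence.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    return np.diff(positions)

# Example: np.histogram(dinucleotide_distances(genome, "CG"), bins=range(1, 200))
# gives the empirical CG distance distribution.
```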

Relevance:

30.00%

Publisher:

Abstract:

The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite-rank regime. The first has been present since the early days of spin glasses; the second concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin glasses, such as the interpolation scheme, and from statistical physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean-field spin glass where the couplings are i.i.d. Gaussian random variables. The second amounts to establishing the information-theoretic limits in the reconstruction of a fixed low-rank matrix, the "spike", blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is relaxed by assuming a noise with an inhomogeneous variance profile; in spin glasses this leads to multi-species models, and the inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4, instead, we study the spiked Wigner model when the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.
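
For concreteness, the sketch below draws one sample from the rank-one spiked Wigner model referred to above, under one common normalization (conventions and priors on the spike vary across the literature; the Rademacher signal here is just an example).

```python
import numpy as np

def spiked_wigner_sample(n, snr, rng=None):
    """Draw Y = sqrt(snr / n) * x x^T + W from the rank-one spiked Wigner
    model: x is a Rademacher signal vector and W a symmetric Gaussian (Wigner)
    noise matrix with unit-variance off-diagonal entries."""
    rng = np.random.default_rng(rng)
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.normal(size=(n, n))
    W = (g + g.T) / np.sqrt(2)                 # symmetric Gaussian noise
    Y = np.sqrt(snr / n) * np.outer(x, x) + W
    return Y, x                                # task: recover x x^T from Y
```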

Relevance:

30.00%

Publisher:

Abstract:

Long-term monitoring of acoustic environments is gaining popularity thanks to the relevant amount of scientific and engineering insight that it provides. The increasing interest is due to the constant growth of the storage capacity and computational power needed to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques to deal with large databases. Nowadays, the conventional praxis of sound level meter measurements limits the global description of a sound scene to an energetic point of view: the equivalent continuous level Leq is indeed the main metric used to characterize an acoustic environment. Finer analyses involve the use of statistical levels; however, acoustic percentiles are based on temporal assumptions which are not always reliable. A statistical approach based on the study of the occurrences of sound pressure levels brings a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than through portions of energy, gives more specific information about the activity carried out during the measurements, and the statistical mode of the occurrences can capture typical behaviours of specific kinds of sound sources. The present work proposes an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in manifold contexts. The method is based on clustering analysis: two algorithms, the Gaussian Mixture Model and K-means clustering, form the core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts, university lecture halls and offices. The proposed method shows robust and reliable results in describing the acoustic scenario and could represent an important analytical tool for acousticians.
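
As an illustration of the clustering step, the sketch below fits Gaussian Mixture Models to a long-term series of sound pressure levels and reads each component mean as the most probable level of a coexisting source. Model selection by BIC and all parameter names are assumptions, not necessarily the thesis's choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_level_modes(spl_samples, max_components=5):
    """Fit Gaussian Mixture Models with an increasing number of components to
    a series of sound pressure levels (dB) and keep the one with the lowest
    BIC; each component mean is a candidate 'most probable level' of one
    coexisting source."""
    X = np.asarray(spl_samples, float).reshape(-1, 1)
    best = min(
        (GaussianMixture(n_components=k, random_state=0).fit(X)
         for k in range(1, max_components + 1)),
        key=lambda m: m.bic(X),
    )
    return np.sort(best.means_.ravel()), best
```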

Relevance:

30.00%

Publisher:

Abstract:

In the field of educational and psychological measurement, the shift from paper-based to computerized tests has become a prominent trend in recent years. Computerized tests allow for more complex and personalized test administration procedures, such as Computerized Adaptive Testing (CAT). CAT, following Item Response Theory (IRT) models, dynamically builds tests based on test-taker responses, driven by complex statistical algorithms. Although CAT structures are complex, they are flexible and convenient; however, concerns about test security need to be addressed. Frequent item administration can lead to item exposure and cheating, necessitating preventive and diagnostic measures. In this thesis a method called "CHeater identification using Interim Person fit Statistic" (CHIPS) is developed, designed to identify and limit cheaters in real time during test administration. CHIPS utilizes response times (RTs) to calculate an Interim Person fit Statistic (IPS), allowing for on-the-fly intervention using a more secret item bank. A slight modification, called Modified-CHIPS (M-CHIPS), is also proposed to handle situations with constant speed. A simulation study assesses CHIPS, highlighting its effectiveness in identifying and controlling cheaters but revealing limitations when cheaters possess all the correct answers; M-CHIPS overcomes this limitation. Furthermore, the method is shown not to be influenced by the cheaters' ability distribution or by the level of correlation between the ability and speed of test-takers. Finally, the method demonstrates flexibility in the choice of significance level and in the transition from fixed-length tests to variable-length ones. The thesis discusses potential applications, including the suitability of the method for multiple-choice tests, the assumptions about the RT distribution and the level of item pre-knowledge. Limitations are also discussed in view of future developments, such as different RT distributions, unusual honest-respondent behaviors, and field testing in real-world scenarios. In summary, CHIPS and M-CHIPS offer real-time cheating detection in CAT, enhancing test security and ability estimation while not penalizing test respondents.
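
To give a flavour of an RT-based interim person-fit check, the sketch below standardizes a test-taker's log response times against item-level reference values and flags unusually fast responding as the test progresses. This is only a generic illustration of the idea; the CHIPS/IPS statistic itself is defined in the thesis and is not reproduced here, and all names and thresholds are assumptions.

```python
import numpy as np

def interim_rt_flag(log_rts, item_means, item_sds, z_crit=-2.33):
    """After each administered item, standardize the cumulative log-RT
    deviation of a test-taker against item-level reference means/SDs and flag
    unusually fast responding (a possible sign of item pre-knowledge).

    Returns the running statistic and a boolean flag per administered item.
    """
    log_rts = np.asarray(log_rts, float)
    z = (log_rts - np.asarray(item_means)) / np.asarray(item_sds)
    running = np.cumsum(z) / np.sqrt(np.arange(1, len(z) + 1))
    return running, running < z_crit   # True where a cheater is suspected
```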