12 results for statistical techniques

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

70.00%

Publisher:

Abstract:

With the advent of new technologies, it is increasingly easy to obtain data of different kinds from ever more accurate sensors that measure a wide range of physical quantities using different methodologies. Data collection thus becomes progressively more important and takes the form of archiving, cataloging, and online and offline consultation of information. Over time, the amount of data collected can become so large that it contains information that cannot be easily explored manually or with basic statistical techniques. Such Big Data therefore call for more advanced investigation techniques, such as Machine Learning and Deep Learning. This work describes some applications in precision zootechnics, focusing on heat stress suffered by dairy cows. Experimental Italian and German barns were involved in training and testing the Random Forest algorithm, which predicted milk production from the microclimatic conditions of the previous days with satisfactory accuracy. Furthermore, to provide an objective method for identifying production drops relative to the Wood model, typically used as an analytical model of the lactation curve, a Robust Statistics technique was used. Its application to some sample lactations and the results obtained give confidence in the future use of this method.
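The idea of flagging production drops relative to the Wood model can be sketched as follows. This is a minimal illustration, not the thesis's procedure: the Wood curve y(t) = a * t^b * exp(-c * t) is the standard form, but the abstract does not say which robust technique was used, so the median/MAD rule, the threshold `k`, the parameter values, and the synthetic data are all assumptions.

```python
import math
from statistics import median


def wood(t, a, b, c):
    """Wood lactation curve: expected yield on day t, a * t**b * exp(-c * t)."""
    return a * t**b * math.exp(-c * t)


def flag_drops(days, observed, a, b, c, k=3.0):
    """Return the days whose residual lies more than k robust SDs below the median residual.

    Assumption: the robust scale is the median absolute deviation (MAD)
    multiplied by 1.4826, a common stand-in for the standard deviation.
    """
    residuals = [y - wood(t, a, b, c) for t, y in zip(days, observed)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        return []
    return [t for t, r in zip(days, residuals) if (r - center) / scale < -k]


# Hypothetical curve parameters and synthetic observations, for illustration only.
a, b, c = 20.0, 0.2, 0.004
days = list(range(10, 110, 10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1]
observed = [wood(t, a, b, c) + e for t, e in zip(days, noise)]
observed[5] -= 6.0  # simulated production drop on day 60

print(flag_drops(days, observed, a, b, c))
```

Because the median and MAD are barely affected by a single large deviation, the simulated drop does not inflate the scale used to detect it, which is the usual motivation for a robust rule over a mean/standard-deviation one.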

Relevance:

70.00%

Publisher:

Abstract:

Long-term monitoring of acoustical environments is gaining popularity thanks to the considerable amount of scientific and engineering insight that it provides. The increasing interest is due to the constant growth of the storage capacity and computational power available to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques to deal with large databases. Nowadays, the conventional practice of sound level meter measurements limits the global description of a sound scene to an energetic point of view. Indeed, the equivalent continuous level Leq is the main metric used to define an acoustic environment. Finer analyses involve the use of statistical levels. However, acoustic percentiles are based on temporal assumptions, which are not always reliable. A statistical approach, based on the study of the occurrences of sound pressure levels, would bring a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than portions of energy, provides more specific information about the activity carried out during the measurements. The statistical mode of the occurrences can capture typical behaviors of specific kinds of sound sources. The present work proposes an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in manifold contexts. The presented method is based on clustering analysis. Two algorithms, Gaussian Mixture Model and K-means clustering, form the core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts: university lecture halls and offices. 
The proposed method shows robust and reliable results in describing the acoustic scenario and it could represent an important analytical tool for acousticians.
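The contrast between an energetic description (Leq) and the most probable level can be made concrete with a small sketch. It assumes equally spaced sound pressure level samples in dB and a 1 dB histogram bin; the function names and the synthetic data are illustrative, and the clustering step of the thesis (Gaussian Mixture Model, K-means) is not reproduced here.

```python
import math
from collections import Counter


def leq(levels_db):
    """Equivalent continuous level: energetic mean of equally spaced SPL samples, in dB."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def modal_level(levels_db, bin_width=1.0):
    """Most frequently occurring level after binning to bin_width dB."""
    bins = Counter(round(level / bin_width) * bin_width for level in levels_db)
    return bins.most_common(1)[0][0]


# Synthetic scene: a steady 45 dB background with a few loud 75 dB events.
levels = [45.0] * 90 + [75.0] * 10

print(round(leq(levels), 1))
print(modal_level(levels))
```

In this synthetic scene the few loud events dominate the energetic mean, while the mode stays at the background level, which is the kind of difference in perspective the abstract describes.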

Relevance:

60.00%

Publisher:

Abstract:

This thesis is focused on the metabolomic study of human cancer tissues by ex vivo High Resolution-Magic Angle Spinning (HR-MAS) nuclear magnetic resonance (NMR) spectroscopy. This new technique allows for the acquisition of spectra directly on intact tissues (biopsy or surgery), and it has become very important for integrated metabonomics studies. The objective is to identify metabolites that can be used as markers for the discrimination of the different types of cancer, for the grading, and for the assessment of the evolution of the tumour. Furthermore, an attempt was made to recognize metabolites that, although involved in the metabolism of tumoral tissues at low concentration, can be important modulators of neoplastic proliferation. In addition, NMR data were integrated with statistical techniques in order to obtain semi-quantitative information about the metabolite markers. In the case of gliomas, the NMR study was correlated with gene expression of neoplastic tissues. Chapter 1 begins with a general description of a new “omics” discipline, metabolomics. The study of metabolism can contribute significantly to biomedical research and, ultimately, to clinical medical practice. This rapidly developing discipline involves the study of the metabolome: the total repertoire of small molecules present in cells, tissues, organs, and biological fluids. Metabolomic approaches are becoming increasingly popular in disease diagnosis and will play an important role in improving our understanding of cancer mechanisms. Chapter 2 addresses in more detail the basis of NMR spectroscopy, presenting the new HR-MAS NMR tool, which is gaining importance in the examination of tumour tissues and in the assessment of tumour grade. Some advanced chemometric methods were used in an attempt to enhance the interpretation and quantitative information of the HR-MAS NMR data and are presented in Chapter 3. 
Chemometric methods seem to have a high potential in the study of human diseases, as they permit the extraction of new and relevant information from spectroscopic data, allowing a better interpretation of the results. Chapter 4 reports results obtained from HR-MAS NMR analyses performed on different brain tumours: medulloblastoma, meningiomas and gliomas. The medulloblastoma study is a case report of a primitive neuroectodermal tumor (PNET) localised in the cerebellar region by Magnetic Resonance Imaging (MRI) in a 3-year-old child. In vivo single voxel 1H MRS shows high specificity in detecting the main metabolic alterations in the primitive cerebellar lesion, which consist of very high amounts of choline-containing compounds and very low levels of creatine derivatives and N-acetylaspartate. Ex vivo HR-MAS NMR, performed at 9.4 Tesla on the neoplastic specimen collected during surgery, allows the unambiguous identification of several metabolites, giving a more in-depth evaluation of the metabolic pattern of the lesion. The ex vivo HR-MAS NMR spectra show more detail than those obtained in vivo. In addition, the spectroscopic data appear to correlate with some morphological features of the medulloblastoma. The present study shows that ex vivo HR-MAS 1H NMR can substantially improve the clinical potential of in vivo MRS and can be used in conjunction with in vivo spectroscopy for clinical purposes. Three histological subtypes of meningiomas (meningothelial, fibrous and oncocytic) were analysed by both in vivo and ex vivo MRS experiments. The ex vivo HR-MAS investigations are very helpful for the assignment of the in vivo resonances of human meningiomas and for the validation of the quantification procedure of in vivo MR spectra. By using one- and two-dimensional experiments, several metabolites were identified in different histological subtypes of meningiomas. 
The spectroscopic data confirmed the presence of the typical metabolites of these benign neoplasms and, at the same time, that meningiomas with different morphological characteristics have different metabolic profiles, particularly regarding macromolecules and lipids. The profile of total choline metabolites (tCho) and the expression of the Kennedy pathway genes in biopsies of human gliomas were also investigated using HR-MAS NMR and microfluidic genomic cards. 1H HR-MAS spectra allowed the resolution and relative quantification by LCModel of the resonances from choline (Cho), phosphorylcholine (PC) and glycerophosphorylcholine (GPC), the three main components of the combined tCho peak observed in gliomas by in vivo 1H MRS. All glioma biopsies showed an increase in tCho, as calculated from the sum of the Cho, PC and GPC HR-MAS resonances. However, the increase consistently derived from augmented GPC in low-grade gliomas and from increased PC content in high-grade gliomas. This circumstance allowed the unambiguous discrimination of high- and low-grade gliomas by 1H HR-MAS, which could not be achieved by calculating the tCho/Cr ratio commonly used in in vivo 1H MR spectroscopy. The expression of the genes involved in choline metabolism was investigated in the same biopsies. The present findings offer a convenient procedure to classify glioma grade accurately using 1H HR-MAS, providing in addition the genetic background for the alterations of choline metabolism observed in high- and low-grade gliomas. Chapter 5 reports the study on human gastrointestinal tract (stomach and colon) neoplasms. Healthy human gastric mucosa, and the biochemical profile of human gastric adenocarcinoma in comparison with that of healthy gastric mucosa, were analyzed using ex vivo HR-MAS NMR. Healthy human mucosa is mainly characterized by the presence of small metabolites (more than 50 identified) and macromolecules. 
The adenocarcinoma spectra were dominated by signals due to triglycerides, which are usually very low in healthy gastric mucosa. The use of spin-echo experiments enabled us to detect some metabolites in the unhealthy tissues and to determine their variation with respect to the healthy ones. Then, the ex vivo HR-MAS NMR analysis was applied to human gastric tissue to obtain information on the molecular steps involved in gastric carcinogenesis. A microscopic investigation was also carried out in order to identify and locate the lipids in the cellular and extracellular environments. The morphological changes detected by transmission (TEM) and scanning (SEM) electron microscopy were correlated with the metabolic profile of the gastric mucosa in healthy subjects and in subjects with autoimmune atrophic gastritis (AAG), Helicobacter pylori-related gastritis and adenocarcinoma. These ultrastructural studies of AAG and gastric adenocarcinoma revealed intra- and extracellular lipid accumulation associated with severe prenecrotic hypoxia and mitochondrial degeneration. A deep insight into the metabolic profile of healthy and neoplastic human colon tissues was gained using ex vivo HR-MAS NMR spectroscopy in combination with multivariate methods: Principal Component Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLS-DA). The NMR spectra of healthy tissues highlight different metabolic profiles with respect to those of neoplastic and microscopically normal colon specimens (the latter obtained at least 15 cm from the adenocarcinoma). Furthermore, metabolic variations are detected not only for neoplastic tissues with different histological diagnoses, but also for those classified as identical by histological analysis. These findings suggest that the same subclass of colon carcinoma is characterized, to a certain degree, by metabolic heterogeneity. 
The multivariate statistical approach applied to the NMR data is crucial in order to find metabolic markers of the neoplastic state of colon tissues and to correctly classify the samples. Significantly different levels of choline-containing compounds, taurine and myoinositol were observed. Chapter 6 deals with the metabolic profile of normal and tumoral human renal tissues obtained by ex vivo HR-MAS NMR. The spectra of normal human cortex and medulla show the presence of differently distributed osmolytes as markers of physiological renal condition. The marked decrease or disappearance of these metabolites and the high lipid content (triglycerides and cholesteryl esters) are typical of clear cell renal carcinoma (RCC), while papillary RCC is characterized by the absence of lipids and very high amounts of taurine. This research is a contribution to the biochemical classification of renal neoplastic pathologies, especially RCCs, which can be evaluated by in vivo MRS for clinical purposes. Moreover, these data help to gain a better knowledge of the molecular processes involved in the onset of renal carcinogenesis.

Relevance:

60.00%

Publisher:

Abstract:

This thesis focuses on the X-ray study of the inner regions of Active Galactic Nuclei (AGN), in particular on the formation of high-velocity winds by the accretion disk itself. Constraining the physical parameters of AGN winds is of paramount importance both for understanding the physics of the accretion/ejection flow onto supermassive black holes (SMBHs) and for quantifying the amount of feedback between the SMBH and its environment across cosmic time. The sources selected for the present study are BAL, mini-BAL, and NAL QSOs, known to host high-velocity winds associated with the AGN nuclear regions. Observationally, a three-fold strategy has been adopted:
- substantial samples of distant sources have been analyzed through spectral, photometric, and statistical techniques, to gain insights into their mean properties as a population;
- a moderately sized sample of bright sources has been studied through detailed X-ray spectral analysis, to give a first flavor of the general spectral properties of these sources, also from a temporally resolved point of view;
- the best nearby candidate has been thoroughly studied using the most sophisticated spectral analysis techniques applied to a large dataset with a high S/N ratio, to understand the details of the physics of its accretion/ejection flow.
This thesis has been developed through three main channels:
- [Archival Studies]: the XMM-Newton public archival data have been extensively used to analyze both a large sample of distant BAL QSOs and several individual bright sources, either BAL, mini-BAL, or NAL QSOs.
- [New Observational Campaign]: I proposed and was awarded new X-ray pointings of the mini-BAL QSOs PG 1126-041 and PG 1351+640 during the XMM-Newton AO-7 and AO-8. These produced the biggest X-ray observational campaign ever made on a mini-BAL QSO (PG 1126-041), including the longest exposure so far. Thanks to the exceptional dataset, a wealth of information has been obtained on both the intrinsic continuum and the complex reprocessing media located in the inner regions of this AGN. Furthermore, the field of temporally resolved X-ray spectral analysis has finally been opened for mini-BAL QSOs.
- [Theoretical Studies]: some issues concerning the connection between theories and observations of AGN accretion disk winds have been investigated through theoretical arguments and studies of synthetic absorption line profiles.

Relevance:

60.00%

Publisher:

Abstract:

This thesis describes several studies on the development of physical analysis methods coupled with multivariate statistical techniques to assess the quality and authenticity of vegetable oils and dairy products. The use of physical instruments reduces the cost and time required by classical analyses and, at the same time, can provide a different set of information concerning both the quality and the authenticity of products. For such methods to work well, robust statistical models must be built from datasets that are correctly collected and representative of the field of application. In this thesis, vegetable oils and some types of cheese were analyzed (in particular pecorino cheeses in two research studies and Parmigiano-Reggiano in another). Several analytical instruments (physical methods) were used, in particular spectroscopy, differential thermal analysis and the electronic nose, in addition to traditional separation methods. The data obtained from the analyses were processed using several statistical techniques, mainly partial least squares, multiple linear regression and linear discriminant analysis.

Relevance:

60.00%

Publisher:

Abstract:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured text in natural languages, research on text mining is currently very active and deals with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a sizable training set and notable computational effort. Methods for cross-domain text categorization have been proposed, making it possible to leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. 
Results show that classification accuracy still requires improvement, but models generated from one domain can be effectively reused in a different one.
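The first contribution, nearest centroid classification with category profiles adapted iteratively to the unknown domain, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: documents are plain term-count vectors, similarity is cosine, and each iteration rebuilds the profiles from the target documents currently assigned to each category. The abstract does not give the actual profile construction or update rule, so the function names, the update step and the toy data are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def centroid(vectors):
    """Unit-length mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return normalize([sum(v[i] for v in vectors) / len(vectors) for i in range(dim)])


def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def classify(docs, centroids):
    """Assign each document to the label of its most similar centroid."""
    return [max(centroids, key=lambda label: cosine(doc, centroids[label])) for doc in docs]


def adapt(source_docs, source_labels, target_docs, max_iters=10):
    """Build profiles on the labeled source domain, then refit them on the target domain."""
    source_docs = [normalize(d) for d in source_docs]
    target_docs = [normalize(d) for d in target_docs]
    centroids = {
        label: centroid([d for d, l in zip(source_docs, source_labels) if l == label])
        for label in sorted(set(source_labels))
    }
    predicted = classify(target_docs, centroids)
    for _ in range(max_iters):
        # Assumed update rule: recompute each profile from its current target members.
        for label in centroids:
            members = [d for d, p in zip(target_docs, predicted) if p == label]
            if members:
                centroids[label] = centroid(members)
        updated = classify(target_docs, centroids)
        if updated == predicted:  # assignments are stable, stop iterating
            break
        predicted = updated
    return predicted


# Toy term-count vectors over four hypothetical terms.
source_docs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 3, 1], [0, 0, 2, 2]]
source_labels = ["sports", "sports", "politics", "politics"]
target_docs = [[0, 4, 1, 0], [0, 1, 1, 4]]

print(adapt(source_docs, source_labels, target_docs))
```

The only tunable quantity in this sketch is the iteration cap, which loosely mirrors the abstract's point that the approach is conceptually simple and has easily tuned parameters.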

Relevance:

60.00%

Publisher:

Abstract:

Today’s data are increasingly complex, and classical statistical techniques need increasingly refined mathematical tools to be able to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of transformation, and by all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called the merge tree. To analyze sets of merge trees, we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees, and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.

Relevance:

40.00%

Publisher:

Abstract:

The present work is devoted to assessing the physics of energy fluxes in the space of scales and in the physical space of wall-turbulent flows. The generalized Kolmogorov equation will be applied to DNS data of a turbulent channel flow in order to describe the paths of the energy fluxes from production to dissipation in the augmented space of wall-turbulent flows. This multidimensional description will be shown to be crucial to understanding the formation and sustainment of the turbulent fluctuations fed by the energy fluxes coming from the near-wall production region. An unexpected behavior of the energy fluxes emerges from this analysis, consisting of spiral-like paths in the combined physical/scale space where the controversial reverse energy cascade plays a central role. The observed behavior conflicts with the classical notion of the Richardson/Kolmogorov energy cascade and may have strong repercussions on both theoretical and modeling approaches to wall turbulence. To this end, a new relation stating the leading physical processes governing the energy transfer in wall turbulence is suggested and shown to be able to capture most of the rich dynamics of the shear-dominated region of the flow. Two dynamical processes are identified as driving mechanisms for the fluxes, one in the near-wall region and a second one further away from the wall. The former, stronger one is related to the dynamics involved in the near-wall turbulence regeneration cycle. The second suggests an outer self-sustaining mechanism which is asymptotically expected to take place in the log-layer and could explain the debated mixed inner/outer scaling of the near-wall statistics. The same approach is applied for the first time to a filtered velocity field. A generalized Kolmogorov equation specialized for a filtered velocity field is derived and discussed. 
The results will show what effects the subgrid scales have on the resolved motion in both physical and scale space, singling out the prominent role of the filter length compared to the cross-over scale between production-dominated scales and the inertial range, lc, and the reverse energy cascade region, lb. The systematic characterization of the resolved and subgrid physics as a function of the filter scale and of the wall distance will be shown to be instrumental for a correct use of LES models in the simulation of wall-turbulent flows. Taking inspiration from the new relation for the energy transfer in wall turbulence, a new class of LES models will also be proposed. Finally, the generalized Kolmogorov equation specialized for filtered velocity fields will be shown to be a helpful statistical tool for the assessment of LES models and for the development of new ones. As an example, some classical purely dissipative eddy viscosity models are analyzed via an a priori procedure.

Relevance:

30.00%

Publisher:

Abstract:

Many new Escherichia coli outer membrane proteins have recently been identified by proteomics techniques. However, poorly expressed proteins and proteins expressed only under certain conditions may escape detection when wild-type cells are grown under standard conditions. Here, we have taken a complementary approach where candidate outer membrane proteins have been identified by bioinformatics prediction, cloned and overexpressed, and finally localized by cell fractionation experiments. Out of eight predicted outer membrane proteins, we have confirmed the outer membrane localization for five—YftM, YaiO, YfaZ, CsgF, and YliI—and also provide preliminary data indicating that a sixth—YfaL—may be an outer membrane autotransporter.

Relevance:

30.00%

Publisher:

Abstract:

Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied to the modelling and processing of biomedical signals in two different contexts: ultrasound medical imaging systems, and the analysis and modelling of primate neural activity. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from an ultrasonic transducer, and automatic image segmentation and classification of prostate ultrasound scans. In the former application, a stochastic model of the radio frequency signal measured from an ultrasonic transducer is derived. This model is then employed to develop, within a statistical framework, a regularized deconvolution procedure for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then used to segment the images into regions of interest by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different regions of interest. In the context of neural activity signals, an example of a bio-inspired dynamical network was developed to help in studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in the 7a parietal region of primate monkeys during the execution of learned behavioural tasks.

Relevance:

30.00%

Publisher:

Abstract:

A critical point in the analysis of ground displacement time series is the development of data-driven methods that allow the different sources that generate the observed displacements to be discerned and characterised. A widely used multivariate statistical technique is Principal Component Analysis (PCA), which reduces the dimensionality of the data space while retaining most of the explained variance of the dataset. However, PCA does not perform well in solving the so-called Blind Source Separation (BSS) problem, i.e. in recovering and separating the original sources that generated the observed data. This is mainly due to the assumptions on which PCA relies: it looks for a new Euclidean space where the projected data are uncorrelated. Independent Component Analysis (ICA) is a popular technique adopted to approach this problem. However, the independence condition is not easy to impose, and it is often necessary to introduce some approximations. To work around this problem, I use a variational Bayesian ICA (vbICA) method, which models the probability density function (pdf) of each source signal using a mix of Gaussian distributions. This technique allows for more flexibility in the description of the pdf of the sources, giving a more reliable estimate of them. Here I present the application of the vbICA technique to GPS position time series. First, I use vbICA on synthetic data that simulate a seismic cycle (interseismic + coseismic + postseismic + seasonal + noise) and a volcanic source, and I study the ability of the algorithm to recover the original (known) sources of deformation. Second, I apply vbICA to different tectonically active scenarios, such as the 2009 L'Aquila (central Italy) earthquake, the 2012 Emilia (northern Italy) seismic sequence, and the 2006 Guerrero (Mexico) Slow Slip Event (SSE).
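The distinction drawn above between PCA, ICA and the mixture-of-Gaussians source model can be written compactly. The notation is an assumption introduced for illustration (M observed time series over T epochs, L sources, K_i Gaussian components for source i); the abstract does not fix symbols or the noise model.

```latex
% Blind source separation: observations as a linear mixture of unknown sources plus noise
X = A\,S + N, \qquad
X \in \mathbb{R}^{M \times T},\quad A \in \mathbb{R}^{M \times L},\quad S \in \mathbb{R}^{L \times T}

% PCA only requires the (centered) recovered components to be uncorrelated
\mathbb{E}[s_i s_j] = 0 \qquad (i \neq j)

% ICA asks for the stronger condition of statistical independence
p(s_1, \dots, s_L) = \prod_{i=1}^{L} p(s_i)

% Mixture-of-Gaussians model for the pdf of each source, as described for vbICA
p(s_i) = \sum_{k=1}^{K_i} \pi_{ik}\, \mathcal{N}\!\left(s_i \mid \mu_{ik}, \sigma_{ik}^{2}\right),
\qquad \sum_{k=1}^{K_i} \pi_{ik} = 1
```

Uncorrelatedness constrains only second-order statistics, whereas independence constrains the full joint density, which is why decorrelation alone is generally not enough to separate the sources.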

Relevance:

30.00%

Publisher:

Abstract:

The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independent and identically distributed) hypothesis on the noise elements and the finite rank regime. The first one has been present since the birth of spin-glasses. The second one instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establishing the information-theoretical limits in the reconstruction of a fixed low-rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with an inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 we instead study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.
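For orientation, the two baseline models named in the abstract are commonly written as follows. This is the textbook form under assumed conventions (spins in {-1, +1}, unit-variance couplings, a rank-one spike with signal-to-noise ratio λ); normalizations and the treatment of the diagonal vary between references, and the thesis may use different ones.

```latex
% Sherrington-Kirkpatrick Hamiltonian with i.i.d. Gaussian couplings
H_N(\sigma) = -\frac{1}{\sqrt{N}} \sum_{1 \le i < j \le N} J_{ij}\, \sigma_i \sigma_j,
\qquad \sigma \in \{-1, +1\}^N, \quad J_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)

% Rank-one spiked Wigner model: a low-rank signal observed through additive Gaussian noise
Y = \sqrt{\frac{\lambda}{N}}\; x x^{\top} + W,
\qquad W_{ij} = W_{ji} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1) \quad (i < j)
```

Replacing the single noise variance with an index-dependent one gives the inhomogeneous variance profile that chapters 2 and 3 are described as studying.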