8 resultados para data complexity
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Introduction. Postnatal neurogenesis in the hippocampal dentate gyrus, can be modulated by numerous determinants, such as hormones, transmitters and stress. Among the factors positively interfering with neurogenesis, the complexity of the environment appears to play a particularly striking role. Adult mice reared in an enriched environment produce more neurons and exhibit better performance in hippocampus-specific learning tasks. While the effects of complex environments on hippocampal neurogenesis are well documented, there is a lack of information on the effects of living under socio-sensory deprivation conditions. Due to the immaturity of rats and mice at birth, studies dealing with the effects of environmental enrichment on hippocampal neurogenesis were carried out in adult animals, i.e. during a period of relatively low rate of neurogenesis. The impact of environment is likely to be more dramatic during the first postnatal weeks, because at this time granule cell production is remarkably higher than at later phases of development. The aim of the present research was to clarify whether and to what extent isolated or enriched rearing conditions affect hippocampal neurogenesis during the early postnatal period, a time window characterized by a high rate of precursor proliferation and to elucidate the mechanisms underlying these effects. The experimental model chosen for this research was the guinea pig, a precocious rodent, which, at 4-5 days of age can be independent from maternal care. Experimental design. Animals were assigned to a standard (control), an isolated, or an enriched environment a few days after birth (P5-P6). On P14-P17 animals received one daily bromodeoxyuridine (BrdU) injection, to label dividing cells, and were sacrificed either on P18, to evaluate cell proliferation or on P45, to evaluate cell survival and differentiation. Methods. Brain sections were processed for BrdU immunhistochemistry, to quantify the new born and surviving cells. The phenotype of the surviving cells was examined by means of confocal microscopy and immunofluorescent double-labeling for BrdU and either a marker of neurons (NeuN) or a marker of astrocytes (GFAP). Apoptotic cell death was examined with the TUNEL method. Serial sections were processed for immunohistochemistry for i) vimentin, a marker of radial glial cells, ii) BDNF (brain-derived neurotrofic factor), a neurotrophin involved in neuron proliferation/survival, iii) PSA-NCAM (the polysialylated form of the neural cell adhesion molecule), a molecule associated with neuronal migration. Total granule cell number in the dentate gyrus was evaluated by stereological methods, in Nissl-stained sections. Results. Effects of isolation. In P18 isolated animals we found a reduced cell proliferation (-35%) compared to controls and a lower expression of BDNF. Though in absolute terms P45 isolated animals had less surviving cells than controls, they showed no differences in survival rate and phenotype percent distribution compared to controls. Evaluation of the absolute number of surviving cells of each phenotype showed that isolated animals had a reduced number of cells with neuronal phenotype than controls. Looking at the location of the new neurons, we found that while in control animals 76% of them had migrated to the granule cell layer, in isolated animals only 55% of the new neurons had reached this layer. Examination of radial glia cells of P18 and P45 animals by vimentin immunohistochemistry showed that in isolated animals radial glia cells were reduced in density and had less and shorter processes. Granule cell count revealed that isolated animals had less granule cells than controls (-32% at P18 and -42% at P45). Effects of enrichment. In P18 enriched animals there was an increase in cell proliferation (+26%) compared to controls and a higher expression of BDNF. Though in both groups there was a decline in the number of BrdU-positive cells by P45, enriched animals had more surviving cells (+63) and a higher survival rate than controls. No differences were found between control and enriched animals in phenotype percent distribution. Evaluation of the absolute number of cells of each phenotype showed that enriched animals had a larger number of cells of each phenotype than controls. Looking at the location of cells of each phenotype we found that enriched animals had more new neurons in the granule cell layer and more astrocytes and cells with undetermined phenotype in the hilus. Enriched animals had a higher expression of PSA-NCAM in the granule cell layer and hilus Vimentin immunohistochemistry showed that in enriched animals radial glia cells were more numerous and had more processes.. Granule cell count revealed that enriched animals had more granule cells than controls (+37% at P18 and +31% at P45). Discussion. Results show that isolation rearing reduces hippocampal cell proliferation but does not affect cell survival, while enriched rearing increases both cell proliferation and cell survival. Changes in the expression of BDNF are likely to contribute to he effects of environment on precursor cell proliferation. The reduction and increase in final number of granule neurons in isolated and enriched animals, respectively, are attributable to the effects of environment on cell proliferation and survival and not to changes in the differentiation program. As radial glia cells play a pivotal role in neuron guidance to the granule cell layer, the reduced number of radial glia cells in isolated animals and the increased number in enriched animals suggests that the size of radial glia population may change dynamically, in order to match changes in neuron production. The high PSA-NCAM expression in enriched animals may concur to favor the survival of the new neurons by facilitating their migration to the granule cell layer. Conclusions. By using a precocious rodent we could demonstrate that isolated/enriched rearing conditions, at a time window during which intense granule cell proliferation takes place, lead to a notable decrease/increase of total granule cell number. The time-course and magnitude of postnatal granule cell production in guinea pigs are more similar to the human and non-human primate condition than in rats and mice. Translation of current data to humans would imply that exposure of children to environments poor/rich of stimuli may have a notably large impact on dentate neurogenesis and, very likely, on hippocampus dependent memory functions.
Resumo:
Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
Resumo:
During my PhD, starting from the original formulations proposed by Bertrand et al., 2000 and Emolo & Zollo 2005, I developed inversion methods and applied then at different earthquakes. In particular large efforts have been devoted to the study of the model resolution and to the estimation of the model parameter errors. To study the source kinematic characteristics of the Christchurch earthquake we performed a joint inversion of strong-motion, GPS and InSAR data using a non-linear inversion method. Considering the complexity highlighted by superficial deformation data, we adopted a fault model consisting of two partially overlapping segments, with dimensions 15x11 and 7x7 km2, having different faulting styles. This two-fault model allows to better reconstruct the complex shape of the superficial deformation data. The total seismic moment resulting from the joint inversion is 3.0x1025 dyne.cm (Mw = 6.2) with an average rupture velocity of 2.0 km/s. Errors associated with the kinematic model have been estimated of around 20-30 %. The 2009 Aquila sequence was characterized by an intense aftershocks sequence that lasted several months. In this study we applied an inversion method that assumes as data the apparent Source Time Functions (aSTFs), to a Mw 4.0 aftershock of the Aquila sequence. The estimation of aSTFs was obtained using the deconvolution method proposed by Vallée et al., 2004. The inversion results show a heterogeneous slip distribution, characterized by two main slip patches located NW of the hypocenter, and a variable rupture velocity distribution (mean value of 2.5 km/s), showing a rupture front acceleration in between the two high slip zones. Errors of about 20% characterize the final estimated parameters.
Resumo:
The aim of the thesis is to propose a Bayesian estimation through Markov chain Monte Carlo of multidimensional item response theory models for graded responses with complex structures and correlated traits. In particular, this work focuses on the multiunidimensional and the additive underlying latent structures, considering that the first one is widely used and represents a classical approach in multidimensional item response analysis, while the second one is able to reflect the complexity of real interactions between items and respondents. A simulation study is conducted to evaluate the parameter recovery for the proposed models under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that the parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size the parameters of the multiunidimensional and additive graded response models are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories. An application of the proposed models on response data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry is also presented.
Resumo:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.
Resumo:
The advent of omic data production has opened many new perspectives in the quest for modelling complexity in biophysical systems. With the capability of characterizing a complex organism through the patterns of its molecular states, observed at different levels through various omics, a new paradigm of investigation is arising. In this thesis, we investigate the links between perturbations of the human organism, described as the ensemble of crosstalk of its molecular states, and health. Machine learning plays a key role within this picture, both in omic data analysis and model building. We propose and discuss different frameworks developed by the author using machine learning for data reduction, integration, projection on latent features, pattern analysis, classification and clustering of omic data, with a focus on 1H NMR metabolomic spectral data. The aim is to link different levels of omic observations of molecular states, from nanoscale to macroscale, to study perturbations such as diseases and diet interpreted as changes in molecular patterns. The first part of this work focuses on the fingerprinting of diseases, linking cellular and systemic metabolomics with genomic to asses and predict the downstream of perturbations all the way down to the enzymatic network. The second part is a set of frameworks and models, developed with 1H NMR metabolomic at its core, to study the exposure of the human organism to diet and food intake in its full complexity, from epidemiological data analysis to molecular characterization of food structure.
Resumo:
Machine learning is widely adopted to decode multi-variate neural time series, including electroencephalographic (EEG) and single-cell recordings. Recent solutions based on deep learning (DL) outperformed traditional decoders by automatically extracting relevant discriminative features from raw or minimally pre-processed signals. Convolutional Neural Networks (CNNs) have been successfully applied to EEG and are the most common DL-based EEG decoders in the state-of-the-art (SOA). However, the current research is affected by some limitations. SOA CNNs for EEG decoding usually exploit deep and heavy structures with the risk of overfitting small datasets, and architectures are often defined empirically. Furthermore, CNNs are mainly validated by designing within-subject decoders. Crucially, the automatically learned features mainly remain unexplored; conversely, interpreting these features may be of great value to use decoders also as analysis tools, highlighting neural signatures underlying the different decoded brain or behavioral states in a data-driven way. Lastly, SOA DL-based algorithms used to decode single-cell recordings rely on more complex, slower to train and less interpretable networks than CNNs, and the use of CNNs with these signals has not been investigated. This PhD research addresses the previous limitations, with reference to P300 and motor decoding from EEG, and motor decoding from single-neuron activity. CNNs were designed light, compact, and interpretable. Moreover, multiple training strategies were adopted, including transfer learning, which could reduce training times promoting the application of CNNs in practice. Furthermore, CNN-based EEG analyses were proposed to study neural features in the spatial, temporal and frequency domains, and proved to better highlight and enhance relevant neural features related to P300 and motor states than canonical EEG analyses. Remarkably, these analyses could be used, in perspective, to design novel EEG biomarkers for neurological or neurodevelopmental disorders. Lastly, CNNs were developed to decode single-neuron activity, providing a better compromise between performance and model complexity.
Resumo:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key assets which depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. This becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations leads to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, both in the experimental and computational realms. Scientists strive to enforce cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management, proliferation of custom data formats and storage procedures, both in experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limit the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above. Specifically, it talks about developing features for specific classes of advanced materials and use them to train machine learning models and accelerate computational predictions for molecular compounds; developing method for organizing non homogeneous materials data; automate the process of using devices simulations to train machine learning models; dealing with scattered experimental data and use them to discover new patterns.