8 results for kernel estimate
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori form of preprocessing. Among the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of kernels in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
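As a rough illustration of the subtree kernel and of the tree-versus-forest computation mentioned in this abstract, the following Python sketch counts complete subtrees through canonical strings. It is not the thesis's algorithm: the tuple-based tree encoding, the function names, and the merged counter (a simple stand-in for the DAG representation, in which each shared subtree is stored once) are all assumptions made for this example, and no decay factor is applied.

```python
from collections import Counter

# Assumed encoding: a tree is a (label, children) pair, where children is a
# tuple of trees. Trees are ordered, and labels must not contain parentheses.


def subtree_counts(tree, counts=None):
    """Count every complete subtree of `tree`, keyed by a canonical string."""
    if counts is None:
        counts = Counter()

    def visit(node):
        label, children = node
        key = "(" + label + "".join(visit(child) for child in children) + ")"
        counts[key] += 1
        return key

    visit(tree)
    return counts


def subtree_kernel(t1, t2):
    """Number of pairs of identical complete subtrees shared by t1 and t2."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(n * c2[key] for key, n in c1.items())


def forest_counts(forest):
    """Merge the subtree counts of a whole forest into a single table.

    Each distinct subtree is stored once with its total frequency, which is
    the same sharing idea as the DAG representation, in simplified form.
    """
    merged = Counter()
    for tree in forest:
        subtree_counts(tree, merged)
    return merged


def tree_vs_forest(tree, merged):
    """Sum of subtree_kernel(tree, t) over all t in the forest, in one pass."""
    counts = subtree_counts(tree)
    return sum(n * merged[key] for key, n in counts.items())


# Hypothetical toy trees, used only to exercise the functions.
np_node = ("NP", (("D", ()), ("N", ())))
t1 = ("S", (np_node, ("VP", (("V", ()),))))
t2 = ("S", (np_node, ("VP", (("V", ()), np_node))))

print(subtree_kernel(t1, t2))                      # 7
print(tree_vs_forest(t1, forest_counts([t1, t2])))  # 13
```

Once the merged table is built, the cost of comparing a new tree against the forest no longer depends on the number of trees in the forest, only on the size of the new tree.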
Abstract:
This thesis is based on the integration of traditional and innovative approaches aimed at improving the identification and characterization of seismogenic normal faults, focusing mainly on the slip-rate estimate as a measure of fault activity. The causative fault of the Mw 6.3 L’Aquila earthquake of April 6, 2009, namely the Paganica - San Demetrio fault system (PSDFS), was used as a test site. We developed a multidisciplinary and scale-based strategy consisting of paleoseismological investigations, detailed geomorphological and geological field studies, as well as shallow geophysical imaging and an innovative application of physical property measurements. We produced a detailed geomorphological and geological map of the PSDFS, defining its tectonic style, arrangement, kinematics, extent, geometry and internal complexities. The PSDFS is a 19 km-long tectonic structure, characterized by a complex structural setting and arranged in two main sectors: the Paganica sector to the NW, characterized by a narrow deformation zone, and the San Demetrio sector to the SE, where the strain is accommodated by several tectonic structures, exhuming and dissecting a wide Quaternary basin, suggesting the occurrence of strain migration through time. The integration of all the fault displacement data and age constraints (radiocarbon dating, optically stimulated luminescence (OSL) and tephrochronology) allowed us to calculate an average Quaternary slip rate representative of the PSDFS of 0.27 - 0.48 mm/yr. On the basis of its length (ca. 20 km) and slip per event (up to 0.8 m), we also estimated a maximum expected magnitude of 6.3-6.8 for this fault. All these results have significant implications for the surface faulting hazard in the area and may also contribute to the understanding of the seismic behavior of the PSDFS and of the local seismic hazard.
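A slip rate of this kind is, at its simplest, a dated displacement divided by the age of the displaced marker. The Python sketch below shows that arithmetic and how bounds on offset and age translate into a slip-rate interval. The function names and the input values are hypothetical and are not taken from the thesis.

```python
def slip_rate_mm_per_yr(offset_m, age_yr):
    """Average slip rate in mm/yr from an offset in metres and an age in years."""
    if age_yr <= 0:
        raise ValueError("age_yr must be positive")
    return offset_m * 1000.0 / age_yr


def slip_rate_range(offset_min_m, offset_max_m, age_min_yr, age_max_yr):
    """Slip-rate interval implied by bounds on the offset and on the age.

    The slowest rate pairs the smallest offset with the oldest age; the
    fastest rate pairs the largest offset with the youngest age.
    """
    slowest = slip_rate_mm_per_yr(offset_min_m, age_max_yr)
    fastest = slip_rate_mm_per_yr(offset_max_m, age_min_yr)
    return slowest, fastest


# Hypothetical marker: offset of 8-12 m, dated at 25,000-35,000 years.
low, high = slip_rate_range(8.0, 12.0, 25_000.0, 35_000.0)
print(f"{low:.2f} - {high:.2f} mm/yr")
```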
Abstract:
China is a large country characterized by remarkable growth and distinct regional diversity. Spatial disparity has always been a hot issue, since China has been struggling to follow a balanced growth path but still faces unprecedented pressures and challenges. To better understand the level of inequality, benchmark the spatial distribution of Chinese provinces and municipalities, and estimate the dynamic trajectory of sustainable development in China, I constructed the Composite Index of Regional Development (CIRD) with five sub-pillars/dimensions: the Macroeconomic Index (MEI), Science and Innovation Index (SCI), Environmental Sustainability Index (ESI), Human Capital Index (HCI) and Public Facilities Index (PFI), endeavoring to cover various fields of regional socioeconomic development. Ranking reports on the five sub-dimensions and the aggregated CIRD were provided in order to better measure the degree of development of 31 or 30 Chinese provinces and municipalities over the 13 years from 1998 to 2010, the time interval of three “Five-year Plans”. Further empirical applications of the CIRD focused on clustering and convergence estimation, attempting to fill the gap in quantifying the level of comprehensive regional socioeconomic development and in estimating the dynamic convergence trajectory of regional sustainable development in the long run. On the basis of cluster analysis, four geographically oriented clusters were identified on the map, and club convergence was observed among the Chinese provinces and municipalities based on stochastic kernel density estimation.
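In convergence analysis, a stochastic kernel is usually understood as the conditional density of a region's relative position at a later date given its position at an earlier date; several separate peaks along the diagonal are then read as convergence clubs. The Python sketch below estimates such a conditional density with Gaussian kernels. It is a minimal illustration and not the thesis's procedure: the function names, the fixed bandwidths, and the sample values are assumptions.

```python
import math


def gauss(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """One-dimensional Gaussian kernel density estimate at x, bandwidth h."""
    return sum(gauss((x - xi) / h) for xi in sample) / (len(sample) * h)


def stochastic_kernel(x, y, start, end, hx, hy):
    """Estimated density of the end-of-period value y given start value x.

    `start` and `end` hold each region's index value at the two dates, in the
    same order. The estimate is the joint kernel density divided by the
    marginal kernel density of the start values.
    """
    weights = [gauss((x - xi) / hx) for xi in start]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # x lies too far from every observation
    mix = sum(w * gauss((y - yi) / hy) for w, yi in zip(weights, end))
    return mix / (hy * total)


# Hypothetical relative index values (1.0 = national average) at two dates.
start = [0.5, 0.6, 0.9, 1.0, 1.4]
end = [0.55, 0.7, 0.95, 1.0, 1.3]

# Conditional density of ending near the average, given a start at 0.8.
print(stochastic_kernel(0.8, 1.0, start, end, hx=0.2, hy=0.2))
```

For any fixed start value the estimate integrates to one over y, so each vertical slice of the resulting surface is a proper density. In practice the bandwidths would be chosen by a data-driven rule rather than fixed by hand.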
Abstract:
We have used kinematic models in two Italian regions to reproduce surface interseismic velocities obtained from InSAR and GPS measurements. We have adopted a block modeling (BM) approach to evaluate which fault systems are actively accommodating the ongoing deformation in both areas. For the Umbria-Marche Apennines, we find that the tectonic extension observed by GPS measurements is explained by the active contribution of at least two fault systems, one of which is the Alto Tiberina fault (ATF). We have also estimated the interseismic coupling distribution for the ATF using a 3D surface, and the result shows an interesting correlation between the microseismicity and the uncoupled fault portions. The second area analyzed is the Gargano promontory, for which we have jointly used the available InSAR and GPS velocities. First, we tied the two datasets to the same terrestrial reference frame; then, using a simple dislocation approach, we estimated the fault parameters that best reproduce the available data, obtaining a solution corresponding to the Mattinata fault. Subsequently, we considered both the GPS and InSAR datasets within a BM analysis in order to evaluate whether the Mattinata fault may accommodate the deformation occurring in the central Adriatic due to the relative motion between the North-Adriatic and South-Adriatic plates. We find that the deformation occurring in that region should be accommodated by more than one fault system, which is, however, difficult to detect because of the poor coverage of geodetic measurements offshore of the Gargano promontory. Finally, we have also estimated the interseismic coupling distribution for the Mattinata fault, obtaining a shallow coupling pattern. Both coupling distributions found using the BM approach have been tested by means of checkerboard resolution tests, which demonstrate that the coupling patterns depend on the positions of the geodetic data.
Abstract:
The aim of this work is to present various aspects of the numerical simulation of particle and radiation transport for industrial and environmental protection applications, to enable the analysis of complex physical processes in a fast, reliable, and efficient way. In the first part we deal with the speed-up of the numerical simulation of neutron transport for nuclear reactor core analysis. The convergence properties of the source iteration scheme of the Method of Characteristics applied to heterogeneous structured geometries have been enhanced by means of Boundary Projection Acceleration, enabling the study of 2D and 3D geometries with transport theory without spatial homogenization. The computational performance has been verified with the C5G7 2D and 3D benchmarks, showing an appreciable reduction in iterations and CPU time. The second part is devoted to the study of temperature-dependent elastic scattering of neutrons for heavy isotopes near the thermal zone. A numerical computation of the Doppler convolution of the elastic scattering kernel based on the gas model is presented, for a general energy-dependent cross section and scattering law in the center-of-mass system. The range of integration has been optimized by employing a numerical cutoff, allowing a faster numerical evaluation of the convolution integral. Legendre moments of the transfer kernel are subsequently obtained by direct quadrature, and a numerical analysis of the convergence is presented. In the third part we focus our attention on remote sensing applications of radiative transfer employed to investigate the Earth's cryosphere. The photon transport equation is applied to simulate the reflectivity of glaciers while varying the age of the layer of snow or ice, its thickness, the presence or absence of other underlying layers, and the amount of dust included in the snow, creating a framework able to decipher spectral signals collected by orbiting detectors.
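To make the phrase "Legendre moments obtained by direct quadrature" concrete, the Python sketch below computes the moments f_l = ∫ f(μ) P_l(μ) dμ over [-1, 1] with composite Simpson quadrature. It does not reproduce the thesis's Doppler-broadened kernel: the quadrature rule, the function names, and the linearly anisotropic test distribution are assumptions chosen only because their moments are known in closed form.

```python
def legendre(l, mu):
    """Legendre polynomial P_l(mu), evaluated with Bonnet's recurrence."""
    p_prev, p = 1.0, mu
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * mu * p - n * p_prev) / (n + 1)
    return p


def legendre_moment(f, l, n_intervals=200):
    """Integral of f(mu) * P_l(mu) over [-1, 1] by composite Simpson's rule."""
    if n_intervals % 2:
        n_intervals += 1  # Simpson's rule needs an even number of intervals
    h = 2.0 / n_intervals
    total = 0.0
    for i in range(n_intervals + 1):
        mu = -1.0 + i * h
        if i == 0 or i == n_intervals:
            weight = 1.0
        elif i % 2:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * f(mu) * legendre(l, mu)
    return total * h / 3.0


# Hypothetical angular distribution: linearly anisotropic, normalised to 1.
def angular_distribution(mu):
    return 0.5 * (1.0 + mu)


moments = [legendre_moment(angular_distribution, l) for l in range(4)]
print(moments)  # analytically: 1, 1/3, 0, 0
```

A convergence study of the kind mentioned in the abstract would repeat the calculation with increasing `n_intervals` and check how quickly the higher-order moments stabilise.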
Abstract:
Public awareness that chemical substances are ubiquitous in the environment, can be ingested through the diet and can have various health effects is very high in Europe and Italy. National and international institutions are called upon to provide figures on the magnitude, frequency, and duration of the population's exposure to chemicals, including both natural and anthropogenic substances, whether voluntarily added to consumer goods or accidentally entering the production chains. This thesis focuses broadly on how the exposure of the human population to chemicals can be estimated, with particular attention to the methodological approaches and a specific focus on dietary exposure assessment and biomonitoring. The results obtained in the different studies collected in this thesis show that, when selecting the approach for estimating exposure to chemicals, several aspects must be taken into account: the nature of the chemical substance, the population of interest, whether the objective is to assess chronic or acute exposure, and finally the quality and quantity of the data available, in order to specify and quantify the uncertainty of the estimate.
Abstract:
The first part of this work deals with the solution of the inverse problem in the field of X-ray spectroscopy. An original strategy to solve the inverse problem by using the maximum entropy principle is illustrated. The code UMESTRAT was built to apply the described strategy in a semiautomatic way. The application of UMESTRAT is shown with a computational example. The second part of this work deals with the improvement of the X-ray Boltzmann model, by studying two radiative interactions neglected in the current photon models. First, the characteristic line emission due to Compton ionization is studied. A strategy is developed that allows the evaluation of this contribution for the K, L and M shells of all elements with Z from 11 to 92. The single-shell Compton/photoelectric ratio is evaluated as a function of the primary photon energy. The energy values at which the Compton interaction becomes the prevailing process producing ionization are derived for the considered shells. Finally, a new kernel for the XRF from Compton ionization is introduced. Second, the bremsstrahlung radiative contribution due to the secondary electrons is characterized. The bremsstrahlung radiation is characterized in terms of space, angle and energy, for all elements with Z=1-92 in the energy range 1–150 keV, by using the Monte Carlo code PENELOPE. It is demonstrated that the bremsstrahlung radiative contribution can be well approximated by an isotropic point photon source. A data library comprising the energy distributions of bremsstrahlung is created. A new bremsstrahlung kernel is developed which allows the introduction of this contribution into the modified Boltzmann equation. An example of application to the simulation of a synchrotron experiment is shown.