4 resultados para Data-representation

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a non linear technique to invert strong motion records with the aim of obtaining the final slip and rupture velocity distributions on the fault plane. In this thesis, the ground motion simulation is obtained evaluating the representation integral in the frequency. The Green’s tractions are computed using the discrete wave-number integration technique that provides the full wave-field in a 1D layered propagation medium. The representation integral is computed through a finite elements technique, based on a Delaunay’s triangulation on the fault plane. The rupture velocity is defined on a coarser regular grid and rupture times are computed by integration of the eikonal equation. For the inversion, the slip distribution is parameterized by 2D overlapping Gaussian functions, which can easily relate the spectrum of the possible solutions with the minimum resolvable wavelength, related to source-station distribution and data processing. The inverse problem is solved by a two-step procedure aimed at separating the computation of the rupture velocity from the evaluation of the slip distribution, the latter being a linear problem, when the rupture velocity is fixed. The non-linear step is solved by optimization of an L2 misfit function between synthetic and real seismograms, and solution is searched by the use of the Neighbourhood Algorithm. The conjugate gradient method is used to solve the linear step instead. The developed methodology has been applied to the M7.2, Iwate Nairiku Miyagi, Japan, earthquake. The estimated magnitude seismic moment is 2.6326 dyne∙cm that corresponds to a moment magnitude MW 6.9 while the mean the rupture velocity is 2.0 km/s. A large slip patch extends from the hypocenter to the southern shallow part of the fault plane. A second relatively large slip patch is found in the northern shallow part. Finally, we gave a quantitative estimation of errors associates with the parameters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this thesis is to apply multilevel regression model in context of household surveys. Hierarchical structure in this type of data is characterized by many small groups. In last years comparative and multilevel analysis in the field of perceived health have grown in size. The purpose of this thesis is to develop a multilevel analysis with three level of hierarchy for Physical Component Summary outcome to: evaluate magnitude of within and between variance at each level (individual, household and municipality); explore which covariates affect on perceived physical health at each level; compare model-based and design-based approach in order to establish informativeness of sampling design; estimate a quantile regression for hierarchical data. The target population are the Italian residents aged 18 years and older. Our study shows a high degree of homogeneity within level 1 units belonging from the same group, with an intraclass correlation of 27% in a level-2 null model. Almost all variance is explained by level 1 covariates. In fact, in our model the explanatory variables having more impact on the outcome are disability, unable to work, age and chronic diseases (18 pathologies). An additional analysis are performed by using novel procedure of analysis :"Linear Quantile Mixed Model", named "Multilevel Linear Quantile Regression", estimate. This give us the possibility to describe more generally the conditional distribution of the response through the estimation of its quantiles, while accounting for the dependence among the observations. This has represented a great advantage of our models with respect to classic multilevel regression. The median regression with random effects reveals to be more efficient than the mean regression in representation of the outcome central tendency. A more detailed analysis of the conditional distribution of the response on other quantiles highlighted a differential effect of some covariate along the distribution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Assessment of the integrity of structural components is of great importance for aerospace systems, land and marine transportation, civil infrastructures and other biological and mechanical applications. Guided waves (GWs) based inspections are an attractive mean for structural health monitoring. In this thesis, the study and development of techniques for GW ultrasound signal analysis and compression in the context of non-destructive testing of structures will be presented. In guided wave inspections, it is necessary to address the problem of the dispersion compensation. A signal processing approach based on frequency warping was adopted. Such operator maps the frequencies axis through a function derived by the group velocity of the test material and it is used to remove the dependence on the travelled distance from the acquired signals. Such processing strategy was fruitfully applied for impact location and damage localization tasks in composite and aluminum panels. It has been shown that, basing on this processing tool, low power embedded system for GW structural monitoring can be implemented. Finally, a new procedure based on Compressive Sensing has been developed and applied for data reduction. Such procedure has also a beneficial effect in enhancing the accuracy of structural defects localization. This algorithm uses the convolutive model of the propagation of ultrasonic guided waves which takes advantage of a sparse signal representation in the warped frequency domain. The recovery from the compressed samples is based on an alternating minimization procedure which achieves both an accurate reconstruction of the ultrasonic signal and a precise estimation of waves time of flight. Such information is used to feed hyperbolic or elliptic localization procedures, for accurate impact or damage localization.