3 results for Classification Tree Pruning
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Cesarean Delivery (CD) rates are rising in many parts of the world. To define strategies for reducing them, it is important to explore the role of clinical and organizational factors. This thesis aims to describe contemporary CD practice and to study clinical and organizational variables as determinants of CD in all women who gave birth between 2005 and June 2010 in the Emilia Romagna region (Italy). All hospital discharge abstracts of women who delivered between 2005 and mid-2010 in the region were selected and linked with birth certificates. In addition to descriptive statistics, multilevel Poisson regression models and a classification tree were used to study the role of clinical and organizational variables (teaching or non-teaching hospital, birth volumes, time and day of delivery). Substantial inter-hospital variability in CD rate was found, and it was only partially explained by the variables considered. The most important risk factors for CD were previous CD (RR 4.95; 95% CI: 4.85-5.05), cord prolapse (RR 3.51; 95% CI: 2.96-4.16), and malposition/malpresentation (RR 2.72; 95% CI: 2.66-2.77). Delivery between 7 pm and 7 am and on non-working days was protective against CD in all subgroups, including those with a small number of elective CDs, while delivery at a teaching hospital and birth volumes were not statistically significant risk factors. The classification tree shows that previous CD and malposition/malpresentation are the most important variables discriminating between high and low risk of CD. These results indicate that other factors not considered here might explain CD variability, and they do not provide clear evidence that small hospitals perform poorly in terms of CD rate. Some strategies to reduce CD could be found by focusing on the differences in delivery practice between day and night and between working-day and non-working-day deliveries.
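As a reading aid for the relative risks (RR) and 95% confidence intervals quoted above, the following minimal sketch computes an unadjusted RR from a 2x2 table with a log-scale Wald interval. It is not the multilevel Poisson regression used in the thesis, which adjusts for covariates and hospital-level clustering; the function name `relative_risk` and all counts are hypothetical and chosen only for illustration.

```python
import math


def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Unadjusted relative risk with a Wald confidence interval on the log scale."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts, NOT taken from the thesis:
# 800 CDs among 1000 exposed women, 2000 CDs among 10000 unexposed women.
rr, lower, upper = relative_risk(800, 1000, 2000, 10000)
print(f"RR {rr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1, as for the three risk factors listed in the abstract, indicates a statistically significant association at the 5% level.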
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori preprocessing step. Among the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: a kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of kernels in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matching between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
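To make the idea of counting shared substructures concrete, here is a minimal sketch in the spirit of a subtree kernel: it counts pairs of identical complete subtrees (a node together with all its descendants) in two ordered, labelled trees. The tree representation `(label, [children])`, the function names, and the example trees are assumptions made for illustration; this is not the kernel proposed in the thesis, nor its DAG-based speed-up.

```python
from collections import Counter


def subtree_counts(tree):
    """Count every complete subtree (a node plus all of its descendants)."""
    counts = Counter()

    def visit(node):
        label, children = node
        # Hashable encoding of the ordered subtree rooted at this node.
        encoding = (label, tuple(visit(child) for child in children))
        counts[encoding] += 1
        return encoding

    visit(tree)
    return counts


def subtree_kernel(tree_a, tree_b):
    """Number of pairs of identical complete subtrees shared by the two trees."""
    counts_a = subtree_counts(tree_a)
    counts_b = subtree_counts(tree_b)
    return sum(n * counts_b[enc] for enc, n in counts_a.items())


# Hypothetical example trees, written as (label, [children]).
np_ = ("NP", [("D", []), ("N", [])])
t1 = ("S", [np_, ("VP", [("V", []), np_])])
t2 = ("S", [np_, ("VP", [("V", [])])])
t3 = ("X", [("Y", [])])  # shares no labels with t1

print(subtree_kernel(t1, t2))  # 7
print(subtree_kernel(t1, t1))  # 15
print(subtree_kernel(t1, t3))  # 0
```

The zero value for `t1` and `t3` illustrates the sparsity problem described above: when labels come from a large domain, most pairs of trees share no subtree at all. Storing each distinct subtree encoding once with a count is only a loose analogue of the compact representation of shared substructures mentioned in the abstract.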
Abstract:
The introduction of dwarfing rootstocks in apple crops has led to a new concept of intensive planting systems aimed at producing early high yields and a return on the high initial investment. Although yield is an important aspect for the grower, the consumer has become demanding with regard to fruit quality and is generally attracted by appearance. To fulfil the consumer’s expectations, the grower may need to choose a proper training system along with an ideal pruning technique, which ensure good light distribution in different parts of the canopy and marketable fruit quality in terms of size and skin colour. Although these aspects are important, the fruits might not reach the proper ripening stage within the canopy because they are often heterogeneous. To describe the variability present in a tree, a software package (PlantToon®) was used to recreate the tree architecture in 3D in the two training systems. The ripening stage of each fruit was determined using a non-destructive device (DA-Meter), thus allowing the fruit ripening variability to be estimated. This study deals with some of the main parameters that can influence fruit quality and ripening stage within the canopy, and with orchard management techniques that can improve fruit ripening homogeneity. Significant differences in fruit quality were found within the canopies due to fruit position, flowering time and bud wood age. Bi-axis appeared to be suitable for high-density planting, even though the fruit quality traits were often similar to those obtained with a Slender Spindle, suggesting similar fruit light availability within the canopies. Crop load was confirmed to be an important factor that influenced fruit quality, as much as the interesting innovative pruning method “Click”, in intensive planting systems.