4 resultados para multi-class classification
em AMS Tesi di Laurea - Alm@DL - Università di Bologna
Resumo:
In this thesis we address a multi-label hierarchical text classification problem in a low-resource setting and explore different approaches to identify the best one for our case. The goal is to train a model that classifies English school exercises according to a hierarchical taxonomy with few labeled data. The experiments made in this work employ different machine learning models and text representation techniques: CatBoost with tf-idf features, classifiers based on pre-trained models (mBERT, LASER), and SetFit, a framework for few-shot text classification. SetFit proved to be the most promising approach, achieving better performance when during training only a few labeled examples per class are available. However, this thesis does not consider all the hierarchical taxonomy, but only the first two levels: to address classification with the classes at the third level further experiments should be carried out, exploring methods for zero-shot text classification, data augmentation, and strategies to exploit the hierarchical structure of the taxonomy during training.
Resumo:
In this report it was designed an innovative satellite-based monitoring approach applied on the Iraqi Marshlands to survey the extent and distribution of marshland re-flooding and assess the development of wetland vegetation cover. The study, conducted in collaboration with MEEO Srl , makes use of images collected from the sensor (A)ATSR onboard ESA ENVISAT Satellite to collect data at multi-temporal scales and an analysis was adopted to observe the evolution of marshland re-flooding. The methodology uses a multi-temporal pixel-based approach based on classification maps produced by the classification tool SOIL MAPPER ®. The catalogue of the classification maps is available as web service through the Service Support Environment Portal (SSE, supported by ESA). The inundation of the Iraqi marshlands, which has been continuous since April 2003, is characterized by a high degree of variability, ad-hoc interventions and uncertainty. Given the security constraints and vastness of the Iraqi marshlands, as well as cost-effectiveness considerations, satellite remote sensing was the only viable tool to observe the changes taking place on a continuous basis. The proposed system (ALCS – AATSR LAND CLASSIFICATION SYSTEM) avoids the direct use of the (A)ATSR images and foresees the application of LULCC evolution models directly to „stock‟ of classified maps. This approach is made possible by the availability of a 13 year classified image database, conceived and implemented in the CARD project (http://earth.esa.int/rtd/Projects/#CARD).The approach here presented evolves toward an innovative, efficient and fast method to exploit the potentiality of multi-temporal LULCC analysis of (A)ATSR images. The two main objectives of this work are both linked to a sort of assessment: the first is to assessing the ability of modeling with the web-application ALCS using image-based AATSR classified with SOIL MAPPER ® and the second is to evaluate the magnitude, the character and the extension of wetland rehabilitation.
Resumo:
Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.
Resumo:
Within the classification of orbits in axisymmetric stellar systems, we present a new algorithm able to automatically classify the orbits according to their nature. The algorithm involves the application of the correlation integral method to the surface of section of the orbit; fitting the cumulative distribution function built with the consequents in the surface of section of the orbit, we can obtain the value of its logarithmic slope m which is directly related to the orbit’s nature: for slopes m ≈ 1 we expect the orbit to be regular, for slopes m ≈ 2 we expect it to be chaotic. With this method we have a fast and reliable way to classify orbits and, furthermore, we provide an analytical expression of the probability that an orbit is regular or chaotic given the logarithmic slope m of its correlation integral. Although this method works statistically well, the underlying algorithm can fail in some cases, misclassifying individual orbits under some peculiar circumstances. The performance of the algorithm benefits from a rich sampling of the traces of the SoS, which can be obtained with long numerical integration of orbits. Finally we note that the algorithm does not differentiate between the subtypes of regular orbits: resonantly trapped and untrapped orbits. Such distinction would be a useful feature, which we leave for future work. Since the result of the analysis is a probability linked to a Gaussian distribution, for the very definition of distribution, some orbits even if they have a certain nature are classified as belonging to the opposite class and create the probabilistic tails of the distribution. So while the method produces fair statistical results, it lacks in absolute classification precision.