820 resultados para Data classification


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we discuss the temporal aspects of indexing and classification in information systems. Basing this discussion off of the three sources of research of scheme change: of indexing: (1) analytical research on the types of scheme change and (2) empirical data on scheme change in systems and (3) evidence of cataloguer decision-making in the context of scheme change. From this general discussion we propose two constructs along which we might craft metrics to measure scheme change: collocative integrity and semantic gravity. The paper closes with a discussion of these constructs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Résumé : La texture dispose d’un bon potentiel discriminant qui complète celui des paramètres radiométriques dans le processus de classification d’image. L’indice Compact Texture Unit (CTU) multibande, récemment mis au point par Safia et He (2014), permet d’extraire la texture sur plusieurs bandes à la fois, donc de tirer parti d’un surcroît d’informations ignorées jusqu’ici dans les analyses texturales traditionnelles : l’interdépendance entre les bandes. Toutefois, ce nouvel outil n’a pas encore été testé sur des images multisources, usage qui peut se révéler d’un grand intérêt quand on considère par exemple toute la richesse texturale que le radar peut apporter en supplément à l’optique, par combinaison de données. Cette étude permet donc de compléter la validation initiée par Safia (2014) en appliquant le CTU sur un couple d’images optique-radar. L’analyse texturale de ce jeu de données a permis de générer une image en « texture couleur ». Ces bandes texturales créées sont à nouveau combinées avec les bandes initiales de l’optique, avant d’être intégrées dans un processus de classification de l’occupation du sol sous eCognition. Le même procédé de classification (mais sans CTU) est appliqué respectivement sur : la donnée Optique, puis le Radar, et enfin la combinaison Optique-Radar. Par ailleurs le CTU généré sur l’Optique uniquement (monosource) est comparé à celui dérivant du couple Optique-Radar (multisources). L’analyse du pouvoir séparateur de ces différentes bandes à partir d’histogrammes, ainsi que l’outil matrice de confusion, permet de confronter la performance de ces différents cas de figure et paramètres utilisés. Ces éléments de comparaison présentent le CTU, et notamment le CTU multisources, comme le critère le plus discriminant ; sa présence rajoute de la variabilité dans l’image permettant ainsi une segmentation plus nette, une classification à la fois plus détaillée et plus performante. En effet, la précision passe de 0.5 avec l’image Optique à 0.74 pour l’image CTU, alors que la confusion diminue en passant de 0.30 (dans l’Optique) à 0.02 (dans le CTU).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

If marine management policies and actions are to achieve long-term sustainable use and management of the marine environment and its resources, they need to be informed by data giving the spatial distribution of seafloor habitats over large areas. Broad-scale seafloor habitat mapping is an approachwhich has the benefit of producing maps covering large extents at a reasonable cost. This approach was first investigated by Roff et al. (2003), who, acknowledging that benthic communities are strongly influenced by the physical characteristics of the seafloor, proposed overlaying mapped physical variables using a geographic information system (GIS) to produce an integrated map of the physical characteristics of the seafloor. In Europe the method was adapted to the marine section of the EUNIS (European Nature Information System) classification of habitat types under the MESH project, andwas applied at an operational level in 2011 under the EUSeaMap project. The present study compiled GIS layers for fundamental physical parameters in the northeast Atlantic, including (i) bathymetry, (ii) substrate type, (iii) light penetration depth and (iv) exposure to near-seafloor currents andwave action. Based on analyses of biological occurrences, significant thresholds were fine-tuned for each of the abiotic layers and later used in multi-criteria raster algebra for the integration of the layers into a seafloor habitat map. The final result was a harmonised broad-scale seafloor habitat map with a 250 m pixel size covering four extensive areas, i.e. Ireland, the Bay of Biscay, the Iberian Peninsula and the Azores. The map provided the first comprehensive perception of habitat spatial distribution for the Iberian Peninsula and the Azores, and fed into the initiative for a pan- European map initiated by the EUSeaMap project for Baltic, North, Celtic and Mediterranean seas.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in face of global changes. In a field work realized in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR), and data mining techniques as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data is scarce or non-existent, such as in the Caatinga

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Monitoring agricultural crops constitutes a vital task for the general understanding of land use spatio-temporal dynamics. This paper presents an approach for the enhancement of current crop monitoring capabilities on a regional scale, in order to allow for the analysis of environmental and socio-economic drivers and impacts of agricultural land use. This work discusses the advantages and current limitations of using 250m VI data from the Moderate Resolution Imaging Spectroradiometer (MODIS) for this purpose, with emphasis in the difficulty of correctly analyzing pixels whose temporal responses are disturbed due to certain sources of interference such as mixed or heterogeneous land cover. It is shown that the influence of noisy or disturbed pixels can be minimized, and a much more consistent and useful result can be attained, if individual agricultural fields are identified and each field's pixels are analyzed in a collective manner. As such, a method is proposed that makes use of image segmentation techniques based on MODIS temporal information in order to identify portions of the study area that agree with actual agricultural field borders. The pixels of each portion or segment are then analyzed individually in order to estimate the reliability of the temporal signal observed and the consequent relevance of any estimation of land use from that data. The proposed method was applied in the state of Mato Grosso, in mid-western Brazil, where extensive ground truth data was available. Experiments were carried out using several supervised classification algorithms as well as different subsets of land cover classes, in order to test the methodology in a comprehensive way. Results show that the proposed method is capable of consistently improving classification results not only in terms of overall accuracy but also qualitatively by allowing a better understanding of the land use patterns detected. It thus provides a practical and straightforward procedure for enhancing crop-mapping capabilities using temporal series of moderate resolution remote sensing data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Collecting ground truth data is an important step to be accomplished before performing a supervised classification. However, its quality depends on human, financial and time ressources. It is then important to apply a validation process to assess the reliability of the acquired data. In this study, agricultural infomation was collected in the Brazilian Amazonian State of Mato Grosso in order to map crop expansion based on MODIS EVI temporal profiles. The field work was carried out through interviews for the years 2005-2006 and 2006-2007. This work presents a methodology to validate the training data quality and determine the optimal sample to be used according to the classifier employed. The technique is based on the detection of outlier pixels for each class and is carried out by computing Mahalanobis distances for each pixel. The higher the distance, the further the pixel is from the class centre. Preliminary observations through variation coefficent validate the efficiency of the technique to detect outliers. Then, various subsamples are defined by applying different thresholds to exclude outlier pixels from the classification process. The classification results prove the robustness of the Maximum Likelihood and Spectral Angle Mapper classifiers. Indeed, those classifiers were insensitive to outlier exclusion. On the contrary, the decision tree classifier showed better results when deleting 7.5% of pixels in the training data. The technique managed to detect outliers for all classes. In this study, few outliers were present in the training data, so that the classification quality was not deeply affected by the outliers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In these last years a great effort has been put in the development of new techniques for automatic object classification, also due to the consequences in many applications such as medical imaging or driverless cars. To this end, several mathematical models have been developed from logistic regression to neural networks. A crucial aspect of these so called classification algorithms is the use of algebraic tools to represent and approximate the input data. In this thesis, we examine two different models for image classification based on a particular tensor decomposition named Tensor-Train (TT) decomposition. The use of tensor approaches preserves the multidimensional structure of the data and the neighboring relations among pixels. Furthermore the Tensor-Train, differently from other tensor decompositions, does not suffer from the curse of dimensionality making it an extremely powerful strategy when dealing with high-dimensional data. It also allows data compression when combined with truncation strategies that reduce memory requirements without spoiling classification performance. The first model we propose is based on a direct decomposition of the database by means of the TT decomposition to find basis vectors used to classify a new object. The second model is a tensor dictionary learning model, based on the TT decomposition where the terms of the decomposition are estimated using a proximal alternating linearized minimization algorithm with a spectral stepsize.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The fast development of Information Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage and forecast the city's behavior, it is necessary the analysis of different kinds of data from the most varied dataset acquisition systems. The aim of this research activity in the framework of Data Science and Complex Systems Physics is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. Under this perspective, the governance of mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in Italian cities' historical centers, which will worsen in the next future due to the continuous globalization process. Another critical theme is sustainable mobility, which aims to reduce private transportation means in the cities and improve multimodal mobility. We analyze the statistical properties of urban mobility of Venice, Rimini, and Bologna by using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trips reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties depending on transport means and user's kinds. Finally, we use our results to model and simulate the overall behavior of the cars moving in the Emilia Romagna Region and the pedestrians moving in Venice with software able to replicate in silico the demand for mobility and its dynamic.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Intelligent systems are currently inherent to the society, supporting a synergistic human-machine collaboration. Beyond economical and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, being their interpretability one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep system operation. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty, identifying ideal time periods for detailed software analyses. LP provides deeper semantics interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely, Fuzzy set-Based evolving Model (FBeM), evolving Granular Neural Network (eGNN), and evolving Gaussian Fuzzy Classifier (eGFC), are compared considering the AD problem. The evolving Log Parsing (eLP) method is proposed to approach the automatic parsing applied to system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according with the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Essentially, regarding to AD accuracy, FBeM achieved (85.64+-3.69)%; eGNN reached (96.17+-0.78)%; eGFC obtained (92.48+-1.21)%; and eLP reached (96.05+-1.04)%. Besides being competitive, eLP particularly generates a log grammar, and presents a higher level of model interpretability.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Deep learning methods are extremely promising machine learning tools to analyze neuroimaging data. However, their potential use in clinical settings is limited because of the existing challenges of applying these methods to neuroimaging data. In this study, first a data leakage type caused by slice-level data split that is introduced during training and validation of a 2D CNN is surveyed and a quantitative assessment of the model’s performance overestimation is presented. Second, an interpretable, leakage-fee deep learning software written in a python language with a wide range of options has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data where the cognitive performance of 58 patients measured by five neuropsychological tests is predicted using a multi-input CNN model taking brain image and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, DTI-derived features MD and FA produced the best prediction outcome of the TMT-A score which is consistent with the existing literature. In a second study, an interpretable deep learning system aimed at 1) classifying Alzheimer disease and healthy subjects 2) examining the neural correlates of the disease that causes a cognitive decline in AD patients using CNN visualization tools and 3) highlighting the potential of interpretability techniques to capture a biased deep learning model is developed. Structural magnetic resonance imaging (MRI) data of 200 subjects was used by the proposed CNN model which was trained using a transfer learning-based approach producing a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobe showing the cerebral cortex atrophy were highlighted by the visualization tools.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, there has been exponential growth in using virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy in the literature is discussed and controversial, whereas, in the technological field, it directly influences the degree of reliability perceived in the information system (privacy ‘as trust’). This work aims to protect the right to privacy on personal data (GDPR, 2018) and avoid the loss of sensitive content by exploring sensitive information detection (SID) task. It is grounded on the following research questions: (RQ1) What does sensitive data mean? How to define a personal sensitive information domain? (RQ2) How to create a state-of-the-art model for SID?(RQ3) How to evaluate the model? RQ1 theoretically investigates the concepts of privacy and the ontological state-of-the-art representation of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as an authoritative reference for the definition of the knowledge domain. Concerning RQ2, we investigate two approaches to classify sensitive data: the first - bottom-up - explores automatic learning methods based on transformer networks, the second - top-down - proposes logical-symbolic methods with the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation - RQ3 – we create SPeDaC, a sentence-level labeled resource. This can be used as a benchmark or training in the SID task, filling the gap of a shared resource in this field. If the approach based on artificial neural networks confirms the validity of the direction adopted in the most recent studies on SID, the logical-symbolic approach emerges as the preferred way for the classification of fine-grained personal data categories, thanks to the semantic-grounded tailor modeling it allows. At the same time, the results highlight the strong potential of hybrid architectures in solving automatic tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The dissertation addresses the still not solved challenges concerned with the source-based digital 3D reconstruction, visualisation and documentation in the domain of archaeology, art and architecture history. The emerging BIM methodology and the exchange data format IFC are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, academic object-oriented disciplines, like archaeology, art and architecture history, are acting as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing and the validation of the outcomes is not fulfilled. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study focuses, therefore, on the evaluation of the source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed. In the end, this process will lead to a validation or a correction of the workflow and the initial assumptions, but also (dealing with different hypotheses) to a better definition of the levels of uncertainty.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Workflow activity was the following: Preliminary phase: Identification of 18 Formalin-fixed paraffin embedded (FFPE) samples (9 patients) («matched» 9 AK lesions and 9 SCC lesions). Working on biopsies samples we perform an extraction and RNA analysis with droplet Digital PCR (ddPCR) and we perform the data analysis. Second and final step phase: Evaluation of additional 39 subjects (36 men and 3 women). Results: We perform an evaluation and comparison of the following miRNA: miR-320 (a miRNA involved in apoptosis and cell proliferation control; miR-204, a miRNA involved in cell proliferation in and miRNA-16-5p, a miRNA involved in apoptosis).Conclusion: Our data suggest that there is no significant variation in the expression of the three tested microRNAs between adjacent AK lesions and squamous-cell carcinoma. However, a relevant trend has been observed Furthermore, by evaluating the miRNA expression trend between keratosis and carcinoma of the same patient, it is observed that there is no "uniform trend": for some samples the expression rises for the transition from AK to SCC and viceversa.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, which are random variables defined by their quantile function. In the first part we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications, we demonstrate its competitiveness against state-of-the-art alternatives. In the second part we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data coming from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient and to increase its flexibility we propose different methods to explicitly model the projected univariate distributions. Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interoperability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitive-inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolic integration of sensory-perceptual data with cognitive-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing both effective and interpretable systems, through the capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with deep awareness of its human and societal implications.