905 results for Classification Methods


Relevance: 30.00%

Abstract:

Sequential panel selection methods (SPSMs: procedures that sequentially apply conventional panel unit root tests to identify I(0) time series in panels) are increasingly used in the empirical literature. We check the reliability of SPSMs with Monte Carlo simulations that directly generate the individual asymptotic p values to be combined into the panel unit root tests, thereby isolating the classification abilities of the procedures from the small-sample properties of the underlying univariate unit root tests. The simulations consider both independent and cross-dependent individual test statistics. Results suggest that SPSMs may offer advantages over time series tests only under special conditions.
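As a rough illustration of the kind of procedure being evaluated, the sketch below runs a sequential selection loop over a set of individual p values. It assumes a Fisher-type combination test as the panel test and a rule that drops the series with the smallest p value after each rejection. Both choices, and all function names, are assumptions made for illustration and are not taken from the paper.

```python
import math


def chi2_sf_even(x, df):
    """Survival function of a chi-square variable with even df (closed form)."""
    n = df // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_pvalue(pvals):
    """Panel p value from Fisher's combination: -2 * sum(ln p) ~ chi2(2N)."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return chi2_sf_even(stat, 2 * len(pvals))


def sequential_selection(pvals, alpha=0.05):
    """Return the indices classified as I(0) (hypothetical helper).

    While the panel null (all series have a unit root) is rejected,
    the series with the smallest individual p value is labelled I(0)
    and removed, and the panel test is rerun on the remaining series.
    """
    remaining = sorted(range(len(pvals)), key=lambda i: pvals[i])
    stationary = []
    while remaining and fisher_pvalue([pvals[i] for i in remaining]) < alpha:
        stationary.append(remaining.pop(0))
    return stationary
```

For example, `sequential_selection([0.0001, 0.0002, 0.6, 0.8])` returns `[0, 1]`: the panel test rejects twice and then fails to reject on the two remaining series.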

Relevance: 30.00%

Abstract:

Aim: To determine the prevalence and classification of bifid mandibular canals using cone beam computed tomography (CBCT). Methods: The sample comprised 300 CBCT scans obtained from the Radiology and Imaging Department database at São Leopoldo Mandic Dental School, Campinas, SP, Brazil. All images were acquired on a Classic I-Cat® CBCT scanner, with a standardized voxel size of 0.25 mm and a 13 cm field of view (FOV). From an axial slice (0.25 mm), a guiding plane was drawn along the alveolar ridge in order to obtain a cross-section. Results: Among the 300 patients, 188 (62.7%) were female and 112 (37.3%) were male, aged between 13 and 87 years. Changes in the mandibular canal were observed in 90 patients (30.0% of the sample): 51 women (56.7%) and 39 men (43.3%). Regarding affected sides, 32.2% were on the right and 24.5% on the left, with 43.3% bilateral cases. Conclusions: According to the results obtained in this study, a prevalence of 30% of bifid mandibular canals was found, with the most prevalent types classified as B (mesial direction) and bilateral.

Relevance: 30.00%

Abstract:

This paper analyses the advantages and limitations of using the Troll, Hargreaves and modified Thornthwaite approaches for the demarcation of the semi-arid tropics. Data from India, Africa, Brazil, Australia and Thailand were used to compare these three methods. The modified Thornthwaite approach provided the most relevant agriculturally oriented demarcation of the semi-arid tropics. This method is not only simple, but also uses input data that are available for a global network of stations. Using this method, the semi-arid tropics include major dryland or rainfed agricultural zones with annual rainfall varying from about 400 to 1,250 mm. Major dryland crops are pearl millet, sorghum, pigeonpea and groundnut. This paper also presents a brief description of the climate, soils and farming systems of the semi-arid tropics.

Relevance: 30.00%

Abstract:

In real-life information sets, parts of the data are often unavailable. This problem can arise for various reasons, which give rise to different patterns. In the literature, this problem is known as Missing Data. It can be handled in various ways: discarding incomplete observations, estimating what the missing values originally were, or simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any kind of interaction exists between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where discrete means that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the result produced by a classifier. Also, in some cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time restricting the previous problem to a special kind of dataset, multivariate Time Series. We designed new imputation techniques for this particular domain and combined them with some of the strategies tested in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation, in order to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem.
The databases that characterized this problem had their own original latent values, which provides a real-world benchmark to test the algorithms developed in this thesis.

Relevance: 30.00%

Abstract:

The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.
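One way to set up this kind of experiment is to corrupt a known fraction of training labels before fitting each classifier. The sketch below is a minimal, hypothetical helper for that step only. The function name, the uniform choice of the wrong class and the class names are assumptions, and the study's own mislabelling scheme (for example, confusion between similar classes) may differ.

```python
import random


def mislabel(labels, fraction, classes, seed=0):
    """Return a copy of labels with a given fraction replaced by a wrong class.

    Cases to corrupt are drawn without replacement; each one receives a
    class chosen uniformly from the classes other than its true class.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(fraction * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Example: corrupt 20% of a two-class training set (class names are made up).
clean = ["forest"] * 50 + ["crop"] * 50
noisy = mislabel(clean, 0.20, ["forest", "crop", "water"])
```

Training the same classifier on `clean` and on `noisy`, and scoring both on an untouched test set, gives the accuracy change attributable to label error.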

Relevance: 30.00%

Abstract:

In knowledge technology work, as expressed by the scope of this conference, there are a number of communities, each uncovering new methods, theories, and practices. The Library and Information Science (LIS) community is one such community. This community, through tradition and innovation, theories and practice, organizes knowledge and develops knowledge technologies formed by iterative research hewn to the values of equal access and discovery for all. The Information Modeling community is another contributor to knowledge technologies. It concerns itself with the construction of symbolic models that capture the meaning of information and organize it in ways that are computer-based, but human understandable. A recent paper that examines certain assumptions in information modeling builds a bridge between these two communities, offering a forum for a discussion on common aims from a common perspective. In a June 2000 article, Parsons and Wand separate classes from instances in information modeling in order to free instances from what they call the “tyranny” of classes. They attribute a number of problems in information modeling to inherent classification – or the disregard for the fact that instances can be conceptualized independent of any class assignment. By faceting instances from classes, Parsons and Wand strike a sonorous chord with classification theory as understood in LIS. In the practice community and in the publications of LIS, faceted classification has shifted the paradigm of knowledge organization theory in the twentieth century. Here, with the proposal of inherent classification and the resulting layered information modeling, a clear line joins both the LIS classification theory community and the information modeling community. Both communities have their eyes turned toward networked resource discovery, and with this conceptual conjunction a new paradigmatic conversation can take place. 
Parsons and Wand propose that the layered information model can facilitate schema integration, schema evolution, and interoperability. These three spheres in information modeling have their own connotations, but are not distant from the aims of classification research in LIS. In this new conceptual conjunction, established by Parsons and Wand, information modeling, through the layered information model, can expand the horizons of classification theory beyond LIS, promoting a cross-fertilization of ideas on the interoperability of subject access tools like classification schemes, thesauri, taxonomies, and ontologies. This paper examines the common ground between the layered information model and faceted classification, establishing a vocabulary and outlining some common principles. It then turns to the issue of schema and the horizons of conventional classification and the differences between Information Modeling and Library and Information Science. Finally, a framework is proposed that deploys an interpretation of the layered information modeling approach in a knowledge technologies context. In order to design subject access systems that will integrate, evolve and interoperate in a networked environment, knowledge organization specialists must consider a semantic class independence like the one Parsons and Wand propose for information modeling.

Relevance: 30.00%

Abstract:

The aim of this thesis project is to automatically localize HCC tumors in the human liver and subsequently predict whether the tumor will undergo microvascular infiltration (MVI), the initial stage of metastasis development. The input data for the work were partially supplied by Sant'Orsola Hospital and partially downloaded from online medical databases. Two U-Net models were implemented for the automatic segmentation of the liver and of the HCC malignancies within it. The segmentation models were evaluated with the Intersection-over-Union (IOU) and Dice Coefficient (DC) metrics. The outcomes obtained for the automatic liver segmentation are quite good (IOU = 0.82; DC = 0.35); the outcomes obtained for the automatic tumor segmentation (IOU = 0.35; DC = 0.46) are, instead, affected by some limitations: it can be stated that the algorithm is almost always able to detect the location of the tumor, but it tends to underestimate its dimensions. The purpose of the segmentation is to obtain the CT images of the HCC tumors, which are necessary for feature extraction. The 14 Haralick features calculated from the 3D-GLCM, the 120 Radiomic features and the patients' clinical information are collected to build a dataset of 153 features. The goal is then to build a model able to discriminate, based on the given features, between the tumors that will undergo MVI and those that will not. This task can be seen as a classification problem: each tumor needs to be classified as either “MVI positive” or “MVI negative”. Feature selection techniques are implemented to identify the most descriptive features for the problem at hand, and then a set of classification models is trained and compared. Among all of them, the models with the best performance (around 80-84% ± 8-15%) turned out to be the XGBoost Classifier, the SGD Classifier and the Logistic Regression models (without penalization and with Lasso, Ridge or Elastic Net penalization).
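The two segmentation metrics named above can be computed from a predicted and a reference binary mask as in the sketch below. This is a minimal illustration on flattened masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions, not details from the thesis.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two equal-length sequences of 0/1 mask values."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - inter
    if union == 0:
        # Both masks empty: treated here as perfect agreement (assumption).
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_size + truth_size)


# Toy masks: one voxel overlaps, three voxels are covered in total.
iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never below IoU. Scores averaged over many images need not satisfy this identity exactly.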

Relevance: 30.00%

Abstract:

The world of Computational Biology and Bioinformatics presently integrates many different areas of expertise, including computer science and electronic engineering. A major aim in Data Science is the development and tuning of specific computational approaches to interpret the complexity of Biology. Molecular biologists and medical doctors rely heavily on interdisciplinary experts capable of understanding the biological background and applying algorithms to find optimal solutions to their problems. With this problem-solving orientation, I was involved in two basic research fields: Cancer Genomics and Enzyme Proteomics. For this reason, what I developed and implemented can be considered a general effort to help data analysis both in Cancer Genomics and in Enzyme Proteomics, focusing on enzymes, which catalyse all the biochemical reactions in cells. Specifically, as to Cancer Genomics, I contributed to the characterization of the intratumoral immune microenvironment in gastrointestinal stromal tumours (GISTs), correlating immune cell population levels with tumour subtypes. I was involved in setting up strategies for the evaluation and standardization of different approaches for fusion transcript detection in sarcomas that can be applied in routine diagnostics. This was part of a coordinated effort of the Sarcoma working group of "Alleanza Contro il Cancro". As to Enzyme Proteomics, I generated a derived database collecting all the human proteins and enzymes that are known to be associated with genetic diseases. I curated the data search in freely available databases such as PDB, UniProt, Humsavar and Clinvar, and I was responsible for searching, updating, and handling the information content, and for computing statistics. I also developed a web server, BENZ, which allows researchers to annotate an enzyme sequence with the corresponding Enzyme Commission number, the key feature that fully describes the catalysed reaction.
In addition, I contributed substantially to the characterization of the enzyme-genetic disease association, for a better classification of the metabolic genetic diseases.

Relevance: 30.00%

Abstract:

The dissertation addresses the still unsolved challenges of source-based digital 3D reconstruction, visualisation and documentation in the domains of archaeology, art and architecture history. The emerging BIM methodology and the IFC data exchange format are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, academic object-oriented disciplines, like archaeology, art and architecture history, are acting as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing and the validation of the outcomes is not fulfilled. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study focuses, therefore, on the evaluation of source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed.
In the end, this process will lead to a validation or a correction of the workflow and the initial assumptions, but also (dealing with different hypotheses) to a better definition of the levels of uncertainty.

Relevance: 30.00%

Abstract:

In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, that is, distributions of random variables defined by their quantile function. In the first part we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications, we demonstrate its competitiveness against state-of-the-art alternatives. In the second part we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient, and to increase its flexibility we propose different methods to explicitly model the projected univariate distributions.
Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.
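The third part can be made concrete with a small Monte Carlo sketch of a projection-based depth and the maximum depth classifier built on it. The depth below averages min(F_u, 1 − F_u) over random unit directions u, using the empirical distribution function of the projected sample. This is one common way to write the IRW depth and is stated here as an assumption, as are all function names. The thesis's own proposal of explicitly modelling the projected distributions is not reproduced.

```python
import math
import random


def random_directions(dim, n_dirs, rng):
    """Draw n_dirs directions uniformly on the unit sphere in R^dim."""
    dirs = []
    for _ in range(n_dirs):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        dirs.append([c / norm for c in v])
    return dirs


def irw_depth(x, sample, dirs):
    """Approximate depth of x with respect to sample (values in [0, 0.5])."""
    total = 0.0
    for u in dirs:
        px = sum(a * b for a, b in zip(u, x))
        below = sum(1 for s in sample
                    if sum(a * b for a, b in zip(u, s)) <= px)
        f = below / len(sample)  # empirical CDF of the projections at px
        total += min(f, 1.0 - f)
    return total / len(dirs)


def max_depth_classify(x, samples_by_class, dirs):
    """Assign x to the class in which it is deepest."""
    return max(samples_by_class,
               key=lambda label: irw_depth(x, samples_by_class[label], dirs))


# Toy data: two Gaussian clouds in the plane (made up for illustration).
rng = random.Random(1)
dirs = random_directions(2, 200, rng)
classes = {
    "a": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)],
    "b": [[5 + rng.gauss(0, 1), 5 + rng.gauss(0, 1)] for _ in range(40)],
}
label = max_depth_classify([0.1, -0.2], classes, dirs)
```

A point near the centre of a cloud has projections near the median in most directions and so receives a depth close to 0.5, while a point far outside receives a depth near 0.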

Relevance: 30.00%

Abstract:

The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interoperability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitive-inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolic integration of sensory-perceptual data with cognitive-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing both effective and interpretable systems, through the capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. 
These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with deep awareness of its human and societal implications.

Relevance: 30.00%

Abstract:

Introduction: Recently, the American Association of Gynecologic Laparoscopists (AAGL) proposed a new classification and scoring system with the specific aim of assessing surgical complexity. This study sought to assess whether a higher AAGL score correlates with an increased risk of peri-operative complications in women undergoing surgery for endometriosis. Methods: This is a retrospective cohort study conducted in a tertiary referral center. We collected data from women with endometriosis who underwent complete surgical removal of endometriosis from January 2019 to December 2021. The ENZIAN and r-ASRM classifications and the AAGL total score were calculated for each patient. The population was divided into two groups according to whether or not at least one peri-operative complication occurred. Our primary outcome was the correlation between AAGL score and occurrence of complications. Results: During the study period we analyzed data from 282 eligible patients. Among them, 80 (28.4%) experienced peri-operative complications. No statistically significant difference was found between the two groups in terms of baseline characteristics, except for pre-operative hemoglobin (Hb), which was lower in patients with complications (p=0.001). Surgical variables associated with the occurrence of complications were recto-sigmoid surgery (p=0.003), ileocecal resection (p=0.034), and longer operative time (p=0.007). Furthermore, a higher ENZIAN B score (p=0.006), AAGL score (p=0.045) and stage (p=0.022) were found in the group of patients with complications. The multivariate analysis only confirmed the significant association between the occurrence of peri-operative complications and lower pre-operative Hb level (OR 0.74; 95% CI, 0.59 – 0.94; p=0.014), longer operative time (OR 1.00; 95% CI, 1.00 – 1.01; p=0.013), recto-sigmoid surgery, especially discoid resection (OR 8.73; 95% CI, 2.18 – 35; p=0.016), and ENZIAN B3 (OR 3.62; 95% CI, 1.46 – 8.99; p=0.006).
Conclusion: According to our findings, high AAGL scores or stages do not seem to increase the risk of peri-operative complications.

Relevance: 30.00%

Abstract:

With the advent of high-performance computing devices, deep neural networks have gained a lot of popularity in solving many Natural Language Processing tasks. However, they are also vulnerable to adversarial attacks, which are able to modify the input text in order to mislead the target model. Adversarial attacks are a serious threat to the security of deep neural networks, and they can be used to craft adversarial examples that steer the model towards a wrong decision. In this dissertation, we propose SynBA, a novel contextualized synonym-based adversarial attack for text classification. SynBA is based on the idea of replacing words in the input text with their synonyms, which are selected according to the context of the sentence. We show that SynBA successfully generates adversarial examples that are able to fool the target model with a high success rate. We demonstrate three advantages of this proposed approach: (1) effective - it outperforms state-of-the-art attacks by semantic similarity and perturbation rate, (2) utility-preserving - it preserves semantic content, grammaticality, and correct types classified by humans, and (3) efficient - it performs attacks faster than other methods.

Relevance: 30.00%

Abstract:

In this thesis we address a multi-label hierarchical text classification problem in a low-resource setting and explore different approaches to identify the best one for our case. The goal is to train a model that classifies English school exercises according to a hierarchical taxonomy using little labeled data. The experiments in this work employ different machine learning models and text representation techniques: CatBoost with tf-idf features, classifiers based on pre-trained models (mBERT, LASER), and SetFit, a framework for few-shot text classification. SetFit proved to be the most promising approach, achieving better performance when only a few labeled examples per class are available during training. However, this thesis does not consider the whole hierarchical taxonomy, but only its first two levels: to address classification with the classes at the third level, further experiments should be carried out, exploring methods for zero-shot text classification, data augmentation, and strategies to exploit the hierarchical structure of the taxonomy during training.