847 results for Classification Methods


Relevance:

40.00%

Publisher:

Abstract:

Data quality (DQ) assessment can be significantly enhanced with the use of the right DQ assessment methods, which provide automated solutions to assess DQ. The range of DQ assessment methods is very broad: from data profiling and semantic profiling to data matching and data validation. This paper gives an overview of current methods for DQ assessment and classifies the DQ assessment methods into an existing taxonomy of DQ problems. Specific examples of the placement of each DQ method in the taxonomy are provided and illustrate why the method is relevant to the particular taxonomy position. The gaps in the taxonomy, where no current DQ methods exist, show where new methods are required and can guide future research and DQ tool development.
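As an illustration of two of the DQ assessment method families mentioned above, data profiling and rule-based data validation, the following minimal sketch applies both to a small hypothetical pandas DataFrame; the column names and validation rules are assumptions made for illustration only.

```python
# A minimal data-profiling and data-validation sketch on hypothetical data.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "b@y.com", "not-an-email"],
    "age": [34, 29, 29, 210],
})

# Data profiling: per-column completeness and number of distinct values.
profile = pd.DataFrame({
    "completeness": df.notna().mean(),
    "distinct_values": df.nunique(),
})
print(profile)

# Data validation: simple rule-based checks flagging likely DQ problems
# (missing or malformed e-mails are both counted as invalid here).
violations = {
    "duplicate_customer_id": int(df["customer_id"].duplicated().sum()),
    "invalid_email": int((~df["email"].str.contains("@", na=False)).sum()),
    "implausible_age": int(((df["age"] < 0) | (df["age"] > 120)).sum()),
}
print(violations)
```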

Relevance:

40.00%

Publisher:

Abstract:

This paper reviews the fingerprint classification literature, looking at the problem from a dual perspective. We first deal with feature extraction methods, including the different models considered for singular point detection and for orientation map extraction. Then, we focus on the different learning models considered to build the classifiers used to label new fingerprints. Taxonomies and classifications for the feature extraction, singular point detection, orientation extraction and learning methods are presented. A critical view of the existing literature has led us to present a discussion of the existing methods and their drawbacks, such as difficulty in reimplementation, lack of details, or major differences in their evaluation procedures. On this account, an experimental analysis of the most relevant methods is carried out in the second part of this paper, and a new method based on their combination is presented.
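As an illustration of the orientation map extraction step discussed in such reviews, here is a minimal sketch of the classic gradient-based (averaged squared gradient) estimator; the block size and the synthetic ridge image are assumptions, and this is not any specific method from the survey.

```python
# Block-wise ridge orientation estimation from image gradients.
import numpy as np

def orientation_field(img, block=16):
    """Estimate one dominant ridge orientation (radians) per block."""
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    out = np.zeros((h // block, w // block))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            sl = np.s_[i * block:(i + 1) * block, j * block:(j + 1) * block]
            vx = np.sum(2 * gx[sl] * gy[sl])          # doubled-angle averaging
            vy = np.sum(gx[sl] ** 2 - gy[sl] ** 2)
            out[i, j] = 0.5 * np.arctan2(vx, vy)
    return out

# Synthetic ridges: a sinusoidal pattern stands in for a fingerprint patch.
xx, yy = np.meshgrid(np.arange(128), np.arange(128))
img = np.sin(0.3 * (xx + 0.5 * yy))
print(orientation_field(img).shape)   # (8, 8) block-wise orientation map
```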

Relevance:

40.00%

Publisher:

Abstract:

Sediment particle size analysis (PSA) is routinely used to support benthic macrofaunal community distribution data in habitat mapping and Ecological Status (ES) assessment. No optimal PSA method for explaining variability in multivariate macrofaunal distribution has been identified, nor have the effects of changing sampling strategy been examined. Here, we use benthic macrofaunal and PSA grabs from two embayments in the south of Ireland. Four frequently used PSA methods and two common sampling strategies are applied. A combination of laser particle sizing and wet/dry sieving without peroxide pre-treatment to remove organics was identified as the optimal method for explaining macrofaunal distributions. ES classifications and EUNIS sediment classification were robust to changes in PSA method. Returning fauna and PSA samples from the same grab significantly decreased the macrofaunal variance explained by PSA and caused ES to be classified as lower. Employing the optimal PSA method and sampling strategy will improve benthic monitoring. © 2012 Elsevier Ltd.
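As a rough illustration of how PSA data can be related to multivariate macrofaunal distributions, the sketch below computes a Mantel-style correlation between a Bray-Curtis faunal dissimilarity matrix and a Euclidean distance matrix built from grain-size fractions; the station data are synthetic and this is not the paper's actual statistical workflow.

```python
# Relating faunal dissimilarities to sediment grain-size distances.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stations = 12
fauna = rng.poisson(lam=5, size=(n_stations, 20))    # species counts per station
psa = rng.dirichlet(np.ones(6), size=n_stations)     # grain-size fractions per station

fauna_d = pdist(fauna, metric="braycurtis")          # faunal dissimilarity
psa_d = pdist(psa, metric="euclidean")               # sediment distance

# Correlation between the two condensed distance matrices; a proper Mantel
# test would additionally permute station labels to obtain a p-value.
rho, _ = spearmanr(fauna_d, psa_d)
print(f"Mantel-style Spearman rho = {rho:.3f}")
```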

Relevance:

40.00%

Publisher:

Abstract:

Recent studies showed that features extracted from brain MRIs can discriminate well between Alzheimer's disease and Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods to find the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies were then used to solve a multi-class problem with the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the predictive power of features has not yet been investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) IntraCranial Volume (ICV) normalization can lead to overfitting and worsen prediction accuracy on the test set, and (ii) the combined use of a Random Forest-based filter with a Support Vector Machine-based wrapper improves the accuracy of binary classification.
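The following minimal sketch illustrates the kind of filter-plus-wrapper pipeline described above, combining a Random Forest importance filter with an SVM-driven recursive feature elimination wrapper before a one-versus-one SVM; the synthetic data, feature counts and thresholds are assumptions, not the study's settings.

```python
# Random Forest filter -> SVM wrapper (RFE) -> SVM classifier, cross-validated.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, RFE
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           n_classes=3, random_state=0)

pipe = Pipeline([
    # Filter step: keep the features ranked highest by Random Forest importance.
    ("rf_filter", SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        threshold=-np.inf, max_features=30)),
    # Wrapper step: recursive feature elimination driven by a linear SVM.
    ("svm_wrapper", RFE(SVC(kernel="linear"), n_features_to_select=10)),
    # Final classifier; SVC trains one-versus-one sub-classifiers internally.
    ("clf", SVC(kernel="linear")),
])

scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```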

Relevance:

40.00%

Publisher:

Abstract:

Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to their simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
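As an example of one widely used graphical evaluation method, the sketch below plots a ROC curve for a toy classifier on synthetic data; the dataset and model are assumptions chosen only to show the trade-off visualisation.

```python
# ROC curve for a toy logistic regression classifier.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).decision_function(X_te)
fpr, tpr, _ = roc_curve(y_te, scores)

plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")  # trade-off made visible
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```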

Relevance:

40.00%

Publisher:

Abstract:

This article deals with classification problems involving unequal probabilities in each class and discusses evaluation metrics for systems that use multilayer perceptron (MLP) neural networks for the task of classifying new patterns. In addition, we propose three new pruning methods, which are compared with seven other methods from the MLP literature. All pruning algorithms presented in this paper have been modified by the authors to prune whole neurons, so that the resulting MLP networks remain fully connected but have a smaller intermediate layer. Experiments were carried out on the unbalanced E. coli classification problem with the ten pruning methods. The proposed methods obtained good results, in fact better than other pruning methods previously reported in the MLP neural network literature. (C) 2014 Elsevier Ltd. All rights reserved.
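The sketch below illustrates the general idea of pruning whole neurons from an MLP's hidden layer while keeping the network fully connected; it uses a simple magnitude-based criterion on synthetic weights and is not one of the methods proposed or compared in the article.

```python
# Magnitude-based pruning of whole hidden neurons in a one-hidden-layer MLP.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 7, 20, 3            # illustrative sizes only
W1 = rng.normal(size=(n_in, n_hidden))      # input -> hidden weights
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_hidden, n_out))     # hidden -> output weights

def prune_neurons(W1, b1, W2, n_keep):
    """Keep the n_keep hidden neurons with the largest outgoing weight norm."""
    saliency = np.linalg.norm(W2, axis=1)           # one score per hidden neuron
    keep = np.sort(np.argsort(saliency)[-n_keep:])  # indices of neurons to keep
    return W1[:, keep], b1[keep], W2[keep, :]

W1p, b1p, W2p = prune_neurons(W1, b1, W2, n_keep=8)
print(W1p.shape, W2p.shape)   # (7, 8) (8, 3): smaller yet fully connected
```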

Relevance:

40.00%

Publisher:

Abstract:

Background: Cellulite refers to skin relief alterations in women's thighs and buttocks, causing dissatisfaction and a search for treatment. Its physiopathology is complex and not completely understood. Many therapeutic options have been reported with no scientific evidence about benefits. The majority of the studies are not controlled nor randomized; most efficacy endpoints are subjective, such as poorly standardized photographs and investigator opinion. Objective measures could improve severity assessment. Our purpose was to correlate non-invasive instrumental measures with standardized clinical evaluation. Methods: Twenty-six women presenting cellulite on the buttocks, aged from 25 to 41, were evaluated by: body mass index; standardized photography analysis (10-point severity and 5-point photonumeric scales) by five dermatologists; cutometry; and high-frequency ultrasonography (dermal density and dermis/hypodermis interface length). Quality of life impact was assessed. Correlations between clinical and instrumental parameters were performed. Results: Good agreement between the dermatologists' and the main investigator's perceptions was detected. Positive correlations: body mass index and clinical scores; ultrasonographic measures. Negative correlation: cutometry and clinical scores. The quality of life score was correlated with dermal collagen density. Conclusion: Cellulite caused an impact on quality of life. Poor correlation between objective measures and clinical evaluation was detected. Cellulite severity assessment is a challenge, and objective parameters should be optimized for clinical trials.
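As an illustration of the clinical-versus-instrumental correlations reported above, the sketch below computes a Spearman correlation between a hypothetical clinical severity score and a cutometry reading; the values are invented for illustration and are not study data.

```python
# Spearman correlation between a clinical score and an instrumental measure.
from scipy.stats import spearmanr

clinical_score = [2, 3, 4, 4, 5, 6, 7, 7, 8, 9]      # photonumeric severity
cutometry      = [0.71, 0.69, 0.66, 0.67, 0.62,
                  0.60, 0.58, 0.59, 0.55, 0.52]       # skin elasticity readings

rho, p = spearmanr(clinical_score, cutometry)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")        # a negative rho is expected here
```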

Relevance:

40.00%

Publisher:

Abstract:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in the form of unstructured text in natural languages, research on text mining is currently very active and addresses practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a substantial training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing a set of labeled documents from one domain to be leveraged to classify those of another. Most methods use advanced statistical techniques, usually involving the tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvement, but models generated in one domain prove to be effectively reusable in a different one.
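The sketch below illustrates the nearest centroid, iteratively adapted cross-domain idea described above on a toy corpus: category centroids are built from labeled source documents and then re-estimated from predictions on unlabeled target documents. The corpora and number of iterations are assumptions, not the thesis' actual setup.

```python
# Nearest centroid classification with iterative adaptation to a new domain.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_docs = ["cheap flights and hotels", "stock market rally today",
               "book a holiday resort", "shares and bond yields fall"]
source_labels = np.array([0, 1, 0, 1])            # 0 = travel, 1 = finance
target_docs = ["last minute beach holiday deals", "central bank raises rates"]

vec = TfidfVectorizer().fit(source_docs + target_docs)
Xs, Xt = vec.transform(source_docs), vec.transform(target_docs)

# Category profiles (centroids) estimated on the known, labeled source domain.
centroids = np.vstack([np.asarray(Xs[source_labels == c].mean(axis=0))
                       for c in (0, 1)])

for _ in range(5):                                # iterative adaptation step
    pred = cosine_similarity(Xt, centroids).argmax(axis=1)
    centroids = np.vstack([
        np.asarray(Xt[pred == c].mean(axis=0)) if (pred == c).any() else centroids[c]
        for c in (0, 1)
    ])

print(pred)   # predicted category for each target-domain document
```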

Relevance:

40.00%

Publisher:

Abstract:

This thesis aims to assess similarities and mismatches between the outputs of two independent methods for cloud cover quantification and classification, based on quite different physical principles. One of them is the SAFNWC software package, designed to process radiance data acquired by the SEVIRI sensor in the VIS/IR. The other is the MWCC algorithm, which uses the brightness temperatures acquired by the AMSU-B and MHS sensors in their channels centered on the MW water vapour absorption band. At a first stage their cloud detection capability was tested by comparing the Cloud Masks they produce. These showed a good agreement between the two methods, although some critical situations stand out: the MWCC, in effect, fails to reveal clouds which, according to SAFNWC, are fractional, cirrus, very low, or high opaque clouds. In the second stage of the inter-comparison, the pixels classified as cloudy according to both tools were compared. The overall tendency of the MWCC method is an overestimation of the lower cloud classes; conversely, the more the cloud top height grows, the more the MWCC misses cloud portions that are instead detected by the SAFNWC tool. This is what also emerges from a series of tests carried out using the cloud top height information to evaluate the height ranges in which each MWCC category is defined. Therefore, although the two methods intend to provide the same kind of information, in reality they return quite different details on the same atmospheric column: the SAFNWC retrieval, being very sensitive to the top temperature of a cloud, captures the actual level reached by the cloud top, while the MWCC, by exploiting the penetration capability of microwaves, gives information about levels located more deeply within the atmospheric column.
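As an illustration of the pixel-wise Cloud Mask inter-comparison described above, the sketch below computes agreement statistics between two small synthetic binary masks standing in for SAFNWC and MWCC output; the arrays are made up for illustration only.

```python
# Pixel-wise agreement between two binary cloud masks (1 = cloudy, 0 = clear).
import numpy as np

safnwc = np.array([[1, 1, 0, 0],
                   [1, 1, 1, 0],
                   [0, 1, 1, 1]])
mwcc   = np.array([[1, 0, 0, 0],
                   [1, 1, 0, 0],
                   [0, 1, 1, 1]])

both_cloudy = np.sum((safnwc == 1) & (mwcc == 1))
only_safnwc = np.sum((safnwc == 1) & (mwcc == 0))   # clouds the MWCC misses
only_mwcc   = np.sum((safnwc == 0) & (mwcc == 1))
agreement   = np.mean(safnwc == mwcc)

print(f"agreement = {agreement:.2%}, both cloudy = {both_cloudy}, "
      f"SAFNWC only = {only_safnwc}, MWCC only = {only_mwcc}")
```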

Relevance:

40.00%

Publisher:

Abstract:

Background and purpose: Breast cancer continues to be a health problem for women, representing 28 percent of all female cancers and remaining one of the leading causes of death for women. Breast cancer incidence rates become substantial before the age of 50. After menopause, breast cancer incidence rates continue to increase with age, creating a long-lasting source of concern (Harris et al., 1992). Mammography, a technique for the detection of breast tumors in their nonpalpable stage, when they are most curable, has taken on considerable importance as a public health measure. The lifetime risk of breast cancer is approximately 1 in 9 and accrues over many decades. Recommendations are that screening be periodic in order to detect cancer at early stages. These recommendations are largely not followed. Not only are most women not getting regular mammograms, but this is particularly the case among older women, for whom regular mammography has been proven to reduce mortality by approximately 30 percent. The purpose of this project was to increase our understanding of factors that are associated with stage of readiness to obtain subsequent mammograms. A secondary purpose of this research was to suggest further conceptual considerations toward the extension of the Transtheoretical Model (TTM) of behavior change to repeat screening mammography. Methods: A sample (n = 1,222) of women 50 years and older in a large multi-specialty clinic in Houston, Texas was surveyed by mail questionnaire regarding their previous screening experience and stage of readiness to obtain repeat screening. A computerized database, maintained on all women who undergo mammography at the clinic, was used to identify women eligible for the project. The major statistical technique employed to select the significant variables and to examine the main and interaction effects of independent variables on dependent variables was stepwise polychotomous logistic regression. A prediction model for each stage-of-readiness definition was estimated. The expected probabilities for stage of readiness were calculated to assess the magnitude and direction of significant predictors. Results: Analysis showed that both ways of defining stage of readiness for obtaining a screening mammogram were associated with specific constructs, including decisional balance and processes of change. Conclusions: The results of the present study demonstrate that the TTM appears to translate to repeat mammography screening. Findings in the current study also support findings of previous studies suggesting that stage of readiness is associated with respondent decisional balance and the processes of change.
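As an illustration of the polychotomous (multinomial) logistic regression used in the analysis, the sketch below fits such a model on synthetic predictors standing in for TTM constructs; the study's stepwise variable selection is not reproduced here.

```python
# Multinomial logistic regression on synthetic stage-of-readiness data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.normal(size=n),        # e.g. decisional balance (pros minus cons)
    rng.normal(size=n),        # e.g. processes-of-change score
])
stage = rng.integers(0, 3, size=n)   # three hypothetical stages of readiness

# With the default solver, a multinomial (softmax) model is fitted for 3 classes.
model = LogisticRegression(max_iter=1000).fit(X, stage)
print(model.predict_proba(X[:3]))    # expected stage probabilities per respondent
```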

Relevance:

40.00%

Publisher:

Abstract:

A land classification method was designed for the Community of Madrid (CM), which has lands suitable for either agricultural use or natural spaces. The process started from an extensive previous CM study that contains sets of land attributes with data for 122 land types and a minimum-requirements method providing a land quality classification (SQ) for each land. Borrowing tools from Operations Research (OR) and Decision Science, that SQ has been complemented by an additive valuation method, which involves a more restricted set of 13 representative attributes analysed using Attribute Valuation Functions to obtain a quality index (QI), and by an original composite method that uses a fuzzy set procedure to obtain a combined quality index (CQI) containing relevant information from both the SQ and the QI methods.
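The sketch below illustrates an additive valuation of the kind described, mapping each attribute to [0, 1] through a valuation function and combining the results with a weighted sum; the attributes, functions and weights are hypothetical, not the CM study's 13 attributes.

```python
# Additive land-quality index from hypothetical attribute valuation functions.
def slope_value(slope_pct):          # flatter land scores higher
    return max(0.0, 1.0 - slope_pct / 30.0)

def depth_value(soil_depth_cm):      # deeper soil scores higher, capped at 1
    return min(1.0, soil_depth_cm / 100.0)

def ph_value(ph):                    # best around neutral pH
    return max(0.0, 1.0 - abs(ph - 6.5) / 3.0)

weights = {"slope": 0.4, "depth": 0.35, "ph": 0.25}   # must sum to 1

def quality_index(slope_pct, soil_depth_cm, ph):
    """Weighted sum of the valuation functions, yielding a QI in [0, 1]."""
    return (weights["slope"] * slope_value(slope_pct)
            + weights["depth"] * depth_value(soil_depth_cm)
            + weights["ph"] * ph_value(ph))

print(f"QI = {quality_index(slope_pct=8, soil_depth_cm=70, ph=7.1):.2f}")
```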

Relevance:

40.00%

Publisher:

Abstract:

In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition, so that classic and common algorithms, such as the k-Nearest Neighbour, cannot be used directly. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance under these conditions is therefore unknown. This work is devoted to carrying out an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results report a general behaviour that degrades as conditions approach more realistic scenarios. However, our experiments also show that some methods are able to achieve a performance fairly similar to that of the non-distributed scenario. Thus, although there is a clear need for developing specific PS methodologies and algorithms for tackling these situations, those that reported higher robustness against such conditions may be good candidates from which to start.
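As an illustration of what a Prototype Selection strategy does, the sketch below implements a single pass of Hart's Condensed Nearest Neighbour rule; it is only an example of the PS family discussed and not the study's experimental setup.

```python
# Condensed Nearest Neighbour: keep only prototypes needed by a 1-NN classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def condensed_nn(X, y, seed=0):
    """Single CNN pass (the original algorithm repeats until no additions)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [order[0]]                      # start with one random prototype
    for i in order[1:]:
        knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
        if knn.predict(X[i:i + 1])[0] != y[i]:
            keep.append(i)                 # keep points the current set misclassifies
    return np.array(keep)

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
idx = condensed_nn(X, y)
print(f"kept {len(idx)} of {len(X)} training samples as prototypes")
```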