862 resultados para supervised classification
Resumo:
Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
Resumo:
Web APIs have gained increasing popularity in recent Web service technology development owing to its simplicity of technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentations on the Web is still a challenging task even with the best resources available on the Web. In this paper we cast the problem of detecting the Web API documentations as a text classification problem of classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA) which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information such as labelled features automatically learned from data that can effectively help improving classification performance. Extensive experiments on our Web APIs documentation dataset shows that the feaLDA model outperforms three strong supervised baselines including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
Resumo:
Report published in the Proceedings of the National Conference on "Education in the Information Society", Plovdiv, May, 2013
Resumo:
The social media classification problems draw more and more attention in the past few years. With the rapid development of Internet and the popularity of computers, there is astronomical amount of information in the social network (social media platforms). The datasets are generally large scale and are often corrupted by noise. The presence of noise in training set has strong impact on the performance of supervised learning (classification) techniques. A budget-driven One-class SVM approach is presented in this thesis that is suitable for large scale social media data classification. Our approach is based on an existing online One-class SVM learning algorithm, referred as STOCS (Self-Tuning One-Class SVM) algorithm. To justify our choice, we first analyze the noise-resilient ability of STOCS using synthetic data. The experiments suggest that STOCS is more robust against label noise than several other existing approaches. Next, to handle big data classification problem for social media data, we introduce several budget driven features, which allow the algorithm to be trained within limited time and under limited memory requirement. Besides, the resulting algorithm can be easily adapted to changes in dynamic data with minimal computational cost. Compared with two state-of-the-art approaches, Lib-Linear and kNN, our approach is shown to be competitive with lower requirements of memory and time.
Resumo:
The Lena River Delta, situated in Northern Siberia (72.0 - 73.8° N, 122.0 - 129.5° E), is the largest Arctic delta and covers 29,000 km**2. Since natural deltas are characterised by complex geomorphological patterns and various types of ecosystems, high spatial resolution information on the distribution and extent of the delta environments is necessary for a spatial assessment and accurate quantification of biogeochemical processes as drivers for the emission of greenhouse gases from tundra soils. In this study, the first land cover classification for the entire Lena Delta based on Landsat 7 Enhanced Thematic Mapper (ETM+) images was conducted and used for the quantification of methane emissions from the delta ecosystems on the regional scale. The applied supervised minimum distance classification was very effective with the few ancillary data that were available for training site selection. Nine land cover classes of aquatic and terrestrial ecosystems in the wetland dominated (72%) Lena Delta could be defined by this classification approach. The mean daily methane emission of the entire Lena Delta was calculated with 10.35 mg CH4/m**2/d. Taking our multi-scale approach into account we find that the methane source strength of certain tundra wetland types is lower than calculated previously on coarser scales.
Resumo:
A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A reconstructed 3D object from single-axis limited angle tilt-series cryo-electron tomography contains missing regions and this problem is known as the missing wedge problem. Due to missing regions on the reconstructed images, analyzing their 3D structures is a challenging problem. The existing methods overcome this problem by aligning and averaging several similar shaped objects. These schemes work well if the objects are symmetric and several objects with almost similar shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties, such as volume, symmetry, aspect ratio, polyhedral structures etc., of these bacterial inclusions in presence of missing data. These methods work with deformed and non-symmetric varied shaped objects and do not necessitate multiple objects for handling the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named as polyhedron profile statistic, and (e) the Bayes classifier, linear discriminant analysis and support vector machine based classifiers for supervised incomplete polyhedral shape classification. Finally, the predicted 3D shapes for these bacterial microstructures belong to the Johnson solids family, and these shapes along with their other geometric properties are important for better understanding of their chemical and biological characteristics.
Resumo:
The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.
Resumo:
Ochnaceae s.str. (Malpighiales) are a pantropical family of about 500 species and 27 genera of almost exclusively woody plants. Infrafamilial classification and relationships have been controversial partially due to the lack of a robust phylogenetic framework. Including all genera except Indosinia and Perissocarpa and DNA sequence data for five DNA regions (ITS, matK, ndhF, rbcL, trnL-F), we provide for the first time a nearly complete molecular phylogenetic analysis of Ochnaceae s.l. resolving most of the phylogenetic backbone of the family. Based on this, we present a new classification of Ochnaceae s.l., with Medusagynoideae and Quiinoideae included as subfamilies and the former subfamilies Ochnoideae and Sauvagesioideae recognized at the rank of tribe. Our data support a monophyletic Ochneae, but Sauvagesieae in the traditional circumscription is paraphyletic because Testulea emerges as sister to the rest of Ochnoideae, and the next clade shows Luxemburgia+Philacra as sister group to the remaining Ochnoideae. To avoid paraphyly, we classify Luxemburgieae and Testuleeae as new tribes. The African genus Lophira, which has switched between subfamilies (here tribes) in past classifications, emerges as sister to all other Ochneae. Thus, endosperm-free seeds and ovules with partly to completely united integuments (resulting in an apparently single integument) are characters that unite all members of that tribe. The relationships within its largest clade, Ochnineae (former Ochneae), are poorly resolved, but former Ochninae (Brackenridgea, Ochna) are polyphyletic. Within Sauvagesieae, the genus Sauvagesia in its broad circumscription is polyphyletic as Sauvagesia serrata is sister to a clade of Adenarake, Sauvagesia spp., and three other genera. Within Quiinoideae, in contrast to former phylogenetic hypotheses, Lacunaria and Touroulia form a clade that is sister to Quiina. Bayesian ancestral state reconstructions showed that zygomorphic flowers with adaptations to buzz-pollination (poricidal anthers), a syncarpous gynoecium (a near-apocarpous gynoecium evolved independently in Quiinoideae and Ochninae), numerous ovules, septicidal capsules, and winged seeds with endosperm are the ancestral condition in Ochnoideae. Although in some lineages poricidal anthers were lost secondarily, the evolution of poricidal superstructures secured the maintenance of buzz-pollination in some of these genera, indicating a strong selective pressure on keeping that specialized pollination system.
Resumo:
Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not readily discovered. Automated screening algorithms have the potential to improve identification of patients who need further medical attention. However, the identification of lesions must be accurate to be useful for clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions such as microaneurysms, cotton-wool spots and hard exudates. BoVW allows to bypass the need for pre- and post-processing of the retinographic images, as well as the need of specific ad hoc techniques for identification of each type of lesion. An extensive evaluation of the BoVW model, using three large retinograph datasets (DR1, DR2 and Messidor) with different resolution and collected by different healthcare personnel, was performed. The results demonstrate that the BoVW classification approach can identify different lesions within an image without having to utilize different algorithms for each lesion reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification was an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions), applying a cross-dataset validation protocol. To assess the accuracy for detecting cases that require referral within one year, the sparse extraction technique associated with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods. Those results indicate that, for retinal image classification tasks in clinical practice, BoVW is equal and, in some instances, surpasses results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.
Resumo:
261
Resumo:
The Subaxial Injury Classification (SLIC) system and severity score has been developed to help surgeons in the decision-making process of treatment of subaxial cervical spine injuries. A detailed description of all potential scored injures of the SLIC is lacking. We performed a systematic review in the PubMed database from 2007 to 2014 to describe the relationship between the scored injuries in the SLIC and their eventual treatment according to the system score. Patients with an SLIC of 1-3 points (conservative treatment) are neurologically intact with the spinous process, laminar or small facet fractures. Patients with compression and burst fractures who are neurologically intact are also treated nonsurgically. Patients with an SLIC of 4 points may have an incomplete spinal cord injury such as a central cord syndrome, compression injuries with incomplete neurologic deficits and burst fractures with complete neurologic deficits. SLIC of 5-10 points includes distraction and rotational injuries, traumatic disc herniation in the setting of a neurological deficit and burst fractures with an incomplete neurologic deficit. The SLIC injury severity score can help surgeons guide fracture treatment. Knowledge of the potential scored injures and their relationships with the SLIC are of paramount importance for spine surgeons who treated subaxial cervical spine injuries.
Resumo:
to assess the construct validity and reliability of the Pediatric Patient Classification Instrument. correlation study developed at a teaching hospital. The classification involved 227 patients, using the pediatric patient classification instrument. The construct validity was assessed through the factor analysis approach and reliability through internal consistency. the Exploratory Factor Analysis identified three constructs with 67.5% of variance explanation and, in the reliability assessment, the following Cronbach's alpha coefficients were found: 0.92 for the instrument as a whole; 0.88 for the Patient domain; 0.81 for the Family domain; 0.44 for the Therapeutic procedures domain. the instrument evidenced its construct validity and reliability, and these analyses indicate the feasibility of the instrument. The validation of the Pediatric Patient Classification Instrument still represents a challenge, due to its relevance for a closer look at pediatric nursing care and management. Further research should be considered to explore its dimensionality and content validity.
Resumo:
Frankfurters are widely consumed all over the world, and the production requires a wide range of meat and non-meat ingredients. Due to these characteristics, frankfurters are products that can be easily adulterated with lower value meats, and the presence of undeclared species. Adulterations are often still difficult to detect, due the fact that the adulterant components are usually very similar to the authentic product. In this work, FT-Raman spectroscopy was employed as a rapid technique for assessing the quality of frankfurters. Based on information provided by the Raman spectra, a multivariate classification model was developed to identify the frankfurter type. The aim was to study three types of frankfurters (chicken, turkey and mixed meat) according to their Raman spectra, based on the fatty vibrational bands. Classification model was built using partial least square discriminant analysis (PLS-DA) and the performance model was evaluated in terms of sensitivity, specificity, accuracy, efficiency and Matthews's correlation coefficient. The PLS-DA models give sensitivity and specificity values on the test set in the ranges of 88%-100%, showing good performance of the classification models. The work shows the Raman spectroscopy with chemometric tools can be used as an analytical tool in quality control of frankfurters.
Resumo:
To compare the distributions of patients with clinical-pathological subtypes of luminal B-like breast cancer according to the 2011 and 2013 St. Gallen International Breast Cancer Conference Expert Panel. We studied 142 women with breast cancer who were positive to estrogen receptor and had been treated in São Paulo state, southeast Brazil. The expression of the following receptors was assessed by immunohistochemistry: estrogen, progesterone (PR) and Ki-67. The expression of HER-2 was measured by fluorescent in situ hybridization analysis in tissue microarray. There were 29 cases of luminal A breast cancers according to the 2011 St. Gallen International Breast Cancer Conference Expert Panel that were classified as luminal B-like in the 2013 version. Among the 65 luminal B-like breast cancer cases, 29 (45%) were previous luminal A tumors, 15 cases (20%) had a Ki-67 >14% and were at least 20% PR positive and 21 cases (35%) had Ki-67 >14% and more than 20% were PR positive. The 2013 St. Gallen consensus updated the definition of intrinsic molecular subtypes and increased the number of patients classified as having luminal B-like breast cancer in our series, for whom the use of cytotoxic drugs will probably be proposed with additional treatment cost.