92 results for one-class classification


Relevance:

30.00%

Publisher:

Abstract:

Learning robust subspaces to maximize class discrimination is challenging, and most current works consider a weak connection between dimensionality reduction and classifier design. We propose an alternate framework wherein these two steps are combined in a joint formulation to exploit the direct connection between dimensionality reduction and classification. Specifically, we learn an optimal subspace on the Grassmann manifold jointly minimizing the classification error of an SVM classifier. We minimize the regularized empirical risk over both the hypothesis space of functions that underlies this new generalized multi-class Lagrangian SVM and the Grassmann manifold such that a linear projection is to be found. We propose an iterative algorithm to meet the dual goal of optimizing both the classifier and projection. Extensive numerical studies on challenging datasets show robust performance of the proposed scheme over other alternatives in contexts wherein limited training data is used, verifying the advantage of the joint formulation.

Relevance:

30.00%

Publisher:

Abstract:

In statistical classification work, one method of speeding up the process is to use only a small percentage of the total parameter set available. In this paper, we apply this technique both to the classification of malware and to the identification of malware within a set combined with cleanware. In order to demonstrate the usefulness of our method, we use the same sets of malware and cleanware as in an earlier paper. Using the statistical technique Information Gain (IG), we reduce the set of features used in the experiment from 7,605 to just over 1,000. The best accuracy obtained in the former paper using 7,605 features is 97.3% for malware versus cleanware detection and 97.4% for malware family classification; on the reduced feature set, we obtain a (best) accuracy of 94.6% on the malware versus cleanware test and 94.5% on the malware classification test. An interesting feature of the new tests presented here is the reduction in false negative rates by a factor of about 1/3 when compared with the results of the earlier paper. In addition, the running time of our tests is reduced by a factor of approximately 3/5 relative to the times reported in the original paper. The small loss in accuracy and improved false negative rate, along with the significant improvement in speed, indicate that feature reduction should be further pursued as a tool to prevent algorithms from becoming intractable due to too much data.
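As an illustration of the feature-reduction idea in this abstract, the following is a minimal sketch of ranking discrete features by information gain and keeping the top k. The function names, the toy rows, and the labels are hypothetical; the abstract does not describe the authors' implementation or data format.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(labels) - sum over values v of P(v) * H(labels | feature == v)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

def top_k_features(rows, labels, k):
    """Return the indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]

# Toy data (hypothetical): three binary features per sample.
rows = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = ["malware", "malware", "cleanware", "cleanware"]
selected = top_k_features(rows, labels, k=1)
```

In this toy set only the first feature separates the two classes, so it is the one retained; the other two carry no information about the label.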

Relevance:

30.00%

Publisher:

Abstract:

This paper presents an application of machine learning to the problem of classifying patients with glaucoma into one of two classes: stable and progressive glaucoma. The novelty of the work is the use of new features for the data analysis combined with machine learning techniques to classify the medical data. The paper describes the new features and the results of using decision trees to separate stable and progressive cases. Furthermore, we show the results of using an incremental learning algorithm for tracking stable and progressive cases over time. In both cases we used a dataset of progressive and stable glaucoma patients obtained from a glaucoma clinic.

Relevance:

30.00%

Publisher:

Abstract:

This study aimed to examine the reliability and validity of the modified Children’s Leisure Activities Study Survey (CLASS) Chinese-version questionnaire in assessing physical activity among Hong Kong Chinese children. Test-retest reliability was examined in 84 boys and 136 girls aged 9–12 years by comparing data from two administrations of the survey conducted one week apart. Validity was determined by comparing data from the second administration with accelerometer estimates. The results suggested that the questionnaire provided reliable and valid estimates of overall physical activity patterns in Hong Kong Chinese children. However, substantial overestimation was observed in vigorous activity.

Relevance:

30.00%

Publisher:

Abstract:

Epoetin-δ (Dynepo™, Shire Pharmaceuticals, Basingstoke, UK) is a synthetic form of erythropoietin (EPO) whose resemblance to endogenous EPO makes it hard to identify using the classical identification criteria. Urine samples collected from six healthy volunteers treated with epoetin-δ injections and from a control population were immuno-purified and analyzed with the usual IEF method. On the basis of the integration of the EPO profiles, a linear multivariate model was computed for discriminant analysis. For each sample, a pattern classification algorithm returned a bands distribution and intensity score (bands intensity score) indicating how representative the sample is of one of the two classes, positive or negative. Effort profiles were also integrated in the model. The method yielded a good sensitivity versus specificity relation and was used to determine the detection window of the molecule following multiple injections. The bands intensity score, which can be generalized to epoetin-α and epoetin-β, is proposed as an alternative criterion and as supplementary evidence for the identification of EPO abuse.

Relevance:

30.00%

Publisher:

Abstract:

This article is devoted to an experimental investigation of a novel application of a clustering technique recently introduced by the authors in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on a particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms.
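Two of the correlation coefficients named in this abstract can be sketched in a few lines. The code below computes the Pearson coefficient and derives the Spearman coefficient by applying Pearson to average ranks. How the authors turn these coefficients into a feature-selection rule is not stated in the abstract, so that step is deliberately left out; all function names here are illustrative.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotone but non-linear relationship: Spearman is 1, Pearson is below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman(x, y)
r = pearson(x, y)
```

The example highlights why a rank-based coefficient can be preferable when features are related monotonically but not linearly.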

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a fuzzy ARTMAP (FAM) based modular architecture for multi-class pattern recognition known as modular adaptive resonance theory map (MARTMAP). The prediction of class membership is made collectively by combining outputs from multiple novelty detectors. Distance-based familiarity discrimination is introduced to improve the robustness of MARTMAP in the presence of noise. The effectiveness of the proposed architecture is analyzed and compared with the ARTMAP-FD network, the FAM network, and the One-Against-One Support Vector Machine (OAO-SVM). Experimental results show that MARTMAP retains effective familiarity discrimination in noisy environments and is less sensitive to the class imbalance problem than its counterparts.

Relevance:

30.00%

Publisher:

Abstract:

Background Pre-school language impairment is common and greatly reduces educational performance. Population attempts to identify children who would benefit from appropriately timed intervention might be improved by greater knowledge about the typical profiles of language development. Specifically, this could be used to help with the early identification of children who will be impaired on school entry.

Methods This study applied longitudinal latent class analysis to assessments at 8, 12, 24, 36 and 48 months on 1113 children from a population-based study, in order to identify classes exhibiting distinct communicative developmental profiles.

Results Five substantive classes were identified: Typical, i.e. development in the typical range at each age; Precocious (late), i.e. typical development in infancy followed by high probabilities of precocity from 24 months onwards; Impaired (early), i.e. high probabilities of impairment up to 12 months followed by typical language development thereafter; Impaired (late), i.e. typical development in infancy but impairment from 24 months onwards; Precocious (early), i.e. high probabilities of precocity in early life followed by typical language by 48 months. The entropy statistic (0.84) suggested classes were fairly well defined, although there was a non-trivial degree of uncertainty in classification of children. That half of the Impaired (late) class was expected to have typical language at 4 years and 6% of the numerically large Typical class was expected to be impaired at 4 years illustrates this. Characteristics indicative of social advantage were more commonly found in the classes with improving profiles.
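The entropy statistic quoted in the Results can be illustrated with a short sketch. It assumes the statistic is the normalized (relative) entropy commonly reported for latent class models, which equals 1 when every child is assigned to a class with certainty and 0 when the posterior class probabilities are uniform. The abstract does not give the formula, so this definition and the function name are assumptions.

```python
import math

def relative_entropy(posteriors):
    """Normalized entropy: 1 - sum_i sum_k(-p_ik * ln p_ik) / (N * ln K).

    `posteriors` is a list of N rows, each holding K posterior class
    probabilities that sum to 1.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the term p * ln p tends to 0 as p tends to 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

certain = relative_entropy([[1.0, 0.0], [0.0, 1.0]])
uniform = relative_entropy([[0.5, 0.5], [0.5, 0.5]])
```

Under this definition a value such as 0.84 sits close to the "certain" end of the scale while still leaving room for the classification uncertainty the abstract describes.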

Conclusions Developmental profiles show that some pre-schoolers' language is characterized by periods of accelerated development, slow development and catch-up growth. Given the uncertainty in classifying children into these profiles, use of this knowledge for identifying children who will be impaired on school entry is not straightforward. The findings do, however, indicate greater need for language enrichment programmes among disadvantaged children.

Relevance:

30.00%

Publisher:

Abstract:

Radio Frequency Identification (RFID) is a technology that enables the non-contact, automatic and unique identification of objects using radio waves. Its use for commercial applications has recently become attractive, with RFID technology seen as the replacement for the optical barcode system that is currently in widespread use. RFID has many advantages over the traditional barcode, and these advantages have the potential to significantly increase the efficiency of decentralised business environments such as logistics and supply chain management. One of the important features of an RFID system is its ability to search for a particular tag among a group of tags. In order to ensure the privacy and security of the tags, the search has to be conducted in a secure fashion. To our knowledge, not much work has been done in this secure search area of RFID. The limited work that has been done does not comply with the EPC Class-1 Gen-2 standards, since most of it uses expensive hash operations or sophisticated encryption schemes that cannot be implemented on low-cost passive tags that are highly resource constrained. Our work aims to fill this gap by proposing a serverless ultra-lightweight secure search protocol that does not use expensive hash functions or any complex encryption schemes but achieves compliance with EPC Class-1 Gen-2 standards while meeting the security requirements. Our protocol is based on XOR encryption and random numbers, operations that are easily implemented on low-cost RFID tags. Our protocol also provides additional protection using a blind-factor to prevent tracking attacks. Because our protocol is EPC Class-1 Gen-2 compliant, it can be implemented on low-cost passive RFID tags.

Relevance:

30.00%

Publisher:

Abstract:

One of the issues associated with pattern classification using data-based machine learning systems is the “curse of dimensionality”. In this paper, the circle-segments method is proposed as a feature selection method to identify important input features before the entire data set is provided for learning with machine learning systems. Specifically, four machine learning systems are deployed for classification, viz. Multilayer Perceptron (MLP), Support Vector Machine (SVM), Fuzzy ARTMAP (FAM), and k-Nearest Neighbour (kNN). The integration between the circle-segments method and the machine learning systems has been applied to two case studies comprising one benchmark data set and one real data set. Overall, the results after feature selection using the circle-segments method demonstrate improvements in performance even with more than 50% of the input features eliminated from the original data sets.

Relevance:

30.00%

Publisher:

Abstract:

Radio Frequency Identification (RFID) is a technological revolution that is expected to soon replace barcode systems. One of the important features of an RFID system is its ability to search for a particular tag among a group of tags. This task is quite common where RFID systems play a vital role. To our knowledge, not much work has been done in this secure search area of RFID. Also, most of the existing work does not comply with the C1G2 standards. Our work aims to fill that gap by proposing a protocol based on the quadratic residues property that does not use expensive hash functions or any complex encryption schemes but achieves total compliance with industry standards while meeting the security requirements.
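For readers unfamiliar with the term, the sketch below shows what a quadratic residue is and how to test for one modulo an odd prime using Euler's criterion. It only illustrates the underlying number-theoretic notion; it is not the protocol proposed in the abstract, whose message flow and parameters are not described here. The function name and the modulus 11 are chosen for illustration.

```python
def is_quadratic_residue(a, p):
    """Euler's criterion: for an odd prime p and a not divisible by p,
    a is a quadratic residue mod p exactly when a^((p-1)/2) = 1 (mod p)."""
    a %= p
    if a == 0:
        return False  # convention used in this sketch: 0 is not counted
    return pow(a, (p - 1) // 2, p) == 1

p = 11  # illustrative small odd prime
by_criterion = [a for a in range(1, p) if is_quadratic_residue(a, p)]
by_squaring = sorted({(x * x) % p for x in range(1, p)})
```

Both lists contain the same values, since every nonzero square modulo p passes the criterion and every value that passes has a square root.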

Relevance:

30.00%

Publisher:

Abstract:

Building on a habitat mapping project completed in 2011, Deakin University was commissioned by Parks Victoria (PV) to apply the same methodology and ground-truth data to a second, more recent and higher resolution satellite image to create habitat maps for areas within the Corner Inlet and Nooramunga Marine and Coastal Park and Ramsar area. A ground-truth data set using in situ video and still photographs was used to develop and assess predictive models of benthic marine habitat distributions incorporating data from both RapidEye satellite imagery (corrected for atmospheric and water column effects by CSIRO) and LiDAR (Light Detection and Ranging) bathymetry. This report describes the results of the mapping effort as well as the methodology used to produce these habitat maps.

Overall accuracies of habitat classifications were good, with error rates similar to or better than the earlier classification (>73% and kappa values >0.58 for both study localities). The RapidEye classification failed to accurately detect Pyura and reef habitat classes at the Corner Inlet locality, possibly due to differences in spectral frequencies. For comparison, these categories were combined into a ‘non-seagrass’ category, similar to the one used at the Nooramunga locality in the original classification. Habitats predicted with highest accuracies differed from the earlier classification and were Posidonia in Corner Inlet (89%), and bare sediment (no-visible seagrass class) in Nooramunga (90%). In the Corner Inlet locality reef and Pyura habitat categories were not distinguishable in the repeated classification and so were combined with bare sediments. The majority of remaining classification errors were due to the misclassification of Zosteraceae as bare sediment and vice versa. Dominant habitats were the same as those from the 2011 classification with some differences in extent. For the Corner Inlet study locality the no-visible seagrass category remained the most extensive (9,059 ha), followed by Posidonia (5,513 ha) and Zosteraceae (5,504 ha). In Nooramunga no-visible seagrass (6,294 ha), Zosteraceae (3,122 ha) and wet saltmarsh (1,562 ha) habitat classes were most dominant.
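The overall accuracy and kappa values quoted above are both derived from a confusion matrix of ground-truth versus predicted habitat classes. The sketch below shows the standard calculation of overall accuracy and Cohen's kappa; the two-class matrix is made up for illustration and does not come from the report.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohens_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from the
    row and column totals."""
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    p_e = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical confusion matrix: rows = ground truth, columns = prediction.
cm = [[40, 10],
      [5, 45]]
accuracy = overall_accuracy(cm)
kappa = cohens_kappa(cm)
```

Kappa is lower than raw accuracy because it discounts the agreement a classifier would reach by chance given the class frequencies.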

Change detection analyses between the 2009 and 2011 imagery were undertaken as part of this project, following the analyses presented in Monk et al. (2011) and incorporating error estimates from both classifications. These analyses indicated some shifts in classification between Posidonia and Zosteraceae as well as a general reduction in the area of Zosteraceae. Issues with classification of mixed beds were apparent, particularly in the main Posidonia bed at Nooramunga where a mosaic of Zosteraceae and Posidonia was seen that was not evident in the ALOS classification. Results of a reanalysis of the 1998-2009 change detection, illustrating the effects of binning of mixed beds, are also provided as an appendix.

This work has been successful in providing baseline maps at an improved level of detail using a repeatable method, meaning that any future changes in intertidal and shallow water marine habitats may be assessed in a consistent way with quantitative error assessments. In wider use, these maps should also allow improved conservation planning, advance fisheries and catchment management, and progress infrastructure planning to limit impacts on the Inlet environment.

Relevance:

30.00%

Publisher:

Abstract:

Thanks to Bollywood, a Non-Resident Indian (NRI) is predominantly imagined, back home in India, as super-rich, fully westernized in manners and doing India proud in foreign lands. One reason for this, as explained by the late Yash Chopra, the renowned Bollywood producer-director, in his address at the first Pravasi Bharatiya Divas (Expatriate Indians Day) in 2003, is that as a director he is also working as a ‘historian’ and carrying on his shoulders the ‘moral responsibility [ … ] to depict India [and the Indian Diaspora] at its best’. In this regard, Ghassan Hage also notes that the ‘last thing’ the migrants (particularly men) would like to share with their families back home is shocking stories about racism, discrimination or prejudices that they may have experienced in public or the workplace. Such a revelation would obviously be followed by ‘why did you make us suffer and move to the end of the world just to get demeaned and insulted?’ Hage further notes that therefore the migrants’ familial and class experiences, be it in films, literature or even some sociological studies, are often ‘portrayed as a positive experience’ and this is ‘how the whole migratory enterprise continues to legitimise itself’. It could be argued that this is one of the reasons the alleged ‘racist’ attacks against Indian students received so much attention in the Indian media. It was not just discrimination but the notion of discrimination and second class treatment (based on skin colour and origin) against the revered and much envied diasporic Indian that created such a media furor in India.

Relevance:

30.00%

Publisher:

Abstract:

Toce-Gerstein et al. (Addiction 98:1661–1672, 2003) investigated the distribution of Diagnostic and Statistical Manual for Mental Disorders, 4th edition (DSM-IV) pathological gambling criteria endorsement in a U.S. community sample for those people endorsing at least one of the DSM-IV criteria (n = 399). They proposed a hierarchy of gambling disorders where endorsement of 1–2 criteria was deemed ‘At-Risk’, 3–4 ‘Problem gamblers’, 5–7 ‘Low Pathological’, and 8–10 ‘High Pathological’ gamblers. This article examines these claims in a larger Australian treatment-seeking population. Data from 4,349 clients attending specialist problem gambling services were assessed for meeting the ten DSM-IV pathological gambling criteria. Results found higher overall criteria endorsement frequencies, three components, a direct relationship between criteria endorsement and gambling severity, clustering of criteria similar to the Toce-Gerstein et al. taxonomy, high accuracy scores for numerical and criteria-specific taxonomies, and also high accuracy scores for dichotomous pathological gambling diagnoses. These results suggest significant complexities in the frequencies of criteria reports and relationships between criteria.
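The numerical hierarchy described in this abstract maps a count of endorsed DSM-IV criteria to a category. A minimal sketch of that mapping follows; the function name is hypothetical, and returning `None` for zero endorsed criteria is an assumption, since the hierarchy as described only covers people endorsing at least one criterion.

```python
def gambling_category(criteria_endorsed):
    """Map a count of endorsed DSM-IV criteria (0-10) to the category
    bands quoted in the abstract: 1-2, 3-4, 5-7 and 8-10."""
    if not 0 <= criteria_endorsed <= 10:
        raise ValueError("criteria_endorsed must be between 0 and 10")
    if criteria_endorsed == 0:
        return None  # assumption: the hierarchy starts at one criterion
    if criteria_endorsed <= 2:
        return "At-Risk"
    if criteria_endorsed <= 4:
        return "Problem"
    if criteria_endorsed <= 7:
        return "Low Pathological"
    return "High Pathological"

categories = [gambling_category(n) for n in (1, 3, 5, 8)]
```

A count-based rule like this is what the abstract calls a numerical taxonomy, as opposed to one keyed to which specific criteria are endorsed.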

Relevance:

30.00%

Publisher:

Abstract:

There is currently considerable imprecision in the nosology of biomarkers used in the study of neuropsychiatric disease. The neuropsychiatric field lags behind others such as oncology, wherein, rather than using 'biomarker' as a blanket term for a diverse range of clinical phenomena, biomarkers have been actively classified into separate categories, including prognostic and predictive tests. A similar taxonomy is proposed for neuropsychiatric diseases in which the core biology remains relatively unknown. This paper divides potential biomarkers into those of (1) risk, (2) diagnosis/trait, (3) state or acuity, (4) stage, (5) treatment response and (6) prognosis, and provides illustrative exemplars. Of course, biomarkers rely on available technology and, as we learn more about the neurobiological correlates of neuropsychiatric disorders, we will realize that the classification of biomarkers across these six categories can change, and some markers may fit into more than one category.

Molecular Psychiatry advance online publication, 28 October 2014; doi:10.1038/mp.2014.139.