849 resultados para Classification approach
Resumo:
Within online learning communities, receiving timely and meaningful insights into the quality of learning activities is an important part of an effective educational experience. Commonly adopted methods – such as the Community of Inquiry framework – rely on manual coding of online discussion transcripts, which is a costly and time consuming process. There are several efforts underway to enable the automated classification of online discussion messages using supervised machine learning, which would enable the real-time analysis of interactions occurring within online learning communities. This paper investigates the importance of incorporating features that utilise the structure of on-line discussions for the classification of "cognitive presence" – the central dimension of the Community of Inquiry framework focusing on the quality of students' critical thinking within online learning communities. We implemented a Conditional Random Field classification solution, which incorporates structural features that may be useful in increasing classification performance over other implementations. Our approach leads to an improvement in classification accuracy of 5.8% over current existing techniques when tested on the same dataset, with a precision and recall of 0.630 and 0.504 respectively.
Resumo:
Background - Bipolar disorder (BD) is one of the leading causes of disability worldwide. Patients are further disadvantaged by delays in accurate diagnosis ranging between 5 and 10 years. We applied Gaussian process classifiers (GPCs) to structural magnetic resonance imaging (sMRI) data to evaluate the feasibility of using pattern recognition techniques for the diagnostic classification of patients with BD. Method - GPCs were applied to gray (GM) and white matter (WM) sMRI data derived from two independent samples of patients with BD (cohort 1: n = 26; cohort 2: n = 14). Within each cohort patients were matched on age, sex and IQ to an equal number of healthy controls. Results - The diagnostic accuracy of the GPC for GM was 73% in cohort 1 and 72% in cohort 2; the sensitivity and specificity of the GM classification were respectively 69% and 77% in cohort 1 and 64% and 99% in cohort 2. The diagnostic accuracy of the GPC for WM was 69% in cohort 1 and 78% in cohort 2; the sensitivity and specificity of the WM classification were both 69% in cohort 1 and 71% and 86% respectively in cohort 2. In both samples, GM and WM clusters discriminating between patients and controls were localized within cortical and subcortical structures implicated in BD. Conclusions - Our results demonstrate the predictive value of neuroanatomical data in discriminating patients with BD from healthy individuals. The overlap between discriminative networks and regions implicated in the pathophysiology of BD supports the biological plausibility of the classifiers.
Resumo:
Given a set of images of scenes containing different object categories (e.g. grass, roads) our objective is to discover these objects in each image, and to use this object occurrences to perform a scene classification (e.g. beach scene, mountain scene). We achieve this by using a supervised learning algorithm able to learn with few images to facilitate the user task. We use a probabilistic model to recognise the objects and further we classify the scene based on their object occurrences. Experimental results are shown and evaluated to prove the validity of our proposal. Object recognition performance is compared to the approaches of He et al. (2004) and Marti et al. (2001) using their own datasets. Furthermore an unsupervised method is implemented in order to evaluate the advantages and disadvantages of our supervised classification approach versus an unsupervised one
Resumo:
Fine-grained leaf classification has concentrated on the use of traditional shape and statistical features to classify ideal images. In this paper we evaluate the effectiveness of traditional hand-crafted features and propose the use of deep convolutional neural network (ConvNet) features. We introduce a range of condition variations to explore the robustness of these features, including: translation, scaling, rotation, shading and occlusion. Evaluations on the Flavia dataset demonstrate that in ideal imaging conditions, combining traditional and ConvNet features yields state-of-theart performance with an average accuracy of 97:3%�0:6% compared to traditional features which obtain an average accuracy of 91:2%�1:6%. Further experiments show that this combined classification approach consistently outperforms the best set of traditional features by an average of 5:7% for all of the evaluated condition variations.
Resumo:
Acoustics is a rich source of environmental information that can reflect the ecological dynamics. To deal with the escalating acoustic data, a variety of automated classification techniques have been used for acoustic patterns or scene recognition, including urban soundscapes such as streets and restaurants; and natural soundscapes such as raining and thundering. It is common to classify acoustic patterns under the assumption that a single type of soundscapes present in an audio clip. This assumption is reasonable for some carefully selected audios. However, only few experiments have been focused on classifying simultaneous acoustic patterns in long-duration recordings. This paper proposes a binary relevance based multi-label classification approach to recognise simultaneous acoustic patterns in one-minute audio clips. By utilising acoustic indices as global features and multilayer perceptron as a base classifier, we achieve good classification performance on in-the-field data. Compared with single-label classification, multi-label classification approach provides more detailed information about the distributions of various acoustic patterns in long-duration recordings. These results will merit further biodiversity investigations, such as bird species surveys.
Resumo:
Support vector machine (SVM) is a powerful technique for data classification. Despite of its good theoretic foundations and high classification accuracy, normal SVM is not suitable for classification of large data sets, because the training complexity of SVM is highly dependent on the size of data set. This paper presents a novel SVM classification approach for large data sets by using minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the centers of the clusters are used for the first time SVM classification. Then we use the clusters whose centers are support vectors or those clusters which have different classes to perform the second time SVM classification. In this stage most data are removed. Several experimental results show that the approach proposed in this paper has good classification accuracy compared with classic SVM while the training is significantly faster than several other SVM classifiers.
Resumo:
The water column overlying the submerged aquatic vegetation (SAV) canopy presents difficulties when using remote sensing images for mapping such vegetation. Inherent and apparent water optical properties and its optically active components, which are commonly present in natural waters, in addition to the water column height over the canopy, and plant characteristics are some of the factors that affect the signal from SAV mainly due to its strong energy absorption in the near-infrared. By considering these interferences, a hypothesis was developed that the vegetation signal is better conserved and less absorbed by the water column in certain intervals of the visible region of the spectrum; as a consequence, it is possible to distinguish the SAV signal. To distinguish the signal from SAV, two types of classification approaches were selected. Both of these methods consider the hemispherical-conical reflectance factor (HCRF) spectrum shape, although one type was supervised and the other one was not. The first method adopts cluster analysis and uses the parameters of the band (absorption, asymmetry, height and width) obtained by continuum removal as the input of the classification. The spectral angle mapper (SAM) was adopted as the supervised classification approach. Both approaches tested different wavelength intervals in the visible and near-infrared spectra. It was demonstrated that the 585 to 685-nm interval, corresponding to the green, yellow and red wavelength bands, offered the best results in both classification approaches. However, SAM classification showed better results relative to cluster analysis and correctly separated all spectral curves with or without SAV. Based on this research, it can be concluded that it is possible to discriminate areas with and without SAV using remote sensing. © 2013 by the authors; licensee MDPI, Basel, Switzerland.
Resumo:
High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression with clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach that is based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation with support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
Resumo:
The Lena River Delta, situated in Northern Siberia (72.0 - 73.8° N, 122.0 - 129.5° E), is the largest Arctic delta and covers 29,000 km**2. Since natural deltas are characterised by complex geomorphological patterns and various types of ecosystems, high spatial resolution information on the distribution and extent of the delta environments is necessary for a spatial assessment and accurate quantification of biogeochemical processes as drivers for the emission of greenhouse gases from tundra soils. In this study, the first land cover classification for the entire Lena Delta based on Landsat 7 Enhanced Thematic Mapper (ETM+) images was conducted and used for the quantification of methane emissions from the delta ecosystems on the regional scale. The applied supervised minimum distance classification was very effective with the few ancillary data that were available for training site selection. Nine land cover classes of aquatic and terrestrial ecosystems in the wetland dominated (72%) Lena Delta could be defined by this classification approach. The mean daily methane emission of the entire Lena Delta was calculated with 10.35 mg CH4/m**2/d. Taking our multi-scale approach into account we find that the methane source strength of certain tundra wetland types is lower than calculated previously on coarser scales.
Resumo:
Precise, up-to-date and increasingly detailed road maps are crucial for various advanced road applications, such as lane-level vehicle navigation, and advanced driver assistant systems. With the very high resolution (VHR) imagery from digital airborne sources, it will greatly facilitate the data acquisition, data collection and updates if the road details can be automatically extracted from the aerial images. In this paper, we proposed an effective approach to detect road lane information from aerial images with employment of the object-oriented image analysis method. Our proposed algorithm starts with constructing the DSM and true orthophotos from the stereo images. The road lane details are detected using an object-oriented rule based image classification approach. Due to the affection of other objects with similar spectral and geometrical attributes, the extracted road lanes are filtered with the road surface obtained by a progressive two-class decision classifier. The generated road network is evaluated using the datasets provided by Queensland department of Main Roads. The evaluation shows completeness values that range between 76% and 98% and correctness values that range between 82% and 97%.
Resumo:
Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline.
Resumo:
The use of the shear wave velocity data as a field index for evaluating the liquefaction potential of sands is receiving increased attention because both shear wave velocity and liquefaction resistance are similarly influenced by many of the same factors such as void ratio, state of stress, stress history and geologic age. In this paper, the potential of support vector machine (SVM) based classification approach has been used to assess the liquefaction potential from actual shear wave velocity data. In this approach, an approximate implementation of a structural risk minimization (SRM) induction principle is done, which aims at minimizing a bound on the generalization error of a model rather than minimizing only the mean square error over the data set. Here SVM has been used as a classification tool to predict liquefaction potential of a soil based on shear wave velocity. The dataset consists the information of soil characteristics such as effective vertical stress (sigma'(v0)), soil type, shear wave velocity (V-s) and earthquake parameters such as peak horizontal acceleration (a(max)) and earthquake magnitude (M). Out of the available 186 datasets, 130 are considered for training and remaining 56 are used for testing the model. The study indicated that SVM can successfully model the complex relationship between seismic parameters, soil parameters and the liquefaction potential. In the model based on soil characteristics, the input parameters used are sigma'(v0), soil type. V-s, a(max) and M. In the other model based on shear wave velocity alone uses V-s, a(max) and M as input parameters. In this paper, it has been demonstrated that Vs alone can be used to predict the liquefaction potential of a soil using a support vector machine model. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Urbanisation is the increase in the population of cities in proportion to the region's rural population. Urbanisation in India is very rapid with urban population growing at around 2.3 percent per annum. Urban sprawl refers to the dispersed development along highways or surrounding the city and in rural countryside with implications such as loss of agricultural land, open space and ecologically sensitive habitats. Sprawl is thus a pattern and pace of land use in which the rate of land consumed for urban purposes exceeds the rate of population growth resulting in an inefficient and consumptive use of land and its associated resources. This unprecedented urbanisation trend due to burgeoning population has posed serious challenges to the decision makers in the city planning and management process involving plethora of issues like infrastructure development, traffic congestion, and basic amenities (electricity, water, and sanitation), etc. In this context, to aid the decision makers in following the holistic approaches in the city and urban planning, the pattern, analysis, visualization of urban growth and its impact on natural resources has gained importance. This communication, analyses the urbanisation pattern and trends using temporal remote sensing data based on supervised learning using maximum likelihood estimation of multivariate normal density parameters and Bayesian classification approach. The technique is implemented for Greater Bangalore – one of the fastest growing city in the World, with Landsat data of 1973, 1992 and 2000, IRS LISS-3 data of 1999, 2006 and MODIS data of 2002 and 2007. The study shows that there has been a growth of 466% in urban areas of Greater Bangalore across 35 years (1973 to 2007). The study unravels the pattern of growth in Greater Bangalore and its implication on local climate and also on the natural resources, necessitating appropriate strategies for the sustainable management.
Resumo:
Background: Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western World. Despite progresses made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in the western countries. Methods: A genomic study of human colorectal cancer has been carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60- mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling were built. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After an exhaustive process of pre-processing to ensure data quality–lost values imputation, probes quality, data smoothing and intraclass variability filtering–the final dataset comprised a total of 8, 104 probes. Next, a supervised classification approach and data analysis was carried out to obtain the most relevant genes. Two of them are directly involved in cancer progression and in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values in the range of 0.997 and 0.955).
Resumo:
Mobile malware has continued to grow at an alarming rate despite on-going mitigation efforts. This has been much more prevalent on Android due to being an open platform that is rapidly overtaking other competing platforms in the mobile smart devices market. Recently, a new generation of Android malware families has emerged with advanced evasion capabilities which make them much more difficult to detect using conventional methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also quicker white box analysis by means of the more interpretable constituent classifiers.