905 resultados para Classification Methods


Relevância:

30.00% 30.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62P10, 62H30

Relevância:

30.00% 30.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62H30, 62J20, 62P12, 68T99

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ACM Computing Classification System (1998): I.4.9, I.4.10.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 65H10.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 65G99, 65K10, 47H04.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research is to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data of numerous patients that are used as training sets to exploit the different physiological characteristics of the different samples provided. The methods of Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and provide information to medical doctors in the form of diagnostic references for the specific disease states, leukemia. The obtained results prove that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm---Density based Adaptive Window Clustering algorithm (DAWC) was designed to process large volumes of data for finding location of high data cluster in real-time. It reduces the computational load to ∼O(N) number of computations, and thus making the algorithm more attractive and faster than current hierarchical algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A mosaic of two WorldView-2 high resolution multispectral images (Acquisition dates: October 2010 and April 2012), in conjunction with field survey data, was used to create a habitat map of the Danajon Bank, Philippines (10°15'0'' N, 124°08'0'' E) using an object-based approach. To create the habitat map, we conducted benthic cover (seafloor) field surveys using two methods. Firstly, we undertook georeferenced point intercept transects (English et al., 1997). For ten sites we recorded habitat cover types at 1 m intervals on 10 m long transects (n= 2,070 points). Second, we conducted geo-referenced spot check surveys, by placing a viewing bucket in the water to estimate the percent cover benthic cover types (n = 2,357 points). Survey locations were chosen to cover a diverse and representative subset of habitats found in the Danajon Bank. The combination of methods was a compromise between the higher accuracy of point intercept transects and the larger sample area achievable through spot check surveys (Roelfsema and Phinn, 2008, doi:10.1117/12.804806). Object-based image analysis, using the field data as calibration data, was used to classify the image mosaic at each of the reef, geomorphic and benthic community levels. The benthic community level segregated the image into a total of 17 pure and mixed benthic classes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current state of the art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease where the heart muscle is partially thickened and blood flow is - potentially fatally - obstructed. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests due to considerations of cost and time involved in interpreting the results of these tests by an expert cardiologist. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients into two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature-set. We address these by proposing imputation and feature-selection methods. We develop heartbeat-classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Evidence suggests that health benefits are associated with consuming recommended amounts of fruits and vegetables (F&V), yet standardised assessment methods to measure F&V intake are lacking. The current review aims to identify methods to assess F&V intake among children and adults in pan-European studies and inform the development of the DEDIPAC (DEterminants of DIet and Physical Activity) toolbox of methods suitable for use in future European studies. A literature search was conducted using three electronic databases and by hand-searching reference lists. English-language studies of any design which assessed F&V intake were included in the review. Studies involving two or more European countries were included in the review. Healthy, free-living children or adults. The review identified fifty-one pan-European studies which assessed F&V intake. The FFQ was the most commonly used (n 42), followed by 24 h recall (n 11) and diet records/diet history (n 7). Differences existed between the identified methods; for example, the number of F&V items on the FFQ and whether potatoes/legumes were classified as vegetables. In total, eight validated instruments were identified which assessed F&V intake among adults, adolescents or children. The current review indicates that an agreed classification of F&V is needed in order to standardise intake data more effectively between European countries. Validated methods used in pan-European populations encompassing a range of European regions were identified. These methods should be considered for use by future studies focused on evaluating intake of F&V.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main drivers for the development and evolution of Cyber Physical Systems (CPS) are the reduction of development costs and time along with the enhancement of the designed products. The aim of this survey paper is to provide an overview of different types of system and the associated transition process from mechatronics to CPS and cloud-based (IoT) systems. It will further consider the requirement that methodologies for CPS-design should be part of a multi-disciplinary development process within which designers should focus not only on the separate physical and computational components, but also on their integration and interaction. Challenges related to CPS-design are therefore considered in the paper from the perspectives of the physical processes, computation and integration respectively. Illustrative case studies are selected from different system levels starting with the description of the overlaying concept of Cyber Physical Production Systems (CPPSs). The analysis and evaluation of the specific properties of a sub-system using a condition monitoring system, important for the maintenance purposes, is then given for a wind turbine.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Introduction: Baseline severity and clinical stroke syndrome (Oxford Community Stroke Project, OCSP) classification are predictors of outcome in stroke. We used data from the ‘Tinzaparin in Acute Ischaemic Stroke Trial’ (TAIST) to assess the relationship between stroke severity, early recovery, outcome and OCSP syndrome. Methods: TAIST was a randomised controlled trial assessing the safety and efficacy of tinzaparin versus aspirin in 1,484 patients with acute ischaemic stroke. Severity was measured as the Scandinavian Neurological Stroke Scale (SNSS) at baseline and days 4, 7 and 10, and baseline OCSP clinical classification recorded: total anterior circulation infarct (TACI), partial anterior circulation infarct (PACI), lacunar infarct (LACI) and posterior circulation infarction (POCI). Recovery was calculated as change in SNSS from baseline at day 4 and 10. The relationship between stroke syndrome and SNSS at days 4 and 10, and outcome (modified Rankin scale at 90 days) were assessed. Results: Stroke severity was significantly different between TACI (most severe) and LACI (mildest) at all four time points (p<0.001), with no difference between PACI and POCI. The largest change in SNSS score occurred between baseline and day 4; improvement was least in TACI (median 2 units), compared to other groups (median 3 units) (p<0.001). If SNSS did not improve by day 4, then early recovery and late functional outcome tended to be limited irrespective of clinical syndrome (SNSS, baseline: 31, day 10: 32; mRS, day 90: 4); patients who recovered early tended to continue to improve and had better functional outcome irrespective of syndrome (SNSS, baseline: 35, day 10: 50; mRS, day 90: 2). Conclusions: Although functional outcome is related to baseline clinical syndrome (best with LACI, worst with TACI), patients who improve early have a more favourable functional outcome, irrespective of their OCSP syndrome. Hence, patients with a TACI syndrome may still achieve a reasonable outcome if early recovery occurs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper catalogues the procedures and steps involved in agroclimatic classification. These vary from conventional descriptive methods to modern computer-based numerical techniques. There are three mutually independent numerical classification techniques, namely Ordination, Cluster analysis, and Minimum spanning tree; and under each technique there are several forms of grouping techniques existing. The vhoice of numerical classification procedure differs with the type of data set. In the case of numerical continuous data sets with booth positive and negative values, the simple and least controversial procedures are unweighted pair group method (UPGMA) and weighted pair group method (WPGMA) under clustering techniques with similarity measure obtained either from Gower metric or standardized Euclidean metric. Where the number of attributes are large, these could be reduced to fewer new attributes defined by the principal components or coordinates by ordination technique. The first few components or coodinates explain the maximum variance in the data matrix. These revided attributes are less affected by noise in the data set. It is possible to check misclassifications using minimum spanning tree.