984 results for Training sets


Relevance:

60.00%

Publisher:

Abstract:

Spatial activity recognition is challenging due to the amount of noise incorporated during video tracking in everyday environments. We address the spatial recognition problem with a biologically-inspired chemotactic model that is capable of handling noisy data. The model is based on bacterial chemotaxis, a process that allows bacteria to change motile behaviour in relation to environmental gradients. Through adoption of chemotactic principles, we propose the chemotactic model and evaluate its performance in a smart house environment. The model exhibits greater than 99% recognition performance with a diverse six class dataset and outperforms the Hidden Markov Model (HMM). The approach also maintains high accuracy (90-99%) with small training sets of one to five sequences. Importantly, unlike other low-level spatial activity recognition models, we show that the chemotactic model is capable of recognising simple interwoven activities.

Relevance:

60.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

60.00%

Publisher:

Abstract:

In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest classifier (OPF), centered on detection and elimination of outliers in the training set. Identification of outliers is based on a penalty computed for each sample in the training set from the number of false positive and false negative classifications attributable to it. This approach enhances the accuracy of OPF and also reduces classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.

Relevance:

60.00%

Publisher:

Abstract:

In this paper we shed light on the problem of efficiency and effectiveness of image classification in large datasets. As the amount of data to be processed and classified has increased in recent years, there is a need for faster and more precise pattern recognition algorithms to perform online and offline training and classification. We deal here with the problem of fast classification of moist areas in radar images. Experimental results using Optimum-Path Forest and its training set pruning algorithm are also provided and discussed. © 2011 IEEE.

Relevance:

60.00%

Publisher:

Abstract:

Pattern recognition in large amounts of data has been paramount in the last decade, since it is not straightforward to design interactive and real-time classification systems. Very recently, the Optimum-Path Forest (OPF) classifier was proposed to overcome such limitations, together with its training set pruning algorithm, which requires a parameter that has so far been set empirically. In this paper, we propose a Harmony Search-based algorithm that can find near-optimal values for that parameter. The experimental results show that our algorithm is able to find proper values for the OPF pruning algorithm parameter. © 2011 IEEE.
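
The general idea of tuning a single parameter with Harmony Search can be sketched as below. This is only an illustrative one-dimensional version: the abstract does not give the objective, the parameter range, or the Harmony Search settings used, so `harmony_search`, its defaults, and the toy objective (a stand-in for a validation error measured after pruning) are all assumptions.

```python
import random


def harmony_search(objective, lower, upper, memory_size=10, hmcr=0.9,
                   par=0.3, bandwidth=0.05, iterations=500, seed=0):
    """Minimise a one-parameter objective with a basic Harmony Search loop."""
    rng = random.Random(seed)
    # Harmony memory: a pool of candidate parameter values and their scores.
    memory = [rng.uniform(lower, upper) for _ in range(memory_size)]
    scores = [objective(x) for x in memory]
    for _ in range(iterations):
        if rng.random() < hmcr:
            # Memory consideration: reuse a stored value...
            candidate = rng.choice(memory)
            if rng.random() < par:
                # ...optionally with a small pitch adjustment.
                candidate += rng.uniform(-bandwidth, bandwidth) * (upper - lower)
        else:
            # Otherwise draw a completely new random value.
            candidate = rng.uniform(lower, upper)
        candidate = min(max(candidate, lower), upper)
        score = objective(candidate)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(memory_size), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = candidate, score
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical objective: pretend the validation error is lowest at 0.3.
def toy_error(pruning_parameter):
    return (pruning_parameter - 0.3) ** 2


best_value, best_error = harmony_search(toy_error, 0.0, 1.0)
```

In a real setting the objective would train and prune the classifier for each candidate value, which is far more expensive than this toy function.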

Relevance:

60.00%

Publisher:

Abstract:

The process of discovering new morbid genes and new drug target proteins has been shown to be very costly and laborious. With a view to cutting costs and speeding up this process, we propose in this work a new method to determine a gene's druggability score and morbidity score: respectively, the probability that the protein encoded by the gene has the characteristics that make it a new drug target, and the probability that an alteration in that gene produces a phenotype characteristic of a genetically based illness. To determine these scores, we built and analyzed the topology of an integrated network of molecular interactions among human genes, containing physical interactions between proteins, metabolic interactions and transcriptional regulation interactions, and included other data such as the level of gene transcription and the cellular localization of the protein encoded by the gene. We tested our model on training sets and achieved results equal to or better than those achieved by similar methods in the literature. Finally, to investigate whether the assigned scores reflect the potential druggability and morbidity of previously unclassified genes, we searched the biomedical literature for evidence supporting the potential druggability and morbidity status of the genes with the 10 highest scores. We found clear evidence for 73% and 90% of the potentially druggable and morbid genes, respectively.

Relevance:

60.00%

Publisher:

Abstract:

Dimensionality reduction is employed for visual data analysis as a way of obtaining reduced spaces for high dimensional data or of mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation on reduced or visual spaces, they have limited capabilities for adjusting the results according to the user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high dimensional data, taking into account the user's input. It employs Partial Least Squares (PLS), a statistical tool to perform retrieval of latent spaces focusing on the discriminability of the data. The method employs a training set for building a highly precise model that can then be applied to a much larger data set very effectively. The reduced data set can be exhibited using various existing visualization techniques. The training data is important for encoding the user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge and is capable of working with small and unbalanced training sets.

Relevance:

60.00%

Publisher:

Abstract:

Chironomid-temperature inference models based on North American, European and combined surface sediment training sets were compared to assess the overall reliability of their predictions. Between 67 and 76 of the major chironomid taxa in each data set showed a unimodal response to July temperature, whereas between 5 and 22 of the common taxa showed a sigmoidal response. July temperature optima were highly correlated among the training sets, but the correlations for other taxon parameters such as tolerances and weighted averaging partial least squares (WA-PLS) and partial least squares (PLS) regression coefficients were much weaker. PLS, weighted averaging, WA-PLS, and the Modern Analogue Technique, all provided useful and reliable temperature inferences. Although jack-knifed error statistics suggested that two-component WA-PLS models had the highest predictive power, intercontinental tests suggested that other inference models performed better. The various models were able to provide good July temperature inferences, even where neither good nor close modern analogues for the fossil chironomid assemblages existed. When the models were applied to fossil Lateglacial assemblages from North America and Europe, the inferred rates and magnitude of July temperature changes varied among models. All models, however, revealed similar patterns of Lateglacial temperature change. Depending on the model used, the inferred Younger Dryas July temperature decrease ranged between 2.5 and 6°C.

Relevance:

60.00%

Publisher:

Abstract:

The objective of this study was to assess the potential of visible and near infrared spectroscopy (VIS+NIRS) combined with multivariate analysis for identifying the geographical origin of cork. The study was carried out on cork planks and natural cork stoppers from the most representative cork-producing areas in the world. Two training sets of international and national cork planks were studied. The first set comprised a total of 479 samples from Morocco, Portugal, and Spain, while the second set comprised a total of 179 samples from the Spanish regions of Andalusia, Catalonia, and Extremadura. A training set of 90 cork stoppers from Andalusia and Catalonia was also studied. Original spectroscopic data were obtained for the transverse sections of the cork planks and for the body and top of the cork stoppers by means of a 6500 Foss-NIRSystems SY II spectrophotometer using a fiber optic probe. Remote reflectance was employed in the wavelength range of 400 to 2500 nm. After analyzing the spectroscopic data, discriminant models were obtained by means of partial least squares (PLS) with 70% of the samples. The best models were then validated using the remaining 30% of the samples. At least 98% of the international cork plank samples and 95% of the national samples were correctly classified in the calibration and validation stages. The best model for the cork stoppers was obtained for the top of the stoppers, with at least 90% of the samples being correctly classified. The results demonstrate the potential of VIS+NIRS technology as a rapid and accurate method for predicting the geographical origin of cork planks and stoppers.
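
A 70/30 calibration/validation split like the one described in this abstract can be sketched as follows. This is a minimal illustration that assumes a simple random shuffle; the abstract does not say how samples were assigned to each subset, and `split_calibration_validation` and the seed are hypothetical.

```python
import random


def split_calibration_validation(samples, calibration_fraction=0.7, seed=0):
    """Shuffle the samples and split them into calibration and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


# Sample indices standing in for the 479 international cork plank samples.
calibration, validation = split_calibration_validation(range(479))
```

With real data, a split stratified by country or region may be preferable so that each origin is represented in both subsets.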

Relevance:

60.00%

Publisher:

Abstract:

We review Web Mining techniques and describe a Bootstrap Statistics methodology applied to the optimization and verification of pattern model classifiers for Supervised Learning in Tour-Guide Robot knowledge repository management. It is virtually impossible to thoroughly test Web Page Classifiers and many other Internet Applications with purely empirical data, due to the need for human intervention to generate training sets and test sets. We propose using the computer-based Bootstrap paradigm to design a test environment in which such classifiers can be checked more reliably.
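
The basic bootstrap idea referred to in this abstract, resampling a small labelled set with replacement to see how much a classifier's measured accuracy varies, can be sketched as below. This is not the authors' methodology; `bootstrap_accuracy` and the outcome data are made up for illustration.

```python
import random


def bootstrap_accuracy(correct_flags, n_resamples=1000, seed=0):
    """Return the accuracy measured on each bootstrap resample of a 0/1 outcome list."""
    rng = random.Random(seed)
    n = len(correct_flags)
    accuracies = []
    for _ in range(n_resamples):
        # Draw n outcomes with replacement from the original outcomes.
        resample = [correct_flags[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(resample) / n)
    return accuracies


# Made-up outcomes: 1 = page classified correctly, 0 = misclassified.
outcomes = [1] * 40 + [0] * 10
accuracies = sorted(bootstrap_accuracy(outcomes))
# Approximate 95% percentile interval of the resampled accuracies.
lower, upper = accuracies[25], accuracies[974]
```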

Relevance:

60.00%

Publisher:

Abstract:

Background: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. Results: In order to perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE with a bias towards the subcellular localization underrepresented in SwissProt was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. Conclusion: No individual method had a sufficient level of sensitivity across both evaluation sets that would enable reliable application to hypothetical proteins. 
All methods showed lower performance on the LOCATE dataset and variable performance on individual subcellular localizations was observed. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
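
Per-location sensitivity and specificity, the two measures used in this comparison, can be computed as in the sketch below. The labels and predictions are invented for illustration, and `sensitivity_specificity` is a hypothetical helper, not part of any of the evaluated tools.

```python
def sensitivity_specificity(true_labels, predicted_labels, location):
    """Compute sensitivity and specificity for one subcellular location."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == location:
            if predicted == location:
                tp += 1  # protein is at the location and was predicted there
            else:
                fn += 1  # protein is at the location but was missed
        else:
            if predicted == location:
                fp += 1  # protein is elsewhere but was predicted here
            else:
                tn += 1  # protein is elsewhere and was not predicted here
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up example annotations and predictions.
truth = ["nucleus", "nucleus", "cytosol", "cytosol", "nucleus", "ER"]
predicted = ["nucleus", "cytosol", "cytosol", "nucleus", "nucleus", "ER"]
sens, spec = sensitivity_specificity(truth, predicted, "nucleus")
```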

Relevance:

60.00%

Publisher:

Abstract:

Background - Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results - Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion - VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment.
The server can be used on its own or in combination with alignment-based prediction methods.
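
The sketch below illustrates how an auto cross covariance (ACC) transformation turns sequences of different lengths into vectors of the same length. It uses the commonly cited form ACC_jk(lag) = sum over i of Z[j][i] * Z[k][i+lag], divided by (n - lag). The descriptor values in `toy_descriptors` are invented for illustration and are not the amino acid property scales used by VaxiJen; `acc_transform` is a hypothetical helper.

```python
def acc_transform(sequence, descriptors, max_lag):
    """Map a sequence to a fixed-length vector of auto and cross covariances."""
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    # Replace each residue by its tuple of numeric descriptors.
    encoded = [descriptors[residue] for residue in sequence]
    n_descriptors = len(encoded[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_descriptors):
            for k in range(n_descriptors):
                # j == k gives an auto covariance term, j != k a cross covariance term.
                total = sum(encoded[i][j] * encoded[i + lag][k]
                            for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Hypothetical two-descriptor scale for four residues (values are made up).
toy_descriptors = {
    "A": (0.1, -1.0),
    "K": (2.0, 0.5),
    "D": (3.0, -0.5),
    "L": (-4.0, 1.0),
}

short_vector = acc_transform("AKDL", toy_descriptors, max_lag=2)
long_vector = acc_transform("AKDLLKDAAK", toy_descriptors, max_lag=2)
```

Both vectors have 2 lags x 2 x 2 descriptor pairs = 8 entries, which is what lets a standard classifier be trained on proteins of varying length.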

Relevance:

60.00%

Publisher:

Abstract:

This research aims to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data from numerous patients, which are used as training sets to exploit the different physiological characteristics of the samples provided. Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and provide information to medical doctors in the form of diagnostic references for a specific disease state, leukemia. The results show that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm, the Density-based Adaptive Window Clustering algorithm (DAWC), was designed to process large volumes of data for finding the locations of high-density data clusters in real time. It reduces the computational load to ∼O(N) computations, making the algorithm faster and more attractive than current hierarchical algorithms.

Relevance:

60.00%

Publisher:

Abstract:

The relative abundance of diatom species in different habitats can be used as a tool to infer prior environmental conditions and evaluate management decisions that influence habitat quality. Diatom distribution patterns were examined to characterize relationships between assemblage composition and environmental gradients in a subtropical estuarine watershed. We identified environmental correlates of diatom distribution patterns across the Charlotte Harbor, Florida, watershed; evaluated differences among three major river drainages; and determined how accurately local environmental conditions can be predicted using inference models based on diatom assemblages. Sampling locations ranged from freshwater to marine (0.1–37.2 ppt salinity) and spanned broad nutrient concentration gradients. Salinity was the predominant driver of difference among diatom assemblages across the watershed, but other environmental variables had stronger correlations with assemblages within the subregions of the three rivers and harbor. Eighteen indicator taxa were significantly affiliated with subregions. Relationships between diatom taxon distributions and salinity, distance from the harbor, total phosphorus (TP), and total nitrogen (TN) were evaluated to determine the utility of diatom assemblages to predict environmental values using a weighted averaging-regression approach. Diatom-based inferences of these variables were strong (salinity R² = 0.96; distance R² = 0.93; TN R² = 0.83; TP R² = 0.83). Diatom assemblages provide reliable estimates of environmental parameters on different spatial scales across the watershed. Because many coastal diatom taxa are ubiquitous, the diatom training sets provided here should enable diatom-based environmental reconstructions in subtropical estuaries that are being rapidly altered by land and water use changes and sea level rise.
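
The core of a weighted averaging inference model can be sketched in two steps: each taxon's optimum is the abundance-weighted mean of the environmental variable over the training sites, and a new sample's inferred value is the abundance-weighted mean of the optima of the taxa it contains. The sketch below is a simplified illustration that omits the deshrinking regression usually applied afterwards; the taxon names, abundances, and salinities are invented, and `wa_optima` and `wa_infer` are hypothetical helpers.

```python
def wa_optima(training_abundances, training_values):
    """Estimate each taxon's optimum as an abundance-weighted mean of the variable."""
    weighted_sums = {}
    abundance_totals = {}
    for abundances, value in zip(training_abundances, training_values):
        for taxon, abundance in abundances.items():
            weighted_sums[taxon] = weighted_sums.get(taxon, 0.0) + abundance * value
            abundance_totals[taxon] = abundance_totals.get(taxon, 0.0) + abundance
    return {taxon: weighted_sums[taxon] / abundance_totals[taxon]
            for taxon in abundance_totals if abundance_totals[taxon] > 0}


def wa_infer(abundances, optima):
    """Infer the variable for one sample from its taxa and their optima."""
    shared = [taxon for taxon in abundances if taxon in optima]
    total = sum(abundances[taxon] for taxon in shared)
    if total == 0:
        raise ValueError("sample shares no taxa with the training set")
    return sum(abundances[taxon] * optima[taxon] for taxon in shared) / total


# Made-up training set: two sites with known salinity (ppt).
sites = [{"taxon_a": 8, "taxon_b": 2}, {"taxon_a": 2, "taxon_b": 8}]
salinity = [5.0, 30.0]
optima = wa_optima(sites, salinity)

# Infer salinity for a new sample with equal abundance of both taxa.
inferred = wa_infer({"taxon_a": 5, "taxon_b": 5}, optima)
```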

Relevance:

60.00%

Publisher:

Abstract:

Current state of the art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.