48 results for k-nearest neighbours

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance: 80.00%

Abstract:

The efficacy of fluorescence spectroscopy to detect squamous cell carcinoma is evaluated in an animal model following laser excitation at 442 and 532 nm. Lesions are chemically induced with a topical DMBA application at the left lateral tongue of Golden Syrian hamsters. The animals are investigated every 2 weeks after the 4th week of induction until a total of 26 weeks. The right lateral tongue of each animal is considered as a control site (normal contralateral tissue) and the induced lesions are analyzed as a set of points covering the entire clinically detectable area. From the fluorescence spectral differences, four indices based on intraspectral analysis are determined to discriminate normal from carcinoma tissue. The spectral data are also analyzed using multivariate data analysis and the results are compared with histology as the diagnostic gold standard. The best result achieved is for blue excitation using the KNN (K-nearest neighbor, an interspectral analysis) algorithm, with a sensitivity of 95.7% and a specificity of 91.6%. These high indices indicate that fluorescence spectroscopy may constitute a fast noninvasive auxiliary tool for the diagnosis of cancer within the oral cavity. (C) 2008 Society of Photo-Optical Instrumentation Engineers.
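
As an illustration of the two ideas named in the abstract above, the following minimal sketch shows a majority-vote k-nearest-neighbour classifier and the computation of sensitivity and specificity against a reference diagnosis. It is not the authors' pipeline: the two-feature points, the labels, the function names and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def sensitivity_specificity(y_true, y_pred, positive="carcinoma"):
    """Return (sensitivity, specificity) with `positive` as the disease class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical spectral indices (two features per measured point).
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
train_y = ["normal", "normal", "normal",
           "carcinoma", "carcinoma", "carcinoma"]

label = knn_predict(train_X, train_y, (0.82, 0.80), k=3)
```

Sensitivity is the fraction of true carcinoma points that are classified as carcinoma, and specificity is the fraction of true normal points that are classified as normal; here histology would supply `y_true`.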

Relevance: 80.00%

Abstract:

Quality control of toys to avoid children's exposure to potentially toxic elements is of utmost relevance and is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated at the authors' laboratory for direct analysis of plastic toys, and one of the main difficulties for the determination of Cd, Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models from LIBS spectra were tested for sampling toys that present a potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys using Partial Least Squares - Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbor (KNN). Model building and validation were carried out with 40 and 11 samples, respectively. The best results were obtained when KNN was used, with correct predictions varying from 95% for Cd to 100% for Cr and Pb. (C) 2011 Elsevier B.V. All rights reserved.

Relevance: 80.00%

Abstract:

A rapid method for classification of mineral waters is proposed. The discrimination power was evaluated by a novel combination of chemometric data analysis and qualitative multi-elemental fingerprints of mineral water samples acquired from different regions of the Brazilian territory. The classification of mineral waters was assessed using only the wavelength emission intensities obtained by inductively coupled plasma optical emission spectrometry (ICP OES), monitoring different lines of Al, B, Ba, Ca, Cl, Cu, Co, Cr, Fe, K, Mg, Mn, Na, Ni, P, Pb, S, Sb, Si, Sr, Ti, V, and Zn, with Be, Dy, Gd, In, La, Sc and Y as internal standards. Data acquisition was done under robust (RC) and non-robust (NRC) conditions. In addition, the combination of signal intensities of two or more emission lines for each element was evaluated instead of the individual lines. The performance of two classification algorithms, k-nearest neighbor (kNN) and soft independent modeling of class analogy (SIMCA), and two preprocessing algorithms, autoscaling and Pareto scaling, was evaluated for the ability to differentiate between the various samples in each approach tested (combination of robust or non-robust conditions with use of individual lines or sum of the intensities of emission lines). It was shown that qualitative ICP OES fingerprinting in combination with multivariate analysis is a promising analytical tool that has the potential to become a recognized procedure for rapid authenticity and adulteration testing of mineral water samples or other materials whose physicochemical properties (or origin) are directly related to mineral content.
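
The two preprocessing steps named in the abstract above differ only in the divisor applied after mean-centring: autoscaling divides each variable by its standard deviation, while Pareto scaling divides by the square root of the standard deviation. The sketch below shows both under stated assumptions: the function name and the toy intensity table are hypothetical, the sample standard deviation is used, and every column is assumed to be non-constant.

```python
import math
from statistics import mean, stdev


def scale_columns(rows, method="auto"):
    """Mean-centre each column, then divide by s ("auto") or sqrt(s) ("pareto")."""
    scaled_cols = []
    for col in zip(*rows):                      # iterate over variables (columns)
        m, s = mean(col), stdev(col)            # sample standard deviation
        divisor = s if method == "auto" else math.sqrt(s)
        scaled_cols.append([(v - m) / divisor for v in col])
    return [list(r) for r in zip(*scaled_cols)]  # back to row-major layout


# Hypothetical emission intensities: 3 samples x 2 emission lines.
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
auto = scale_columns(rows, "auto")
pareto = scale_columns(rows, "pareto")
```

After autoscaling, both columns span the same range, so intense and weak emission lines carry equal weight in a distance-based classifier such as kNN. Pareto scaling reduces the dominance of intense lines less aggressively, so the second column still spans a wider range than the first.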

Relevance: 80.00%

Abstract:

Recently, we have built a classification model that is capable of assigning a given sesquiterpene lactone (STL) to exactly one tribe of the plant family Asteraceae from which the STL has been isolated. Although many plant species are able to biosynthesize a set of peculiar compounds, the occurrence of the same secondary metabolites in more than one tribe of Asteraceae is frequent. Building on our previous work, in this paper we explore the possibility of assigning an STL to more than one tribe (class) simultaneously. When an object may belong to more than one class simultaneously, it is called multilabeled. In this work, we present a general overview of the techniques available to examine multilabeled data. The problem of evaluating the performance of a multilabeled classifier is discussed. Two particular multilabeled classification methods, cross-training with support vector machines (ct-SVM) and multilabeled k-nearest neighbors (M-L-kNN), were applied to the classification of the STLs into seven tribes from the plant family Asteraceae. The results are compared to a single-label classification and are analyzed from a chemotaxonomic point of view. The multilabeled approach allowed us to (1) model the reality as closely as possible, (2) improve our understanding of the relationship between the secondary metabolite profiles of different Asteraceae tribes, and (3) significantly decrease the number of plant sources to be considered for finding a certain STL. The presented classification models are useful for the targeted collection of plants with the objective of finding plant sources of natural compounds that are biologically active or possess other specific properties of interest.

Relevance: 80.00%

Abstract:

The supervised pattern recognition methods K-Nearest Neighbors (KNN), stepwise discriminant analysis (SDA), and soft independent modelling of class analogy (SIMCA) were employed in this work to investigate the relationship between the molecular structure of 27 cannabinoid compounds and their analgesic activity. Previous analyses using two unsupervised pattern recognition methods (principal component analysis, PCA, and hierarchical cluster analysis, HCA) were performed, and five descriptors were selected as the most relevant for the analgesic activity of the compounds studied: R (3) (charge density on substituent at position C(3)), Q (1) (charge on atom C(1)), A (surface area), log P (logarithm of the partition coefficient) and MR (molecular refractivity). The supervised pattern recognition methods (SDA, KNN, and SIMCA) were employed in order to construct a reliable model able to predict the analgesic activity of new cannabinoid compounds and to validate our previous study. The results obtained using the SDA, KNN, and SIMCA methods agree perfectly with our previous model. Comparing the SDA, KNN, and SIMCA results with the PCA and HCA ones, we noticed that all multivariate statistical methods classified the cannabinoid compounds studied into three groups in exactly the same way: active, moderately active, and inactive.

Relevance: 80.00%

Abstract:

Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce health-care resources to those who need them the most. Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients were utilized for development of the model and data on 67 patients were utilized to perform comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. Clinical data and endoscopic diagnosis collected for each patient were utilized to retrospectively ascertain optimal management for each patient. Clinical presentations and corresponding treatment were utilized as training examples. Eight mathematical models including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves. Results: Overall, the random forest model best predicted the source, need for resuscitation, and disposition with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%). The area under the ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model. Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB and the need for intervention, and allow optimization of care and healthcare resource allocation; these, however, require further validation. (c) 2007 Elsevier B.V. All rights reserved.

Relevance: 80.00%

Abstract:

An important feature of a database management system (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always a tough issue. Similarity queries are especially sensitive to this kind of architecture, since the answer sizes vary widely. Usually, the answer to a similarity query is fully processed and sent in full to the user, who is often interested in just parts of it, e.g. just the few elements closest to or farthest from the query reference. Compelling the DBMS to retrieve the full answer and then ignoring most of it is at the very least a waste of server processing power. Paging is a technique that splits the answer into several pages, delivered following client requests. Despite the success of paging on traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range or k-nearest neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther from or increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part over the full answer, both techniques allow answering queries much faster than the non-paged approach. (C) 2010 Elsevier Inc. All rights reserved.
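
The idea of delivering a similarity answer page by page can be sketched with a lazy cursor that yields elements in distance order only as the client asks for them. This is an illustration of the concept, not the technique of the paper: the function names are hypothetical, and the sketch still computes every distance up front with a linear scan, whereas a DBMS would rely on a metric index to avoid that work.

```python
import heapq
from itertools import islice


def similarity_cursor(data, query, distance, backward=False):
    """Lazily yield (distance, element) pairs ordered by distance to `query`.

    Forward order is closest first; backward order is farthest first.
    """
    sign = -1 if backward else 1
    # The index i breaks ties so that elements themselves are never compared.
    heap = [(sign * distance(x, query), i, x) for i, x in enumerate(data)]
    heapq.heapify(heap)                  # O(n) build, O(log n) per element served
    while heap:
        key, _, x = heapq.heappop(heap)
        yield sign * key, x


def next_page(cursor, page_size):
    """Fetch the next `page_size` results from an open cursor."""
    return list(islice(cursor, page_size))


# Hypothetical one-dimensional data with absolute difference as the metric.
data = [10, 1, 7, 3, 8]
cursor = similarity_cursor(data, 0, lambda a, b: abs(a - b))
first_page = next_page(cursor, 2)
second_page = next_page(cursor, 2)
```

Only the requested pages are ordered: a client that stops after the first page pays for two heap pops rather than a full sort of the answer.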

Relevance: 80.00%

Abstract:

The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has usually been assessed by measures of the prediction capability of imputation methods. Such measures rely on simulating missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values on the ultimate modelling task (e.g. classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) on three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in data preparation practice.

Relevance: 80.00%

Abstract:

Structured meaning-signal mappings, i.e., mappings that preserve neighborhood relationships by associating similar signals with similar meanings, are advantageous in an environment where signals are corrupted by noise and sub-optimal meaning inferences are rewarded as well. The evolution of these mappings, however, cannot be explained within a traditional language evolutionary game scenario in which individuals meet randomly, because the evolutionary dynamics is trapped in local maxima that do not reflect the structure of the meaning and signal spaces. Here we use a simple game theoretical model to show analytically that when individuals adopting the same communication code meet more frequently than individuals using different codes (a result of the spatial organization of the population), advantageous linguistic innovations can spread and take over the population. In addition, we report results of simulations in which an individual can communicate only with its K nearest neighbors and show that the probability that the lineage of a mutant that uses a more efficient communication code becomes fixed decreases exponentially with increasing K. These findings support the mother tongue hypothesis that human language evolved as a communication system used among kin, especially between mothers and offspring.

Relevance: 20.00%

Abstract:

We investigate synchronization in a Kuramoto-like model with nearest neighbor coupling. Upon analyzing the behavior of individual oscillators at the onset of complete synchronization, we show that the time interval between bursts in the time dependence of the frequencies of the oscillators exhibits universal scaling and blows up at the critical coupling strength. We also bring out a key mechanism that leads to phase locking. Finally, we deduce forms for the phases and frequencies at the onset of complete synchronization.

Relevance: 20.00%

Abstract:

Transport properties and magnetization measurements of the $\mathrm{K}_x\mathrm{MoO}_{2-\delta}$ ($0 \le x \le 0.25$) compound are reported. The compound crystallizes in the oxygen-deficient $\mathrm{MoO}_2$ monoclinic structure with potassium atoms occupying interstitial positions. An unconventional metallic behavior with power-law temperature dependence is related to a magnetic ordering. A superconducting transition with a small volume fraction is also observed near 7 K for a sample with low potassium composition.

Relevance: 20.00%

Abstract:

Large-conductance Ca(2+)-activated K(+) channels (BK) play a fundamental role in modulating membrane potential in many cell types. The gating of BK channels and its modulation by Ca(2+) and voltage has been the subject of intensive research over almost three decades, yielding several of the most complicated kinetic mechanisms ever proposed. A large number of open and closed states disposed, respectively, in two planes, named tiers, characterize these mechanisms. Transitions between states in the same plane are cooperative and modulated by Ca(2+). Transitions across planes are highly concerted and voltage-dependent. Here we reexamine the validity of the two-tiered hypothesis by restricting attention to the modulation by Ca(2+). Large single channel data sets at five Ca(2+) concentrations were simultaneously analyzed from a Bayesian perspective by using hidden Markov models and Markov-chain Monte Carlo stochastic integration techniques. Our results support a dramatic reduction in model complexity, favoring a simple mechanism derived from the Monod-Wyman-Changeux allosteric model for homotetramers, able to explain the Ca(2+) modulation of the gating process. This model differs from the standard Monod-Wyman-Changeux scheme in that one distinguishes when two Ca(2+) ions are bound to adjacent or diagonal subunits of the tetramer.

Relevance: 20.00%

Abstract:

We present the first spin alignment measurements for the $K^{*0}(892)$ and $\phi(1020)$ vector mesons produced at midrapidity with transverse momenta up to 5 GeV/c at $\sqrt{s_{NN}} = 200$ GeV at RHIC. The diagonal spin-density matrix elements with respect to the reaction plane in Au+Au collisions are $\rho_{00} = 0.32 \pm 0.04$ (stat) $\pm 0.09$ (syst) for the $K^{*0}$ ($0.8 < p_T < 5.0$ GeV/c) and $\rho_{00} = 0.34 \pm 0.02$ (stat) $\pm 0.03$ (syst) for the $\phi$ ($0.4 < p_T < 5.0$ GeV/c) and are constant with transverse momentum and collision centrality. The data are consistent with the unpolarized expectation of 1/3 and thus no evidence is found for the transfer of the orbital angular momentum of the colliding system to the vector-meson spins. Spin alignments for $K^{*0}$ and $\phi$ in Au+Au collisions were also measured with respect to the particle's production plane. The $\phi$ result, $\rho_{00} = 0.41 \pm 0.02$ (stat) $\pm 0.04$ (syst), is consistent with that in p+p collisions, $\rho_{00} = 0.39 \pm 0.03$ (stat) $\pm 0.06$ (syst), also measured in this work. The measurements thus constrain the possible size of polarization phenomena in the production dynamics of vector mesons.

Relevance: 20.00%

Abstract:

Aims. We present the analysis of the [alpha/Fe] abundance ratios for a large number of stars at several locations in the Milky Way bulge with the aim of constraining its formation scenario. Methods. We obtained FLAMES-GIRAFFE spectra (R = 22 500) at the ESO Very Large Telescope for 650 bulge red giant branch (RGB) stars and performed spectral synthesis to measure Mg, Ca, Ti, and Si abundances. This sample is composed of 474 giant stars observed in 3 fields along the minor axis of the Galactic bulge at latitudes b = -4 degrees, b = -6 degrees, and b = -12 degrees. Another 176 stars belong to a field containing the globular cluster NGC 6553, located at b = -3 degrees and 5 degrees away from the other three fields along the major axis. Stellar parameters and metallicities for these stars were presented in Zoccali et al. (2008, A&A, 486, 177). We have also re-derived stellar parameters and abundances for the sample of thick and thin disk red giants analyzed in Alves-Brito et al. (2010, A&A, 513, A35). Therefore, using a homogeneous abundance database for the bulge, thick and thin disk, we have performed a differential analysis that minimizes systematic errors, to compare the formation scenarios of these Galactic components. Results. Our results confirm, with large number statistics, the chemical similarity between the Galactic bulge and thick disk, which are both enhanced in alpha elements when compared to the thin disk. In the same context, we analyze [alpha/Fe] vs. [Fe/H] trends across different bulge regions. The most metal-rich stars, showing low [alpha/Fe] ratios at b = -4 degrees, disappear at higher Galactic latitudes, in agreement with the observed metallicity gradient in the bulge. Metal-poor stars ([Fe/H] < -0.2) show a remarkable homogeneity at different bulge locations. Conclusions. We have obtained further constraints on the formation scenario of the Galactic bulge. A metal-poor component chemically indistinguishable from the thick disk hints at a fast and early formation for both the bulge and the thick disk. Such a component shows no variation, either in abundances or in kinematics, among different bulge regions. A metal-rich component showing low [alpha/Fe] ratios similar to those of the thin disk disappears at larger latitudes. This allows us to trace a component formed through fast early mergers (classical bulge) and a disk/bar component formed on a more extended timescale.

Relevance: 20.00%

Abstract:

Based on high-resolution spectra obtained with the MIKE spectrograph on the Magellan telescopes, we present detailed elemental abundances for 20 red giant stars in the outer Galactic disk, located at Galactocentric distances between 9 and 13 kpc. The outer disk sample is complemented with samples of red giants from the inner Galactic disk and the solar neighborhood, analyzed using identical methods. For Galactocentric distances beyond 10 kpc, we only find chemical patterns associated with the local thin disk, even for stars far above the Galactic plane. Our results show that the relative densities of the thick and thin disks are dramatically different from the solar neighborhood, and we therefore suggest that the radial scale length of the thick disk is much shorter than that of the thin disk. We make a first estimate of the thick disk scale length of L(thick) = 2.0 kpc, assuming L(thin) = 3.8 kpc for the thin disk. We suggest that radial migration may explain the lack of radial age, metallicity, and abundance gradients in the thick disk, possibly also explaining the link between the thick disk and the metal-poor bulge.