938 results for k-means clustering
Abstract:
Objective: To characterize the P1 component of long latency auditory evoked potentials (LLAEPs) in cochlear implant users with auditory neuropathy spectrum disorder (ANSD) and determine firstly whether it correlates with speech perception performance and secondly whether it correlates with other variables related to cochlear implant use. Methods: This study was conducted at the Center for Audiological Research at the University of Sao Paulo. The sample included 14 pediatric (4-11 years of age) cochlear implant users with ANSD, of both sexes, with profound prelingual hearing loss. Patients with hypoplasia or agenesis of the auditory nerve were excluded from the study. LLAEPs produced in response to speech stimuli were recorded using a Smart EP USB Jr. system. The subjects' speech perception was evaluated using tests 5 and 6 of the Glendonald Auditory Screening Procedure (GASP). Results: The P1 component was detected in 12/14 (85.7%) children with ANSD. Latency of the P1 component correlated with duration of sensory hearing deprivation (p = 0.007, r = 0.7278), but not with duration of cochlear implant use. An analysis of groups assigned according to GASP performance (k-means clustering) revealed that aspects of prior central auditory system development reflected in the P1 component are related to behavioral auditory skills. Conclusions: In children with ANSD using cochlear implants, the P1 component can serve as a marker of central auditory cortical development and a predictor of the implanted child's speech perception performance. (c) 2012 Elsevier Ireland Ltd. All rights reserved.
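A rough sketch of the statistical steps described above (Pearson correlation of P1 latency with deprivation time, and k-means grouping by GASP performance), using made-up placeholder values rather than the study's data:

```python
# Sketch of the analysis described above, with hypothetical placeholder data:
# correlate P1 latency with duration of auditory deprivation, then group
# children by GASP speech-perception scores using k-means.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 12  # children in whom P1 was detected
deprivation_years = rng.uniform(1, 8, n)                      # hypothetical
p1_latency_ms = 100 + 15 * deprivation_years + rng.normal(0, 10, n)
gasp_scores = rng.uniform(0, 100, (n, 2))                     # tests 5 and 6 (hypothetical)

r, p = pearsonr(deprivation_years, p1_latency_ms)
print(f"P1 latency vs. deprivation: r = {r:.3f}, p = {p:.4f}")

# Group subjects by GASP performance and compare P1 latency across groups.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(gasp_scores)
for g in np.unique(groups):
    print(f"group {g}: mean P1 latency = {p1_latency_ms[groups == g].mean():.1f} ms")
```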
Abstract:
In 1998-2001 Finland suffered the most severe insect outbreak ever recorded in the country, covering over 500,000 hectares. The outbreak was caused by the common pine sawfly (Diprion pini L.) and has continued in the study area, Palokangas, ever since. To find a good method for monitoring this type of outbreak, this study examined the efficacy of multi-temporal ERS-2 and ENVISAT SAR imagery for estimating Scots pine (Pinus sylvestris L.) defoliation. Three methods were tested: unsupervised k-means clustering, supervised linear discriminant analysis (LDA) and logistic regression. In addition, I assessed whether harvested areas could be differentiated from defoliated forest using the same methods. Two different speckle filters were used to determine the effect of filtering on the SAR imagery and the subsequent results. Logistic regression performed best, producing a classification accuracy of 81.6% (kappa 0.62) with two classes (no defoliation, >20% defoliation). With two classes, LDA accuracy was at best 77.7% (kappa 0.54) and k-means 72.8% (kappa 0.46). In general, the largest speckle filter, a 5 x 5 image window, performed best. Accuracy usually degraded step by step as additional classes were added. The results were good, but because of the restrictions of the study they should be confirmed with independent data before firm conclusions about their reliability can be drawn. The restrictions include the small amount of field data and, consequently, the problems with accuracy assessment (no separate testing data), as well as the lack of meteorological data from the imaging dates.
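A minimal sketch of the two-class comparison described above, assuming hypothetical multi-temporal backscatter features in place of the actual ERS-2/ENVISAT imagery and field plots (accuracies are computed in-sample for brevity):

```python
# Rough sketch of the two-class defoliation comparison: unsupervised k-means
# vs. supervised LDA vs. logistic regression on synthetic SAR-like features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, n)                  # 0 = no defoliation, 1 = >20%
X = rng.normal(0, 1, (n, 4)) + y[:, None]  # 4 image dates, class-shifted

# Unsupervised: cluster, then align cluster ids with the majority class.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
if accuracy_score(y, km_labels) < 0.5:
    km_labels = 1 - km_labels

for name, pred in [
    ("k-means", km_labels),
    ("LDA", LinearDiscriminantAnalysis().fit(X, y).predict(X)),
    ("logistic", LogisticRegression().fit(X, y).predict(X)),
]:
    print(name, accuracy_score(y, pred), cohen_kappa_score(y, pred))
```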
Abstract:
The primary goal of this project is to demonstrate the practical use of data mining algorithms to cluster a solved steady-state computational fluid dynamics (CFD) flow domain into a simplified lumped-parameter network. A commercial-quality code, “cfdMine”, was created using a volume-weighted k-means clustering that can cluster a 20 million cell CFD domain on a single CPU in several hours or less. Additionally, agglomeration and Mahalanobis-distance k-means were added as optional post-processing steps to further enhance the separation of the clusters. The resulting nodal network is considered a reduced-order model and can be solved transiently at very minimal computational cost. The reduced-order network is then instantiated in the commercial thermal solver MuSES to perform transient conjugate heat transfer, with convection predicted by the lumped network (based on steady-state CFD). When the lumped nodal network is inserted into a MuSES model, the potential for developing a “localized heat transfer coefficient” is shown to be an improvement over existing techniques. The clustering was also found to provide a new flow visualization technique. Finally, fixing clusters near equipment demonstrates a new capability to track temperatures near specific objects (such as equipment in vehicles).
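The volume-weighted clustering step can be sketched with scikit-learn's per-sample weights, weighting each CFD cell by its volume; this is an illustration under assumed feature choices, not the cfdMine implementation:

```python
# Minimal sketch of volume-weighted k-means for lumping a CFD domain into a
# nodal network. Each cell is a feature vector weighted by its volume, so
# large cells dominate the centroids.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_cells = 100_000                       # far below the 20M-cell case cited
features = np.column_stack([
    rng.uniform(0, 1, (n_cells, 3)),    # cell centroid x, y, z
    rng.normal(0, 1, (n_cells, 3)),     # velocity components
    rng.uniform(300, 400, n_cells),     # temperature [K]
])
cell_volume = rng.uniform(1e-9, 1e-6, n_cells)

km = KMeans(n_clusters=50, n_init=3, random_state=2)
node_id = km.fit_predict(features, sample_weight=cell_volume)

# Each cluster becomes one node of the reduced-order network; its state is
# the volume-weighted aggregate of its member cells.
node_volume = np.bincount(node_id, weights=cell_volume)
print(node_volume.shape)  # (50,) lumped nodes
```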
Abstract:
In young, first-episode, productive, medication-naive patients with schizophrenia, EEG microstates (building blocks of mentation) tend to be shortened. Koenig et al. [Koenig, T., Lehmann, D., Merlo, M., Kochi, K., Hell, D., Koukkou, M., 1999. A deviant EEG brain microstate in acute, neuroleptic-naïve schizophrenics at rest. European Archives of Psychiatry and Clinical Neuroscience 249, 205–211] suggested that the shortening concerned specific microstate classes. Sequence rules (microstate concatenations, syntax) conceivably might also be affected. In 27 patients of the above type and 27 controls, from three centers, multichannel resting EEG was analyzed into microstates using k-means clustering of momentary potential topographies into four microstate classes (A–D). In patients, microstates were shortened in classes B and D (from 80 to 70 ms and from 94 to 82 ms, respectively), occurred more frequently in classes A and C, and covered more time in A and less in B. Topography differed only in class B, where LORETA tomography predominantly showed stronger left and anterior activity in patients. Microstate concatenation (syntax) was generally disturbed in patients; specifically, the class sequence A→C→D→A predominated in controls but was reversed in patients (A→D→C→A). In schizophrenia, information processing in certain classes of mental operations might deviate because of premature termination. The intermittent occurrence might account for Bleuler's “double bookkeeping.” The disturbed microstate syntax opens a novel physiological comparison of mental operations between patients and controls.
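A simplified sketch of the microstate extraction described above: cluster scalp topographies at global field power (GFP) peaks into four classes. Note that microstate analysis conventionally uses a polarity-invariant "modified k-means"; plain k-means on hypothetical data is used here only to illustrate the idea:

```python
# Simplified microstate sketch: cluster the momentary scalp topographies at
# GFP peaks into four classes (A-D). Data are random stand-ins for real EEG.
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
eeg = rng.normal(0, 1, (19, 5000))        # hypothetical: 19 channels x samples
eeg -= eeg.mean(axis=0)                   # average reference

gfp = eeg.std(axis=0)                     # global field power per sample
peaks, _ = find_peaks(gfp)                # moments of maximal topography strength
maps = eeg[:, peaks].T                    # one topography per GFP peak
maps /= np.linalg.norm(maps, axis=1, keepdims=True)

km = KMeans(n_clusters=4, n_init=20, random_state=3).fit(maps)
labels = km.labels_                       # microstate class per GFP peak
print(np.bincount(labels))                # occurrence count per class
```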
Abstract:
Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition based on the Bag of Features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5,000 food images was created and organized into 11 classes. The optimized system computes dense local features using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10,000 visual words using hierarchical k-means clustering, and finally classifies the food images with a linear support vector machine classifier. The system achieved a classification accuracy of the order of 78%, demonstrating the feasibility of the proposed approach on a very challenging image dataset.
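A scaled-down sketch of this BoF pipeline, with assumptions made for brevity: grayscale SIFT on a dense grid instead of SIFT over the HSV color space, and a small flat vocabulary instead of the 10,000-word hierarchical one:

```python
# Scaled-down BoF sketch: dense SIFT -> k-means visual vocabulary ->
# histogram encoding -> linear SVM.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

sift = cv2.SIFT_create()

def dense_sift(gray, step=8, size=8):
    # SIFT descriptors on a regular grid instead of detected keypoints.
    kps = [cv2.KeyPoint(float(x), float(y), size)
           for y in range(step, gray.shape[0] - step, step)
           for x in range(step, gray.shape[1] - step, step)]
    _, desc = sift.compute(gray, kps)
    return desc

def train_bof(train_images, n_words=200):
    # train_images: list of (grayscale uint8 image, class label) pairs.
    all_desc = [dense_sift(img) for img, _ in train_images]
    vocab = MiniBatchKMeans(n_clusters=n_words, random_state=0)
    vocab.fit(np.vstack(all_desc))
    # Encode each image as a normalized histogram of visual-word counts.
    X = np.array([np.bincount(vocab.predict(d), minlength=n_words) / len(d)
                  for d in all_desc])
    y = [label for _, label in train_images]
    return vocab, LinearSVC().fit(X, y)
```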
Abstract:
In the last few years there has been heightened interest in data treatment and analysis with the aim of discovering hidden knowledge and eliciting relationships and patterns within data. Data mining techniques (also known as Knowledge Discovery in Databases) have been applied over a wide range of fields such as marketing, investment, fraud detection, manufacturing, telecommunications and health. In this study, well-known data mining techniques such as artificial neural networks (ANN), genetic programming (GP), forward selection linear regression (LR) and k-means clustering are proposed to the health and sports community in order to aid with resistance training prescription. Appropriate resistance training prescription is effective for developing fitness and health and for enhancing general quality of life. Resistance exercise intensity is commonly prescribed as a percentage of the one repetition maximum (1RM). The 1RM, also called dynamic muscular strength or one execution maximum, is operationally defined as the heaviest load that can be moved over a specific range of motion, one time and with correct performance. The safety of the 1RM assessment has been questioned, as such an enormous effort may lead to muscular injury. Prediction equations could help to tackle this problem by estimating the 1RM from submaximal loads, in order to avoid or at least reduce the associated risks. We built different models from data on 30 men who performed up to 5 sets to exhaustion at different percentages of the 1RM in the bench press action, until reaching their actual 1RM. A comparison of different existing prediction equations is also carried out. The LR model seems to outperform the ANN and GP models for 1RM prediction in the range between 1 and 10 repetitions. At 75% of the 1RM some subjects (n = 5) could perform 13 repetitions with proper technique in the bench press action, whilst other subjects (n = 20) performed significantly more repetitions (p < 0.05) at 70% than at 75% of their actual 1RM. Rating of perceived exertion (RPE) seems not to be a good predictor of the 1RM when all sets are performed to exhaustion, as no significant differences (p < 0.05) were found in the RPE at 75%, 80% and 90% of the 1RM. Also, years of experience and weekly hours of strength training are better correlated with the 1RM (p < 0.05) than body weight. The O'Connor et al. 1RM prediction equation seems to emerge from the data gathered and appears to be the most accurate of those proposed in the literature and used in this study. Epley's 1RM prediction equation is reproduced by means of data simulation from 1RM literature equations. Finally, future lines of research are proposed relating to the problem of 1RM prediction by means of genetic algorithms, neural networks and clustering techniques.
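The Epley and O'Connor et al. equations discussed above have simple closed forms, sketched here with an illustrative submaximal set (the numbers are examples, not study data):

```python
# The two literature prediction equations named above: Epley (1985) and
# O'Connor et al. (1989). Example values only.
def epley_1rm(load_kg: float, reps: int) -> float:
    """Epley: 1RM = w * (1 + r/30)."""
    return load_kg * (1.0 + reps / 30.0)

def oconnor_1rm(load_kg: float, reps: int) -> float:
    """O'Connor et al.: 1RM = w * (1 + 0.025 * r)."""
    return load_kg * (1.0 + 0.025 * reps)

# Example: a submaximal bench-press set of 8 repetitions at 80 kg.
print(f"Epley:    {epley_1rm(80, 8):.1f} kg")   # ~101.3 kg
print(f"O'Connor: {oconnor_1rm(80, 8):.1f} kg") # ~96.0 kg
```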
Abstract:
We use quantitative X-ray diffraction to determine the mineralogy of late Quaternary marine sediments from the West and East Greenland shelves offshore from early Tertiary basalt outcrops. Despite the similar basalt outcrop area (60,000-70,000 km²), there are significant differences between East and West Greenland sediments in the fraction of minerals (e.g. pyroxene) sourced from the basalt outcrops. We demonstrate the differences in the mineralogy between East and West Greenland marine sediments on three scales: (1) modern day, (2) late Quaternary inputs and (3) detailed down-core variations in 10 cores from the two margins. On the East Greenland Shelf (EGS), late Quaternary samples have an average quartz weight per cent of 6.2 ± 2.3 versus 12.8 ± 3.9 from the West Greenland Shelf (WGS), and 12.02 ± 4.8 versus 1.9 ± 2.3 wt% for pyroxene. K-means clustering indicated that only 9% of the samples did not fit a simple EGS vs. WGS dichotomy. Sediments from the EGS and WGS are also isotopically distinct, with the EGS having higher εNd (-18 to 4) than the WGS (εNd = -25 to -35). We attribute the striking dichotomy in sediment composition to fundamentally different long-term Quaternary styles of glaciation on the two basalt outcrops.
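The k-means dichotomy test described above can be sketched on the two most diagnostic variables (quartz and pyroxene wt%); the samples below are random stand-ins drawn around the reported means, not the core data:

```python
# Sketch of the two-shelf dichotomy test: cluster samples by mineral weight
# percentages into two groups and count samples falling into the "wrong"
# shelf's cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
egs = np.column_stack([rng.normal(6.2, 2.3, 60), rng.normal(12.0, 4.8, 60)])
wgs = np.column_stack([rng.normal(12.8, 3.9, 60), rng.normal(1.9, 2.3, 60)])
X = np.vstack([egs, wgs]).clip(min=0)          # quartz wt%, pyroxene wt%
shelf = np.array([0] * 60 + [1] * 60)          # 0 = EGS, 1 = WGS

labels = KMeans(n_clusters=2, n_init=10, random_state=4).fit_predict(X)
if (labels == shelf).mean() < 0.5:             # align cluster ids with shelves
    labels = 1 - labels
print(f"samples not fitting the EGS/WGS dichotomy: {(labels != shelf).mean():.1%}")
```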
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Resumo:
Coarse-resolution thematic maps derived from remotely sensed data and implemented in GIS play an important role in coastal and marine conservation, research and management. Here, we describe an approach for fine-resolution mapping of land-cover types using aerial photography and ancillary GIS and ground data in a large (100 x 35 km) subtropical estuarine system (Moreton Bay, Queensland, Australia). We developed and implemented a classification scheme representing 24 coastal (subtidal, intertidal, mangrove, supratidal and terrestrial) cover types relevant to the ecology of estuarine animals, nekton and shorebirds. The accuracy of classification of the intertidal and subtidal cover types, as indicated by the agreement between the mapped (predicted) and reference (ground) data, was 77-88%, depending on the zone and level of generalization required. The variability and spatial distribution of habitat mosaics (landscape types) across the mapped environment were assessed using k-means clustering and validated with Classification and Regression Tree (CART) models. Seven broad landscape types could be distinguished, and ways of incorporating the information on landscape composition into site-specific conservation and field research are discussed. This research illustrates the importance and potential applications of fine-resolution mapping for conservation and management of estuarine habitats and their terrestrial and aquatic wildlife. (c) 2005 Elsevier Ltd. All rights reserved.
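A minimal sketch of the landscape-typing step, assuming each mapped site is described by the fraction of its area in each of the 24 cover types (synthetic compositions stand in for the Moreton Bay data):

```python
# Sketch: k-means groups sites into landscape types from their cover-type
# composition, and a CART model checks how well the clusters can be
# re-predicted from the compositions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
composition = rng.dirichlet(alpha=np.ones(24), size=500)  # rows sum to 1

landscape_type = KMeans(n_clusters=7, n_init=10,
                        random_state=5).fit_predict(composition)

cart = DecisionTreeClassifier(max_depth=5, random_state=5)
scores = cross_val_score(cart, composition, landscape_type, cv=5)
print(f"CART recovery of the 7 landscape types: {scores.mean():.2f} accuracy")
```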
Abstract:
Collaborative recommendation is one of the most widely used types of recommender system; it recommends items to a visitor based on the preferences of other users who are similar to the current user. User profiling over Web transaction data can capture such informative knowledge about user tasks or interests. With the discovered usage pattern information, it becomes possible to recommend more relevant content to Web users or to customize the Web presentation for visitors via collaborative recommendation. In addition, it is helpful for identifying the underlying relationships among Web users, items and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on user profiling. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as representatives of usage patterns. Moreover, a hidden task model is derived by characterizing the meaningful latent factor space. With the discovered user profiles, we then choose the best-matched profile, which possesses preferences closest to those of the current user, and make collaborative recommendations based on the corresponding page weights in the selected user profile. Preliminary experimental results on real-world data sets show that the proposed approach is capable of making recommendations accurately and efficiently.
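A simplified sketch of the profile-matching step (the PLSA task model is omitted): cluster session weight vectors with k-means, keep the centroids as usage profiles, and recommend the matched profile's top-weighted unvisited pages. Session data and sizes are hypothetical:

```python
# Sketch of profile-based collaborative recommendation via k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
# Sparse hypothetical session-page weight matrix: 1000 sessions x 50 pages.
sessions = rng.random((1000, 50)) * (rng.random((1000, 50)) < 0.1)

km = KMeans(n_clusters=12, n_init=10, random_state=6).fit(sessions)
profiles = km.cluster_centers_            # one aggregate usage profile per cluster

def recommend(current_session: np.ndarray, top_n: int = 3) -> np.ndarray:
    # Pick the profile most similar to the active session, then suggest its
    # highest-weighted pages that the user has not visited yet.
    best = np.linalg.norm(profiles - current_session, axis=1).argmin()
    candidates = np.where(current_session == 0, profiles[best], -np.inf)
    return np.argsort(candidates)[::-1][:top_n]

print(recommend(sessions[0]))             # page indices to recommend
```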
Abstract:
We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. HM-SVMs combine the advantages of hidden Markov models and support vector machines. By employing a modified k-means clustering method, a small set of the most representative sentences can be automatically selected from an unannotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model which can subsequently be applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator data. Experimental results show an improvement over the baseline HVS parser when using the hybrid framework. When compared with HM-SVMs trained from the fully annotated corpus, the hybrid framework gave comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.
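The sentence-selection step can be sketched as follows, with TF-IDF standing in for whatever sentence representation the authors used, and stand-in utterances in the spirit of the DARPA Communicator data:

```python
# Sketch: embed unannotated sentences, cluster with k-means, and take the
# sentence closest to each centroid as a representative to annotate.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances_argmin_min

corpus = [
    "show me flights from boston to denver",
    "i want a morning flight to denver",
    "what is the cheapest fare to atlanta",
    "list fares from boston to atlanta",
    "book a hotel near the denver airport",
]  # stand-in utterances, not the actual corpus

X = TfidfVectorizer().fit_transform(corpus)
km = KMeans(n_clusters=2, n_init=10, random_state=7).fit(X)

# For each cluster centroid, pick the nearest real sentence.
closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
for i in closest:
    print("annotate:", corpus[i])
```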
Abstract:
Liquidity is measured by a variety of ratios that quantify the phenomenon from different aspects (e.g. tightness, depth and resiliency). This article analyzes the liquidity measures proposed in the literature using multidimensional statistical methods on a sample from the Swiss stock market. We performed a principal component analysis (PCA) to obtain the factors that best summarize the cross-sectional variability of the liquidity measures, examined how strongly each measure co-moves with these factors, and used k-means clustering on the correlations to find groups of measures with similar properties. We ask whether the variable groups that emerge from the sample coincide with the measures conventionally linked to particular aspects of liquidity, and whether composite liquidity measures can be defined that capture the phenomenon of liquidity in several dimensions. Our modelling methodology provides a framework for analyzing the correlation between the different aspects of liquidity as well as a means of defining complex liquidity measures.
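A sketch of this PCA-plus-clustering workflow on a random placeholder panel; measures are clustered by their correlation pattern, so co-moving measures fall into the same group:

```python
# Sketch: standardize a panel of liquidity measures, extract principal
# components, and cluster the measures themselves by correlation pattern.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
panel = rng.normal(size=(250, 10))        # observations x liquidity measures

Z = StandardScaler().fit_transform(panel)
pca = PCA(n_components=3).fit(Z)
print("variance explained:", pca.explained_variance_ratio_.round(2))

# Each measure is represented by its correlations with all other measures,
# so measures that co-move end up in the same cluster.
corr = np.corrcoef(Z, rowvar=False)       # 10 x 10 correlation matrix
groups = KMeans(n_clusters=3, n_init=10, random_state=8).fit_predict(corr)
print("measure groups:", groups)
```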
Abstract:
Understanding the impact of atmospheric black carbon (BC) containing particles on human health and radiative forcing requires knowledge of the mixing state of BC, including the characteristics of the materials with which it is internally mixed. In this study, we demonstrate for the first time the capabilities of the Aerodyne Soot-Particle Aerosol Mass Spectrometer equipped with a light scattering module (LS-SP-AMS) to examine the mixing state of refractory BC (rBC) and other aerosol components in an urban environment (downtown Toronto). K-means clustering analysis was used to classify single particle mass spectra into chemically distinct groups. One resultant cluster is dominated by rBC mass spectral signals (C1+ to C5+), while the organic signals fall into a few major clusters, identified as hydrocarbon-like organic aerosol (HOA), oxygenated organic aerosol (OOA), and cooking emission organic aerosol (COA). Nearly external mixing is observed, with small rBC particles only thinly coated by HOA (~28% by mass on average), while over 90% of the HOA-rich particles did not contain detectable amounts of rBC. Most of the particles classified into other inorganic and organic clusters were not significantly associated with BC. The single particle results also suggest that HOA and COA emitted from anthropogenic sources were likely major contributors to organic-rich particles with low to mid-range aerodynamic diameter (dva). The similar temporal profiles and mass spectral features of the organic clusters and the factors from a positive matrix factorization (PMF) analysis of the ensemble aerosol dataset validate the conventional interpretation of the PMF results.
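The spectral clustering step can be sketched roughly as follows, on random placeholder spectra: normalize each single-particle spectrum to unit total signal, cluster with k-means, and inspect cluster means for marker ions:

```python
# Sketch of k-means classification of single-particle mass spectra into
# chemically distinct groups. Spectra are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
spectra = rng.random((5000, 300))               # particles x m/z channels
spectra /= spectra.sum(axis=1, keepdims=True)   # normalize to unit total signal

km = KMeans(n_clusters=6, n_init=5, random_state=9).fit(spectra)
for k in range(6):
    # The cluster-mean spectrum would be inspected for marker ions
    # (e.g. rBC carbon-cluster signals) to name each particle class.
    mean_spec = spectra[km.labels_ == k].mean(axis=0)
    print(f"cluster {k}: {np.sum(km.labels_ == k)} particles, "
          f"top m/z channel = {mean_spec.argmax()}")
```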
Abstract:
The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from the immense amounts of genomic data is a challenging task. Therefore, effective and efficient computational techniques are needed to analyze and interpret genomic data. In this thesis, novel computational methods are proposed to address such challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework involves integration of the Bayesian network and the Gaussian mixture model. Based on the proposed framework and its conjunction with k-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarray data where biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions of network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveal critical evidence of consistency with brain anatomical structure.
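A minimal sketch of the PCA-plus-k-means combination mentioned above, on a random stand-in expression matrix: project profiles onto leading principal components ("eigen" directions) and cluster in the reduced space:

```python
# Sketch: PCA for dimensionality reduction, then k-means on the projections.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(10)
expression = rng.normal(size=(2000, 60))   # genes x samples (hypothetical)

scores = PCA(n_components=10).fit_transform(expression)
gene_cluster = KMeans(n_clusters=8, n_init=10,
                      random_state=10).fit_predict(scores)
print(np.bincount(gene_cluster))           # genes per cluster
```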