928 resultados para K-Means Cluster


Relevância:

80.00% 80.00%

Publicador:

Resumo:

A erodibilidade é um fator de extrema importância na caracterização da perda de solo, representando os processos que regulam a infiltração de água e sua resistência à desagregação e o transporte de partículas. Assim, por meio da análise de dependência espacial dos componentes principais da erodibilidade (fator K), objetivou-se estimar a erodibilidade do solo em uma área de nascentes da microbacia do Córrego do Tijuco, Monte Alto-SP, e analisar a variabilidade espacial das variáveis granulométricas do solo ao longo do relevo. A erodibilidade média da área foi considerada alta, e a análise de agrupamento k-means apontou para uma formação de cinco grupos: no primeiro, os altos teores de areia grossa (AG) e média (AM) condicionaram sua distribuição nas áreas planas; o segundo, caracterizado pelo alto teor de areia fina (AF), distribui-se nos declives mais convexos; o terceiro, com altos teores de silte e areia muito fina (AMF), concentrou-se nos maiores declives e concavidades; o quarto, com maior teor de argila, seguiu as zonas de escoamento de água; e o quinto, com alto teor de matéria orgânica (MO) e areia grossa (AG), distribui-se nas proximidades da zona urbana. A análise de componentes principais (ACP) mostrou quatro componentes com 87,4 % das informações, sendo o primeiro componente principal (CP1) discriminado pelo transporte seletivo de partículas principalmente em zonas pontuais de maior declividade e acúmulo de sedimentos; o segundo (CP2), discriminado pela baixa coesão entre as partículas, mostra acúmulo da areia fina nas áreas de menor cota em toda a área de concentração de água; o terceiro (CP3), discriminado pela maior agregação do solo, concentra-se principalmente nas bases de grandes declives; e o quarto (CP4), discriminado pela areia muito fina, distribui-se ao longo das declividades nas maiores altitudes. Os resultados sugerem o comportamento granulométrico do solo, que se mostra suscetível ao processo erosivo devido às condições texturais superficiais e à movimentação do relevo.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Since different pedologists will draw different soil maps of a same area, it is important to compare the differences between mapping by specialists and mapping techniques, as for example currently intensively discussed Digital Soil Mapping. Four detailed soil maps (scale 1:10.000) of a 182-ha sugarcane farm in the county of Rafard, São Paulo State, Brazil, were compared. The area has a large variation of soil formation factors. The maps were drawn independently by four soil scientists and compared with a fifth map obtained by a digital soil mapping technique. All pedologists were given the same set of information. As many field expeditions and soil pits as required by each surveyor were provided to define the mapping units (MUs). For the Digital Soil Map (DSM), spectral data were extracted from Landsat 5 Thematic Mapper (TM) imagery as well as six terrain attributes from the topographic map of the area. These data were summarized by principal component analysis to generate the map designs of groups through Fuzzy K-means clustering. Field observations were made to identify the soils in the MUs and classify them according to the Brazilian Soil Classification System (BSCS). To compare the conventional and digital (DSM) soil maps, they were crossed pairwise to generate confusion matrices that were mapped. The categorical analysis at each classification level of the BSCS showed that the agreement between the maps decreased towards the lower levels of classification and the great influence of the surveyor on both the mapping and definition of MUs in the soil map. The average correspondence between the conventional and DSM maps was similar. Therefore, the method used to obtain the DSM yielded similar results to those obtained by the conventional technique, while providing additional information about the landscape of each soil, useful for applications in future surveys of similar areas.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A compreensão do potencial agrícola de um solo muitas vezes é com base apenas na interpretação por meio de análises univariadas, podendo elevar a dimensão dos problemas no momento de escolher o manejo adequado ao solo. Dessa forma, a análise multivariada surge como alternativa, pois ela é um conjunto de procedimentos que visa agrupar e discriminar grupos de indivíduos. Também, serve como instrumento de seleção de variáveis, na medida em que aquelas com maior peso na construção dos primeiros componentes principais são as possíveis que melhor representam o conjunto de dados estudados. O objetivo deste trabalho foi identificar, por meio de análises multivariadas, os atributos do solo que melhor explicam a variabilidade espacial na produção da cultura do feijão. No ano agrícola de 2006/2007, no município de Selvíria, MS, foi analisada a produtividade do feijão em razão de alguns atributos físicos e químicos de um Latossolo Vermelho distroférrico, cultivado nas condições de elevado nível tecnológico de manejo pelo sistema de cultivo mínimo irrigado com pivô central. Foi demarcada a malha geoestatística para a coleta de dados do solo e da planta, com 117 pontos amostrais, numa área de 2.025 m2 e declive homogêneo de 0,055 m m-1. A classificação em grupos foi feita por três métodos: método de agrupamentos hierárquico, método não hierárquico k-means e análise de componentes principais. Essa última análise permitiu identificar três grupos que explicam 86,3 % da variabilidade total dos dados, os quais são constituídos pelos atributos físicos: densidade do solo, porosidade total, umidade gravimétrica e umidade volumétrica, que se evidenciaram com maior poder de explicação da variação da produtividade. Pode-se concluir que a análise multivariada aliada à geoestatística é importante ferramenta no auxílio ao manejo localizado.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

*This study reconstructs the phylogeography of Aegilops geniculata, an allotetraploid relative of wheat, to discuss the impact of past climate changes and recent human activities (e.g. the early expansion of agriculture) on the genetic diversity of ruderal plant species. *We combined chloroplast DNA (cpDNA) sequencing, analysed using statistical parsimony network, with nonhierarchical K-means clustering of amplified fragment length polymorphism (AFLP) genotyping, to unravel patterns of genetic structure across the native range of Ae. geniculata. The AFLP dataset was further explored by measurement of the regional genetic diversity and the detection of isolation by distance patterns. *Both cpDNA and AFLP suggest an eastern Mediterranean origin of Ae. geniculata. Two lineages have spread independently over northern and southern Mediterranean areas. Northern populations show low genetic diversity but strong phylogeographical structure among the main peninsulas, indicating a major influence of glacial cycles. By contrast, low genetic structuring and a high genetic diversity are detected in southern Mediterranean populations. Finally, we highlight human-mediated dispersal resulting in substantial introgression between resident and migrant populations. *We have shown that the evolutionary trajectories of ruderal plants can be similar to those of wild species, but are interfered by human activities, promoting range expansions through increased long-distance dispersal and the creation of suitable habitats.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Axée dans un premier temps sur le formalisme et les méthodes, cette thèse est construite sur trois concepts formalisés: une table de contingence, une matrice de dissimilarités euclidiennes et une matrice d'échange. À partir de ces derniers, plusieurs méthodes d'Analyse des données ou d'apprentissage automatique sont exprimées et développées: l'analyse factorielle des correspondances (AFC), vue comme un cas particulier du multidimensional scaling; la classification supervisée, ou non, combinée aux transformations de Schoenberg; et les indices d'autocorrélation et d'autocorrélation croisée, adaptés à des analyses multivariées et permettant de considérer diverses familles de voisinages. Ces méthodes débouchent dans un second temps sur une pratique de l'analyse exploratoire de différentes données textuelles et musicales. Pour les données textuelles, on s'intéresse à la classification automatique en types de discours de propositions énoncées, en se basant sur les catégories morphosyntaxiques (CMS) qu'elles contiennent. Bien que le lien statistique entre les CMS et les types de discours soit confirmé, les résultats de la classification obtenus avec la méthode K- means, combinée à une transformation de Schoenberg, ainsi qu'avec une variante floue de l'algorithme K-means, sont plus difficiles à interpréter. On traite aussi de la classification supervisée multi-étiquette en actes de dialogue de tours de parole, en se basant à nouveau sur les CMS qu'ils contiennent, mais aussi sur les lemmes et le sens des verbes. Les résultats obtenus par l'intermédiaire de l'analyse discriminante combinée à une transformation de Schoenberg sont prometteurs. Finalement, on examine l'autocorrélation textuelle, sous l'angle des similarités entre diverses positions d'un texte, pensé comme une séquence d'unités. En particulier, le phénomène d'alternance de la longueur des mots dans un texte est observé pour des voisinages d'empan variable. On étudie aussi les similarités en fonction de l'apparition, ou non, de certaines parties du discours, ainsi que les similarités sémantiques des diverses positions d'un texte. Concernant les données musicales, on propose une représentation d'une partition musicale sous forme d'une table de contingence. On commence par utiliser l'AFC et l'indice d'autocorrélation pour découvrir les structures existant dans chaque partition. Ensuite, on opère le même type d'approche sur les différentes voix d'une partition, grâce à l'analyse des correspondances multiples, dans une variante floue, et à l'indice d'autocorrélation croisée. Qu'il s'agisse de la partition complète ou des différentes voix qu'elle contient, des structures répétées sont effectivement détectées, à condition qu'elles ne soient pas transposées. Finalement, on propose de classer automatiquement vingt partitions de quatre compositeurs différents, chacune représentée par une table de contingence, par l'intermédiaire d'un indice mesurant la similarité de deux configurations. Les résultats ainsi obtenus permettent de regrouper avec succès la plupart des oeuvres selon leur compositeur.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present in this paper the results of the application of several visual methods on a group of locations, dated between VI and I centuries BC, of the ager Tarraconensis (Tarragona, Spain) a Hinterland of the roman colony of Tarraco. The difficulty in interpreting the diverse results in a combined way has been resolved by means of the use of statistical methods, such as Principal Components Analysis (PCA) and K-means clustering analysis. These methods have allowed us to carry out site classifications in function of the landscape's visual structure that contains them and of the visual relationships that could be given among them.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

En aquest projecte fem un estudi de diferents mètodes per a la segmentació i extracció de línies de mapes de metro com a suport per a daltònics. Hem aplicat dos mètodes amb intervenció de l’usuari i cinc mètodes automàtics on fem servir K-means per a la segmentació de color i Hough per a l’extracció de línies. Dels mètodes amb intervenció obtenim millors resultats amb un mètode d’assignació aproximada del color, i entre els autoàatics tenim com a millor una solució ad-hoc sense paràmetres aplicada sobre l’espai RGB. D’acord amb els resultats experimentals, aquests mètodes ens permeten fer una bona segmentació i extracció de les línies de metro.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

OBJECTIVE: In contrast to conventional (CONV) neuromuscular electrical stimulation (NMES), the use of "wide-pulse, high-frequencies" (WPHF) can generate higher forces than expected by the direct activation of motor axons alone. We aimed at investigating the occurrence, magnitude, variability and underlying neuromuscular mechanisms of these "Extra Forces" (EF). METHODS: Electrically-evoked isometric plantar flexion force was recorded in 42 healthy subjects. Additionally, twitch potentiation, H-reflex and M-wave responses were assessed in 13 participants. CONV (25Hz, 0.05ms) and WPHF (100Hz, 1ms) NMES consisted of five stimulation trains (20s on-90s off). RESULTS: K-means clustering analysis disclosed a responder rate of almost 60%. Within this group of responders, force significantly increased from 4% to 16% of the maximal voluntary contraction force and H-reflexes were depressed after WPHF NMES. In contrast, non-responders showed neither EF nor H-reflex depression. Twitch potentiation and resting EMG data were similar between groups. Interestingly, a large inter- and intrasubject variability of EF was observed. CONCLUSION: The responder percentage was overestimated in previous studies. SIGNIFICANCE: This study proposes a novel methodological framework for unraveling the neurophysiological mechanisms involved in EF and provides further evidence for a central contribution to EF in responders.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Conventional (CONV) neuromuscular electrical stimulation (NMES) (i.e., short pulse duration, low frequencies) induces a higher energetic response as compared to voluntary contractions (VOL). In contrast, wide-pulse, high-frequency (WPHF) NMES might elicit-at least in some subjects (i.e., responders)-a different motor unit recruitment compared to CONV that resembles the physiological muscle activation pattern of VOL. We therefore hypothesized that for these responder subjects, the metabolic demand of WPHF would be lower than CONV and comparable to VOL. 18 healthy subjects performed isometric plantar flexions at 10% of their maximal voluntary contraction force for CONV (25 Hz, 0.05 ms), WPHF (100 Hz, 1 ms) and VOL protocols. For each protocol, force time integral (FTI) was quantified and subjects were classified as responders and non-responders to WPHF based on k-means clustering analysis. Furthermore, a fatigue index based on FTI loss at the end of each protocol compared with the beginning of the protocol was calculated. Phosphocreatine depletion (ΔPCr) was assessed using 31P magnetic resonance spectroscopy. Responders developed four times higher FTI's during WPHF (99 ± 37 ×103 N.s) than non-responders (26 ± 12 ×103 N.s). For both responders and non-responders, CONV was metabolically more demanding than VOL when ΔPCr was expressed relative to the FTI. Only for the responder group, the ∆PCr/FTI ratio of WPHF (0.74 ± 0.19 M/N.s) was significantly lower compared to CONV (1.48 ± 0.46 M/N.s) but similar to VOL (0.65 ± 0.21 M/N.s). Moreover, the fatigue index was not different between WPHF (-16%) and CONV (-25%) for the responders. WPHF could therefore be considered as the less demanding NMES modality-at least in this subgroup of subjects-by possibly exhibiting a muscle activation pattern similar to VOL contractions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The aim of this study was to group temporal profiles of 10-day composites NDVI product by similarity, which was obtained by the SPOT Vegetation sensor, for municipalities with high soybean production in the state of Paraná, Brazil, in the 2005/2006 cropping season. Data mining is a valuable tool that allows extracting knowledge from a database, identifying valid, new, potentially useful and understandable patterns. Therefore, it was used the methods for clusters generation by means of the algorithms K-Means, MAXVER and DBSCAN, implemented in the WEKA software package. Clusters were created based on the average temporal profiles of NDVI of the 277 municipalities with high soybean production in the state and the best results were found with the K-Means algorithm, grouping the municipalities into six clusters, considering the period from the beginning of October until the end of March, which is equivalent to the crop vegetative cycle. Half of the generated clusters presented spectro-temporal pattern, a characteristic of soybeans and were mostly under the soybean belt in the state of Paraná, which shows good results that were obtained with the proposed methodology as for identification of homogeneous areas. These results will be useful for the creation of regional soybean "masks" to estimate the planted area for this crop.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Positron Emission Tomography (PET) using 18F-FDG is playing a vital role in the diagnosis and treatment planning of cancer. However, the most widely used radiotracer, 18F-FDG, is not specific for tumours and can also accumulate in inflammatory lesions as well as normal physiologically active tissues making diagnosis and treatment planning complicated for the physicians. Malignant, inflammatory and normal tissues are known to have different pathways for glucose metabolism which could possibly be evident from different characteristics of the time activity curves from a dynamic PET acquisition protocol. Therefore, we aimed to develop new image analysis methods, for PET scans of the head and neck region, which could differentiate between inflammation, tumour and normal tissues using this functional information within these radiotracer uptake areas. We developed different dynamic features from the time activity curves of voxels in these areas and compared them with the widely used static parameter, SUV, using Gaussian Mixture Model algorithm as well as K-means algorithm in order to assess their effectiveness in discriminating metabolically different areas. Moreover, we also correlated dynamic features with other clinical metrics obtained independently of PET imaging. The results show that some of the developed features can prove to be useful in differentiating tumour tissues from inflammatory regions and some dynamic features also provide positive correlations with clinical metrics. If these proposed methods are further explored then they can prove to be useful in reducing false positive tumour detections and developing real world applications for tumour diagnosis and contouring.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Chronic hepatitis B (HBV) and C (HCV) virus infections are the most important factors associated with hepatocellular carcinoma (HCC), but tumor prognosis remains poor due to the lack of diagnostic biomarkers. In order to identify novel diagnostic markers and therapeutic targets, the gene expression profile associated with viral and non-viral HCC was assessed in 9 tumor samples by oligo-microarrays. The differentially expressed genes were examined using a z-score and KEGG pathway for the search of ontological biological processes. We selected a non-redundant set of 15 genes with the lowest P value for clustering samples into three groups using the non-supervised algorithm k-means. Fisher’s linear discriminant analysis was then applied in an exhaustive search of trios of genes that could be used to build classifiers for class distinction. Different transcriptional levels of genes were identified in HCC of different etiologies and from different HCC samples. When comparing HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs non-viral (NV)-HCC, HBC-HCC vs NV-HCC, and HCV-HCC vs NV-HCC of the 58 non-redundant differentially expressed genes, only 6 genes (IKBKβ, CREBBP, WNT10B, PRDX6, ITGAV, and IFNAR1) were found to be associated with hepatic carcinogenesis. By combining trios, classifiers could be generated, which correctly classified 100% of the samples. This expression profiling may provide a useful tool for research into the pathophysiology of HCC. A detailed understanding of how these distinct genes are involved in molecular pathways is of fundamental importance to the development of effective HCC chemoprevention and treatment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Ce mémoire de maîtrise présente une nouvelle approche non supervisée pour détecter et segmenter les régions urbaines dans les images hyperspectrales. La méthode proposée n ́ecessite trois étapes. Tout d’abord, afin de réduire le coût calculatoire de notre algorithme, une image couleur du contenu spectral est estimée. A cette fin, une étape de réduction de dimensionalité non-linéaire, basée sur deux critères complémentaires mais contradictoires de bonne visualisation; à savoir la précision et le contraste, est réalisée pour l’affichage couleur de chaque image hyperspectrale. Ensuite, pour discriminer les régions urbaines des régions non urbaines, la seconde étape consiste à extraire quelques caractéristiques discriminantes (et complémentaires) sur cette image hyperspectrale couleur. A cette fin, nous avons extrait une série de paramètres discriminants pour décrire les caractéristiques d’une zone urbaine, principalement composée d’objets manufacturés de formes simples g ́eométriques et régulières. Nous avons utilisé des caractéristiques texturales basées sur les niveaux de gris, la magnitude du gradient ou des paramètres issus de la matrice de co-occurrence combinés avec des caractéristiques structurelles basées sur l’orientation locale du gradient de l’image et la détection locale de segments de droites. Afin de réduire encore la complexité de calcul de notre approche et éviter le problème de la ”malédiction de la dimensionnalité” quand on décide de regrouper des données de dimensions élevées, nous avons décidé de classifier individuellement, dans la dernière étape, chaque caractéristique texturale ou structurelle avec une simple procédure de K-moyennes et ensuite de combiner ces segmentations grossières, obtenues à faible coût, avec un modèle efficace de fusion de cartes de segmentations. Les expérimentations données dans ce rapport montrent que cette stratégie est efficace visuellement et se compare favorablement aux autres méthodes de détection et segmentation de zones urbaines à partir d’images hyperspectrales.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Biometrics deals with the physiological and behavioral characteristics of an individual to establish identity. Fingerprint based authentication is the most advanced biometric authentication technology. The minutiae based fingerprint identification method offer reasonable identification rate. The feature minutiae map consists of about 70-100 minutia points and matching accuracy is dropping down while the size of database is growing up. Hence it is inevitable to make the size of the fingerprint feature code to be as smaller as possible so that identification may be much easier. In this research, a novel global singularity based fingerprint representation is proposed. Fingerprint baseline, which is the line between distal and intermediate phalangeal joint line in the fingerprint, is taken as the reference line. A polygon is formed with the singularities and the fingerprint baseline. The feature vectors are the polygonal angle, sides, area, type and the ridge counts in between the singularities. 100% recognition rate is achieved in this method. The method is compared with the conventional minutiae based recognition method in terms of computation time, receiver operator characteristics (ROC) and the feature vector length. Speech is a behavioural biometric modality and can be used for identification of a speaker. In this work, MFCC of text dependant speeches are computed and clustered using k-means algorithm. A backpropagation based Artificial Neural Network is trained to identify the clustered speech code. The performance of the neural network classifier is compared with the VQ based Euclidean minimum classifier. Biometric systems that use a single modality are usually affected by problems like noisy sensor data, non-universality and/or lack of distinctiveness of the biometric trait, unacceptable error rates, and spoof attacks. Multifinger feature level fusion based fingerprint recognition is developed and the performances are measured in terms of the ROC curve. Score level fusion of fingerprint and speech based recognition system is done and 100% accuracy is achieved for a considerable range of matching threshold