795 results for label hierarchical clustering
Abstract:
Feature selection is an important and active issue in clustering and classification problems. Choosing an adequate feature subset reduces the dimensionality of the dataset, which lowers the computational complexity of classification and improves classifier performance by avoiding redundant or irrelevant features. Although feature selection can be formally defined as an optimisation problem with a single objective, namely the classification accuracy obtained with the selected feature subset, some multi-objective approaches to this problem have been proposed in recent years. For supervised classifiers, these select features that improve not only the classification accuracy but also the generalisation capability; for unsupervised classifiers, they counterbalance the bias toward lower or higher numbers of features shown by some methods used to validate the clustering/classification. The main contribution of this paper is a multi-objective approach for feature selection and its application to an unsupervised clustering procedure based on Growing Hierarchical Self-Organising Maps (GHSOMs) that includes a new method for unit labelling and efficient determination of the winning unit. In the network anomaly detection problem considered here, this multi-objective approach makes it possible not only to differentiate between normal and anomalous traffic but also among different anomalies. The efficiency of our proposals has been evaluated using the well-known DARPA/NSL-KDD datasets, which contain extracted features and labelled attacks from around 2 million connections. The selected feature sets computed in our experiments provide detection rates up to 99.8% with normal traffic and up to 99.6% with anomalous traffic, as well as accuracy values up to 99.12%.
Abstract:
This work proposes a new hybrid system for multi-class sentiment analysis based on the General Inquirer (GI) dictionary and a hierarchical approach to the Logistic Model Tree (LMT) classifier. The new system consists of three layers. The Bipolar Layer (BL) consists of one LMT (LMT-1) for sentiment polarity classification, while the second layer, the Intensity Layer (IL), comprises two LMTs (LMT-2 and LMT-3) that separately detect three intensities of positive sentiment and three intensities of negative sentiment. During the construction phase only, the Grouping Layer (GL) is used to cluster the positive and the negative instances with two k-means clusterings, respectively. In the pre-processing phase, texts are segmented into words, which are tagged, reduced to their stems, and finally passed to the GI dictionary in order to count and label only the verbs, nouns, adjectives, and adverbs with 24 markers that are then used to compute the feature vectors. In the sentiment classification phase, the feature vectors are first fed to LMT-1 and then clustered in GL according to class label; these clusters are labelled manually, and finally the positive instances are fed to LMT-2 and the negative instances to LMT-3. The three trees are trained and evaluated on the Movie Review and SenTube datasets with stratified 10-fold cross-validation. LMT-1 yields a tree with 48 leaves and a size of 95, with 90.88% accuracy, while LMT-2 and LMT-3 each yield a tree with one leaf and a size of one, with 99.28% and 99.37% accuracy, respectively. The experiments show that the proposed hierarchical classification methodology performs better than other prevailing approaches.
Abstract:
Over recent decades, remote sensing has emerged as an effective tool for improving agricultural productivity. In particular, many works have dealt with the problem of identifying characteristics or phenomena of crops and orchards on different scales using remotely sensed images. Since natural processes are scale dependent and most of them are hierarchically structured, determining the optimal study scales is mandatory for understanding these processes and their interactions. The concept of multi-scale/multi-resolution inherent to OBIA methodologies allows the scale problem to be dealt with, but this requires multi-scale and hierarchical segmentation algorithms. The question that remains unsolved is how to determine the suitable segmentation scale that allows different objects and phenomena to be characterized in a single image. In this work, an adaptation of the Simple Linear Iterative Clustering (SLIC) algorithm to perform a multi-scale hierarchical segmentation of satellite images is proposed. The selection of the optimal multi-scale segmentation for different regions of the image is carried out by evaluating the intra-variability and inter-heterogeneity of the regions obtained on each scale with respect to the parent regions defined by the coarsest scale. To achieve this goal, an objective function that combines weighted variance and the global Moran index has been used. Two different kinds of experiment have been carried out, generating the number of regions on each scale through linear and dyadic approaches. This methodology has allowed, on the one hand, the detection of objects on different scales and, on the other hand, their representation in a single image. Altogether, the procedure provides the user with a better comprehension of the land cover, the objects on it and the phenomena occurring.
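The two ingredients of the objective function named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the size-weighted form of the variance, and the binary adjacency weights are all assumptions, and the abstract does not say how the two terms are combined, so they are computed separately here.

```python
def weighted_variance(region_sizes, region_variances):
    """Size-weighted mean of per-region variances (lower = more homogeneous regions)."""
    total = sum(region_sizes)
    return sum(a * v for a, v in zip(region_sizes, region_variances)) / total


def morans_i(values, weights):
    """Global Moran's I for region values and a spatial weight matrix.

    values  : one number per region (e.g. mean intensity); must not all be equal
    weights : n x n matrix, weights[i][j] > 0 when regions i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations over neighbouring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical example: four regions in a row, each adjacent to its neighbours.
chain = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
region_means = [1.0, 2.0, 3.0, 4.0]
autocorrelation = morans_i(region_means, chain)
homogeneity = weighted_variance([10, 30], [4.0, 2.0])
```

A low Moran's I between neighbouring regions indicates that they are well separated, while a low weighted variance indicates that each region is internally homogeneous.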
Abstract:
Different types of water bodies, including lakes, streams, and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. The use of triplex PCR offers an approach that is fast and inexpensive, and here enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites, and subgroup B1 with domesticated animal sources, suggesting their use as a first screening for pollution source identification. A simple classification is also proposed based on phylogenetic subgroup distribution using the w-clique metric, enabling differentiation of polluted and unpolluted sites.
Abstract:
In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in species other than bovines. Based on the data from this epidemic, an analysis was carried out using the K-function, and spatial clustering of outbreaks was observed within a range of 25 km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the dissemination of the disease. The control measures were effective, since the disease did not spread to farms more than 47 km from the initial outbreaks.
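As an illustration of the kind of analysis mentioned, the sketch below computes a naive estimate of Ripley's K-function without edge correction and compares it with the value expected under complete spatial randomness, K(r) = πr². The coordinates and study area are hypothetical, and the abstract does not state which estimator or edge correction was actually used.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K estimate: area * (ordered pairs within r) / (n * (n - 1))."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


def csr_k(r):
    """Expected K(r) under complete spatial randomness."""
    return math.pi * r ** 2


# Hypothetical outbreak locations (km) in a 10 km x 10 km study area.
outbreaks = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
k_hat = ripley_k(outbreaks, r=2.0, area=100.0)
clustered = k_hat > csr_k(2.0)  # K above the CSR value suggests clustering at this range
```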
Abstract:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
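The control flow of the sequential procedure described above can be sketched as follows. The test itself (the full Bayesian significance test in the paper) is not implemented here: `rejects_reduction` is a hypothetical callable standing in for it, and `max_m` is an assumed safety cap that does not appear in the abstract.

```python
def select_num_components(rejects_reduction, max_m=20):
    """Sequentially choose the number of mixture components.

    rejects_reduction(m) -> True if the hypothesis "an m-component model
    should in fact have m - 1 components" is rejected (hypothetical stand-in
    for the significance test).
    """
    m = 2  # the first step considers a two-component mixture
    while m <= max_m:
        if not rejects_reduction(m):
            # Hypothesis accepted: m - 1 components are sufficient.
            return m - 1
        m += 1  # hypothesis rejected: try a larger model
    return max_m  # cap reached without acceptance


# Hypothetical stub: reductions are rejected while m <= 4, so 4 components are kept.
estimated = select_num_components(lambda m: m <= 4)
```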
Abstract:
Introduction and Purpose: Bimatoprost and the fixed combination of latanoprost with timolol maleate are 2 medications widely used to treat glaucoma and ocular hypertension (OHT). The aim of the study is to compare the efficacy of these 2 drugs in reducing intraocular pressure (IOP) after 8 weeks of treatment in patients with primary open angle glaucoma (POAG) or OHT. Methods: In this randomized, open-label trial, 44 patients with POAG or OHT were allocated to receive either bimatoprost (1 drop QD) or latanoprost/timolol (1 drop QD). The primary outcome was the mean diurnal IOP measurement at the 8th week, calculated as the mean of the IOP measurements taken at 8:00 AM, 10:00 AM, and 12:00 PM. Secondary outcomes included the change in IOP from baseline measured 3 times a day, IOP after the water-drinking test (performed after the last IOP measurement), and the assessment of side effects of each therapy. Results: The mean IOP level with latanoprost/timolol (13.83, SD = 2.54) was significantly lower than with bimatoprost (16.16, SD = 3.28; P < 0.0001) at week 8. Also, the change in mean IOP values was significantly higher in the latanoprost/timolol group at 10:00 AM (P = 0.013) and 12:00 PM (P = 0.01), but not at 8:00 AM (P = ns). During the water-drinking test, there was no significant difference in IOP increase (absolute and percentage) between groups; however, there was a significant decrease in mean heart rate in the latanoprost/timolol group. Finally, no significant changes in blood pressure and lung spirometry were observed in either group. Conclusions: The fixed combination of latanoprost/timolol was significantly superior to bimatoprost alone in reducing IOP in patients with POAG or OHT. Further studies with large sample sizes should be undertaken to support the superior efficacy of latanoprost/timolol, as well as to better assess its side-effect profile.
Abstract:
Biological neuronal networks constitute a special class of dynamical systems, as they are formed by individual geometrical components, namely the neurons. In the existing literature, relatively little attention has been given to the influence of neuron shape on the overall connectivity and dynamics of the emerging networks. The current work addresses this issue by considering simplified neuronal shapes consisting of circular regions (soma/axons) with spokes (dendrites). Networks are grown by placing these patterns randomly in the two-dimensional (2D) plane and establishing connections whenever a piece of dendrite falls inside an axon. Several topological and dynamical properties of the resulting graph are measured, including the degree distribution, clustering coefficients, symmetry of connections, size of the largest connected component, as well as three hierarchical measurements of the local topology. By varying the number of processes of the individual basic patterns, we can quantify relationships between the individual neuronal shape and the topological and dynamical features of the networks. Integrate-and-fire dynamics on these networks is also investigated with respect to transient activation from a source node, indicating that long-range connections play an important role in the propagation of avalanches.
Abstract:
This work shows the application of the analytic hierarchy process (AHP) in the full cost accounting (FCA) within the integrated resource planning (IRP) process. For this purpose, a pioneering case was developed and different energy solutions of supply and demand for a metropolitan airport (Congonhas) were considered [Moreira, E.M., 2005. Modelamento energetico para o desenvolvimento limpo de aeroporto metropolitano baseado na filosofia do PIR-O caso da metropole de Sao Paulo. Dissertacao de mestrado, GEPEA/USP]. These solutions were compared and analyzed utilizing the software solution "Decision Lens" that implements the AHP. The final part of this work has a classification of resources that can be considered to be the initial target as energy resources, thus facilitating the restraints of the IRP of the airport and setting parameters aiming at sustainable development. (C) 2007 Elsevier Ltd. All rights reserved.
Abstract:
Background: Tramadol is a well tolerated and effective analgesic used to treat moderate to severe pain. Several generic formulations of tramadol are available in Brazil; however, published information regarding their bioequivalence in the Brazilian population is not available. A study was designed for Brazilian regulatory authorities to allow marketing of a generic formulation. Objective: The purpose of this study was to compare the bioequivalence of 2 commercial tablet preparations containing tramadol 100 mg marketed for use in Brazil. Methods: A randomized, open-label, 2 x 2 crossover study was performed in healthy Brazilian volunteers under fasting conditions with a washout period of 12 days. Two tablet formulations of tramadol 100 mg (test and reference formulations) were administered as a single oral dose, and blood samples were collected over 24 hours. Tramadol plasma concentrations were quantified using a validated HPLC method. A plasma concentration time profile was generated for each volunteer and then mean values were determined, from which C(max), T(max), AUC(0-t), AUC(0-infinity), k(e), and t(1/2) were calculated using a noncompartmental model. Bioequivalence between the products was determined by calculating 90% CIs for the ratios of C(max), AUC(0-t), and AUC(0-infinity) values for the test and reference products using log-transformed data. Tolerability was assessed by monitoring vital signs (temperature, blood pressure, heart rate), laboratory tests (hematology, blood biochemistry, hepatic function, urinalysis), and interviews with the volunteers before medication administration and every 2 hours during the study. Results: Twenty-six healthy volunteers (13 men, 13 women) were enrolled in and completed the study. Mean (SD) age was 30 (6.8) years (range, 21-44 years), mean weight was 64 (8.3) kg (range, 53-79 kg), and mean height was 166 (6.4) cm (range, 155-178 cm). 
The 90% CIs for the ratios of C(max) (1.01-1.17), AUC(0-t) (1.00-1.13), and AUC(0-infinity) (1.00-1.14) values for the test and reference products fell within the interval of 0.80 to 1.25 proposed by most regulatory agencies, including the Brazilian regulatory body. No clinically important adverse effects were reported; only mild somnolence was reported by 4 volunteers and mild headaches by 5 volunteers, and there was no need to use medication to treat these symptoms. Conclusion: Pharmacokinetic analysis in these healthy Brazilian volunteers suggested that the test and reference formulations of tramadol 100-mg tablets met the regulatory requirements to assume bioequivalence based on the Brazilian regulatory definition. (Clin Ther 2010;32:758-765) (C) 2010 Excerpta Medica Inc.
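The acceptance rule stated in this abstract amounts to checking that each 90% CI lies entirely inside the interval 0.80 to 1.25. A minimal sketch using the intervals reported above follows; the function and variable names are illustrative, and treating the limits as inclusive is an assumption.

```python
def within_limits(ci, lower=0.80, upper=1.25):
    """True if the whole confidence interval lies inside [lower, upper]."""
    ci_low, ci_high = ci
    return lower <= ci_low and ci_high <= upper


# 90% CIs for the test/reference ratios, as reported in the abstract.
reported_cis = {
    "Cmax": (1.01, 1.17),
    "AUC(0-t)": (1.00, 1.13),
    "AUC(0-infinity)": (1.00, 1.14),
}

# Bioequivalence is assumed only if every parameter passes the check.
bioequivalent = all(within_limits(ci) for ci in reported_cis.values())
```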
Abstract:
Background: Zidovudine is a thymidine nucleoside reverse transcriptase inhibitor with activity against HIV type 1. Some (~8) generic formulations of zidovudine are available in Brazil; however, based on a literature search, information concerning their bioavailability and pharmacokinetic properties in the Brazilian population has not been reported. Objective: The aim of this study was to compare the bioavailability and pharmacokinetic properties of 2 capsule formulations of zidovudine 100 mg in healthy Brazilian volunteers. Methods: This open-label, randomized, 2-way crossover study utilized a 1-week washout period between doses. Blood samples were collected for 8 hours after a single dose of zidovudine 100-mg test (Zidovudina, Fundação para o Remédio Popular, São Paulo, Brazil) or reference formulation (Retrovir (R), GlaxoSmithKline, Philadelphia, Pennsylvania). Plasma zidovudine concentrations were determined using a validated high-performance liquid chromatography method with ultraviolet detection at 265 nm. C-max, T-max, AUC(0-t), AUC(0-infinity), t(1/2), and the elimination constant (k(e)) were determined using noncompartmental analysis. The formulations were considered bioequivalent if the 90% CIs for C-max, AUC(0-t), and AUC(0-infinity) fell within the interval of 80% to 125%, the regulatory definition set by the US Food and Drug Administration (FDA). Results: Twenty-four healthy volunteers (12 males, 12 females; mean age, 27 years; weight, 60 kg; height, 167 cm) were enrolled and completed the study. The 90% CIs of the treatment ratios for the logarithmic-transformed values of C-max, AUC(0-t), and AUC(0-infinity) were 80.0% to 113.6%, 93.9% to 109.7%, and 93.6% to 110.1%, respectively. The values for the test and reference formulations were within the FDA bioequivalence definition intervals of 80% to 125%.
Conclusions: In this small study in healthy subjects, no statistically significant differences in C-max, AUC(0-t), and AUC(0-infinity) were found between the test and reference formulations of zidovudine 100-mg capsules. The 90% CIs for the mean ratio values for the test and reference formulations of AUC(0-t), AUC(0-infinity), and C-max indicated that the reported data were entirely within the bioequivalence acceptance range proposed by the FDA of 80% to 125% (using log-transformed data).
Abstract:
A chemotaxonomic analysis is described of a database containing various types of compounds from the Heliantheae tribe (Asteraceae) using Self-Organizing Maps (SOM). The numbers of occurrences of 9 chemical classes in different taxa of the tribe were used as variables. The study shows that SOM applied to chemical data can contribute to differentiate genera, subtribes, and groups of subtribes (subtribe branches), as well as to tribal and subtribal classifications of Heliantheae, exhibiting a high hit percentage comparable to that of an expert performance, and in agreement with the previous tribe classification proposed by Stuessy.
Abstract:
Recent efforts in the characterization of air-water flow properties have included some clustering process analysis. A cluster of bubbles is defined as a group of two or more bubbles, with a distinct separation from other bubbles before and after the cluster. The present paper compares the results of clustering processes in two hydraulic structures: a large-size dropshaft and a hydraulic jump in a rectangular horizontal channel. The comparison highlighted some significant differences in clustering production and structures. Both dropshaft and hydraulic jump flows are complex turbulent shear flows, and some clustering index may provide some measure of the bubble-turbulence interactions and associated energy dissipation.
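The cluster definition above can be illustrated with a short sketch. The abstract does not say how a "distinct separation" is quantified, so the fixed gap threshold `max_gap` is an assumption, as are the function name and the one-dimensional bubble positions (for example, along a probe signal).

```python
def find_bubble_clusters(positions, max_gap):
    """Group consecutive bubbles whose spacing is at most max_gap.

    positions : sorted bubble positions (or arrival times) along one dimension
    max_gap   : assumed threshold; a larger gap counts as a distinct separation
    Returns only groups of two or more bubbles, matching the definition above.
    """
    clusters = []
    current = []
    for p in positions:
        if current and p - current[-1] > max_gap:
            # Distinct separation found: close the current group.
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical bubble positions; isolated bubbles (here 1.0) are not clusters.
sample = [0.0, 0.1, 0.15, 1.0, 2.0, 2.05]
clusters = find_bubble_clusters(sample, max_gap=0.2)
```

Simple indices such as the number of clusters or the average number of bubbles per cluster can then be derived from the returned groups.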