853 results for height partition clustering


Abstract:

Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data, based essentially on the cardinality of the fuzzy subsets that represent objects and of their complements, without applying any crisp property. From this perspective, we define a family of fuzzy similarity indexes that includes a set of fuzzy indexes introduced by Tolias et al., and we analyze the conditions under which it defines a fuzzy proximity relation. Following an original idea due to S. Miyamoto, we evaluate both the similarity between objects and the similarity between features by means of the same mathematical procedure. Combining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example that illustrates the whole process.
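The general pipeline (a cardinality-based fuzzy similarity index, then clustering of the resulting fuzzy relation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: `complement_similarity` is a hypothetical index in the spirit of the family described (sigma-count cardinality, min/max as intersection/union, 1 - x as complement), and the clustering step swaps in the classical max-min transitive closure followed by an alpha-cut. The data and `alpha` value are made up.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sigma_count(memberships: Sequence[float]) -> float:
    """Sigma-count cardinality of a fuzzy set: the sum of its membership degrees."""
    return sum(memberships)


def complement_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical index using the sets and their complements:
    s(A, B) = (|A and B| + |A' and B'|) / (|A or B| + |A' or B'|).
    It is symmetric and s(A, A) = 1, so it yields a fuzzy proximity relation."""
    inter = sigma_count([min(x, y) for x, y in zip(a, b)])
    inter_c = sigma_count([min(1 - x, 1 - y) for x, y in zip(a, b)])
    union = sigma_count([max(x, y) for x, y in zip(a, b)])
    union_c = sigma_count([max(1 - x, 1 - y) for x, y in zip(a, b)])
    # Each feature adds at least 1 to the denominator, so it is positive
    # whenever the objects have at least one feature.
    return (inter + inter_c) / (union + union_c)


def similarity_matrix(objects: Sequence[Sequence[float]]) -> Matrix:
    return [[complement_similarity(a, b) for b in objects] for a in objects]


def max_min_closure(rel: Matrix) -> Matrix:
    """Max-min transitive closure: repeat R := max(R, R o R) until it is stable."""
    n = len(rel)
    r = [row[:] for row in rel]
    while True:
        nxt = [[max(r[i][j], max(min(r[i][k], r[k][j]) for k in range(n)))
                for j in range(n)] for i in range(n)]
        if nxt == r:
            return r
        r = nxt


def alpha_cut_clusters(rel: Matrix, alpha: float) -> List[List[int]]:
    """Classes of the crisp equivalence relation {(i, j) : rel[i][j] >= alpha}."""
    clusters, seen = [], set()
    for i in range(len(rel)):
        if i not in seen:
            members = [j for j in range(len(rel)) if rel[i][j] >= alpha]
            seen.update(members)
            clusters.append(members)
    return clusters


objects = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0]]
closure = max_min_closure(similarity_matrix(objects))
print(alpha_cut_clusters(closure, alpha=0.5))  # [[0, 1], [2]]
```

The transitive closure is needed because a proximity relation (reflexive and symmetric) is not necessarily transitive, so its alpha-cuts need not be equivalence relations; after the closure they are.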

Abstract:

Study, design, and implementation of different fiber grouping (clustering) techniques, in order to integrate several clustering algorithms and fiber-cluster visualization techniques into the DTIWeb platform in a way that makes DTI data easier for specialists to interpret.

Abstract:

A methodology of exploratory data analysis for investigating the phenomenon of orographic precipitation enhancement is proposed. Precipitation observations from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels in radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from successive radar images and integrated into a set of topographic features to highlight the slopes exposed to the main flows. By carrying out the exploratory data analysis with a recent spectral clustering algorithm, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. The repeatability of precipitation patterns at particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the tops of hills and on the upwind sides of mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society.
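The abstract does not say which spectral clustering algorithm was used. As a generic illustration only, the sketch below performs a two-way normalized spectral split: it builds a Gaussian affinity matrix, finds the second eigenvector of D^-1/2 W D^-1/2 by power iteration, and splits on its sign. The affinity kernel, its width `sigma`, and the toy one-dimensional points are assumptions; in the study, each pixel would instead be described by topographic and flow features.

```python
import math
from typing import List, Sequence


def gaussian_affinity(points: Sequence[Sequence[float]], sigma: float) -> List[List[float]]:
    """W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def spectral_bipartition(w: List[List[float]], iterations: int = 500) -> List[int]:
    """Two-way split from the second eigenvector of D^-1/2 W D^-1/2.
    Assumes every node has a positive degree (no isolated points)."""
    n = len(w)
    s = [math.sqrt(sum(row)) for row in w]  # square roots of the degrees
    # M = (I + D^-1/2 W D^-1/2) / 2 has eigenvalues in [0, 1], so power
    # iteration is not attracted by eigenvalues close to -1.
    m = [[(0.5 if i == j else 0.0) + 0.5 * w[i][j] / (s[i] * s[j])
          for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(x * x for x in s))
    top = [x / norm for x in s]  # known leading eigenvector (eigenvalue 1)
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        # Deflation: remove the component along the leading eigenvector.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    # Rescale by D^-1/2 (random-walk form) and split on the sign.
    return [0 if v[i] / s[i] >= 0 else 1 for i in range(n)]


points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels = spectral_bipartition(gaussian_affinity(points, sigma=1.0))
print(labels)  # the first three points share one label, the last three the other
```

More than two clusters are usually obtained by taking several leading eigenvectors and running k-means on their rows, or by applying this split recursively.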

Abstract:

This project carries out research both on finding predictors through clustering techniques and on reviewing free data-mining software. The research is based on a case study in which, in addition to the free KDD software used by the scientific community, a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain, since the data from which they are to be inferred are student grades from different e-learning environments. In our case study, not only are clustering algorithms tested, but additional goals are also proposed.

Abstract:

OBJECTIVE: Identification of children with elevated blood pressure (BP) is difficult because of the multiple sex-, age-, and height-specific thresholds used to define elevated BP. We propose a simple set of absolute height-specific BP thresholds and evaluate their performance in identifying children with elevated BP in two different populations. METHODS: Using the US 95th-percentile sex-, age-, and relative-height-specific BP thresholds to define elevated BP in children (standard criteria), we derived a set of absolute height-specific BP thresholds (neither sex- nor age-specific) for 11 height categories in 10 cm increments. Using data from large school-based surveys conducted in Switzerland (N = 5207; 2621 boys, 2586 girls; age range: 10.1-14.9 years) and in the Seychelles (N = 25 759; 13 048 boys, 12 711 girls; age range: 4.4-18.8 years), we evaluated the performance of these height-specific thresholds in identifying children with elevated BP. We also derived sex-specific absolute height-specific BP thresholds and compared their performance. RESULTS: In the Swiss and Seychelles surveys, the prevalence of elevated BP (standard criteria) was 11.4% and 9.1%, respectively. The height-specific thresholds identified elevated BP with a sensitivity of 80% and 84%, a specificity of 99% and 99%, a positive predictive value of 92% and 91%, and a negative predictive value of 97% and 98%, respectively. The performance of the sex-specific absolute height-specific BP thresholds was similar. CONCLUSION: A simple table of height-specific BP thresholds made it possible to identify children with elevated BP with high sensitivity and excellent specificity.
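The four performance measures reported above are simple ratios from a 2 × 2 table that cross-classifies the height-specific thresholds against the standard criteria. A minimal sketch; the function name is hypothetical and the counts are made up to illustrate the arithmetic, not taken from the surveys.

```python
from typing import Dict


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Screening metrics of a test against a reference standard
    (here: the standard sex-, age-, and height-specific BP criteria)."""
    return {
        "sensitivity": tp / (tp + fn),  # flagged among truly elevated
        "specificity": tn / (tn + fp),  # not flagged among truly normal
        "ppv": tp / (tp + fp),          # truly elevated among flagged
        "npv": tn / (tn + fn),          # truly normal among not flagged
    }


# Hypothetical counts for 1000 children, for illustration only.
print(diagnostic_accuracy(tp=80, fp=10, fn=20, tn=890))
```

Sensitivity and specificity do not depend on prevalence, whereas PPV and NPV do, which is why the two surveys can show similar sensitivity and specificity yet slightly different predictive values.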

Abstract:

HEMOLIA (a project under the European Community's 7th Framework Programme) is a new-generation Anti-Money Laundering (AML) intelligent multi-agent alert and investigation system which, in addition to traditional financial data, makes extensive use of modern society's huge telecom data source, thereby opening up a new dimension of capabilities to all money-laundering fighters (FIUs, LEAs) and financial institutions (banks, insurance companies, etc.). This master's thesis project is carried out at AIA in Barcelona, one of the partners in the HEMOLIA project. The objective of this thesis is to find the clusters in a network built from the financial data. An extensive literature survey has been carried out, and several standard network algorithms have been studied and implemented. Clustering is an NP-hard problem, and algorithms such as K-Means and hierarchical clustering have been implemented to study problems in sociology, evolution, anthropology, etc. However, these algorithms have certain drawbacks that make them very difficult to implement. The thesis proposes (a) a possible improvement to the K-Means algorithm, (b) a novel approach to the clustering problem using Genetic Algorithms, and (c) a new algorithm for finding the cluster of a node using a Genetic Algorithm.
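For reference, the standard K-Means baseline that the thesis sets out to improve can be sketched with Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is not the thesis's improved variant or its genetic-algorithm approach; the explicit initial centroids and the toy data are assumptions, and applying it to a financial network would first require representing nodes as feature vectors.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's k-means starting from the given centroids."""
    centroids = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], init=[(0, 0), (10, 10)])
print(labels, centroids)  # [0, 0, 1, 1] [(0.0, 0.5), (10.0, 10.5)]
```

The result depends on the initial centroids and can be a local optimum, one of the well-known weaknesses that motivates improved or evolutionary variants.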

Abstract:

OBJECTIVE: This study assessed the clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption) in relation to the level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. A total of 18,005 subjects (8052 men and 9953 women) aged 25 years or older participated. RESULTS: Smokers more frequently had low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption than non-smokers and ex-smokers. The frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (≥2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (≥20 cigarettes/day), compared with non-smokers. Similar odds ratios were found for women in the corresponding groups, i.e., 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and interventions for smokers should take into account the strong clustering of risk behaviors with the level of cigarette consumption.
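An odds ratio with its 95% confidence interval comes from a 2 × 2 table of exposure (e.g. heavy smoker vs non-smoker) by outcome (≥2 other risk behaviors). The sketch below computes a crude odds ratio with a Woolf (log-scale) interval. It does not reproduce the adjusted estimates above: adjustment for age, nationality, and education typically requires a multivariable model such as logistic regression. The counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf 95% confidence interval.

    2 x 2 layout:      outcome yes   outcome no
        exposed             a            b
        unexposed           c            d
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(30, 70, 10, 90))
```

An interval that includes 1, such as 0.97 to 1.33 for male ex-smokers above, means the association is not statistically significant at the 5% level.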

Abstract:

Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data”, and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra.

The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem, we follow the steps described below.

First, we adapt the well-known spline smoothing techniques to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory; clustering algorithms are then applied to these smooth curves.

The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric that accounts for differences in shape only. A simulation study is performed to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
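One way to express "a suitable compositional algebra" in code is the centred log-ratio (clr) transform, which maps compositions into ordinary Euclidean space. The sketch below builds two hypothetical trajectory dissimilarities on clr coordinates: one sensitive to both shape and level, and one that removes each trajectory's mean (its level) first. These are illustrative stand-ins, not the metrics proposed in the essay; the spline smoothing step is omitted, and the trajectories are assumed to be already smoothed, sampled at the same points, and strictly positive.

```python
import math
from typing import List, Sequence

Composition = Sequence[float]
Trajectory = Sequence[Composition]


def clr(x: Composition) -> List[float]:
    """Centred log-ratio transform: log(x_i / geometric mean of x)."""
    logs = [math.log(v) for v in x]  # parts must be strictly positive
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def level_and_shape_distance(a: Trajectory, b: Trajectory) -> float:
    """Aitchison-type distance accumulated over common sampling points."""
    return math.sqrt(sum((u - v) ** 2
                         for xa, xb in zip(a, b)
                         for u, v in zip(clr(xa), clr(xb))))


def shape_distance(a: Trajectory, b: Trajectory) -> float:
    """Same distance after subtracting each trajectory's mean clr vector."""
    def centre(curve: List[List[float]]) -> List[List[float]]:
        means = [sum(col) / len(curve) for col in zip(*curve)]
        return [[v - m for v, m in zip(row, means)] for row in curve]
    ca, cb = centre([clr(x) for x in a]), centre([clr(x) for x in b])
    return math.sqrt(sum((u - v) ** 2
                         for ra, rb in zip(ca, cb)
                         for u, v in zip(ra, rb)))


a = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
p = [2.0, 1.0, 1.0]  # constant perturbation: shifts the level, keeps the shape
b = [[x * q for x, q in zip(row, p)] for row in a]
print(level_and_shape_distance(a, b) > 0, shape_distance(a, b) < 1e-9)  # True True
```

Either distance matrix can then be passed to a hierarchical or partitional clustering algorithm.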

Abstract:

Immobile location-allocation (LA) problems are a type of LA problem that consists in determining the service each facility should offer in order to optimize some criterion (such as the global demand), given the positions of the facilities and the customers. Because of the complexity of the problem, which is combinatorial in the number of possible services and the number of facilities and has a non-convex search space with several sub-optima, traditional methods cannot be applied directly to optimize it. We therefore proposed the use of clustering analysis to convert the initial problem into several smaller sub-problems. In this way, we presented and analyzed the suitability of several clustering methods for partitioning this LA problem. We then explored the use of metaheuristic techniques such as genetic algorithms, simulated annealing, and cuckoo search to solve the sub-problems obtained after the clustering analysis.
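Simulated annealing, one of the metaheuristics named above, can be illustrated on a tiny, hypothetical sub-problem: facilities and customers on a line, each facility offering exactly one service, and a made-up criterion in which each customer travels to the nearest facility that offers the service they need. The objective, `PENALTY`, and the cooling schedule are assumptions rather than the paper's formulation, and the clustering step that would split a large instance into sub-problems is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Customer = Tuple[float, int]  # (position on a line, index of the service needed)
PENALTY = 1e6  # hypothetical cost per customer whose service is offered nowhere


def cost(services: Sequence[int], facilities: Sequence[float],
         customers: Sequence[Customer]) -> float:
    """Hypothetical criterion: total distance from each customer to the nearest
    facility offering the service that customer needs."""
    total = 0.0
    for pos, need in customers:
        dists = [abs(pos - f) for f, s in zip(facilities, services) if s == need]
        total += min(dists) if dists else PENALTY
    return total


def anneal(facilities: Sequence[float], customers: Sequence[Customer], n_services: int,
           iters: int = 2000, t0: float = 10.0, cooling: float = 0.995,
           seed: int = 0) -> Tuple[List[int], float]:
    """Simulated annealing over service assignments (one service per facility)."""
    rng = random.Random(seed)
    current = [rng.randrange(n_services) for _ in facilities]
    cur_cost = cost(current, facilities, customers)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(iters):
        cand = current[:]
        cand[rng.randrange(len(cand))] = rng.randrange(n_services)  # reassign one facility
        cand_cost = cost(cand, facilities, customers)
        delta = cand_cost - cur_cost
        # Metropolis rule: accept improvements; accept worse moves with prob. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


facilities = [0.0, 5.0, 10.0]
customers = [(1.0, 0), (9.0, 1), (4.0, 0), (6.0, 1)]
print(anneal(facilities, customers, n_services=2))
```

Accepting some worse moves early on, while the temperature is high, is what lets the search escape the local sub-optima of a non-convex search space; the number of assignments grows exponentially with the number of facilities, which is why splitting the instance into smaller sub-problems first helps.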

Abstract:

ECG criteria for left ventricular hypertrophy (LVH) have been almost exclusively developed and calibrated in white populations. Because several interethnic differences in ECG characteristics have been found, the applicability of these criteria to African individuals remains to be demonstrated. We therefore investigated the performance of classic ECG criteria for LVH detection in an African population. Digitized 12-lead ECG tracings were obtained from 334 African individuals randomly selected from the general population of the Republic of Seychelles (Indian Ocean). Left ventricular mass (LVM) was calculated with M-mode echocardiography and indexed to body height. LVH was defined by taking the 95th percentile of body height-indexed LVM values in a reference subgroup. In the entire study sample, 16 men and 15 women (prevalence 9.3%) were finally declared to have LVH, 9 of whom belonged to the reference subgroup. Sensitivity, specificity, accuracy, and positive and negative predictive values for LVH were calculated for 9 classic ECG criteria, and receiver operating characteristic curves were computed. We also generated a new composite time-voltage criterion with stepwise multiple linear regression: weighted time-voltage criterion = (0.2366 R(aVL) + 0.0551 R(V5) + 0.0785 S(V3) + 0.2993 T(V1)) × QRS duration. The Sokolow-Lyon criterion reached the highest sensitivity (61%) and the R(aVL) voltage criterion reached the highest specificity (97%) when evaluated at their traditional partition values. However, at a fixed specificity of 95%, the sensitivity of these 10 criteria ranged from 16% to 32%. The best accuracy was obtained with the R(aVL) voltage criterion and the new composite time-voltage criterion (89% for both). Positive and negative predictive values varied considerably depending on the concomitant presence of 3 clinical risk factors for LVH (hypertension, age ≥50 years, overweight).
Median positive and negative predictive values of the 10 ECG criteria were 15% and 95%, respectively, for subjects with none or 1 of these risk factors, compared with 63% and 76% for subjects with all of them. In conclusion, the performance of classic ECG criteria for LVH detection varied widely and appeared to be lower in this population of East African origin than in white subjects. A newly generated composite time-voltage criterion might provide improved performance. The predictive value of ECG criteria for LVH was considerably enhanced by integrating information on concomitant clinical risk factors for LVH.
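The composite criterion is a weighted sum of four ECG amplitudes multiplied by the QRS duration. A minimal sketch; the abstract gives neither the units (amplitudes presumably in mV and duration in ms) nor a decision threshold for the new criterion, so the function only returns the raw score. The Sokolow-Lyon helper uses the commonly cited definition (S in V1 plus the larger R in V5 or V6, positive at ≥3.5 mV), which is not stated in the abstract; both function names are hypothetical.

```python
def weighted_time_voltage(r_avl: float, r_v5: float, s_v3: float, t_v1: float,
                          qrs_duration: float) -> float:
    """Composite score from the abstract:
    (0.2366 R(aVL) + 0.0551 R(V5) + 0.0785 S(V3) + 0.2993 T(V1)) x QRS duration."""
    return (0.2366 * r_avl + 0.0551 * r_v5 + 0.0785 * s_v3 + 0.2993 * t_v1) * qrs_duration


def sokolow_lyon_mv(s_v1: float, r_v5: float, r_v6: float) -> float:
    """Sokolow-Lyon voltage (mV): S in V1 plus the larger R in V5 or V6."""
    return s_v1 + max(r_v5, r_v6)


score = weighted_time_voltage(r_avl=1.0, r_v5=1.0, s_v3=1.0, t_v1=1.0, qrs_duration=100.0)
print(round(score, 2))  # 66.95
print(sokolow_lyon_mv(s_v1=1.5, r_v5=2.0, r_v6=2.2) >= 3.5)  # True
```

Because each criterion is a continuous score, choosing its partition value trades sensitivity against specificity, which is why the study also compares criteria at a fixed specificity of 95%.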

Abstract:

To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Unigrams, bigrams, and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, whereas bigrams and trigrams perform disappointingly, possibly because of feature-space sparsity.
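The chi-squared distance between clause profiles, followed by a power transformation, can be sketched as below. Assumptions: the exact variant of the chi-squared distance (here, the correspondence-analysis form weighted by inverse column masses), the exponent `q`, and the toy counts. The embedding step that would turn the transformed distances into high-dimensional coordinates is not described in the abstract and is omitted.

```python
import math
from typing import List, Sequence


def chi2_distances(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Chi-squared distances between the row profiles of a clause-by-n-gram table.

    d(i, j)^2 = sum_k (n_ik / n_i - n_jk / n_j)^2 / c_k, where c_k is the column
    mass of n-gram k. Every row and every column must have a non-zero total.
    """
    total = sum(map(sum, counts))
    col_mass = [sum(col) / total for col in zip(*counts)]
    profiles = [[v / sum(row) for v in row] for row in counts]
    n = len(counts)
    return [[math.sqrt(sum((a - b) ** 2 / c
                           for a, b, c in zip(profiles[i], profiles[j], col_mass)))
             for j in range(n)] for i in range(n)]


def power_transform(dist: List[List[float]], q: float) -> List[List[float]]:
    """Raise every distance to the power q; q < 1 reduces the contrast
    between large and small distances."""
    return [[d ** q for d in row] for row in dist]


# Three hypothetical clauses described by counts of two POS n-grams.
d = chi2_distances([[2, 0], [0, 2], [1, 1]])
print(d[0][1], d[0][2])  # 2.0 1.0
print(power_transform(d, q=0.5)[0][1])
```

Sparse bigram and trigram tables contain many rare columns with small masses c_k, and dividing by those masses lets a few rare n-grams dominate the distances, which is consistent with the sparsity explanation given above.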

Abstract:

BACKGROUND: The diagnosis of hypertension in children is difficult because of the multiple sex-, age-, and height-specific thresholds used to define elevated blood pressure (BP). The blood pressure-to-height ratio (BPHR) has been proposed to facilitate the identification of elevated BP in children. OBJECTIVE: We assessed the performance of BPHR at a single screening visit in identifying children with hypertension, that is, sustained elevated BP. METHOD: In a school-based study conducted in Switzerland, BP was measured at up to three visits in 5207 children. Children were classified as hypertensive if BP was elevated at all three visits. Sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) for the identification of hypertension were assessed for different BPHR thresholds. The ability of BPHR at a single screening visit to discriminate between children with and without hypertension was evaluated with receiver operating characteristic (ROC) curve analyses. RESULTS: The prevalence of systolic/diastolic hypertension was 2.2%. Systolic BPHR performed better than diastolic BPHR in identifying hypertension (area under the ROC curve: 0.95 vs. 0.84). The highest performance was obtained with a systolic BPHR threshold of 0.80 mmHg/cm (sensitivity: 98%; specificity: 85%; PPV: 12%; NPV: 100%) and a diastolic BPHR threshold of 0.45 mmHg/cm (sensitivity: 79%; specificity: 70%; PPV: 5%; NPV: 99%). The PPV was higher among tall or overweight children. CONCLUSION: BPHR at a single screening visit performed well in identifying hypertension in children, although the low prevalence of hypertension led to a low PPV.
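The low PPV follows from Bayes' rule: at a prevalence of 2.2%, even 85% specificity produces many more false positives than true positives. The sketch below computes BPHR and recovers approximately the reported 12% PPV for the systolic threshold. The function names and the example child are hypothetical, and treating a value equal to the threshold as positive is an assumption.

```python
def bp_to_height_ratio(bp_mmhg: float, height_cm: float) -> float:
    """Blood pressure-to-height ratio (BPHR) in mmHg/cm."""
    return bp_mmhg / height_cm


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: the share of true positives among all positive screens."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical child: systolic BP 120 mmHg at a height of 150 cm -> 0.80 mmHg/cm.
print(bp_to_height_ratio(120, 150) >= 0.80)  # True
# Reported systolic threshold: sensitivity 98%, specificity 85%, prevalence 2.2%.
print(round(ppv_from_prevalence(0.98, 0.85, 0.022), 3))  # 0.128
```

The same formula explains why the PPV rises in subgroups where hypertension is more common, such as tall or overweight children.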

Abstract:

BACKGROUND: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for maintaining stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct post-translational modifications of histones, thereby playing a role in the dynamics of chromatin structure. RESULTS: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters in experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, and D. pseudoobscura, and partially in Anopheles gambiae. CONCLUSION: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting that this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.

Abstract:

Acquiring lexical information is a complex problem, typically approached by relying on a number of contexts that contribute information for classification. One of the first issues to address in this domain is determining such contexts. The work presented here proposes the use of automatically obtained FORMAL role descriptors as features to group nouns from the same lexical semantic class together in an unsupervised clustering task. We have dealt with three lexical semantic classes (HUMAN, LOCATION and EVENT) in English. The results show that it is possible to discriminate between elements of different lexical semantic classes using only FORMAL role information, thus validating our initial hypothesis. Also, iterating our method accurately accounts for fine-grained distinctions within lexical classes, namely distinctions involving ambiguous expressions. Moreover, a filtering and bootstrapping strategy used to extract the FORMAL role descriptors proved to minimize the effects of sparse data and noise in our task.