Digital Library

853 results for height partition clustering

Simcluster: clustering enumeration gene expression data on the simplex space

Abstract:

Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data, exhibiting properties particular to the simplex space, where the sum of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
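
To make the contrast between the simplex and Euclidean space concrete, here is a minimal sketch of the centered log-ratio (clr) transform and the Aitchison distance, a standard distance from compositional data analysis. The abstract does not say which distances Simcluster implements, so this is only an illustration; the tag counts and function names are hypothetical, and zero counts (which would break the logarithm) are assumed absent.

```python
import math

def closure(counts):
    """Rescale a vector of positive counts so that its parts sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Hypothetical tag counts for three libraries (illustrative values only).
lib_a = [10, 20, 70]
lib_b = [100, 200, 700]  # same proportions as lib_a, ten times deeper
lib_c = [70, 20, 10]     # different proportions

print(aitchison_distance(lib_a, lib_b))
print(aitchison_distance(lib_a, lib_c))
```

Because only the proportions matter, two libraries with the same relative abundances are at distance zero regardless of sequencing depth, which a Euclidean distance on raw counts would not give.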

Is waist-to-height ratio a useful indicator of cardio-metabolic risk in 6-10-year-old children?

Abstract:

Background: Childhood obesity is a public health problem worldwide. Visceral obesity, particularly associated with cardio-metabolic risk, has been assessed by body mass index (BMI) and waist circumference, but both methods use sex- and age-specific percentile tables and are influenced by sexual maturity. Waist-to-height ratio (WHtR) is easier to obtain, does not involve tables and can be used to diagnose visceral obesity, even in normal-weight individuals. This study aims to compare the WHtR to the 2007 World Health Organization (WHO) reference for BMI in screening for the presence of cardio-metabolic and inflammatory risk factors in 6–10-year-old children. Methods: A cross-sectional study was undertaken with 175 subjects selected from the Reference Center for the Treatment of Children and Adolescents in Campos, Rio de Janeiro, Brazil. The subjects were classified according to the 2007 WHO standard as normal-weight (BMI z score > −1 and < 1) or overweight/obese (BMI z score ≥ 1). Systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting glycemia, low-density lipoprotein (LDL), high-density lipoprotein (HDL), triglyceride (TG), Homeostatic Model Assessment – Insulin Resistance (HOMA-IR), leukocyte count and ultrasensitive C-reactive protein (CRP) were also analyzed. Results: There were significant correlations between WHtR and BMI z score (r = 0.88, p < 0.0001), SBP (r = 0.51, p < 0.0001), DBP (r = 0.49, p < 0.0001), LDL (r = 0.25, p < 0.0008), HDL (r = −0.28, p < 0.0002), TG (r = 0.26, p < 0.0006), HOMA-IR (r = 0.83, p < 0.0001) and CRP (r = 0.51, p < 0.0001). WHtR and BMI areas under the curve were similar for all the cardio-metabolic parameters. A WHtR cut-off value of > 0.47 was sensitive for screening insulin resistance and any one of the cardio-metabolic parameters. Conclusions: The WHtR was as sensitive as the 2007 WHO BMI in screening for metabolic risk factors in 6–10-year-old children. The public health message “keep your waist to less than half your height” can be effective in reducing cardio-metabolic risk because most of these risk factors are already present at a cut point of WHtR ≥ 0.5. However, as this is the first study to correlate the WHtR with inflammatory markers, we recommend further exploration of the use of WHtR in this age group and other population-based samples.
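
The screening rule described in this abstract amounts to a ratio and a threshold. The sketch below uses the cut-off (> 0.47) and the BMI z-score groups quoted in the abstract; the function names and the example measurements are hypothetical, and the label for z scores of −1 or below is an assumption, since the abstract does not name that group.

```python
def waist_to_height_ratio(waist_cm, height_cm):
    """WHtR: waist circumference divided by height, in the same units."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    return waist_cm / height_cm

def screens_positive(waist_cm, height_cm, cutoff=0.47):
    """True when WHtR exceeds the cut-off (> 0.47 in the abstract)."""
    return waist_to_height_ratio(waist_cm, height_cm) > cutoff

def bmi_group(z_score):
    """Groups as defined in the abstract's Methods section."""
    if z_score >= 1:
        return "overweight/obese"
    if z_score > -1:
        return "normal-weight"
    return "outside the groups named in the abstract"  # assumed label

# Hypothetical child: 60 cm waist, 125 cm tall, BMI z score 0.4.
print(waist_to_height_ratio(60, 125), screens_positive(60, 125), bmi_group(0.4))
```

The example shows the case the abstract highlights: a child who is normal-weight by BMI z score can still exceed the WHtR cut-off.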

Functional clustering of time series gene expression data by Granger causality

Abstract:

Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
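
As an illustration of the idea that a cause must precede its consequence, the following sketch runs a textbook pairwise Granger test with a single lag: it asks whether the past of one series improves the prediction of another beyond that series' own past. This is not the set-based method of the paper; the lag order, the synthetic series and all function names are assumptions made for the example.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting; assumes a non-singular system.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )

def granger_f(cause, effect):
    """F statistic for 'cause helps predict effect' with one lag."""
    targets = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)

# Synthetic series: y follows x with a delay of one step, plus small noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0.0, 0.1) for t in range(1, 200)]

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f_xy, f_yx)
```

A large statistic in one direction and a small one in the other gives a directed edge, and the strength of such edges is the kind of quantity the abstract proposes to use as a proximity for clustering.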

Vertical growth of mini watermelon according to the training height and plant density

Abstract:

Watermelon is traditionally cultivated horizontally on the ground. Small-fruit cultivars (1 to 3 kg), which fetch better market prices, are also grown in greenhouses, where the plants are trained upward on vertical supports, with branch pruning and fruit thinning. These practices allow an increase in plant density, fruit quality and yield compared to the traditional growing system. The aim of this experiment was to evaluate the influence of three training heights (1.7, 2.2 and 2.7 m) and two planting densities (3.17 and 4.76 plants m⁻²) on the productive and qualitative characteristics of mini watermelon "Smile" cultivated in a greenhouse. Pruning was done at 43, 55 and 66 days after transplanting (DAT), when plant height reached 1.7, 2.2 and 2.7 m, respectively. The dry mass of branches, petioles and leaves and the total dry mass were affected by training height, with the highest values obtained by the plants pruned at 2.2 and 2.7 m. Leaf area, specific leaf area and leaf area index were not affected by plant height. The training height of 2.7 m raised the total yield; however, marketable yield, average fruit mass and all the quality characteristics did not differ significantly from those obtained with the training height of 2.2 m. Regarding plant density, the best option was 4.76 plants m⁻², because it increased marketable yield by 37.4% without reducing the average fruit weight.

Waist circumference and waist circumference to height ratios of Kaingáng indigenous adolescents from the State of Rio Grande do Sul, Brazil

Abstract:

The aim of this study was to describe the distribution of waist circumference (WC) and WC to height (WCTH) values among Kaingáng indigenous adolescents in order to estimate the prevalence of high WCTH values and evaluate the correlation between WC and WCTH and body mass index (BMI)-for-age. A total of 1,803 indigenous adolescents were evaluated using a school-based cross-sectional study. WCTH values > 0.5 were considered high. Higher mean WC and WCTH values were observed for girls in all age categories. WCTH values > 0.5 were observed in 25.68% of the overall sample of adolescents. Mean WC and WCTH values were significantly higher for adolescents with BMI/age z-scores > 2 than for those with normal z-scores. The correlation coefficients of WC and WCTH for BMI/age were r = 0.68 and 0.76, respectively, for boys, and r = 0.79 and 0.80, respectively, for girls. This study highlights elevated mean WC and WCTH values and high prevalence of abdominal obesity among Kaingáng indigenous adolescents.

Clustering of variables around latent components: an application in consumer science

Abstract:

The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of data structure is very common in consumer studies, where a panel of consumers is asked to assess the global liking of a certain number of products and the preference scores are then arranged in a two-way table Y. External information on both products (physico-chemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z, respectively. The aim of this method is to automatically provide a consumer segmentation in which all three matrices play an active role in the classification, yielding groups that are homogeneous from all points of view: preference, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the products' chemical compounds and sensory evaluation and the consumers' socio-demographic information, purchase behaviour and consumption habits.

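Gestione operativa della logistica distributiva mediante tecniche di clustering e vehicle routing: la piattaforma software LOG-OPTIMIZER [Operational management of distribution logistics using clustering and vehicle routing techniques: the LOG-OPTIMIZER software platform]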

Ranking & Clustering: applicazione a un dataset reale e valutazione dei risultati [Ranking & Clustering: application to a real dataset and evaluation of the results]

Automated clustering in collaborative tagging systems: scopes and method

Detection and Modeling of Boundary Layer Height

Abstract:

This thesis tackles the problem of the automated detection of the atmospheric boundary layer (BL) height, h, from aerosol lidar/ceilometer observations. A new method, the Bayesian Selective Method (BSM), is presented. It implements a Bayesian statistical inference procedure which combines different sources of information in a statistically optimal way. First, atmospheric stratification boundaries are located from discontinuities in the ceilometer back-scattered signal. The BSM then identifies the discontinuity edge that has the highest probability of actually marking the BL height. Information from contemporaneous physical boundary layer model simulations and a climatological dataset of BL height evolution is combined in the assimilation framework to assist this choice. The BSM algorithm has been tested on four months of continuous ceilometer measurements collected during the BASE:ALFA project and is shown to realistically diagnose the BL depth evolution in many different weather conditions. The BASE:ALFA dataset is then used to investigate the boundary layer structure in stable conditions. Functions from the Obukhov similarity theory are used as regression curves to fit observed velocity and temperature profiles in the lower half of the stable boundary layer. Surface fluxes of heat and momentum are the best-fit parameters in this exercise and are compared with those measured by a sonic anemometer. The comparison shows remarkable discrepancies, more evident in cases for which the bulk Richardson number turns out to be quite large. This analysis supports earlier results that surface turbulent fluxes are not the appropriate scaling parameters for profiles of mean quantities in very stable conditions. One of the practical consequences is that boundary layer height diagnostic formulations which rely mainly on surface fluxes disagree with what is obtained by inspecting co-located radiosounding profiles.
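
The two-step logic of the abstract (locate discontinuities, then choose the most probable one using outside information) can be sketched as follows. This is not the BSM itself, whose formulation is not given here: the likelihood proportional to the size of the backscatter drop, the Gaussian prior around a model-predicted height, and the profile values are all assumptions chosen for illustration.

```python
import math

def candidate_edges(heights, backscatter):
    """Return (height, drop) pairs where the signal decreases between levels."""
    edges = []
    for i in range(len(heights) - 1):
        drop = backscatter[i] - backscatter[i + 1]
        if drop > 0:
            midpoint = 0.5 * (heights[i] + heights[i + 1])
            edges.append((midpoint, drop))
    return edges

def select_edge(edges, prior_mean, prior_sd):
    """Pick the edge maximising drop size times an unnormalised Gaussian prior."""
    def score(edge):
        height, drop = edge
        return drop * math.exp(-0.5 * ((height - prior_mean) / prior_sd) ** 2)
    return max(edges, key=score)[0]

# Hypothetical profile (heights in m, backscatter in arbitrary units) with
# two sharp drops: a residual layer near 350 m and another edge near 750 m.
heights = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
signal = [1.0, 1.0, 0.95, 0.5, 0.5, 0.5, 0.48, 0.1, 0.1, 0.1]

edges = candidate_edges(heights, signal)
print(select_edge(edges, prior_mean=800.0, prior_sd=150.0))
print(select_edge(edges, prior_mean=300.0, prior_sd=150.0))
```

With the same observed profile, the selected edge changes with the prior, which is the role the abstract assigns to the model simulations and the climatological dataset.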

Progettazione e sviluppo di una versione distribuita di un algoritmo di subspace clustering [Design and development of a distributed version of a subspace clustering algorithm]

Abstract:

The task of data mining aims at the automatic extraction of meaningful patterns from large quantities of data. One example of the patterns that can be sought is meaningful groupings of the data, in which case we speak of clustering. Traditional clustering algorithms show severe limitations on high-dimensional datasets, that is, datasets composed of objects described by a large number of attributes. For datasets of this kind a different methodology of analysis must therefore be adopted: subspace clustering. Subspace clustering consists of visiting the lattice of all possible subspaces in search of significant groups (clusters). A search of this kind is a particularly expensive operation from a computational point of view. Several optimizations have been proposed to make subspace clustering algorithms more efficient. This thesis addresses the problem from a different point of view: the use of parallelization to reduce the computational cost of a subspace clustering algorithm.
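
The cost of visiting the lattice of subspaces comes from its size: d attributes give 2^d − 1 non-empty subspaces. A minimal sketch of a bottom-up enumeration, with hypothetical attribute names and without any of the pruning or parallelization the thesis is concerned with:

```python
from itertools import combinations

def subspaces(attributes):
    """Yield every non-empty subset of the attributes, smallest first."""
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)

# Hypothetical four-attribute dataset.
attrs = ["age", "income", "height", "weight"]
lattice = list(subspaces(attrs))
print(len(lattice))  # 2**4 - 1 subspaces
print(lattice[:6])
```

Since each subspace can be examined independently of the others, distributing the lattice levels across workers is one natural way to reduce wall-clock time, though the abstract does not say how the thesis partitions the work.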

Clustering di ammassi di galassie con cataloghi otticamente selezionati [Clustering of galaxy clusters with optically selected catalogues]

Abstract:

This thesis studies the clustering of galaxy clusters and the determination of the position of the BAO peak in order to obtain constraints on cosmological parameters. To this end, a code was implemented to estimate the error using the jackknife and bootstrap methods. The measurement of the BAO peak, compared with cosmological models, was found, thanks to the very small estimated error, to be in agreement with the LambdaCDM model, and makes it possible to constrain some parameters of the cosmological models.
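
The two resampling error estimates named in this abstract can be sketched generically. In clustering analyses they are usually applied to sub-regions of the survey; the version below resamples a plain list of measurements instead, and the sample values and function names are hypothetical.

```python
import math
import random

def jackknife_error(sample, statistic):
    """Delete-one jackknife standard error of a statistic."""
    n = len(sample)
    estimates = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean_est = sum(estimates) / n
    return math.sqrt((n - 1) / n * sum((e - mean_est) ** 2 for e in estimates))

def bootstrap_error(sample, statistic, n_resamples=1000, seed=0):
    """Standard deviation of the statistic over resamples with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = [
        statistic([rng.choice(sample) for _ in range(n)])
        for _ in range(n_resamples)
    ]
    mean_est = sum(estimates) / n_resamples
    return math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n_resamples - 1))

def mean(values):
    return sum(values) / len(values)

# Hypothetical measurements.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
jk = jackknife_error(data, mean)
bs = bootstrap_error(data, mean)
print(jk, bs)
```

For the sample mean, the jackknife error coincides with the usual standard error s/√n, which makes a convenient sanity check before applying the same functions to a less tractable statistic.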

A clustering method for robust and reliable large scale functional and structural protein sequence annotation

Abstract:

Bioinformatics has, in the last few decades, played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the problem of knowing as much as possible about its coding regions becomes crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.

Sviluppo di algoritmi di grid data clustering basati su metodi statistici [Development of grid data clustering algorithms based on statistical methods]

Abstract:

The purpose of clustering is to identify meaningful structures in the data, and this thesis started from precisely that definition, offering an innovative and unexplored approach to clustering: rather than searching for the relation, reasoning about what is not one. Looking at a set of data, what does non-relation represent? It is a difficult question to pose, but one that intrinsically carries its own answer: the independence of each single datum from all the others. The search for independence among the data thus led our thinking to the statistical approach, since independence is well described and proven in statistics. For a point in a dataset to be considered "free of links/relations" means that it has the same probability of being present in every spatial element of the entire dataset. Mathematically speaking, every point P in a space S has the same probability of falling in a region R, which means that such a point can RANDOMLY lie inside any region of the dataset. The thesis work starts from this assumption and is divided into several parts. The second chapter analyses the state of the art of clustering in relation to the growing problem of data volume: with the spread of the network, knowledge bases have grown exponentially both in terms of attributes (dimensions) and in terms of quantity of data (Big Data). The third chapter recalls the theoretical and statistical concepts used by the statistical algorithms implemented. The fourth chapter gives the implementation details of the algorithms, describing the various phases of investigation, the motivations for the architectural choices and the considerations that led to the exclusion of one of the 3 versions implemented. In the fifth chapter, algorithms 2 and 3 are compared with some algorithms from the literature, to show the potential and the problems of the algorithm developed. These tests are qualitative, since the aim of the thesis is to show how a statistical approach can prove to be a winning weapon, not to provide a new algorithm usable in the various clustering problems. In the sixth chapter, conclusions are drawn on the work done and possible future developments are listed from which the newly begun research on statistical clustering could grow.
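
One way to turn the independence assumption of this abstract into a computation is to overlay a grid on the data and compare the cell counts with what a uniform scatter would produce. The sketch below uses a Pearson chi-square statistic for that comparison; the abstract does not say which statistics the thesis algorithms use, so the statistic, the unit-square domain and the function names are assumptions.

```python
def grid_counts(points, cells_per_axis):
    """Count points of the unit square falling in each cell of a regular grid."""
    counts = [[0] * cells_per_axis for _ in range(cells_per_axis)]
    for x, y in points:
        i = min(int(x * cells_per_axis), cells_per_axis - 1)
        j = min(int(y * cells_per_axis), cells_per_axis - 1)
        counts[i][j] += 1
    return counts

def chi_square_uniform(counts):
    """Pearson statistic against equal expected counts in every cell."""
    flat = [c for row in counts for c in row]
    expected = sum(flat) / len(flat)
    return sum((c - expected) ** 2 / expected for c in flat)

# Hypothetical data: 16 evenly spread points versus 16 coincident points.
spread = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
clumped = [(0.1, 0.1)] * 16

print(chi_square_uniform(grid_counts(spread, 4)))
print(chi_square_uniform(grid_counts(clumped, 4)))
```

A statistic near zero is compatible with points falling at random in any region, while a large value signals cells that hold far more points than independence would allow, which is where a grid-based method would look for clusters.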

Do centimetres matter? Self-reported versus estimated height measurements in parents

Abstract:

An impressive discrepancy between reported and measured parental height is often observed. The aims of this study were: (a) to assess whether there is a significant difference between the reported and measured parental height; (b) to focus on the reported and, thereafter, measured height of the partner; (c) to analyse its impact on the calculated target height range.
