853 results for height partition clustering


Relevance: 20.00%

Abstract:

Data warehouse populating processes are usually data-oriented workflows composed of dozens of granular tasks responsible for integrating data from different sources. Specific subsets of these tasks can be grouped into a collection, together with their relationships, to form higher-level constructs. Increasing task granularity allows processes to be generalized, simplifying their views and providing methods to carry expertise over to new applications. Well-proven practices can be used to describe general solutions that use basic skeletons configured and instantiated according to a set of specific integration requirements. Patterns can be applied to ETL processes not only to simplify a possible conceptual representation but also to reduce the gap that often exists between two design perspectives. In this paper, we demonstrate the feasibility and effectiveness of an ETL pattern-based approach using task clustering, analyzing a real-world ETL scenario through the definitions of two commonly used clusters of tasks: a data lookup cluster and a data conciliation and integration cluster.

Relevance: 20.00%

Abstract:

When a pregnant woman is referred to a hospital for obstetric care, many outcomes are possible, depending on her current condition. An improved understanding of these conditions could support a more direct medical approach by categorizing the different types of patients, enabling a faster response to risk situations and therefore increasing the quality of services. In this case study, the characteristics of the patients admitted to the maternity care unit of Centro Hospitalar of Porto are examined, allowing the patients to be categorized through clustering techniques. The main goal is to predict the patients' route through maternity care, adapting the services to their conditions and providing the best clinical decisions and cost-effective treatment. The models developed presented very interesting results, with the best clustering evaluation index being 0.65. The evaluation of the clustering algorithms proved the viability of using clustering-based data mining models to characterize pregnant patients, identifying which conditions can be used as an alert to prevent medical complications.

Relevance: 20.00%

Abstract:

Lecture Notes in Computer Science, 9273

Relevance: 20.00%

Abstract:

This paper proposes the establishment of a second diameter measuring standard at 30 cm shoot extension ('diam30') as an input variable for allometric biomass estimation of small and mid-sized plant shoots. This diameter standard is better suited than the diameter at breast height (DBH, i.e. diameter at 1.30 m shoot extension) for adequate characterization of plant dimensions in low bushy vegetation or in primary forest undergrowth. The relationships between both diameter standards are established based on a dataset of 8645 tree, liana and palm shoots in secondary and primary forests of central Amazonia (ranging from 1 to 150 mm DBH). DBH can be predicted from diam30 with high precision: the error introduced by diameter transformation is only 2-3% for trees and palms, and 5% for lianas. This is acceptable for most field study purposes. Relationships deviate slightly from linearity and differ between growth forms. Relationships were markedly similar for different vegetation types (low secondary regrowth vs. primary forests), soils, and selected genera or species. This points to a general validity and applicability of diameter transformations for other field studies. This study provides researchers with a tool for the allometric estimation of biomass in low or structurally heterogeneous vegetation. Rather than applying a uniform diameter standard, the measuring position that best represents the respective plant can be decided shoot by shoot. Plant diameters measured at 30 cm height can be transformed to DBH for subsequent allometric biomass estimation. We recommend the use of these diameter transformations only for plants extending well beyond the theoretical minimum shoot length (i.e., >2 m height). This study also prepares the ground for the comparability and compatibility of future allometric equations specifically developed for small- to mid-sized vegetation components (i.e., bushes, undergrowth) which are based on the diam30 measuring standard.

Relevance: 20.00%

Abstract:

Partition behavior of adenosine and guanine mononucleotides was examined in aqueous dextran-polyethylene glycol (PEG) and PEG-sodium sulfate two-phase systems (ATPS). The partition coefficients for each series of mononucleotides were analyzed as a function of the number of phosphate groups and found to depend on the nature of the nucleic base and on the type of ATPS used. It was concluded that the average contribution of a phosphate group to the logarithm of the partition coefficient of a mononucleotide cannot be used to estimate the difference between the electrostatic properties of the coexisting phases of ATPS. The data obtained in this study were considered together with those for other organic compounds and proteins reported previously, and a linear interrelationship between logarithms of partition coefficients in dextran-PEG, PEG-Na2SO4 and PEG-Na2SO4-0.215 M NaCl (all in 0.01 M Na- or K/Na-phosphate buffer, pH 7.4 or 6.8) was established. A similar relationship was found for the previously reported data for proteins in Dex-PEG, PEG-600-Na2SO4, and PEG-8000-Na2SO4 ATPS. It is suggested that linear relationships of the kind established in ATPS may be observed for biological properties of compounds as well.

Relevance: 20.00%

Abstract:

OBJECTIVE - A population-based prospective study was analysed to: a) determine the prevalence of hypertension; b) investigate the clustering of other cardiovascular risk factors; and c) verify whether older adults differed from younger adults in the pattern of clustering. METHODS - The data comprised a representative sample of the population of Bambuí, Brazil. Multiple logistic regression was used to investigate the independent association between hypertension and selected factors. RESULTS - A total of 820 younger adults (82.5%) and 1494 older adults (85.9%) participated in this study. The overall prevalence of hypertension was 24.8% (SE=1.4%), being higher in women (26.9±1.5%) than in men (22.0±1.7%) (p=0.033). Hypertension was positively and significantly associated with physical inactivity, overweight, hypercholesterolemia, hyperglycemia and hypertriglyceridemia. The coexistence of hypertension with 4 or more of these risk factors occurred 6 times more often than expected by chance, after adjusting for age and sex (OR=6.3; 95% CI: 3.4-11.9). The pattern of risk factor clustering in hypertensive individuals differed with age. CONCLUSION - Our results reinforce the need to increase detection and treatment of hypertension and to address patients' global risk profiles.

Relevance: 20.00%

Abstract:

Data analysis, fuzzy clustering, fuzzy rules, air traffic management

Relevance: 20.00%

Abstract:

Magdeburg, Univ., Faculty of Computer Science, habilitation thesis, 2006

Relevance: 20.00%

Abstract:

Magdeburg, Univ., Faculty of Computer Science, doctoral dissertation, 2014

Relevance: 20.00%

Abstract:

...In this thesis I investigate the "curse of dimensionality" through the notion of distance concentration. I show that this effect can be described in the data model by the pairwise covariance coefficients of the marginal distributions. In addition, I compare 10 prototype-based clustering algorithms using 800,000 clustering results from artificially generated datasets. I explore how and why clustering algorithms are affected by the number of features. Using the clustering results, I also examine how well 5 of the most popular cluster quality measures estimate the actual cluster quality.
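Distance concentration can be illustrated with a small numerical experiment. The sketch below is not taken from the thesis: it assumes independent features drawn uniformly from [0, 1], and the function name `relative_contrast` and its parameters are hypothetical. It measures how the gap between the farthest and nearest neighbour of a query point shrinks, relative to the nearest distance, as the number of features grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points in the unit cube [0, 1]^dim.

    Assumption: features are independent and uniform; this is an
    illustrative data model, not the one analysed in the thesis.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast typically drops sharply as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

With independent features the contrast tends toward zero as the dimension grows, so nearest and farthest neighbours become hard to tell apart. Correlated features would behave differently, which is consistent with the abstract's point that the effect can be described through pairwise covariance coefficients.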

Relevance: 20.00%

Abstract:

The aim of this paper is to suggest a method to find endogenously the points that group the individuals of a given distribution into k clusters, where k is endogenously determined. These points are the cut-points. Thus, we need to determine a partition of the N individuals into a number k of groups, in such a way that individuals in the same group are as alike as possible, but as distinct as possible from individuals in other groups. This method can be applied to endogenously identify k groups in income distributions: possible applications can be poverty
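One way to make the idea of cut-points concrete is a one-dimensional partition computed by dynamic programming. The sketch below is an illustration under stated assumptions and not the paper's method: it takes k as given, whereas the paper determines k endogenously, and it uses the within-group sum of squared deviations as the homogeneity criterion, which the abstract does not specify. The function name `cut_points` is hypothetical.

```python
def cut_points(values, k):
    """Split sorted 1-D values into k contiguous groups that minimise the
    total within-group sum of squared deviations, and return the k - 1
    cut-points (midpoints between neighbouring groups).

    Assumptions: k is supplied by the caller and 1 <= k <= len(values).
    """
    xs = sorted(values)
    n = len(xs)

    # Prefix sums let us compute the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations from the mean for xs[i:j], j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[g][j]: best cost of splitting the first j values into g groups.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + sse(i, j)
                if c < cost[g][j]:
                    cost[g][j] = c
                    back[g][j] = i

    # Walk back through the table to recover the group boundaries.
    bounds = []
    j = n
    for g in range(k, 0, -1):
        i = back[g][j]
        if g > 1:
            bounds.append((xs[i - 1] + xs[i]) / 2)
        j = i
    return sorted(bounds)


if __name__ == "__main__":
    incomes = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # made-up illustrative values
    print(cut_points(incomes, 3))
```

For the made-up values above, the three obvious clumps are separated at 6.5 and 16.0. Choosing k from the data, as the paper proposes, would require an additional selection criterion that this sketch does not attempt.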

Relevance: 20.00%

Abstract:

We give a case-free proof that the lattice of noncrossing partitions associated to any finite real reflection group is EL-shellable. Shellability of these lattices was open for the groups of type Dn and those of exceptional type and rank at least three.

Relevance: 20.00%

Abstract:

The long-term goal of this research is to develop a program able to automatically segment textual sequences and categorize them into discourse types. In this preliminary contribution, we present the construction of an algorithm that takes a segmented text as input and attempts to categorize its sequences as narrative, argumentative, descriptive and so on. This work also aims to investigate a possible convergence between unsupervised statistical learning and the typological approach developed, in particular, in the field of text and discourse analysis in French by Adam (2008) and Bronckart (1997).

Relevance: 20.00%

Abstract:

Report prepared following a stay at the Proteus project at New York University between April and June 2007. Clustering techniques can help reduce supervision in pattern acquisition processes for Information Extraction. However, algorithms suited to documents are needed, and these algorithms require suitable measures of similarity between patterns. Kernels can offer a solution to these problems, but unsupervised learning requires cleverer strategies than supervised learning to incorporate a larger amount of information. In this report, several kernels over patterns are proposed and evaluated. Initially, kernels with a restricted family of patterns are studied, and then kernels already used in supervised Information Extraction tasks are applied. Because clustering performance degrades when irrelevant information is added, the kernels are simplified and strategies are sought to incorporate semantics selectively. Finally, the effect of applying clustering to semantic knowledge as a step prior to pattern clustering is studied. The various strategies are evaluated on document and pattern clustering tasks using real data.

Relevance: 20.00%

Abstract:

This paper develops a methodology to estimate entire population distributions from bin-aggregated sample data. We do this by estimating the parameters of mixtures of distributions that allow for maximal parametric flexibility. The statistical approach we develop enables comparisons of the full distributions of height data from potential army conscripts across France's 88 departments for most of the nineteenth century. These comparisons are made by testing for differences-of-means stochastic dominance. Corrections for possible measurement errors are also devised by taking advantage of the richness of the data sets. Our methodology is of interest to researchers working on historical as well as contemporary bin-aggregated or histogram-type data, which remain widely used because much publicly available information is in that form, often owing to restrictions arising from political sensitivity and/or confidentiality concerns.