990 results for Evidence accumulation clustering
Abstract:
Medium voltage (MV) load diagrams were defined using a knowledge discovery in databases process. Clustering techniques were used to help agents in electric power retail markets obtain specific knowledge of their customers’ consumption habits. Each customer class resulting from the clustering operation is represented by its load diagram. The Two-step clustering algorithm and the WEACS approach, which is based on evidence accumulation clustering (EAC), were applied to electricity consumption data from a utility client’s database in order to form the customer classes and to find a set of representative consumption patterns. The WEACS approach is a clustering ensemble combination approach that uses subsampling and weights the partitions differently in the co-association matrix. As a complementary step to the WEACS approach, all the final data partitions produced by the different variations of the method are combined, and the Ward Link algorithm is used to obtain the final data partition. Experimental results showed that the WEACS approach led to better accuracy than many other clustering approaches. In this paper, the WEACS approach separates the customer population better than the Two-step clustering algorithm.
Abstract:
With the liberalization of the electricity market, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity consumers. A fair insight into consumers’ behavior will permit the definition of specific contract aspects based on the different consumption patterns. In order to form the different consumer classes and find a set of representative consumption patterns, we use electricity consumption data from a utility client’s database and two approaches: the Two-step clustering algorithm and the WEACS approach, which is based on evidence accumulation clustering (EAC) for combining partitions in a clustering ensemble. EAC uses a voting mechanism to produce a co-association matrix from the pairwise associations obtained from N partitions, with each partition having equal weight in the combination process. The WEACS approach, in contrast, uses subsampling and weights the partitions differently. As a complementary step to the WEACS approach, we combine the partitions obtained in the WEACS approach with the ALL clustering ensemble construction method and use the Ward Link algorithm to obtain the final data partition. The resulting consumer clusters were characterized using the C5.0 classification algorithm. Experimental results showed that the WEACS approach leads to better results than many other clustering approaches.
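The voting mechanism described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `co_association` and the optional `weights` argument are assumptions, and the weighting here is a generic per-partition weight rather than the specific WEACS weighting scheme. Subsampling is not modelled; every partition is assumed to label every object.

```python
import numpy as np

def co_association(partitions, weights=None):
    """Accumulate pairwise co-occurrence votes from a list of label vectors.

    partitions: sequence of N label vectors, each of length n (one label per object).
    weights:    optional per-partition weights (hypothetical; plain EAC uses equal weights).
    """
    partitions = np.asarray(partitions)          # shape (N, n)
    n_parts, n_objects = partitions.shape
    if weights is None:
        weights = np.ones(n_parts)               # plain EAC: each partition has equal weight
    weights = np.asarray(weights, dtype=float)

    C = np.zeros((n_objects, n_objects))
    for labels, w in zip(partitions, weights):
        # Entry (i, j) is 1 when objects i and j share a cluster in this partition.
        C += w * (labels[:, None] == labels[None, :])
    return C / weights.sum()                     # normalise votes to [0, 1]

# Three toy partitions of four objects.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
C = co_association(partitions)
C_weighted = co_association(partitions, weights=[2, 1, 1])
```

Because only co-membership is compared, the arbitrary label names in each partition do not matter. A hierarchical method such as Ward linkage could then be applied to a dissimilarity derived from `C`, for example `1 - C`.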
Abstract:
The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in the same cluster, and it uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix. It is based on a probabilistic model for the co-association matrix that is parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.
Abstract:
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
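As a rough illustration of the objective described in this abstract, the sketch below evaluates a Bregman divergence between observed co-association frequencies and the co-occurrence probabilities implied by soft assignments. It is a sketch under stated assumptions, not the paper's algorithm: the names `bregman` and `consensus_cost` are hypothetical, the co-occurrence probability of objects i and j is assumed to be the inner product of their assignment vectors, the squared-Euclidean generator is chosen only for simplicity, and all pairs (including i = j) are summed. No optimization is performed.

```python
import numpy as np

def bregman(x, y, phi, dphi):
    """Elementwise Bregman divergence d_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)

def consensus_cost(C, Y):
    """Cost of soft assignments Y (K clusters x n objects, columns on the simplex)
    against observed co-association frequencies C (n x n)."""
    P = Y.T @ Y                                   # assumed co-occurrence probabilities
    # phi(t) = t^2 yields the squared difference (x - y)^2.
    d = bregman(C, P, lambda t: t ** 2, lambda t: 2 * t)
    return d.sum()

# Two clusters, four objects, crisp assignments written as probabilities.
Y = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

C_clean = Y.T @ Y                                 # frequencies that match Y exactly
C_noisy = C_clean.copy()
C_noisy[0, 2] = C_noisy[2, 0] = 0.5               # one ambiguous pair

cost_clean = consensus_cost(C_clean, Y)
cost_noisy = consensus_cost(C_noisy, Y)
```

Swapping in another convex generator `phi` with its derivative `dphi` gives a different divergence without changing the rest of the code.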
Abstract:
Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of a NBS-profiling technique was efficient to develop retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. 
Interestingly, using our genetic data it was possible to calculate the number of copies of the retrotransposon scIvana_1 in the sugarcane genome (~60), confirming previously reported molecular results. In addition, this research may have indirect implications for crop economics, e.g., productivity enhancement via QTL studies, as the mapping population parents differ in their response to an important fungal disease.
Abstract:
Biosignal analysis has become widespread, extending beyond its typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring as a diagnostic tool in today's medicine and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insight into the data. Preliminary results are promising: meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
Abstract:
SUMMARY There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.
Abstract:
The aetiology of childhood cancers remains largely unknown. It has been hypothesized that infections may be involved and that mini-epidemics thereof could result in space-time clustering of incident cases. Most previous studies support spatio-temporal clustering for leukaemia, while results for other diagnostic groups remain mixed. Few studies have corrected for uneven regional population shifts which can lead to spurious detection of clustering. We examined whether there is space-time clustering of childhood cancers in Switzerland identifying cases diagnosed at age <16 years between 1985 and 2010 from the Swiss Childhood Cancer Registry. Knox tests were performed on geocoded residence at birth and diagnosis separately for leukaemia, acute lymphoid leukaemia (ALL), lymphomas, tumours of the central nervous system, neuroblastomas and soft tissue sarcomas. We used Baker's Max statistic to correct for multiple testing and randomly sampled time-, sex- and age-matched controls from the resident population to correct for uneven regional population shifts. We observed space-time clustering of childhood leukaemia at birth (Baker's Max p = 0.045) but not at diagnosis (p = 0.98). Clustering was strongest for a spatial lag of <1 km and a temporal lag of <2 years (Observed/expected close pairs: 124/98; p Knox test = 0.003). A similar clustering pattern was observed for ALL though overall evidence was weaker (Baker's Max p = 0.13). Little evidence of clustering was found for other diagnostic groups (p > 0.2). Our study suggests that childhood leukaemia tends to cluster in space-time due to an etiologic factor present in early life.
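The core of a Knox test, counting pairs of cases that are close in both space and time and comparing that count with a permutation null, can be sketched as below. This is a simplified illustration and not the study's procedure: the function names, the toy coordinates and the thresholds are all assumptions, and the sketch does not include Baker's Max correction or the matched-control sampling the abstract describes for uneven population shifts.

```python
import numpy as np

def knox_statistic(xy, t, max_dist, max_lag):
    """Count case pairs closer than max_dist in space and max_lag in time."""
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lag = np.abs(t[:, None] - t[None, :])
    close = (dist < max_dist) & (lag < max_lag)
    upper = np.triu_indices(len(t), k=1)          # each unordered pair once
    return int(close[upper].sum())

def knox_permutation_p(xy, t, max_dist, max_lag, n_perm=999, seed=0):
    """Permutation p-value: shuffle times across locations to break space-time links."""
    rng = np.random.default_rng(seed)
    observed = knox_statistic(xy, t, max_dist, max_lag)
    at_least = sum(
        knox_statistic(xy, rng.permutation(t), max_dist, max_lag) >= observed
        for _ in range(n_perm)
    )
    return observed, (at_least + 1) / (n_perm + 1)

# Toy data: coordinates in km, times in years (illustrative values only).
xy = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 10.0)]
t = [0.0, 1.0, 0.5, 5.0]
observed, p_value = knox_permutation_p(xy, t, max_dist=1.0, max_lag=2.0)
```

In the toy data only the first two cases are close in both dimensions, so the observed count is 1. With so few cases the p-value is not meaningful; the example only shows the mechanics.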
Abstract:
In patients with Pick's disease (PD), high densities of tau-positive Pick bodies (PB) have been observed within the granule cell layer of the dentate gyrus. This study investigated the spatial patterns of PB along the granule cell layer in coronal sections of the hippocampus in eight patients with PD. In all patients, there was evidence of clustering of PB within the granule cell layer; however, there was considerable variation in the pattern of clustering. In five patients, the clusters of PB were regularly distributed along the dentate gyrus, and in two of these patients, the smaller clusters were aggregated into larger superclusters. In three patients, a single large cluster of PB, more than 1200 μm in diameter, was present. Clustering of PB may reflect a primary degenerative process within the granule cells or the degeneration of pathways that project to the dentate gyrus.
Abstract:
Liver samples from rabbits killed by RHDV, collected from five States in Australia in 1996 and 1997 were analysed by RT-PCR. A 398 bp fragment of the capsid protein (VP60) gene was amplified by PCR and directly sequenced. The alignment of the nucleotide and amino acid sequences and their comparison with the original strain of the virus released in Australia indicated genetic changes after two years have been small with 98.2% to 100% identity. The constructed phylogenetic tree suggests slight differences in nucleotide substitutions in various States but there is no clear evidence of clustering of sequences according to their geographic origin. In practical terms, sequencing of viral RNA provides a means of testing the efficacy of further releases and subsequent spread of the virus if such a strategy is employed as a means of enhancing RHD as a biological control of the wild rabbit in Australia.
Abstract:
BACKGROUND: Recent neuroimaging studies suggest that value-based decision-making may rely on mechanisms of evidence accumulation. However no studies have explicitly investigated the time when single decisions are taken based on such an accumulation process. NEW METHOD: Here, we outline a novel electroencephalography (EEG) decoding technique which is based on accumulating the probability of appearance of prototypical voltage topographies and can be used for predicting subjects' decisions. We use this approach for studying the time-course of single decisions, during a task where subjects were asked to compare reward vs. loss points for accepting or rejecting offers. RESULTS: We show that based on this new method, we can accurately decode decisions for the majority of the subjects. The typical time-period for accurate decoding was modulated by task difficulty on a trial-by-trial basis. Typical latencies of when decisions are made were detected at ∼500ms for 'easy' vs. ∼700ms for 'hard' decisions, well before subjects' response (∼340ms). Importantly, this decision time correlated with the drift rates of a diffusion model, evaluated independently at the behavioral level. COMPARISON WITH EXISTING METHOD(S): We compare the performance of our algorithm with logistic regression and support vector machine and show that we obtain significant results for a higher number of subjects than with these two approaches. We also carry out analyses at the average event-related potential level, for comparison with previous studies on decision-making. CONCLUSIONS: We present a novel approach for studying the timing of value-based decision-making, by accumulating patterns of topographic EEG activity at single-trial level.
Abstract:
A variety of models of decision-making in diverse contexts assume that subjects accumulate sensory evidence, constantly sampling and integrating signals for and against alternative hypotheses. Integration continues until the evidence in favor of one of the hypotheses exceeds a decision criterion threshold (the level of evidence required to make a decision). Newer models suggest that this decision process is instead dynamic: its parameters can vary between trials and even within a trial, rather than being a static process whose parameters change only between blocks of trials. This doctoral project aims to demonstrate that decisions about reaching movements involve a mechanism of temporal accumulation of sensory information leading to a decision threshold. To this end, we developed a decision-making paradigm based on an ambiguous stimulus in order to test whether neurons in the primary motor cortex (M1), dorsal premotor cortex (PMd) and prefrontal cortex (DLPFc) show neural correlates of this temporal accumulation process. We first tested different versions of the task with human subjects in order to develop a task that elicits the behavior needed to test the working hypothesis. Behavioral data from humans and monkeys show that reaction times and error rates increase systematically as stimulus ambiguity increases. These results are consistent with the predictions of diffusion models, as confirmed by computational modeling of the data. We then recorded cells in M1, PMd and DLPFc of two monkeys while they performed the task. 
M1 neurons do not appear to be influenced by stimulus ambiguity but instead discharge in correlation with the executed movement. PMd neurons encode the direction of the movement chosen by the monkeys fairly soon after stimulus presentation. Moreover, the activation of many PMd cells is slower when stimulus ambiguity increases and takes longer to signal the movement direction. PMd neuronal activity reflects the animal's choice, regardless of whether it is a correct response or an error. This supports a role for PMd in decision-making about reaching movements. Finally, we began recordings in the prefrontal cortex, and the results presented are preliminary. DLPFc neurons appear to be much more influenced by combinations of color and spatial position factors than PMd neurons. We conclude that PMd is involved in evaluating the evidence for or against the spatial position of different potential targets, but fairly independently of their color. DLPFc would instead be responsible for processing the information on the combination of color and position of the spatial targets and of the ambiguous stimulus, which is needed to link the ambiguous stimulus to the corresponding target.
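The accumulation-to-threshold idea in this abstract can be illustrated with a basic drift-diffusion simulation. This is a generic textbook-style sketch, not the model fitted in the thesis: the function name `simulate_ddm` and all parameter values are assumptions, and stimulus ambiguity is represented simply as a lower drift rate.

```python
import numpy as np

def simulate_ddm(drift, threshold=1.0, noise=1.0, dt=0.005,
                 max_time=10.0, n_trials=500, seed=0):
    """Simulate evidence accumulation to symmetric bounds at +/- threshold.

    Returns decision times and a boolean 'correct' flag (upper bound reached)
    for the trials that reached a bound within max_time.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(max_time / dt)
    # Each increment is a deterministic drift plus Gaussian noise.
    steps = drift * dt + noise * np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
    paths = np.cumsum(steps, axis=1)

    crossed = np.abs(paths) >= threshold
    hit = crossed.any(axis=1)                     # trials that reached a bound
    first = crossed.argmax(axis=1)                # index of the first crossing
    decision_time = (first + 1) * dt
    correct = paths[np.arange(n_trials), first] > 0
    return decision_time[hit], correct[hit]

# A clear stimulus (high drift) versus an ambiguous one (low drift).
rt_easy, correct_easy = simulate_ddm(drift=2.0)
rt_hard, correct_hard = simulate_ddm(drift=0.5)

mean_rt_easy, mean_rt_hard = rt_easy.mean(), rt_hard.mean()
err_easy, err_hard = 1 - correct_easy.mean(), 1 - correct_hard.mean()
```

Under these assumptions, lowering the drift rate lengthens the mean decision time and raises the error rate, which is the qualitative pattern the abstract reports for increasing stimulus ambiguity.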
Abstract:
This paper assesses the relationship between amount of climate forcing, as indexed by global mean temperature change, and hydrological response in a sample of UK catchments. It constructs climate scenarios representing different changes in global mean temperature from an ensemble of 21 climate models assessed in the IPCC AR4. The results show a considerable range in impact between the 21 climate models, with, for example, change in summer runoff at a 2°C increase in global mean temperature varying between -40% and +20%. There is evidence of clustering in the results, particularly in projected changes in summer runoff and indicators of low flows, implying that the ensemble mean is not an appropriate generalised indicator of impact, and that the standard deviation of responses does not adequately characterise uncertainty. The uncertainty in hydrological impact is therefore best characterised by considering the shape of the distribution of responses across multiple climate scenarios. For some climate model patterns, and some catchments, there is also evidence that linear climate change forcings produce non-linear hydrological impacts. For most variables and catchments, the effects of climate change are apparent above the effects of natural multi-decadal variability with an increase in global mean temperature above 1°C, but there are differences between catchments. Based on the scenarios represented in the ensemble, the effect of climate change in northern upland catchments will be seen soonest in indicators of high flows, but in southern catchments effects will be apparent soonest in measures of summer and low flows. The uncertainty in response between different climate model patterns is considerably greater than the range due to uncertainty in hydrological model parameterisation.
Abstract:
A total of 1,410 individuals were interviewed by telephone, forming a random and representative sample of the population over 18 years of age living in households connected to the fixed-line telephone network. The prevalence of smoking was 21.8%, and it was higher in men (25%) and in individuals aged 18 to 29 years. Smoking and physical inactivity occurred together in 13.9% of men and 14.2% of women; smoking and low fruit consumption in 12.9% of men and 12.3% of women; and smoking and low vegetable consumption in 5.8% of men and 5.1% of women. The association of smoking with excessive alcohol consumption was observed only in men (in 3.5% of them) and, as was found for smoking alone, its co-occurrence with other behavioral risk factors for chronic non-communicable diseases and conditions (DANT) was inversely associated with schooling. The data point to evidence of a clustering effect between smoking and physical inactivity, smoking and excessive alcohol consumption, and smoking and inadequate diet, justifying interventions focused on the concurrent prevention and reduction of the main behavioral risk factors for DANT.
Abstract:
This article presents an exercise in meta-comprehension of what has been researched on teaching probability and statistics in Brazil. This research was based on the work on this subject presented at the third International Symposium for Research in Mathematics Education (III SIPEM). Articles were selected from the proceedings of the event and analyzed hermeneutically according to the procedures of phenomenology. We observed no evidence of clustering of research on this topic in terms of region or institution, and we also emphasize that research on the teaching of probability and statistics needs to advance toward a theoretical discussion that transcends the subjects being studied and makes broader and deeper links between theory and practice. Findings also indicate that this sub-area of research in mathematics education is in the process of constituting itself.