875 results for document clustering
Abstract:
The search for patterns in data in order to form groups is known as data clustering, and it is one of the most common tasks in data mining and pattern recognition. This dissertation addresses the concept of entropy and uses algorithms with entropic criteria to cluster biomedical data. Using entropy for clustering is relatively recent; it arises from an attempt to exploit entropy's ability to extract higher-order information from the data distribution, either to use it as the criterion for forming groups (clusters) or to complement and improve existing algorithms in search of better results. Some studies involving algorithms based on entropic criteria have shown positive results in the analysis of real data. In this work, several algorithms based on entropic criteria were explored, along with their applicability to biomedical data, in an attempt to assess how well suited these algorithms are to this type of data. The results of the tested algorithms are compared with those obtained by more "conventional" algorithms such as k-means, spectral clustering algorithms, and a density-based algorithm.
Abstract:
In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method, IADJUST, to correct indices of paired agreement by excluding agreement by chance. This new method overcomes limitations previously noted in the literature, as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the agreement between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets under a range of scenarios, considering diverse numbers of clusters, cluster overlaps and balances, to discuss the pertinence and the precision of our proposal. Precision is established by comparison with the analytical approach to correction; specific indices that can be corrected in this way are used for this purpose. The pertinence of the proposed correction is discussed through a detailed comparison of the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
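To make the idea of chance correction concrete, the sketch below corrects an arbitrary index of paired agreement with the usual form (observed - expected) / (maximum - expected), estimating the expected value by simulation. It is only an illustration under stated assumptions and is not claimed to be the IADJUST procedure: the function names, the use of the Rand index as the example index, and the choice to model chance by randomly permuting one label vector are all assumptions introduced here.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Share of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agreements = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agreements / len(pairs)


def chance_corrected(index, a, b, n_perm=500, seed=0, max_value=1.0):
    """Correct any agreement index using a simulated chance baseline.

    Assumption: "chance" is modelled by randomly permuting the labels of b,
    which keeps the cluster sizes fixed. Other chance models are possible.
    """
    rng = random.Random(seed)
    observed = index(a, b)
    shuffled = list(b)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += index(a, shuffled)
    expected = total / n_perm  # Monte Carlo estimate of agreement by chance
    if expected == max_value:
        return 0.0  # degenerate case: every permutation reaches the maximum
    return (observed - expected) / (max_value - expected)
```

Because `index` is passed in as a function, the same wrapper can be applied to any index of paired agreement that takes two label vectors, which is the property the abstract emphasises.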
Abstract:
In recent years, vehicular cloud computing (VCC) has emerged as a new technology that is being used in a wide range of applications in the area of multimedia-based healthcare. In VCC, vehicles act as intelligent machines that can collect healthcare data and transfer it to local or global sites for storage and computation, as vehicles have comparatively limited storage and computation power for handling multimedia files. However, due to dynamic changes in topology and the lack of centralized monitoring points, this information can be altered or misused. These security breaches can have disastrous consequences, such as loss of life or financial fraud. To address these issues, a learning automata-assisted distributive intrusion detection system is designed based on clustering. Although the proposed scheme can be applied in a number of applications, we have taken a multimedia-based healthcare application to illustrate it. In the proposed scheme, learning automata (LA) are assumed to be stationed on the vehicles; they take clustering decisions intelligently and select one member of the group as cluster-head. The cluster-heads then assist in efficient storage and dissemination of information through a cloud-based infrastructure. To secure the proposed scheme from malicious activities, a standard cryptographic technique is used, in which the automaton learns from the environment and takes adaptive decisions to identify any malicious activity in the network. The stochastic environment in which an automaton performs its actions gives a reward or a penalty, and the automaton updates its action probability vector after receiving this reinforcement signal. The proposed scheme was evaluated using extensive simulations on ns-2 with SUMO. The results obtained indicate that the proposed scheme yields an improvement of 10% in the detection rate of malicious nodes compared with existing schemes.
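The reward-and-penalty update of the action probability vector can be illustrated with a short sketch. The abstract does not state which update rule the authors use, so the code below assumes the textbook linear reward-penalty scheme; the function name and the step sizes `a` and `b` are hypothetical.

```python
def update_probabilities(p, action, rewarded, a=0.1, b=0.1):
    """Linear reward-penalty update of an action probability vector.

    p        : current probabilities, one per action (sums to 1, length >= 2)
    action   : index of the action that was just performed
    rewarded : True for a reward signal, False for a penalty
    a, b     : reward and penalty step sizes (hypothetical values)
    """
    r = len(p)
    if rewarded:
        # Move probability mass towards the rewarded action.
        return [pj + a * (1.0 - pj) if j == action else (1.0 - a) * pj
                for j, pj in enumerate(p)]
    # Move probability mass away from the penalised action,
    # sharing it equally among the other actions.
    return [(1.0 - b) * pj if j == action else b / (r - 1) + (1.0 - b) * pj
            for j, pj in enumerate(p)]
```

Both branches keep the vector summing to 1, so the result can be used directly to sample the automaton's next action.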
Abstract:
The aim of this dissertation was to study a set of companies listed on the Lisbon stock exchange in order to identify those that behave similarly over time. To this end we used clustering algorithms such as K-Means, PAM, hierarchical models, Funny and C-Means, with both the Euclidean distance and the Manhattan distance. To select the best number of clusters identified by each of the tested algorithms, we used several cluster evaluation/validation indices, such as Davies-Bouldin and Calinski-Harabasz, among others.
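As an illustration of how such a validation index scores a partition, here is a minimal sketch of the Davies-Bouldin index in its standard Euclidean form, where lower values indicate more compact and better separated clusters. It is not the dissertation's code: the function name and the toy data are assumptions, and it expects at least two clusters with distinct centroids.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled set of points (Euclidean distance)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    # Centroid of each cluster, computed coordinate by coordinate.
    centroids = {k: tuple(sum(coord) / len(pts) for coord in zip(*pts))
                 for k, pts in clusters.items()}
    # Scatter: mean distance from the cluster's points to its centroid.
    scatter = {k: sum(math.dist(p, centroids[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}

    keys = list(clusters)
    worst_ratios = []
    for i in keys:
        # For each cluster, keep its worst (largest) similarity to another one.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys if j != i))
    return sum(worst_ratios) / len(keys)
```

To choose a number of clusters, one would compute the index for the partition obtained at each candidate value and keep the one with the lowest score.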
Abstract:
The added value that autonomous systems bring to search and rescue scenarios is the possibility of minimizing the presence of humans in dangerous situations and the ability to reach places that are difficult to access. This dissertation proposes new methods for the perception and navigation of unmanned aerial vehicles (UAVs), focusing mainly on trajectory planning and obstacle detection. For perception, a method was developed to generate clusters from the voxels produced by Octomap. For navigation, two new trajectory planning methods were developed, GPRM (Grid Probabilistic Roadmap) and PPRM (Particle Probabilistic Roadmap), both of which build on PRM. The first method, GPRM, spreads the particles over a predefined grid, then builds the roadmap in the area determined by the grid and uses it to estimate the shortest path to the destination point. The second method, PPRM, spreads the particles over the application scenario, generates the roadmap considering the whole map, and assigns a probability that makes it possible to define the optimized trajectory. To analyze the performance of each method in comparison with PRM, they are evaluated in three distinct scenarios using the MORSE simulator.
Abstract:
This study focuses on the implementation of several pair trading strategies across three emerging markets, with the objective of comparing the results obtained from the different strategies and assessing whether pair trading benefits from a more volatile environment. The results show that there are indeed higher potential profits in emerging markets. However, the higher excess return is partially offset by higher transaction costs, which are a determining factor in the profitability of pair trading strategies. A new clustering approach based on Principal Component Analysis was also tested as an alternative to the more standard clustering by Industry Groups. The new clustering approach delivers promising results, consistently reducing volatility to a greater extent than the Industry Group approach, with no significant harm to the excess returns.
Abstract:
Abnormalities in the topology of brain networks may be an important feature and etiological factor for psychogenic non-epileptic seizures (PNES). To explore this possibility, we applied a graph theoretical approach to functional networks based on resting state EEGs from 13 PNES patients and 13 age- and gender-matched controls. The networks were extracted from Laplacian-transformed time-series by a cross-correlation method. PNES patients showed close to normal local and global connectivity and small-world structure, estimated with clustering coefficient, modularity, global efficiency, and small-worldness (SW) metrics, respectively. Yet the number of PNES attacks per month correlated with a weakness of local connectedness and a skewed balance between local and global connectedness quantified with SW, all in EEG alpha band. In beta band, patients demonstrated above-normal resiliency, measured with assortativity coefficient, which also correlated with the frequency of PNES attacks. This interictal EEG phenotype may help improve differentiation between PNES and epilepsy. The results also suggest that local connectivity could be a target for therapeutic interventions in PNES. Selective modulation (strengthening) of local connectivity might improve the skewed balance between local and global connectivity and so prevent PNES events.
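One of the local connectivity measures named above, the clustering coefficient, can be sketched as follows. This is an illustrative simplification and not the study's pipeline: it assumes a binary, undirected graph stored as a dictionary of neighbour sets, whereas networks derived from EEG cross-correlation may be weighted or thresholded in ways the abstract does not specify. The function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to close
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes of the graph."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)
```

Here `adj` maps each node to the set of its neighbours, for example `{0: {1, 2}, 1: {0, 2}, 2: {0, 1}}` for a triangle.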
Abstract:
To assess the effectiveness of a school based physical activity programme during one school year on physical and psychological health in young schoolchildren. Cluster randomised controlled trial. 28 classes from 15 elementary schools in Switzerland randomly selected and assigned in a 4:3 ratio to an intervention (n=16) or control arm (n=12) after stratification for grade (first and fifth grade), from August 2005 to June 2006. 540 children, of whom 502 consented and presented at baseline. Children in the intervention arm (n=297) received a multi-component physical activity programme that included structuring the three existing physical education lessons each week and adding two additional lessons a week, daily short activity breaks, and physical activity homework. Children (n=205) and parents in the control group were not informed of an intervention group. For most outcome measures, the assessors were blinded. Primary outcome measures included body fat (sum of four skinfolds), aerobic fitness (shuttle run test), physical activity (accelerometry), and quality of life (questionnaires). Secondary outcome measures included body mass index and cardiovascular risk score (average z score of waist circumference, mean blood pressure, blood glucose, inverted high density lipoprotein cholesterol, and triglycerides). 498 children completed the baseline and follow-up assessments (mean age 6.9 (SD 0.3) years for first grade, 11.1 (0.5) years for fifth grade). After adjustment for grade, sex, baseline values, and clustering within classes, children in the intervention arm compared with controls showed more negative changes in the z score of the sum of four skinfolds (-0.12, 95 % confidence interval -0.21 to -0.03; P=0.009). 
Likewise, their z scores for aerobic fitness increased more favourably (0.17, 0.01 to 0.32; P=0.04), as did those for moderate-vigorous physical activity in school (1.19, 0.78 to 1.60; P<0.001), all day moderate-vigorous physical activity (0.44, 0.05 to 0.82; P=0.03), and total physical activity in school (0.92, 0.35 to 1.50; P=0.003). Z scores for overall daily physical activity (0.21, -0.21 to 0.63) and physical quality of life (0.42, -1.23 to 2.06) as well as psychological quality of life (0.59, -0.85 to 2.03) did not change significantly. A school based multi-component physical activity intervention including compulsory elements improved physical activity and fitness and reduced adiposity in children. Trial registration Current Controlled Trials ISRCTN15360785.
Abstract:
Summary: 1. Measuring health literacy in Switzerland: a review of six surveys: 1.1 Comparison of questionnaires - 1.2 Measures of health literacy in Switzerland - 1.3 Discussion of Swiss data on HL - 1.4 Description of the six surveys: 1.4.1 Current health trends and health literacy in the Swiss population (gfs-UNIVOX), 1.4.2 Nutrition, physical exercise and body weight: opinions and perceptions of the Swiss population (USI), 1.4.3 Health Literacy in Switzerland (ISPMZ), 1.4.4 Swiss Health Survey (SHS), 1.4.5 Survey of Health, Ageing and Retirement in Europe (SHARE), 1.4.6 Adult literacy and life skills survey (ALL). - 2. Economic costs of low health literacy in Switzerland: a rough calculation. Appendix: Screenshots cost model
Abstract:
We report the characterisation of 27 cardiovascular-related traits in 23 inbred mouse strains. Mice were phenotyped either in response to chronic administration of a single dose of the beta-adrenergic receptor blocker atenolol or under a low and a high dose of the beta-agonist isoproterenol and compared to baseline condition. The robustness of our data is supported by high trait heritabilities (typically H² > 0.7) and significant correlations of trait values measured in baseline condition with independent multistrain datasets of the Mouse Phenome Database. We then focused on the drug-, dose-, and strain-specific responses to beta-stimulation and beta-blockade of a selection of traits including heart rate, systolic blood pressure, cardiac weight indices, ECG parameters and body weight. Because of the wealth of data accumulated, we applied integrative analyses such as comprehensive bi-clustering to investigate the structure of the response across the different phenotypes, strains and experimental conditions. Information extracted from these analyses is discussed in terms of novelty and biological implications. For example, we observe that traits related to ventricular weight in most strains respond only to the high dose of isoproterenol, while heart rate and atrial weight are already affected by the low dose. Finally, we observe little concordance between strain similarity based on the phenotypes and genotypic relatedness computed from genomic SNP profiles. This indicates that cardiovascular phenotypes are unlikely to segregate according to global phylogeny, but are instead governed by smaller, local differences in the genetic architecture of the various strains.
Abstract:
Includes: Variazoni; I puritani