808 results for Semi-supervised clustering
Abstract:
HEMOLIA (a project under the European Community's 7th Framework Programme) is a new-generation Anti-Money Laundering (AML) intelligent multi-agent alert and investigation system which, in addition to traditional financial data, makes extensive use of modern society's huge telecom data sources, thereby opening up a new dimension of capabilities to all money-laundering fighters (FIUs, LEAs) and financial institutions (banks, insurance companies, etc.). This Master's thesis project was carried out at AIA, one of the partners in the HEMOLIA project, in Barcelona. The objective of the thesis is to find the clusters in a network built from financial data. An extensive literature survey has been carried out, and several standard network algorithms have been studied and implemented. Clustering is an NP-hard problem, and algorithms such as K-Means and hierarchical clustering are used to study problems in sociology, evolution, anthropology, etc. However, these algorithms have certain drawbacks which make them very difficult to implement. The thesis suggests (a) a possible improvement to the K-Means algorithm, (b) a novel approach to the clustering problem using genetic algorithms and (c) a new algorithm for finding the cluster of a node using a genetic algorithm.
Abstract:
OBJECTIVE: This study assessed clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption) with level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. 18,005 subjects (8052 men and 9953 women) aged 25 years or older participated. RESULTS: Smokers more frequently had low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption than non- and ex-smokers. Frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (≥2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (≥20 cigarettes/day) versus non-smokers. Similar odds ratios were found for women for corresponding groups, i.e., 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and intervention with smokers should take into account the strong clustering of risk behaviors with level of cigarette consumption.
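For readers unfamiliar with how an odds ratio and its 95% confidence interval are formed, the following is a minimal sketch of the unadjusted calculation from a 2x2 table, using the standard Wald interval on the log scale. It is not the study's method: the reported odds ratios are adjusted for age, nationality, and educational level, and the function name `odds_ratio_ci` and the counts in the usage line are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed group, outcome present      b: exposed group, outcome absent
    c: reference group, outcome present    d: reference group, outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only (not taken from the survey).
print(odds_ratio_ci(30, 70, 20, 80))
```

An interval that includes 1, as for the ex-smokers above, means the data are compatible with no association at the 5% level.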
Abstract:
Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all, we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
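The abstract does not define its two metrics, so the following is only a sketch of one plausible reading. It assumes the "compositional algebra" is the usual Aitchison geometry, maps each composition to centred log-ratio (clr) coordinates, and takes a root-mean-square distance between trajectories sampled at the same domain points. The "shape only" variant is an assumption: it removes each trajectory's mean level in clr space before comparing. All function names are hypothetical, and the smoothing step is omitted.

```python
import math

def clr(x):
    """Centred log-ratio transform of one composition (all parts > 0)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def perturb(x, p):
    """Aitchison perturbation: componentwise product, re-closed to sum 1."""
    raw = [a * b for a, b in zip(x, p)]
    total = sum(raw)
    return [v / total for v in raw]

def center(rows):
    """Subtract each component's mean over the domain (removes the level)."""
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def trajectory_distance(traj_a, traj_b, shape_only=False):
    """RMS Aitchison distance between two equally sampled trajectories.

    shape_only=True compares the trajectories after removing their mean
    level in clr coordinates (an assumed definition of 'shape').
    """
    a = [clr(x) for x in traj_a]
    b = [clr(x) for x in traj_b]
    if shape_only:
        a, b = center(a), center(b)
    squared = sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    return math.sqrt(squared / len(a))

# A trajectory and a copy shifted by a constant perturbation: same shape,
# different level.
base = [[0.2, 0.3, 0.5], [0.3, 0.3, 0.4], [0.4, 0.2, 0.4]]
shifted = [perturb(x, [0.5, 0.3, 0.2]) for x in base]
print(trajectory_distance(base, shifted))                   # level + shape
print(trajectory_distance(base, shifted, shape_only=True))  # shape only
```

Either distance can be fed as a dissimilarity matrix to a hierarchical or partitional clustering routine.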
Abstract:
Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distance among points in the data space matters. Taking advantage of this distance property of the domain, we exploit the capability of clustering techniques to partition the data space in order to convert an initial large LA problem into several simpler LA problems. In particular, our motivating problem involves a huge geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem by applying a simulated annealing algorithm to the clustered and non-clustered data in order to work out how profitable the clustering is and which of the presented methods is the most suitable.
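The divide-then-solve idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses plain k-means (Lloyd's algorithm) as the clustering step, a simple p-median cost (each point served by its nearest facility, facilities chosen among the points), and a basic simulated annealing with geometric cooling. All names and parameter values are hypothetical.

```python
import math
import random

def total_cost(points, facilities):
    """Sum of distances from each point to its nearest facility."""
    return sum(min(math.dist(p, f) for f in facilities) for p in points)

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the final groups of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k), key=lambda i: math.dist(pt, centers[i]))
            groups[nearest].append(pt)
        # Recompute centroids; keep the old centre if a group is empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

def anneal(points, p, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing for choosing p facilities among the points."""
    rng = random.Random(seed)
    current = rng.sample(points, p)
    cur_cost = total_cost(points, current)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(p)] = rng.choice(points)  # move one facility
        cost = total_cost(points, cand)
        # Always accept improvements; accept worse moves with prob. e^(-d/t).
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = list(cand), cost
        t *= cooling
    return best, best_cost

def solve_clustered(points, k, p_per_cluster=1):
    """Partition with k-means, anneal each part, merge the facilities."""
    facilities = []
    for group in kmeans(points, k):
        if group:
            found, _ = anneal(group, min(p_per_cluster, len(group)))
            facilities.extend(found)
    return facilities, total_cost(points, facilities)
```

Comparing `solve_clustered(points, k)` with `anneal(points, k)` on the same data gives a rough sense of the trade-off the abstract investigates: the clustered version explores much smaller search spaces, at the price of never placing a facility to serve points across cluster boundaries.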
Abstract:
The goal of this final degree project (PFC) is to develop a procedural façade editing tool that works from an image of a real façade. The application will generate the procedural rules of the façade from data acquired from the model to be represented, such as a photograph. The user of the application will generate the subdivision and repetition rules semi-automatically and interactively, also specifying the insertion of architectural elements (doors, windows), which can be instantiated from a library. Once generated, the rules will be written in the format of the BuildingEngine system so that they integrate fully into the urban modelling process. This project will be developed in Matlab.
Abstract:
This work aims to identify the use, by nursing professionals, of instrumental and/or affective touch and its characteristics in non-verbal communication with patients in the ICU and the surgical semi-intensive care unit of HU-USP, as well as the feelings and perceptions of nursing professionals and patients regarding the touches experienced. The study was carried out with 19 professionals and 19 patients, from October to November 2000, through direct observation of interactions and individual interviews. The reported feelings and perceptions were categorized, and we found that most touches are instrumental-affective.
Abstract:
Given a set of images of scenes containing different object categories (e.g. grass, roads), our objective is to discover these objects in each image and to use these object occurrences to perform scene classification (e.g. beach scene, mountain scene). We achieve this by using a supervised learning algorithm that is able to learn from few images, which eases the user's task. We use a probabilistic model to recognise the objects, and we then classify the scene based on the object occurrences. Experimental results are shown and evaluated to assess the validity of our proposal. Object recognition performance is compared to the approaches of He et al. (2004) and Marti et al. (2001) using their own datasets. Furthermore, an unsupervised method is implemented in order to evaluate the advantages and disadvantages of our supervised classification approach versus an unsupervised one.
Abstract:
The study aimed to characterize medication errors and to evaluate their consequences for patient severity and nursing workload in two Intensive Care Units (ICU) and two Semi-Intensive Care Units of two hospitals in the city of São Paulo. The sample consisted of 50 patients, and the data were obtained retrospectively from occurrence records and medical charts. Severity and nursing workload were assessed before and after the error. Of the total of 52 errors, 12 (23.08%) were due to dose omission, and 11 (21.15%) and 9 (17.31%) were due to wrong medication and wrong dose, respectively. There was no change in patient severity (p=0.316), but nursing workload increased (p=0.009). Regarding the group of medications involved, potentially dangerous versus not potentially dangerous, there were no statistically significant differences in severity (p=0.456) or in nursing workload (p=0.264) after the medication error.
Abstract:
To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
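As an illustration of the distance involved, here is a minimal sketch of chi-squared distances between clause profiles followed by a power transformation. It assumes the correspondence-analysis form of the chi-squared distance (row profiles weighted by inverse column mass) and reads "power transformation" as raising each distance to an exponent `power`. The abstract states neither detail, and the function name and the toy counts are hypothetical.

```python
import math

def chi2_distances(counts, power=1.0):
    """Pairwise chi-squared distances between rows, raised to `power`.

    counts: one row per clause, one column per POS n-gram (raw counts).
    Every row must have a positive total; columns that are empty across
    all clauses are ignored.
    """
    grand_total = sum(map(sum, counts))
    col_mass = [sum(col) / grand_total for col in zip(*counts)]
    profiles = [[v / sum(row) for v in row] for row in counts]
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            squared = sum(
                (a - b) ** 2 / m
                for a, b, m in zip(profiles[i], profiles[j], col_mass)
                if m > 0
            )
            dist[i][j] = dist[j][i] = math.sqrt(squared) ** power
    return dist

# Toy counts: three clauses, three POS unigrams. The first two clauses
# have proportional counts, so their profiles coincide.
counts = [[10, 0, 5], [20, 0, 10], [1, 8, 3]]
print(chi2_distances(counts))
print(chi2_distances(counts, power=0.5))
```

With sparse bi- and trigram counts, many columns carry very little mass, so a few rare n-grams can dominate the distance. This is one way the feature space sparsity mentioned above can show up.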
Abstract:
Studies show that survival after cardiac arrest decreases by 10% for each minute of delay in defibrillation, and that the survival rate is 98% when defibrillation is achieved within 30 seconds. In responding to a cardiac arrest, it is essential that training include the use of semi-automatic external defibrillators (AEDs). The aim of this study was to compare the psychomotor skill and theoretical knowledge of laypeople in the cardiopulmonary resuscitation (CPR) technique using the AED, before and after training. The sample consisted of 40 administrative employees of a public institution who received laboratory training in the CPR technique using the AED. The significant increase in correct answers on the items of the assessment instrument for psychomotor skill and theoretical knowledge after training indicates that the participants' performance improved.
Abstract:
BACKGROUND: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure. RESULTS: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae. CONCLUSION: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.
Abstract:
BACKGROUND: Supervised injection services (SISs) have been developed to promote safer drug injection practices, enhance health-related behaviors among people who inject drugs (PWID), and connect PWID with external health and social services. Nevertheless, SISs have also been accused of fostering drug use and drug trafficking. AIMS: To systematically collect and synthesize the currently available evidence regarding SIS-induced benefits and harm. METHODS: A systematic review was performed via the PubMed, Web of Science, and ScienceDirect databases using the keyword algorithm [("SUPERVISED" OR "SAFER") AND ("INJECTION" OR "INJECTING" OR "SHOOTING" OR "CONSUMPTION") AND ("FACILITY" OR "FACILITIES" OR "ROOM" OR "GALLERY" OR "CENTRE" OR "SITE")]. RESULTS: Seventy-five relevant articles were found. All studies converged to find that SISs were efficacious in attracting the most marginalized PWID, promoting safer injection conditions, enhancing access to primary health care, and reducing the overdose frequency. SISs were not found to increase drug injecting, drug trafficking or crime in the surrounding environments. SISs were found to be associated with reduced levels of public drug injections and dropped syringes. Of the articles, 85% originated from Vancouver or Sydney. CONCLUSION: SISs have largely fulfilled their initial objectives without enhancing drug use or drug trafficking. Almost all of the studies found in this review were performed in Canada or Australia, whereas the majority of SISs are located in Europe. The implementation of new SISs in places with high rates of injection drug use and associated harms appears to be supported by evidence.
Abstract:
Acquiring lexical information is a complex problem, typically approached by relying on a number of contexts to contribute information for classification. One of the first issues to address in this domain is the determination of such contexts. The work presented here proposes the use of automatically obtained FORMAL role descriptors as features used to draw nouns from the same lexical semantic class together in an unsupervised clustering task. We have dealt with three lexical semantic classes (HUMAN, LOCATION and EVENT) in English. The results obtained show that it is possible to discriminate between elements from different lexical semantic classes using only FORMAL role information, hence validating our initial hypothesis. Also, iterating our method accurately accounts for fine-grained distinctions within lexical classes, namely distinctions involving ambiguous expressions. Moreover, a filtering and bootstrapping strategy employed in extracting FORMAL role descriptors proved to minimize effects of sparse data and noise in our task.