18 results for text mining clusterizzazione clustering auto-organizzazione conoscenza MoK

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance:

100.00%

Abstract:

One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to the initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms of interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.
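The systematic (repetitive) baseline described in this abstract can be sketched as follows. This is only an illustration, not the authors' algorithm: the helper names (kmeans, simplified_silhouette, systematic_kmeans), the use of a simplified silhouette to compare partitions, and the toy data are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd-style k-means; returns the final list of prototypes."""
    centroids = rng.sample(points, k)  # random initial prototypes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def simplified_silhouette(points, centroids):
    """Assumed quality criterion: mean of (b - a) / b, where a and b are the
    distances from a point to its nearest and second-nearest prototype."""
    total = 0.0
    for p in points:
        d = sorted(sq_dist(p, c) ** 0.5 for c in centroids)
        a, b = d[0], d[1]
        total += (b - a) / b if b > 0 else 0.0
    return total / len(points)


def systematic_kmeans(points, k_values, restarts, seed=0):
    """Run k-means for every k and several random starts; keep the best run."""
    rng = random.Random(seed)
    best = None
    for k in k_values:
        for _ in range(restarts):
            centroids = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids)
            if best is None or score > best[0]:
                best = (score, k, centroids)
    return best


# Toy data (hypothetical): three well-separated groups of five points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
centers = [(0, 0), (10, 0), (5, 9)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
best_score, best_k, best_centroids = systematic_kmeans(points, range(2, 6), restarts=30)
print(best_k, round(best_score, 3))
```

The cost of this baseline grows with the number of candidate values of k multiplied by the number of restarts, which is the overhead the abstract says an evolutionary search can reduce.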

Relevance:

100.00%

Abstract:

The amount of textual information digitally stored is growing every day. However, our capability of processing and analyzing that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes to extract relevant knowledge from textual information, such as the text mining process. One of the main and most expensive stages of the text mining process is the text pre-processing stage, where the unstructured text should be transformed into a structured format such as an attribute-value table. The stemming process, i.e., linguistic normalization, is usually used to find the attributes of this table. However, the stemming process is strongly dependent on the language in which the original textual information is given. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements to the well-known Porter stemming algorithm for the Portuguese language, which exploit the characteristics of this language, are proposed. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
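As a rough illustration of the pre-processing stage described above, the sketch below turns raw documents into an attribute-value table whose attributes are stems. The stemmer is a deliberately naive suffix stripper: it is neither the Porter algorithm nor the improved version proposed in the work, and the suffix list and function names are hypothetical.

```python
from collections import Counter

# Illustrative suffix list only (longest first); a real Portuguese stemmer
# uses far richer, rule-based suffix handling.
SUFFIXES = ("mente", "ções", "ando", "endo", "ção", "as", "os", "es", "a", "o", "s")


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def attribute_value_table(documents):
    """Return (attributes, table): one row per document, one column per stem,
    each cell holding how often that stem occurs in that document."""
    rows = [Counter(toy_stem(w) for w in doc.lower().split()) for doc in documents]
    attributes = sorted(set().union(*rows))
    table = [[row[a] for a in attributes] for row in rows]
    return attributes, table


attributes, table = attribute_value_table(["gatos gato", "gata correndo"])
print(attributes)
print(table)
```

Because inflected forms collapse onto a single stem, the table has fewer columns than a table built on raw words, which is why the choice of stemmer affects both the attributes and the running time of this stage.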

Relevance:

20.00%

Abstract:

OBJECTIVE: To validate a self-efficacy scale for adherence to antiretroviral treatment in children and adolescents with HIV/AIDS, taking into account the perspective of parents/guardians, and to assess its reproducibility. METHODS: The study was conducted at the Day Hospital of the Centro de Referência e Treinamento em DST/AIDS in São Paulo. The parents/guardians of 54 children and adolescents aged 6 months to 20 years who attended routine appointments at the service were interviewed. Self-efficacy data were collected using the self-efficacy scale for following antiretroviral prescription (AE), which was calculated in two ways: by factor analysis and by a previously defined formula. The internal consistency of the scale was verified with Cronbach's α coefficient. Validity was assessed by comparing mean scores between groups of patients adherent and non-adherent to antiretroviral treatment (Mann-Whitney test) and by calculating Spearman's correlation coefficient between the scores and clinical parameters. Reproducibility was verified using the Wilcoxon test, the intraclass correlation coefficient (ICC), and the Bland-Altman plot. RESULTS: The AE scale showed good internal consistency (α = 0.87) and good reproducibility (ICC = 0.69 and ICC = 0.75). Regarding validity, the AE scale was able to discriminate between patients adherent and non-adherent to antiretroviral treatment (p = 0.002) and showed a significant correlation with CD4 count (r = 0.28; p = 0.04). CONCLUSIONS: The AE scale can be used to assess adherence to antiretroviral therapy in children and adolescents with HIV/AIDS, taking into account the perspective of parents/caregivers.
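For readers unfamiliar with the internal-consistency measure cited in this abstract, the sketch below computes Cronbach's alpha from the standard textbook formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name and the toy scores are assumptions for illustration; they are not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per scale item, each with one score per respondent.
    Uses sample variances; needs at least two items and non-constant totals."""
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-respondent totals
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical answers of three respondents to two perfectly consistent items.
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
print(alpha)
```

Values close to 1 indicate that the items move together across respondents, which is the sense in which the abstract reports good internal consistency.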

Relevance:

20.00%

Abstract:

OBJECTIVE: To compare the estimates obtained by different survey modalities for self-reported chronic conditions in adults living in Campinas (SP) in 2008. METHODS: Data were used from ISACamp, a household survey conducted by the Faculdade de Ciências Médicas of the Universidade Estadual de Campinas with support from the Municipal Health Department, and from VIGITEL - Campinas (SP), a telephone survey conducted by the Ministry of Health for the Surveillance of Risk and Protective Factors for Chronic Diseases in the adult population (18 years or older). Estimates of self-reported arterial hypertension, diabetes, osteoporosis, and asthma/bronchitis/emphysema were evaluated and compared using Student's t-test for two independent samples. RESULTS: For the overall estimates, higher prevalences of arterial hypertension and osteoporosis were found by the telephone survey. Diabetes and asthma/bronchitis/emphysema showed no statistically significant differences. In the analysis by sociodemographic variables, a higher prevalence of hypertension was obtained by VIGITEL for men, among people aged 18 to 59 years, and among those reporting 9 or more years of schooling. A higher prevalence of osteoporosis among adults (18 to 59 years) was found by VIGITEL. Regarding asthma/bronchitis/emphysema in the elderly, a higher prevalence was observed by ISACamp. CONCLUSION: Except for arterial hypertension, the data obtained from the telephone survey constituted a rapid alternative for providing overall estimates of the prevalence of the conditions studied in the adult population living in Campinas (SP).

Relevance:

20.00%

Abstract:

OBJECTIVE: To estimate the prevalence of self-reported osteoporosis (with prior medical diagnosis) and of associated risk and protective factors. METHODS: Cross-sectional study based on data from the Surveillance System of Risk and Protective Factors for Chronic Diseases by Telephone Survey (VIGITEL). A total of 54,369 individuals aged >18 years living in households served by at least one fixed telephone line in the Brazilian state capitals and the Federal District were interviewed in 2006. Estimates of osteoporosis according to socioeconomic factors, behavioral factors, and body mass index were stratified by sex. Risks of occurrence of osteoporosis were calculated for each variable individually and in a multivariate model, taking the odds ratio as a proxy for the prevalence ratio. RESULTS: The prevalence of reported osteoporosis was 4.4%, predominantly among women (7.0%), those aged >45 years, those of non-single marital status, and ex-smokers. Among men, being over 65 years old, being married or widowed, and being sedentary were positively associated with the outcome. CONCLUSIONS: Among the factors associated with osteoporosis, modifiable aspects related to prevention of the disease stand out, such as physical activity and smoking.

Relevance:

20.00%

Abstract:

OBJECTIVE: To assess whether the level of autoantibodies against oxidized LDL (anti-LDLox) in the plasma of adolescents correlates with their anthropometric measurements and lipid profile. METHODS: The study included 150 adolescents aged 10 to 15 years, recruited from the obesity outpatient clinic of the Universidade Federal de São Paulo (SP) and from public schools in Piracicaba (SP). Anthropometric measurements such as body mass index and waist and arm circumference were assessed, and the adolescents were classified as normal weight, overweight, or obese. For the biochemical analyses, the lipid profile was determined using enzymatic colorimetric methods, and ELISA was used to detect the level of anti-LDLox autoantibodies. RESULTS: According to the analyses of anthropometric variables, the obese group showed an altered profile relative to the normal-weight and overweight groups (p < 0.01), indicating cardiovascular risk. When the lipid profile was assessed, statistically significant differences were observed in the concentrations of total cholesterol (p = 0.011), HDL cholesterol (p = 0.001), and LDL cholesterol (p < 0.042) in the normal-weight and obese groups. For plasma anti-LDLox autoantibodies, the overweight (p = 0.012) and obese (p < 0.001) groups showed higher values than the normal-weight group. There were also correlations between anti-LDLox autoantibodies and anthropometric variables. CONCLUSION: The presence of anti-LDLox autoantibodies in adolescents and the metabolic changes in the lipid profile varied in proportion to anthropometric parameters, which makes the anti-LDLox level a potential biochemical indicator of risk for metabolic syndrome.

Relevance:

20.00%

Abstract:

This paper presents some developments of the collective subject discourse method with regard to the semantic density it produces, which implies a significantly more relevant presence of collective thought as an empirical reality in social research that involves collecting testimonies. This more significant presence of empirical material, combined with an understanding of the thought of collectivities as a referent, allows the descriptive moment to enter into dialogue with the interpretive moment in this type of research. As a new possibility that points to the uncertain and the unexpected, it can thus contribute to a renewed understanding of the nature and functioning of social representations as complex realities.

Relevance:

20.00%

Abstract:

In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in species other than bovines. Based on the data from this epidemic, an analysis using the K-function was carried out, and spatial clustering of outbreaks was observed within a range of 25 km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the dissemination of the disease. The control measures were effective, since the disease did not spread to farms more than 47 km from the initial outbreaks.
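To make the K-function analysis mentioned above more concrete, here is a minimal sketch of a naive estimator of Ripley's K without edge correction. The abstract does not say which estimator or software was used, so the function name, the estimator variant, and the coordinates below are assumptions for illustration only.

```python
import math


def ripley_k(points, radius, area):
    """Naive estimate of Ripley's K at a given radius (no edge correction):
    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is at most r."""
    n = len(points)
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    )
    return area * pairs / (n * (n - 1))


# Hypothetical outbreak coordinates (km) forming two tight groups
# inside a 100 km x 100 km study region.
outbreaks = [(0, 0), (0, 1), (1, 0), (50, 50), (50, 51), (51, 50)]
k_hat = ripley_k(outbreaks, radius=2, area=100 * 100)
csr_reference = math.pi * 2 ** 2  # expected K under complete spatial randomness
print(k_hat, csr_reference)
```

An estimate well above pi * r^2 at a given radius suggests clustering at that scale; in practice the comparison is usually made over a range of radii and against simulation envelopes.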

Relevance:

20.00%

Abstract:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevance:

20.00%

Abstract:

Melanoma is a highly aggressive and therapy-resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name of EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391, and Hs.559350) to be highly restricted to melanoma/melanocytes and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas, and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 showed strong up-regulation in nevi and was lost in metastatic tumors. Interestingly, we found correlations of RMEL2 and RMEL3 expression with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3, and 5q11.2, respectively. They are well conserved throughout primates, but not other genomes, and were predicted as having no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. In conclusion, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies to treat melanoma.

Relevance:

20.00%

Abstract:

An implementation of a computational tool to generate new summaries from new source texts is presented, by means of the connectionist approach (artificial neural networks). Among other contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the so-called biologically implausible algorithms.

Relevance:

20.00%

Abstract:

This work proposes a method based on both preprocessing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, and the loads and the power quality analyzers were placed at its output (all measurements were stored in a microcomputer). The data were then submitted to pre-processing, which was based on attribute selection techniques in order to minimize the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were then trained to perform the identification of loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (the energy distribution procedures employed by Brazilian utilities). The results obtained seek to validate the proposed methodology and furnish a method that can serve as an alternative to conventional methods.

Relevance:

20.00%

Abstract:

Since the 1990s several large companies have been publishing nonfinancial performance reports. Focusing initially on the physical environment, these reports evolved to consider social relations, as well as data on the firm's economic performance. A few mining companies pioneered this trend, and in recent years some of them have incorporated the three dimensions of sustainable development, publishing so-called sustainability reports. This article reviews 31 reports published between 2001 and 2006 by four major mining companies. A set of 62 assessment items organized in six categories (namely context and commitment, management, environmental, social and economic performance, and accessibility and assurance) was selected to guide the review. The items were derived from international literature and recommended best practices, including the Global Reporting Initiative G3 framework. A content analysis was performed using the report as a sampling unit, and using phrases, graphics, or tables containing certain information as data collection units. A basic rating scale (0 or 1) was used for noting the presence or absence of information and a final percentage score was obtained for each report. Results show that there is a clear evolution in the reports' comprehensiveness and depth. The categories "accessibility and assurance" and "economic performance" featured the lowest scores and do not present a clear evolution trend in the period, whereas the categories "context and commitment" and "social performance" presented the best results and regular improvement; the category "environmental performance," despite not reaching the highest scores, also featured constant evolution. Description of data measurement techniques and more comprehensive third-party verification are the items most in need of improvement.
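The scoring scheme described in this abstract (a 0 or 1 rating per assessment item, turned into a percentage score per report) can be sketched as below. The function name and the sample ratings are hypothetical, and the abstract does not state how items are distributed across categories or whether categories are weighted, so equal weighting of items is an assumption.

```python
def score_report(ratings_by_category):
    """ratings_by_category maps a category name to a list of 0/1 item ratings.
    Returns (overall percentage, per-category percentages), all items weighted equally."""
    all_ratings = [r for ratings in ratings_by_category.values() for r in ratings]
    if any(r not in (0, 1) for r in all_ratings):
        raise ValueError("ratings must be 0 (absent) or 1 (present)")
    overall = 100.0 * sum(all_ratings) / len(all_ratings)
    per_category = {
        name: 100.0 * sum(ratings) / len(ratings)
        for name, ratings in ratings_by_category.items()
    }
    return overall, per_category


# Hypothetical ratings for one report, using two of the six categories.
overall, per_category = score_report(
    {"management": [1, 0, 1, 1], "social performance": [1, 1]}
)
print(round(overall, 1), per_category)
```

Scoring each report the same way is what allows the comparison across years and across categories reported in the results.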

Relevance:

20.00%

Abstract:

Complying with and obtaining certification of the systems and internal control mechanisms proposed by Sarbanes-Oxley is currently a major challenge for most multinational companies registered with the SEC (US Securities and Exchange Commission). This work aims to contribute to the analysis of this methodology, not only to comply with the law but also to reduce costs and generate value by strengthening internal control systems, turning them into mechanisms that drive value generation. The idea is therefore to identify the main gaps in the theory through a literature review and a case study, in order to question the main deficiencies, strengths, and contributions by evaluating the practices observed. Finally, as a result of the research and the analyses carried out in this case, the vast majority of executives and other employees recognize the benefit that the Sarbanes-Oxley Act has brought to the company studied. They also recognize that, although systems and infrastructure still need to be adapted, the Act helps to reduce and control risks and reinforces the system of internal controls in all areas of expertise. They understand that a change in the culture of the other employees is needed so that internal controls, attention to Sarbanes-Oxley, and corporate governance become part of the day-to-day routine, making the cost of control small compared with the benefits generated.

Relevance:

20.00%

Abstract:

A graph clustering algorithm constructs groups of closely related parts and machines separately. After they are matched for the least intercell moves, a refining process runs on the initial cell formation to decrease the number of intercell moves. A simple modification of this main approach can deal with some practical constraints, such as the popular constraint of bounding the maximum number of machines in a cell. Our approach substantially improves computational time. More importantly, the number of intercell moves also improves when the computational results are compared with the best-known solutions from the literature. (C) 2009 Elsevier Ltd. All rights reserved.