Digital Library

8 results for Text mining

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)

Efficiency issues of evolutionary k-means

Relevance: 60.00%

Abstract:

One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms of interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.
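
The systematic (repetitive) baseline described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the choice of a centroid-based (simplified) silhouette as the selection criterion, and the rule that singleton clusters score 0 are all assumptions made here. The evolutionary algorithm itself is not shown.

```python
import random
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """Plain k-means (Lloyd's algorithm) from one random initialization."""
    centroids = rng.sample(points, k)  # initial prototypes drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old prototype if a cluster goes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def simplified_silhouette(points, centroids, labels):
    """Assumed quality criterion; requires at least two clusters."""
    sizes = Counter(labels)
    total = 0.0
    for p, lab in zip(points, labels):
        if sizes[lab] == 1:
            continue  # assumption: singleton clusters contribute 0
        a = squared_distance(p, centroids[lab]) ** 0.5
        b = min(
            squared_distance(p, c) ** 0.5
            for j, c in enumerate(centroids)
            if j != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def systematic_search(points, k_values, restarts, seed=0):
    """Run k-means for every k and several initializations; keep the best run."""
    rng = random.Random(seed)
    best = None
    for k in k_values:
        for _ in range(restarts):
            centroids, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids, labels)
            if best is None or score > best[0]:
                best = (score, k, centroids, labels)
    return best
```

The cost of this baseline grows with the number of candidate values of k multiplied by the number of restarts, which is the overhead the abstract says evolutionary guidance can reduce.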

Improvements on the Porter's Stemming Algorithm for Portuguese

Relevance: 60.00%

Abstract:

The amount of textual information digitally stored is growing every day. However, our capability of processing and analyzing that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes to extract relevant knowledge from textual information, such as the text mining process. One of the main and most expensive stages of the text mining process is the text pre-processing stage, where the unstructured text is transformed into a structured format such as an attribute-value table. The stemming process, i.e., linguistic normalization, is usually used to find the attributes of this table. However, the stemming process is strongly dependent on the language in which the original textual information is given. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements to the well-known Porter stemming algorithm for the Portuguese language, which exploit the characteristics of this language, are proposed. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
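
To make the pre-processing step concrete, the sketch below turns raw text into an attribute-value table whose attributes are stems. The stemmer is a deliberately tiny toy: the suffix list and the `min_stem` threshold are hypothetical choices for illustration only, and it is neither the Porter rule set for Portuguese nor the improved algorithm proposed in the paper.

```python
from collections import Counter

# Hypothetical, illustrative suffix list (checked in order); NOT the Porter rules.
SUFFIXES = ("mente", "as", "os", "a", "o", "s")


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def attribute_value_table(documents):
    """Return (attributes, table): one row per document, one column per stem."""
    rows = [Counter(toy_stem(tok) for tok in doc.split()) for doc in documents]
    attributes = sorted(set().union(*rows))
    table = [[row[attr] for attr in attributes] for row in rows]
    return attributes, table


if __name__ == "__main__":
    attrs, table = attribute_value_table(
        ["o menino viu as meninas", "meninos e meninas"]
    )
    print(attrs)
    for row in table:
        print(row)
```

Here "menino", "meninos", and "meninas" collapse into a single attribute, which is the dimensionality reduction that stemming is meant to provide.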

Novel Primate-Specific Genes, RMEL 1, 2 and 3, with Highly Restricted Expression in Melanoma, Assessed by New Data Mining Tool

Relevance: 20.00%

Abstract:

Melanoma is a highly aggressive and therapy-resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name of EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391, and Hs.559350) to be highly restricted to melanoma/melanocytes and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas, and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 showed strong up-regulation in nevi and was lost in metastatic tumors. Interestingly, we found correlations of RMEL2 and RMEL3 expression with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3, and 5q11.2, respectively. They are well conserved throughout primates, but not other genomes, and were predicted as having no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. In conclusion, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies to treat melanoma.
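
The core filtering idea in this abstract, keeping clusters whose ESTs all come from one tissue, can be sketched as below. This is not the EST2TSE implementation: the function name, the input layout (cluster ID mapped to the tissue label of each EST), the `min_ests` threshold, and the cluster IDs in the example are hypothetical.

```python
def tissue_specific_clusters(cluster_to_tissues, target, min_ests=2):
    """Return clusters with at least min_ests ESTs, all from the target tissue."""
    return sorted(
        cluster
        for cluster, tissues in cluster_to_tissues.items()
        if len(tissues) >= min_ests and all(t == target for t in tissues)
    )


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = {
        "cluster_A": ["melanoma", "melanoma", "melanoma"],
        "cluster_B": ["melanoma", "liver"],
        "cluster_C": ["melanoma"],
    }
    print(tissue_specific_clusters(example, "melanoma"))
```

Requiring more than one EST per cluster mirrors the abstract's "clusters of multiple ESTs" and guards against single-sequence artifacts.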

SAB(IO): A BIOLOGICALLY PLAUSIBLE CONNECTIONIST APPROACH TO AUTOMATIC TEXT SUMMARIZATION

Relevance: 20.00%

Abstract:

An implementation of a computational tool to generate new summaries from new source texts is presented, by means of the connectionist approach (artificial neural networks). Among other contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the so-called biologically implausible algorithms.

Data Mining Applied to Harmonic Current Sources Identification in Residential Consumers

Relevance: 20.00%

Abstract:

This work proposes a method based on both preprocessing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, and the loads and the power quality analyzers were placed at its output (all measurements were stored in a microcomputer). The data were then submitted to pre-processing, which was based on attribute selection techniques in order to minimize the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were then trained to perform the identification of loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (procedures for energy delivery employed by Brazilian utilities). The results obtained seek to validate the proposed methodology and furnish a method that can serve as an alternative to conventional methods.

Assessing the evolution of sustainability reporting in the mining sector

Relevance: 20.00%

Abstract:

Since the 1990s several large companies have been publishing nonfinancial performance reports. Focusing initially on the physical environment, these reports evolved to consider social relations, as well as data on the firm's economic performance. A few mining companies pioneered this trend, and in recent years some of them incorporated the three dimensions of sustainable development, publishing so-called sustainability reports. This article reviews 31 reports published between 2001 and 2006 by four major mining companies. A set of 62 assessment items organized in six categories (namely context and commitment, management, environmental, social and economic performance, and accessibility and assurance) was selected to guide the review. The items were derived from international literature and recommended best practices, including the Global Reporting Initiative G3 framework. A content analysis was performed using the report as a sampling unit, and using phrases, graphics, or tables containing certain information as data collection units. A basic rating scale (0 or 1) was used for noting the presence or absence of information and a final percentage score was obtained for each report. Results show that there is a clear evolution in the reports' comprehensiveness and depth. The categories "accessibility and assurance" and "economic performance" featured the lowest scores and do not present a clear evolution trend in the period, whereas the categories "context and commitment" and "social performance" presented the best results and regular improvement; the category "environmental performance," despite not reaching the highest scores, also featured constant evolution. Description of data measurement techniques and more comprehensive third-party verification are the items most in need of improvement.
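
The scoring scheme described in this abstract (a 0/1 rating per assessment item, then a percentage per report) amounts to simple arithmetic, sketched below. The function name and the example ratings are hypothetical; only the 0/1 scale and the count of 62 items come from the abstract.

```python
def report_score(ratings):
    """Percentage of assessment items rated 1 (information present)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (0, 1) for r in ratings):
        raise ValueError("ratings must be 0 (absent) or 1 (present)")
    return 100.0 * sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Made-up report: 31 of the 62 items present.
    ratings = [1] * 31 + [0] * 31
    print(report_score(ratings))
```

The same function applied to the items of a single category would give a per-category score, which is how trends across categories could be compared.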

IMAGE, MAGIC AND IMAGINATION: CHALLENGES TO THE ANTHROPOLOGICAL TEXT

Relevance: 20.00%

Abstract:

What different forms of engagement do image and text allow the spectator/reader? We know that text and image communicate, and that all communication depends on a relationship between those who communicate. The objective of this text is therefore to understand the new possibilities available to an anthropology of the expression of knowledge that makes use of images, such as photographs and films.

Perspectives on sexual and reproductive health among women in an ancient mining area in Brazil

Relevance: 20.00%

Abstract:

The purpose of this study was to describe the reproductive profile and frequency of genital infections among women living in Serra Pelada, a former mining village in Pará state, Brazil. A descriptive study of women living in the mining area of Serra Pelada was performed in 2004 through interviews that gathered demographic and clinical data and assessed the risk behaviors of 209 randomly selected women. Blood samples were collected for rapid HIV assays; specimens were taken for Pap smears and Gram stains. Standard descriptive statistical analyses were performed and prevalence was calculated to reflect the relative frequency of each disease. Of the 209 participants, the median age was 38 years, with almost 70% having less than four years of education and 77% having no income or under 1.9 times the minimum wage of Brazil. About 30% did not have access to health care services during the preceding year. Risk behaviors included: alcohol abuse, 24.4%; illicit drug abuse, 4.3%; being a sex worker, 15.8%; and domestic violence, 17.7%. Abnormal Pap smear was found in 8.6%. Prevalence rates of infection were: HIV, 1.9%; trichomoniasis, 2.9%; bacterial vaginosis, 18.7%; candidiasis, 5.7%; Chlamydial-related cytological changes, 3.3%; and HPV-related cytological changes, 3.8%. Women living in this mining area in Brazil are economically and socially vulnerable to health problems. The findings underline the need for concomitant broader strategies that include reducing poverty and empowering women to make improvements regarding their health.
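
The prevalence figures in this abstract are relative frequencies over the 209 participants. The sketch below shows that arithmetic. The case counts are not given in the abstract; the counts computed here are back-calculated approximations from the reported percentages, and the function names are hypothetical.

```python
def prevalence_percent(cases, sample_size):
    """Prevalence as a percentage of the sample."""
    return 100.0 * cases / sample_size


def approximate_cases(reported_percent, sample_size):
    """Back-calculate a whole-number case count from a reported percentage."""
    return round(reported_percent / 100.0 * sample_size)


if __name__ == "__main__":
    n = 209  # sample size stated in the abstract
    for name, pct in [("HIV", 1.9), ("bacterial vaginosis", 18.7)]:
        cases = approximate_cases(pct, n)  # approximation, not a reported count
        print(name, cases, round(prevalence_percent(cases, n), 1))
```

Rounding the recomputed percentage to one decimal place recovers the reported value in both examples, which is a quick consistency check on the back-calculated counts.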
