93 results for Pre-processing
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Due to the imprecise nature of biological experiments, biological data are often redundant and noisy. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes the effectiveness of the investigated techniques in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
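One common family of distance-based noise filters is the edited nearest-neighbour rule, in which an instance is flagged as noisy when its label disagrees with the majority label of its k nearest neighbours. The abstract does not name the specific techniques it evaluates, so the sketch below is only an illustration under that assumption; the function name `enn_filter`, the toy data and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter


def enn_filter(X, y, k=3):
    """Return indices of instances kept by an edited-nearest-neighbour filter.

    An instance is treated as noisy (and dropped) when its label differs
    from the majority label among its k nearest neighbours (Euclidean distance).
    """
    kept = []
    for i, xi in enumerate(X):
        # Distances from instance i to every other instance, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority = Counter(neighbour_labels).most_common(1)[0][0]
        if majority == y[i]:
            kept.append(i)
    return kept


# Toy two-feature "profiles": two separated classes plus one mislabelled point.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
     (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
     (0.15, 0.1)]
y = ["A", "A", "A", "B", "B", "B", "B"]  # the last label is deliberately wrong
kept = enn_filter(X, y, k=3)
```

In this toy run the last instance sits among class "A" points but carries label "B", so the filter drops it and keeps the rest. A classifier would then be trained only on the kept indices, which is the kind of comparison the abstract describes.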
Abstract:
Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied to cluster pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications to the Independent Component Analysis Mixture Model (ICAMM). These improvements were proposed by considering some of the model's limitations and by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which combines Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments, the EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results for the proposals presented herein. (C) 2008 Published by Elsevier B.V.
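The Sobel edge detector mentioned in the pre-processing step can be sketched as two 3x3 convolutions whose results are combined into a gradient magnitude. This is a minimal illustration of the standard Sobel operator only. It does not cover the Sparse Code Shrinkage denoising or the way the paper combines the two, and the function name `sobel_magnitude` and the toy image are assumptions.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grey-level image given as a list of rows.

    Border pixels are left at 0.0 because the 3x3 kernels do not fit there.
    """
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = img[r + dr - 1][c + dc - 1]
                    gx += gx_kernel[dr][dc] * pixel
                    gy += gy_kernel[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# A 5x5 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)
```

For this step image the response is non-zero only in the two interior columns adjacent to the step, and zero in the flat region to the right.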
Abstract:
This work proposes a method based on both pre-processing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, with the loads and the power quality analyzers placed at its output (all measurements were stored in a microcomputer). The data were then submitted to pre-processing based on attribute selection techniques in order to reduce the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were then trained to identify the loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (the energy distribution procedures employed by Brazilian utilities). The results obtained seek to validate the proposed methodology and furnish a method that can serve as an alternative to conventional methods.
Abstract:
This study presents a solid-like finite element formulation to solve geometrically non-linear three-dimensional inhomogeneous frames. To achieve the desired representation, unconstrained vectors are used instead of the classic rigid director triad; as a consequence, the resulting formulation does not use finite rotation schemes. High order curved elements with any cross section are developed using a full three-dimensional constitutive elastic relation. Warping and variable thickness strain modes are introduced to avoid locking. The warping mode is solved numerically in a FEM pre-processing computational code, which is coupled to the main program. The extra calculations are relatively small when the number of finite elements with the same cross section increases. The warping mode is based on a 2D free torsion (Saint-Venant) problem that considers inhomogeneous material. A scheme that automatically generates shape functions and their derivatives allows the use of any degree of approximation for the developed frame element. General examples are solved to check the objectivity, path independence, locking-free behavior, generality and accuracy of the proposed formulation. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
In this paper, we propose a method based on association rule-mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images and high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates suggestions of diagnoses employing mining of association rules. The suggestions of diagnosis are used to accelerate the image analysis performed by specialists as well as to provide them an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines, in a single step, feature selection and discretization, and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable to perform feature selection and discretization in medical images. HiCARe is a new associative classifier. The HiCARe algorithm has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high values of accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means to assist in the diagnosing task.
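Association rules of the form "features imply keyword" are usually ranked by support and confidence. The abstract does not describe the internals of PreSAGe or HiCARe, so the sketch below shows only these two generic measures; the function name, the discretized feature labels and the records are hypothetical.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    Each transaction is a set of items; antecedent and consequent are sets.
    """
    n = len(transactions)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n                       # fraction of records matching the whole rule
    confidence = both / ante if ante else 0.0  # fraction of antecedent matches that also match the consequent
    return support, confidence


# Hypothetical discretized image features paired with diagnosis keywords.
records = [
    {"texture=high", "contrast=low", "malignant"},
    {"texture=high", "contrast=high", "malignant"},
    {"texture=low", "contrast=low", "benign"},
    {"texture=high", "contrast=low", "benign"},
]
sup, conf = support_and_confidence(records, {"texture=high"}, {"malignant"})
```

Here the rule "texture=high implies malignant" holds in two of four records (support 0.5) and in two of the three records that contain the antecedent (confidence 2/3).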
Abstract:
The amount of textual information digitally stored is growing every day. However, our capability of processing and analyzing that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes to extract relevant knowledge from textual information, such as the text mining process. One of the main and most expensive stages of the text mining process is the text pre-processing stage, where the unstructured text is transformed into a structured format such as an attribute-value table. The stemming process, i.e. linguistic normalization, is usually used to find the attributes of this table. However, the stemming process is strongly dependent on the language in which the original textual information is given. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements to the well-known Porter stemming algorithm for the Portuguese language, which exploit the characteristics of this language, are proposed. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
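The transformation from unstructured text to an attribute-value table can be sketched as follows: each word is reduced to a stem, the distinct stems become the attributes, and each document becomes a row of stem counts. The suffix stripper here is a deliberately crude stand-in, not the Porter algorithm or the improved version proposed in the paper; `toy_stem`, `SUFFIXES` and the sample documents are assumptions for illustration.

```python
from collections import Counter

# Illustrative suffix list only; a real Portuguese stemmer uses many ordered rules.
SUFFIXES = ("mente", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def attribute_value_table(documents):
    """Return (sorted stems, one row of stem counts per document)."""
    counts = [Counter(toy_stem(w) for w in doc.lower().split()) for doc in documents]
    attributes = sorted(set().union(*counts))
    rows = [[c[a] for a in attributes] for c in counts]
    return attributes, rows


docs = ["livros novos", "livro novo rapidamente"]
attributes, rows = attribute_value_table(docs)
```

Because "livros" and "livro" share a stem, both documents contribute to the same attribute, which keeps the table smaller than one built from raw word forms.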
Abstract:
In eukaryotes, pre-rRNA processing depends on a large number of nonribosomal trans-acting factors that form intriguingly organized complexes. One of the early stages of pre-rRNA processing includes formation of the two intermediate complexes pre-40S and pre-60S, which then form the mature ribosome subunits. Each of these complexes contains specific pre-rRNAs, ribosomal proteins and processing factors. The yeast nucleolar protein Nop53p has previously been identified in the pre-60S complex and shown to affect pre-rRNA processing by directly binding to 5.8S rRNA, and to interact with Nop17p and Nip7p, which are also involved in this process. Here we show that Nop53p binds 5.8S rRNA co-transcriptionally through its N-terminal region, and that this portion of the protein can also partially complement growth of the conditional mutant strain Δnop53/GAL:NOP53. Nop53p interacts with Rrp6p and activates the exosome in vitro. These results indicate that Nop53p may recruit the exosome to 7S pre-rRNA for processing. Consistent with this observation and similar to what is observed in exosome mutants, depletion of Nop53p leads to accumulation of polyadenylated pre-rRNAs.
Abstract:
In eukaryotes, pre-rRNA processing depends on a large number of nonribosomal trans-acting factors that form intriguingly organized complexes. Two intermediate complexes, pre-40S and pre-60S, are formed at the early stages of 35S pre-rRNA processing and give rise to the mature ribosome subunits. Each of these complexes contains specific pre-rRNAs, some ribosomal proteins and processing factors. The novel yeast protein Utp25p has previously been identified in the nucleolus, an indication that this protein could be involved in ribosome biogenesis. Here we show that Utp25p interacts with the SSU processome proteins Sas10p and Mpp10p, and affects 18S rRNA maturation. Depletion of Utp25p leads to accumulation of the pre-rRNA 35S and the aberrant rRNA 23S, and to a severe reduction in 40S ribosomal subunit levels. Our results indicate that Utp25p is a novel SSU processome subunit involved in pre-40S maturation.
Abstract:
Orthodox teaching and practice on nutrition and health almost always focuses on nutrients, or else on foods and drinks. Thus, diets that are high in folate and in green leafy vegetables are recommended, whereas diets high in saturated fat and in full-fat milk and other dairy products are not recommended. Food guides such as the US Food Guide Pyramid are designed to encourage consumption of healthier foods, by which is usually meant those higher in vitamins, minerals and other nutrients seen as desirable. What is generally overlooked in such approaches, which currently dominate official and other authoritative information and education programmes, and also food and nutrition public health policies, is food processing. It is now generally acknowledged that the current pandemic of obesity and related chronic diseases has as one of its important causes increased consumption of convenience foods, including pre-prepared foods (1,2). However, the issue of food processing is largely ignored or minimised in education and information about food, nutrition and health, and also in public health policies. A short commentary cannot be comprehensive, and a general proposal such as that made here is bound to have some problems and exceptions. Also, the social, cultural, economic and environmental consequences of food processing are not discussed here. Readers' comments and queries are invited.
Abstract:
The purposes of this work were a) to evaluate citrus black spot (CBS) incidence in 'Valencia' oranges and 'Murcott' tangors aimed at the export market, and in 'Pera', 'Lima' and 'Natal' oranges and 'Murcott' tangors aimed at the domestic market, after different processing stages in packinghouses in 2004/05 and 2005/06; and b) to evaluate CBS incidence in 'Pera' and 'Lima' oranges and 'Murcott' tangors sold at Ceagesp-SP, the biggest wholesale market in the State of Sao Paulo, in 2006. Citrus fruits were collected at the packinghouse on arrival, after pre-washing and de-greening, from the packing table and from the pallet, and at Ceagesp. They were stored for 14 to 21 days at 25 °C and 85-90% RH. The incidence of CBS was visually evaluated after one day and at the end of the storage period. CBS incidence in fruits aimed at the export market decreased, with values under 2.0% on arrival and no CBS symptoms observed on fruits from the pallet. The average incidence of CBS in 'Pera', 'Lima' and 'Natal' oranges and 'Murcott' tangors aimed at the domestic market was 64.1, 39.0, 32.1 and 19.3%, respectively, after one day of storage in the packinghouse, and remained constant across all processing stages. The incidence of CBS in Ceagesp fruits was low in the winter months and increased in the spring. The increase in disease incidence during the storage period (21 days) was not significant in the collected fruits.
Abstract:
No fully effective treatment has been developed since the discovery of Chagas' disease. Since drug-resistant Trypanosoma cruzi strains are emerging and the current therapy is effective in the acute phase but has various adverse side effects, more studies are needed to characterize the susceptibility of T. cruzi to new drugs. Pre-mRNA maturation in trypanosomatids occurs through a process called trans-splicing, an unusual RNA processing reaction that involves the processing of polycistronic transcription units into individual mRNAs; a short transcript, the spliced leader (SL RNA), is trans-spliced to the acceptor pre-mRNA, giving rise to the mature mRNA. Cubebin derivatives seem to provide treatments with fewer side effects than benznidazole and have shown similar or better trypanocidal activity. Therefore, the interference of the cubebin derivatives (-)-6,6'-dinitrohinokinin (DNH) and (-)-hinokinin (HQ) in mRNA processing was evaluated using permeable T. cruzi cells (Y and BOL (Bolivia) strains), followed by an RNase protection reaction. These substances seem to intervene in some step of RNA transcription, promoting alterations in RNA synthesis, even though the RNA processing mechanism still occurs. Furthermore, HQ presented better activity against the parasites than DNH, and the BOL strain seems to be more resistant than Y.
Abstract:
The cytoplasmic and nuclear protein Ki-1/57 was first identified in malignant cells from Hodgkin's lymphoma. Despite studies showing its phosphorylation, arginine methylation, and interaction with several regulatory proteins, the functional role of Ki-1/57 in human cells remains to be determined. Here, we investigated the relationship of Ki-1/57 with RNA functions. Through immunoprecipitation assays, we verified the association of Ki-1/57 with the endogenous splicing proteins hnRNPQ and SFRS9 in HeLa cell extracts. We also found that recombinant Ki-1/57 was able to bind to a poly-U RNA probe in electrophoretic mobility shift assays. In a classic splicing test, we showed that Ki-1/57 can modify the splicing site selection of the adenoviral E1A minigene in a dose-dependent manner. Further confocal and fluorescence microscopy analysis revealed the localization of enhanced green fluorescent protein-Ki-1/57 to nuclear bodies involved in RNA processing and/or small nuclear ribonucleoprotein assembly, depending on the cellular methylation status and its N-terminal region. In summary, our findings suggest that Ki-1/57 is probably involved in cellular events related to RNA functions, such as pre-mRNA splicing.
Cwc24p, a novel Saccharomyces cerevisiae nuclear ring finger protein, affects pre-snoRNA U3 splicing
Abstract:
U3 snoRNA is transcribed from two intron-containing genes in yeast, snR17A and snR17B. Although the assembly of the U3 snoRNP has not been precisely determined, at least some of the core box C/D proteins are known to bind pre-U3 co-transcriptionally, thereby affecting splicing and 3'-end processing of this snoRNA. We identified the interaction between the box C/D assembly factor Nop17p and Cwc24p, a novel yeast RING finger protein that had been previously isolated in a complex with the splicing factor Cef1p. Here we show that, consistent with the protein interaction data, Cwc24p localizes to the cell nucleus, and its depletion leads to the accumulation of both U3 pre-snoRNAs. U3 snoRNA is involved in the early cleavages of 35S pre-rRNA, and the defective splicing of pre-U3 detected in cells depleted of Cwc24p causes the accumulation of the 35S precursor rRNA. These results led us to the conclusion that Cwc24p is involved in pre-U3 snoRNA splicing, indirectly affecting pre-rRNA processing.
Abstract:
The Shwachman-Bodian-Diamond syndrome protein (SBDS) is a member of a highly conserved protein family of poorly understood function, with putative orthologues found in organisms ranging from Archaea, yeast and plants to vertebrate animals. The yeast orthologue of SBDS, Sdo1p, has been previously identified in association with the 60S ribosomal subunit and is proposed to participate in ribosomal recycling. Here we show that Sdo1p interacts with nucleolar rRNA processing factors and ribosomal proteins, indicating that it might bind the pre-60S complex and remain associated with it during processing and transport to the cytoplasm. Corroborating the protein interaction data, Sdo1p localizes to the nucleus and cytoplasm and co-immunoprecipitates precursors of 60S and 40S subunits, as well as the mature rRNAs. Sdo1p binds RNA directly, suggesting that it may also associate with the ribosomal subunits through RNA interaction. Copyright (C) 2009 John Wiley & Sons, Ltd.
Abstract:
The aim of this study was to evaluate the stress distribution in the cervical region of a sound upper central incisor in two clinical situations, standard and maximum masticatory forces, by means of a 3D model with the highest possible level of fidelity to the anatomic dimensions. Two models with 331,887 linear tetrahedral elements that represent a sound upper central incisor with periodontal ligament, cortical and trabecular bones were loaded at 45° in relation to the tooth's long axis. All structures were considered to be homogeneous and isotropic, with the exception of the enamel (anisotropic). A standard masticatory force (100 N) was simulated on one of the models, while a maximum masticatory force (235.9 N) was simulated on the other. The software used was PATRAN for pre- and post-processing and Nastran for processing. In the cementoenamel junction area, tensile stresses reached 14.7 MPa in the 100 N model and 40.2 MPa in the 235.9 N model, the latter exceeding the enamel's tensile strength (16.7 MPa). The fact that the stress concentration in the amelodentinal junction exceeded the enamel's tensile strength under simulated conditions of maximum masticatory force suggests the possibility of the occurrence of non-carious cervical lesions such as abfractions.