132 results for Pre-processing step
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes the effectiveness of the investigated techniques in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
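The abstract does not name the distance-based techniques it evaluates. As a rough illustration of the general idea only, the sketch below uses an Edited Nearest Neighbours style filter: an instance is kept only if its label agrees with the majority label of its k nearest neighbours. The function names, the choice of k = 3, the Euclidean distance, and the toy data are all assumptions made for this example, not details from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edited_nearest_neighbours(X, y, k=3):
    """Return the indices of instances whose label matches the majority
    label of their k nearest neighbours (the instance itself excluded)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Rank every other instance by its distance to instance i.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: euclidean(xi, X[j]),
        )[:k]
        majority_label, _ = Counter(y[j] for j in neighbours).most_common(1)[0]
        if majority_label == yi:
            keep.append(i)
    return keep


# Toy data (hypothetical): two well-separated groups, with instance 3
# carrying label "B" while sitting inside the "A" group.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
y = ["A", "A", "A", "B", "B", "B", "B"]

kept = edited_nearest_neighbours(X, y, k=3)
print(kept)  # instance 3 is flagged as noisy and dropped
```

A classifier would then be trained only on the kept instances, and its accuracy compared with one trained on the unfiltered data, which mirrors the evaluation criterion the abstract describes.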
Abstract:
Today, several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications to the Independent Component Analysis Mixture Model (ICAMM). These improvements were proposed by considering some of the model's limitations and by analyzing how it should be improved to become more efficient. Moreover, a pre-processing methodology was also proposed, based on combining Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.
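Of the two pre-processing components mentioned above, the Sobel edge detector is simple enough to sketch directly. The example below is a minimal pure-Python version that applies the standard 3x3 Sobel kernels to a grayscale image stored as a list of lists and returns the gradient magnitude. The function name, the zero-valued border handling, and the toy image are assumptions for illustration; the Sparse Code Shrinkage denoising step and the way the paper combines the two are not covered.

```python
import math


def sobel_magnitude(image):
    """Gradient magnitude of a 2-D grayscale image (list of lists) using
    the 3x3 Sobel kernels. Border pixels are left at 0.0."""
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal change
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical change
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += gx_kernel[dr][dc] * pixel
                    gy += gy_kernel[dr][dc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Toy image (hypothetical): a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)
print(edges[1])  # interior pixels next to the step have a large magnitude
```

In a pipeline like the one described, an edge map of this kind would be computed on the denoised image before the clustering model is applied.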
Abstract:
In this paper, we propose a method based on association rule-mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images and high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates suggestions of diagnoses employing mining of association rules. The suggestions of diagnosis are used to accelerate the image analysis performed by specialists as well as to provide them an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines, in a single step, feature selection and discretization, and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable to perform feature selection and discretization in medical images. HiCARe is a new associative classifier. The HiCARe algorithm has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high values of accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means to assist in the diagnosing task.
Abstract:
This work proposes a method based on both pre-processing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, and the loads and the power quality analyzers were placed at its output (all measurements were stored in a microcomputer). The data were then submitted to pre-processing, which was based on attribute selection techniques in order to minimize the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were then trained to perform the identification of loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (the energy distribution procedures employed by Brazilian utilities). The results obtained seek to validate the proposed methodology and furnish a method that can serve as an alternative to conventional methods.
Abstract:
This study presents a solid-like finite element formulation to solve geometrically non-linear three-dimensional inhomogeneous frames. To achieve the desired representation, unconstrained vectors are used instead of the classic rigid director triad; as a consequence, the resulting formulation does not use finite rotation schemes. High order curved elements with any cross section are developed using a full three-dimensional constitutive elastic relation. Warping and variable thickness strain modes are introduced to avoid locking. The warping mode is solved numerically in a FEM pre-processing code, which is coupled to the main program. The extra calculations are relatively small when the number of finite elements with the same cross section increases. The warping mode is based on a 2D free torsion (Saint-Venant) problem that considers inhomogeneous material. A scheme that automatically generates shape functions and their derivatives allows the use of any degree of approximation for the developed frame element. General examples are solved to check the objectivity, path independence, locking-free behavior, generality and accuracy of the proposed formulation. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
The effect of a lipase-rich fungal enzymatic preparation, produced by a Penicillium sp. during solid-state fermentation, was evaluated in an anaerobic digester treating dairy wastewater with 1200 mg of oil and grease/L. The oil and grease hydrolysis step was carried out with 0.1% (w/v) of solid enzymatic preparation at 30 °C for 24 h, and resulted in a final free acid concentration eight times higher than the initial value. The digester operated in sequential batches of 48 h at 30 °C for 245 days, and had high chemical oxygen demand (COD) removal efficiencies (around 90%) when fed with pre-hydrolyzed wastewater. However, when the pre-hydrolysis step was removed, the anaerobic digester performed poorly (with an average COD removal of 32%), as the oil and grease accumulated in the biomass and the effluent oil and grease concentration increased throughout the operational period. PCR-DGGE analysis of the Bacteria and Archaea domains revealed remarkable differences in the microbial profiles in trials conducted with and without the pre-hydrolysis step, indicating that differences observed in overall parameters were intrinsically related to the microbial diversity of the anaerobic sludge. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
This work studied the radiation resistance of Listeria monocytogenes and Salmonella species and the effect of irradiation on leaf flavonoid content and sensory acceptability of minimally processed arugula. Immersion in ozone-treated water reduced the analyzed microorganisms by 1 log. L. monocytogenes and Salmonella were not isolated from samples. Samples of this vegetable were inoculated with a cocktail of Salmonella spp. and L. monocytogenes and exposed to gamma irradiation. D-10 values for Salmonella ranged from 0.16 to 0.19 kGy and for L. monocytogenes from 0.37 to 0.48 kGy. Kaempferol glycoside levels were 4 and ca. 3 times higher in samples exposed to 1 and 2 kGy, respectively, than in control samples. An increase in quercetin glycoside was also observed mainly in samples exposed to 1 kGy. In sensory evaluation, arugula had good acceptability, even after exposure to 2 and 4 kGy. These results indicate that irradiation has potential as a practical processing step to improve the safety of arugula.
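A D-10 value is the dose needed to reduce a microbial population by one log (90%). Assuming simple log-linear inactivation, which is a modelling assumption and not a claim made in the abstract, the expected log reduction at a given dose is the dose divided by D-10. The sketch below applies that relation to the D-10 ranges reported above at an illustrative dose of 1 kGy; the function names are made up for this example.

```python
def log_reduction(dose_kgy, d10_kgy):
    """Expected log10 reduction, assuming log-linear inactivation."""
    return dose_kgy / d10_kgy


def surviving_fraction(dose_kgy, d10_kgy):
    """Fraction of the initial population expected to survive the dose."""
    return 10 ** (-log_reduction(dose_kgy, d10_kgy))


# D-10 ranges reported in the abstract (kGy).
d10_ranges = {
    "Salmonella": (0.16, 0.19),
    "L. monocytogenes": (0.37, 0.48),
}

dose = 1.0  # kGy, illustrative
for organism, (d10_low, d10_high) in d10_ranges.items():
    # A larger D-10 means a more resistant organism, hence fewer logs removed.
    worst = log_reduction(dose, d10_high)
    best = log_reduction(dose, d10_low)
    print(f"{organism}: {worst:.2f} to {best:.2f} log reduction at {dose} kGy")
```

Under this assumption, L. monocytogenes, with the larger D-10 values, is the organism that limits the dose needed to reach a target reduction.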
Abstract:
Techniques devoted to generating triangular meshes from intensity images either take as input a segmented image or generate a mesh without distinguishing individual structures contained in the image. These facts may cause difficulties in using such techniques in some applications, such as numerical simulations. In this work we reformulate a previously developed technique for mesh generation from intensity images called Imesh. This reformulation makes Imesh more versatile due to a unified framework that allows an easy change of refinement metric, rendering it effective for constructing meshes for applications with varied requirements, such as numerical simulation and image modeling. Furthermore, a deeper study of the point insertion problem and the development of a geometrical criterion for segmentation are also reported in this paper. Meshes with a theoretical guarantee of quality can also be obtained for each individual image structure as a post-processing step, a characteristic not usually found in other methods. The tests demonstrate the flexibility and the effectiveness of the approach.
Abstract:
The amount of textual information digitally stored is growing every day. However, our capability of processing and analyzing that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes to extract relevant knowledge from textual information, such as the text mining process. One of the main and most expensive stages of the text mining process is the text pre-processing stage, where the unstructured text should be transformed into a structured format such as an attribute-value table. The stemming process, i.e. linguistic normalization, is usually used to find the attributes of this table. However, the stemming process is strongly dependent on the language in which the original textual information is given. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements to the well-known Porter stemming algorithm for the Portuguese language, which exploit the characteristics of this language, are proposed. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
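To make the pre-processing stage concrete, the sketch below turns a few short texts into an attribute-value table whose attributes are word stems. The stemmer is a deliberately tiny suffix stripper with a made-up suffix list; it is neither the Porter algorithm for Portuguese nor the improved version proposed in the paper, and every name in it is hypothetical.

```python
from collections import Counter

# Hypothetical, very incomplete suffix list; longer suffixes come first
# so that "ções" is tried before "s".
SUFFIXES = ("mente", "ções", "ção", "s")


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len chars."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def attribute_value_table(documents):
    """Return (attributes, rows): one column per stem, one row of stem
    counts per document."""
    counts = [Counter(toy_stem(w) for w in doc.split()) for doc in documents]
    attributes = sorted(set().union(*counts))
    rows = [[c[a] for a in attributes] for c in counts]
    return attributes, rows


documents = ["informação rapidamente", "informações rapidamente livros"]
attributes, rows = attribute_value_table(documents)
print(attributes)
print(rows)
```

Because "informação" and "informações" collapse to the same stem, they share one column, which is the reduction in table size that stemming is meant to provide.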
Abstract:
OWL-S is an application of OWL, the Web Ontology Language, that describes the semantics of Web Services so that their discovery, selection, invocation and composition can be automated. The research literature reports the use of UML diagrams for the automatic generation of Semantic Web Service descriptions in OWL-S. This paper demonstrates a higher level of automation by generating complete Web applications from OWL-S descriptions that have themselves been generated from UML. Previously, we proposed an approach for processing OWL-S descriptions in order to produce MVC-based skeletons for Web applications. The OWL-S ontology undergoes a series of transformations in order to generate a Model-View-Controller application implemented by a combination of Java Beans, JSP, and Servlets code, respectively. In this paper, we show in detail the documents produced at each processing step. We highlight the connections between OWL-S specifications and executable code in the various Java dialects and show the Web interfaces that result from this process.
Abstract:
No fully effective treatment has been developed since the discovery of Chagas' disease. Since drug-resistant Trypanosoma cruzi strains are occurring and the current therapy is effective in the acute phase but has various adverse side effects, more studies are needed to characterize the susceptibility of T. cruzi to new drugs. Pre-mRNA maturation in trypanosomatids occurs through a process called trans-splicing, which is an unusual RNA processing reaction that involves the processing of polycistronic transcription units into individual mRNAs; a short transcript, the spliced leader (SL RNA), is trans-spliced to the acceptor pre-mRNA, giving rise to the mature mRNA. Cubebin derivatives seem to provide treatments with fewer side effects than benznidazole and have shown similar or better trypanocidal activity than benznidazole. Therefore, the interference of the cubebin derivatives (-)-6,6'-dinitrohinokinin (DNH) and (-)-hinokinin (HQ) in mRNA processing was evaluated using T. cruzi permeable cells (Y and BOL (Bolivia) strains) followed by an RNase protection reaction. These substances seem to interfere with some step of RNA transcription, promoting alterations in RNA synthesis, even though the RNA processing mechanism still occurs. Furthermore, HQ presented better activity against the parasites than DNH, meaning that the BOL strain seems to be more resistant than the Y strain.
Abstract:
In eukaryotes, pre-rRNA processing depends on a large number of nonribosomal trans-acting factors that form intriguingly organized complexes. One of the early stages of pre-rRNA processing includes formation of the two intermediate complexes pre-40S and pre-60S, which then form the mature ribosome subunits. Each of these complexes contains specific pre-rRNAs, ribosomal proteins and processing factors. The yeast nucleolar protein Nop53p has previously been identified in the pre-60S complex and shown to affect pre-rRNA processing by directly binding to 5.8S rRNA, and to interact with Nop17p and Nip7p, which are also involved in this process. Here we show that Nop53p binds 5.8S rRNA co-transcriptionally through its N-terminal region, and that this protein portion can also partially complement growth of the conditional mutant strain Δnop53/GAL:NOP53. Nop53p interacts with Rrp6p and activates the exosome in vitro. These results indicate that Nop53p may recruit the exosome to 7S pre-rRNA for processing. Consistent with this observation, and similar to what is observed in exosome mutants, depletion of Nop53p leads to accumulation of polyadenylated pre-rRNAs.
Abstract:
In eukaryotes, pre-rRNA processing depends on a large number of nonribosomal trans-acting factors that form intriguingly organized complexes. Two intermediate complexes, pre-40S and pre-60S, are formed at the early stages of 35S pre-rRNA processing and give rise to the mature ribosome subunits. Each of these complexes contains specific pre-rRNAs, some ribosomal proteins and processing factors. The novel yeast protein Utp25p has previously been identified in the nucleolus, an indication that this protein could be involved in ribosome biogenesis. Here we show that Utp25p interacts with the SSU processome proteins Sas10p and Mpp10p, and affects 18S rRNA maturation. Depletion of Utp25p leads to accumulation of the pre-rRNA 35S and the aberrant rRNA 23S, and to a severe reduction in 40S ribosomal subunit levels. Our results indicate that Utp25p is a novel SSU processome subunit involved in pre-40S maturation.
Abstract:
Orthodox teaching and practice on nutrition and health almost always focuses on nutrients, or else on foods and drinks. Thus, diets that are high in folate and in green leafy vegetables are recommended, whereas diets high in saturated fat and in full-fat milk and other dairy products are not recommended. Food guides such as the US Food Guide Pyramid are designed to encourage consumption of healthier foods, by which is usually meant those higher in vitamins, minerals and other nutrients seen as desirable. What is generally overlooked in such approaches, which currently dominate official and other authoritative information and education programmes, and also food and nutrition public health policies, is food processing. It is now generally acknowledged that the current pandemic of obesity and related chronic diseases has as one of its important causes increased consumption of convenience foods, including pre-prepared foods (1,2). However, the issue of food processing is largely ignored or minimised in education and information about food, nutrition and health, and also in public health policies. A short commentary cannot be comprehensive, and a general proposal such as that made here is bound to have some problems and exceptions. Also, the social, cultural, economic and environmental consequences of food processing are not discussed here. Readers' comments and queries are invited.
Abstract:
Instrumented indentation has been used to investigate the mechanical properties of BETAMATE 1496® epoxy adhesive. The properties of the adhesive were analyzed by measuring its hardness and its Young's modulus in samples extracted from six different positions of the front door of a commercial passenger vehicle in two phases of processing: after application of the adhesive during door assembly ("pre-cured" state) and after final cure in the painting oven ("cured" state). Special attention was given to setting the optimal parameters ("creep" time and unloading time step) for the instrumented indentation testing for the present application. Young's modulus values around 1.1 ± 0.2 GPa and hardness values around 0.15 ± 0.05 GPa were obtained for all samples, irrespective of the variation of the indentation parameters in the testing procedure and of the relative position of the adhesive in the door frame in both states. (C) 2008 Elsevier Ltd. All rights reserved.