895 results for sequence data mining


Relevance: 80.00%

Abstract:

In recent years the Data Mining field has grown and consolidated considerably, and several efforts have sought to establish standards in the area, among them SEMMA and CRISP-DM. Both have become industrial standards and define a set of sequential steps intended to guide the implementation of data mining applications. This raises the question of whether they differ substantially from the traditional KDD process. This paper draws a parallel between these methodologies and the KDD process and examines the similarities between them.
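To make the comparison concrete, the sketch below encodes one commonly cited alignment of the KDD stages with SEMMA stages and CRISP-DM phases as a lookup table. The pairing (including the "Pre-KDD" and "Post-KDD" rows for steps outside the five classic KDD stages) and the names `KDD_ALIGNMENT` and `counterparts` are illustrative assumptions, not necessarily the exact correspondence drawn in the paper.

```python
# Hypothetical lookup table: one commonly cited alignment of KDD stages with
# SEMMA and CRISP-DM phases (an assumption for illustration, not the paper's table).
KDD_ALIGNMENT = [
    # (KDD stage, SEMMA stage, CRISP-DM phase); None marks "no direct counterpart"
    ("Pre-KDD", None, "Business Understanding"),
    ("Selection", "Sample", "Data Understanding"),
    ("Pre-processing", "Explore", "Data Understanding"),
    ("Transformation", "Modify", "Data Preparation"),
    ("Data Mining", "Model", "Modeling"),
    ("Interpretation/Evaluation", "Assess", "Evaluation"),
    ("Post-KDD", None, "Deployment"),
]


def counterparts(kdd_stage):
    """Return the (SEMMA, CRISP-DM) counterparts of a KDD stage."""
    for kdd, semma, crisp in KDD_ALIGNMENT:
        if kdd == kdd_stage:
            return semma, crisp
    raise KeyError(kdd_stage)


if __name__ == "__main__":
    # Print the alignment as a simple three-column table.
    for kdd, semma, crisp in KDD_ALIGNMENT:
        print(f"{kdd:<26} {semma or '-':<8} {crisp}")
```

Read row by row, the table shows the point the abstract raises: the two industrial methodologies largely re-label the KDD stages, with CRISP-DM adding explicit business-understanding and deployment phases around them.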

Relevance: 80.00%

Abstract:

Background: Long noncoding RNAs have recently emerged as pivotal regulators of coding-gene expression. These molecules may arise from antisense transcription of functional genes, giving rise to natural antisense transcripts (NATs), or from transcriptionally active pseudogenes. TBCA interacts with β-tubulin and is involved in the folding and dimerization of new tubulin heterodimers, the building blocks of microtubules. Methodology/Principal findings: We found that the mouse genome contains two structurally distinct Tbca genes, located on chromosomes 13 (Tbca13) and 16 (Tbca16). Interestingly, although both Tbca genes are ubiquitously expressed, they are differentially expressed during mouse testis maturation: as maturation progresses, Tbca13 mRNA levels increase progressively while Tbca16 mRNA levels decrease. This suggests a regulatory mechanism linking the two genes and prompted us to investigate the presence of the two proteins. However, using tandem mass spectrometry we were unable to identify the TBCA16 protein in testis extracts, even in those corresponding to the maturation stage with the highest levels of Tbca16 transcripts. These puzzling results led us to re-analyze Tbca16 expression, and we found that Tbca16 transcription produces both sense and natural antisense transcripts. Strikingly, specific RNAi depletion of these transcripts increases Tbca13 transcript levels in a mouse spermatocyte cell line. Conclusions/Significance: Our results demonstrate that Tbca13 mRNA levels are post-transcriptionally regulated by the levels of the sense and natural antisense Tbca16 transcripts. We propose that this regulatory mechanism operates during spermatogenesis, a process that involves microtubule rearrangements and the assembly of specific microtubule structures, and that requires critical TBCA levels.

Relevance: 80.00%

Abstract:

The conjugation of antigens with ligands of pattern recognition receptors (PRR) is emerging as a promising strategy for modulating specific immunity. Here, we describe a new Escherichia coli system for the cloning and expression of heterologous antigens fused to the OprI lipoprotein, a TLR ligand from the Pseudomonas aeruginosa outer membrane (OM). Analysis of the OprI expressed by this system reveals a triacylated lipid moiety composed mainly of palmitic acid residues. By offering tight regulation of expression and allowing antigen purification by metal affinity chromatography, the new system circumvents the major drawbacks of previous versions. In addition, the anchoring of OprI to the OM of the host cell is further exploited to produce novel recombinant formulations derived from the bacterial cell wall (OM fragments and OM vesicles) with distinct potential for PRR activation. As an example, the African swine fever virus ORF A104R was cloned and the recombinant antigen was obtained in all three formulations. Overall, our results validate a new system suitable for producing immunogenic formulations that can be used to develop experimental vaccines and to study the modulation of acquired immunity.

Relevance: 80.00%

Abstract:

This paper presents a Multi-Agent Market simulator designed for developing new agent market strategies based on a complete understanding of buyer and seller behaviors, preference models and pricing algorithms, considering user risk preferences and game theory for scenario analysis. The tool studies negotiations based on different market mechanisms and on time- and behavior-dependent strategies. The results of the negotiations between agents are analyzed by data mining algorithms to extract rules that give agents feedback for improving their strategies. The system also includes agents that can improve their performance through their own experience, by adapting to market conditions, and that can take other agents' reactions into account.
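The time-dependent strategies mentioned above can be illustrated with the classic time-dependent concession tactic from the automated-negotiation literature, swapped in here as an example rather than taken from the simulator. In this sketch each agent concedes a fraction of its price range that grows with elapsed time, and the exponent `beta` controls how early it concedes. All function names, prices and parameters are hypothetical.

```python
def concession(t, deadline, beta, k=0.0):
    """Fraction of the price range conceded at time t (k at the start, 1 at the deadline).

    beta < 1 gives a 'Boulware' agent that holds out and concedes late;
    beta > 1 gives a 'Conceder' agent that concedes early.
    """
    frac = min(t, deadline) / deadline
    return k + (1.0 - k) * frac ** (1.0 / beta)


def seller_offer(t, deadline, p_min, p_max, beta):
    # The seller starts at its highest price and moves down toward its reservation price p_min.
    return p_max - concession(t, deadline, beta) * (p_max - p_min)


def buyer_offer(t, deadline, p_min, p_max, beta):
    # The buyer starts at its lowest price and moves up toward its reservation price p_max.
    return p_min + concession(t, deadline, beta) * (p_max - p_min)


def negotiate(deadline=10, seller=(40.0, 60.0, 0.5), buyer=(30.0, 55.0, 2.0)):
    """Both agents post an offer every round; a deal is struck once the bid meets the ask.

    seller and buyer are hypothetical (p_min, p_max, beta) tuples.
    Returns (round, price) or None if no agreement is reached by the deadline.
    """
    for t in range(deadline + 1):
        ask = seller_offer(t, deadline, *seller)
        bid = buyer_offer(t, deadline, *buyer)
        if bid >= ask:
            return t, (ask + bid) / 2  # split the difference
    return None
```

With these defaults the Boulware seller and the Conceder buyer meet in round 7, at a price between their reservation values of 40 and 55. Logging the `beta` values and outcomes of many such runs is the kind of negotiation record that rule-extraction algorithms could then mine.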

Relevance: 80.00%

Abstract:

This paper presents five different clustering methods for identifying typical load profiles of medium voltage (MV) electricity consumers. These methods are intended for use in a smart grid environment to extract useful knowledge about customers' behaviour. The knowledge obtained can support a decision tool, not only for utilities but also for consumers. Utilities can use load profiles to identify the factors that cause system load peaks and to develop specific contracts with their customers. The framework presented in the paper consists of several steps: data pre-processing, application of the clustering algorithms, and evaluation of the quality of the resulting partition, supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
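A minimal sketch of the pipeline described above (pre-processing, clustering, validity assessment) follows. It assumes k-means as the clustering method and the Davies-Bouldin index as the validity index. These are common choices, not necessarily among the five methods or the indices used in the paper. Profiles are peak-normalized so that clusters reflect the shape of consumption rather than its size, and the four-point profiles are made-up data.

```python
import math

# Made-up 4-point daily load profiles (kW); real profiles would have e.g. 96 quarter-hours.
RAW_PROFILES = [
    [10, 20, 80, 40],  # daytime peak
    [70, 30, 20, 90],  # night/evening peak
    [12, 22, 85, 38],  # daytime peak
    [65, 28, 22, 95],  # night/evening peak
]


def normalize(profile):
    # Pre-processing: divide by the peak so every profile lies in [0, 1].
    peak = max(profile)
    return [v / peak for v in profile]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data, k, iters=100):
    # Simplified deterministic initialization: the first k profiles are the seeds.
    centroids = [list(p) for p in data[:k]]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in data]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new = [mean([p for p, lab in zip(data, labels) if lab == c]) if c in labels
               else centroids[c] for c in range(k)]
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def davies_bouldin(data, labels, centroids):
    """Davies-Bouldin validity index (needs k >= 2 distinct centroids); lower is better."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        scatter.append(sum(dist(p, centroids[c]) for p in members) / max(len(members), 1))
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


if __name__ == "__main__":
    data = [normalize(p) for p in RAW_PROFILES]
    labels, centroids = kmeans(data, k=2)
    print(labels, round(davies_bouldin(data, labels, centroids), 3))
```

In practice one would run several algorithms and several values of `k` and keep the partition with the best validity scores. The centroids of the chosen partition are the representative load profiles.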

Relevance: 80.00%

Abstract:

This work proposes the design of an intelligent system to create and manage Electric Vehicle (EV) charging procedures. Because of the limitations of the electrical power distribution network and the absence of smart meters, EV charging should be performed in a balanced way, taking into account past experience, weather information based on data mining, and simulation approaches. To allow information exchange and to support user mobility, a mobile application was also created to assist the EV driver in these processes. The proposed Smart Electric Vehicle Charging System uses Vehicle-to-Grid (V2G) technology to connect EVs, as well as renewable energy sources, to Smart Grids (SG). The system also explores the new paradigm of Electrical Markets (EM), with the deregulation of electricity production and use, in order to obtain the best conditions for trading electrical energy.
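One way to charge "in a balanced way" under a network limit is max-min fair (water-filling) sharing of the spare capacity among connected EVs. The sketch below assumes the spare capacity has already been estimated, for instance as a hypothetical transformer limit minus a forecast base load derived from past data and weather. The function name, vehicle ids and power values are illustrative, not the system's actual algorithm.

```python
def allocate_charging(capacity_kw, requests_kw):
    """Share spare network capacity among EVs with max-min fairness (water-filling).

    requests_kw maps a (hypothetical) EV id to the maximum power in kW it can draw now.
    No EV receives more than it requests, and capacity left unused by small
    requests is redistributed among the remaining EVs.
    """
    alloc = {ev: 0.0 for ev in requests_kw}
    pending = sorted(requests_kw, key=requests_kw.get)  # smallest request first
    remaining = max(capacity_kw, 0.0)
    while pending:
        share = remaining / len(pending)
        smallest = pending[0]
        if requests_kw[smallest] <= share:
            # Fully serve the smallest request; its unused share returns to the pool.
            alloc[smallest] = requests_kw[smallest]
            remaining -= requests_kw[smallest]
            pending.pop(0)
        else:
            # Every remaining EV wants more than an equal share: split evenly.
            for ev in pending:
                alloc[ev] = share
            break
    return alloc


if __name__ == "__main__":
    # Hypothetical: 20 kW of spare capacity, three EVs with different charger ratings.
    print(allocate_charging(20.0, {"EV1": 3.7, "EV2": 11.0, "EV3": 7.4}))
```

In this example EV1 and EV3 are served in full and EV2 receives the remaining 8.9 kW, so the total never exceeds the 20 kW limit. Re-running the allocation every interval as forecasts change keeps the load balanced over time.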

Relevance: 80.00%

Abstract:

With the liberalization of the electricity market, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their customers. In this environment, all consumers are free to choose their electricity supplier. Good insight into customers' behaviour allows specific contract aspects to be defined according to the different consumption patterns. In this paper, Data Mining (DM) techniques are applied to electricity consumption data from a utility's client database. To form the different customer classes and find a set of representative consumption patterns, we used the Two-Step algorithm, a hierarchical clustering algorithm. Each consumer class is represented by the load profile resulting from the clustering. Next, to characterize each consumer class, a classification model is built with the C5.0 classification algorithm.
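C5.0, like its predecessor C4.5, is generally described as choosing splits by the information gain ratio. The sketch below computes that criterion to rank candidate attributes for characterizing the consumer classes produced by clustering. The attribute names (`load_factor`, `night_share`) and the four records are hypothetical. This covers only the split criterion, not C5.0 itself, which adds pruning, boosting and other refinements.

```python
import math
from collections import Counter

# Hypothetical records: consumer class (from clustering) plus discretized attributes.
ROWS = [
    {"load_factor": "high", "night_share": "low", "cls": "A"},
    {"load_factor": "high", "night_share": "high", "cls": "A"},
    {"load_factor": "low", "night_share": "low", "cls": "B"},
    {"load_factor": "low", "night_share": "high", "cls": "B"},
]


def entropy(values):
    """Shannon entropy (bits) of a non-empty list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """Information gain from splitting on `attr`, normalized by the split information."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attrs, target):
    # The attribute a C4.5/C5.0-style tree would place at this node.
    return max(attrs, key=lambda a: gain_ratio(rows, a, target))
```

Here `load_factor` separates the two classes perfectly (gain ratio 1.0), while `night_share` carries no information about them (0.0). A tree grown this way yields readable rules that describe each consumer class.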

Relevance: 80.00%

Abstract:

We describe a novel approach to exploring DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article begins by analyzing chromosomal data through histograms of fixed-length DNA words. After the DNA histograms are created, the correlation between each pair of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and for generating tabular and graphical output. A set of 18 species is processed, and the extensive results show that the proposed method generates significant and diverse outputs, in good agreement with current scientific knowledge in domains such as genomics and phylogenetics.
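The first two steps, histograms of fixed-length words followed by a correlation matrix between histogram pairs, can be sketched as follows. The word length `k`, the use of the Pearson correlation coefficient, and the toy sequences are assumptions made for illustration. The abstract does not specify which correlation measure the article uses.

```python
import math
from itertools import product


def kmer_histogram(seq, k):
    """Counts of every length-k DNA word over {A, C, G, T}, in lexicographic order."""
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k].upper()
        if w in index:  # words containing N or other symbols are skipped
            counts[index[w]] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either histogram is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(seqs, k):
    """Global matrix of pairwise correlations between the sequences' k-mer histograms."""
    hists = [kmer_histogram(s, k) for s in seqs]
    return [[pearson(h1, h2) for h2 in hists] for h1 in hists]


if __name__ == "__main__":
    toy = ["ACGTACGTAA", "ACGTTTGCAA", "GGGCCCAAAT"]  # stand-ins for chromosomes
    for row in correlation_matrix(toy, 2):
        print([round(v, 3) for v in row])
```

The resulting symmetric matrix, with ones on the diagonal, is the kind of input that downstream methods such as clustering or tree building can turn into species-level groupings.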

Relevance: 80.00%

Abstract:

Electricity markets are complex environments with very particular characteristics. MASCEM is a market simulator developed to allow in-depth studies of the interactions between the players that take part in electricity market negotiations. This paper presents a new proposal for defining the strategies MASCEM players use to negotiate in the market. The proposed methodology is multiagent based and uses reinforcement learning algorithms to give players the ability to perceive changes in the environment and to adapt their bid formulation to their needs, using a set of different techniques at their disposal.
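The abstract does not detail which reinforcement learning algorithms MASCEM uses. As a plainly named stand-in, the sketch below uses an epsilon-greedy bandit: a player keeps a running average reward for each available bidding technique, usually picks the best one, and occasionally explores the others. The strategy names, reward values and class name are hypothetical.

```python
import random


class StrategySelector:
    """Epsilon-greedy selection among bidding strategies (an illustrative RL scheme).

    Each strategy's value is the running mean of the rewards (e.g. profit) it has
    earned; with probability epsilon the player explores a random strategy.
    """

    def __init__(self, strategies, epsilon=0.1, seed=0):
        self.strategies = list(strategies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {s: 0.0 for s in self.strategies}
        self.count = {s: 0 for s in self.strategies}

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)      # explore
        return max(self.strategies, key=self.value.get)  # exploit

    def update(self, strategy, reward):
        # Incremental mean: V <- V + (r - V) / n
        self.count[strategy] += 1
        self.value[strategy] += (reward - self.value[strategy]) / self.count[strategy]


def simulate(rounds=2000, seed=1):
    # Hypothetical strategies with fixed mean rewards plus bounded market noise.
    mean_reward = {"cost_based": 1.0, "conservative": 2.0, "trend_following": 3.0}
    market = random.Random(seed)
    selector = StrategySelector(mean_reward, epsilon=0.1)
    for _ in range(rounds):
        s = selector.choose()
        selector.update(s, mean_reward[s] + market.uniform(-0.4, 0.4))
    return selector
```

After enough rounds the player settles on the strategy with the highest average reward while still sampling the others. If market conditions shift, the occasional exploration lets it notice the change.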

Relevance: 80.00%

Abstract:

The growing importance and influence of new resources connected to power systems has caused many changes in their operation. Environmental policies and several well-known advantages have led to the wide dissemination of renewable energy resources. These resources, including Distributed Generation (DG), are being connected at lower voltage levels, where Demand Response (DR) must also be considered. These changes increase the complexity of system operation, due both to new operational constraints and to the amount of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. Addressing these issues, this paper proposes a methodology to support VPP actions when they act as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined by applying data mining techniques to a database built from a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios, considering a diversity of distributed resources in a 33-bus distribution network.
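The abstract does not say which data mining technique estimates the assured DR capacity. As a deliberately simple statistical stand-in, not the paper's method, the sketch below groups simulated scenarios by a hypothetical operating condition. It then takes a low quantile of the available DR capacity in each group, so that the committed amount is met in most scenarios.

```python
import math
from collections import defaultdict


def quantile(values, q):
    """Empirical q-quantile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def assured_capacity(scenarios, q=0.1):
    """Conservative DR capacity (kW) per operating condition.

    scenarios: iterable of (condition, dr_capacity_kw) pairs, where `condition`
    is a hypothetical discretized label such as ("peak", "sunny"). Returning the
    q-quantile means roughly a fraction (1 - q) of the simulated scenarios in that
    condition offered at least this much capacity.
    """
    groups = defaultdict(list)
    for condition, capacity in scenarios:
        groups[condition].append(capacity)
    return {c: quantile(v, q) for c, v in groups.items()}
```

Lowering `q` makes the CSP's commitment safer but smaller. A richer data mining approach could learn the grouping itself (for example with clustering or decision trees) instead of relying on hand-made condition labels.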

Relevance: 80.00%

Abstract:

This paper proposes a Virtual Producer/Consumer Agent (VPCA) to optimize the integrated management of distributed energy resources and to improve and control Demand Side Management (DSM) and its aggregated loads. The paper presents the VPCA architecture and the proposed function-based organization used to coordinate the various generation technologies, load types and storage systems. This VPCA organization uses a framework based on data mining techniques to characterize customers. The paper includes results from several experimental test cases using real data, taking into account both electricity generation resources and consumption data.

Relevance: 80.00%

Abstract:

Proteins are biochemical entities consisting of one or more blocks, typically folded into a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence of a protein is defined by the sequence of one or more genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms it can also include two others. After the amino acids are linked during protein synthesis, each becomes a residue in the protein, which is then chemically modified, ultimately changing and defining the protein's function. In this study, the authors analyze amino acid sequences using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome without any prior assumptions. The paper begins by analyzing amino acid sequence data through histograms of fixed-length amino acid words (tuples). The initial relative frequency histograms are then transformed and processed to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and the results show that the proposed method generates relevant outputs in accordance with current scientific knowledge in domains such as protein sequence and proteome analysis.
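The first step, relative frequency histograms of fixed-length amino acid words, can be sketched as below. Restricting counts to the 20 standard residues and comparing histograms with a Euclidean distance are assumptions for illustration. The abstract does not specify how the histograms are transformed afterwards.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def tuple_frequencies(seq, k):
    """Relative frequency of each length-k amino acid word (tuple) in `seq`.

    Words containing non-standard symbols (e.g. X, or U/O for selenocysteine and
    pyrrolysine) are skipped, an illustrative simplification.
    """
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in words if set(w) <= AMINO_ACIDS)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def euclidean(f, g):
    """Distance between two sparse histograms; absent words count as frequency 0."""
    keys = set(f) | set(g)
    return math.sqrt(sum((f.get(w, 0.0) - g.get(w, 0.0)) ** 2 for w in keys))


if __name__ == "__main__":
    a = tuple_frequencies("MKVLAAGMKV", 2)  # toy sequences, not real proteins
    b = tuple_frequencies("MKVLSAGMRV", 2)
    print(round(euclidean(a, b), 4))
```

Storing histograms as sparse dictionaries avoids allocating all 20^k possible words, which matters once `k` grows beyond 2 or 3.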

Relevance: 80.00%

Abstract:

Many current e-commerce systems personalize the content they show to users. In this sense, recommender systems make personalized suggestions and provide information about the items available in the system. Nowadays, a vast number of methods, including data mining techniques, can be employed for personalization in recommender systems. However, these methods are still quite vulnerable to some limitations and shortcomings related to the recommender environment. To deal with some of them, in this work we implement a recommendation methodology in a recommender system for tourism, where classification based on association is applied. Classification based on association methods, also called associative classification methods, are an alternative data mining technique that combines concepts from classification and association so that association rules can be employed in a prediction context. The proposed methodology was evaluated in several case studies, which showed that it can mitigate limitations of recommender systems and enhance recommendation quality.
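Associative classification can be illustrated with a compact CBA-style sketch. It mines rules of the form itemset → class that meet minimum support and confidence, ranks them by confidence, then support, then length, and classifies a new user with the first rule whose antecedent matches. The tourism features, labels and thresholds are hypothetical. Candidate itemsets are enumerated by brute force rather than with Apriori, and this is not the paper's exact methodology.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(transactions, min_support=0.2, min_confidence=0.6, max_len=2):
    """Mine class association rules (itemset -> class), CBA-style.

    transactions: list of (items: frozenset of str, label: str).
    Returns rules as (antecedent, label, support, confidence), best first.
    """
    n = len(transactions)
    ante_count, rule_count = Counter(), Counter()
    for items, label in transactions:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                a = frozenset(combo)
                ante_count[a] += 1
                rule_count[(a, label)] += 1
    rules = []
    for (a, label), c in rule_count.items():
        support, confidence = c / n, c / ante_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, label, support, confidence))
    # CBA-style precedence: higher confidence, then higher support, then shorter rule.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, items, default):
    # Predict with the highest-ranked rule whose antecedent the user satisfies.
    for antecedent, label, _, _ in rules:
        if antecedent <= items:
            return label
    return default


# Hypothetical training data: user profile features -> preferred attraction type.
TRAIN = [
    (frozenset({"family", "summer"}), "beach"),
    (frozenset({"family", "winter"}), "museum"),
    (frozenset({"couple", "summer"}), "beach"),
    (frozenset({"couple", "winter"}), "museum"),
    (frozenset({"solo", "summer"}), "hiking"),
]
```

Because every prediction traces back to a readable rule, associative classifiers can also explain a recommendation to the user. The default class covers users that no mined rule matches.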

Relevance: 80.00%

Abstract:

This paper presents an integrated system that helps both retail companies and electricity consumers define the best retail contracts and tariffs. The integrated system consists of a Decision Support System (DSS) based on a Consumer Characterization Framework (CCF). The CCF is based on data mining techniques applied to obtain useful knowledge about electricity consumers from large amounts of consumption data. This knowledge is acquired following an innovative and systematic approach able to identify different consumer classes, each represented by a load profile, and to characterize them using decision trees. The framework generates inputs for the knowledge base and the database of the DSS. The rule sets derived from the decision trees are integrated into the knowledge base of the DSS, while the load profiles, together with information about contracts and electricity prices, form its database. The DSS can classify different consumers, present their load profiles and test different electricity tariffs and contracts. Its final outputs are a comparative economic analysis of different contracts and advice on the most economical contract for each consumer class. The presentation of the DSS is completed with an application example using a real database of consumers from the Portuguese distribution company.
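At its core, the comparative economic analysis can be thought of as pricing a consumer class's load profile under each candidate tariff and picking the cheapest. The sketch below does this for an hourly profile. The `Tariff` structure, the flat and two-period tariffs and their prices are hypothetical and do not reflect any real tariff.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tariff:
    daily_fee: float               # fixed charge per day (currency units)
    price: Callable[[int], float]  # hour of day (0-23) -> energy price per kWh


def daily_cost(profile_kwh: List[float], tariff: Tariff) -> float:
    """Cost of one day of hourly consumption under `tariff`."""
    energy = sum(kwh * tariff.price(hour) for hour, kwh in enumerate(profile_kwh))
    return tariff.daily_fee + energy


def cheapest(profile_kwh: List[float], tariffs: Dict[str, Tariff]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the cheapest tariff and the cost under every tariff."""
    costs = {name: daily_cost(profile_kwh, t) for name, t in tariffs.items()}
    return min(costs, key=costs.get), costs


# Hypothetical tariffs and a night-heavy class profile (kWh per hour, 24 values).
TARIFFS = {
    "flat": Tariff(daily_fee=0.30, price=lambda h: 0.15),
    "two_period": Tariff(daily_fee=0.35, price=lambda h: 0.20 if 8 <= h < 22 else 0.09),
}
NIGHT_PROFILE = [1.0] * 8 + [0.2] * 14 + [1.0] * 2
```

For this night-heavy profile the two-period tariff costs 1.81 per day against 2.22 for the flat tariff, despite its higher fixed fee. That trade-off between fixed and energy charges is exactly what a per-class comparison exposes.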