60 resultados para data gathering algorithm
Resumo:
Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was than used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
Resumo:
The multilayer perceptron network has become one of the most used in the solution of a wide variety of problems. The training process is based on the supervised method where the inputs are presented to the neural network and the output is compared with a desired value. However, the algorithm presents convergence problems when the desired output of the network has small slope in the discrete time samples or the output is a quasi-constant value. The proposal of this paper is presenting an alternative approach to solve this convergence problem with a pre-conditioning method of the desired output data set before the training process and a post-conditioning when the generalization results are obtained. Simulations results are presented in order to validate the proposed approach.
Resumo:
A low-cost computer procedure to determine the orbit of an artificial satellite by using short arc data from an onboard GPS receiver is proposed. Pseudoranges are used as measurements to estimate the orbit via recursive least squares method. The algorithm applies orthogonal Givens rotations for solving recursive and sequential orbit determination problems. To assess the procedure, it was applied to the TOPEX/POSEIDON satellite for data batches of one orbital period (approximately two hours), and force modelling, due to the full JGM-2 gravity field model, was considered. When compared with the reference Precision Orbit Ephemeris (POE) of JPL/NASA, the results have indicated that precision better than 9 m is easily obtained, even when short batches of data are used. Copyright (c) 2007.
Resumo:
A total of 2400 samples of commercial Brazilian C gasoline were collected over a 6-month period from different gas stations in the São Paulo state, Brazil, and analysed with respect to 12 physicochemical parameters according to regulation 309 of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP). The percentages (v/v) of hydrocarbons (olefins, aromatics and saturated) were also determined. Hierarchical cluster analysis (HCA) was employed to select 150 representative samples that exhibited least similarity on the basis of their physicochemical parameters and hydrocarbon compositions. The chromatographic profiles of the selected samples were measured by gas chromatography with flame ionisation detection and analysed using soft independent modelling of class analogy (SIMCA) method in order to create a classification scheme to identify conform gasolines according to ANP 309 regulation. Following the optimisation of the SIMCA algorithm, it was possible to classify correctly 96% of the commercial gasoline samples present in the training set of 100. In order to check the quality of the model, an external group of 50 gasoline samples (the prediction set) were analysed and the developed SIMCA model classified 94% of these correctly. The developed chemometric method is recommended for screening commercial gasoline quality and detection of potential adulteration. (c) 2007 Elsevier B.V. All rights reserved.
Resumo:
The digital elevation model is important to determine the slope and land use capability, therefore, a proposal of methodology for acquisition of elevation data contemplating an efficient algorithm to generate a slope map was developed. Thus, it was aimed to obtain and evaluate a digital elevation model without the vetorization of the contours on planialtimetric charts. The area for acquisition of elevation data was Sao Manuel, SP. The data were collected by two methods: level contour vetorization and the gathering of elevation points on the level contour with maximum elevation points. The elevation data were analyzed by geostatistical techniques. Inspite of wide difference in the number of collected points between two methods, the variograms were adjusted to the exponential model and showed a range of approximately 1500 m, which does not justify the wide difficulty in vetorization of the planialtimetric charts, once the data points collected in the area were appropriately distributed, they represented rightly the terrain surface.
Resumo:
Phasor Measurement Units (PMUs) optimized allocation allows control, monitoring and accurate operation of electric power distribution systems, improving reliability and service quality. Good quality and considerable results are obtained for transmission systems using fault location techniques based on voltage measurements. Based on these techniques and performing PMUs optimized allocation it is possible to develop an electric power distribution system fault locator, which provides accurate results. The PMUs allocation problem presents combinatorial features related to devices number that can be allocated, and also probably places for allocation. Tabu search algorithm is the proposed technique to carry out PMUs allocation. This technique applied in a 141 buses real-life distribution urban feeder improved significantly the fault location results. © 2004 IEEE.
Resumo:
Until mid 2006, SCIAMACHY data processors for the operational retrieval of nitrogen dioxide (NO2) column data were based on the historical version 2 of the GOME Data Processor (GDP). On top of known problems inherent to GDP 2, ground-based validations of SCIAMACHY NO2 data revealed issues specific to SCIAMACHY, like a large cloud-dependent offset occurring at Northern latitudes. In 2006, the GDOAS prototype algorithm of the improved GDP version 4 was transferred to the off-line SCIAMACHY Ground Processor (SGP) version 3.0. In parallel, the calibration of SCIAMACHY radiometric data was upgraded. Before operational switch-on of SGP 3.0 and public release of upgraded SCIAMACHY NO2 data, we have investigated the accuracy of the algorithm transfer: (a) by checking the consistency of SGP 3.0 with prototype algorithms; and (b) by comparing SGP 3.0 NO2 data with ground-based observations reported by the WMO/GAW NDACC network of UV-visible DOAS/SAOZ spectrometers. This delta-validation study concludes that SGP 3.0 is a significant improvement with respect to the previous processor IPF 5.04. For three particular SCIAMACHY states, the study reveals unexplained features in the slant columns and air mass factors, although the quantitative impact on SGP 3.0 vertical columns is not significant.
Resumo:
The Brazilian National Institute for Space Research (INPE) is operating the Brazilian Environmental Data Collection System that currently amounts to a user community of around 100 organizations and more than 700 data collection platforms installed in Brazil. This system uses the SCD-1, SCD-2, and CBERS-2 low Earth orbit satellites to accomplish the data collection services. The main system applications are hydrology, meteorology, oceanography, water quality, and others. One of the functionalities offered by this system is the geographic localization of the data collection platforms by using Doppler shifts and a batch estimator based on least-squares technique. There is a growing demand to improve the quality of the geographical location of data collection platforms for animal tracking. This work presents an evaluation of the ionospheric and tropospheric effects on the Brazilian Environmental Data Collection System transmitter geographic location. Some models of the ionosphere and troposphere are presented to simulate their impacts and to evaluate performance of the platform location algorithm. The results of the Doppler shift measurements, using the SCD-2 satellite and the data collection platform (DCP) located in Cuiabá town, are presented and discussed.
Resumo:
The risk for venous thromboembolism (VTE) in medical patients is high, but risk assessment is rarely performed because there is not yet a good method to identify candidates for prophylaxis. Purpose: To perform a systematic review about VTE risk factors (RFs) in hospitalized medical patients and generate recommendations (RECs) for prophylaxis that can be implemented into practice. Data sources: A multidisciplinary group of experts from 12 Brazilian Medical Societies searched MEDLINE, Cochrane, and LILACS. Study selection: Two experts independently classified the evidence for each RF by its scientific quality in a standardized manner. A risk-assessment algorithm was created based on the results of the review. Data synthesis: Several VTE RFs have enough evidence to support RECs for prophylaxis in hospitalized medical patients (eg, increasing age, heart failure, and stroke). Other factors are considered adjuncts of risk (eg, varices, obesity, and infections). According to the algorithm, hospitalized medical patients ≥40 years-old with decreased mobility, and ≥1 RFs should receive chemoprophylaxis with heparin, provided they don't have contraindications. High prophylactic doses of unfractionated heparin or low-molecular-weight-heparin must be administered and maintained for 6-14 days. Conclusions: A multidisciplinary group generated evidence-based RECs and an easy-to-use algorithm to facilitate VTE prophylaxis in medical patients. © 2007 Rocha et al, publisher and licensee Dove Medical Press Ltd.
Resumo:
This paper describes an investigation of the hybrid PSO/ACO algorithm to classify automatically the well drilling operation stages. The method feasibility is demonstrated by its application to real mud-logging dataset. The results are compared with bio-inspired methods, and rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.
Resumo:
A significant set of information stored in different databases around the world, can be shared through peer-topeer databases. With that, is obtained a large base of knowledge, without the need for large investments because they are used existing databases, as well as the infrastructure in place. However, the structural characteristics of peer-topeer, makes complex the process of finding such information. On the other side, these databases are often heterogeneous in their schemas, but semantically similar in their content. A good peer-to-peer databases systems should allow the user access information from databases scattered across the network and receive only the information really relate to your topic of interest. This paper proposes to use ontologies in peer-to-peer database queries to represent the semantics inherent to the data. The main contribution of this work is enable integration between heterogeneous databases, improve the performance of such queries and use the algorithm of optimization Ant Colony to solve the problem of locating information on peer-to-peer networks, which presents an improve of 18% in results. © 2011 IEEE.
Resumo:
In a peer-to-peer network, the nodes interact with each other by sharing resources, services and information. Many applications have been developed using such networks, being a class of such applications are peer-to-peer databases. The peer-to-peer databases systems allow the sharing of unstructured data, being able to integrate data from several sources, without the need of large investments, because they are used existing repositories. However, the high flexibility and dynamicity of networks the network, as well as the absence of a centralized management of information, becomes complex the process of locating information among various participants in the network. In this context, this paper presents original contributions by a proposed architecture for a routing system that uses the Ant Colony algorithm to optimize the search for desired information supported by ontologies to add semantics to shared data, enabling integration among heterogeneous databases and the while seeking to reduce the message traffic on the network without causing losses in the amount of responses, confirmed by the improve of 22.5% in this amount. © 2011 IEEE.
Resumo:
Multi-relational data mining enables pattern mining from multiple tables. The existing multi-relational mining association rules algorithms are not able to process large volumes of data, because the amount of memory required exceeds the amount available. The proposed algorithm MRRadix presents a framework that promotes the optimization of memory usage. It also uses the concept of partitioning to handle large volumes of data. The original contribution of this proposal is enable a superior performance when compared to other related algorithms and moreover successfully concludes the task of mining association rules in large databases, bypass the problem of available memory. One of the tests showed that the MR-Radix presents fourteen times less memory usage than the GFP-growth. © 2011 IEEE.
Resumo:
Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjust the data for the later stages, especially for the stage of data mining. Such problems occur in the instance level and schema, namely, missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms were developed to perform the cleaning step in databases, some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as original contribution an optimization of algorithm for the detection of duplicate tuples in databases through phonetic based on multithreading without the need for trained data, as well as an independent environment of language to be supported for this. © 2011 IEEE.
Resumo:
This paper presents an application to traffic lights control in congested urban traffic, in real time, taking as input the position and route of the vehicles in the involved areas. This data is obtained from the communication between vehicles and infrastructure (V2I). Due to the great complexity of the possible combination of traffic lights and the short time to get a response, Genetic Algorithm was used to optimize this control. According to test results, the application can reduce the number of vehicles in congested areas, even with the entry of vehicles that previously were not being considered in these roads, such as parked vehicles. © 2012 IEEE.