854 results for data gathering algorithm


Relevance: 30.00%

Abstract:

This paper presents software, developed in the Delphi programming language, to compute a reservoir's annual regulated active storage using the sequent-peak algorithm. Mathematical models used for this purpose generally require long hydrological series, and the analysis of those series is usually performed with spreadsheets or graphical representations. Motivated by this, software was developed to calculate reservoir active capacity. An example calculation is shown using 30 years (1977 to 2009) of mean monthly flow data from the Corrente River, in the São Francisco River Basin, Brazil. As an additional tool, an interface was developed to support water resources management, helping to manipulate the data and to highlight information of interest to the user. With this interface, irrigation districts where water consumption is higher can be analyzed as a function of specific seasonal water demand scenarios. A practical application shows that the program performs the calculation as originally proposed. It was designed to keep information organized and retrievable at any time and to simulate seasonal water demands throughout the year, contributing to studies of reservoir projects. With this functionality, the program is an important tool for decision making in water resources management.
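
As an illustration of the storage computation described above, a minimal sketch of the sequent-peak recursion follows (the original software was written in Delphi; the inflow values and constant-demand assumption here are hypothetical):

```python
def sequent_peak_storage(inflows, demand):
    """Active storage needed to meet a constant monthly demand.

    inflows: sequence of mean monthly inflow volumes
    demand:  constant monthly demand volume (same units as inflows)
    """
    deficit = 0.0    # accumulated deficit since the last sequent peak
    required = 0.0   # largest deficit observed, i.e. the required storage
    for q in inflows:
        deficit = max(0.0, deficit + demand - q)
        required = max(required, deficit)
    return required

# Hypothetical monthly inflows for one year; real use would feed the
# 30-year Corrente River series month by month.
flows = [820, 510, 440, 300, 250, 180, 160, 190, 240, 380, 560, 700]
print(sequent_peak_storage(flows, demand=400))
```

In practice the record is often cycled twice so that a critical period spanning the end of the series is not missed.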

Relevance: 30.00%

Abstract:

ABSTRACT: The Kalman-Bucy method is analyzed here and applied to the solution of a specific filtering problem to increase the signal-to-noise ratio of the message. The method is a time-domain treatment of a geophysical process classified as stochastic and non-stationary. The derivation of the estimator is based on the relationship between the Kalman-Bucy and Wiener approaches for linear systems. In the present work we emphasize the criterion used, the model with a priori information, the algorithm, and the quality of the results. The examples are for the ideal well-log response, and the results indicate that this method can be used on a variety of geophysical data treatments; its study offers proper insight into the modeling and processing of geophysical problems.
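
The recursion underlying such an estimator can be illustrated with a discrete-time scalar Kalman filter. The sketch below is not the paper's continuous-time Kalman-Bucy derivation; the random-walk state model and the noise variances are assumptions chosen only for illustration:

```python
import numpy as np

def kalman_filter(measurements, q=1e-4, r=0.05, x0=0.0, p0=1.0):
    """Scalar filter for x_k = x_{k-1} + w (variance q), z_k = x_k + v (variance r)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                  # predict: propagate the error covariance
        k = p / (p + r)            # Kalman gain weighs model vs. measurement
        x = x + k * (z - x)        # update the estimate with the innovation
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)
```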

Relevance: 30.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 30.00%

Abstract:

We present an implementation of the F-statistic to carry out the first search in data from the Virgo laser interferometric gravitational-wave detector for periodic gravitational waves from a priori unknown, isolated rotating neutron stars. We searched a frequency range f0 from 100 Hz to 1 kHz and a frequency-dependent spindown range f1 from -1.6(f0/100 Hz) x 10^-9 Hz s^-1 to zero. A large part of this frequency-spindown space was unexplored by any of the all-sky searches published so far. Our method consisted of a coherent search over two-day periods using the F-statistic, followed by a search for coincidences among the candidates from the two-day segments. We introduced a number of novel techniques and algorithms that allow the use of the fast Fourier transform (FFT) in the coherent part of the search, resulting in a fifty-fold speed-up in the computation of the F-statistic with respect to the algorithm used in other pipelines. No significant gravitational-wave signal was found. The sensitivity of the search was estimated by injecting signals into the data. In the most sensitive parts of the detector band, more than 90% of signals with dimensionless gravitational-wave amplitude greater than 5 x 10^-24 would have been detected.
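
The FFT-based speed-up comes from evaluating the detection statistic over all frequency bins of a segment at once. The snippet below is only a generic illustration of that idea, a windowed periodogram over the searched band, and not the Virgo pipeline's F-statistic, which also demodulates for Doppler modulation and spindown and combines the two polarizations:

```python
import numpy as np

def band_periodogram(x, fs, f_lo=100.0, f_hi=1000.0):
    """Power in every frequency bin of the searched band from a single FFT."""
    spectrum = np.fft.rfft(x * np.hanning(len(x)))   # windowed FFT of the segment
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band], np.abs(spectrum[band]) ** 2
```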

Relevance: 30.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance: 30.00%

Abstract:

Concept drift, which refers to learning problems that are non-stationary over time, has increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. But labeling data is usually expensive and/or time consuming compared to acquiring unlabeled data, so usually only a small fraction of the incoming data can be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them assume that the data are static. Therefore, semi-supervised learning with concept drift is still an open and challenging task in machine learning. Recently, a particle competition and cooperation approach was developed to perform graph-based semi-supervised learning on static data. We have extended that approach to handle data streams and concept drift. The result is a passive algorithm using a single classifier, which naturally adapts to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data and gradually "forgetting" older knowledge as older data items become no longer useful for classifying newer ones. The proposed algorithm is applied to the KDD Cup 1999 network intrusion data, showing its effectiveness.
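
The sketch below is not the particle competition and cooperation model itself; it only illustrates, with an assumed class-centroid representation and decay parameter, the passive adaptation idea described above: every incoming item keeps updating the model while an exponential decay gradually "forgets" older data.

```python
import numpy as np

class ForgettingCentroidClassifier:
    """Toy stream classifier that adapts passively via exponential forgetting."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha      # forgetting rate: higher values adapt faster
        self.centroids = {}     # label -> running class centroid

    def predict(self, x):
        if not self.centroids:
            return None
        return min(self.centroids,
                   key=lambda c: np.linalg.norm(x - self.centroids[c]))

    def partial_fit(self, x, label=None):
        # Unlabeled items reinforce their nearest centroid, a crude stand-in
        # for the label spreading performed by the particles on the graph.
        x = np.asarray(x, dtype=float)
        label = label if label is not None else self.predict(x)
        if label is None:
            return
        c = self.centroids.get(label, x)
        self.centroids[label] = (1 - self.alpha) * c + self.alpha * x
```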

Relevance: 30.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 30.00%

Abstract:

This project aims to develop methods for data classification in a data warehouse for decision-making purposes. Another goal is the reduction of an attribute set in the data warehouse, such that the reduced set keeps the same properties as the original one. With a reduced set, the computational cost of processing is smaller, attributes that are not relevant to certain kinds of situations can be identified, and patterns can be recognized in the database to support decisions. To achieve these objectives, the Rough Sets algorithm will be implemented. We chose PostgreSQL as the database management system because it is efficient, well consolidated, and open source (freely distributed).
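
A minimal sketch of the attribute-reduction step follows, assuming a consistent decision table held in memory rather than queried from PostgreSQL: a subset of attributes is accepted when grouping the rows by those attributes never mixes different decision values.

```python
from itertools import combinations

def is_consistent(rows, attrs, decision):
    """True if rows that agree on `attrs` always agree on the decision attribute."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, set()).add(row[decision])
    return all(len(values) == 1 for values in groups.values())

def smallest_reduct(rows, attrs, decision):
    """Brute-force search for the smallest attribute subset that stays consistent."""
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            if is_consistent(rows, subset, decision):
                return subset
    return tuple(attrs)
```

An exhaustive search like this is exponential in the number of attributes; practical Rough Sets implementations rely on heuristics or on the discernibility matrix instead.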

Relevance: 30.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance: 30.00%

Abstract:

Visually representing the external appearance of an extinct animal requires, for a reasonably reliable and expressive reconstruction, a good compilation and arrangement of the scientific conclusions on the fossil findings. This work proposes an initial model of a briefing to be applied in the paleodesign process for a paleovertebrate. A briefing can be understood as the gathering of all data necessary to carry out a project. We point out what must be known about the relevant structures in order to access all the data, and the importance of such information. The briefing suggested here should be treated with flexibility, serving as an interface that facilitates the relationship between paleoartists and paleontologists.

Relevance: 30.00%

Abstract:

In this paper, we propose a Loss-Tolerant Reliable (LTR) data transport mechanism for dynamic Event Sensing (LTRES) in wireless sensor networks (WSNs). In LTRES, a reliable event-sensing requirement at the transport layer is dynamically determined by the sink. A distributed source rate adaptation mechanism is designed, incorporating a lightweight, loss-rate-based congestion control mechanism, to regulate the data traffic injected into the network so that the reliability requirement is satisfied. An equation-based fair rate control algorithm is used to improve fairness among the LTRES flows sharing a congested path. The performance evaluations show that LTRES can provide LTR data transport service for multiple events with short convergence time, low loss rate, and high overall bandwidth utilization.
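
As a hedged sketch of what an equation-based fair rate controller computes, the snippet below uses the generic TCP-friendly response function; it is not necessarily the exact equation used by LTRES, and the parameter names are illustrative:

```python
import math

def tcp_friendly_rate(packet_size, rtt, loss_rate, rto=None):
    """Sending rate (bytes/s) from the TCP throughput equation."""
    if loss_rate <= 0:
        return float("inf")        # no observed loss: the equation places no limit
    rto = rto if rto is not None else 4 * rtt
    denom = (rtt * math.sqrt(2 * loss_rate / 3)
             + rto * min(1.0, 3 * math.sqrt(3 * loss_rate / 8))
               * loss_rate * (1 + 32 * loss_rate ** 2))
    return packet_size / denom
```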

Relevance: 30.00%

Abstract:

Active machine learning algorithms are used when large numbers of unlabeled examples are available and obtaining labels for them is costly (e.g., requiring consultation with a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refinement of the decision boundary by dynamically adjusting the probability of exploring at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance to noise.
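
A minimal sketch of the exploration/refinement balance is given below. The probability explore_p is a free parameter here, whereas the proposed algorithm adjusts it dynamically at each step; the model is assumed to expose a scikit-learn style decision_function:

```python
import numpy as np

def choose_query(model, X_unlabeled, explore_p, rng=None):
    """Pick the index of the next unlabeled example to send to the expert."""
    rng = rng or np.random.default_rng()
    if rng.random() < explore_p:
        # Explore: sample a region the current hypothesis may be wrong about.
        return int(rng.integers(len(X_unlabeled)))
    # Refine: query the point closest to the current decision boundary.
    margins = np.abs(model.decision_function(X_unlabeled))
    return int(np.argmin(margins))
```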

Relevance: 30.00%

Abstract:

Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage day by day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing these data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when storage space is required. CMS data are stored in HDFS (Hadoop Distributed File System), and HDFS logs give information about file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time taken to classify data sets with this method depends on the size of the input HDFS log file, since the Hadoop MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion along with their respective sizes. The methodology was also compared with a heuristic called retention cost, calculated from the size of a data set and the time since its last access, which helps decide how useful a data set is. The accuracies of both were compared by calculating the percentage of data sets predicted for deletion that were accessed at a later time. Our SVM-based methodology proved more accurate than the retention cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
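
A hedged sketch of the two approaches being compared is shown below, using scikit-learn's SVM rather than the Hadoop/MapReduce pipeline; the per-data-set features and the retention-cost formula shown are assumptions for illustration:

```python
from sklearn import svm

def train_deletion_classifier(features, labels):
    """features: rows such as [size_gb, days_since_last_access, accesses_90d]
    labels: 1 if the data set turned out to be safe to delete, 0 otherwise."""
    clf = svm.SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(features, labels)
    return clf

def retention_cost(size_gb, days_since_last_access):
    # Heuristic baseline: larger, longer-untouched data sets cost more to keep.
    return size_gb * days_since_last_access
```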

Relevance: 30.00%

Abstract:

Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than for single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of the network interact with each other and keep their functions stable. In general, however, there are not enough data to accurately recover GNs from expression levels alone, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only expression profiles in the inference process. The use of additional sources of biological information in inference methods has increased significantly, in order to better recover the connections between genes and to reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes when the expression data are observed. Although several works on data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology for including biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG, and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs for the eukaryote P. falciparum. Our results indicate that information based on direct interaction yields a higher gain than data about a less specific relationship, such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only; the results indicate that the use of biological information can improve the identification of the most connected proteins.
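
Illustrative only, and not the inference algorithm evaluated in the paper: the sketch below scores candidate edges by absolute expression correlation and boosts pairs supported by prior evidence such as protein-protein interactions, showing one simple way prior biological information can enter the inference.

```python
import numpy as np

def score_edges(expression, prior_pairs, boost=0.2):
    """expression: genes x samples array; prior_pairs: set of (i, j) gene index pairs."""
    corr = np.abs(np.corrcoef(expression))   # gene-gene correlation matrix
    n = corr.shape[0]
    scores = {}
    for i in range(n):
        for j in range(i + 1, n):
            s = corr[i, j]
            if (i, j) in prior_pairs or (j, i) in prior_pairs:
                s = min(1.0, s + boost)      # prior evidence raises the edge score
            scores[(i, j)] = s
    return scores
```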

Relevance: 30.00%

Abstract:

Background: This paper addresses the prediction of the free energy of binding of a drug candidate to the enzyme InhA, associated with Mycobacterium tuberculosis. The problem arises in rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design applications, especially because decision trees are simple to understand, interpret, and validate. Several decision-tree induction algorithms are available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide a biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate to a flexible receptor.
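
As a toy stand-in for the idea of tailoring a tree-induction algorithm to one data set (the paper searches a much richer space of algorithm components than the two hyper-parameters varied here), the following sketch picks the split criterion and depth limit that cross-validate best on the docking data at hand:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def tailor_tree(X, y):
    """Return the decision-tree variant that performs best on this data set."""
    best, best_score = None, -1.0
    for criterion in ("gini", "entropy"):
        for max_depth in (3, 5, None):
            tree = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth)
            score = cross_val_score(tree, X, y, cv=5).mean()
            if score > best_score:
                best, best_score = tree, score
    return best.fit(X, y)
```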