Biblioteca Digital

952 results for Data sets storage

Integrating geo-referenced multiscale and multidisciplinary data for the management of biodiversity in livestock genetic resources

Relevance:

90.00%

Abstract:

In livestock genetic resource conservation, decision making about conservation priorities is based on the simultaneous analysis of several different criteria that may contribute to long-term sustainable breeding conditions, such as genetic and demographic characteristics, environmental conditions, and role of the breed in the local or regional economy. Here we address methods to integrate different data sets and highlight problems related to interdisciplinary comparisons. Data integration is based on the use of geographic coordinates and Geographic Information Systems (GIS). In addition to technical problems related to projection systems, GIS have to face the challenging issue of the non-homogeneous scale of their data sets. We give examples of the successful use of GIS for data integration and examine the risk of obtaining biased results when integrating data sets that have been captured at different scales.

EthoSeq: A tool for phylogenetic analysis and data mining in behavioral sequences

Relevance:

90.00%

Abstract:

This article introduces the software program called EthoSeq, which is designed to extract probabilistic behavioral sequences (tree-generated sequences, or TGSs) from observational data and to prepare a TGS-species matrix for phylogenetic analysis. The program uses Graph Theory algorithms to automatically detect behavioral patterns within the observational sessions. It includes filtering tools to adjust the search procedure to user-specified statistical needs. Preliminary analyses of data sets, such as grooming sequences in birds and foraging tactics in spiders, uncover a large number of TGSs which together yield single phylogenetic trees. An example of the use of the program is our analysis of felid grooming sequences, in which we have obtained 1,386 felid grooming TGSs for seven species, resulting in a single phylogeny. These results show that behavior is definitely useful in phylogenetic analysis. EthoSeq simplifies and automates such analyses, uncovers many of the hidden patterns of long behavioral sequences, and prepares these data for further analysis with standard phylogenetic programs. We hope it will encourage many empirical studies on the evolution of behavior.

Electrical consumers data clustering through optimum-path forest

Relevance:

90.00%

Abstract:

The identification of non-technical losses has been paramount in the last decade. Since the available datasets contain hundreds of legal and illegal profiles, a method is needed to group the data into subprofiles in order to narrow the search for consumers responsible for large-scale fraud. In this context, an electric power company may be interested in examining a specific profile of illegal consumer in more depth. In this paper, we introduce the Optimum-Path Forest (OPF) clustering technique to this task, and we evaluate its behavior on a dataset provided by a Brazilian electric power company with different values of an OPF parameter. © 2011 IEEE.

A semi-automatic method for indirect orientation of aerial images using ground control lines extracted from airborne laser scanner data

Relevance:

90.00%

Abstract:

This paper presents a method for indirect orientation of aerial images using ground control lines extracted from airborne laser scanner (ALS) data. This data integration strategy has shown good potential in the automation of photogrammetric tasks, including the indirect orientation of images. The most important characteristic of the proposed approach is that the exterior orientation parameters (EOP) of a single or multiple images can be automatically computed with a space resection procedure from data derived from different sensors. The suggested method works as follows. First, straight lines are automatically extracted from the digital aerial image (s) and from the intensity image derived from an ALS data set (S). Then, the correspondence between s and S is automatically determined. A line-based coplanarity model that establishes the relationship between straight lines in the object space and in the image space is used to estimate the EOP with iterated extended Kalman filtering (IEKF). Implementation and testing of the method employed data from different sensors. Experiments were conducted to assess the proposed method, and the results showed that the estimation of the EOP is a function of ALS positional accuracy.

Combined search for the standard model Higgs boson decaying to bb̄ using the D0 run II data set

Relevance:

90.00%

Abstract:

We present the results of the combination of searches for the standard model Higgs boson produced in association with a W or Z boson and decaying into bb̄ using the data sample collected with the D0 detector in pp̄ collisions at √s = 1.96 TeV at the Fermilab Tevatron Collider. We derive 95% C.L. upper limits on the Higgs boson cross section relative to the standard model prediction in the mass range 100 GeV ≤ M_H ≤ 150 GeV, and we exclude Higgs bosons with masses smaller than 102 GeV at the 95% C.L. In the mass range 120 GeV ≤ M_H ≤ 145 GeV, the data exhibit an excess above the background prediction with a global significance of 1.5 standard deviations, consistent with the expectation in the presence of a standard model Higgs boson. © 2012 American Physical Society.

Particle competition and cooperation to prevent error propagation from mislabeled data in semi-supervised learning

Relevance:

90.00%

Abstract:

Semi-supervised learning is applied to classification problems where only a small portion of the data items is labeled. In these cases, the reliability of the labels is a crucial factor, because mislabeled items may propagate wrong labels to a large portion or even the entire data set. This paper aims to address this problem by presenting a graph-based (network-based) semi-supervised learning method, specifically designed to handle data sets with mislabeled samples. The method uses teams of walking particles, with competitive and cooperative behavior, for label propagation in the network constructed from the input data set. The proposed model is nature-inspired and incorporates some features to make it robust to a considerable amount of mislabeled data items. Computer simulations show the performance of the method in the presence of different percentages of mislabeled data, in networks of different sizes and average node degrees. Importantly, these simulations reveal the existence of critical points of the mislabeled subset size, below which the network is free of wrong label contamination, but above which the mislabeled samples start to propagate their labels to the rest of the network. Moreover, numerical comparisons have been made between the proposed method and other representative graph-based semi-supervised learning methods using both artificial and real-world data sets. Interestingly, the proposed method performs increasingly better than the others as the percentage of mislabeled samples grows. © 2012 IEEE.

An assessment of the economic impact of climate change on the water sector in the Turks and Caicos Islands

Relevance:

90.00%

Abstract:

The best description of water resources for Grand Turk was offered by Pérez Monteagudo (2000), who suggested that rain water was insufficient to ensure a regular water supply although water catchment was being practised and water catchment possibilities had been analysed. Limestone islands, mostly flat and low lying, have few possibilities for large scale surface storage, and groundwater lenses exist in very delicate equilibrium with saline seawater, and are highly likely to collapse due to sea level rise, improper extraction, drought, tidal waves or other extreme events. A study on the impact of climate change on water resources in the Turks and Caicos Islands is a challenging task, because the territory of the Islands covers different environmental resources and conditions, and accurate data are lacking. The present report is based on collected data wherever possible, including grey data from several sources such as the Intergovernmental Panel on Climate Change (IPCC) and Cuban meteorological service data sets. Other data were also used, including the author’s own estimates and modelling results. Although challenging, this was perhaps the best approach towards analysing the situation. Furthermore, IPCC A2 and B2 scenarios were used in the present study in an effort to reduce uncertainty. The main conclusion from the scenario approach is that the trend observed in precipitation during the period 1961-1990 is decreasing. Similar behaviour was observed in the Caribbean region. This trend is associated with meteorological causes, particularly with the influence of the North Atlantic Anticyclone. The annual decrease in precipitation is estimated to be between 30% and 40%, with uncertain impacts on marine resources.
After an assessment of fresh water resources in Turks and Caicos Islands, the next step was to estimate residential water demand based on a high fertility rate scenario for the Islands (one selected from four scenarios and compared to countries having similar characteristics). The selected scenario presents higher projections on consumption growth, enabling better preparation for growing water demand. Water demand by tourists (stopover and excursionists, mainly cruise passengers) was also obtained, based on international daily consumption estimates. Tourism demand forecasts for Turks and Caicos Islands encompass the forty years between 2011 and 2050 and were obtained by means of an Artificial Neural Networks approach for the A2 and B2 scenarios, resulting in the relation BAU>B2>A2 in terms of tourist arrivals and water demand levels from tourism. Adaptation options and policies were analysed. Resolving the issue of the best technology to be used for Turks and Caicos Islands is not directly related to climate change. Total estimated water storage capacity is about 1,270,800 m3/year with 80% capacity load for three plants. However, almost 11 desalination plants have been detected on Turks and Caicos Islands. Without more data, it is not possible to estimate long term investment to match possible water demand and more complex adaptation options. One climate change adaptation option would be the construction of elevated (30 metres or higher) storm resistant water reservoirs. The unit cost of the storage capacity is the sum of capital costs and operational and maintenance costs. Electricity costs to pump water are optional as water should, and could, be stored for several months. The costs arising for water storage are in the range of US$ 0.22 cents/m3 without electricity costs. Pérez Monteagudo (2000) estimated water prices at around US$ 2.64/m3 in stand points, US$ 7.92/m3 for government offices, and US$ 13.2/m3 for cistern truck vehicles.
These data need to be updated. As Turks and Caicos Islands continues to depend on tourism and Reverse Osmosis (RO) for obtaining fresh water, an unavoidable condition for maintaining and increasing gross domestic product (GDP) and population welfare, dependence on fossil fuels and vulnerability to increasingly volatile prices will constitute an important restriction. In this sense, mitigation supposes a synergy with adaptation. Energy demand and emissions of carbon dioxide (CO2) were also estimated using an emissions factor of 2.6 tCO2/tonne of oil equivalent (toe). Assuming a population of 33,000 inhabitants, primary energy demand was estimated for Turks and Caicos Islands at 110,000 toe with electricity demand of around 110 GWh. The business as usual (BAU) and mitigation scenarios were estimated. The BAU scenario suggests that energy use should be supported by imported fossil fuels with important improvements in energy efficiency. The mitigation scenario explores the use of photovoltaic and concentrating solar power, and wind energy. As this is a preliminary study, the local potential and locations need to be identified to provide more relevant estimates. Macroeconomic assumptions are the same for both scenarios. By 2050, Turks and Caicos Islands could demand 60 m toe less than for the BAU scenario.
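The emissions arithmetic described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the three input figures (2.6 tCO2/toe, 110,000 toe, 33,000 inhabitants) are quoted from the abstract, while the function name, the simple product, and the per-capita figure are assumptions made here and are not results reported by the study.

```python
# Figures quoted in the abstract above.
EMISSIONS_FACTOR_TCO2_PER_TOE = 2.6   # tCO2 per tonne of oil equivalent
PRIMARY_ENERGY_DEMAND_TOE = 110_000   # estimated primary energy demand
POPULATION = 33_000                   # assumed number of inhabitants


def co2_emissions(energy_toe, factor=EMISSIONS_FACTOR_TCO2_PER_TOE):
    """Hypothetical helper: emissions as energy demand times a flat factor."""
    return energy_toe * factor


# Derived values: illustrative only, not stated in the source.
total_emissions_tco2 = co2_emissions(PRIMARY_ENERGY_DEMAND_TOE)
per_capita_tco2 = total_emissions_tco2 / POPULATION

print(f"Total: {total_emissions_tco2:,.0f} tCO2")
print(f"Per capita: {per_capita_tco2:.2f} tCO2 per inhabitant")
```

A single flat factor is the simplest reading of the text; the report itself may well apply different factors per fuel or per scenario.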

Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data

Relevance:

90.00%

Abstract:

The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to orientate multiple alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. The sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata were dependent upon how indels were determined during multiple alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree.

Nature-Inspired Framework for Hyperspectral Band Selection

Relevance:

90.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Statistical detection of spurious variations in daily raingauge data caused by changes in observation practices, as applied to records from various parts of the world

Relevance:

90.00%

Abstract:

In the instrumental records of daily precipitation, we often encounter one or more periods in which values below some threshold were not registered. Such periods, besides lacking small values, also have a large number of dry days. Their cumulative distribution function is shifted to the right in relation to that for other portions of the record having more reliable observations. Such problems are examined in this work, based mostly on the two-sample Kolmogorov–Smirnov (KS) test, where the portion of the series with more dry days is compared with the portion with fewer dry days. Another relatively common problem in daily rainfall data is the prevalence of integers either throughout the period of record or in some part of it, likely resulting from truncation during data compilation prior to archiving or from coarse rounding of daily readings by observers. This problem is identified by simple calculation of the proportion of integers in the series, taking the expected proportion as 10%. The above two procedures were applied to the daily rainfall data sets from the European Climate Assessment (ECA), Southeast Asian Climate Assessment (SACA), and Brazilian Water Resources Agency (BRA). Taking the statistic D of the KS test >0.15 and the corresponding p-value <0.001 as the condition to classify a given series as suspicious, the proportions of the ECA, SACA, and BRA series falling into this category are, respectively, 34.5%, 54.3%, and 62.5%. With regard to the coarse rounding problem, the proportions of series exceeding twice the 10% reference level are 3%, 60%, and 43% for the ECA, SACA, and BRA data sets, respectively. A simple way to visualize the two problems addressed here is by plotting the time series of daily rainfall for a limited range, for instance, 0–10 mm day−1.
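The two checks described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the sample values are hypothetical, the KS p-value is not computed (only the statistic D), and excluding dry days (zeros) from the integer count is an assumption, since the abstract does not say how zeros are treated.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap between two step functions can only change at observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def integer_proportion(values, tol=1e-9):
    """Share of wet-day values (> 0) that are whole numbers.
    Assumption: dry days are left out of the count."""
    wet = [v for v in values if v > 0]
    if not wet:
        return 0.0
    return sum(abs(v - round(v)) < tol for v in wet) / len(wet)


# Hypothetical daily rainfall values in mm, for illustration only.
reliable = [0.0, 0.2, 0.5, 1.3, 0.0, 4.1, 0.7, 12.4]
suspect = [0.0, 0.0, 0.0, 2.0, 0.0, 5.0, 0.0, 13.0]

d = ks_statistic(reliable, suspect)
# Thresholds taken from the abstract: D > 0.15, and more than twice the
# 10% reference proportion of integers.
print("D =", d, "exceeds 0.15:", d > 0.15)
print("coarse rounding suspected:", integer_proportion(suspect) > 0.20)
```

With real records, a library routine that also returns the p-value would be needed to apply the abstract's full condition (D > 0.15 together with p < 0.001).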

A Simpler and More Accurate AUTO-HDS Framework for Clustering and Visualization of Biological Data

Relevance:

90.00%

Abstract:

In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to eliminate a user-defined parameter in such a way that the clustering stage can be implemented more accurately and with reduced computational complexity.

Class-specific metrics for multidimensional data projection applied to CBIR

Relevance:

90.00%

Abstract:

Content-based image retrieval (CBIR) is still a challenging issue due to the inherent complexity of images and the choice of the most discriminant descriptors. Recent developments in the field have introduced multidimensional projections to boost accuracy in the retrieval process, but many issues, such as the introduction of pattern recognition tasks and deeper user intervention to assist the process of choosing the most discriminant features, still remain unaddressed. In this paper, we present a novel framework for CBIR that combines pattern recognition tasks, class-specific metrics, and multidimensional projection to devise an effective and interactive image retrieval system. User interaction plays an essential role in the computation of the final multidimensional projection from which image retrieval will be attained. Results have shown that the proposed approach outperforms existing methods, turning out to be a very attractive alternative for managing image data sets.

Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

Relevance:

90.00%

Abstract:

Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.

Automatic aspect discrimination in data clustering

Relevance:

90.00%

Abstract:

The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.

A log-linear regression model for the beta-Birnbaum-Saunders distribution with censored data

Relevance:

90.00%

Abstract:

The beta-Birnbaum-Saunders (Cordeiro and Lemonte, 2011) and Birnbaum-Saunders (Birnbaum and Saunders, 1969a) distributions have been used quite effectively to model failure times for materials subject to fatigue and lifetime data. We define the log-beta-Birnbaum-Saunders distribution by the logarithm of the beta-Birnbaum-Saunders distribution. Explicit expressions for its generating function and moments are derived. We propose a new log-beta-Birnbaum-Saunders regression model that can be applied to censored data and be used more effectively in survival analysis. We obtain the maximum likelihood estimates of the model parameters for censored data and investigate influence diagnostics. The new location-scale regression model is modified for the possibility that long-term survivors may be present in the data. Its usefulness is illustrated by means of two real data sets. © 2011 Elsevier B.V. All rights reserved.
