931 results for large data sets
Abstract:
Identifying non-technical losses has been of paramount importance over the last decade. Since available datasets contain hundreds of legal and illegal profiles, a method is needed to group the data into subprofiles and so narrow the search for the consumers responsible for the largest frauds. In this context, an electric power company may be interested in examining a specific profile of illegal consumer in greater depth. In this paper, we introduce the Optimum-Path Forest (OPF) clustering technique for this task, and we evaluate its behavior on a dataset provided by a Brazilian electric power company under different values of an OPF parameter. © 2011 IEEE.
Abstract:
This paper presents a method for the indirect orientation of aerial images using ground control lines extracted from airborne laser system (ALS) data. This data integration strategy has shown good potential for automating photogrammetric tasks, including the indirect orientation of images. The most important characteristic of the proposed approach is that the exterior orientation parameters (EOP) of one or more images can be computed automatically with a space resection procedure from data derived from different sensors. The method works as follows. First, straight lines are automatically extracted from the digital aerial image (s) and from the intensity image derived from an ALS data set (S). Then, the correspondence between s and S is determined automatically. A line-based coplanarity model that relates straight lines in object space to those in image space is used to estimate the EOP with iterated extended Kalman filtering (IEKF). The method was implemented and tested with data from different sensors. The experiments conducted to assess the method showed that the accuracy of the EOP estimate is a function of the positional accuracy of the ALS data.
Abstract:
We present the results of the combination of searches for the standard model Higgs boson produced in association with a W or Z boson and decaying into bb̄, using the data sample collected with the D0 detector in pp̄ collisions at √s = 1.96 TeV at the Fermilab Tevatron Collider. We derive 95% C.L. upper limits on the Higgs boson cross section relative to the standard model prediction in the mass range 100 GeV ≤ M_H ≤ 150 GeV, and we exclude Higgs bosons with masses smaller than 102 GeV at the 95% C.L. In the mass range 120 GeV ≤ M_H ≤ 145 GeV, the data exhibit an excess above the background prediction with a global significance of 1.5 standard deviations, consistent with the expectation in the presence of a standard model Higgs boson. © 2012 American Physical Society.
Abstract:
The best description of water resources for Grand Turk was offered by Pérez Monteagudo (2000), who suggested that rainwater was insufficient to ensure a regular water supply, although water catchment was being practised and water catchment possibilities had been analysed. Limestone islands, mostly flat and low lying, have few possibilities for large-scale surface storage, and their groundwater lenses exist in very delicate equilibrium with saline seawater and are highly likely to collapse due to sea level rise, improper extraction, drought, tidal waves or other extreme events. A study on the impact of climate change on water resources in the Turks and Caicos Islands is a challenging task, because the territory of the Islands covers different environmental resources and conditions, and accurate data are lacking. The present report is based on collected data wherever possible, including grey data from several sources such as the Intergovernmental Panel on Climate Change (IPCC) and Cuban meteorological service data sets. Other data were also used, including the author’s own estimates and modelling results. Although challenging, this was perhaps the best approach towards analysing the situation. Furthermore, IPCC A2 and B2 scenarios were used in the present study in an effort to reduce uncertainty. The main conclusion from the scenario approach is that the trend observed in precipitation during the period 1961-1990 is decreasing. Similar behaviour was observed in the Caribbean region. This trend is associated with meteorological causes, particularly with the influence of the North Atlantic Anticyclone. The annual decrease in precipitation is estimated to be between 30% and 40%, with uncertain impacts on marine resources.
After an assessment of fresh water resources in Turks and Caicos Islands, the next step was to estimate residential water demand based on a high fertility rate scenario for the Islands (one selected from four scenarios and compared to countries having similar characteristics). The selected scenario presents higher projections of consumption growth, enabling better preparation for growing water demand. Water demand by tourists (stopover visitors and excursionists, mainly cruise passengers) was also obtained, based on international daily consumption estimates. Tourism demand forecasts for Turks and Caicos Islands encompass the forty years between 2011 and 2050 and were obtained by means of an Artificial Neural Networks approach for the A2 and B2 scenarios, resulting in the relation BAU>B2>A2 in terms of tourist arrivals and water demand levels from tourism. Adaptation options and policies were analysed. Resolving the issue of the best technology to be used for Turks and Caicos Islands is not directly related to climate change. Total estimated water storage capacity is about 1,270,800 m3/year with 80% capacity load for three plants. However, almost 11 desalination plants have been detected on Turks and Caicos Islands. Without more data, it is not possible to estimate the long-term investment needed to match possible water demand and more complex adaptation options. One climate change adaptation option would be the construction of elevated (30 metres or higher) storm-resistant water reservoirs. The unit cost of the storage capacity is the sum of capital costs and operational and maintenance costs. Electricity costs to pump water are optional as water should, and could, be stored for several months. The costs arising for water storage are in the range of US$ 0.22 cents/m3 without electricity costs. Pérez Monteagudo (2000) estimated water prices at around US$ 2.64/m3 in stand points, US$ 7.92/m3 for government offices, and US$ 13.2/m3 for cistern truck vehicles.
These data need to be updated. As Turks and Caicos Islands continues to depend on tourism and on Reverse Osmosis (RO) for obtaining fresh water, an unavoidable condition for maintaining and increasing gross domestic product (GDP) and population welfare, dependence on fossil fuels and vulnerability to increasingly volatile prices will constitute an important restriction. In this sense, mitigation supposes a synergy with adaptation. Energy demand and emissions of carbon dioxide (CO2) were also estimated using an emissions factor of 2.6 tCO2/tonne of oil equivalent (toe). Assuming a population of 33,000 inhabitants, primary energy demand was estimated for Turks and Caicos Islands at 110,000 toe with electricity demand of around 110 GWh. Both the business as usual (BAU) and the mitigation scenarios were estimated. The BAU scenario suggests that energy use should be supported by imported fossil fuels with important improvements in energy efficiency. The mitigation scenario explores the use of photovoltaic and concentrating solar power, and wind energy. As this is a preliminary study, the local potential and locations need to be identified to provide more relevant estimates. Macroeconomic assumptions are the same for both scenarios. By 2050, Turks and Caicos Islands could demand 60 m toe less than for the BAU scenario.
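As a rough check on the energy figures quoted in this abstract, the implied CO2 emissions follow from multiplying primary energy demand by the emissions factor. The sketch below uses only the numbers given in the text (2.6 tCO2/toe, 110,000 toe, 33,000 inhabitants). The function and variable names are illustrative, and the totals it prints are back-of-envelope values, not results reported by the study.

```python
def co2_emissions_t(primary_energy_toe, factor_tco2_per_toe=2.6):
    """Return CO2 emissions in tonnes for a primary energy demand in toe."""
    return primary_energy_toe * factor_tco2_per_toe


# Figures quoted in the abstract.
population = 33_000
primary_energy_toe = 110_000

total = co2_emissions_t(primary_energy_toe)
per_capita = total / population

print(f"total emissions: {total:,.0f} tCO2")
print(f"per capita: {per_capita:.2f} tCO2 per inhabitant")
```

Under these assumptions the total comes to roughly 286,000 tCO2, or a little under 9 tCO2 per inhabitant.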
Abstract:
The Common Reflection Surface (CRS) seismic stacking method simulates zero-offset (ZO) sections from multi-coverage data. For 2D media, the CRS stacking operator depends on three parameters: the emergence angle of the central ray with coincident source and receiver (β0), the radius of curvature of the normal-incidence-point wave (RNIP), and the radius of curvature of the normal wave (RN). The crucial problem in implementing the CRS stack is determining, from the seismic data, the three optimal parameters associated with each sample point of the ZO section to be simulated. In this work, a new processing sequence was developed for simulating ZO sections by means of the CRS stack. In this new algorithm, the three optimal parameters that define the CRS stacking operator are determined in three steps. In the first step, two parameters (β°0 and R°NIP) are estimated by a two-dimensional global search in the multi-coverage data. In the second step, the estimated value of β°0 is used to determine the third parameter (R°N) through a one-dimensional global search in the ZO section resulting from the first step. In both steps, the global searches are performed with the Simulated Annealing (SA) optimization method. In the third step, the three final parameters (β0, RNIP and RN) are determined through a three-dimensional local search that applies the Variable Metric (VM) optimization method to the multi-coverage data. This last step uses the parameter triplet (β°0, R°NIP, R°N) estimated in the two previous steps as the initial approximation. In order to simulate events with conflicting dips correctly, the new algorithm determines two parameter triplets at sample points of the ZO section where events intersect.
In other words, at points of the ZO section where two seismic events cross, two CRS parameter triplets are determined, and both are used in simulating the events with conflicting dips. To evaluate the accuracy and efficiency of the new algorithm, it was applied to synthetic data from two models: one with continuous interfaces and another with a discontinuous interface. The simulated ZO sections have a high signal-to-noise ratio and show clearly defined reflected and diffracted events. Comparison of the simulated ZO sections with their counterparts obtained by forward modeling shows that reflections and diffractions are simulated correctly. In addition, comparison of the three optimized parameters with their exact values computed by forward modeling also reveals a high degree of accuracy. Using the hyperbolic traveltime approximation, but under the condition RNIP = RN, a new algorithm was developed for simulating ZO sections that predominantly contain diffracted wavefields. Like the CRS stacking algorithm, this algorithm, called the Common Diffraction Surface (CDS) stack, also uses the SA and VM optimization methods to determine the pair of optimal parameters (β0, RNIP) that define the best CDS stacking operator. In the first step, the SA optimization method is used to determine the initial parameters β°0 and R°NIP using the stacking operator with a large aperture. In the second step, using the estimated values of β°0 and R°NIP, the estimate of the parameter RNIP is improved by applying the VM algorithm to the ZO section resulting from the first step. In the third step, the best values of β°0 and R°NIP are determined by applying the VM algorithm to the multi-coverage data. It is worth noting that the apparent repetition of processes has the effect of progressively attenuating the reflected events.
Applying the CDS stacking algorithm to synthetic data containing reflected and diffracted wavefields yields, as its main result, a simulated ZO section with clearly defined diffracted events. As a direct application of this result to seismic data interpretation, post-stack depth migration of the simulated ZO section produces a section in which the diffractor points associated with the discontinuities of the model are correctly located.
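The abstract refers to the hyperbolic traveltime approximation without writing it out. The sketch below evaluates the form commonly given in the CRS literature for 2D media, in terms of β0, RNIP and RN, together with the diffraction case RNIP = RN. The near-surface velocity `v0`, the midpoint and half-offset coordinates, and all function names are assumptions introduced for illustration; none of them appear in the abstract, and this is not the author's implementation of the parameter search.

```python
import math


def crs_traveltime(xm, h, t0, x0, v0, beta0, r_nip, r_n):
    """Hyperbolic CRS traveltime at midpoint xm and half-offset h.

    t0 and x0 locate the zero-offset sample, v0 is an assumed near-surface
    velocity, beta0 is the emergence angle in radians, and r_nip and r_n
    are the radii of curvature of the NIP wave and the normal wave.
    """
    dx = xm - x0
    # Term linear in the midpoint displacement (controlled by the emergence angle).
    tilted = t0 + 2.0 * math.sin(beta0) * dx / v0
    # Second-order term (controlled by the two wavefront curvatures).
    curved = (2.0 * t0 * math.cos(beta0) ** 2 / v0) * (dx ** 2 / r_n + h ** 2 / r_nip)
    return math.sqrt(tilted ** 2 + curved)


def cds_traveltime(xm, h, t0, x0, v0, beta0, r_nip):
    """Diffraction case: the same operator with the condition r_n = r_nip."""
    return crs_traveltime(xm, h, t0, x0, v0, beta0, r_nip, r_nip)


# Illustrative values only (seconds, metres, metres per second).
t = crs_traveltime(xm=0.0, h=500.0, t0=1.0, x0=0.0, v0=2000.0,
                   beta0=0.0, r_nip=1000.0, r_n=5000.0)
print(f"traveltime at h = 500 m: {t:.4f} s")
```

A parameter search such as the one described would evaluate a coherence measure along this surface for many candidate (β0, RNIP, RN) triplets; that part is omitted here.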
Abstract:
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to guide the multiple alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. The sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata depended on how indels were determined during the multiple alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree.
Abstract:
Pós-graduação em Ciências Biológicas (Zoologia) - IBRC
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Active machine learning algorithms are used when large numbers of unlabeled examples are available and obtaining labels for them is costly (e.g., when it requires consulting a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refinement of the decision boundary by dynamically adjusting the probability of exploring at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance of noise.
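The trade-off described in this abstract can be pictured as a query rule that either explores (labels a random unlabeled example) or refines the boundary (labels the example the current classifier is least sure about), with the exploration probability changing from step to step. The sketch below is a minimal illustration under that reading. The abstract does not state how the probability is adjusted, so `update_explore_prob`, its step size and its bounds are purely hypothetical, as are all names used here.

```python
import random


def choose_query(margins, explore_prob, rng):
    """Return the index of the next unlabeled example to send for labeling.

    margins[i] is the absolute distance of example i from the current
    decision boundary (small margin = uncertain prediction).
    """
    if rng.random() < explore_prob:
        # Explore: pick any unlabeled example uniformly at random.
        return rng.randrange(len(margins))
    # Refine: pick the example closest to the decision boundary.
    return min(range(len(margins)), key=lambda i: margins[i])


def update_explore_prob(explore_prob, exploration_paid_off,
                        step=0.1, low=0.05, high=0.95):
    """Hypothetical rule: explore more after exploration exposed an error,
    explore less otherwise. Not taken from the paper."""
    if exploration_paid_off:
        explore_prob += step
    else:
        explore_prob -= step
    return max(low, min(high, explore_prob))


rng = random.Random(0)
margins = [0.9, 0.1, 0.5]
index = choose_query(margins, explore_prob=0.0, rng=rng)
print("queried example:", index)
```

With `explore_prob=0.0` the rule reduces to plain uncertainty sampling; with `explore_prob=1.0` it reduces to random sampling.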
Abstract:
In [1], the authors proposed AUTO-HDS, a framework for automated clustering and visualization of biological data sets. This letter complements that framework by showing that a user-defined parameter can be eliminated, so that the clustering stage can be implemented more accurately and with reduced computational complexity.
Abstract:
Observations of cosmic ray arrival directions made with the Pierre Auger Observatory have previously provided evidence of anisotropy at the 99% CL using the correlation of ultra high energy cosmic rays (UHECRs) with objects drawn from the Veron-Cetty Veron catalog. In this paper we report on the use of three catalog-independent methods to search for anisotropy. The 2pt-L, 2pt+ and 3pt methods, each giving a different measure of self-clustering in arrival directions, were tested on mock cosmic ray data sets to study the impacts of sample size and magnetic smearing on their results, accounting for both angular and energy resolutions. If the sources of UHECRs follow the same large-scale structure as ordinary galaxies in the local Universe and if UHECRs are deflected no more than a few degrees, a study of mock maps suggests that these three methods can efficiently respond to the resulting anisotropy with a P-value = 1.0% or smaller with data sets as small as 100 events. Using data taken from January 1, 2004 to July 31, 2010, we examined the 20, 30, ..., 110 highest-energy events, with a corresponding minimum energy threshold of about 49.3 EeV. The minimum P-values found were 13.5% using the 2pt-L method, 1.0% using the 2pt+ method and 1.1% using the 3pt method for the 100 highest-energy events. In view of the multiple (correlated) scans performed on the data set, these catalog-independent methods do not yield strong evidence of anisotropy in the highest-energy cosmic rays.
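To make the idea of self-clustering concrete, the sketch below counts pairs of arrival directions closer than a chosen angle and estimates a P-value by comparing that count with isotropic mock data sets. It is a plain two-point pair count, not the 2pt-L, 2pt+ or 3pt statistics of the paper, whose definitions are not given in the abstract. The mock generator assumes uniform sky exposure, which a real analysis would not, and all function names are illustrative.

```python
import math
import random


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle angle (radians) between two directions given in radians."""
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_sep)))


def count_pairs(directions, max_angle):
    """Number of distinct pairs separated by at most max_angle (radians)."""
    pairs = 0
    for i in range(len(directions)):
        for j in range(i + 1, len(directions)):
            if angular_separation(*directions[i], *directions[j]) <= max_angle:
                pairs += 1
    return pairs


def isotropic_directions(n, rng):
    """Draw n directions uniformly on the sphere (uniform exposure assumed)."""
    return [(rng.uniform(0.0, 2.0 * math.pi), math.asin(rng.uniform(-1.0, 1.0)))
            for _ in range(n)]


def pair_count_p_value(directions, max_angle, n_mock, rng):
    """Fraction of isotropic mock sets with at least as many close pairs."""
    observed = count_pairs(directions, max_angle)
    hits = 0
    for _ in range(n_mock):
        mock = isotropic_directions(len(directions), rng)
        if count_pairs(mock, max_angle) >= observed:
            hits += 1
    return hits / n_mock


events = [(0.0, 0.0), (0.01, 0.0), (math.pi, 0.0)]
print("close pairs:", count_pairs(events, math.radians(5.0)))
```

Scanning over several angular scales or event counts, as the abstract describes, requires a penalty for the multiple correlated trials; the single-scale P-value above does not include one.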
Abstract:
Content-based image retrieval (CBIR) is still a challenging problem because of the inherent complexity of images and the difficulty of choosing the most discriminant descriptors. Recent developments in the field have introduced multidimensional projections to boost accuracy in the retrieval process, but many issues remain unaddressed, such as the introduction of pattern recognition tasks and deeper user intervention to help choose the most discriminant features. In this paper, we present a novel framework for CBIR that combines pattern recognition tasks, class-specific metrics, and multidimensional projection to devise an effective and interactive image retrieval system. User interaction plays an essential role in the computation of the final multidimensional projection from which image retrieval is performed. Results show that the proposed approach outperforms existing methods, making it a very attractive alternative for managing image data sets.
Abstract:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA, associated with Mycobacterium tuberculosis. This problem arises in rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to predict the free energy of binding correctly, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been used successfully in drug-design applications, especially because decision trees are simple to understand, interpret, and validate. Several decision-tree induction algorithms are available for general use, but each has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate with a flexible receptor.
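The abstract uses C4.5 as its baseline. As background, C4.5 ranks candidate splits by gain ratio, that is, information gain divided by the split information, and the split criterion is the kind of design choice that gives each induction algorithm its bias. The sketch below computes the gain ratio for one split. It illustrates the baseline criterion only, not the automatically designed algorithms of the paper, and the class labels "strong" and "weak" (standing for discretized binding energies) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def gain_ratio(labels, groups):
    """Gain ratio of splitting `labels` into `groups` (one list per branch)."""
    total = len(labels)
    # Expected entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups if g)
    gain = entropy(labels) - remainder
    # Split information penalizes splits with many small branches.
    split_info = -sum((len(g) / total) * math.log2(len(g) / total)
                      for g in groups if g)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical labels: four docking results, split into two pure branches.
labels = ["strong", "strong", "weak", "weak"]
perfect_split = [["strong", "strong"], ["weak", "weak"]]
useless_split = [["strong", "weak"], ["strong", "weak"]]

print("perfect split:", gain_ratio(labels, perfect_split))
print("useless split:", gain_ratio(labels, useless_split))
```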