895 results for Genetic Algorithms, Adaptation, Internet Computing
Abstract:
In many data mining applications, automated retrieval of text and image information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI; results of experiments on textual retrieval of Web documents, as well as a comparison of the proposed stochastic methods, are presented.
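A minimal sketch of the column-sampling idea, assuming a Frieze–Kannan–Vempala-style scheme (the paper's exact sampling and its construction of the k x k matrix may differ): columns of the term-by-document matrix are drawn with probability proportional to their squared norms, rescaled for unbiasedness, and the leading singular triplets of the small sketch are then found deterministically.

```python
import numpy as np

def sampled_svd(A, k, rng=np.random.default_rng(0)):
    """Approximate the k leading singular triplets of A by Monte Carlo
    column sampling (a sketch; the paper's exact scheme may differ)."""
    m, n = A.shape
    col_norms = np.sum(A**2, axis=0)
    probs = col_norms / col_norms.sum()      # sample columns ~ squared norm
    idx = rng.choice(n, size=k, replace=True, p=probs)
    # Rescale sampled columns so C @ C.T estimates A @ A.T without bias
    C = A[:, idx] / np.sqrt(k * probs[idx])
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], s[:k]                   # approximate left triplets

# Usage: a term-by-document matrix with 1000 terms and 5000 documents
A = np.abs(np.random.default_rng(1).standard_normal((1000, 5000)))
U, s = sampled_svd(A, k=50)
```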
Abstract:
In this paper we present an error analysis for a Monte Carlo algorithm for evaluating bilinear forms of matrix powers. An Almost Optimal Monte Carlo (MAO) algorithm for solving this problem is formulated. Results on the structure of the probability error are presented, and the construction of robust and interpolation Monte Carlo algorithms is discussed. Results are presented comparing the performance of the Monte Carlo algorithm with that of a corresponding deterministic algorithm. The two algorithms are tested on a well-balanced matrix, and the effects of perturbing this matrix, by small and large amounts, are then studied.
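A sketch of the random-walk estimator underlying such Monte Carlo schemes, assuming the standard "almost optimal" densities (initial probabilities proportional to |v_i|, transition probabilities proportional to |a_ij|); the paper's exact MAO formulation may differ in detail.

```python
import numpy as np

def mc_bilinear_form(v, A, h, power, n_walks=20_000,
                     rng=np.random.default_rng(0)):
    """Monte Carlo estimate of the bilinear form (v, A^power h) via
    random walks with MAO-style densities (a sketch)."""
    n = len(v)
    p0 = np.abs(v) / np.abs(v).sum()              # initial density ~ |v_i|
    P = np.abs(A) / np.abs(A).sum(axis=1)[:, None]  # transitions ~ |a_ij|
    total = 0.0
    for _ in range(n_walks):
        i = rng.choice(n, p=p0)
        w = v[i] / p0[i]                          # importance weight
        for _ in range(power):
            j = rng.choice(n, p=P[i])
            w *= A[i, j] / P[i, j]                # unbiased reweighting
            i = j
        total += w * h[i]
    return total / n_walks

# Usage: compare against the deterministic value v @ A^3 @ h
rng = np.random.default_rng(1)
A = rng.uniform(0, 1, (20, 20)) / 20              # a well-balanced matrix
v, h = rng.uniform(0, 1, 20), rng.uniform(0, 1, 20)
print(mc_bilinear_form(v, A, h, power=3),
      v @ np.linalg.matrix_power(A, 3) @ h)
```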
Abstract:
A recently emerging bleeding canker disease, caused by Pseudomonas syringae pathovar aesculi (Pae), is threatening European horse chestnut in northwest Europe. Very little is known about the origin and biology of this new disease. We used the nucleotide sequences of seven commonly used marker genes to investigate the phylogeny of three strains isolated recently from bleeding stem cankers on European horse chestnut in Britain (E-Pae). On the basis of these sequences alone, the E-Pae strains were identical to the Pae type-strain (I-Pae), isolated from leaf spots on Indian horse chestnut in India in 1969. The phylogenetic analyses also showed that Pae belongs to a distinct clade of P. syringae pathovars adapted to woody hosts. We generated genome-wide Illumina sequence data from the three E-Pae strains and one strain of I-Pae. Comparative genomic analyses revealed pathovar-specific genomic regions in Pae potentially implicated in virulence on a tree host, including genes for the catabolism of plant-derived aromatic compounds and enterobactin synthesis. Several gene clusters displayed intra-pathovar variation, including those encoding type IV secretion, a novel fatty acid biosynthesis pathway and a sucrose uptake pathway. Rates of single nucleotide polymorphisms in the four Pae genomes indicate that the three E-Pae strains diverged from each other much more recently than they diverged from I-Pae. The very low genetic diversity among the three geographically distinct E-Pae strains suggests that they originate from a single, recent introduction into Britain, thus highlighting the serious environmental risks posed by the spread of an exotic plant pathogenic bacterium to a new geographic location. The genomic regions in Pae that are absent from other P. syringae pathovars that infect herbaceous hosts may represent candidate genetic adaptations to infection of the woody parts of the tree.
Abstract:
The increasing demand for cheaper-faster-better services anytime and anywhere has made radio network optimisation much more complex than ever before. Dynamic Network Optimisation (DNO) is proposed as the ultimate solution and future trend for dynamically optimising the serving network. The realisation of DNO, however, has been hindered by a significant bottleneck: optimisation speed degrades as network complexity grows. This paper presents a multi-threaded parallel solution that accelerates complicated proprietary network optimisation algorithms under a rigid condition of numerical consistency. The ariesoACP product from Arieso Ltd serves as the platform for parallelisation. This parallel solution has been benchmarked; the results exhibit high scalability and a run-time reduction of 11% to 42% relative to the original version, depending on the technology, subscriber density and blocking rate of a given network. Crucially, the parallel version produces equivalent optimisation quality, yielding identical optimisation outputs.
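Since the ariesoACP internals are proprietary, the sketch below only illustrates the numerical-consistency constraint described above: work is distributed across threads, but partial results are reduced in a fixed order, so the floating-point output is bit-identical to the serial version (evaluate_cell is a hypothetical stand-in for a per-cell optimisation step).

```python
from concurrent.futures import ThreadPoolExecutor
import math

def evaluate_cell(cell):
    # Hypothetical stand-in for a proprietary per-cell optimisation step.
    return math.sin(cell) * math.exp(-cell / 1000.0)

def optimise(cells, n_threads=4):
    """Parallel evaluation with a deterministic, fixed-order reduction:
    executor.map preserves input order, so the accumulated result is
    bit-identical to the serial loop regardless of thread scheduling."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(evaluate_cell, cells))
    total = 0.0
    for p in partials:    # summation order fixed -> numerically consistent
        total += p
    return total

cells = range(10_000)
assert optimise(cells) == sum(map(evaluate_cell, cells))
```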
Abstract:
Searching for the optimum tap-length that best balances the complexity and steady-state performance of an adaptive filter has attracted attention recently. Among existing algorithms in the literature, two, namely the segmented filter (SF) and gradient descent (GD) algorithms, are of particular interest as they can search for the optimum tap-length quickly. In this paper, we first carefully compare the SF and GD algorithms and show that the two are equivalent in performance under some constraints, but that each has advantages and disadvantages relative to the other. We then propose an improved variable tap-length algorithm using the concept of the pseudo fractional tap-length (FT). Updating the tap-length with instantaneous errors, in a style similar to that of the stochastic-gradient [or least mean squares (LMS)] algorithm, the proposed FT algorithm not only retains the advantages of both the SF and the GD algorithms but also has significantly lower complexity than existing algorithms. Both performance analysis and numerical simulations are given to verify the proposed algorithm.
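A sketch of a variable tap-length LMS filter with a pseudo fractional tap-length, assuming the commonly cited FT update (leakage term alpha, adaptation gain gamma, segment offset Delta); the exact formulation and parameter values in the paper may differ.

```python
import numpy as np

def ft_lms(x, d, L0=8, Delta=4, mu=0.01, alpha=0.01, gamma=0.1, delta=1.0):
    """Variable tap-length LMS with a pseudo fractional tap-length lf
    (an illustrative sketch of the FT idea, not the paper's exact code)."""
    L, lf = L0, float(L0)
    w = np.zeros(L)
    for n in range(len(x)):
        u = x[max(0, n - L + 1):n + 1][::-1]           # most recent L samples
        u = np.pad(u, (0, L - len(u)))
        e_full = d[n] - w @ u                          # error with all L taps
        e_short = d[n] - w[:L - Delta] @ u[:L - Delta] # error with L-Delta taps
        w += mu * e_full * u                           # standard LMS update
        # Fractional tap-length: grow when the extra taps reduce the error,
        # shrink slowly through the leakage term alpha.
        lf = (lf - alpha) - gamma * (e_full**2 - e_short**2)
        if abs(L - lf) >= delta:                       # re-dimension the filter
            L_new = max(Delta + 1, int(lf))
            w = np.pad(w[:L_new], (0, max(0, L_new - len(w))))
            L = L_new
    return w, L

# Usage: identify an unknown 16-tap FIR system from noisy observations
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = rng.standard_normal(16)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w, L = ft_lms(x, d)
```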
Abstract:
Information on the genetic variation of plant response to elevated CO2 (e[CO2]) is needed to understand plant adaptation and to pinpoint likely evolutionary responses to future high atmospheric CO2 concentrations.
• Here, quantitative trait loci (QTL) for above- and below-ground tree growth were determined in a pedigree – an F2 hybrid of poplar (Populus trichocarpa and Populus deltoides) – following season-long exposure to either current-day ambient CO2 (a[CO2]) or e[CO2] at 600 µl l⁻¹, and genotype × environment interactions were investigated.
• In the F2 generation, both above- and below-ground growth showed a significant increase in e[CO2]. Three areas of the genome on linkage groups I, IX and XII were identified as important in determining above-ground growth response to e[CO2], while an additional three areas of the genome on linkage groups IV, XVI and XIX appeared important in determining root growth response to e[CO2].
• These results quantify and identify genetic variation in response to e[CO2] and provide an insight into genomic response to the changing environment.
Abstract:
We describe a model-data fusion (MDF) inter-comparison project (REFLEX), which compared various algorithms for estimating carbon (C) model parameters consistent with both measured carbon fluxes and states and a simple C model. Participants were provided with the model and with both synthetic net ecosystem exchange (NEE) of CO2 and leaf area index (LAI) data, generated from the model with added noise, and observed NEE and LAI data from two eddy covariance sites. Participants endeavoured to estimate model parameters and states consistent with the model for all cases over the two years for which data were provided, and generate predictions for one additional year without observations. Nine participants contributed results using Metropolis algorithms, Kalman filters and a genetic algorithm. For the synthetic data case, parameter estimates compared well with the true values. The results of the analyses indicated that parameters linked directly to gross primary production (GPP) and ecosystem respiration, such as those related to foliage allocation and turnover, or temperature sensitivity of heterotrophic respiration, were best constrained and characterised. Poorly estimated parameters were those related to the allocation to and turnover of fine root/wood pools. Estimates of confidence intervals varied among algorithms, but several algorithms successfully located the true values of annual fluxes from synthetic experiments within relatively narrow 90% confidence intervals, achieving >80% success rate and mean NEE confidence intervals <110 g C m⁻² year⁻¹ for the synthetic case. Annual C flux estimates generated by participants generally agreed with gap-filling approaches using half-hourly data. The estimation of ecosystem respiration and GPP through MDF agreed well with outputs from partitioning studies using half-hourly data. Confidence limits on annual NEE increased by an average of 88% in the prediction year compared to the previous year, when data were available. Confidence intervals on annual NEE increased by 30% when observed data were used instead of synthetic data, reflecting and quantifying the addition of model error. Finally, our analyses indicated that incorporating additional constraints, using data on C pools (wood, soil and fine roots), would help to reduce uncertainties for model parameters poorly served by eddy covariance data.
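A minimal sketch of the Metropolis approach used by several participants, with a hypothetical one-equation NEE model standing in for the actual C model (which tracks pools, allocation and turnover): propose a random-walk step in parameter space and accept it with probability given by the likelihood ratio against the noisy observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def nee_model(params, temp):
    """Toy stand-in for the C model: NEE = respiration - GPP, with a
    temperature-sensitive respiration term (hypothetical, not the
    REFLEX model)."""
    gpp, r10, q10 = params
    return r10 * q10 ** ((temp - 10.0) / 10.0) - gpp

# Synthetic observations generated from "true" parameters plus noise
temp = rng.uniform(0, 25, 200)
true_params = np.array([5.0, 2.0, 2.0])
obs = nee_model(true_params, temp) + rng.normal(0, 0.5, len(temp))

def log_like(params):
    resid = obs - nee_model(params, temp)
    return -0.5 * np.sum((resid / 0.5) ** 2)

# Metropolis random walk over parameter space
chain, p = [], np.array([1.0, 1.0, 1.5])
ll = log_like(p)
for _ in range(20_000):
    cand = p + rng.normal(0, 0.05, 3)
    ll_cand = -np.inf if np.any(cand <= 0) else log_like(cand)
    if np.log(rng.uniform()) < ll_cand - ll:      # accept/reject step
        p, ll = cand, ll_cand
    chain.append(p)
posterior = np.array(chain[10_000:])              # discard burn-in
print(posterior.mean(axis=0), true_params)
```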
Abstract:
The impending threat of global climate change and its regional manifestations is among the most important and urgent problems facing humanity. Society needs accurate and reliable estimates of changes in the probability of regional weather variations to develop science-based adaptation and mitigation strategies. Recent advances in weather prediction and in our understanding and ability to model the climate system suggest that it is both necessary and possible to revolutionize climate prediction to meet these societal needs. However, the scientific workforce and the computational capability required to bring about such a revolution are not available in any single nation. Motivated by the success of internationally funded infrastructure in other areas of science, this paper argues that, because of the complexity of the climate system, and because the regional manifestations of climate change are mainly through changes in the statistics of regional weather variations, the scientific and computational requirements to predict its behavior reliably are so enormous that the nations of the world should create a small number of multinational high-performance computing facilities dedicated to the grand challenges of developing the capabilities to predict climate variability and change on both global and regional scales over the coming decades. Such facilities will play a key role in the development of next-generation climate models, build global capacity in climate research, nurture a highly trained workforce, and engage the global user community, policy-makers, and stakeholders. We recommend the creation of a small number of multinational facilities, each with computer capability of about 20 petaflops in the near term, about 200 petaflops within five years, and 1 exaflop by the end of the next decade. Each facility should have a sufficient scientific workforce to develop and maintain the software and data analysis infrastructure. Such facilities will enable investigation of what horizontal and vertical resolution in atmospheric and ocean models is necessary for more confident predictions at the regional and local level, an investigation that current limitations in computing power have severely constrained and that is now badly needed. These facilities will also provide the world's scientists with computational laboratories for fundamental research on weather–climate interactions using 1-km resolution models and on atmospheric, terrestrial, cryospheric, and oceanic processes at even finer scales. Each facility should have enabling infrastructure, including hardware, software, and data analysis support, and the scientific capacity to interact with the national centers and other visitors. This will accelerate our understanding of how the climate system works and how to model it. It will ultimately enable the climate community to provide society with climate predictions based on our best knowledge of science and the most advanced technology.
Abstract:
The worldwide spread of barley cultivation required adaptation to agricultural environments far distant from those found in its centre of domestication. An important component of this adaptation is the timing of flowering, achieved predominantly in response to day length and temperature. Here, we use a collection of cultivars, landraces and wild barley accessions to investigate the origins and distribution of allelic diversity at four major flowering time loci, mutations at which have been under selection during the spread of barley cultivation into Europe. Our findings suggest that while mutant alleles at the PPD-H1 and PPD-H2 photoperiod loci occurred pre-domestication, the mutant vernalization non-responsive alleles utilized in landraces and cultivars at the VRN-H1 and VRN-H2 loci occurred post-domestication. The transition from wild to cultivated barley is associated with a doubling in the number of observed multi-locus flowering-time haplotypes, suggesting that the resulting phenotypic variation has aided adaptation to cultivation in the diverse ecogeographic locations encountered. Despite the importance of early-flowering alleles during the domestication of barley in Europe, we show that novel VRN alleles associated with early flowering in wild barley have been lost in domesticates, highlighting the potential of wild germplasm as a source of novel allelic variation for agronomic traits.
Abstract:
With the fast development of the Internet, wireless communications and semiconductor devices, home networking has received significant attention. Consumer products can collect and transmit various types of data in the home environment. Typical consumer sensors are often equipped with tiny, irreplaceable batteries, so it is of the utmost importance to design energy-efficient algorithms that prolong the home network lifetime and reduce the number of devices going to landfill. Sink mobility is an important technique for improving home network performance, including energy consumption, lifetime and end-to-end delay, and it can largely mitigate the hot spots near the sink node. However, the selection of an optimal moving trajectory for the sink node(s) is an NP-hard problem, and jointly optimizing routing algorithms with the mobile sink moving strategy is a significant and challenging research issue. The influence of multiple static sink nodes on energy consumption in networks of different scales is first studied, and an Energy-efficient Multi-sink Clustering Algorithm (EMCA) is proposed and tested. Then, the influence of mobile sink velocity, position and number on network performance is studied and a Mobile-sink based Energy-efficient Clustering Algorithm (MECA) is proposed. Simulation results validate the performance of the two proposed algorithms, which can be deployed in a consumer home network environment.
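EMCA and MECA themselves are specified in the paper; the sketch below is only a generic, LEACH-style illustration of the underlying ideas: cluster heads elected on residual energy and proximity, and a sink whose position changes the per-round energy cost (all names and the energy model are hypothetical).

```python
import math, random

random.seed(0)

# 50 sensor nodes scattered over a 100 x 100 home area
nodes = [{"pos": (random.random() * 100, random.random() * 100),
          "energy": random.uniform(0.5, 1.0)} for _ in range(50)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def elect_heads(nodes, sink_pos, n_heads=5):
    """Favour nodes with high residual energy that sit close to the sink."""
    score = lambda n: n["energy"] / (1.0 + dist(n["pos"], sink_pos))
    return sorted(nodes, key=score, reverse=True)[:n_heads]

def round_cost(nodes, heads, sink_pos, k_tx=1e-4):
    """Toy energy model: members transmit to the nearest head, heads to
    the sink; cost grows with the square of the distance."""
    cost = 0.0
    for n in nodes:
        head = min(heads, key=lambda h: dist(n["pos"], h["pos"]))
        cost += k_tx * dist(n["pos"], head["pos"]) ** 2
    cost += sum(k_tx * dist(h["pos"], sink_pos) ** 2 for h in heads)
    return cost

# A mobile sink visiting several anchor positions changes the per-round cost
for sink in [(50, 50), (25, 75), (75, 25)]:
    heads = elect_heads(nodes, sink)
    print(sink, round(round_cost(nodes, heads, sink), 4))
```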
Abstract:
We present an efficient graph-based algorithm for quantifying the similarity of household-level energy use profiles, using a notion of similarity that allows for small time-shifts when comparing profiles. Experimental results on a real smart meter data set demonstrate that, in cases of practical interest, our technique is far faster than the existing method for computing the same similarity measure. A fast algorithm for measuring profile similarity improves the efficiency of tasks such as clustering of customers and cross-validation of forecasting methods using historical data. Furthermore, we apply a generalisation of our algorithm to produce substantially better household-level energy use forecasts from historical smart meter data.
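The paper's fast graph-based algorithm is not reproduced here; the sketch below only illustrates the similarity notion itself, via a band-limited dynamic-programming baseline in which two profiles may be matched up to a small number of time slots apart.

```python
import numpy as np

def shift_tolerant_distance(p, q, window=2):
    """Band-limited alignment of two equal-length half-hourly energy
    profiles, allowing matches up to `window` slots apart. A DTW-style
    baseline for the similarity notion; the paper's graph-based
    algorithm computes its measure much faster."""
    n = len(p)
    INF = float("inf")
    D = np.full((n + 1, n + 1), INF)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(n, i + window) + 1):
            cost = abs(p[i - 1] - q[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, n]

# Usage: the same daily profile shifted by one half-hour slot stays close
day = np.sin(np.linspace(0, 2 * np.pi, 48)) + 1.0
print(shift_tolerant_distance(day, np.roll(day, 1)))
```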
Abstract:
Our digital universe is rapidly expanding: more and more daily activities are digitally recorded, data arrives in streams, needs to be analyzed in real time and may evolve over time. In the last decade many adaptive learning algorithms and prediction systems, which can automatically update themselves with new incoming data, have been developed. The majority of those algorithms focus on improving predictive performance and assume that a model update is always desired, as soon and as frequently as possible. In this study we consider a potential model update as an investment decision which, as in the financial markets, should be taken only if a certain return on investment is expected. We introduce and motivate a new research problem for data streams: cost-sensitive adaptation. We propose a reference framework for analyzing adaptation strategies in terms of costs and benefits. Our framework makes it possible to characterize and decompose the costs of model updates, and to assess and interpret the gains in performance due to model adaptation for a given learning algorithm on a given prediction task. Our proof-of-concept experiment demonstrates how the framework can aid in analyzing and managing adaptation decisions in the chemical industry.
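A minimal illustration of the investment view of adaptation, with hypothetical names and a simple linear cost/benefit model (the paper's framework decomposes costs and benefits in more detail): retrain only when the expected benefit over a planning horizon exceeds the cost of the update.

```python
def should_update(candidate_err, deployed_err, gain_per_error, update_cost,
                  horizon=1000):
    """Treat a model update as an investment: retrain only when the
    expected benefit over the planning horizon exceeds the update cost.
    A minimal sketch of the idea, not the paper's full framework."""
    expected_benefit = (deployed_err - candidate_err) * gain_per_error * horizon
    roi = expected_benefit - update_cost
    return roi > 0, roi

# Usage: a candidate model is 2 percentage points better; each avoided
# error is worth 0.5 units and retraining costs 8 units of effort.
update, roi = should_update(candidate_err=0.08, deployed_err=0.10,
                            gain_per_error=0.5, update_cost=8.0)
print(update, roi)   # True, 2.0
```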
Abstract:
Theory predicts the emergence of generalists in variable environments and antagonistic pleiotropy to favour specialists in constant environments, but empirical data seldom support such generalist–specialist trade-offs. We selected for generalists and specialists in the dung fly Sepsis punctum (Diptera: Sepsidae) under conditions that we predicted would reveal antagonistic pleiotropy and multivariate trade-offs underlying thermal reaction norms for juvenile development. We performed replicated laboratory evolution using four treatments: adaptation at a hot (31 °C) or a cold (15 °C) temperature, or under regimes fluctuating between these temperatures, either within or between generations. After 20 generations, we assessed parental effects and genetic responses of thermal reaction norms for three correlated life-history traits: size at maturity, juvenile growth rate and juvenile survival. We find evidence for antagonistic pleiotropy for performance at hot and cold temperatures, and a temperature-mediated trade-off between juvenile survival and size at maturity, suggesting that trade-offs associated with environmental tolerance can arise via intensified evolutionary compromises between genetically correlated traits. However, despite this antagonistic pleiotropy, we found no support for the evolution of increased thermal tolerance breadth at the expense of reduced maximal performance, suggesting low genetic variance in the generalist–specialist dimension.
Abstract:
In e-health intervention studies, there are concerns about the reliability of internet-based, self-reported (SR) data and about the potential for identity fraud. This study introduced and tested a novel procedure for assessing the validity of internet-based, SR identity and validated anthropometric and demographic data via measurements performed face-to-face in a validation study (VS). Participants (n = 140) from seven European countries, participating in the Food4Me intervention study which aimed to test the efficacy of personalised nutrition approaches delivered via the internet, were invited to take part in the VS. Participants visited a research centre in each country within 2 weeks of providing SR data via the internet. Participants received detailed instructions on how to perform each measurement. Each individual's identity was checked visually and by repeated collection and analysis of buccal cell DNA for 33 genetic variants. Validation of identity using genomic information showed perfect concordance between SR and VS. Similar results were found for demographic data (age and sex verification). We observed strong intra-class correlation coefficients between SR and VS for anthropometric data (height 0.990, weight 0.994 and BMI 0.983). However, internet-based SR weight was under-reported (Δ −0.70 kg [−3.6 to 2.1], p < 0.0001) and, therefore, BMI was lower for SR data (Δ −0.29 kg m⁻² [−1.5 to 1.0], p < 0.0001). BMI classification was correct in 93 % of cases. We demonstrate the utility of genotype information for detection of possible identity fraud in e-health studies and confirm the reliability of internet-based, SR anthropometric and demographic data collected in the Food4Me study.
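A sketch of the identity check described above, assuming a simple per-variant comparison of the 33 genotype calls between the self-reported and validation samples (the function and the example data are hypothetical).

```python
def genotype_concordance(sr_calls, vs_calls):
    """Fraction of genotyped variants that match between the
    self-reported (SR) sample and the validation-study (VS) sample;
    a concordance of 1.0 is consistent with the same individual."""
    assert len(sr_calls) == len(vs_calls)
    return sum(a == b for a, b in zip(sr_calls, vs_calls)) / len(sr_calls)

# Usage with 33 hypothetical variant calls
sr = ["AA", "AG", "GG"] * 11
print(genotype_concordance(sr, sr))   # 1.0 -> identity confirmed
```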
Abstract:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges on data mining algorithms: concept drifts, the need to analyse the data on the fly because the streams are unbounded, and the need for scalable algorithms to handle potentially high data throughput. Real-time classification algorithms that are fast and adaptive to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN builds on an adaptive statistical data summary composed of Micro-Clusters. It is very fast and adaptive to concept drift whilst retaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
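A simplified sketch of the micro-cluster idea, assuming centroids maintained as incrementally updated linear sums; the full MC-NN additionally tracks error counts, splits poorly performing micro-clusters and handles time decay, all of which are omitted here.

```python
import numpy as np

class MicroClusterNN:
    """Simplified micro-cluster nearest-neighbour classifier: each class
    keeps incrementally updated micro-clusters whose linear sum and count
    give the centroid (a sketch, not the full MC-NN)."""

    def __init__(self):
        self.clusters = []   # each cluster: {"sum", "n", "label"}

    def _centroid(self, c):
        return c["sum"] / c["n"]

    def predict(self, x):
        nearest = min(self.clusters,
                      key=lambda c: np.linalg.norm(x - self._centroid(c)))
        return nearest["label"]

    def learn(self, x, y):
        if not any(c["label"] == y for c in self.clusters):
            self.clusters.append({"sum": x.astype(float), "n": 1, "label": y})
            return
        if self.predict(x) == y:
            # correct: absorb into the nearest micro-cluster of the true class
            c = min((c for c in self.clusters if c["label"] == y),
                    key=lambda c: np.linalg.norm(x - self._centroid(c)))
            c["sum"] += x
            c["n"] += 1
        else:
            # misclassified: start a new micro-cluster for the true class
            self.clusters.append({"sum": x.astype(float), "n": 1, "label": y})

# Usage: learn two Gaussian blobs from a stream of labelled instances
rng = np.random.default_rng(0)
clf = MicroClusterNN()
for _ in range(1000):
    y = int(rng.integers(2))
    x = rng.normal(loc=3 * y, scale=1.0, size=2)
    clf.learn(x, y)
print(clf.predict(np.array([3.0, 3.0])), len(clf.clusters))
```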