790 results for Datasets
Abstract:
In this study, we systematically compare a wide range of observational and numerical precipitation datasets for Central Asia. The data considered include two re-analyses, three datasets based on direct observations, and the output of a regional climate model simulation driven by a global re-analysis. These are validated and intercompared with respect to their ability to represent the Central Asian precipitation climate. In each of the datasets, we consider the mean spatial distribution and the seasonal cycle of precipitation, the amplitude of interannual variability, the representation of individual yearly anomalies, the precipitation sensitivity (i.e. the response to wet and dry conditions), and the temporal homogeneity of precipitation. Additionally, we carry out part of these analyses for datasets available in real time. The mutual agreement between the observations is used as an indication of the extent to which these data can be used for validating precipitation data from other sources. In particular, we show that the observations usually agree qualitatively on anomalies in individual years, while it is not always possible to use them for the quantitative validation of the amplitude of interannual variability. The regional climate model is capable of improving the spatial distribution of precipitation. At the same time, it strongly underestimates summer precipitation and its variability, while interannual variations are well represented during the other seasons, in particular in the Central Asian mountains during winter and spring.
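For readers who want to reproduce this kind of dataset intercomparison, the short sketch below computes yearly anomalies, the amplitude of interannual variability, and the anomaly correlation between two precipitation series. The numbers and array names are illustrative assumptions, not values from the study.

```python
import numpy as np

# Two hypothetical yearly precipitation series (mm/year) for one region,
# e.g. a gridded observational product and a regional climate model run.
obs = np.array([310.0, 285.0, 402.0, 298.0, 350.0, 270.0, 330.0])
mod = np.array([295.0, 300.0, 380.0, 310.0, 340.0, 260.0, 345.0])

# Yearly anomalies relative to each dataset's own climatology.
obs_anom = obs - obs.mean()
mod_anom = mod - mod.mean()

# Amplitude of interannual variability: standard deviation of the anomalies.
print("obs variability:", obs_anom.std(ddof=1))
print("mod variability:", mod_anom.std(ddof=1))

# Qualitative agreement on individual yearly anomalies: correlation coefficient.
print("anomaly correlation:", np.corrcoef(obs_anom, mod_anom)[0, 1])
```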
Abstract:
Understanding the response of the South Asian monsoon (SAM) system to global climate change is an interesting scientific problem that has enormous implications from the societal viewpoint. While the CMIP3 projections of future changes in monsoon precipitation used in the IPCC AR4 show major uncertainties, there is a growing recognition that the rapid increase of moisture in a warming climate can potentially enhance the stability of the large-scale tropical circulations. In this work, the authors have examined the stability of the SAM circulation based on diagnostic analysis of climate datasets over the past half century, and addressed the issue of likely future changes in the SAM in response to global warming using simulations from an ultra-high-resolution (20 km) global climate model. Additional sensitivity experiments using a simplified atmospheric model are presented to supplement the overall findings. The results suggest that the intensity of the boreal summer monsoon overturning circulation and the associated southwesterly monsoon flow have significantly weakened during the past 50 years. The weakening trend of the monsoon circulation is further corroborated by a significant decrease in the frequency of moderate-to-heavy monsoon rainfall days and upward vertical velocities, particularly over the narrow mountain ranges of the Western Ghats. Based on simulations from the 20-km ultra-high-resolution model, it is argued that a stabilization (weakening) of the summer monsoon Hadley-type circulation in response to global warming can potentially lead to a weakened large-scale monsoon flow, thereby resulting in weaker vertical velocities and reduced orographic precipitation over the narrow Western Ghat mountains by the end of the twenty-first century. Supplementary experiments using a simplified atmospheric model indicate a high sensitivity of the large-scale monsoon circulation to atmospheric stability in comparison with the effects of condensational heating.
Abstract:
The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for multidimensional input datasets. In this paper, we present an application of the simulated annealing procedure to the SOM learning algorithm with the aim of obtaining fast learning and better performance in terms of quantization error. The proposed learning algorithm is called the Fast Learning Self-Organizing Map, and it does not affect the simplicity of the basic learning algorithm of the standard SOM. The proposed learning algorithm also improves the quality of the resulting maps by providing better clustering quality and topology preservation of multidimensional input data. Several experiments are used to compare the proposed approach with the original algorithm and some of its modifications and speed-up techniques.
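As a rough illustration of the technique this abstract describes, the sketch below trains a standard SOM with an exponentially annealed learning rate and neighbourhood radius and reports the quantization error. The cooling schedule and all parameter values are assumptions; the paper's actual simulated-annealing-based schedule is not reproduced here.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Standard SOM training with an exponentially annealed learning rate
    and neighbourhood radius (an assumed cooling schedule)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates of every map unit, for the neighbourhood function.
    coords = np.dstack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"))
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / n_steps
            lr = lr0 * np.exp(-3.0 * frac)        # annealed learning rate
            sigma = sigma0 * np.exp(-3.0 * frac)  # annealed radius
            # Best-matching unit (BMU): the unit whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(dists.argmin(), dists.shape)
            # Gaussian neighbourhood around the BMU on the map grid.
            grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            nbh = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * nbh[..., None] * (x - weights)
            step += 1
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None], axis=2)
    return d.min(axis=1).mean()

# Example usage (hypothetical data):
# data = np.random.default_rng(1).random((500, 4))
# w = train_som(data)
# print(quantization_error(data, w))
```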
Abstract:
At a time when cities are competing with one another to attract or retain jobs within a globalizing economy, city governments are providing an array of financial incentives to stimulate job growth and retain existing jobs, particularly in high-cost locations. This paper provides the first systematic and comprehensive analysis of datasets on economic development incentives in New York City over the last fifteen years. The evidence on job retention and creation is mixed. Although many companies do not meet their agreed-upon job targets in absolute terms, the evidence suggests that companies receiving subsidies outperform their respective industries in terms of employment growth, that is, they grow more, or decline less. We emphasize that this finding is difficult to interpret, since firms receiving incentives may not be representative of the industry as a whole. In other words, their above-average performance may simply reflect the fact that the Economic Development Corporation (EDC) selects economically promising companies within manufacturing (or other industries) when granting incentives. At the same time, it is also possible that receiving incentives helps these companies to become stronger.
Abstract:
This paper publishes the results of the 1996 study, which repeats a cross-section analysis of around 125 City of London office buildings and examines the longitudinal data contributed by a sample of 56 unrefurbished properties common to the 1986 and 1996 City of London datasets. An estimate of the average rate of rental and capital value depreciation is made; the effect of age is shown not to be straight-line; and the causes of depreciation are measured. The results are compared with the 1986 City of London findings.
Assessing and understanding the impact of stratospheric dynamics and variability on the Earth system
Abstract:
Advances in weather and climate research have demonstrated the role of the stratosphere in the Earth system across a wide range of temporal and spatial scales. Stratospheric ozone loss has been identified as a key driver of Southern Hemisphere tropospheric circulation trends, affecting ocean currents and carbon uptake, sea ice, and possibly even the Antarctic ice sheets. Stratospheric variability has also been shown to affect short-term and seasonal forecasts, connecting the tropics and midlatitudes and guiding storm track dynamics. The two-way interactions between the stratosphere and the Earth system have motivated the World Climate Research Programme's (WCRP) Stratospheric Processes and Their Role in Climate (SPARC) DynVar activity to investigate the impact of stratospheric dynamics and variability on climate. This assessment will be made possible by two new multi-model datasets. First, roughly 10 models with a well-resolved stratosphere are participating in the Coupled Model Intercomparison Project 5 (CMIP5), providing the first multi-model ensemble of climate simulations coupled from the stratopause to the sea floor. Second, the Stratosphere Historical Forecasting Project (SHFP) of WCRP's Climate Variability and Predictability (CLIVAR) program is forming a multi-model set of seasonal hindcasts with stratosphere-resolving models, revealing the impact of both stratospheric initial conditions and dynamics on intraseasonal prediction. The CMIP5 and SHFP model datasets will offer an unprecedented opportunity to understand the role of the stratosphere in the natural and forced variability of the Earth system and to determine whether incorporating knowledge of the middle atmosphere improves seasonal forecasts and climate projections. Capsule: New modeling efforts will provide unprecedented opportunities to harness our knowledge of the stratosphere to improve weather and climate prediction.
Abstract:
The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall’s τ, Spearman’s ρ and Pearson’s r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve (AUC) score when Receiver Operating Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset. To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality in the absence of experimental data.
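The model-combination step described above can be illustrated with generic tools. The sketch below fits a multiple linear regression and a small neural network to hypothetical feature scores and compares their rank correlations with the target quality score; the feature values, network size and data split are assumptions, not FunFOLDQA's actual configuration.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr, pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Hypothetical per-prediction feature scores (X) and observed quality
# scores (y); FunFOLDQA's real features and training data are not shown.
rng = np.random.default_rng(0)
X = rng.random((200, 5))  # five feature scores per binding-site prediction
y = X @ np.array([0.4, 0.2, 0.1, 0.2, 0.1]) + 0.05 * rng.standard_normal(200)

linear = LinearRegression().fit(X[:150], y[:150])
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                   random_state=0).fit(X[:150], y[:150])

# Compare the two combination schemes by rank correlation on held-out data.
for name, model in [("linear", linear), ("neural net", net)]:
    pred = model.predict(X[150:])
    print(name,
          "tau=%.3f" % kendalltau(pred, y[150:])[0],
          "rho=%.3f" % spearmanr(pred, y[150:])[0],
          "r=%.3f" % pearsonr(pred, y[150:])[0])
```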
Abstract:
Current variability of precipitation (P) and its response to surface temperature (T) are analysed using coupled (CMIP5) and atmosphere-only (AMIP5) climate model simulations and compared with observational estimates. There is striking agreement between Global Precipitation Climatology Project (GPCP) observed and AMIP5 simulated P anomalies over land, both globally and in the tropics, suggesting that prescribed sea surface temperatures and realistic radiative forcings are sufficient for simulating the interannual variability in continental P. Differences between the observed and simulated P variability over the ocean originate primarily from the wet tropical regions, in particular the western Pacific, but are reduced slightly after 1995. All datasets show positive responses of P to T globally, of around 2%/K for the simulations and 3-4%/K for the GPCP observations, but model responses over the tropical oceans are around 3 times smaller than GPCP over the period 1988-2005. The observed anticorrelation between land and ocean P, linked with the El Niño Southern Oscillation, is captured by the simulations. All datasets over the tropical ocean show a tendency for wet regions to become wetter and dry regions drier with warming. Over the wet region (75th precipitation percentile), the precipitation response is ~13-15%/K for GPCP and ~5%/K for the models, while trends in P are 2.4%/decade for GPCP, 0.6%/decade for CMIP5 and 0.9%/decade for AMIP5, suggesting that the models are underestimating the precipitation responses or that a deficiency exists in the satellite datasets.
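The precipitation sensitivity quoted above (in %/K) is a regression of precipitation changes on temperature, expressed as a fraction of the climatological mean. A minimal sketch, assuming made-up anomaly series and an illustrative climatological mean, is given below.

```python
import numpy as np

# Hypothetical yearly precipitation (mm/day) and temperature (K) anomalies;
# the GPCP and model series themselves are not reproduced here.
t_anom = np.array([-0.15, -0.05, 0.02, 0.10, -0.08, 0.20, 0.05, 0.12])
p_anom = np.array([-0.020, -0.004, 0.003, 0.012, -0.010, 0.028, 0.006, 0.014])
p_clim = 2.7  # illustrative climatological mean precipitation, mm/day

# Regression slope dP/dT, converted to a fractional response in %/K.
slope = np.polyfit(t_anom, p_anom, 1)[0]
print("precipitation response: %.1f %%/K" % (100.0 * slope / p_clim))
```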
Abstract:
Reliable techniques for screening large numbers of plants for root traits are still being developed, but include aeroponic, hydroponic and agar plate systems. Coupled with digital cameras and image analysis software, these systems permit the rapid measurement of root numbers, length and diameter in moderate (typically <1000) numbers of plants. Usually such systems are employed with relatively small seedlings, and information is recorded in 2D. Recent developments in X-ray microtomography have facilitated 3D non-invasive measurement of small root systems grown in solid media, allowing angular distributions to be obtained in addition to numbers and length. However, because of the time taken to scan samples, only a small number can be screened (typically <10 per day, not including analysis time for the large spatial datasets generated) and, depending on sample size, limited resolution may mean that fine roots remain unresolved. Although agar plates allow differences between lines and genotypes to be discerned in young seedlings, the rank order may not be the same when the same materials are grown in solid media. For example, root length of dwarfing wheat (Triticum aestivum L.) lines grown on agar plates was increased by ~40% relative to wild-type and semi-dwarfing lines, but in a sandy loam soil under well-watered conditions it was decreased by 24-33%. Such differences in ranking suggest that significant soil environment-genotype interactions are occurring. Developments in instruments and software mean that a combination of high-throughput simple screens and more in-depth examination of root-soil interactions is becoming viable.
Abstract:
The first part of this review examines what is meant by ‘urban land and property’ (ULP) and looks at the background of ULP in the light of trends in UK urban areas over the past 50 years. Key conceptual approaches to the ULP ‘ownership issue’ are identified, together with the constraints to empirical analysis, which include a lack of data and patchy and inconsistent datasets. Three main components of ULP ownership in the UK are then examined using published data on commercial property, residential property and urban land, including ‘previously developed land’ (PDL) and ‘development land’, covering both the private and public sectors. The review examines past trends in ULP ownership patterns in these sectors within the UK, and the key drivers which have created the present-day patterns of ULP ownership. It concludes by identifying possible future trends in ULP ownership over the next 50 years to 2060 in the three main ULP sectors.
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. Such an alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
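To make the contrast with TDIDT concrete, the following is a minimal sketch of the basic serial Prism strategy for categorical attributes, written from the published description of the algorithm family; the dictionary-based data representation and the tie-breaking are assumptions.

```python
def prism(instances, attributes, target_class):
    """Basic (serial) Prism: induce rules for one class by greedily adding
    the attribute-value term with the highest precision for the target
    class until the rule is pure, then remove the covered instances."""
    rules = []
    remaining = list(instances)
    while any(inst["class"] == target_class for inst in remaining):
        rule, covered = [], list(remaining)
        while any(inst["class"] != target_class for inst in covered):
            best, best_prec, best_cov = None, -1.0, None
            used = {attr for attr, _ in rule}
            for attr in attributes:
                if attr in used:
                    continue  # each attribute is used at most once per rule
                for value in {inst[attr] for inst in covered}:
                    subset = [inst for inst in covered if inst[attr] == value]
                    prec = sum(i["class"] == target_class for i in subset) / len(subset)
                    if prec > best_prec:
                        best, best_prec, best_cov = (attr, value), prec, subset
            if best is None:  # no attribute left to specialise on
                break
            rule.append(best)
            covered = best_cov
        rules.append(rule)
        remaining = [inst for inst in remaining if inst not in covered]
    return rules

# Example usage (hypothetical weather data):
# data = [{"outlook": "sunny", "windy": "no", "class": "play"}, ...]
# rules = prism(data, ["outlook", "windy"], "play")
# Each rule is a list of (attribute, value) terms, e.g. [("outlook", "sunny")].
```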
Abstract:
In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is called the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach; however, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harness additional computing resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
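The framework itself is not reproduced in this listing, but the kind of parallelisation it targets can be sketched generically: the expensive inner step of a separate-and-conquer learner, evaluating all candidate rule terms, is farmed out to worker processes while the greedy search logic stays serial. The sketch below (building on the Prism sketch earlier in this list) uses Python's multiprocessing and is an assumption, not the paper's actual design.

```python
from multiprocessing import Pool

def term_precision(args):
    """Precision of one candidate attribute-value term on the covered data."""
    attr, value, covered, target_class = args
    subset = [inst for inst in covered if inst[attr] == value]
    prec = sum(inst["class"] == target_class for inst in subset) / len(subset)
    return prec, (attr, value)

def best_term_parallel(covered, attributes, target_class, workers=4):
    """Evaluate all candidate terms for the next rule specialisation in
    parallel; the greedy choice itself is identical to serial Prism."""
    candidates = [(attr, value, covered, target_class)
                  for attr in attributes
                  for value in {inst[attr] for inst in covered}]
    # Note: on platforms that spawn processes, call this from inside
    # an `if __name__ == "__main__":` guard.
    with Pool(workers) as pool:
        return max(pool.map(term_precision, candidates))
```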
Abstract:
Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules which are qualitatively better than rules induced by TDIDT. However, along with the increasing size of databases, many existing rule learning algorithms have proved to be computationally expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.
Abstract:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist for scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Abstract:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable with decision trees and in some cases even outperform them. Both kinds of algorithms tend to overfit on large and noisy datasets, and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees, many pre-pruning and post-pruning methods exist; however, for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms, develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
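J-pruning is based on Smyth and Goodman's J-measure, which scores the information content of a rule; a term is only added while the J-value keeps rising. A minimal sketch of the J-measure and of the pruning decision, with made-up probabilities, is given below.

```python
import math

def j_measure(p_y, p_x, p_x_given_y):
    """Smyth and Goodman's J-measure for a rule 'IF y THEN x': the average
    information (in bits) that the rule firing provides about the class."""
    def term(p, q):
        return 0.0 if p == 0.0 else p * math.log2(p / q)
    return p_y * (term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x))

# Illustrative J-pruning decision (probabilities are made up): stop
# specialising a rule as soon as the next term would lower its J-value.
j_before = j_measure(p_y=0.30, p_x=0.40, p_x_given_y=0.80)
j_after = j_measure(p_y=0.10, p_x=0.40, p_x_given_y=0.85)
if j_after < j_before:
    print("truncate here: J would drop from %.3f to %.3f" % (j_before, j_after))
```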