78 results for data types and operators


Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present a novel approach to intrusion detection that uses an intelligent data classifier based on a self-organizing map (SOM). We survey the other unsupervised intrusion detection methods, alternative SOM-based techniques and the winning KDD IDS methods. The paper describes the design and implementation of a robust intelligent data classifier built on a single large (30x30) SOM that can detect all types of attacks in the DARPA Archive 1999. Tested on the full KDD data sets, it reaches a detection rate of up to 99.73% with a false positive rate as low as 0.04%; tested on the corrected data sets, it reaches a comparable detection rate of 89.54% with a false positive rate as low as 0.18%.
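
As a rough illustration of the core step of a SOM-based classifier, the sketch below finds the best-matching unit (BMU) for a sample and moves that unit toward it. The 2x2 grid, the learning rate and the feature vectors are illustrative assumptions, not details from the paper; a full SOM would also update the BMU's neighbours and decay the learning rate over time.

```python
import math

def best_matching_unit(weights, sample):
    """Return (row, col) of the grid node whose weight vector is closest to sample."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, sample)  # Euclidean distance
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

def update_unit(weights, pos, sample, lr=0.5):
    """Move the unit at pos a fraction lr of the way toward sample."""
    r, c = pos
    weights[r][c] = [w + lr * (s - w) for w, s in zip(weights[r][c], sample)]

# Toy 2x2 map with 2-dimensional weight vectors (hypothetical values).
weights = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 1.0]]]
sample = [0.9, 0.2]

pos = best_matching_unit(weights, sample)
update_unit(weights, pos, sample)
print(pos, weights[pos[0]][pos[1]])
```

In an intrusion detection setting, each trained unit would typically be labelled (for example as normal or attack traffic) and a new record classified by the label of its BMU; that labelling step is not shown here.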

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The ability to display and inspect powder diffraction data quickly and efficiently is a central part of the data analysis process. Whilst many computer programs are capable of displaying powder data, their focus is typically on advanced operations such as structure solution or Rietveld refinement. This article describes a lightweight software package, Jpowder, whose focus is fast and convenient visualization and comparison of powder data sets in a variety of formats from computers with network access. Jpowder is written in Java and uses its associated Web Start technology to allow ‘single-click deployment’ from a web page, http://www.jpowder.org. Jpowder is open source, free and available for use by anyone.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. The framework for Skewness Balancing has been built in this contribution with a prediction model in which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrains. Accuracy assessment of the model is carried out using cross-validation with an overall accuracy of 95%. An extension to the algorithm is developed to address the overclassification issue for hilly terrain. For moderate terrain, the results show that from the classified tiles detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards) which makes Skewness Balancing ideal to be integrated into geographic information system (GIS) software packages.
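
The abstract does not spell out the algorithm, so the following is only a sketch of one common reading of skewness balancing: the highest elevations are removed one at a time until the sample skewness of the remaining heights is no longer positive. The points left are treated as ground and the removed ones as objects. The function names and the toy elevations are assumptions for illustration.

```python
def skewness(values):
    """Sample skewness (third standardized moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def skewness_balancing(heights):
    """Split elevations into (ground, objects) without a fixed height threshold."""
    ground = sorted(heights)
    objects = []
    # Positive skew suggests a tail of high points (objects) above the terrain.
    while len(ground) > 2 and skewness(ground) > 0:
        objects.append(ground.pop())  # remove the current highest point
    return ground, objects

# Toy elevations in metres: flat terrain near 10 m plus two elevated returns.
heights = [10.0, 9.9, 10.1, 9.7, 10.2, 10.0, 18.0, 22.0]
ground, objects = skewness_balancing(heights)
print(ground, objects)
```

No height threshold appears anywhere in the loop, which is the property the abstract calls threshold-freedom.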

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Bayesian Model Averaging (BMA) is used to test for multiple break points in univariate series using conjugate normal-gamma priors. This approach can test for the number of structural breaks and produce posterior probabilities for a break at each point in time. Results are averaged over specifications including stationary, stationary-around-trend and unit root models, each containing different types and numbers of breaks and different lag lengths. The procedures are used to test for structural breaks in 14 annual macroeconomic series and 11 natural resource price series. The results indicate that there are structural breaks in all of the natural resource series and most of the macroeconomic series. Many of the series had multiple breaks. Our findings regarding the existence of unit roots, having allowed for structural breaks in the data, are largely consistent with previous work.
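
The averaging step can be sketched as follows, assuming each candidate specification has already produced a log marginal likelihood. The numbers and the three-model setup are hypothetical; the sketch shows only how posterior model probabilities weight a per-model quantity, not the normal-gamma computations themselves.

```python
import math

def posterior_model_probs(log_marginal_likelihoods, priors=None):
    """Posterior model probabilities: p(M_i | y) proportional to p(y | M_i) p(M_i)."""
    n = len(log_marginal_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # uniform prior over models (an assumption)
    log_post = [lml + math.log(p) for lml, p in zip(log_marginal_likelihoods, priors)]
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def bma_average(values, probs):
    """Average a per-model quantity using the posterior model probabilities."""
    return sum(v * p for v, p in zip(values, probs))

# Hypothetical log marginal likelihoods for three specifications.
log_ml = [-120.0, -118.0, -125.0]
probs = posterior_model_probs(log_ml)

# 1.0 if the specification contains a break at a given date, else 0.0.
has_break = [0.0, 1.0, 1.0]
break_prob = bma_average(has_break, probs)
print(probs, break_prob)
```

Averaging the break indicator in this way yields a posterior probability of a break at that date across all specifications considered.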

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There are a number of challenges associated with managing knowledge and information in construction organizations delivering major capital assets. These include the ever-increasing volumes of information, the loss of people to retirement or competitors, the continuously changing nature of information, the lack of methods for eliciting useful knowledge, the development of new information technologies and changes in management and innovation practices. Existing tools and methodologies for valuing intangible assets in fields such as engineering, project management, finance and accounting do not fully address the issues associated with the valuation of information and knowledge. Information is rarely recorded in a way that allows a document to be valued, either when it is produced or when it is subsequently retrieved and re-used. In addition, there is a wealth of tacit personal knowledge which, if codified into documentary information, may prove to be very valuable to operators of the finished asset or to future designers. This paper addresses the problem of information overload and identifies the differences between data, information and knowledge. An exploratory study was conducted with a leading construction consultant, using structured interviews to examine three perspectives (business, project management and document management) and, specifically, how to value information in practical terms. Major challenges in information management are identified. A through-life Information Evaluation Methodology (IEM) is presented to reduce information overload and to make the information more valuable in the future.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting this data to transform it into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behavior from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modeling, empirical data modeling, knowledge discovery, data mining, and data fusion.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It is generally assumed that the variability of neuronal morphology has an important effect on both the connectivity and the activity of the nervous system, but this effect has not been thoroughly investigated. Neuroanatomical archives represent a crucial tool to explore structure-function relationships in the brain. We are developing computational tools to describe, generate, store and render large sets of three-dimensional neuronal structures in a format that is compact, quantitative, accurate and readily accessible to the neuroscientist. Single-cell neuroanatomy can be characterized quantitatively at several levels. In computer-aided neuronal tracing files, a dendritic tree is described as a series of cylinders, each represented by diameter, spatial coordinates and the connectivity to other cylinders in the tree. This ‘Cartesian’ description constitutes a completely accurate mapping of dendritic morphology but it bears little intuitive information for the neuroscientist. In contrast, a classical neuroanatomical analysis characterizes neuronal dendrites on the basis of the statistical distributions of morphological parameters, e.g. maximum branching order or bifurcation asymmetry. This description is intuitively more accessible, but it only yields information on the collective anatomy of a group of dendrites, i.e. it is not complete enough to provide a precise ‘blueprint’ of the original data. We are adopting a third, intermediate level of description, which consists of the algorithmic generation of neuronal structures within a certain morphological class based on a set of ‘fundamental’, measured parameters. This description is as intuitive as a classical neuroanatomical analysis (parameters have an intuitive interpretation), and as complete as a Cartesian file (the algorithms generate and display complete neurons). The advantages of the algorithmic description of neuronal structure are immense.
If an algorithm can measure the values of a handful of parameters from an experimental database and generate virtual neurons whose anatomy is statistically indistinguishable from that of their real counterparts, a great deal of data compression and amplification can be achieved. Data compression results from the quantitative and complete description of thousands of neurons with a handful of statistical distributions of parameters. Data amplification is possible because, from a set of experimental neurons, many more virtual analogues can be generated. This approach could allow one, in principle, to create and store a neuroanatomical database containing data for an entire human brain in a personal computer. We are using two programs, L-NEURON and ARBORVITAE, to investigate systematically the potential of several different algorithms for the generation of virtual neurons. Using these programs, we have generated anatomically plausible virtual neurons for several morphological classes, including guinea pig cerebellar Purkinje cells and cat spinal cord motor neurons. These virtual neurons are stored in an online electronic archive of dendritic morphology. This process highlights the potential and the limitations of the ‘computational neuroanatomy’ strategy for neuroscience databases.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Svalgaard and Cliver (2010) recently reported a consensus between the various reconstructions of the heliospheric field over recent centuries. This is a significant development because, individually, each has uncertainties introduced by instrument calibration drifts, limited numbers of observatories, and the strength of the correlations employed. However, taken collectively, a consistent picture is emerging. We here show that this consensus extends to more data sets and methods than reported by Svalgaard and Cliver, including that used by Lockwood et al. (1999), when their algorithm is used to predict the heliospheric field rather than the open solar flux. One area where there is still some debate relates to the existence and meaning of a floor value to the heliospheric field. From cosmogenic isotope abundances, Steinhilber et al. (2010) have recently deduced that the near-Earth IMF at the end of the Maunder minimum was 1.80 ± 0.59 nT, which is considerably lower than the revised floor of 4 nT proposed by Svalgaard and Cliver. We here combine cosmogenic and geomagnetic reconstructions and modern observations (with allowance for the effect of solar wind speed and structure on the near-Earth data) to derive an estimate for the open solar flux of (0.48 ± 0.29) × 10^14 Wb at the end of the Maunder minimum. By way of comparison, the largest and smallest annual means recorded by instruments in space between 1965 and 2010 are 5.75 × 10^14 Wb and 1.37 × 10^14 Wb, respectively, set in 1982 and 2009, and the maximum of the 11 year running means was 4.38 × 10^14 Wb in 1986. Hence the average open solar flux during the Maunder minimum is found to have been 11% of its peak value during the recent grand solar maximum.
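
The 11% figure can be checked with a line of arithmetic, on the reading that "peak value" refers to the maximum of the 11 year running means (4.38 × 10^14 Wb) rather than the largest annual mean. That reading is an assumption, but it is the one that reproduces the quoted percentage.

```python
# Values quoted in the abstract, in webers.
maunder_flux = 0.48e14        # open solar flux at the end of the Maunder minimum
peak_running_mean = 4.38e14   # maximum of the 11 year running means (1986)

ratio = maunder_flux / peak_running_mean
percent = round(100 * ratio)
print(f"{100 * ratio:.2f}% -> {percent}%")
```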

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Virulence for bean and soybean is determined by effector genes in a plasmid-borne pathogenicity island (PAI) in race 7 strain 1449B of Pseudomonas syringae pv. phaseolicola. One of the effector genes, avrPphF, confers either pathogenicity, virulence, or avirulence depending on the plant host and is absent from races 2, 3, 4, 6, and 8 of this pathogen. Analysis of cosmid clones and comparison of DNA sequences showed that the absence of avrPphF from strain 1448A is due to deletion of a continuous 9.5-kb fragment. The remainder of the PAI is well conserved in strains 1448A and 1449B. The left junction of the deleted region consists of a chimeric transposable element generated from the fusion of homologs of IS1492 from Pseudomonas putida and IS1090 from Ralstonia eutropha. The borders of the deletion were conserved in 66 P. syringae pv. phaseolicola strains isolated in different countries and representing the five races lacking avrPphF. However, six strains isolated in Spain had a 10.5-kb deletion that extended 1 kb further from the right junction. The perfect conservation of the 28-nucleotide right repeat of the IS1090 homolog in the two deletion types and in the other 47 insertions of the IS1090 homolog in the 1448A genome strongly suggests that the avrPphF deletions were mediated by the activity of the chimeric mobile element. Our data strongly support a clonal origin for the races of P. syringae pv. phaseolicola lacking avrPphF.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper re-examines the relative importance of sector and regional effects in determining property returns. Using the largest property database currently available in the world, we decompose the returns on individual properties into a national effect, common to all properties, and a number of sector and regional factors. However, unlike previous studies, we categorise the individual property data into an ever-increasing number of property types and regions, from a simple 3-by-3 classification up to a 10-by-63 sector/region classification. In this way we can test the impact that a finer classification has on the sector and regional effects. We confirm the findings of previous studies that sector-specific effects have a greater influence on property returns than regional effects. We also find that the impact of the sector effect is robust across different classifications of sectors and regions. Nonetheless, the more refined sector and regional partitions uncover some interesting sector and regional differences, which were obscured in previous studies. These results have important implications for property portfolio construction and analysis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS) respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability resulting from adherence to international standards. The key feature of the portal is the ability to display co-plotted timeseries of the in situ and model data and the quantification of misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low level details of differing file formats or the physical location of the data. Scientific and operational benefits to this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring different data streams together from often disparate locations.
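
To make the standards-based access concrete, here is a minimal sketch of how a client might assemble a WMS 1.3.0 GetMap request. The endpoint URL, layer name, bounding box and time value are hypothetical placeholders, not the ECOOP portal's actual services; only the parameter names come from the WMS standard.

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width=512, height=512, time=None):
    """Build a WMS 1.3.0 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the default style
        "CRS": "CRS:84",         # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        params["TIME"] = time    # optional time dimension
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and layer, for illustration only.
url = wms_getmap_url(
    "https://example.org/wms",
    "sea_surface_temperature",
    (-12.0, 48.0, 4.0, 62.0),
    time="2009-06-01T00:00:00Z",
)
print(url)
```

Because every compliant server accepts the same parameters, the client needs no knowledge of the underlying file formats or of where the data are physically stored.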

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A dynamic, mechanistic model of enteric fermentation was used to investigate the effect of type and quality of grass forage, dry matter intake (DMI) and proportion of concentrates in dietary dry matter (DM) on variation in methane (CH(4)) emission from enteric fermentation in dairy cows. The model represents substrate degradation and microbial fermentation processes in rumen and hindgut and, in particular, the effects of type of substrate fermented and of pH on the production of individual volatile fatty acids and CH(4) as end-products of fermentation. Effects of type and quality of fresh and ensiled grass were evaluated by distinguishing two N fertilization rates of grassland and two stages of grass maturity. Simulation results indicated a strong impact of the amount and type of grass consumed on CH(4) emission, with a maximum difference (across all forage types and all levels of DMI) of 49 and 77% in g CH(4)/kg fat and protein corrected milk (FCM) for diets with a proportion of concentrates in dietary DM of 0.1 and 0.4, respectively (values ranging from 10.2 to 19.5 g CH(4)/kg FCM). The lowest emission was established for early cut, high fertilized grass silage (GS) and high fertilized grass herbage (GH). The highest emission was found for late cut, low-fertilized GS. The N fertilization rate had the largest impact, followed by stage of grass maturity at harvesting and by the distinction between GH and GS. Emission expressed in g CH(4)/kg FCM declined on average 14% with an increase of DMI from 14 to 18 kg/day for grass forage diets with a proportion of concentrates of 0.1, and on average 29% with an increase of DMI from 14 to 23 kg/day for diets with a proportion of concentrates of 0.4. Simulation results indicated that a high proportion of concentrates in dietary DM may lead to a further reduction of CH(4) emission per kg FCM, mainly as a result of a higher DMI and milk yield, in comparison to low concentrate diets.
Simulation results were evaluated against independent data obtained at three different laboratories in indirect calorimetry trials with cows consuming mainly GH. The model predicted the average of observed values reasonably well, but systematic deviations remained between individual laboratories and the root mean squared prediction error was a proportion of 0.12 of the observed mean. Both observed and predicted emission expressed in g CH(4)/kg DM intake decreased upon an increase in dietary N:organic matter (OM) ratio. The model reproduced reasonably well the variation in measured CH(4) emission in cattle sheds on Dutch dairy farms and indicated that on average a fraction of 0.28 of the total emissions must have originated from manure under these circumstances.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We describe a model-data fusion (MDF) inter-comparison project (REFLEX), which compared various algorithms for estimating carbon (C) model parameters consistent with both measured carbon fluxes and states and a simple C model. Participants were provided with the model and with both synthetic net ecosystem exchange (NEE) of CO2 and leaf area index (LAI) data, generated from the model with added noise, and observed NEE and LAI data from two eddy covariance sites. Participants endeavoured to estimate model parameters and states consistent with the model for all cases over the two years for which data were provided, and generate predictions for one additional year without observations. Nine participants contributed results using Metropolis algorithms, Kalman filters and a genetic algorithm. For the synthetic data case, parameter estimates compared well with the true values. The results of the analyses indicated that parameters linked directly to gross primary production (GPP) and ecosystem respiration, such as those related to foliage allocation and turnover, or temperature sensitivity of heterotrophic respiration, were best constrained and characterised. Poorly estimated parameters were those related to the allocation to and turnover of fine root/wood pools. Estimates of confidence intervals varied among algorithms, but several algorithms successfully located the true values of annual fluxes from synthetic experiments within relatively narrow 90% confidence intervals, achieving >80% success rate and mean NEE confidence intervals <110 gC m−2 year−1 for the synthetic case. Annual C flux estimates generated by participants generally agreed with gap-filling approaches using half-hourly data. The estimation of ecosystem respiration and GPP through MDF agreed well with outputs from partitioning studies using half-hourly data. Confidence limits on annual NEE increased by an average of 88% in the prediction year compared to the previous year, when data were available. 
Confidence intervals on annual NEE increased by 30% when observed data were used instead of synthetic data, reflecting and quantifying the addition of model error. Finally, our analyses indicated that incorporating additional constraints, using data on C pools (wood, soil and fine roots) would help to reduce uncertainties for model parameters poorly served by eddy covariance data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Data assimilation refers to the problem of finding trajectories of a prescribed dynamical model in such a way that the output of the model (usually some function of the model states) follows a given time series of observations. Typically though, these two requirements cannot both be met at the same time: tracking the observations is not possible without the trajectory deviating from the proposed model equations, while adherence to the model requires deviations from the observations. Thus, data assimilation faces a trade-off. In this contribution, the sensitivity of the data assimilation with respect to perturbations in the observations is identified as the parameter which controls the trade-off. A relation between the sensitivity and the out-of-sample error is established, which allows the latter to be calculated under operational conditions. A minimum out-of-sample error is proposed as a criterion to set an appropriate sensitivity and to settle the discussed trade-off. Two approaches to data assimilation are considered, namely variational data assimilation and Newtonian nudging, also known as synchronization. Numerical examples demonstrate the feasibility of the approach.
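
The trade-off can be illustrated with a toy Newtonian nudging run. The scalar model dx/dt = -x, the observation series exp(-0.5 t) (which the model cannot reproduce exactly), the forward Euler integration and the gain values are all assumptions chosen for illustration; the sketch does not implement the paper's sensitivity or out-of-sample error calculation.

```python
import math

def nudged_run(gain, dt=0.01, steps=500):
    """Integrate dx/dt = -x + gain * (y - x) and report two RMS diagnostics.

    Returns (misfit, model_deviation): the RMS distance to the observations,
    and the RMS size of the nudging term, i.e. how far the trajectory departs
    from the unforced model equations.
    """
    x = 1.0
    misfit_sq = 0.0
    forcing_sq = 0.0
    for n in range(steps):
        y = math.exp(-0.5 * n * dt)   # observation at time n * dt
        misfit_sq += (x - y) ** 2
        forcing = gain * (y - x)      # nudging term pulling x toward y
        forcing_sq += forcing ** 2
        x += dt * (-x + forcing)      # forward Euler step
    return math.sqrt(misfit_sq / steps), math.sqrt(forcing_sq / steps)

for gain in (0.0, 1.0, 10.0):
    misfit, model_deviation = nudged_run(gain)
    print(f"gain={gain:5.1f}  misfit={misfit:.4f}  model deviation={model_deviation:.4f}")
```

With zero gain the trajectory obeys the model exactly but drifts away from the observations; raising the gain shrinks the misfit at the cost of a larger departure from the model equations.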

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories. The latter is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.