16 results for Software packages selection
in CentAUR: Central Archive, University of Reading - UK
Abstract:
Recently, major processor manufacturers have announced a dramatic shift in the paradigm by which they will increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single-core CPUs, the trend is clearly towards multi-core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se, but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures, provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high-performance computing systems, research on the corresponding algorithms must be at the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution addresses the classic problem of distributed association rule mining, focusing on communication efficiency to improve the state of the art. After this, a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared-memory systems is presented. The next paper discusses the design of a parallel approach to the frequent subgraph mining problem for distributed-memory systems. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational environments. The fourth paper describes the combined use and customization of software packages to facilitate top-down parallelism in the tuning of Support Vector Machines (SVMs), and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that provided detailed reviews for each paper. We would also like to thank Matthew Otey, who helped with publicity for the workshop.
Abstract:
This document provides guidelines for fish stock assessment and fishery management using the software tools and other outputs developed by the United Kingdom's Department for International Development's Fisheries Management Science Programme (FMSP) from 1992 to 2004. It explains some key elements of the precautionary approach to fisheries management and outlines a range of alternative stock assessment approaches that can provide the information needed for such precautionary management. Four FMSP software tools, LFDA (Length Frequency Data Analysis), CEDA (Catch Effort Data Analysis), YIELD and ParFish (Participatory Fisheries Stock Assessment), are described, with which intermediary parameters, performance indicators and reference points may be estimated. The document also contains examples of the assessment and management of multispecies fisheries, the use of Bayesian methodologies, the use of empirical modelling approaches for estimating yields and analysing fishery systems, and the assessment and management of inland fisheries. It also provides a comparison of length- and age-based stock assessment methods. A CD-ROM with the FMSP software packages CEDA, LFDA, YIELD and ParFish is included.
Abstract:
This article shows that not all statistical software packages correctly calculate the p-value for the classical F test comparing two independent Normal variances. This is demonstrated with a simple example, and the reasons for the discrepancies are discussed. Eight different software packages are considered.
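For reference, the conventional two-sided p-value doubles the smaller tail of the F distribution, and a common software error is to report a single tail (or to double the wrong one). Below is a minimal sketch assuming SciPy; the function name and example figures are illustrative, not taken from the article.

```python
# Two-sided p-value for the classical F test of equality of two Normal
# variances: F = s1^2 / s2^2 on (n1 - 1, n2 - 1) degrees of freedom.
from scipy import stats

def f_test_two_sided(s1_sq, n1, s2_sq, n2):
    """Return the F statistic and its two-sided p-value (illustrative)."""
    f = s1_sq / s2_sq
    df1, df2 = n1 - 1, n2 - 1
    lower = stats.f.cdf(f, df1, df2)   # P(F <= f)
    upper = stats.f.sf(f, df1, df2)    # P(F >= f)
    # Double the smaller tail; cap at 1 for degenerate cases.
    return f, min(1.0, 2.0 * min(lower, upper))

# Illustrative figures: sample variances 4.0 (n=16) and 1.0 (n=21).
f_stat, p_value = f_test_two_sided(4.0, 16, 1.0, 21)
print(f"F = {f_stat:.3f}, two-sided p = {p_value:.4f}")
```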
Abstract:
This paper considers methods for testing for superiority or non-inferiority in active-control trials with binary data, when the relative treatment effect is expressed as an odds ratio. Three asymptotic tests for the log-odds ratio based on the unconditional binary likelihood are presented, namely the likelihood ratio, Wald and score tests. All three tests can be implemented straightforwardly in standard statistical software packages, as can the corresponding confidence intervals. Simulations indicate that the three alternatives are similar in terms of the Type I error, with values close to the nominal level. However, when the non-inferiority margin becomes large, the score test slightly exceeds the nominal level. In general, the highest power is obtained from the score test, although all three tests are similar and the observed differences in power are not of practical importance.
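To make the setup concrete, here is a hedged sketch of the Wald variant on the log-odds-ratio scale (the score and likelihood-ratio tests require the constrained binomial likelihood and are not shown). The function name, data layout and margin are illustrative assumptions, not the paper's code.

```python
import math
from scipy import stats

def wald_noninferiority(x_t, n_t, x_c, n_c, or_margin):
    """One-sided Wald test of H0: OR <= or_margin against H1: OR > or_margin
    on the log-odds-ratio scale (illustrative; assumes no zero cells)."""
    a, b = x_t, n_t - x_t          # treatment successes / failures
    c, d = x_c, n_c - x_c          # control successes / failures
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # asymptotic standard error
    z = (log_or - math.log(or_margin)) / se
    p = stats.norm.sf(z)                     # one-sided p-value
    ci = (math.exp(log_or - 1.96 * se),      # 95% CI for the odds ratio
          math.exp(log_or + 1.96 * se))
    return z, p, ci

# Illustrative call: 80/100 vs 75/100 responders, margin OR = 0.8.
print(wald_noninferiority(80, 100, 75, 100, 0.8))
```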
Abstract:
Once unit-cell dimensions have been determined from a powder diffraction data set and therefore the crystal system is known (e.g. orthorhombic), the method presented by Markvardsen, David, Johnson & Shankland [Acta Cryst. (2001), A57, 47-54] can be used to generate a table ranking the extinction symbols of the given crystal system according to probability. Markvardsen et al. tested a computer program (ExtSym) implementing the method against Pawley refinement outputs generated using the TF12LS program [David, Ibberson & Matthewman (1992). Report RAL-92-032. Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, UK]. Here, it is shown that ExtSym can be used successfully with many well-known powder diffraction analysis packages, namely DASH [David, Shankland, van de Streek, Pidcock, Motherwell & Cole (2006). J. Appl. Cryst. 39, 910-915], FullProf [Rodriguez-Carvajal (1993). Physica B, 192, 55-69], GSAS [Larson & Von Dreele (1994). Report LAUR 86-748. Los Alamos National Laboratory, New Mexico, USA], PRODD [Wright (2004). Z. Kristallogr. 219, 1-11] and TOPAS [Coelho (2003). Bruker AXS GmbH, Karlsruhe, Germany]. In addition, a precise description of the optimal input for ExtSym is given to enable other software packages to interface with ExtSym and to allow the improvement/modification of existing interfacing scripts. ExtSym takes as input the powder data in the form of integrated intensities and error estimates for these intensities. The output returned by ExtSym is demonstrated to be strongly dependent on the accuracy of these error estimates, and the reason for this is explained. ExtSym is tested against a wide range of data sets, confirming the algorithm to be very successful at ranking the published extinction symbol as the most likely.
Abstract:
The main activity carried out by the geophysicist when interpreting seismic data, in terms of both importance and time spent, is tracking (or picking) seismic events. In practice, this activity turns out to be rather challenging, particularly when the targeted event is interrupted by discontinuities such as geological faults or exhibits lateral changes in seismic character. In recent years, several automated schemes, known as auto-trackers, have been developed to assist the interpreter in this tedious and time-consuming task. The automatic tracking tools available in modern interpretation software packages often employ artificial neural networks (ANNs) to identify seismic picks belonging to target events through a pattern recognition process. The ability of ANNs to track horizons across discontinuities largely depends on how reliably data patterns characterise these horizons. While seismic attributes are commonly used to characterise amplitude peaks forming a seismic horizon, some researchers in the field claim that inherent seismic information is lost in the attribute extraction process and advocate instead the use of raw data (amplitude samples). This paper investigates the performance of ANNs using either characterisation method, and demonstrates how the complementarity of seismic attributes and raw data can be exploited in conjunction with other geological information in a fuzzy inference system (FIS) to achieve enhanced auto-tracking performance.
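As a toy illustration of the fusion idea only, not the paper's FIS: the sketch below combines the confidences of an attribute-based and a raw-amplitude-based classifier with a structural cue using hand-rolled fuzzy rules. Every membership shape, rule and weight here is an assumption.

```python
def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fused_pick_confidence(attr_conf, raw_conf, dip_consistency):
    """Fuse attribute-based and raw-amplitude ANN confidences (each in
    [0, 1]) with a structural cue via simple weighted fuzzy rules."""
    high = lambda x: tri(x, 0.4, 1.0, 1.6)   # 'high' membership
    low = lambda x: tri(x, -0.6, 0.0, 0.6)   # 'low' membership
    # Rule 1: both classifiers confident -> strong accept.
    r1 = min(high(attr_conf), high(raw_conf))
    # Rule 2: either confident AND local dip consistent -> accept.
    r2 = min(max(high(attr_conf), high(raw_conf)), high(dip_consistency))
    # Rule 3: both weak -> reject.
    r3 = min(low(attr_conf), low(raw_conf))
    # Sugeno-style weighted average of the rule outputs.
    num = 1.0 * r1 + 0.7 * r2 + 0.0 * r3
    den = r1 + r2 + r3
    return num / den if den > 0 else 0.5

print(fused_pick_confidence(0.9, 0.6, 0.8))  # illustrative values
```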
Abstract:
Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. In this contribution, the framework for Skewness Balancing is extended with a prediction model in which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrain. Accuracy assessment of the model is carried out using cross-validation, with an overall accuracy of 95%. An extension to the algorithm is developed to address the overclassification issue for hilly terrain. For moderate terrain, the results show that the classified tiles separate detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) from bare earth (ground, roads and yards), which makes Skewness Balancing ideal for integration into geographic information system (GIS) software packages.
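A minimal sketch of the core idea as we read it from the abstract: treat the highest points as objects and remove them until the elevation distribution is no longer positively skewed. Tiling, the prediction model and the hilly-terrain extension are omitted, and the loop below is an assumption about the algorithm's form rather than the authors' code.

```python
import numpy as np
from scipy.stats import skew

def skewness_balancing(z):
    """Label LIDAR points as ground/object by peeling off the highest
    elevations until the remaining heights are no longer positively
    skewed. Returns a boolean mask (True = ground)."""
    order = np.argsort(z)          # indices sorted by ascending elevation
    sorted_z = z[order]
    k = len(z)                     # points still labelled ground
    while k > 3 and skew(sorted_z[:k]) > 0.0:
        k -= 1                     # current highest point -> object
    is_ground = np.zeros(len(z), dtype=bool)
    is_ground[order[:k]] = True    # map back to the original ordering
    return is_ground

# Illustrative tile: mostly flat ground plus a few tall object points.
tile = np.concatenate([np.random.default_rng(1).normal(10.0, 0.3, 500),
                       np.array([18.0, 19.5, 22.0])])
print(skewness_balancing(tile).sum(), "points classified as ground")
```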
Abstract:
What happens when digital coordination practices are introduced into the institutionalized setting of an engineering project? This question is addressed through an interpretive study that examines how a shared digital model becomes used in the late design stages of a major station refurbishment project. The paper contributes by mobilizing the idea of ‘hybrid practices’ to understand the diverse patterns of activity that emerge to manage digital coordination of design. It articulates how engineering and architecture professions develop different relationships with the shared model; the design team negotiates paper-based practices across organizational boundaries; and diverse practitioners probe the potential and limitations of the digital infrastructure. While different software packages and tools have become linked together into an integrated digital infrastructure, these emerging hybrid practices contrast with the interactions anticipated in practice and policy guidance, presenting new opportunities and challenges for managing project delivery. The study has implications for researchers working in the growing field of empirical work on engineering project organizations as it shows the importance of considering, and suggests new ways to theorise, the introduction of digital coordination practices into these institutionalized settings.
Abstract:
Global temperatures are expected to rise by between 1.1 and 6.4 °C this century, depending, to a large extent, on the amount of carbon we emit to the atmosphere from now onwards. This warming is expected to have very negative effects on many people and ecosystems and, therefore, minimising our carbon emissions is a priority. Buildings are estimated to be responsible for around 50% of carbon emissions in the UK. Potential reductions involve both operational emissions, produced during use, and embodied emissions, produced during the manufacture of materials and components, and during construction, refurbishment and demolition. To date the major effort has focused on reducing the apparently larger operational element, which is more readily quantifiable and for which reduction measures are relatively straightforward to identify and implement. Various studies have compared the magnitude of embodied and operational emissions, but have shown considerable variation in the relative values. This illustrates the difficulty of quantifying embodied emissions, which requires detailed knowledge of the processes involved in the different life cycle phases and the use of consistent system boundaries. However, other studies have established the interaction between operational and embodied emissions, which demonstrates the importance of considering both elements together in order to maximise potential reductions. This is borne out in statements from both the Intergovernmental Panel on Climate Change and the Low Carbon Construction Innovation and Growth Team of the UK Government. In terms of meeting the 2020 and 2050 timeframes for carbon reductions, it appears to be equally, if not more, important to consider early embodied carbon reductions, rather than just future operational reductions. Future decarbonisation of energy supply and more efficient lighting and M&E equipment installed in future refits are likely to significantly reduce operational emissions, lending further weight to this argument. A method of discounting to evaluate the present value of future carbon emissions would allow more realistic comparisons of the relative importance of the embodied and operational elements. This paper describes the results of case studies on carbon emissions over the whole lifecycle of three buildings in the UK, compares four available software packages for determining embodied carbon and suggests a method of carbon discounting to obtain present values for future emissions. These form the initial stages of a research project aimed at producing information on embodied carbon for different types of building, components and forms of construction, in a simplified form that can be readily used by building designers in optimising building design to minimise overall carbon emissions.
Keywords: embodied carbon; carbon emission; building; operational carbon.
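The proposed carbon discounting reduces to the standard present-value calculation. A minimal sketch follows, in which the 3% discount rate and the emission figures are purely illustrative assumptions, not values from the paper.

```python
def present_value_carbon(emissions_by_year, rate=0.03):
    """Discount a schedule of annual emissions (tCO2e, starting next year)
    to a present value, so embodied carbon emitted now can be compared
    with operational carbon spread over the building's life."""
    return sum(e / (1.0 + rate) ** (t + 1)
               for t, e in enumerate(emissions_by_year))

# Illustrative comparison: 500 tCO2e embodied at construction versus
# 20 tCO2e/year operational over a 60-year life, discounted at 3%.
embodied_now = 500.0
operational_pv = present_value_carbon([20.0] * 60, rate=0.03)
print(f"embodied: {embodied_now:.0f} tCO2e, "
      f"operational PV: {operational_pv:.0f} tCO2e")
```

Under discounting, early embodied savings weigh more heavily than nominally equal operational savings far in the future, which is the argument the abstract makes.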
Abstract:
This paper reviews nine software packages with particular reference to their GARCH model estimation accuracy when judged against a respected benchmark. We consider the numerical consistency of GARCH and EGARCH estimation and forecasting. Our results have a number of implications for published research and future software development. Finally, we argue that the establishment of benchmarks for other standard non-linear models is long overdue.
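The kind of benchmark comparison the paper performs can be sketched with the Python arch package; the reference coefficients below are placeholders standing in for published benchmark values, and the simulated series stands in for the benchmark data set.

```python
import numpy as np
from arch import arch_model

# Stand-in for a benchmark data set; a real check would load the series
# for which reference coefficients were published.
rng = np.random.default_rng(0)
returns = rng.standard_normal(2000)

model = arch_model(returns, vol="GARCH", p=1, q=1, mean="Constant")
result = model.fit(disp="off")

# Placeholder reference values, NOT the paper's benchmark figures.
benchmark = {"omega": 0.010, "alpha[1]": 0.150, "beta[1]": 0.800}
for name, ref in benchmark.items():
    est = result.params[name]
    # Agreement in significant digits, the usual benchmark metric.
    digits = -np.log10(abs(est - ref) / abs(ref))
    print(f"{name}: estimate={est:.6f} reference={ref:.3f} "
          f"agreement ~ {digits:.1f} digits")
```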
Abstract:
The Environmental Data Abstraction Library (EDAL) provides a modular data management library for bringing new and diverse data types together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross-calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.
Abstract:
Growing pot poinsettia and similar crops involves careful crop monitoring and management to ensure that height specifications are met. Graphical tracking represents a target-driven approach to decision support with simple interpretation. HDC (Horticultural Development Council) Poinsettia Tracker implements a graphical track based on the Generalised Logistic Curve, similar to that of other tracking packages. Any set of curve parameters can be used to track crop progress. However, graphical tracks must be expected to be site and cultivar specific. By providing a simple curve-fitting function, growers can easily develop their own site- and variety-specific ideal tracks based on past records, with quality increasing as more seasons' data are added.
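As a sketch of what such a curve-fitting function might do, the following fits a generalised logistic (Richards) curve to past height records with SciPy; the parameterisation, starting values and data are illustrative assumptions, not HDC Poinsettia Tracker's internals.

```python
import numpy as np
from scipy.optimize import curve_fit

def generalised_logistic(t, lower, upper, growth, t_mid, nu):
    """Richards / generalised logistic curve for crop height over time."""
    return lower + (upper - lower) / (
        1.0 + np.exp(-growth * (t - t_mid))) ** (1.0 / nu)

# Illustrative past-season records: day of crop cycle vs height (cm).
days = np.array([0, 10, 20, 30, 40, 50, 60, 70], dtype=float)
height = np.array([5.0, 6.5, 9.0, 14.0, 21.0, 27.0, 30.5, 31.5])

p0 = [5.0, 32.0, 0.15, 35.0, 1.0]   # rough starting values
params, _ = curve_fit(generalised_logistic, days, height, p0=p0,
                      maxfev=10000)
print("fitted track parameters:", np.round(params, 3))
```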
Abstract:
High-density oligonucleotide (oligo) arrays are a powerful tool for transcript profiling. Arrays based on GeneChip® technology are amongst the most widely used, although GeneChip® arrays are currently available for only a small number of plant and animal species. Thus, we have developed a method to improve the sensitivity of high-density oligonucleotide arrays when applied to heterologous species and tested the method by analysing the transcriptome of Brassica oleracea L., a species for which no GeneChip® array is available, using a GeneChip® array designed for Arabidopsis thaliana (L.) Heynh. Genomic DNA from B. oleracea was labelled and hybridised to the ATH1-121501 GeneChip® array. Arabidopsis thaliana probe-pairs that hybridised to the B. oleracea genomic DNA on the basis of the perfect-match (PM) probe signal were then selected for subsequent B. oleracea transcriptome analysis using a .cel file parser script to generate probe mask files. The transcriptional response of B. oleracea to a mineral nutrient (phosphorus; P) stress was quantified using probe mask files generated for a wide range of gDNA hybridisation intensity thresholds. An example probe mask file generated with a gDNA hybridisation intensity threshold of 400 removed >68% of the available PM probes from the analysis but retained >96% of available A. thaliana probe-sets. Ninety-nine of these genes were then identified as significantly regulated under P stress in B. oleracea, including the homologues of P stress responsive genes in A. thaliana. Increasing the gDNA hybridisation intensity threshold up to 500 for probe selection increased the sensitivity of the GeneChip® array to detect regulation of gene expression in B. oleracea under P stress by up to 13-fold. Our open-source software to create probe mask files is freely available at http://affymetrix.arabidopsis.info/xspecies/ and may be used to facilitate transcriptomic analyses of a wide range of plant and animal species in the absence of custom arrays.
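The probe-selection step amounts to a threshold filter over PM probe signals from the gDNA hybridisation. The sketch below illustrates this with a hypothetical flat-file layout; the column names are assumptions, not the actual .cel-derived format used by the authors' parser.

```python
import pandas as pd

def make_probe_mask(gdna_signals: pd.DataFrame, threshold: float = 400.0):
    """Keep PM probes whose gDNA hybridisation signal meets the threshold
    and report which probe-sets retain at least one probe. The columns
    ('probe_set_id', 'probe_x', 'probe_y', 'pm_signal') are hypothetical."""
    kept = gdna_signals[gdna_signals["pm_signal"] >= threshold]
    retained_sets = kept["probe_set_id"].unique()
    return kept[["probe_set_id", "probe_x", "probe_y"]], retained_sets

# Usage (hypothetical file):
# probes, sets = make_probe_mask(pd.read_csv("gdna_pm_signals.csv"), 400.0)
```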
Abstract:
A model based on graph isomorphisms is used to formalize software evolution. Step by step, we narrow the search space by an informed selection of attributes based on the current state of the art in software engineering, and generate a seed solution. We then traverse the resulting space using graph isomorphisms and other set operations over the vertex sets. The new solutions preserve the desired attributes. The goal of defining an isomorphism-based search mechanism is to construct predictors of evolution that can facilitate the automation of the 'software factory' paradigm. The model allows for automation via software tools implementing the concepts.
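One way to picture such an attribute-preserving isomorphism check, sketched with NetworkX; the attribute names and the shape of the candidate set are illustrative assumptions, not the paper's model.

```python
import networkx as nx
from networkx.algorithms.isomorphism import categorical_node_match

def evolution_candidates(seed, candidates, preserved_attrs):
    """Keep candidate graphs isomorphic to the seed solution under a
    match that preserves the selected vertex attributes."""
    node_match = categorical_node_match(preserved_attrs,
                                        [None] * len(preserved_attrs))
    return [g for g in candidates
            if nx.is_isomorphic(seed, g, node_match=node_match)]

# Illustrative use: preserve the (assumed) 'component' vertex attribute.
seed = nx.Graph()
seed.add_nodes_from([(0, {"component": "ui"}), (1, {"component": "db"})])
seed.add_edge(0, 1)
variant = nx.relabel_nodes(seed, {0: "a", 1: "b"})   # same design, renamed
print(len(evolution_candidates(seed, [variant], ["component"])))  # -> 1
```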