88 resultados para Software Package Data Exchange (SPDX)
An empirical study of process-related attributes in segmented software cost-estimation relationships
Resumo:
Parametric software effort estimation models consisting on a single mathematical relationship suffer from poor adjustment and predictive characteristics in cases in which the historical database considered contains data coming from projects of a heterogeneous nature. The segmentation of the input domain according to clusters obtained from the database of historical projects serves as a tool for more realistic models that use several local estimation relationships. Nonetheless, it may be hypothesized that using clustering algorithms without previous consideration of the influence of well-known project attributes misses the opportunity to obtain more realistic segments. In this paper, we describe the results of an empirical study using the ISBSG-8 database and the EM clustering algorithm that studies the influence of the consideration of two process-related attributes as drivers of the clustering process: the use of engineering methodologies and the use of CASE tools. The results provide evidence that such consideration conditions significantly the final model obtained, even though the resulting predictive quality is of a similar magnitude.
Resumo:
Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.
Resumo:
Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. The framework for Skewness Balancing has been built in this contribution with a prediction model in which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrains. Accuracy assessment of the model is carried out using cross-validation with an overall accuracy of 95%. An extension to the algorithm is developed to address the overclassification issue for hilly terrain. For moderate terrain, the results show that from the classified tiles detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards) which makes Skewness Balancing ideal to be integrated into geographic information system (GIS) software packages.
Resumo:
This paper compares exchange rate pass-through to aggregate prices in the US, Germany and Japan across a number of dimensions. Building on the empirical approaches in the recent literature, our contribution is to perform a thorough sensitivity analysis of pass-through estimates. We find that the econometric method, data frequency and variable proxy employed matter for the precision of details, yet they often agree on some general trends. Thus, pass-through to import prices has declined in the 1990s relative to the 1980s, pass-through to export prices remains country-specific and pass-through to consumer prices is nowadays negligible in all three economies we considered.
Resumo:
There is remarkable agreement in expectations today for vastly improved ocean data management a decade from now -- capabilities that will help to bring significant benefits to ocean research and to society. Advancing data management to such a degree, however, will require cultural and policy changes that are slow to effect. The technological foundations upon which data management systems are built are certain to continue advancing rapidly in parallel. These considerations argue for adopting attitudes of pragmatism and realism when planning data management strategies. In this paper we adopt those attitudes as we outline opportunities for progress in ocean data management. We begin with a synopsis of expectations for integrated ocean data management a decade from now. We discuss factors that should be considered by those evaluating candidate “standards”. We highlight challenges and opportunities in a number of technical areas, including “Web 2.0” applications, data modeling, data discovery and metadata, real-time operational data, archival of data, biological data management and satellite data management. We discuss the importance of investments in the development of software toolkits to accelerate progress. We conclude the paper by recommending a few specific, short term targets for implementation, that we believe to be both significant and achievable, and calling for action by community leadership to effect these advancements.
Resumo:
As integrated software solutions reshape project delivery, they alter the bases for collaboration and competition across firms in complex industries. This paper synthesises and extends literatures on strategy in project-based industries and digitally-integrated work to understand how project-based firms interact with digital infrastructures for project delivery. Four identified strategies are to: 1) develop and use capabilities to shape the integrated software solutions that are used in projects; 2) co-specialize, developing complementary assets to work repeatedly with a particular integrator firm; 3) retain flexibility by developing and maintaining capabilities in multiple digital technologies and processes; and 4) manage interfaces, translating work into project formats for coordination while hiding proprietary data and capabilities in internal systems. The paper articulates the strategic importance of digital infrastructures for delivery as well as product architectures. It concludes by discussing managerial implications of the identified strategies and areas for further research.
Resumo:
Many recent papers have documented periodicities in returns, return volatility, bid–ask spreads and trading volume, in both equity and foreign exchange markets. We propose and employ a new test for detecting subtle periodicities in time series data based on a signal coherence function. The technique is applied to a set of seven half-hourly exchange rate series. Overall, we find the signal coherence to be maximal at the 8-h and 12-h frequencies. Retaining only the most coherent frequencies for each series, we implement a trading rule that is based on these observed periodicities. Our results demonstrate in all cases except one that, in gross terms, the rules can generate returns that are considerably greater than those of a buy-and-hold strategy, although they cannot retain their profitability net of transactions costs. We conjecture that this methodology could constitute an important tool for financial market researchers which will enable them to detect, quantify and rank the various periodic components in financial data better.
Resumo:
In recent years, a sharp divergence of London Stock Exchange equity prices from dividends has been noted. In this paper, we examine whether this divergence can be explained by reference to the existence of a speculative bubble. Three different empirical methodologies are used: variance bounds tests, bubble specification tests, and cointegration tests based on both ex post and ex ante data. We find that, stock prices diverged significantly from their fundamental values during the late 1990's, and that this divergence has all the characteristics of a bubble.
Resumo:
This paper deals with the integration of radial basis function (RBF) networks into the industrial software control package Connoisseur. The paper shows the improved modelling capabilities offered by RBF networks within the Connoisseur environment compared to linear modelling techniques such as recursive least squares. The paper also goes on to mention the way this improved modelling capability, obtained through the RBF networks will be utilised within Connoisseur.
Resumo:
Metabolic stable isotope labeling is increasingly employed for accurate protein (and metabolite) quantitation using mass spectrometry (MS). It provides sample-specific isotopologues that can be used to facilitate comparative analysis of two or more samples. Stable Isotope Labeling by Amino acids in Cell culture (SILAC) has been used for almost a decade in proteomic research and analytical software solutions have been established that provide an easy and integrated workflow for elucidating sample abundance ratios for most MS data formats. While SILAC is a discrete labeling method using specific amino acids, global metabolic stable isotope labeling using isotopes such as (15)N labels the entire element content of the sample, i.e. for (15)N the entire peptide backbone in addition to all nitrogen-containing side chains. Although global metabolic labeling can deliver advantages with regard to isotope incorporation and costs, the requirements for data analysis are more demanding because, for instance for polypeptides, the mass difference introduced by the label depends on the amino acid composition. Consequently, there has been less progress on the automation of the data processing and mining steps for this type of protein quantitation. Here, we present a new integrated software solution for the quantitative analysis of protein expression in differential samples and show the benefits of high-resolution MS data in quantitative proteomic analyses.
Resumo:
This paper builds upon previous research on currency bands, and provides a model for the Colombian peso. Stochastic differential equations are combined with information related to the Colombian currency band to estimate competing models of the behaviour of the Colombian peso within the limits of the currency band. The resulting moments of the density function for the simulated returns describe adequately most of the characteristics of the sample returns data. The factor included to account for the intra-marginal intervention performed to drive the rate towards the Central Parity accounts only for 6.5% of the daily change, which supports the argument that intervention, if performed by the Central Bank, it is not directed to push the currency towards the limits. Moreover, the credibility of the Colombian Central Bank, Banco de la República’s ability to defend the band seems low.
Resumo:
Much consideration is rightly given to the design of metadata models to describe data. At the other end of the data-delivery spectrum much thought has also been given to the design of geospatial delivery interfaces such as the Open Geospatial Consortium standards, Web Coverage Service (WCS), Web Map Server and Web Feature Service (WFS). Our recent experience with the Climate Science Modelling Language shows that an implementation gap exists where many challenges remain unsolved. To bridge this gap requires transposing information and data from one world view of geospatial climate data to another. Some of the issues include: the loss of information in mapping to a common information model, the need to create ‘views’ onto file-based storage, and the need to map onto an appropriate delivery interface (as with the choice between WFS and WCS for feature types with coverage-valued properties). Here we summarise the approaches we have taken in facing up to these problems.
Resumo:
A new electronic software distribution (ESD) life cycle analysis (LCA)methodology and model structure were constructed to calculate energy consumption and greenhouse gas (GHG) emissions. In order to counteract the use of high level, top-down modeling efforts, and to increase result accuracy, a focus upon device details and data routes was taken. In order to compare ESD to a relevant physical distribution alternative,physical model boundaries and variables were described. The methodology was compiled from the analysis and operational data of a major online store which provides ESD and physical distribution options. The ESD method included the calculation of power consumption of data center server and networking devices. An in-depth method to calculate server efficiency and utilization was also included to account for virtualization and server efficiency features. Internet transfer power consumption was analyzed taking into account the number of data hops and networking devices used. The power consumed by online browsing and downloading was also factored into the model. The embedded CO2e of server and networking devices was proportioned to each ESD process. Three U.K.-based ESD scenarios were analyzed using the model which revealed potential CO2e savings of 83% when ESD was used over physical distribution. Results also highlighted the importance of server efficiency and utilization methods.
Resumo:
Despite the increasing use of groupware technologies in education, there is little evidence of their impact, especially within an enquiry-based learning (EBL) context. In this paper, we examine the use of a commercial standard Group Intelligence software called GroupSystems®ThinkTank. To date, ThinkTank has been adopted mainly in the USA and supports teams in generating ideas, categorising, prioritising, voting and multi-criteria decision-making and automatically generates a report at the end of each session. The software was used by students carrying out an EBL project, set by employers, for a full academic year. The criteria for assessing the impact of ThinkTank on student learning were those of creativity, participation, productivity, engagement and understanding. Data was collected throughout the year using a combination of interviews and questionnaires, and written feedback from employers. The overall findings show an increase in levels of productivity and creativity, evidence of a deeper understanding of their work but some variation in attitudes towards participation in the early stages of the project.
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.