167 results for Software distributions
Abstract:
A long-held assumption in entrepreneurship research is that normal (i.e., Gaussian) distributions characterize variables of interest for both theory and practice. We challenge this assumption by examining more than 12,000 nascent, young, and hyper-growth firms. Results reveal that variables which play central roles in resource-, cognition-, action-, and environment-based entrepreneurship theories exhibit highly skewed power law distributions, where a few outliers account for a disproportionate amount of the distribution's total output. Our results call for the development of new theory to explain and predict the mechanisms that generate these distributions and the outliers therein. We offer a research agenda, including a description of non-traditional methodological approaches, to answer this call.
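Claims like this are straightforward to probe on one's own data. Below is a minimal sketch, assuming a pre-chosen lower cutoff xmin and using the standard continuous maximum-likelihood estimator of a power-law exponent (after Clauset, Shalizi and Newman, 2009); the function name is illustrative, not from the paper:

```python
import numpy as np

def powerlaw_alpha(x, xmin):
    """MLE of the exponent alpha for a continuous power law p(x) ~ x^(-alpha)
    over x >= xmin: alpha = 1 + n / sum(log(x_i / xmin)).
    A heavier tail (smaller alpha) means a few outliers dominate the total."""
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))
```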
Abstract:
The characterisation of facial expression through landmark-based analysis methods such as FACEM (Pilowsky & Katsikitis, 1994) has a variety of uses in psychiatric and psychological research. In these systems, important structural relationships are extracted from images of facial expressions by the analysis of a pre-defined set of feature points. These relationship measures may then be used, for instance, to assess the degree of variability and similarity between different facial expressions of emotion. FaceXpress is a multimedia software suite that provides a generalised workbench for landmark-based facial emotion analysis and stimulus manipulation. It is a flexible tool that is designed to be specialised at runtime by the user. While FaceXpress has been used to implement the FACEM process, it can also be configured to support any other similar, arbitrary system for quantifying human facial emotion. FaceXpress also implements an integrated set of image processing tools and specialised tools for facial expression stimulus production including facial morphing routines and the generation of expression-representative line drawings from photographs.
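To illustrate what landmark-based relationship measures look like computationally, here is a minimal sketch; the landmark indices, the normalising pair, and the function name are assumptions for illustration only, since FACEM defines its own specific set of measures:

```python
import numpy as np

def expression_measures(landmarks, pairs, norm_pair=(0, 1)):
    """Compute scale-normalised distances between facial feature points.
    landmarks: (n_points, 2) array of (x, y) landmark coordinates.
    pairs: list of index pairs whose separations serve as measures.
    norm_pair: pair whose distance (e.g. inter-ocular) sets the scale,
    so that measures are comparable across images and faces."""
    pts = np.asarray(landmarks, dtype=float)
    scale = np.linalg.norm(pts[norm_pair[0]] - pts[norm_pair[1]])
    return {pair: np.linalg.norm(pts[pair[0]] - pts[pair[1]]) / scale
            for pair in pairs}
```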
Abstract:
Neu-Model, an ongoing project aimed at developing a neural simulation environment that is extremely computationally powerful and flexible, is described. It is shown that the use of good software engineering techniques in Neu-Model’s design and implementation is resulting in a high-performance system that is powerful and flexible enough to allow rigorous exploration of brain function at a variety of conceptual levels.
Abstract:
This paper describes a software architecture for real-world robotic applications. We discuss issues of software reliability, testing, and realistic off-line simulation that allows the majority of the automation system to be tested off-line in the laboratory before deployment in the field. A recent project, the automation of a very large mining machine, is used to illustrate the discussion.
Abstract:
In 2005, Ginger Myles and Hongxia Jin proposed a software watermarking scheme based on converting jump instructions, or unconditional branch statements (UBSs), into calls to a fingerprint branch function (FBF) that computes the correct target address of the UBS as a function of the generated fingerprint and integrity check. If the program is tampered with, the fingerprint and integrity checks change and the target address is no longer computed correctly. In this paper, we present an attack based on tracking stack pointer modifications to break the scheme, and we provide implementation details. The key element of the attack is to remove the fingerprint- and integrity-check-generating code from the program after disassociating the target address from the fingerprint and integrity value. Using debugging tools that give the attacker extensive control to track stack pointer operations, we perform both subtractive and watermark replacement attacks. The major steps in the attack are automated, resulting in a fast and low-cost attack.
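As a purely illustrative sketch of the construction under attack — the digest choice, the displacement table, and all names below are assumptions, not the published scheme — the FBF idea is that the branch target is only recoverable while the fingerprint and integrity value are untampered:

```python
import hashlib

def fingerprint_branch_function(call_site, fingerprint, integrity, disp_table):
    """Resolve the target of a converted UBS. The true displacement is stored
    XOR-ed with a digest of (fingerprint, integrity), so tampering that
    perturbs either value makes the computed target address incorrect."""
    key = hashlib.sha256(fingerprint + integrity).digest()
    mask = int.from_bytes(key[:4], "little")
    return call_site + (disp_table[call_site] ^ mask)
```

The attack works at exactly this seam: once the resolved target has been observed at runtime (here, by tracking stack pointer modifications), each converted UBS can be rewritten as a direct branch and the fingerprint- and integrity-generating code excised.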
Abstract:
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern, and it has been criticized for its local optimization. Current, more accurate software requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. It is therefore imperative to develop a practical program for accurate and scalable species identification in DNA barcoding. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is used to reduce the search space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that the platform can handle both large-scale and multilocus barcoding data with accuracy and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/.
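The second-stage distance is the standard Kimura two-parameter (K2P) formula. A minimal sketch for a pair of aligned sequences follows; the function name and the gap-handling policy are illustrative rather than taken from VIP Barcoding itself:

```python
import math

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences:
    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions."""
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps and ambiguity codes
    n = len(pairs)
    p = sum(pair in transitions for pair in pairs) / n
    q = sum(a != b and (a, b) not in transitions for a, b in pairs) / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```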
Abstract:
As more raw sugar factories become involved in the manufacture of by-products and cogeneration, bagasse is becoming an increasingly valuable commodity. However, in most factories the bulk of the bagasse produced is used to generate steam in relatively old and inefficient boilers. Efficient bagasse-fired boilers are a high capital cost item, and the cost of supplying the steam required to run a sugar factory by other means is prohibitive. For many factories a more realistic way to reduce bagasse consumption is to increase the efficiency of existing boilers. The Farleigh No. 3 boiler is a relatively old, low-efficiency boiler. Like many in the industry, it has had its performance adversely affected by uneven gas and air flow distributions and air heater leaks. The combustion performance and efficiency of this boiler have been significantly improved by making the gas and air flow distributions through the boiler more uniform and by repairing the air heater. The estimated bagasse savings easily justify the cost of the boiler improvements.
Abstract:
This research explored how small and medium enterprises can achieve success with software as a service (SaaS) applications from the cloud. Based on an empirical investigation of six growth-oriented and early-technology-adopting small and medium enterprises, this study proposes a SaaS success model for small and medium enterprises with two variants: one for basic and one for advanced benefits. The basic model explains the effective use of SaaS for achieving informational and transactional benefits. The advanced model explains the enhanced use of SaaS for achieving strategic and transformational benefits. Both models explicate the information systems capabilities and organizational complementarities needed for achieving success with SaaS.
Abstract:
Bug fixing is a highly cooperative work activity in which developers, testers, product managers, and other stakeholders collaborate using a bug tracking system. In the context of Global Software Development (GSD), where software development is distributed across different geographical locations, we focus on understanding the role of bug trackers in supporting software bug fixing activities. We carried out small-scale ethnographic fieldwork in a software product team distributed between Finland and India at a multinational engineering company. Using semi-structured interviews and in-situ observations of 16 bug cases, we show that the bug tracker 1) supported the information needs of different stakeholders, 2) established common ground, and 3) reinforced issues related to ownership, performance, and power. Consequently, we provide implications for design around these findings.
Abstract:
Environmental data usually include measurements, such as water quality data, which fall below detection limits because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in the statistical analysis of such data. However, it is well known that analyzing a data set with detection limits is challenging, and analysts often have to rely on traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference, and justification of distributions is often not possible when the data are correlated and a large proportion of the data lies below detection limits. The extent of bias is usually unknown. To draw valid conclusions, and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to water quality data collected in the Susquehanna River Basin in the United States of America, which clearly demonstrates the advantages of the rank regression models.
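To convey the flavour of rank-based regression (without the paper's handling of detection limits and temporal correlation), here is a minimal Jaeckel-type sketch with Wilcoxon scores; the names and the optimizer choice are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

def rank_fit(X, y):
    """Estimate regression slopes by minimising Jaeckel's rank dispersion
    sum_i a(R(e_i)) * e_i with Wilcoxon scores a(r) = r/(n+1) - 1/2.
    This is the uncensored, uncorrelated building block; the paper extends
    such estimating functions to multiple detection limits and smooths them
    (induced smoothing) to ease computation."""
    n = len(y)

    def dispersion(beta):
        e = y - X @ beta
        return np.sum((rankdata(e) / (n + 1) - 0.5) * e)

    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    return minimize(dispersion, beta0, method="Nelder-Mead").x
```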
Abstract:
We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.
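As a minimal sketch of the first criterion — assuming an exchangeable working correlation, residuals already grouped by cluster, and illustrative names; the open-source software mentioned in the abstract is the authoritative implementation:

```python
import numpy as np

def gaussian_pseudolikelihood(resid_by_cluster, alpha, sigma2):
    """Gaussian pseudolikelihood of a working covariance model:
    sum over clusters i of -0.5 * (log|V_i| + r_i' V_i^{-1} r_i),
    where r_i are GEE residuals and V_i = sigma2 * R(alpha) is the working
    covariance (exchangeable correlation with parameter alpha assumed here).
    Larger values indicate a better-fitting working model."""
    total = 0.0
    for r in resid_by_cluster:
        R = np.full((len(r), len(r)), alpha)
        np.fill_diagonal(R, 1.0)
        V = sigma2 * R
        _, logdet = np.linalg.slogdet(V)
        total -= 0.5 * (logdet + r @ np.linalg.solve(V, r))
    return total
```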
Abstract:
Species distribution modelling (SDM) typically analyses species’ presence together with some form of absence information. Ideally, absences comprise observations or are inferred from comprehensive sampling. When such information is not available, pseudo-absences are often generated from the background locations within the study region of interest containing the presences, or else absence is implied through the comparison of presences to the whole study region, e.g. as is the case in Maximum Entropy (MaxEnt) or Poisson point process modelling. However, the choice of which absence information to include can be both challenging and highly influential on SDM predictions (e.g. Oksanen and Minchin, 2002). In practice, the use of pseudo- or implied absences often leads to an imbalance where absences far outnumber presences. This leaves analysis highly susceptible to ‘naughty noughts’: absences that occur beyond the envelope of the species, which can exert strong influence on the model and its predictions (Austin and Meyers, 1996). Also known as ‘excess zeros’, naughty noughts can be estimated via an overall proportion in simple hurdle or mixture models (Martin et al., 2005). However, absences, especially those that occur beyond the species envelope, can often be more diverse than presences. Here we consider an extension to excess zero models. The two-stage approach first exploits the compartmentalisation provided by classification trees (CTs) (as in O’Leary, 2008) to identify multiple sources of naughty noughts and simultaneously delineate several species envelopes. SDMs can then be fitted separately within each envelope, and for this stage, we examine both CTs (as in Falk et al., 2014) and the popular MaxEnt (Elith et al., 2006). We introduce a wider range of model performance measures to improve treatment of naughty noughts in SDM. We retain an overall measure of model performance, the area under the receiver operating characteristic (ROC) curve (AUC), but focus on its constituent measures of false negative rate (FNR) and false positive rate (FPR), and how these relate to the threshold in the predicted probability of presence that delimits predicted presence from absence. We also propose error rates more relevant to users of predictions: the false omission rate (FOR), the chance that a predicted absence corresponds to (and hence wastes) an observed presence, and the false discovery rate (FDR), reflecting those predicted (or potential) presences that correspond to absence. A high FDR may be desirable since it could help target future search efforts, whereas zero or low FOR is desirable since it indicates none of the (often valuable) presences have been ignored in the SDM. For illustration, we chose Bradypus variegatus, a species previously published as an exemplar species for MaxEnt, proposed by Phillips et al. (2006). We used CTs to increasingly refine the species envelope, starting with the whole study region (E0) and eliminating more and more potential naughty noughts (E1–E3). When combined with an SDM fit within the species envelope, the best CT SDM had similar AUC and FPR to the best MaxEnt SDM, but otherwise performed better. The FNR and FOR were greatly reduced, suggesting that CTs handle absences better. Interestingly, MaxEnt predictions showed low discriminatory performance, with the most common predicted probability of presence being in the same range (0.00-0.20) for both true absences and presences.
In summary, this example shows that SDMs can be improved by introducing an initial hurdle to identify naughty noughts and partition the envelope before applying SDMs. This improvement was barely detectable via AUC and FPR, yet clearly visible in FOR, FNR, and the comparison of the predicted probability of presence distributions for presences and absences.
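All of the rates above fall out of the presence/absence confusion matrix at a given threshold; a minimal sketch using the definitions in the abstract (names are illustrative):

```python
import numpy as np

def sdm_error_rates(y_obs, p_pred, threshold=0.5):
    """y_obs: 1 = observed presence, 0 = observed absence.
    p_pred: predicted probability of presence; `threshold` delimits
    predicted presence from predicted absence."""
    y = np.asarray(y_obs)
    y_hat = (np.asarray(p_pred) >= threshold).astype(int)
    tp = np.sum((y == 1) & (y_hat == 1))
    fn = np.sum((y == 1) & (y_hat == 0))
    fp = np.sum((y == 0) & (y_hat == 1))
    tn = np.sum((y == 0) & (y_hat == 0))
    return {
        "FNR": fn / (fn + tp),  # presences predicted as absent
        "FPR": fp / (fp + tn),  # absences predicted as present
        "FOR": fn / (fn + tn),  # predicted absences that waste an observed presence
        "FDR": fp / (fp + tp),  # predicted presences that correspond to absence
    }
```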
Abstract:
Factors in software engineering workgroups such as geographical dispersion and background discipline can be conceptually characterized as "distances", and they are obstructive to team collaboration and information sharing. This thesis focuses on information sharing across multidimensional distances and develops an information sharing distance model with six core dimensions: geography, time zone, organization, multi-discipline, heterogeneous roles, and varying project tenure. The research suggests that the effectiveness of workgroups may be improved through mindful conduct of information sharing, especially proactive consideration of, and explicit adjustment for, the distances of the recipient when sharing information.
Abstract:
We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.
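A minimal sampling sketch of the underlying construction — one scale variable per coordinate rather than a single shared one — assuming chi-square mixing as in the t case; this illustrates variable marginal tailweight, not the paper's exact parameterisation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_multiscale_t(n, mu, A, nus):
    """Gaussian scale mixture with a separate mixing variable per dimension.
    mu: location (d,); A: matrix with A @ A.T the Gaussian scatter;
    nus: per-dimension degrees of freedom -- small nus[j] gives coordinate j
    a heavy margin, large nus[j] an approximately Gaussian one. A single
    shared scale variable would recover the classical multivariate t."""
    d = len(mu)
    z = rng.standard_normal((n, d)) @ A.T          # correlated Gaussian part
    w = rng.chisquare(nus, size=(n, d)) / nus      # one scale per coordinate
    return mu + z / np.sqrt(w)
```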
Abstract:
In this paper, we examine approaches to estimating a Bayesian mixture model at both single and multiple time points for a sample of actual and simulated aerosol particle size distribution (PSD) data. For estimation of a mixture model at a single time point, we use Reversible Jump Markov Chain Monte Carlo (RJMCMC) to estimate the mixture model parameters, including the number of components, which is assumed to be unknown. We compare the results of this approach to a commonly used estimation method in the aerosol physics literature. As PSD data are often measured over time, frequently at small time intervals, we also examine the use of an informative prior for estimation of the mixture parameters, one which takes into account the correlated nature of the parameters. The Bayesian mixture model offers a promising approach, providing advantages in both estimation and inference.
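For readers unfamiliar with the machinery, a much-simplified sketch of the fixed-dimension core: a Gibbs sampler for a univariate Gaussian mixture with the number of components k held fixed and hypothetical conjugate priors. RJMCMC adds the trans-dimensional moves that let k itself vary, which are omitted here:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_gmm(x, k, iters=2000):
    """Gibbs sampler for a univariate Gaussian mixture with fixed k.
    Priors (illustrative): weights ~ Dirichlet(1), means ~ N(0, 10^2),
    variances ~ inverse-gamma(2, 1). Label switching is ignored."""
    n = len(x)
    mu = rng.choice(x, k)
    sig2 = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    draws = []
    for _ in range(iters):
        # 1. allocate each observation to a component
        logp = (np.log(pi) - 0.5 * np.log(sig2)
                - 0.5 * (x[:, None] - mu) ** 2 / sig2)
        p = np.exp(logp - logp.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        z = (p.cumsum(1) > rng.random((n, 1))).argmax(1)
        # 2. update weights, means, variances from conjugate full conditionals
        counts = np.bincount(z, minlength=k)
        pi = rng.dirichlet(1 + counts)
        for j in range(k):
            xj = x[z == j]
            nj = len(xj)
            prec = nj / sig2[j] + 1 / 100          # posterior precision of mu_j
            mu[j] = rng.normal((xj.sum() / sig2[j]) / prec, np.sqrt(1 / prec))
            a, b = 2 + nj / 2, 1 + 0.5 * np.sum((xj - mu[j]) ** 2)
            sig2[j] = 1 / rng.gamma(a, 1 / b)      # inverse-gamma draw
        draws.append((pi.copy(), mu.copy(), sig2.copy()))
    return draws
```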