39 results for variable data printing


Relevance: 30.00%

Abstract:

Since its introduction in 1978, data envelopment analysis (DEA) has become one of the preeminent nonparametric methods for measuring the efficiency and productivity of decision making units (DMUs). Charnes et al. (1978) provided the original DEA constant returns to scale (CRS) model, later extended to variable returns to scale (VRS) by Banker et al. (1984). These ‘standard’ models, known by the acronyms CCR and BCC respectively, are now employed routinely in areas ranging from the public sector, such as hospitals, health care systems, schools, and universities, to the private sector, such as banks and financial institutions (Emrouznejad et al. 2008; Emrouznejad and De Witte 2010). The main objective of this volume is to publish original studies that go beyond the two standard CCR and BCC models, with both theoretical and practical applications using advanced DEA models.
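
As a concrete illustration of the two standard models, the sketch below solves the input-oriented envelopment form of the CCR model with scipy's linear programming routine, with the BCC convexity constraint available as an option. The function name and the toy data are assumptions for the example, not material from the volume.

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, j0, vrs=False):
    """Input-oriented envelopment DEA score for DMU j0.
    X: (m, n) inputs, Y: (s, n) outputs; columns index the n DMUs.
    vrs=False gives the CCR (CRS) model; vrs=True adds the BCC
    convexity constraint sum(lambda) = 1 (variable returns to scale)."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.concatenate([[1.0], np.zeros(n)])   # minimise theta
    A_in = np.hstack([-X[:, [j0]], X])         # sum_j l_j x_ij <= theta x_i,j0
    A_out = np.hstack([np.zeros((s, 1)), -Y])  # sum_j l_j y_rj >= y_r,j0
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, j0]])
    A_eq = np.concatenate([[0.0], np.ones(n)])[None, :] if vrs else None
    b_eq = [1.0] if vrs else None
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun                             # theta* in (0, 1]; 1 = efficient

# Toy example: 4 DMUs, 2 inputs, one identical output.
X = np.array([[2.0, 4.0, 3.0, 5.0],
              [3.0, 1.0, 2.0, 4.0]])
Y = np.ones((1, 4))
print([round(dea_efficiency(X, Y, j, vrs=True), 3) for j in range(4)])
```

Because the VRS frontier envelops the data more tightly, a DMU's BCC score is never lower than its CCR score; comparing the two is the usual way to separate scale inefficiency from pure technical inefficiency.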

Relevance: 30.00%

Abstract:

The importance of informal institutions, and in particular culture, for entrepreneurship is a subject of ongoing interest. Past research has mostly concentrated on cross-national comparisons, cultural values, and the direct effects of culture on entrepreneurial behavior, but has in the main found inconsistent results. The present research adds a fresh perspective to this stream by turning attention to community-level culture and cultural norms. We hypothesize indirect effects of cultural norms on venture emergence: specifically, that community-level cultural norms (performance-based culture and socially supportive institutional norms) impact important supply-side variables (entrepreneurial self-efficacy and entrepreneurial motivation), which in turn influence nascent entrepreneurs' success in creating operational ventures (venture emergence). We test our predictions on a unique longitudinal data set (PSED II) tracking nascent entrepreneurs' venture creation efforts over a five-year time span and find evidence supporting them. Our research contributes to a more fine-grained understanding of how culture, in particular perceptions of community cultural norms, influences venture emergence. It also highlights the embeddedness of entrepreneurial behavior, and of its immediate antecedent beliefs, in the local community context.

Relevance: 30.00%

Abstract:

One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours either imputing missing values or discarding records that include missing data, both of which can degrade accuracy once missing values exceed a certain level. In this research we propose a methodology for handling data sets with a large percentage of missing values and high variability in which particular data are missing. Feature selection is performed by picking variables sequentially, in order of maximum correlation with the dependent variable and minimum correlation with the variables already selected. Classification models are then generated individually for each test case, based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data, where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring that explanations of risk are based only on the data given, not on imputed data. This is important for clinical decision support systems that use human expertise for modelling and explaining predictions.
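
A minimal sketch of the feature-selection step just described, assuming Pearson correlation and a simple relevance-minus-redundancy trade-off (the exact form of the trade-off is an assumption); the per-patient model building on whichever of these features a test case actually has is not shown.

```python
import numpy as np

def select_features(X, y, k):
    """Pick up to k features sequentially: maximise |correlation| with
    the dependent variable y while minimising mean |correlation| with
    the variables already selected. X: (n_samples, n_features)."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy  # assumed form of the trade-off
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```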

Relevance: 30.00%

Abstract:

This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, whereas traditional methods of collecting dialect data are relatively slow, site-restricted web searches allow dialect data to be collected from across a region as large as the United States in a matter of days.
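
As an illustration of the proportion estimate, one would issue a site-restricted query per variant and city (e.g. `soda site:<newspaper-domain>`) and normalise the returned hit counts; the variants, cities, and counts below are invented for the example.

```python
# Invented hit counts for one classic lexical alternation (soda/pop/coke),
# as if returned by site-restricted searches on two newspaper sites.
hits = {
    "Boston":  {"soda": 1200, "pop": 150, "coke": 90},
    "Atlanta": {"soda": 300,  "pop": 80,  "coke": 950},
}

for city, counts in hits.items():
    total = sum(counts.values())
    proportions = {variant: n / total for variant, n in counts.items()}
    print(city, {v: f"{p:.0%}" for v, p in proportions.items()})
```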

Relevance: 30.00%

Abstract:

Most machine-learning algorithms are designed for datasets whose features are all of a single type, and very little attention has been given to datasets with mixed-type features. We recently proposed a model that handles mixed types with a probabilistic latent variable formalism. This model, called generalised generative topographic mapping (GGTM), describes the data by type-specific distributions that are conditionally independent given the latent space. It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process, using an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
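
To make the two ingredients concrete, the sketch below shows a mixed-type log-likelihood that is conditionally independent given the latent point (Gaussian for continuous features, Bernoulli for binary ones), and the usual feature-saliency construction, in which each feature's density mixes a latent-dependent component, weighted by the saliency, with a latent-independent background. The distribution choices and function names are assumptions; the GGTMFS EM updates themselves are not reproduced.

```python
import numpy as np
from scipy.stats import norm

def mixed_loglik(x_cont, x_bin, mu, p):
    """Log-likelihood of one mixed-type observation under type-specific
    distributions, conditionally independent given the latent point:
    Gaussian (mean mu, unit variance) for the continuous features and
    Bernoulli (probability p) for the binary features."""
    ll = norm.logpdf(x_cont, loc=mu, scale=1.0).sum()
    ll += (x_bin * np.log(p) + (1 - x_bin) * np.log(1 - p)).sum()
    return ll

def saliency_loglik(latent_ll, background_ll, s):
    """Feature-saliency mixture for one feature: a latent-dependent
    component with weight s (the saliency) and a latent-independent
    background with weight 1 - s; in GGTMFS the saliencies would be
    re-estimated inside the EM loop (not shown)."""
    return np.logaddexp(np.log(s) + latent_ll,
                        np.log(1 - s) + background_ll)
```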

Relevance: 30.00%

Abstract:

This research focuses on automatically adapting a search engine's size in response to fluctuations in query workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud facilitates allocating or deallocating computer resources to or from the engine. Our solution is an adaptive search engine that repeatedly re-evaluates its load and, when appropriate, switches over to a different number of active processors. We focus on three aspects, broken out into three sub-problems: Continually determining the Number of Processors (CNP), the New Grouping Problem (NGP), and the Regrouping Order Problem (ROP). CNP is the problem of determining, in the light of changes in the query workload, the ideal number of processors p to have active at any given time. NGP arises once a change in the number of processors has been determined: it must then be decided which groups of search data will be distributed across the processors. ROP is the problem of redistributing this data onto processors while keeping the engine responsive and minimising both the switchover time and the incurred network load. We propose solutions for these sub-problems. For NGP we propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. For ROP we present an efficient method for redistributing data among processors while keeping the search engine responsive. For CNP we propose an algorithm that determines the new size of the search engine by re-evaluating its load. We tested the solution's performance using a custom-built prototype search engine deployed in the Amazon EC2 cloud. Our experiments show that, compared with computing the index from scratch, the incremental algorithm speeds up the index computation 2–10 times while maintaining a similar search performance. The chosen redistribution method is 25% to 50% faster than other methods and reduces the network load by around 30%. For CNP we present a deterministic algorithm that shows a good ability to determine the new size of the search engine. When combined, these algorithms give an adaptive algorithm that can adjust the search engine's size under a variable workload.
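
The paper's deterministic CNP algorithm is not reproduced in the abstract, so purely as an illustration of the shape of such a decision rule, a threshold-based re-evaluation might look like the following; the thresholds and the doubling/halving policy are assumptions, not the paper's method.

```python
def choose_processor_count(current_p, load, low=0.3, high=0.8,
                           p_min=1, p_max=64):
    """Illustrative CNP-style rule: re-evaluate the measured load
    (e.g. utilisation in [0, 1]) and grow the engine when it is high,
    shrink it when it is low, otherwise keep the current size."""
    if load > high:
        return min(current_p * 2, p_max)   # scale out
    if load < low:
        return max(current_p // 2, p_min)  # scale in
    return current_p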

Relevance: 30.00%

Abstract:

In machine learning, the Gaussian process latent variable model (GP-LVM) has been extensively applied in the field of unsupervised dimensionality reduction. When some supervised information is available, e.g., pairwise constraints or labels of the data, the traditional GP-LVM cannot directly utilize it to improve the performance of dimensionality reduction. In this case, the traditional GP-LVM must be modified to make it capable of handling supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under pairwise constraints. By transferring the pairwise constraints in the observed space to the latent space, constrained prior information on the latent variables can be obtained. Under this constrained prior, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets.
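
As a hedged sketch of how pairwise constraints might be transferred to the latent space: a log-prior over the latent positions that pulls must-link pairs together and penalises cannot-link pairs for being close, added to the GP-LVM log-likelihood for MAP optimisation. The functional form below is an assumption, not the paper's exact prior.

```python
import numpy as np

def constraint_log_prior(Z, must_link, cannot_link, sigma=1.0):
    """Log-prior over latent positions Z (n x q). Must-link pairs incur
    a quadratic pull towards each other; cannot-link pairs incur a
    penalty that tends to -inf as the pair coincides and vanishes as
    the pair separates. MAP optimisation would then maximise
    gp_lvm_loglik(Z) + constraint_log_prior(Z, ...) over Z."""
    lp = 0.0
    for i, j in must_link:
        lp -= np.sum((Z[i] - Z[j]) ** 2) / (2.0 * sigma ** 2)
    for i, j in cannot_link:
        d2 = np.sum((Z[i] - Z[j]) ** 2)
        lp += np.log(1.0 - np.exp(-d2 / (2.0 * sigma ** 2)) + 1e-12)
    return lp
```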

Relevance: 30.00%

Abstract:

Popular dimension reduction and visualisation algorithms, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model, rely on the assumption that input dissimilarities are Euclidean. It is well known that this assumption does not hold for most datasets: high-dimensional data often lies on a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that the manifold is no longer assumed to be Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures, allowing the relative responsibilities under a linear combination to drive the visualisation process.
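
A small sketch of the linear-combination idea: two dissimilarity measures are blended with relative weights and the result is embedded with metric MDS on precomputed dissimilarities. Here the weights and the choice of measures are fixed by hand as assumptions, whereas in the method above the relative responsibilities drive the visualisation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

X = np.random.RandomState(0).rand(100, 10)   # stand-in data

# Two candidate dissimilarity measures (an illustrative choice).
D_euclidean = squareform(pdist(X, "euclidean"))
D_cosine = squareform(pdist(X, "cosine"))

# Linear combination with relative responsibilities w and 1 - w.
w = 0.7
D = w * D_euclidean + (1 - w) * D_cosine

embedding = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(D)
print(embedding.shape)                       # (100, 2)
```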

Relevance: 30.00%

Abstract:

The accuracy of a map depends on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need to be accounted for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases, and the effects varied between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the proportion of mislabelled cases, and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the classifiers investigated: overall classification accuracy declined by 8% (significant at the 95% level of confidence) with the use of a training set containing 20% mislabelled cases.
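
The sensitivity experiment can be mimicked on synthetic data by injecting a controlled fraction of mislabelled training cases and measuring the drop in test accuracy. The sketch below uses scikit-learn with invented data and random (rather than similar-class) confusions, so it is not the paper's airborne thematic mapper experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for noise in [0.0, 0.1, 0.2]:
    y_noisy = y_tr.copy()
    flip = rng.rand(len(y_tr)) < noise
    # Replace each flipped label with a random *other* class (3 classes).
    y_noisy[flip] = (y_tr[flip] + rng.randint(1, 3, flip.sum())) % 3
    acc = SVC().fit(X_tr, y_noisy).score(X_te, y_te)
    print(f"{noise:.0%} mislabelled -> test accuracy {acc:.3f}")
```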