962 results for variable data printing
Abstract:
The importance of informal institutions, and in particular culture, for entrepreneurship is a subject of ongoing interest. Past research has mostly concentrated on cross-national comparisons, cultural values, and the direct effects of culture on entrepreneurial behavior, but has largely found inconsistent results. The present research adds a fresh perspective to this research stream by turning attention to community-level culture and cultural norms. We hypothesize indirect effects of cultural norms on venture emergence: specifically, that community-level cultural norms (performance-based culture and socially supportive institutional norms) affect important supply-side variables (entrepreneurial self-efficacy and entrepreneurial motivation), which in turn influence nascent entrepreneurs' success in creating operational ventures (venture emergence). We test our predictions on a unique longitudinal data set (PSED II) tracking nascent entrepreneurs' venture creation efforts over a five-year time span and find evidence supporting them. Our research contributes to a more fine-grained understanding of how culture, in particular perceptions of community cultural norms, influences venture emergence. This research highlights the embeddedness of entrepreneurial behavior, and its immediate antecedent beliefs, in the local community context.
Abstract:
One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours either imputing missing values or discarding records that include missing data, both of which can degrade accuracy once missing values exceed a certain level. In this research we propose a methodology for handling data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is performed by picking variables sequentially, in order of maximum correlation with the dependent variable and minimum correlation with the variables already selected. Classification models are then generated individually for each test case, based on its particular feature set and the matching data values available in the training population. The method was applied to anonymized mental-health data from real patients, where the task was to predict the suicide-risk judgement clinicians would give for each patient's data, with eleven possible outcome classes, zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring that explanations of risk are based only on the data given, not on imputed values. This is important for clinical decision support systems that use human expertise for modelling and for explaining predictions.
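A minimal sketch of the sequential, correlation-based selection step described above (Python with pandas; the trade-off weight alpha, the stopping rule and the assumption of numeric columns are illustrative choices, not details from the paper):

    import pandas as pd

    def select_features(df: pd.DataFrame, target: str, k: int, alpha: float = 1.0):
        # Assumes all columns are numeric; correlations are taken in absolute value.
        corr = df.corr().abs()
        candidates = [c for c in df.columns if c != target]
        selected = []
        while candidates and len(selected) < k:
            def score(col):
                relevance = corr.loc[col, target]                       # correlation with outcome
                redundancy = max((corr.loc[col, s] for s in selected), default=0.0)
                return relevance - alpha * redundancy                   # favour relevant, non-redundant variables
            best = max(candidates, key=score)
            selected.append(best)
            candidates.remove(best)
        return selected

In the per-case setting described in the abstract, such a routine would be run on only the features actually observed for a given test case.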
Abstract:
This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, unlike collecting dialect data using traditional methods, which is a relatively slow process, the use of site-restricted web searches allows for dialect data to be collected from across a region as large as the United States in a matter of days.
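As an illustration of the counting step, the sketch below computes variant proportions per city from hit counts assumed to have already been gathered via site-restricted searches on newspaper websites; the cities, the counts and the soda/pop/coke alternation are invented purely for the example.

    # Hypothetical hit counts, e.g. from queries such as  soda site:example-city-paper.com
    hit_counts = {
        "Boston":  {"soda": 1200, "pop": 150, "coke": 90},
        "Atlanta": {"soda": 300,  "pop": 80,  "coke": 950},
    }

    def variant_proportions(counts):
        # Convert raw webpage counts into per-city proportions of each variant.
        proportions = {}
        for city, variants in counts.items():
            total = sum(variants.values())
            proportions[city] = {v: n / total for v, n in variants.items()}
        return proportions

    print(variant_proportions(hit_counts))

These proportions are what would then be plotted on a map and compared with traditional questionnaire-based dialect maps.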
Abstract:
In the nonparametric framework of Data Envelopment Analysis (DEA), the statistical properties of the estimators have been investigated, but only asymptotic results are available. For DEA estimators, results of practical use have been proved only for the case of one input and one output. In real-world problems, however, the production process is usually well described by many variables. In this paper a machine-learning approach to variable aggregation based on Canonical Correlation Analysis is presented. The approach is applied to efficiency estimation of all the farms on Terceira Island in the Azorean archipelago.
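A minimal sketch of the aggregation idea, assuming scikit-learn's CCA and random placeholder data rather than the Azorean farm data:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    inputs = rng.random((50, 4))    # stand-ins for several production inputs
    outputs = rng.random((50, 2))   # stand-ins for several production outputs

    # Collapse the many inputs and outputs into one canonical pair, so that a
    # one-input/one-output DEA model (the case with known statistical results)
    # can be applied afterwards.
    cca = CCA(n_components=1)
    agg_input, agg_output = cca.fit_transform(inputs, outputs)
    # agg_input and agg_output are the aggregated variables fed to the DEA estimator.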
Abstract:
Carte du Ciel (French for "map of the sky") is part of an extensive 19th-century international astronomical project whose goal was to map the entire visible sky. The results of this vast effort were collected in the form of astrographic plates and their paper reproductions, called astrographic maps, which are widely distributed among many observatories and astronomical institutes around the world. Our goal is to design methods and algorithms to automatically extract data from digitized Carte du Ciel astrographic maps. This paper examines the image processing and pattern recognition techniques that can be adopted for automatic extraction of astronomical data from stars' triple expositions, which can aid the detection of variable stars in Carte du Ciel maps.
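As a rough illustration of the kind of low-level step such a pipeline might begin with, the sketch below (Python with NumPy and SciPy; the synthetic image and the threshold are assumptions, not the paper's method) thresholds a digitized plate and labels connected components as candidate star images.

    import numpy as np
    from scipy import ndimage

    plate = np.random.random((512, 512))           # stand-in for a digitized plate scan
    binary = plate > 0.995                          # crude intensity threshold
    labels, n_candidates = ndimage.label(binary)    # connected components = candidate star images
    centroids = ndimage.center_of_mass(binary, labels, range(1, n_candidates + 1))

Matching the resulting centroids into the triple-exposition patterns characteristic of Carte du Ciel plates would be the next, more involved, stage.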
Abstract:
Most machine-learning algorithms are designed for datasets with features of a single type, whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types within a probabilistic latent variable formalism. This model, called generalised generative topographic mapping (GGTM), describes the data by type-specific distributions that are conditionally independent given the latent space. It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter-learning process, using an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
Abstract:
This research focuses on automatically adapting a search engine's size in response to fluctuations in query workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud makes it easy to allocate computer resources to the engine or deallocate them from it. Our solution is an adaptive search engine that repeatedly re-evaluates its load and, when appropriate, switches over to a different number of active processors. We focus on three aspects, broken out into three sub-problems: Continually determining the Number of Processors (CNP), the New Grouping Problem (NGP) and the Regrouping Order Problem (ROP). CNP is the problem of determining, in light of changes in the query workload, the ideal number of processors p to keep active in the search engine at any given time. NGP arises once a change in the number of processors has been decided: it must then be determined how the groups of search data are distributed across the processors. ROP is the problem of redistributing this data onto the processors while keeping the engine responsive and minimising both the switchover time and the incurred network load. We propose solutions for these sub-problems. For NGP we propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. For ROP we present an efficient method for redistributing data among processors while keeping the search engine responsive. For CNP we propose an algorithm that determines the new size of the search engine by re-evaluating its load. We tested the solution's performance using a custom-built prototype search engine deployed in the Amazon EC2 cloud. Our experiments show that, compared with computing the index from scratch, the incremental NGP algorithm speeds up index computation 2-10 times while maintaining similar search performance. The chosen redistribution method is 25% to 50% faster than other methods and reduces the network load by around 30%. For CNP we present a deterministic algorithm that shows a good ability to determine the new size of the search engine. Combined, these algorithms give an adaptation algorithm that is able to adjust the search engine size under a variable workload.
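A minimal sketch of the CNP decision, under an assumed per-processor capacity and utilisation target (the figures and the rule are illustrative, not the dissertation's algorithm):

    import math

    QUERIES_PER_PROC = 100.0   # assumed sustainable queries/sec per processor
    TARGET_UTIL = 0.7          # aim for roughly 70% utilisation

    def choose_processor_count(observed_qps, min_p=1, max_p=64):
        # Pick the smallest processor count that keeps per-processor load near the target.
        needed = math.ceil(observed_qps / (QUERIES_PER_PROC * TARGET_UTIL))
        return max(min_p, min(max_p, needed))

Once the new count is chosen, the NGP and ROP steps would decide how the index groups are reassigned and in what order the data is moved.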
Abstract:
In machine learning, the Gaussian process latent variable model (GP-LVM) has been extensively applied in the field of unsupervised dimensionality reduction. When some supervised information, e.g., pairwise constraints or labels of the data, is available, the traditional GP-LVM cannot directly utilize such information to improve the performance of dimensionality reduction. In this case, it is necessary to modify the traditional GP-LVM to make it capable of handling supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under pairwise constraints. By transferring the pairwise constraints in the observed space to the latent space, constrained prior information on the latent variables can be obtained. Under this constrained prior, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets. © 2010 Elsevier B.V.
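As a purely illustrative sketch of how pairwise constraints might be expressed in the latent space, the snippet below defines a simple penalty that pulls must-link pairs together and pushes cannot-link pairs apart; it is a generic stand-in, not the paper's constrained prior or MAP procedure.

    import numpy as np

    def constraint_penalty(Z, must_link, cannot_link, margin=1.0):
        # Z: (n, d) latent coordinates; constraints are lists of index pairs.
        penalty = 0.0
        for i, j in must_link:
            penalty += np.sum((Z[i] - Z[j]) ** 2)            # must-link pairs should be close
        for i, j in cannot_link:
            dist = np.linalg.norm(Z[i] - Z[j])
            penalty += max(0.0, margin - dist) ** 2          # cannot-link pairs should stay apart
        return penalty

Such a penalty would be added to the GP-LVM objective so that the latent positions respect the supervised information during optimization.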
Abstract:
Popular dimension reduction and visualisation algorithms, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model, rely on the assumption that the input dissimilarities are Euclidean. It is well known that this assumption does not hold for most datasets, and high-dimensional data often sits on a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that we no longer assume that the manifold is Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures, allowing their relative responsibilities, under a linear combination, to drive the visualisation process.
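A small sketch of the dissimilarity-mixing idea, assuming a fixed, hand-chosen linear combination (in the paper the relative responsibilities themselves drive the visualisation):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    X = np.random.random((100, 5))                       # placeholder data
    D_euclidean = squareform(pdist(X, metric="euclidean"))
    D_cosine = squareform(pdist(X, metric="cosine"))

    weights = np.array([0.6, 0.4])                       # responsibilities, summing to one
    D_mixed = weights[0] * D_euclidean + weights[1] * D_cosine
    # D_mixed would then be passed to the (Elastic) MDS embedding step.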
Abstract:
This ex post facto study (N = 209) examined the relationships between employer job strategies and job retention among organizations participating in Florida welfare-to-work network programs and associated the strategies with job retention data to determine best practices. An internet-based self-report survey battery was administered to a heterogeneous sample of organizations participating in the Florida welfare-to-work network program. Hypotheses were tested through correlational and hierarchical regression analytic procedures. The partial correlation results linked each of the job retention strategies to job retention. Wages, benefits, training and supervision, communication, job growth, work/life balance, fairness and respect were all significantly related to job retention. Hierarchical regression results indicated that the training and supervision variable was the best predictor of job retention in the regression equation. The size of the organization was also a significant predictor of job retention. Large organizations reported higher job retention rates than small organizations. There was no statistical difference in job retention between the types of organizations (profit-making and non-profit). The standardized betas ranged from .26 to .41 in the regression equation. Twenty percent of the variance in job retention was explained by the combination of demographic and job retention strategy predictors, supporting the theoretical, empirical, and practical relevance of understanding the association between employer job strategies and job retention outcomes. Implications for adult education and human resource development theory, research, and practice are highlighted as possible strategic leverage points for creating conditions that facilitate the development of job strategies as a means of improving former welfare workers' job retention.
Abstract:
Long-term research on freshwater ecosystems provides insights that can be difficult to obtain from other approaches. Widespread monitoring of ecologically relevant water-quality parameters spanning decades can facilitate important tests of ecological principles. Unique long-term data sets and analytical tools are increasingly available, allowing for powerful and synthetic analyses across sites. Long-term measurements or experiments in aquatic systems can catch rare events, changes in highly variable systems, time-lagged responses, cumulative effects of stressors, and biotic responses that encompass multiple generations. Data are available from formal networks, local to international agencies, private organizations, various institutions, and paleontological and historical records; brief literature surveys suggest that much existing data remains unsynthesized. The ecological sciences will benefit from careful maintenance and analysis of existing long-term programs, and the resulting insights can aid the design of effective future long-term experimental and observational efforts. Long-term research on freshwaters is particularly important because of their value to humanity.
Abstract:
Modern data centers host hundreds of thousands of servers to achieve economies of scale. Such a huge number of servers creates challenges for the data center network (DCN) to provide proportionally large bandwidth. In addition, the deployment of virtual machines (VMs) in data centers raises the requirements for efficient resource allocation and fine-grained resource sharing. Further, the large number of servers and switches in the data center consumes significant amounts of energy. Even though servers become more energy efficient with various energy-saving techniques, the DCN still accounts for 20% to 50% of the energy consumed by the entire data center. The objective of this dissertation is to enhance DCN performance as well as its energy efficiency by conducting optimizations on both the host and network sides. First, as the DCN demands huge bisection bandwidth to interconnect all the servers, we propose a parallel packet switch (PPS) architecture that directly processes variable-length packets without segmentation-and-reassembly (SAR). The proposed PPS achieves large bandwidth by combining the switching capacities of multiple fabrics, and it further improves switch throughput by avoiding the padding bits required by SAR. Second, since certain resource demands of a VM are bursty and stochastic in nature, we propose the Max-Min Multidimensional Stochastic Bin Packing (M3SBP) algorithm to satisfy both deterministic and stochastic demands in VM placement. M3SBP calculates an equivalent deterministic value for the stochastic demands and maximizes the minimum resource utilization ratio of each server. Third, to provide the necessary traffic isolation for VMs that share the same physical network adapter, we propose the Flow-level Bandwidth Provisioning (FBP) algorithm. By reducing the flow scheduling problem to multiple stages of packet queuing problems, FBP guarantees the provisioned bandwidth and delay performance for each flow. Finally, although DCNs are typically provisioned with full bisection bandwidth, DCN traffic exhibits fluctuating patterns, so we propose a joint host-network optimization scheme to enhance the energy efficiency of DCNs during off-peak traffic hours. The proposed scheme uses a unified representation method that converts the VM placement problem into a routing problem and employs depth-first and best-fit search to find efficient paths for flows.
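A rough sketch, under stated assumptions, of the "equivalent deterministic value" idea: each stochastic demand is approximated by mean plus k standard deviations, and a VM is then placed on the server that keeps the largest minimum residual-utilisation headroom (a max-min flavour of bin packing). The safety factor k and the placement rule are illustrative, not taken from the dissertation.

    import numpy as np

    K = 1.65  # e.g. roughly a 95th-percentile factor under a normal approximation

    def equivalent_demand(mean, std, k=K):
        # Replace a stochastic demand by a single deterministic value.
        return mean + k * std

    def place_vm(vm_demand, servers_free, capacity):
        # vm_demand: per-resource demand vector; servers_free: list of per-server
        # free-resource vectors; capacity: per-resource server capacity vector.
        best, best_score = None, -np.inf
        for s, free in enumerate(servers_free):
            residual = free - vm_demand
            if np.all(residual >= 0):
                score = np.min(residual / capacity)   # smallest remaining headroom
                if score > best_score:
                    best, best_score = s, score
        return best  # index of the chosen server, or None if the VM fits nowhere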
Abstract:
The standard highway assignment model in the Florida Standard Urban Transportation Modeling Structure (FSUTMS) is based on the equilibrium traffic assignment method. This method involves running several iterations of all-or-nothing capacity-restraint assignment with an adjustment of travel time to reflect delays encountered in the associated iteration. The iterative link time adjustment process is accomplished through the Bureau of Public Roads (BPR) volume-delay equation. Since FSUTMS' traffic assignment procedure outputs daily volumes while the input capacities are given in hourly volumes, it is necessary to convert the hourly capacities to their daily equivalents when computing the volume-to-capacity ratios used in the BPR function. The conversion is accomplished by dividing the hourly capacity by a factor called the peak-to-daily ratio, referred to as CONFAC in FSUTMS. The ratio is computed as the highest hourly volume of a day divided by the corresponding total daily volume. While several studies have indicated that CONFAC is a decreasing function of the level of congestion, a constant value is used for each facility type in the current version of FSUTMS. This ignores the different congestion levels associated with each roadway and is believed to be one of the culprits of traffic assignment errors. Traffic count data from across the state of Florida were used to calibrate CONFACs as a function of a congestion measure using the weighted least squares method. The calibrated functions were then implemented in FSUTMS through a procedure that takes advantage of the iterative nature of FSUTMS' equilibrium assignment method. The assignment results based on constant and variable CONFACs were then compared against ground counts for three selected networks. It was found that the accuracy of the two assignments was not significantly different, and that the hypothesized improvement in assignment results from the variable CONFAC model was not empirically evident. It was recognized that many other factors beyond the scope and control of this study could contribute to this finding. It was recommended that further studies focus on the use of the variable CONFAC model with recalibrated parameters for the BPR function and/or with other forms of volume-delay functions.
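For reference, the two computations described above can be sketched as follows; alpha = 0.15 and beta = 4 are the conventional BPR defaults, whereas FSUTMS uses its own calibrated parameters.

    def daily_capacity(hourly_capacity, confac):
        # CONFAC = peak-hour volume / daily volume, so dividing the hourly
        # capacity by CONFAC gives its daily equivalent.
        return hourly_capacity / confac

    def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
        # Bureau of Public Roads volume-delay function.
        return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)

A variable CONFAC simply makes the confac argument a function of the congestion measure rather than a constant per facility type.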