893 results for "data driven approach"


Relevance:

40.00%

Publisher:

Abstract:

Hierarchical visualization systems are desirable because a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex high-dimensional data sets. We extend an existing locally linear hierarchical visualization system, PhiVis [1], in several directions: (1) we allow for non-linear projection manifolds (the basic building block is the Generative Topographic Mapping, GTM); (2) we introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree; (3) we describe folding patterns of the low-dimensional projection manifold in the high-dimensional data space by computing and visualizing the manifold's local directional curvatures. Quantities such as magnification factors [3] and directional curvatures are helpful for understanding the layout of the nonlinear projection manifold in the data space and for further refinement of the hierarchical visualization plot. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. We demonstrate the principle of the approach on a complex 12-dimensional data set and mention possible applications in the pharmaceutical industry.
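The GTM building block described in the abstract above can be sketched in a few lines: a regular grid in a 2-D latent space is mapped into data space through RBF basis functions and a weight matrix, and each data point is visualised at the latent position whose image lies nearest to it. This is only a hedged illustration with random, untrained weights and invented data; the real model fits the weights with EM and uses a posterior-based projection.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))   # toy stand-in for a 12-dimensional data set

# A 2-D latent grid mapped into data space via RBF basis functions: y = Phi(x) W.
g = np.linspace(-1, 1, 10)
latent = np.array([(u, v) for u in g for v in g])          # (100, 2) grid
c = np.linspace(-1, 1, 4)
centres = np.array([(u, v) for u in c for v in c])         # (16, 2) RBF centres
Phi = np.exp(-((latent[:, None, :] - centres[None, :, :]) ** 2).sum(-1) / 0.5)
W = rng.normal(scale=0.1, size=(16, 12))   # untrained, illustrative weights
manifold = Phi @ W                         # images of the grid in data space

# Project each point to the latent position of its nearest grid image
# (a crude stand-in for the posterior-mean projection GTM actually uses).
d2 = ((X[:, None, :] - manifold[None, :, :]) ** 2).sum(-1)
proj = latent[d2.argmin(axis=1)]           # 2-D coordinates for plotting
print(proj.shape)
```

In the hierarchical system, one such plot is fitted per node of the tree, each specialising on the region of data the user (or the algorithm) selects.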

Relevance:

40.00%

Publisher:

Abstract:

An interactive hierarchical Generative Topographic Mapping (HGTM) [HGTM] has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in three directions: (1) we generalize HGTM to noise models from the exponential family of distributions; the basic building block is the Latent Trait Model (LTM) developed in [Kabanpami]. (2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user interactively selects "regions of interest" as in [HGTM], whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool for improving our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.
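Magnification factors, mentioned in point (3), measure how the smooth latent-to-data mapping locally stretches area: for a mapping y(x) = Phi(x) W with Jacobian J at latent point x, the local magnification is sqrt(det(J Jᵀ)). The sketch below computes this for an RBF mapping with invented centres and weights; it is not the paper's general derivation for latent trait models, just the geometric idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# RBF-based latent-to-data mapping y(x) = Phi(x) W, as in GTM/LTM models.
centres = rng.uniform(-1, 1, size=(16, 2))
sigma = 0.5
W = rng.normal(size=(16, 5))   # maps 2-D latent space into 5-D data space

def magnification(x):
    """Local area magnification sqrt(det(J J^T)) of the mapping at latent x."""
    diff = x - centres                                   # (16, 2)
    phi = np.exp(-(diff ** 2).sum(1) / (2 * sigma ** 2))
    dphi = -(diff / sigma ** 2) * phi[:, None]           # (16, 2) basis gradients
    J = dphi.T @ W                                       # (2, 5) Jacobian
    return np.sqrt(np.linalg.det(J @ J.T))

mf = magnification(np.zeros(2))
print(mf)
```

Regions of the plot where this value is large correspond to latent areas that are strongly stretched in data space, which is why magnification factors can reveal boundaries between clusters.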

Relevance:

40.00%

Publisher:

Abstract:

This paper is drawn from the use of data envelopment analysis (DEA) in helping a Portuguese bank to manage the performance of its branches. The bank wanted to set targets for the branches on variables such as growth in number of clients, growth in funds deposited and so on. Such variables can take positive and negative values but, with few exceptions, traditional DEA models have hitherto been restricted to non-negative data. We report on the development of a model to handle unrestricted data in a DEA framework and illustrate the use of this model on data from the bank concerned.
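One way unrestricted (possibly negative) data can be handled in DEA is a range-directional formulation: each unit is projected toward the best observed values along the direction of its own scope for improvement, which stays well defined even when outputs are negative. The sketch below implements a generic range-directional program under variable returns to scale on invented branch data; it illustrates the idea only and is not necessarily the exact model developed for the bank.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical branch data: one input (staff), two outputs that can be
# negative (growth in clients, growth in funds deposited).
X = np.array([[8.0], [6.0], [10.0]])                  # inputs, one row per branch
Y = np.array([[5.0, -2.0], [3.0, 4.0], [-1.0, 1.0]])  # outputs, may be < 0

def rdm_inefficiency(o):
    """Beta in [0, 1]: 0 means branch o lies on the efficient frontier."""
    Rx = X[o] - X.min(axis=0)        # scope for input reduction
    Ry = Y.max(axis=0) - Y[o]        # scope for output expansion
    n = len(X)
    c = np.r_[-1.0, np.zeros(n)]     # maximise beta (linprog minimises)
    # beta*Ry - sum(lam*Y) <= -Y[o]: outputs must reach Y[o] + beta*Ry
    A1, b1 = np.c_[Ry[:, None], -Y.T], -Y[o]
    # beta*Rx + sum(lam*X) <= X[o]: inputs must fall to X[o] - beta*Rx
    A2, b2 = np.c_[Rx[:, None], X.T], X[o]
    A_eq = np.r_[0.0, np.ones(n)][None, :]            # VRS: lambdas sum to 1
    res = linprog(c, A_ub=np.vstack([A1, A2]), b_ub=np.r_[b1, b2],
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] + [(0, None)] * n)
    return res.x[0]

scores = [round(rdm_inefficiency(o), 3) for o in range(3)]
print(scores)  # first two branches sit on the frontier, the third does not
```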

Relevance:

40.00%

Publisher:

Abstract:

Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial ‘mashups’ to be seamless and user-driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and ‘correlation’ of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control on or knowledge about background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.
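The core mechanic of a spatial scan can be sketched compactly: slide windows over the study region and score each window by how much the rate of a binary mark inside it departs from the rate outside, using a Bernoulli log-likelihood ratio. The data, the fixed window radius, and the planted cluster below are all invented; a full scan statistic also varies the window size, restricts attention to elevated rates, and assesses significance by Monte Carlo replication.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented point data: locations with a binary mark (e.g. carries the
# risk factor or not), with a planted cluster near (3, 3).
pts = rng.uniform(0, 10, size=(300, 2))
marked = (np.linalg.norm(pts - [3, 3], axis=1) < 1.5) | (rng.random(300) < 0.1)

def llr(c_in, n_in, c_tot, n_tot):
    """Bernoulli log-likelihood ratio: window rate vs. a common rate."""
    def ll(c, n):
        if c in (0, n):
            return 0.0
        p = c / n
        return c * np.log(p) + (n - c) * np.log(1 - p)
    return ll(c_in, n_in) + ll(c_tot - c_in, n_tot - n_in) - ll(c_tot, n_tot)

radius = 1.5
best = max(
    (llr(int(marked[m].sum()), int(m.sum()), int(marked.sum()), len(pts)), i)
    for i, p in enumerate(pts)
    for m in [np.linalg.norm(pts - p, axis=1) < radius]
)
print(pts[best[1]])  # centre of the most anomalous window
```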

Relevance:

40.00%

Publisher:

Abstract:

It is generally believed that the structural reforms that were introduced in India following the macro-economic crisis of 1991 ushered in competition and forced companies to become more efficient. However, whether the post-1991 growth is an outcome of more efficient use of resources or greater use of factor inputs remains an open empirical question. In this paper, we use plant-level data from 1989–1990 and 2000–2001 to address this question. Our results indicate that while there was an increase in the productivity of factor inputs during the 1990s, most of the growth in value added is explained by growth in the use of factor inputs. We also find that median technical efficiency declined in all but one of the industries between 1989–1990 and 2000–2001, and that change in technical efficiency explains a very small proportion of the change in gross value added.
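The empirical question posed above rests on a growth-accounting identity: under a Cobb-Douglas technology, log growth in value added splits into share-weighted log growth of inputs plus a total-factor-productivity residual. The numbers and the capital share below are invented for illustration, not the paper's estimates; the example simply shows the decomposition arithmetic.

```python
import math

a = 0.4                                  # assumed capital share (illustrative)
K0, L0, V0 = 100.0, 50.0, 80.0           # capital, labour, value added at t=0
K1, L1, V1 = 130.0, 60.0, 110.0          # the same at t=1 (invented numbers)

g_V = math.log(V1 / V0)                                # value-added growth
g_inputs = a * math.log(K1 / K0) + (1 - a) * math.log(L1 / L0)
g_tfp = g_V - g_inputs                                 # productivity residual

print(f"input-driven share of growth: {g_inputs / g_V:.2f}")
```

With these numbers most of the growth in value added is accounted for by input growth rather than the TFP residual, which is the shape of the paper's finding for Indian plants in the 1990s.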

Relevance:

40.00%

Publisher:

Abstract:

Data envelopment analysis (DEA), as introduced by Charnes, Cooper, and Rhodes (1978), is a linear programming technique that has been widely used to evaluate the relative efficiency of a set of homogenous decision making units (DMUs). In many real applications, the input-output variables cannot be precisely measured. This is particularly important in assessing the efficiency of DMUs using DEA, since the efficiency scores of inefficient DMUs are very sensitive to possible data errors. Hence, several approaches have been proposed to deal with imprecise data. Perhaps the most popular fuzzy DEA model is based on the α-cut. One drawback of the α-cut approach is that it cannot include all information about uncertainty. This paper aims to introduce an alternative linear programming model that can include some uncertainty information from the intervals within the α-cut approach. We introduce the concept of the "local α-level" to develop a multi-objective linear programming model to measure the efficiency of DMUs under uncertainty. An example is given to illustrate the use of this method.
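The α-cut that the fuzzy DEA literature builds on turns a fuzzy number into a crisp interval: the set of values whose membership is at least α. For a triangular fuzzy number (l, m, u) this interval has a closed form, shown below. The snippet illustrates only this basic cut, not the paper's "local α-level" construction.

```python
# Alpha-cut of a triangular fuzzy number (l, m, u): the interval of values
# whose membership degree is at least alpha.
def alpha_cut(l, m, u, alpha):
    return (l + alpha * (m - l), u - alpha * (u - m))

# A fuzzy input "about 10" narrows as alpha grows:
print(alpha_cut(8, 10, 13, 0.0))   # (8.0, 13.0): the full support
print(alpha_cut(8, 10, 13, 0.5))   # (9.0, 11.5)
print(alpha_cut(8, 10, 13, 1.0))   # (10.0, 10.0): the core
```

A fuzzy DEA model then solves interval LPs at one or more α levels; the drawback noted in the abstract is that a single cut discards the membership information inside the interval.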

Relevance:

40.00%

Publisher:

Abstract:

We argue that, for certain constrained domains, elaborate model transformation technologies, implemented from scratch in general-purpose programming languages, are unnecessary for model-driven engineering; instead, lightweight configuration of commercial off-the-shelf productivity tools suffices. In particular, in the CancerGrid project, we have been developing model-driven techniques for the generation of software tools to support clinical trials. A domain metamodel captures the community's best practice in trial design. A scientist authors a trial protocol, modelling their trial by instantiating the metamodel; customized software artifacts to support trial execution are generated automatically from the scientist's model. The metamodel is expressed as an XML Schema, in such a way that it can be instantiated by completing a form to generate a conformant XML document. The same process works at a second level for trial execution: among the artifacts generated from the protocol are models of the data to be collected, and the clinician conducting the trial instantiates such models in reporting observations, again by completing a form to create a conformant XML document, representing the data gathered during that observation. Simple standard form management tools are all that is needed. Our approach is applicable to a wide variety of information-modelling domains: not just clinical trials, but also electronic public sector computing, customer relationship management, document workflow, and so on. © 2012 Springer-Verlag.
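The form-to-document step described above can be sketched with standard library tools: a form submission (here just a dict) is serialised into an XML document that would then be checked against the metamodel's schema. The element names below are hypothetical, not taken from the CancerGrid metamodel, and no schema validation is shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical form fields for a trial protocol (invented names).
form = {"title": "Trial X", "arms": "2", "primaryOutcome": "survival"}

# Build an XML document from the completed form, one element per field.
protocol = ET.Element("trialProtocol")
for field, value in form.items():
    ET.SubElement(protocol, field).text = value

xml_text = ET.tostring(protocol, encoding="unicode")
print(xml_text)
```

The same pattern repeats at the second level: a data-collection model generated from the protocol becomes another form, and each completed form becomes another conformant XML document.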

Relevance:

40.00%

Publisher:

Abstract:

Grape is one of the world's largest fruit crops, with approximately 67.5 million tonnes produced each year, and energy is an important element in modern grape production, which depends heavily on fossil and other energy resources. Efficient use of these energies is a necessary step toward reducing environmental hazards, preventing destruction of natural resources and ensuring agricultural sustainability. Hence, identifying excessive use of energy as well as reducing energy resources is the main focus of this paper to optimize energy consumption in grape production. In this study we use a two-stage methodology to find the association of energy efficiency and performance explained by farmers' specific characteristics. In the first stage a non-parametric Data Envelopment Analysis is used to model efficiencies as an explicit function of human labor, machinery, chemicals, FYM (farmyard manure), diesel fuel, electricity and water-for-irrigation energies. In the second stage, farm-specific variables such as farmers' age, gender, level of education and agricultural experience are used in a Tobit regression framework to explain how these factors influence the efficiency of grape farming. The results of the first stage show substantial inefficiency among the grape producers in the studied area, while the second stage shows that the main difference between efficient and inefficient farmers lay in the use of chemicals, diesel fuel and water for irrigation. Efficient farmers used considerably fewer chemicals, such as insecticides, herbicides and fungicides, than inefficient ones. The results also revealed that more educated farmers are more energy efficient than their less educated counterparts. © 2013.
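The second-stage idea can be shown in miniature: regress first-stage efficiency scores on farmer characteristics. The paper uses a Tobit model because DEA scores are censored at 1; plain least squares is used below only as a stand-in to show the data flow, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic farm characteristics and efficiency scores (invented).
n = 60
age = rng.uniform(25, 65, n)
education = rng.integers(0, 16, n)           # years of schooling
eff = np.clip(0.4 + 0.02 * education - 0.002 * age
              + rng.normal(0, 0.05, n), 0, 1)

# Second stage: explain efficiency by the farm-specific variables.
Z = np.c_[np.ones(n), age, education]
coef, *_ = np.linalg.lstsq(Z, eff, rcond=None)
print(coef)  # intercept, age effect, education effect
```

A positive education coefficient, as planted in the synthetic data here, corresponds to the paper's finding that more educated farmers are more energy efficient.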

Relevance:

40.00%

Publisher:

Abstract:

The Securities and Exchange Commission (SEC) in the United States, and in particular its immediate past chairman, Christopher Cox, has been actively promoting an upgrade of the EDGAR system of disseminating filings. The new generation of information provision has been dubbed by Chairman Cox "Interactive Data" (SEC, 2006). In October this year the Office of Interactive Disclosure was created (http://www.sec.gov/news/press/2007/2007-213.htm). The focus of this paper is to examine the way in which the non-professional investor has been constructed by various actors. We examine the manner in which Interactive Data has been sold as the panacea for financial market 'irregularities' by the SEC and others. The academic literature shows almost no evidence of researching non-professional investors in any real sense (Young, 2006). Both this literature and the behaviour of representatives of institutions such as the SEC and FSA appear to find it convenient to construct this class of investor in a particular form and to speak for them. We theorise the activities of the SEC, and its chairman in particular, over a period of about three years, both following and prior to the 'credit crunch'. Our approach is to examine a selection of the policy documents released by the SEC and other interested parties, and the statements made by some of the policy makers and regulators central to the programme to advance the socio-technical project that is Interactive Data. We adopt insights from ANT, and more particularly the sociology of translation (Callon, 1986; Latour, 1987, 2005; Law, 1996, 2002; Law & Singleton, 2005), to show how individuals and regulators have acted as spokespersons for this malleable class of investor. We theorise the processes of accountability to investors and others and in so doing reveal the regulatory bodies taking the regulated for granted.
The possible implications of technological developments in digital reporting have also been identified by the CEOs of the six biggest audit firms in a discussion document on the role of accounting information and audit in the future of global capital markets (DiPiazza et al., 2006). The potential for digital reporting enabled through XBRL to "revolutionize the entire company reporting model" (p. 16) is discussed, and they conclude that the new model "should be driven by the wants of investors and other users of company information,..." (p. 17; emphasis in the original). Here, rather than examine the somewhat elusive and vexing question of whether adding interactive functionality to 'traditional' reports can achieve the benefits claimed for non-professional investors, we consider the rhetorical and discursive moves in which the SEC and others have engaged to present such developments as providing clearer reporting and accountability standards and serving the interests of this constructed and largely unknown group, the non-professional investor.

Relevance:

40.00%

Publisher:

Abstract:

Data envelopment analysis (DEA) is one of the most widely used methods for measuring the efficiency and productivity of decision-making units (DMUs). For a large-scale data set, especially one with negative measures, DEA inevitably demands substantial computer resources in terms of memory and CPU time. In recent years, a wide range of studies has been conducted in the area of combined artificial neural network and DEA methods. In this study, a supervised feed-forward neural network is proposed to evaluate the efficiency and productivity of large-scale data sets with negative values, in contrast to the corresponding DEA method. Results indicate that the proposed network has some computational advantages over the corresponding DEA models; therefore, it can be considered a useful tool for measuring the efficiency of DMUs with (large-scale) negative data.
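The surrogate idea can be sketched as follows: train a small feed-forward network on precomputed efficiency scores so that new DMUs can be scored with a forward pass instead of solving one LP each. The network below has one hidden layer trained by plain gradient descent on synthetic data (the targets are a stand-in function, not genuine DEA scores), so it shows the mechanism rather than the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic DMU measures (may be negative) and stand-in "efficiency" targets.
X = rng.normal(size=(200, 3))
y = 1 / (1 + np.exp(-X @ [0.8, -0.5, 0.3]))

# One-hidden-layer network, mean-squared-error loss, gradient descent.
W1 = rng.normal(scale=0.5, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(500):
    H = np.tanh(X @ W1 + b1)
    err = (H @ W2 + b2).ravel() - y
    gW2 = H.T @ err[:, None] / len(X); gb2 = err.mean(keepdims=True)
    dH = err[:, None] @ W2.T * (1 - H ** 2)       # backprop through tanh
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

pred = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
mse = ((pred - y) ** 2).mean()
print(mse)
```

Once trained, scoring a new unit costs two matrix multiplications, which is where the computational advantage over per-DMU linear programs comes from.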

Relevance:

40.00%

Publisher:

Abstract:

Purpose - The purpose of this paper is to assess high-dimensional visualisation, combined with pattern matching, as an approach to observing dynamic changes in the ways people tweet about science topics. Design/methodology/approach - The high-dimensional visualisation approach was applied to three scientific topics to test its effectiveness for longitudinal analysis of message framing on Twitter over two disjoint periods in time. The paper uses coding frames to drive categorisation and visual analytics of tweets discussing the science topics. Findings - The findings point to the potential of this mixed-methods approach, as it allows sufficiently high sensitivity to recognise and support the analysis of non-trending as well as trending topics on Twitter. Research limitations/implications - Three topics are studied and these illustrate a range of frames, but results may not be representative of all scientific topics. Social implications - Funding bodies increasingly encourage scientists to participate in public engagement. As social media provides an avenue actively utilised for public communication, understanding the nature of the dialogue on this medium is important for the scientific community and the public at large. Originality/value - This study differs from standard approaches to the analysis of microblog data, which tend to focus on machine-driven analysis of large-scale datasets. It provides evidence that this approach enables practical and effective analysis of the content of mid-size to large collections of microposts.
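Coding-frame categorisation, the step that feeds the visual analytics, can be shown in miniature: assign each tweet to frames via keyword matching. The frames, keywords and tweets below are invented; real coding frames are built and validated by human coders.

```python
# Invented coding frames: frame name -> trigger keywords.
frames = {
    "health_risk": {"risk", "danger", "harm"},
    "discovery":   {"breakthrough", "discovery", "found"},
    "policy":      {"funding", "government", "ban"},
}

tweets = [
    "New breakthrough in vaccine research found today",
    "Is there any harm in this? The risk seems overstated",
    "Government cuts funding for the telescope project",
]

def code(tweet):
    """Return the sorted list of frames whose keywords appear in the tweet."""
    words = set(tweet.lower().split())
    return sorted(f for f, kws in frames.items() if words & kws)

for t in tweets:
    print(code(t))
```

Counting frame assignments per time period is what lets the longitudinal comparison across the two disjoint periods be drawn.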

Relevance:

40.00%

Publisher:

Abstract:

This research is focused on the optimisation of resource utilisation in wireless mobile networks, taking into account the users' experienced quality of video streaming services. The study specifically considers the new generation of mobile communication networks, i.e. 4G-LTE, as the main research context. The background study provides an overview of the main properties of the relevant technologies investigated. These include video streaming protocols and networks, video service quality assessment methods, the infrastructure and related functionalities of LTE, and resource allocation algorithms in mobile communication systems. A mathematical model based on an objective, no-reference quality assessment metric for video streaming, namely Pause Intensity, is developed in this work for the evaluation of the continuity of streaming services. The analytical model is verified by extensive simulation and subjective testing on the joint impairment effects of pause duration and pause frequency. Various types of video content and different levels of impairment have been used in the validation tests. It has been shown that Pause Intensity is closely correlated with subjective quality measurement in terms of the Mean Opinion Score, and that this correlation property is content independent. Based on the Pause Intensity metric, an optimised resource allocation approach is proposed for the given user requirements, communication system specifications and network performance. This approach concerns both system efficiency and fairness when establishing appropriate resource allocation algorithms, together with the consideration of the correlation between the required and allocated data rates per user. Pause Intensity plays a key role here, representing the required level of Quality of Experience (QoE) to ensure the best balance between system efficiency and fairness.
The 3GPP Long Term Evolution (LTE) system is used as the main application environment, where the proposed research framework is examined and the results are compared with existing scheduling methods on the achievable fairness, efficiency and correlation. Adaptive video streaming technologies are also investigated and combined with our initiatives on determining the distribution of QoE performance across the network. The resulting scheduling process is controlled through the prioritization of users by considering their perceived quality for the services received. Meanwhile, a trade-off between fairness and efficiency is maintained through an online adjustment of the scheduler's parameters. Furthermore, Pause Intensity is applied as a regulator to realise the rate adaptation function during the end user's playback of the adaptive streaming service. The adaptive rates under various channel conditions, and the shape of the QoE distribution amongst the users for different scheduling policies, have been demonstrated in the context of LTE. Finally, the work on interworking between the mobile communication system at the macro-cell level and the different deployments of WiFi technologies throughout the macro-cell is presented. A QoE-driven approach is proposed to analyse the offloading mechanism of the user's data (e.g. video traffic), while the new rate distribution algorithm reshapes the network capacity across the macro-cell. The scheduling policy derived is used to regulate the performance of the resource allocation across the fair-efficient spectrum. The associated offloading mechanism can properly control the number of users within the coverage areas of the macro-cell base station and each of the WiFi access points involved. The performance of non-seamless and user-controlled mobile traffic offloading (through mobile WiFi devices) has been evaluated and compared with that of standard operator-controlled WiFi hotspots.
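Pause Intensity combines the two impairments the abstract names: how long playback stalls in total and how often it stalls. The thesis's exact formula is not reproduced here; the sketch uses a simple proxy, the stalled-time fraction scaled by the pause rate, computed from an invented playback event log.

```python
# Invented playback event log: (start_time_s, duration_s) of each pause.
pauses = [(2.0, 0.8), (10.5, 1.2), (30.0, 0.5)]
session = 60.0                                 # seconds of playback observed

total_pause = sum(d for _, d in pauses)
ratio = total_pause / session                  # fraction of time spent stalled
freq = len(pauses) / session                   # pause events per second
pause_intensity = ratio * freq                 # proxy combining both effects

print(pause_intensity)
```

A scheduler driven by such a metric can then prioritise users whose streams are stalling, which is the QoE-aware allocation idea the work develops.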