913 results for Data-driven analysis
Abstract:
We employ a nonparametric approach to growth accounting (Data Envelopment Analysis, DEA) to disentangle the proximate sources of labour productivity growth in 41 nations between 1929 and 1950 by decomposing productivity growth into four components: technological change, efficiency catch-up (movements towards the production frontier), capital accumulation and human capital accumulation. We show that efficiency catch-up generally explains productivity growth, whereas technological change and factor accumulation were limited and distorted by the effects of war. War clearly hampered efficiency. Moreover, an unbalanced ratio of human capital to physical capital (a gap to the technological leader) was crucial for efficiency catching-up.
Abstract:
The goal of this paper is to present an optimal resource allocation model for the regional allocation of public service inputs. The proposed solution maximises the relative public service availability in regions located below the best availability frontier, subject to exogenous budget restrictions and equality of access for equal need criteria (equity-based notion of regional needs). The construction of non-parametric deficit indicators is proposed for public service availability by a novel application of Data Envelopment Analysis (DEA) models, whose results offer advantages for the evaluation and improvement of decentralised public resource allocation systems. The method introduced in this paper has relevance as a resource allocation guide for the majority of services centrally funded by the public sector in a given country, such as health care, basic and higher education, citizen safety, justice, transportation, environmental protection, leisure, culture, housing and city planning, etc.
Abstract:
In this article we examine the potential effect of market structure on hospital technical efficiency as a measure of performance controlled by ownership and regulation. This study helps evaluate the potential effects of recommended and initiated deregulation policies intended to promote market reforms in the context of a European National Health Service. Our goal was reached through three main empirical stages. Firstly, using patient origin data from hospitals in the region of Catalonia in 1990, we estimated geographic hospital markets through the Elzinga-Hogarty approach, based on patient flows. Then we measured the market level of concentration using the Herfindahl-Hirschman index. Secondly, technical and scale efficiency scores for each hospital were obtained by specifying a Data Envelopment Analysis model. According to the data, nearly two-thirds of the hospitals operate below the production frontier, with an average efficiency score of 0.841. Finally, the determinants of the efficiency scores were investigated using a censored regression model. Special attention was paid to testing the hypothesis that there is an efficiency improvement in more competitive markets. The results suggest that the number of competitors in the market contributes positively to technical efficiency, and there is some evidence that the differences in efficiency scores are attributable to several environmental factors such as ownership, market structure and regulation effects.
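As an illustration of the concentration measure named in this abstract, the Herfindahl-Hirschman index is the sum of squared market shares. The sketch below is a minimal example on the conventional 0 to 10,000 scale; the function name and the discharge counts are hypothetical and are not taken from the study.

```python
def herfindahl_hirschman_index(volumes):
    """Return the HHI on the 0-10000 scale from raw volumes per competitor.

    `volumes` is a hypothetical list of activity counts (for example,
    discharges per hospital in one geographic market).
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    # Market shares expressed in percent, then squared and summed.
    shares = [100.0 * v / total for v in volumes]
    return sum(s * s for s in shares)


# Hypothetical market with three hospitals holding 50%, 30% and 20% shares.
discharges = [500, 300, 200]
hhi = herfindahl_hirschman_index(discharges)
print(hhi)
```

A single-hospital market reaches the maximum of 10,000, and the index falls as the market is split among more, similarly sized competitors.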
Abstract:
Detecting local differences between groups of connectomes is a great challenge in neuroimaging because of the large number of tests that have to be performed and the resulting impact of multiplicity correction. Any available information should be exploited to increase the power of detecting true between-group effects. We present an adaptive strategy that exploits the data structure and the prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and we apply a screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data-driven algorithm. Given the results of the screening step, a filtering is performed to seek real differences at the node/connection level. The proposed strategy could be used to strongly control either the family-wise error rate or the false discovery rate. We show by means of different simulations the benefit of the proposed strategy, and we present a real application of comparing connectomes of preschool children and adolescents.
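For readers unfamiliar with false discovery rate control, the sketch below shows the standard Benjamini-Hochberg step-up procedure on a flat list of p-values. It is not the adaptive screening-and-filtering strategy of the abstract, only a common building block for this kind of multiplicity correction; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up procedure at FDR level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per connection tested between two groups.
p_vals = [0.205, 0.001, 0.041, 0.008, 0.039, 0.060, 0.074, 0.042]
decisions = benjamini_hochberg(p_vals, alpha=0.05)
print(decisions)
```

With many thousands of connections, the thresholds `alpha * rank / m` become very small, which is why reducing the number of tests through a prior screening step can increase power.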
Abstract:
BACKGROUND: By analyzing human immunodeficiency virus type 1 (HIV-1) pol sequences from the Swiss HIV Cohort Study (SHCS), we explored whether the prevalence of non-B subtypes reflects domestic transmission or migration patterns. METHODS: Swiss non-B sequences and sequences collected abroad were pooled to construct maximum likelihood trees, which were analyzed for Swiss-specific subepidemics (subtrees including ≥80% Swiss sequences, bootstrap >70%; macroscale analysis) or evidence for domestic transmission (sequence pairs with genetic distance <1.5%, bootstrap ≥98%; microscale analysis). RESULTS: Of 8287 SHCS participants, 1732 (21%) were infected with non-B subtypes, of which A (n = 328), C (n = 272), CRF01_AE (n = 258), and CRF02_AG (n = 285) were studied further. The macroscale analysis revealed that 21% (A), 16% (C), 24% (CRF01_AE), and 28% (CRF02_AG) belonged to Swiss-specific subepidemics. The microscale analysis identified 26 possible transmission pairs: 3 (12%) including only homosexual Swiss men of white ethnicity; 3 (12%) including homosexual white men from Switzerland and partners from foreign countries; and 10 (38%) involving heterosexual white Swiss men and females of different nationality and predominantly nonwhite ethnicity. CONCLUSIONS: Of all non-B infections diagnosed in Switzerland, <25% could be prevented by domestic interventions. Awareness should be raised among immigrants and Swiss individuals with partners from high prevalence countries to contain the spread of non-B subtypes.
Abstract:
TEXTABLE is a new open-source visual programming tool for the analysis of textual data. The implications of the software's design for interoperability and flexibility are discussed, along with the question of its suitability for teaching. A brief introduction to the principles of visual programming for textual data analysis is also provided.
Abstract:
This contribution introduces Data Envelopment Analysis (DEA), a performance measurement technique. DEA helps decision makers for the following reasons: (1) By calculating an efficiency score, it indicates if a firm is efficient or has capacity for improvement; (2) By setting target values for input and output, it calculates how much input must be decreased or output increased in order to become efficient; (3) By identifying the nature of returns to scale, it indicates if a firm has to decrease or increase its scale (or size) in order to minimise the average total cost; (4) By identifying a set of benchmarks, it specifies which other firms' processes need to be analysed in order to improve its own practices. This contribution presents the essentials about DEA, alongside a case study to intuitively understand its application. It also introduces Win4DEAP, a software package that conducts efficiency analysis based on DEA methodology. The methodical background of DEA is presented for more demanding readers. Finally, four advanced topics of DEA are treated: adjustment to the environment, preferences, sensitivity analysis and time series data.
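To make points (1), (2) and (4) of this abstract concrete, here is a minimal sketch for the simplest possible case: one input, one output and constant returns to scale. In that special case the input-oriented DEA score reduces to each firm's output/input ratio divided by the best ratio observed, so no linear programming solver is needed. General multi-input, multi-output DEA requires solving a linear programme per firm, which this sketch does not attempt. All names and figures are hypothetical and unrelated to Win4DEAP.

```python
def crs_efficiency(inputs, outputs):
    """Input-oriented efficiency scores under constant returns to scale.

    Valid only for the single-input, single-output case, where the score
    equals (y / x) divided by the highest y / x in the sample.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


def input_targets(inputs, scores):
    """Input level each firm would need to become efficient at its output."""
    return [x * s for x, s in zip(inputs, scores)]


def benchmarks(scores, tol=1e-9):
    """Indices of the efficient firms that serve as reference peers."""
    return [i for i, s in enumerate(scores) if s >= 1.0 - tol]


# Hypothetical data: three firms, input = staff, output = cases handled.
staff = [10.0, 20.0, 30.0]
cases = [20.0, 30.0, 30.0]

scores = crs_efficiency(staff, cases)
targets = input_targets(staff, scores)
peers = benchmarks(scores)
print(scores, targets, peers)
```

In this toy data set the first firm defines the frontier; the second firm has a score of 0.75, meaning it could in principle handle its current output with 75% of its current staff.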
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. The paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Abstract:
Implementing a programme of public subsidies for business R&D projects entails establishing a project selection system. This selection faces significant problems, such as measuring the likely performance of R&D projects and optimising the selection process among projects with multiple and sometimes incomparable outcome measures. Public agencies mostly use the peer review method which, although it has advantages, is not free from criticism. In contrast, private firms seeking to optimise their R&D investment use more quantitative methods, such as Data Envelopment Analysis (DEA). This paper compares the performance of the evaluators of a public agency (peer review) with an alternative project selection methodology, namely DEA.
Abstract:
Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this makes it possible to distinguish objects at the surface of our planet automatically yet accurately. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive, but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, thereby hindering the transfer of knowledge from one image to another. The goal of this thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches drawn from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest so that labels are collected only at the most useful pixels. This iterative routine is based on a constant evaluation of how pertinent the initial training data, which actually belong to a different image, are to the new image.
Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels into a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are greatly improved. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. Also, we gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics, ultimately facilitating the land-cover mapping across images.
Abstract:
The distribution of mitochondrial control region-sequence polymorphism was investigated in 15 populations of Crocidura russula along an altitudinal gradient in western Switzerland. High-altitude populations are smaller, sparser and appear to undergo frequent bottlenecks. Accordingly, they showed a loss of rare haplotypes, but unexpectedly, were less differentiated than lowland populations. Furthermore, the major haplotypes segregated significantly with altitude. The results were inconsistent with a simple model of drift and dispersal. They suggested instead a role for historical patterns of colonization, or, alternatively, present-day selective forces acting on one of the mitochondrial genes involved in metabolic pathways.
Abstract:
Measuring school efficiency is a challenging task. First, a performance measurement technique has to be selected. Within Data Envelopment Analysis (DEA), one such technique, alternative models have been developed in order to deal with environmental variables. The majority of these models lead to diverging results. Second, the choice of input and output variables to be included in the efficiency analysis is often dictated by data availability. The choice of the variables remains an issue even when data is available. As a result, the choice of technique, model and variables is probably, and ultimately, a political judgement. Multi-criteria decision analysis methods can help the decision makers to select the most suitable model. The number of selection criteria should remain parsimonious and not be oriented towards the results of the models in order to avoid opportunistic behaviour. The selection criteria should also be backed by the literature or by an expert group. Once the most suitable model is identified, the principle of permanence of methods should be applied in order to avoid a change of practices over time. Within DEA, the two-stage model developed by Ray (1991) is the most convincing model which allows for an environmental adjustment. In this model, an efficiency analysis is conducted with DEA, followed by an econometric analysis to explain the efficiency scores. An environmental variable of particular interest, tested in this thesis, is the fact that certain schools operate on multiple sites. Results show that being located on more than one site has a negative influence on efficiency. A likely way to mitigate this negative influence would be to improve the use of ICT in school management and teaching. Planning new schools should also consider the advantages of being located on a single site, which allows reaching a critical size in terms of pupils and teachers.
The fact that underprivileged pupils perform worse than privileged pupils has been public knowledge since Coleman et al. (1966). As a result, underprivileged pupils have a negative influence on school efficiency. This is confirmed by this thesis for the first time in Switzerland. Several countries have developed priority education policies in order to compensate for the negative impact of disadvantaged socioeconomic status on school performance. These policies have failed. As a result, other actions need to be taken. In order to define these actions, one has to identify the social-class differences which explain why disadvantaged children underperform. Childrearing and literary practices, health characteristics, housing stability and economic security influence pupil achievement. Rather than allocating more resources to schools, policymakers should therefore focus on related social policies. For instance, they could define pre-school, family, health, housing and benefits policies in order to improve the conditions for disadvantaged children.
Abstract:
In this paper, we develop a data-driven methodology to characterize the likelihood of orographic precipitation enhancement using sequences of weather radar images and a digital elevation model (DEM). Geographical locations with topographic characteristics favorable to enforce repeatable and persistent orographic precipitation such as stationary cells, upslope rainfall enhancement, and repeated convective initiation are detected by analyzing the spatial distribution of a set of precipitation cells extracted from radar imagery. Topographic features such as terrain convexity and gradients computed from the DEM at multiple spatial scales as well as velocity fields estimated from sequences of weather radar images are used as explanatory factors to describe the occurrence of localized precipitation enhancement. The latter is represented as a binary process by defining a threshold on the number of cell occurrences at particular locations. Both two-class and one-class support vector machine classifiers are tested to separate the presumed orographic cells from the nonorographic ones in the space of contributing topographic and flow features. Site-based validation is carried out to estimate realistic generalization skills of the obtained spatial prediction models. Due to the high class separability, the decision function of the classifiers can be interpreted as a likelihood or susceptibility of orographic precipitation enhancement. The developed approach can serve as a basis for refining radar-based quantitative precipitation estimates and short-term forecasts or for generating stochastic precipitation ensembles conditioned on the local topography.