854 results for data warehouse tuning aggregato business intelligence performance
Abstract:
Our empirical literature review shows that little is known about how firm performance changes with age, presumably because of the paucity of data on firm age. For Spanish manufacturing firms, we analyse firm performance in relation to firm age between 1998 and 2006. We find evidence that firms improve with age: ageing firms are observed to have steadily increasing levels of productivity, higher profits, larger size, lower debt ratios, and higher equity ratios. Furthermore, older firms are better able to convert sales growth into subsequent growth of profits and productivity. On the other hand, we also find evidence that firm performance deteriorates with age. Older firms have lower expected growth rates of sales, profits and productivity, they have lower profitability levels (when other variables such as size are controlled for), and they appear to be less capable of converting employment growth into growth of sales, profits and productivity.
Abstract:
A monthly survey of Aedes aegypti and Aedes albopictus immatures in discarded tires at a site in metropolitan Rio de Janeiro showed that Ae. albopictus was much more abundant in the rainy season, but Ae. aegypti abundance showed a less clear seasonal pattern. Pupal masses for Ae. albopictus showed a seasonal trend. In contrast, Ae. aegypti pupae did not show any clear trend in weight. Large Ae. albopictus pupae were found in the warmer months, when water volume was higher, pH lower, and larval abundance lower. Further studies should be carried out to assess how seasonal variations in body size may impact vector competence of these species in Brazil.
Abstract:
English summary of the research project L'empresa xarxa a Catalunya. TIC, productivitat, competitivitat, salaris i beneficis a l'empresa catalana. The project's main aim is to establish that the consolidation of a new strategic, organisational and business-activity model linked to investment in and use of ICT (the network company) substantially changes the behaviour patterns of business results, in particular productivity, competitiveness, workers' pay and profit. We tested the working hypotheses empirically using data from a survey of a representative sample of 2,038 Catalan firms. From the perspective of the impact of ICT investment and use, no direct relationship is apparent between digital innovation processes and the results of Catalan firms' activity. We therefore had to segment the Catalan productive fabric to find the organisations in which digital technological and organisational co-innovation is most present, and in which intensive use of knowledge is a very frequent resource, in order to capture relevant impacts on the main business results. This is because the Catalan economy today has a dual productive structure.
Abstract:
The low levels of unemployment recorded in the UK in recent years are widely cited as evidence of the country’s improved economic performance, and the apparent convergence of unemployment rates across the country’s regions is used to suggest that the longstanding divide in living standards between the relatively prosperous ‘south’ and the more depressed ‘north’ has been substantially narrowed. Dissenters from these conclusions have drawn attention to the greatly increased extent of non-employment (around a quarter of the UK’s working-age population are not in employment) and the marked regional dimension in its distribution across the country. Amongst these dissenters it is generally agreed that non-employment is concentrated amongst older males previously employed in the now very much smaller ‘heavy’ industries (e.g. coal, steel, shipbuilding). This paper uses the tools of compositional data analysis to provide a much richer picture of non-employment, one which challenges the conventional wisdom about UK labour market performance as well as the dissenters’ view of the nature of the problem. It is shown that, associated with the striking ‘north/south’ divide in non-employment rates, there is a statistically significant relationship between the size of the non-employment rate and the composition of non-employment. Specifically, it is shown that the share of unemployment in non-employment is negatively correlated with the overall non-employment rate: in regions where the non-employment rate is high, the share of unemployment is relatively low. So the unemployment rate is not a very reliable indicator of regional disparities in labour market performance. Even more importantly from a policy viewpoint, a significant positive relationship is found between the size of the non-employment rate and the share of those not employed by reason of sickness or disability, and it seems (contrary to the dissenters) that this connection is just as strong for women as it is for men.
Abstract:
In the B-ISDN there is provision for four classes of service, all of them supported by a single transport network (the ATM network). Three of these services, the connection-oriented (CO) ones, permit connection access control (CAC), but the fourth, the connectionless-oriented (CLO) one, does not. Therefore, when the CLO service and CO services have to share the same ATM link, a conflict may arise. This is because a bandwidth allocation that obtains maximum statistical gain can damage the contracted ATM quality of service (QOS); and vice versa, in order to guarantee the contracted QOS, the statistical gain has to be sacrificed. The paper presents a performance evaluation study of the influence of the CLO service on a CO service (a circuit emulation service or a variable bit-rate service) when sharing the same link.
Abstract:
The aim of this study is to develop a model measuring the performance of cities' marketing efforts. The model and the benchmarking methodology presented can be used by local authorities to position their marketing efforts and achievements against other (competing) cities and to identify best practices that can help place marketers learn how to be more efficient in obtaining desired place marketing results, e.g., improved city brand image, with the available resources and budgets. The major implication for practitioners is that place marketing should be managed as a process, taking into account both the resource flows and the outputs, as well as the efficiency of this process.
Abstract:
We use CEX repeated cross-section data on consumption and income to evaluate the nature of increased income inequality in the 1980s and 1990s. We decompose unexpected changes in family income into transitory and permanent, and idiosyncratic and aggregate components, and estimate the contribution of each component to total inequality. The model we use is a linearized incomplete markets model, enriched to incorporate risk-sharing while maintaining tractability. Our estimates suggest that taking risk sharing into account is important for the model fit; that the increase in inequality in the 1980s was mainly permanent; and that inequality is driven almost entirely by idiosyncratic income risk. In addition, we find no evidence for cyclical behavior of consumption risk, casting doubt on Constantinides and Duffie's (1995) explanation for the equity premium puzzle.
Abstract:
Revenue management (RM) is a complicated business process that can best be described as control of sales (using prices, restrictions, or capacity), usually using software as a tool to aid decisions. RM software can play a merely informative role, supplying formatted and summarized data to analysts who use it to make control decisions (setting a price or allocating capacity for a price point), or, at the other extreme, it can play a deeper role, automating the decision process completely. The RM models and algorithms in the academic literature by and large concentrate on the latter, completely automated, level of functionality. A firm considering using a new RM model or RM system needs to evaluate its performance. Academic papers justify the performance of their models using simulations, where customer booking requests are simulated according to some process and model, and the revenue performance of the algorithm is compared to an alternate set of algorithms. Such simulations, while an accepted part of the academic literature, and indeed providing research insight, often lack credibility with management. Even methodologically, they are usually flawed, as the simulations only test "within-model" performance, and say nothing as to the appropriateness of the model in the first place. Even simulations that test against alternate models or competition are limited by the inherent need to fix some model as the universe for their testing. These problems are exacerbated with RM models that attempt to model customer purchase behavior or competition, as the right models for competitive actions or customer purchases remain somewhat of a mystery, or at least there is no consensus on their validity. How then to validate a model? Putting it another way, we want to show that a particular model or algorithm is the cause of a certain improvement to the RM process compared to the existing process.
We take care to emphasize that we want to prove the said model to be the cause of performance, and to compare against an (incumbent) process rather than against an alternate model. In this paper we describe a "live" testing experiment that we conducted at Iberia Airlines on a set of flights. A set of competing algorithms control a set of flights during adjacent weeks, and their behavior and results are observed over a relatively long period of time (9 months). In parallel, a group of control flights were managed using the traditional mix of manual and algorithmic control (the incumbent system). Such "sandbox" testing, while common at many large internet search and e-commerce companies, is relatively rare in the revenue management area. Sandbox testing has an indisputable model of customer behavior, but the experimental design and analysis of results are less clear. In this paper we describe the philosophy behind the experiment, the organizational challenges, the design and setup of the experiment, and outline the analysis of the results. This paper is a complement to a (more technical) related paper that describes the econometrics and statistical analysis of the results.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
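The nested cross-validation scheme mentioned in the abstract above can be illustrated with a minimal index-splitting sketch. This is not the authors' code: the function names `k_fold_indices` and `nested_cv_splits`, the fold counts and the seed are assumptions chosen for illustration, and the sketch ignores batch labels entirely. It only shows the structural point that tuning and feature selection see inner folds drawn from the outer training set, never the outer test fold.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the given indices and split them into k disjoint folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_validation) pairs built
    only from outer_train, so the outer test fold stays untouched while
    parameters are tuned and features are selected.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer fold is available for training.
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for j, fold in enumerate(inner_folds)
                           if j != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits
```

In use, a classifier would be tuned on the inner splits, refitted on `outer_train`, and scored once on `outer_test`. When batch and group are confounded, as in the scenarios the abstract describes, even this scheme can give a biased estimate, which is why the study compares it against independent data.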
Abstract:
Gaia is the most ambitious space astrometry mission currently envisaged and is a technological challenge in all its aspects. We describe a proposal for the payload data handling system of Gaia, as an example of a high-performance, real-time, concurrent, and pipelined data system. This proposal includes the front-end systems for the instrumentation, the data acquisition and management modules, the star data processing modules, and the payload data handling unit. We also review other payload and service module elements and we illustrate a data flux proposal.
Abstract:
Numerous studies have shown an increase in aptitude test scores across generations (the "Flynn effect"). Various biological, social and/or educational hypotheses have been put forward to explain this phenomenon. The aim of this research is to examine how performance on aptitude tests has changed, on the basis of norms dating from 1991 and 2002. The results suggest a non-homogeneous reversal of the Flynn effect. The decline particularly affects scholastic aptitude tests, such as those assessing the verbal and numerical factors. This study may reflect a change in the importance attached to the various aptitudes that are seldom assessed in educational and vocational guidance.
Abstract:
Summary. This thesis is devoted to the analysis, modelling and visualisation of spatially referenced environmental data using machine learning algorithms. Machine learning can be regarded broadly as a subfield of artificial intelligence concerned in particular with developing techniques and algorithms that allow a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient for modelling. They can solve classification, regression and probability density modelling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to geographical coordinates. Moreover, they are ideally suited to implementation as decision-support tools for environmental questions ranging from pattern recognition to modelling and prediction, including automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features. The most important and most popular machine learning algorithms are presented theoretically and implemented as software for the environmental sciences.
The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; the general regression neural network (GRNN); the probabilistic neural network (PNN); self-organising maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, using experimental variography, and according to the principles of machine learning. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations; it makes it possible to detect the presence of spatial patterns that can be described by a statistic. The machine learning approach to ESDA is presented through the k-nearest neighbours method, which is very simple and has excellent interpretation and visualisation qualities. A substantial part of the thesis deals with topical subjects such as the automatic mapping of spatial data. The general regression neural network is proposed to solve this task efficiently.
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperforms all other methods, particularly in emergency situations. The thesis comprises four chapters: theory, applications, software tools and guided examples. An important part of the work is a collection of software, Machine Learning Office. This collection was developed over the last 15 years and has been used to teach numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies cover a wide spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and the assessment of natural hazards (landslides, avalanches). Complementary tools for exploratory data analysis and visualisation were also developed, with care taken to create a user-friendly, easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies. Abstract. The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence. It is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to implementation as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to understand the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a currently hot topic, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed during the last 15 years and were used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for carrying out fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydro-geological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
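The abstract names the general regression neural network but gives no formula. In its standard formulation a GRNN prediction is a Gaussian-kernel weighted average of the training targets (a Nadaraya-Watson estimator), and the sketch below assumes that formulation. The function name `grnn_predict` and the single isotropic bandwidth `sigma` are illustrative assumptions and are not taken from the thesis or from Machine Learning Office.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Predict the value at `query` as a Gaussian-weighted mean of train_y.

    train_x: list of coordinate tuples (e.g. spatial locations)
    train_y: list of measured values at those locations
    sigma:   kernel bandwidth (assumed isotropic here), must be > 0
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    # Shift by the smallest distance so the nearest point gets weight 1;
    # the weight ratios are unchanged and the denominator cannot underflow to 0.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

A small `sigma` makes the prediction follow the nearest measurement, while a large `sigma` pulls it toward the global mean, so in practice the bandwidth is usually chosen by cross-validation.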