943 results for Data Envelopment Analysis
Abstract:
In this work we study the classification of forest types using mathematics-based image analysis of satellite data. We are interested in improving the classification of forest segments when information from two or more different satellites is combined. The experimental part is based on real satellite data originating from Canada. This thesis gives a summary of the mathematical basics of image analysis and supervised learning, the methods that are used in the classification algorithm. Three data sets and four feature sets were investigated in this thesis. The considered feature sets were 1) histograms (quantiles), 2) variance, 3) skewness, and 4) kurtosis. Good overall performance was achieved when a combination of the ASTERBAND and RADARSAT2 data sets was used.
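A minimal sketch of how the four feature sets above might be computed for one image segment; the quantile levels and the simulated pixel values are illustrative assumptions, not the thesis's actual data.

```python
import numpy as np
from scipy import stats

# Hypothetical pixel intensities of one forest segment from a satellite band.
rng = np.random.default_rng(0)
segment = rng.gamma(shape=4.0, scale=20.0, size=500)

# The four feature sets considered in the thesis, computed per segment:
quantiles = np.quantile(segment, [0.1, 0.25, 0.5, 0.75, 0.9])  # 1) histogram (quantiles)
variance = segment.var(ddof=1)                                 # 2) variance
skewness = stats.skew(segment)                                 # 3) skewness
kurt = stats.kurtosis(segment)                                 # 4) kurtosis

feature_vector = np.concatenate([quantiles, [variance, skewness, kurt]])
print(feature_vector)
```

Such per-segment feature vectors, computed separately for each satellite band, could then be concatenated and fed to a supervised classifier.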
Abstract:
The objective of this paper is to examine whether informal labor markets affect the flows of Foreign Direct Investment (FDI), and whether this effect is similar in developed and developing countries. To this end, different public data sources, such as the World Bank (WB) and the United Nations Conference on Trade and Development (UNCTAD), are used, and panel econometric models are estimated for a sample of 65 countries over a 14-year period (1996-2009). In addition, this paper uses a dynamic model as an extension of the analysis to establish whether such an effect exists and what its indicators and significance may be.
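A minimal sketch of the kind of panel estimation described above, assuming a balanced country-year panel; the variable names (fdi, informality), the simulated data, and the within-transformation fixed-effects estimator are illustrative assumptions, not the paper's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 65 countries observed over 14 years (1996-2009).
n_countries, n_years = 65, 14
country = np.repeat(np.arange(n_countries), n_years)

# Illustrative variables: informality share (regressor) and FDI inflows (outcome).
informality = rng.uniform(0.1, 0.6, size=n_countries * n_years)
fdi = 2.0 - 3.0 * informality + rng.normal(0, 0.5, size=n_countries * n_years)

def within_transform(x, groups):
    """Demean a variable by group (country): the fixed-effects 'within' transform."""
    out = x.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= out[mask].mean()
    return out

# Fixed-effects (within) estimator: OLS on country-demeaned variables.
y = within_transform(fdi, country)
X = within_transform(informality, country).reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"within estimate of the informality coefficient: {beta[0]:.3f}")
```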
Abstract:
Fourier transform infrared attenuated total reflectance (FT-IR ATR) spectroscopy was used to determine 14 different measurands in northeastern Brazilian honey samples. Nine honey samples (six monofloral and three polyfloral) from 2009, obtained from the company CEARAPI, underwent FT-IR ATR, palynological, color, and sensorial analysis to obtain preliminary results for these types of honey. The results showed that the samples comprised five monofloral, three bifloral, and one extrafloral honey, and also that mid-infrared spectrometry can be used as a screening method for the routine analysis of Brazilian honey, with the advantages of being rapid, nondestructive, and accurate.
Abstract:
In general, laboratory activities are costly in terms of time, space, and money. As such, the ability to provide realistically simulated laboratory data that enables students to practice data analysis techniques as a complementary activity would be expected to reduce these costs while opening up very interesting possibilities. In the present work, a novel methodology is presented for the design of analytical chemistry instrumental analysis exercises that can be automatically personalized for each student and whose results can be evaluated immediately. The proposed system provides each student with a different set of experimental data, generated randomly while satisfying a set of constraints, rather than data obtained from actual laboratory work. This allows the instructor to provide students with a set of practical problems to complement their regular laboratory work, along with the corresponding feedback provided by the system's automatic evaluation process. To this end, the Goodle Grading Management System (GMS), an innovative web-based educational tool for automating the collection and assessment of practical exercises for engineering and scientific courses, was developed. The proposed methodology takes full advantage of the Goodle GMS fusion code architecture. The design of a particular exercise is provided ad hoc by the instructor and requires basic Matlab knowledge. The system has been employed with satisfactory results in several university courses. To demonstrate the automatic evaluation process, three exercises are presented in detail. The first exercise involves a linear regression analysis of data and the calculation of the quality parameters of an instrumental analysis method. The second and third exercises address two different comparison tests: a comparison test of means and a paired t-test.
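A minimal sketch of the first exercise's ingredients, assuming a simple linear calibration: data are generated randomly under constraints, then fitted, and quality parameters of the method are computed. The constraint ranges and the LOD/LOQ conventions are illustrative assumptions, not the Goodle GMS implementation (which, per the abstract, is designed around Matlab).

```python
import numpy as np

rng = np.random.default_rng(42)

# Generate a personalized calibration data set under simple constraints
# (illustrative: slope, intercept, and noise level drawn from fixed ranges).
slope = rng.uniform(0.8, 1.2)        # sensitivity
intercept = rng.uniform(0.0, 0.05)   # blank signal
noise_sd = rng.uniform(0.005, 0.02)
conc = np.linspace(0.0, 10.0, 8)     # standard concentrations
signal = intercept + slope * conc + rng.normal(0, noise_sd, conc.size)

# Ordinary least-squares fit and common quality parameters of the method.
n = conc.size
b, a = np.polyfit(conc, signal, 1)             # fitted slope, intercept
resid = signal - (a + b * conc)
s_yx = np.sqrt(np.sum(resid**2) / (n - 2))     # residual standard deviation
lod = 3.3 * s_yx / b                           # limit of detection (common convention)
loq = 10.0 * s_yx / b                          # limit of quantification
r = np.corrcoef(conc, signal)[0, 1]

print(f"slope={b:.4f}, intercept={a:.4f}, r={r:.5f}, LOD={lod:.3f}, LOQ={loq:.3f}")
```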
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition, and measurement. This study deals with non-response, attrition, and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility to use combined longitudinal survey-register data. The Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1-5 was linked at the person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey-register data can be used to analyse and compare the non-response and attrition processes, test the type of missingness mechanism, and estimate the size of the bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data. Neither the Missing At Random (MAR) assumption about the non-response and attrition mechanisms, nor the classical assumptions about measurement errors, turned out to be valid. Measurement errors in both spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest, weighted by the last-wave weights, displayed the largest bias. Using all the available data, including the spells of attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazards model estimators. The study discusses the implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
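A minimal sketch of the IPCW idea evaluated in the simulation study, assuming simulated spell data with covariate-dependent attrition; the censoring model is taken as known for the sketch, and all variable names are illustrative assumptions rather than the study's actual estimator (which works with survey design weights).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated unemployment spells with attrition that depends on a covariate,
# which induces dependent censoring (the situation IPCW is meant to correct).
x = rng.normal(size=n)
t_event = rng.exponential(12.0, size=n)                  # spell length (months)
t_censor = rng.exponential(np.exp(2.5 - 0.8 * x))        # attrition time
time = np.minimum(t_event, t_censor)
event = (t_event <= t_censor).astype(int)

# IPCW weights from an (assumed known) censoring model: 1 / P(still in sample).
p_uncensored = 1.0 / (1.0 + np.exp(-(1.0 - 0.8 * x)))
weights = 1.0 / np.clip(p_uncensored, 0.05, 1.0)

def weighted_km(time, event, weights):
    """Kaplan-Meier survival estimate with per-subject weights."""
    order = np.argsort(time)
    t, e, w = time[order], event[order], weights[order]
    at_risk = w.sum()
    times, surv, s = [0.0], [1.0], 1.0
    for i in range(len(t)):
        if e[i] == 1 and at_risk > 0:
            s *= 1.0 - w[i] / at_risk
            times.append(t[i])
            surv.append(s)
        at_risk -= w[i]
    return np.array(times), np.array(surv)

times, surv = weighted_km(time, event, weights)
below = surv <= 0.5
median = times[below][0] if below.any() else float("nan")
print(f"IPCW-weighted Kaplan-Meier median spell length: {median:.1f} months")
```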
Abstract:
Prediction refers to estimating the future value of an observable quantity. A hallmark of the Bayesian paradigm is that uncertainty about unknown quantities is expressed in the form of probabilities. A Bayesian predictive model is thus a probability distribution over the possible values that an observable, but not yet observed, quantity can take. The articles included in the thesis develop methods that are applied, among other things, to the analysis of chromatographic data in criminal investigations. With the exception of the first article, all the methods build on Bayesian predictive modelling. The articles mainly consider three types of problems related to chromatographic data: quantification, pairwise matching, and clustering. The first article develops a non-parametric model for the measurement error of chromatographic analyses of blood alcohol concentration. The second article develops a predictive inference method for comparing two samples. This method is applied in the third article to the comparison of oil samples, with the aim of identifying the polluting source in connection with oil spills. The fourth article derives a predictive model for clustering data of mixed discrete and continuous type, which is applied, among other things, to the classification of amphetamine samples with respect to production batches.
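A minimal sketch of what a Bayesian predictive model means in practice, assuming a normal measurement model with a conjugate normal-inverse-gamma prior; the replicate values and hyperparameters are illustrative assumptions, not the thesis's non-parametric measurement error model.

```python
import numpy as np
from scipy import stats

# Hypothetical replicate chromatographic measurements of blood alcohol (g/L).
y = np.array([0.52, 0.49, 0.55, 0.51, 0.53])
n, ybar, ss = y.size, y.mean(), ((y - y.mean()) ** 2).sum()

# Conjugate normal-inverse-gamma prior (weakly informative, illustrative values).
mu0, kappa0, alpha0, beta0 = 0.5, 1.0, 1.0, 0.01

# Standard conjugate update of the hyperparameters.
kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
alpha_n = alpha0 + n / 2
beta_n = beta0 + 0.5 * ss + kappa0 * n * (ybar - mu0) ** 2 / (2 * kappa_n)

# The posterior predictive for the next measurement is a Student-t distribution:
# this is the "probability distribution over a not-yet-observed quantity".
df = 2 * alpha_n
scale = np.sqrt(beta_n * (kappa_n + 1) / (alpha_n * kappa_n))
pred = stats.t(df=df, loc=mu_n, scale=scale)
lo, hi = pred.interval(0.95)
print(f"predictive mean {pred.mean():.3f} g/L, 95% interval [{lo:.3f}, {hi:.3f}]")
```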
Abstract:
Communications play a key role in modern smart grids. The new functionalities that make grids smart require the communication network to function properly. Data transmission between the intelligent electronic devices (IEDs) in the rectifier and the customer-end inverters (CEIs) used for power conversion is also required in the smart grid concept of the low-voltage direct current (LVDC) distribution network. Smart grid applications such as smart metering, demand side management (DSM), and communications-based grid protection are all installed in the LVDC system. Thus, besides a remote connection to the databases of the grid operators, a local communication network in the LVDC network is needed. One solution for implementing the communication medium in power distribution grids is power line communication (PLC). There are power cables in the distribution grids, and hence they may be applied as a communication channel for distribution-level data. This doctoral thesis proposes an IP-based high-frequency (HF) band PLC data transmission concept for the LVDC network. A general method to implement the Ethernet-based PLC concept between the public distribution rectifier and the customer-end inverters in the LVDC grid is introduced. Low-voltage cables are studied as the communication channel in the frequency band of 100 kHz to 30 MHz. The communication channel characteristics and the noise in the channel are described. All individual components in the channel are presented in detail, and a channel model, comprising models for each channel component, is developed and verified by measurements. The channel noise is also studied by measurements. Theoretical signal-to-noise ratio (SNR) and channel capacity analyses and practical data transmission tests are carried out to evaluate the applicability of the PLC concept against the requirements set by the smart grid applications in the LVDC system. The main results concerning the applicability of the PLC concept and its limitations are presented, and suggestions for future research are proposed.
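A minimal sketch of the kind of theoretical channel capacity analysis mentioned above, applying the Shannon-Hartley formula per subband and summing, as is common for frequency-selective channels; the subband split and SNR values are illustrative assumptions, not the thesis's measured channel.

```python
import numpy as np

# Shannon-Hartley capacity C = B * log2(1 + SNR), evaluated per subband and
# summed over the 100 kHz to 30 MHz band studied in the thesis.
f_edges = np.linspace(100e3, 30e6, 11)                       # 10 subbands
snr_db = np.array([30, 28, 25, 22, 20, 18, 15, 12, 10, 8])   # illustrative per-subband SNR

bandwidths = np.diff(f_edges)                                # Hz per subband
snr_linear = 10 ** (snr_db / 10)
capacity = np.sum(bandwidths * np.log2(1 + snr_linear))
print(f"theoretical channel capacity: {capacity / 1e6:.1f} Mbit/s")
```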
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve the solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates the development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically using Gaussian kernels. This allows the application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge-finding methods are adapted to two different applications. The first is the extraction of curvilinear structures from noisy data mixed with background clutter. The second is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications where most of the earlier approaches are inadequate. Examples include the identification of faults from seismic data and the identification of filaments from cosmological data. The applicability of the nonlinear PCA to climate analysis and the reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include the development of an efficient semidefinite optimization method for embedding graphs into Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but it also has potential applications in graph theory and various areas of physics, chemistry, and engineering. The asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated as the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
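A minimal sketch of projecting a point onto a density ridge, using subspace-constrained mean shift over a Gaussian kernel density estimate as a simple stand-in for the thesis's trust region Newton method; the circular toy data and bandwidth are illustrative assumptions.

```python
import numpy as np

def scms_project(x, data, h, steps=300, tol=1e-6):
    """Project x onto a 1-D ridge of a Gaussian KDE by subspace-constrained
    mean shift: at each step, move x by the mean-shift vector restricted to
    the directions orthogonal to the ridge (the d-1 smallest Hessian eigenvectors)."""
    x = np.asarray(x, float).copy()
    d = data.shape[1]
    for _ in range(steps):
        diff = (data - x) / h
        k = np.exp(-0.5 * np.sum(diff**2, axis=1))           # kernel weights
        grad = (diff * k[:, None]).sum(axis=0) / h           # KDE gradient (up to a constant)
        hess = (np.einsum('ni,nj,n->ij', diff, diff, k)
                - k.sum() * np.eye(d)) / h**2                # KDE Hessian (up to a constant)
        _, vecs = np.linalg.eigh(hess)                       # eigenvalues ascending
        v_perp = vecs[:, :-1]                                # directions orthogonal to the ridge
        mean_shift = grad * h**2 / k.sum()                   # classic mean-shift vector
        shift = v_perp @ (v_perp.T @ mean_shift)             # constrained step
        x += shift
        if np.linalg.norm(shift) < tol:
            break
    return x

# Toy data: noisy points around a circle; the density ridge traces the circle.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 400)
data = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(400, 2))
x_ridge = scms_project([1.3, 0.2], data, h=0.3)
print(f"projected point {x_ridge}, radius {np.linalg.norm(x_ridge):.3f}")  # lands near the circle
```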
Abstract:
GLUT4 protein expression in white adipose tissue (WAT) and skeletal muscle (SM) was investigated in 2-month-old, 12-month-old spontaneously obese, and 12-month-old calorie-restricted lean Wistar rats, considering different parameters of analysis, such as tissue and body weight and the total protein yield of the tissue. In WAT, a ~70% decrease was observed in plasma membrane and microsomal GLUT4 protein, expressed per g protein or per g tissue, in both 12-month-old obese and 12-month-old lean rats compared to 2-month-old rats. However, when plasma membrane and microsomal GLUT4 tissue contents were expressed per g body weight, they were the same. In SM, GLUT4 protein content expressed per g protein was similar in 2-month-old and 12-month-old obese rats, whereas it was reduced in 12-month-old obese rats when expressed per g tissue or per g body weight, which may play an important role in insulin resistance. Weight loss did not change the SM GLUT4 content. These results show that altered insulin sensitivity is accompanied by modulation of GLUT4 protein expression. However, the true role of WAT and SM GLUT4 contents in whole-body or tissue insulin sensitivity should be determined considering not only GLUT4 protein expression but also the strong morphostructural changes in these tissues, which require different types of data analysis.
Abstract:
This study sought to evaluate the acceptance of "dulce de leche" made with coffee and whey. The results were analyzed through response surface methodology, ANOVA, mean comparison tests, histograms, and a preference map correlating the global impression data with the results of the physical, physicochemical, and sensory analyses. The response surface methodology by itself was not enough to find the best formulation. From the ANOVA, mean comparison tests, and preference map, it was observed that the consumers' favorite "dulce de leche" samples were those of formulations 1 (10% whey and 1% coffee) and 2 (30% whey and 1% coffee), followed by formulation 9 (20% whey and 1.25% coffee). The acceptance of samples 1 and 2 was driven by their higher flavor acceptability and their higher pH, L*, and b* values. Samples 1 and 2 also presented higher purchase approval scores and higher percentages of responses in the 'ideal' category for sweetness and coffee flavor. Consumers preferred the samples with low concentrations of coffee, independently of the concentration of whey, thus enabling the use of whey and coffee in the manufacture of "dulce de leche" to obtain a new product.
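A minimal sketch of the ANOVA comparison of formulations described above; the hedonic acceptance scores are simulated illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical hedonic acceptance scores for three of the formulations.
rng = np.random.default_rng(3)
f1 = rng.normal(7.5, 1.0, 50)   # formulation 1: 10% whey, 1% coffee
f2 = rng.normal(7.3, 1.0, 50)   # formulation 2: 30% whey, 1% coffee
f9 = rng.normal(6.6, 1.0, 50)   # formulation 9: 20% whey, 1.25% coffee

# One-way ANOVA: do mean acceptance scores differ across formulations?
f_stat, p_value = stats.f_oneway(f1, f2, f9)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A significant F statistic would then motivate the pairwise mean comparison tests mentioned in the abstract.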
Heat Demand Forecasting Models Development: Use of Data Mining Tools in SQL Server Analysis Services
Abstract:
This thesis introduces heat demand forecasting models generated with data mining algorithms. The forecast spans one full day, and it can be used to regulate the heat consumption of buildings. For training the data mining models, two years of heat consumption data from a case building and weather measurement data from the Finnish Meteorological Institute are used. The thesis utilizes the Microsoft SQL Server Analysis Services data mining tools to generate the data mining models and the CRISP-DM process framework to implement the research. The results show that the built models can predict heat demand at best with mean absolute percentage errors of 3.8% for the 24-hour profile and 5.9% for the full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.
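A minimal sketch of the reported error metric, assuming a hypothetical hourly heat demand profile; only the MAPE definition itself is standard practice, everything else is illustrative.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Hypothetical 24-hour heat demand profile (kW) and a model's forecast of it.
rng = np.random.default_rng(7)
actual = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, 24))
forecast = actual * (1 + rng.normal(0, 0.04, 24))
print(f"24-hour profile MAPE: {mape(actual, forecast):.1f}%")
```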
Abstract:
The recent rapid development of biotechnological approaches has enabled the production of large whole-genome-level biological data sets. In order to handle these data sets, reliable and efficient automated tools and methods for data processing and result interpretation are required. Bioinformatics, as the field of studying and processing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics, and engineering to study and process biological data. The need is also increasing for tools that can be used by biological researchers themselves, who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows, and a strong emphasis on result reporting and visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, covering several aspects of high-throughput data analysis, are specifically aimed at gene expression and genotyping data, although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus, robust data analysis workflows are also described, putting the developed tools and methods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data set, and therefore guidelines for choosing an optimal method are given. The data analysis tools, methods, and workflows developed within this thesis have been applied to several research studies, of which two representative examples are included in the thesis. The first study focuses on spermatogenesis in murine testis and the second one examines cell lineage specification in mouse embryonic stem cells.
Abstract:
This research concerns the Urban Living Idea Contest conducted by Creator Space of BASF SE during its 150th anniversary in 2015. The main objectives of the thesis are to provide a comprehensive analysis of the Urban Living Idea Contest (ULIC) and to propose a number of improvement suggestions for future years. More than 4,000 data points were collected and analyzed to investigate the functionality of different elements of the contest. Furthermore, a set of improvement suggestions was proposed to BASF SE. The novelty of this thesis lies in the data collection and the original analysis of the contest, which identified its critical elements as well as the areas that could be improved. The author of this research was a member of the organizing team and was involved in the decision-making process from the beginning until the end of the ULIC.
Abstract:
Pain is a perceptual experience comprising multiple dimensions. These dimensions of pain are interrelated and recruit neural networks that process the corresponding information. Elucidating the functional architecture that supports the different perceptual aspects of the experience is therefore a fundamental step toward understanding the functional role of the different regions of the cerebral pain matrix in the cortical circuits underlying the subjective experience of pain. Among the various brain regions involved in processing nociceptive information, the primary and secondary somatosensory cortices (S1 and S2) are the main regions generally associated with processing the sensory-discriminative aspect of pain. However, the functional organization within these somatosensory regions is not completely clear, and relatively few studies have directly examined the integration of information between the somatosensory regions. Thus, several questions remain concerning the hierarchical relationship between S1 and S2, as well as the functional role of the inter-hemispheric connections between homologous somatosensory regions. Likewise, serial versus parallel processing within the somatosensory system is another open question that requires closer examination. The aim of the present study was to test a number of hypotheses about causality in the functional interactions between S1 and S2 while subjects received painful electric shocks. We implemented a connectivity modelling method that uses a causal description of the system's dynamics to study the interactions between activation sites defined by a data set from a functional imaging study. Our paradigm consisted of three experimental sessions using electric shocks at three different intensity levels: moderately painful (level 3), slightly painful (level 2), or completely non-painful (level 1). Our paradigm therefore allowed us to study how stimulus intensity is encoded in our network of interest, and how the connectivity of the different regions is modulated under the different stimulation conditions. Our results favour a serial mode of processing of nociceptive somatosensory information, with a predominant thalamocortical input to S1 contralateral to the stimulation site. Our results imply that information propagates from contralateral S1 through our network of interest, composed of bilateral S1 and S2. Our analysis indicates that the S1-S2 connection is strengthened by pain, suggesting that S2 is higher in the pain-processing hierarchy than S1, in line with previous neurophysiological and magnetoencephalography findings. Finally, our analysis provides evidence that somatosensory information enters the hemisphere contralateral to the stimulation side, with inter-hemispheric connections responsible for transferring the information to the ipsilateral hemisphere.
Abstract:
Research report presented to the Faculté des arts et des sciences in fulfilment of the requirements for the Master's degree in economics (Maîtrise en sciences économiques).