924 results for Automatic Analysis of Multivariate Categorical Data Sets
Abstract:
Diabetes is a rapidly increasing worldwide problem characterised by defective glucose metabolism that causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), one of the primary causes of blindness and visual impairment in adults. The rapid increase in diabetes is pushing current DR screening capabilities to their limits; digital imaging of the eye fundus (retinal imaging), combined with automatic or semi-automatic image analysis algorithms, offers a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is studied statistically using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a given type of diabetic lesion from all other possible objects in eye fundus images by estimating only the probability density function of that lesion type. For training and ground truth estimation, the algorithm combines the manual annotations of several experts, with the best combination practices selected experimentally. By assessing the algorithm's performance in experiments on colour space selection, illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was quantitatively evaluated. Another contribution of this work is a benchmarking framework for the eye fundus image analysis algorithms needed to develop automatic DR detection. The framework provides guidelines on how to construct a benchmarking database comprising true patient images, ground truth, and an evaluation protocol. The evaluation is based on standard receiver operating characteristic (ROC) analysis and follows medical decision-making practice, providing protocols for both image- and pixel-based evaluations.
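A minimal sketch of the one-class idea described above: fit a Gaussian mixture only to colour samples of the target lesion class, then accept a new pixel as that lesion type if its estimated log-density exceeds a threshold. The diagonal covariances, the fixed number of components, plain EM fitting, and the use of a 5% rejection quantile of the training densities as the threshold are all illustrative assumptions, not the settings used in the thesis; the function names are hypothetical.

```python
import numpy as np

def component_log_pdf(X, means, variances):
    """(n, k) matrix of log N(x_i | mean_j, diag(variances_j))."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff**2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def fit_diag_gmm(X, k=2, iters=100, seed=0):
    """Fit a diagonal-covariance Gaussian mixture to X (n x d) with plain EM.

    Assumption: diagonal covariances and a fixed k, chosen for brevity.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        log_p = component_log_pdf(X, means, variances) + np.log(weights)
        log_p -= np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p)
        # M-step: re-estimate mixing weights, means and per-channel variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
    return weights, means, variances

def log_density(X, weights, means, variances):
    """Log of the mixture density p(x) for each row of X."""
    return np.logaddexp.reduce(
        component_log_pdf(X, means, variances) + np.log(weights), axis=1)

def one_class_threshold(train_log_density, reject_fraction=0.05):
    """Assumed rule: accept x as the target class if log p(x) is above this."""
    return np.quantile(train_log_density, reject_fraction)
```

In the setting of the thesis, the rows of `X` would be pixel colours (in whichever colour space is being tested) sampled from expert-annotated lesion regions. Note that no background model is fitted: everything that is not dense under the lesion model is rejected, which is what makes the classifier one-class.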
During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, the DR databases and the final algorithm are publicly available on the web to set baseline results for the automatic detection of diabetic retinopathy. Although it deviates from the general context of the thesis, a simple and effective optic disc localisation method is also presented. Optic disc localisation is discussed because normal eye fundus structures are fundamental to the characterisation of DR.
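The image-based evaluation in the diabetic retinopathy benchmarking framework rests on standard ROC analysis. As a rough illustration, assuming each image has already received a single lesion score and a binary ground-truth label (how the framework derives an image score is not specified here), the ROC curve and its area can be computed by sweeping a threshold over the observed scores. The function names are hypothetical, and the framework's actual protocol is not reproduced.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per distinct threshold.

    labels: 1 for images with the lesion type, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Every image scoring at or above t is called positive
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`, the area is 0.75, the fraction of positive/negative pairs in which the positive image scores higher.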
Abstract:
The concept of open innovation has recently gained widespread attention and is particularly relevant now, as many firms endeavouring to implement open innovation face different sets of challenges associated with managing it. Prior research on open innovation has focused on the internal processes of implementing open innovation and on the organizational changes, already taking place or still required, that companies need in order to succeed in the global open innovation market. Despite intensive research on open innovation, the question of what influences its adoption by companies in different contexts has received little attention. To fill this gap, this thesis contributes to the discussion on factors influencing open innovation by bringing in the perspective of environmental impacts, i.e. gathering data on possible sources of external influence, classifying them, and testing their systemic impact through a conceptual system dynamics simulation model. The insights from data collection and from conceptualization during modelling are used to answer the question of how the external environment affects the adoption of open innovation. The thesis research is presented through five research papers reflecting a method-triangulation study (conducted initially as a case study, later as quantitative analysis, and finally as system dynamics simulation). This multitude of methods was used to collect the possible external influence factors and to assess their impact (on a positive/negative scale rather than numerically). The results provide valuable insights into the factors influencing open innovation within a firm's operating environment, point out the balance required in the system for successful open innovation performance, and reveal the existence of a tipping point in open innovation success when driven by market dynamics and structures.
The practical implications of how firms and policy-makers can leverage the environment to their benefit are offered in the conclusions.
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis, including survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility of using combined longitudinal survey and register data. The Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1–5 was linked at person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey and register data can be used to analyse and compare the non-response and attrition processes, test the type of missingness mechanism, and estimate the size of bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data.
Neither the Missing At Random (MAR) assumption about the non-response and attrition mechanisms nor the classical assumptions about measurement errors turned out to be valid. Measurement errors in both spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest and weighted by the last-wave weights displayed the largest bias. Using all the available data, including the spells of attriters up to the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazards model estimators. The study discusses the implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
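A design-weighted Kaplan-Meier estimator with an IPCW correction, as evaluated in the simulation study above, can be sketched as a weighted product-limit estimator. This is a minimal sketch under simplifying assumptions: each person carries a single weight (design weight divided by an estimated probability of remaining uncensored), whereas IPCW weights are in general time-varying, and the censoring model that would supply those probabilities is not shown. Function names are hypothetical.

```python
def ipcw_weights(design_weights, prob_uncensored):
    """Inflate each design weight by 1 / P(remaining uncensored).

    Simplification: one probability per person instead of a time-varying one.
    """
    return [d / g for d, g in zip(design_weights, prob_uncensored)]

def weighted_kaplan_meier(times, events, weights):
    """Weighted product-limit estimate as [(event_time, survival), ...].

    events: 1 if the spell ended (event observed), 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        # Weighted number at risk just before t and weighted events at t
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        ended = sum(w for ti, e, w in zip(times, events, weights) if ti == t and e)
        survival *= 1.0 - ended / at_risk
        curve.append((t, survival))
    return curve
```

With all weights equal to 1 the function reduces to the ordinary Kaplan-Meier estimator; the weights only change how much each person counts in the risk set and in the event totals.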
Abstract:
Prediction refers to estimating the future value of an observable quantity. A characteristic of the Bayesian paradigm is that uncertainty about unknown quantities is expressed in the form of probabilities. A Bayesian predictive model is thus a probability distribution over the possible values that an observable, but not yet observed, quantity can take. The articles included in the thesis develop methods that are applied, among other things, to the analysis of chromatographic data in criminal investigations. With the exception of the first article, all the methods are based on Bayesian predictive modelling. The articles mainly consider three types of problems related to chromatographic data: quantification, pairwise matching, and clustering. The first article develops a non-parametric model for the measurement error of chromatographic analyses of blood alcohol concentration. The second article develops a predictive inference method for comparing two samples. In the third article, this method is applied to the comparison of oil samples in order to identify the polluting source in connection with oil spills. The fourth article derives a predictive model for clustering data of mixed discrete and continuous type, which is applied, among other things, to classifying amphetamine samples by production batch.
Abstract:
Wind power is a low-carbon form of energy production that reduces society's dependence on fossil fuels. Finland has adopted wind energy production into its climate change mitigation policy, which has led to changes in legislation and guidelines, the allocation of regional wind power areas, and the establishment of a feed-in tariff. Wind power production has indeed grown in Finland after two decades of relatively slow growth; for instance, wind energy production increased by 64% from 2010 to 2011, but there is still a long way to go to reach the national goal of 6 TWh by 2020. This thesis introduces a GIS-based decision-support methodology for the preliminary identification of areas suitable for wind energy production, including an estimate of their level of risk. The goal of this study was to define the least risky places for wind energy development within Kemiönsaari municipality in Southwest Finland. Spatial multicriteria decision analysis (SMCDA) has been used to search for suitable wind power areas, as well as for many other location-allocation problems. SMCDA scrutinizes complex, ill-structured decision problems in a GIS environment using constraints and evaluation criteria, which are aggregated using weighted linear combination (WLC). Weights for the evaluation criteria were obtained with the analytic hierarchy process (AHP) from nine expert interviews. Subsequently, feasible alternatives were ranked to provide a recommendation, and finally a sensitivity analysis was conducted to determine the robustness of the recommendation. The first aim of the study was to scrutinize the suitability and necessity of existing data for this SMCDA study. Most of the available data sets were of sufficient resolution and quality. The necessity of the input data was evaluated qualitatively for each data set based on, e.g., constraint coverage and attribute weights.
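The AHP weighting and WLC aggregation steps can be sketched as follows. This assumes a single reciprocal pairwise comparison matrix (how the nine experts' judgments were combined is not described here), the principal-eigenvector method for the weights, Saaty's consistency ratio, and criterion scores already standardised to a common 0 to 1 scale. Function names are hypothetical.

```python
# Saaty's random consistency index for comparison matrices of order n
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, iters=100):
    """Priority vector (principal eigenvector, via power iteration) and
    consistency ratio of a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max from A w = lambda w, then CR = CI / RI
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    cr = (lambda_max - n) / (n - 1) / RANDOM_INDEX[n] if n in RANDOM_INDEX else 0.0
    return w, cr

def wlc(cells, weights, feasible):
    """Weighted linear combination of standardised criterion scores per cell;
    cells excluded by a constraint get None instead of a score."""
    return [sum(w * x for w, x in zip(weights, cell)) if ok else None
            for cell, ok in zip(cells, feasible)]
```

For the perfectly consistent matrix `[[1, 3, 9], [1/3, 1, 3], [1/9, 1/3, 1]]` the weights are 9/13, 3/13 and 1/13 and the consistency ratio is essentially zero; a common rule of thumb treats ratios below 0.1 as acceptable. The purely additive WLC score also shows why redundant criteria matter: two strongly interdependent criteria effectively count the same factor twice.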
Attribute quality was estimated mainly qualitatively in terms of attribute comprehensiveness, operationality, measurability, completeness, decomposability, minimality and redundancy. The most significant quality issue was redundancy, as interdependencies are not tolerated by WLC and AHP does not include measures to detect them. The third aim was to define the least risky areas for wind power development within the study area. The two highest-ranking areas were Nordanå-Lövböle and Påvalsby, followed by Helgeboda, Degerdal, Pungböle, Björkboda and Östanå-Labböle. The fourth aim was to assess the reliability of the recommendation: the two top-ranking areas proved robust, whereas the others were more sensitive.
Abstract:
Maxillaria, one of the largest genera of Orchidaceae in the Neotropics with about 450 species, presents several taxonomic uncertainties regarding its generic circumscription and the delimitation of species groups, mainly due to the large variability of some species. The present study aims to verify the morphological variation and species delimitation in the Brasiliorchis picta complex (Brasiliorchis being a recently established genus derived from Maxillaria) using morphometric multivariate analysis. A total of 340 specimens belonging to six species (B. chrysantha (Barb. Rodr.) R.B. Singer, S. Koehler & Carnevali, B. gracilis (Lodd.) R.B. Singer, S. Koehler & Carnevali, B. marginata (Lindl.) R.B. Singer, S. Koehler & Carnevali, B. picta (Hook.) R. Singer, S. Koehler & Carnevali, B. porphyrostele (Rchb. f.) R.B. Singer, S. Koehler & Carnevali and B. ubatubana (Hoehne) R.B. Singer, S. Koehler & Carnevali) were analyzed using multivariate methods (PCA, CVA, DA, and cluster analysis with UPGMA). B. gracilis shows the largest morphological discontinuity, mainly due to its smaller size. The other species tend to form distinct groups, but characteristics intermediate between pairs of species cause overlaps among individuals of different species and thus blur the distinction between them. Hybridization and geographic distribution may be involved in the differentiation of species and lineages in this complex. Because the species classified a priori in this work cannot be recognized by the quantitative characters measured here, other tools such as geometric morphometrics and molecular data should be employed in future work to clarify species relationships in this complex.
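Of the multivariate methods listed, UPGMA cluster analysis is simple enough to sketch. The version below assumes a precomputed symmetric distance matrix between specimens (how the morphometric distances were computed in the study is not stated) and repeatedly merges the closest pair of clusters, giving the merged cluster the size-weighted average of the two clusters' distances to every other cluster. Names are hypothetical.

```python
def upgma(labels, matrix):
    """UPGMA clustering of a symmetric distance matrix.

    Returns (tree, merges): a nested-tuple tree and a list of
    (cluster_a, cluster_b, distance_at_merge) in merge order.
    """
    size = {label: 1 for label in labels}  # cluster -> number of specimens
    dist = {frozenset((a, b)): matrix[i][j]
            for i, a in enumerate(labels)
            for j, b in enumerate(labels) if i < j}
    merges = []
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = tuple(pair)
        merges.append((a, b, dist.pop(pair)))
        na, nb = size.pop(a), size.pop(b)
        merged = (a, b)
        for c in list(size):
            # Size-weighted average of the distances from a and from b to c
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (na * dac + nb * dbc) / (na + nb)
        size[merged] = na + nb
    return next(iter(size)), merges
```

Keying distances by `frozenset` keeps the update step short but is slow for large samples; for several hundred specimens, SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` applies the same UPGMA criterion far more efficiently.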
Abstract:
The aim of this work was to identify geographic zones suitable for the production of honeys in which pollen grains of Escallonia pulverulenta (Ruiz & Pav.) Pers. (Saxifragaceae) can be detected. Analysis of the botanical origin of 240 honey samples produced between La Serena and Puerto Montt (the IV and X Administrative Regions of Chile) allowed the detection of E. pulverulenta pollen grains in 46 Chilean honeys. The geographic distribution of the honeys studied is presented together with their affinities, using factor analysis and frequency tables. The study was based on the presence of E. pulverulenta pollen. E. pulverulenta pollen percentages ranged from 0.24% to 78.5%. Seventeen of the studied samples were designated as unifloral, i.e. samples showing more than 45% pollen of a single plant species. Two of these corresponded to E. pulverulenta (corontillo, madroño or barraco) honeys. The remaining unifloral honeys comprised 8 samples of Lotus uliginosus Schkuhr (birdsfoot trefoil), 2 samples of Aristotelia chilensis (Molina) Stuntz (maqui), and 1 sample each of Escallonia rubra (Ruiz & Pav.) Pers. (siete camisas), Eucryphia cordifolia Cav. (ulmo or muermo), Weinmannia trichosperma Cav. (tineo), Rubus ulmifolius Schott (blackberry) and Brassica rapa L. (turnip). Honeys with different percentages of E. pulverulenta pollen, statistically analyzed through correspondence analysis, could be associated with and assigned to one of three geographic types defined on the basis of this analysis. The geographic type areas defined were the Northern Mediterranean Zone (samples from the IV Region), the Central Mediterranean Zone (samples from the V to VIII Regions, including two samples of unifloral E. pulverulenta honey), and the Southern Mediterranean Zone (samples from the IX Region).
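The unifloral criterion stated above (more than 45% pollen of a single plant species) is a simple threshold rule. A minimal sketch, assuming pollen counts have already been converted to percentages per taxon; labelling every non-unifloral sample "multifloral" is an assumption made for illustration, not the study's terminology, and the function name is hypothetical.

```python
UNIFLORAL_THRESHOLD = 45.0  # percent of total pollen, as defined in the abstract

def classify_honey(pollen_percentages):
    """Return the dominant taxon if its share exceeds the threshold,
    otherwise the placeholder label 'multifloral'."""
    taxon, pct = max(pollen_percentages.items(), key=lambda kv: kv[1])
    return taxon if pct > UNIFLORAL_THRESHOLD else "multifloral"
```

Because the rule is a strict "more than", a sample in which the dominant taxon reaches exactly 45% is not counted as unifloral.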
Abstract:
The present study compares the performance of stochastic and fuzzy models for analysing the relationship between clinical signs and diagnosis. Data obtained for 153 children concerning diagnosis (pneumonia, other non-pneumonia diseases, absence of disease) and seven clinical signs were divided into two samples, one for analysis and the other for validation. The former was used to derive relations by multiple discriminant analysis (MDA) and by fuzzy max-min composition (fuzzy), and the latter was used to assess the predictions drawn from each type of relation. MDA and fuzzy were closely similar in terms of prediction, correctly allocating 75.7 to 78.3% of patients in the validation sample and disagreeing in only a single instance: a patient with a low level of toxemia was misclassified as not diseased by MDA but correctly identified as somehow ill by fuzzy. Regarding the relations, each method provided different information, revealing different aspects of the relations between clinical signs and diagnoses. Both methods agreed in pointing to X-ray, dyspnea and auscultation as better related to pneumonia, but only fuzzy was able to detect relations of heart rate, body temperature, toxemia and respiratory rate with pneumonia. Moreover, only fuzzy was able to detect a relationship between heart rate and absence of disease, which allowed the detection of six malnourished children whose diagnoses as healthy are indeed disputable. The conclusion is that even though fuzzy set theory might not improve prediction, it certainly enhances clinical knowledge, since it detects relationships not visible to stochastic models.
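Fuzzy max-min composition, the operation named in the pneumonia study, combines a patient-by-sign relation R with a sign-by-diagnosis relation S into a patient-by-diagnosis relation: (R ∘ S)(i, k) = max over j of min(R(i, j), S(j, k)). The sketch below implements only that composition; how the study derived its membership values from the clinical data is not described in the abstract, so any matrices fed to it here are hypothetical.

```python
def max_min_composition(r, s):
    """Fuzzy max-min composition of relations r (m x n) and s (n x p).

    Entry (i, k) is the strongest link i -> j -> k, where the strength of
    a path is the weaker of its two memberships.
    """
    return [[max(min(r[i][j], s[j][k]) for j in range(len(s)))
             for k in range(len(s[0]))]
            for i in range(len(r))]
```

For example, with R = `[[0.8, 0.2], [0.1, 0.9]]` (two patients, two signs) and S = `[[0.9, 0.1], [0.3, 0.7]]` (two signs, two diagnoses), the composition is `[[0.8, 0.2], [0.3, 0.7]]`. Unlike a linear discriminant score, each entry is driven by a single strongest path, which is one way a single sign can surface as related to a diagnosis.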
Integration of marketing research data in new product development. Case study: Food industry company
Abstract:
The aim of this master's thesis is to provide a real-life example of how marketing research data is used by different functions in the new product development (NPD) process. To achieve this goal, a case study was conducted in a company, examining the gathering, analysis, distribution and synthesis of marketing research data in NPD. The main research question was formulated as follows: How is marketing research data integrated and used by different company functions in the NPD process? The theoretical part of the thesis focused on the role of the marketing function in NPD, the use of marketing research particularly in the food industry, and issues related to the marketing/R&D interface during the NPD process. The empirical part was based on qualitative explanatory case study research. Individual in-depth interviews with company representatives, company documents and online research were used for data collection and analyzed through triangulation. The empirical findings suggest that the most important marketing data sources at the concept generation stage of NPD are global trend monitoring, retail audits and consumer insights. These data sets are crucial for establishing the market potential of the product and defining the desired features of the new product to be developed. The findings also provide an example of successful cross-functional communication during the NPD process through formal and informal communication patterns. General managerial recommendations are given on integrating strategy, process, continuous improvement, and motivated cross-functional product development teams into NPD.
Abstract:
This study developed a gluten-free granola and evaluated it during storage by applying multivariate and regression analysis to sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg⁻¹ of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. The granola had the best alpha-linolenic acid content, nutritional indices of the lipid fraction, and mineral content. Hygienic and sanitary conditions remained good during storage, probably due to the low water activity of the formulation, which helped to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models showed good fit and correlation over the storage period. A reduction in sensory attribute levels and the physical stabilisation of the product were verified by principal component analysis. Combining the affective acceptance test and instrumental analysis with statistical methods yielded promising results on the characteristics of the gluten-free granola.
Abstract:
Avidins (Avds) are homotetrameric or homodimeric glycoproteins, typically with fewer than 130 amino acid residues per monomer. They form a highly stable, non-covalent complex with biotin (vitamin H) with Kd = 10⁻¹⁵ M (for chicken Avd). The best-studied Avds are chicken Avd from Gallus gallus and streptavidin from Streptomyces avidinii, although other studies have also included Avds of various origins, e.g., from frogs, fishes, mushrooms and many different bacteria. Several engineered Avds have also been reported, e.g., dual-chain Avds (dcAvds) and single-chain Avds (scAvds), circular permutants with up to four simultaneously modifiable ligand-binding sites. These engineered Avds, along with the many native Avds, have potential uses in various nanobiotechnological applications. In this study, we made a structure-based alignment representing all currently available Avd sequences and studied the evolutionary relationships of Avds using phylogenetic analysis. First, we created an initial multiple sequence alignment of 42 closely related Avd sequences, guided by the known Avd crystal structures. Next, we searched various online databases, including the National Center for Biotechnology Information and the Universal Protein Resource, for non-redundant Avd sequences; the identified sequences were added to the initial alignment, expanding it to a final alignment of 242 Avd sequences. The MEGA software package was used to create distance matrices and a phylogenetic tree.
Bootstrap reproducibility of the tree was poor at multiple nodes, which may reflect several possible issues with the data: the compared sequences are relatively short and, whereas some positions are highly conserved and functional, others can vary without affecting the structure or function, so there are few informative sites; periods of rapid duplication may have produced paralogs whose differences lie within the error limit of the data; and there may be other, as yet unknown, reasons. Principal component analysis applied to alternative distance data did segregate the major groups, a success likely due to the multivariate consideration of all the information. Furthermore, based on our extensive alignment and phylogenetic analysis, we expressed two novel Avds, lacavidin from Latrodectus hesperus, the western black widow spider, and hoefavidin from Hoeflea phototrophica, an aerobic marine bacterium, with the ultimate aim of determining their X-ray structures. These Avds were selected because of their unique sequences: lacavidin has an N-terminal Avd-like domain but a long C-terminal overhang, whereas hoefavidin was thought to be a dimeric Avd. Both Avds could be used as novel scaffolds in biotechnological applications.
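The distance-matrix and bootstrap steps discussed above can be illustrated with a minimal sketch. It assumes uncorrected p-distances with pairwise deletion of gap positions and resamples alignment columns with replacement, the usual nonparametric bootstrap for phylogenies. The abstract does not say which distance model was used in MEGA, so this does not reproduce that analysis; function names are hypothetical.

```python
import random

def p_distance(a, b):
    """Proportion of differing aligned sites, skipping positions where
    either sequence has a gap ('-'); NaN if no comparable sites remain."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return float("nan")
    return sum(x != y for x, y in sites) / len(sites)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances for aligned sequences."""
    return [[p_distance(a, b) for b in seqs] for a in seqs]

def bootstrap_replicate(seqs, rng):
    """One bootstrap pseudo-alignment: alignment columns resampled with
    replacement, keeping the original alignment length."""
    n_cols = len(seqs[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(s[c] for c in cols) for s in seqs]
```

Repeating `bootstrap_replicate` many times, rebuilding a tree from each replicate's distance matrix, and counting how often each node reappears gives the bootstrap support values. With few informative sites, as the abstract suggests for Avds, resampled columns frequently change which pairs look closest, which is one way low support arises.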
Abstract:
Emerging markets have experienced rapid economic growth, and manufacturing firms have had to face the effects of globalisation. Some of the major emerging economies have been able to create a supportive business environment that fosters innovation, and China is a good example of a country that has been able to increase value-added investments. Conversely, in Russia, another large emerging market, domestic firms struggle more with global competitiveness. Innovation has proven to be one of the most essential ingredients for firms aiming to grow and become more competitive. In emerging markets, the business environment places many constraints on innovation. However, open strategic choices in new product development enable companies in emerging markets to expand their resource base and build capabilities. Networking and close inter-firm cooperation are essential in this regard. In this dissertation, I argue that technology transfer is one of the key tools for these companies to become internationally networked and to improve their competitiveness. It forces companies to reach beyond company and national borders, which in many cases is a major challenge for firms in emerging markets. This dissertation focuses on how companies in emerging markets can catch up in competitiveness. The empirical studies included in the dissertation are based on analyses of survey data, mainly on firms and their strategies in the Russian manufacturing industry. The dissertation contributes to the current strategic management literature by further investigating technology management strategies in manufacturing firms in emerging markets and the benefits of more open approaches to new product development and innovation.
Abstract:
Pain is a perceptual experience comprising many dimensions. These dimensions of pain are interrelated and recruit neural networks that process the corresponding information. Elucidating the functional architecture that supports the different perceptual aspects of the experience is therefore a fundamental step toward understanding the functional role of the various regions of the cerebral pain matrix within the cortical circuits that underlie the subjective experience of pain. Among the brain regions involved in processing nociceptive information, the primary and secondary somatosensory cortices (S1 and S2) are the main regions generally associated with processing the sensory-discriminative aspect of pain. However, the functional organization of these somatosensory regions is not fully understood, and relatively few studies have directly examined the integration of information between somatosensory regions. Thus, several questions remain regarding the hierarchical relationship between S1 and S2, as well as the functional role of the interhemispheric connections between homologous somatosensory regions. Likewise, whether processing within the somatosensory system is serial or parallel is another open question that requires further examination. The aim of the present study was to test a number of hypotheses about causality in the functional interactions between S1 and S2 while subjects received painful electric shocks. We implemented a connectivity modelling method that uses a causal description of the system dynamics to study the interactions between activation sites defined by a dataset from a functional imaging study.
Our paradigm consisted of three experimental sessions using electric shocks at three intensity levels: moderately painful (level 3), mildly painful (level 2), and entirely non-painful (level 1). This paradigm therefore allowed us to study how stimulus intensity is encoded in our network of interest and how the connectivity between regions is modulated across stimulation conditions. Our results favour a serial mode of processing of nociceptive somatosensory information, with a predominant thalamocortical input to the S1 contralateral to the stimulation site. They imply that information propagates from contralateral S1 through our network of interest, composed of the bilateral S1 cortices and S2. Our analysis indicates that the S1→S2 connection is strengthened by pain, suggesting that S2 sits higher than S1 in the pain-processing hierarchy, consistent with previous neurophysiological and magnetoencephalography findings. Finally, our analysis provides evidence that somatosensory information enters the hemisphere contralateral to the stimulated side, with interhemispheric connections responsible for transferring the information to the ipsilateral hemisphere.
Abstract:
Research report submitted to the Faculté des arts et des sciences for the degree of Master of Science in Economics.
Abstract:
Triple quadrupole mass spectrometers coupled with high performance liquid chromatography are workhorses in quantitative bioanalysis. They provide substantial benefits, including reproducibility, sensitivity and selectivity for trace analysis. Selected Reaction Monitoring allows targeted assay development, but the data sets it generates contain very limited information. Data mining and analysis of non-targeted high-resolution mass spectrometry profiles of biological samples offer the opportunity to perform more exhaustive assessments, including quantitative and qualitative analysis. The objectives of this study were to test method precision and accuracy, to statistically compare bupivacaine concentrations in real study samples, and to verify whether high-resolution accurate-mass data collected in scan mode actually permit retrospective data analysis, more specifically the extraction of metabolite-related information. The precision and accuracy data obtained with both instruments were equivalent. Overall, accuracy ranged from 106.2 to 113.2% and precision from 1.0 to 3.7%. A linear regression comparison between the two methods revealed a coefficient of determination (R²) of 0.9996 and a slope of 1.02, demonstrating a very strong correlation between them. Individual sample comparisons showed differences from -4.5% to 1.6%, well within the accepted analytical error. Moreover, post-acquisition extracted ion chromatograms at m/z 233.1648 ± 5 ppm (M-56) and m/z 305.2224 ± 5 ppm (M+16) revealed the presence of desbutyl-bupivacaine and three distinct hydroxylated bupivacaine metabolites. Post-acquisition analysis also allowed semiquantitative evaluation of the concentration-time profiles of the bupivacaine metabolites.
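The figures of merit in this abstract rest on a few standard calculations: a ± ppm extraction window around a target m/z, accuracy as the mean measured concentration relative to the nominal value, precision as a coefficient of variation, and a least-squares comparison between the two methods. The sketch below assumes those common definitions; the study's exact formulas are not given in the abstract, and the function names are hypothetical.

```python
from statistics import mean, stdev

def ppm_window(mz, ppm=5.0):
    """Lower and upper m/z bounds of a +/- ppm extraction window."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

def accuracy_precision(measured, nominal):
    """Assumed definitions: accuracy = mean / nominal (%),
    precision = sample standard deviation / mean (% CV)."""
    m = mean(measured)
    return 100.0 * m / nominal, 100.0 * stdev(measured) / m

def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)
```

For example, `ppm_window(233.1648)` gives bounds of about 233.16363 and 233.16597, a total width of roughly 0.0023 m/z units, which shows how narrow a 5 ppm window is at this mass.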