906 results for Content analysis (Communication) -- Data processing
Abstract:
In an investigation intended to determine training needs of night crews, Bowers et al. (1998, this issue) report two studies showing that the patterning of communication is a better discriminator of good and poor crews than is the content of communication. Bowers et al. characterize their studies as intended to generate hypotheses for training needs and draw connections with Exploratory Sequential Data Analysis (ESDA). Although applauding the intentions of Bowers et al., we point out some concerns with their characterization and implementation of ESDA. Our principal concern is that the Bowers et al. exploration of the data does not convincingly lead them back to a better fundamental understanding of the original phenomena they are investigating.
Abstract:
We describe a novel approach to explore DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article starts by analyzing chromosomal data through histograms using fixed length DNA sequences. After creating the DNA-related histograms, a correlation between pairs of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and tabular/graphical output generation. A set of 18 species is processed and the extensive results reveal that the proposed method is able to generate significant and diversified outputs, in good accordance with current scientific knowledge in domains such as genomics and phylogenetics.
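The histogram-and-correlation pipeline described above can be illustrated with a minimal sketch. It assumes overlapping fixed-length words over the alphabet ACGT and Pearson correlation between histograms; the function names, the word length `k=2`, and the toy sequences are hypothetical and are not taken from the article.

```python
from itertools import product
from math import sqrt


def kmer_histogram(seq, k=2):
    """Count overlapping length-k words over ACGT, in a fixed word order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    return [counts[w] for w in words]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_matrix(sequences, k=2):
    """Pairwise correlation between the k-mer histograms of all sequences."""
    hists = [kmer_histogram(s, k) for s in sequences]
    return [[pearson(a, b) for b in hists] for a in hists]


# Toy sequences standing in for chromosomes (assumed data).
toy = ["ACGTACGTACGT", "ACGTACGAACGT", "TTTTGGGGCCCC"]
matrix = correlation_matrix(toy)
```

The resulting symmetric matrix could then feed a clustering or visualization step, which is where the article's downstream processing methods would come in.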
Abstract:
The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixing of components originated by the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], the spectral signature matching [21], the spectral angle mapper [22], and the subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures. 
As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other works using maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists in finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed of the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) the number of samples is limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which does not hold for hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance.
IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations are in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. Minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collect spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, including unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR).
Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performances. The study considers simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications, namely signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm.
This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with a mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief summary of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.
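To make the abundance constraints concrete, the following is a minimal sketch of the linear mixing model with Dirichlet-distributed abundances. It is not the chapter's method: it uses a single Dirichlet distribution rather than a mixture, and the endmember signatures, the noise level, and all function names are hypothetical. It only shows why such abundances are nonnegative and sum to one, and hence are not mutually independent.

```python
import random


def dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mix(signatures, abundances, noise_std, rng):
    """Linear mixing: each band is the abundance-weighted sum of the
    endmember signatures plus zero-mean Gaussian noise."""
    bands = len(signatures[0])
    return [
        sum(a * sig[b] for a, sig in zip(abundances, signatures))
        + rng.gauss(0.0, noise_std)
        for b in range(bands)
    ]


rng = random.Random(0)

# Three assumed endmember spectra over four bands (made-up reflectances).
signatures = [
    [0.9, 0.7, 0.2, 0.1],
    [0.1, 0.3, 0.8, 0.9],
    [0.5, 0.5, 0.5, 0.4],
]

abund = dirichlet([1.0, 1.0, 1.0], rng)       # positivity and full additivity
pixel = mix(signatures, abund, 0.01, rng)     # one noisy mixed pixel
```

Because the abundances always sum to one, knowing all but one of them determines the last, which is the dependence that the abstract identifies as the obstacle for ICA and IFA.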
Abstract:
Dissertation submitted for the degree of Doctor in Education Sciences, specialization in Technologies, Networks and Multimedia in Education and Training
Abstract:
In view of the major social and environmental problems we face today, a certain absence of values can be observed in society, where humans draw far more resources than nature can replace in the short or medium term. Within fashion, ethical fashion has emerged as a movement intended to change this paradigm. Ethical fashion encompasses different concepts such as fair trade, sustainability, working conditions, raw materials, social responsibility and the protection of animals. This study aims to determine what type of communication fashion brands are using in this context, and whether this communication aims at educating consumers towards more ethical consumer behavior. For this study, 44 fashion brands associated with the Ethical Trade Initiative were selected. The research method was content analysis, which began with collecting the information provided on the websites and social networks of the selected fashion brands. The data were analyzed taking into account the quality and type of information published on ethical fashion, for which an ordinal scale was created as a way of measuring and comparing results.
Abstract:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques such as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as happens with other omics data, the analysis of metabolomics datasets poses multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, from the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
Abstract:
BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
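As an illustration of coding ambiguous bases with IUPAC symbols, here is a small sketch. It is not the authors' algorithm: it assumes that per-base probabilities are already available and keeps every base whose probability reaches a hypothetical threshold. Only the IUPAC code table itself is standard.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_call(probs, threshold=0.25):
    """Return the IUPAC symbol covering every base whose probability is
    at least `threshold` (an assumed cut-off). Falls back to N if no base
    qualifies."""
    kept = frozenset(base for base, p in probs.items() if p >= threshold)
    return IUPAC.get(kept, "N")


def call_read(per_position_probs, threshold=0.25):
    """Call a whole read from a list of per-position probability dicts."""
    return "".join(iupac_call(p, threshold) for p in per_position_probs)


# Hypothetical probabilities for a three-base read.
read = call_read([
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},   # confident A
    {"A": 0.50, "C": 0.01, "G": 0.48, "T": 0.01},   # A or G -> R
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},   # no information -> N
])
```

A read called this way keeps partially informative positions instead of being discarded outright, which is the motivation stated in the abstract.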
Abstract:
DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.
Abstract:
The aim of this work was to study, by means of content and discourse analysis, how companies communicate customer references on their websites. The work focused on the themes and discourses of the companies' reference descriptions, and on how the reference relationship is discursively constructed in those descriptions. Three Finnish ICT companies were selected for the study: Nokia, TietoEnator and F-Secure. The data consist of 140 reference descriptions collected from the companies' websites. The content analysis showed that reference descriptions concentrate on describing individual product or project deliveries to reference customers in the light of those customer relationships. The analysis identified three discourses: a benefit discourse, a commitment discourse and a discourse of technological expertise. The discourses reveal the rhetorical devices of the reference descriptions and construct the reference relationship and the supplier's subject position from different perspectives. The main emphasis in the reference descriptions is on the benefits brought by the supplier's solution. The discourses produce an image of the reference relationship as a beneficial and close customer relationship that offers a channel to external capabilities and technologies. Depending on the discourse, the supplier is presented in the reference descriptions as a bringer of benefits, a reliable partner or an experienced expert. The reference customer, by contrast, is presented from only one perspective, stereotypically as an important and satisfied customer.
Abstract:
The objective of the thesis was to explore the nature and characteristics of customer-related internal communication in a global industrial matrix organization during a specific customer relationship, and how it could be improved. The theoretical part of the study reviews the concepts of intra-organizational information and knowledge sharing. It also reviews the influence of internal communication on customer relationships, the problems involved, and the suggestions for improving internal communication found in the literature. The empirical part of the study used Content Analysis and Social Network Analysis as research methods. The data were collected through interviews and a questionnaire. Internal communication was observed first generally within the organization from the point of view of a certain business, and secondly, during a specific customer relationship at personal level and at departmental level. The results of the study describe the nature and characteristics of internal communication in the organization. The results give 13 suggestions for improving internal communication in the organization. Although the study was carried out in one specific organization, it also offers insights for other organizations as well as managers to improve their internal communication.
TransPromo communication as a part of company’s marketing communications when the e-bill is a medium
Abstract:
The goal of this thesis was to research what TransPromo is, why companies want to implement TransPromo communication, and what the elements of effective TransPromo communication are. Furthermore, the goal was to develop a TransPromo communication strategy and a normative model for TeliaSonera Finland, which depicts the elements of effective TransPromo communication when the electronic bill is a medium. Abductive reasoning was utilized in this thesis, which means that empirical and theoretical worlds alternate in the researcher’s reasoning process. This thesis did not rely on any specific theory, nor did it utilize any previous theoretical model. However, certain theoretical connections existed, so this thesis cannot be considered purely inductive. The empirical part of this thesis was conducted by examining secondary industry data and by conducting specialist interviews at TeliaSonera Finland and Strålfors. Grounded Theory approach was utilized in the analysis of the interview data and content analysis was used in the analysis of secondary industry data. This thesis increases knowledge in the area of TransPromo communication, and provides one definition of TransPromo communication. As a result of this thesis, a TransPromo communication strategy and a normative model for TeliaSonera Finland were built. The model depicts the elements of effective TransPromo communication when the e-bill is a medium. The TransPromo communication objective is to utilize transaction documents, such as bills, in order to deliver targeted and personalized marketing messages to current customers. The aim is to strengthen the customer relationship, and to enforce up-sell and cross-sell opportunities and cost savings.
Abstract:
This paper explores behavioral patterns of web users on an online magazine website. The goal of the study is first to find and visualize user paths within the data generated during collection, and then to identify some generic typologies of user behavior. To form a theoretical foundation for processing data and identifying behavioral archetypes, the study relies on established consumer behavior literature to propose typologies of behavior. For data processing, the study utilizes methodologies of applied cluster analysis and sequential path analysis. The study uses a dataset of clickstream data generated from the real-life clicks of 250 randomly selected website visitors over a period of six weeks. Based on the data collected, an exploratory method is followed in order to find and visualize generally occurring paths of users on the website. Six distinct behavioral typologies were recognized, with the dominant user consuming mainly blog content, as opposed to editorial content. Most importantly, it was observed that approximately 80% of clicks were of the blog content category, meaning that the majority of web traffic occurring in the site takes place in content other than the desired editorial content pages. The outcome of the study is a set of managerial recommendations for each identified behavioral archetype.
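Two of the quantities mentioned above, the share of clicks per content category and the frequently occurring paths, can be sketched in a few lines. This is only an illustration on made-up sessions. The category labels, session data and function names are assumptions, and the study's actual cluster analysis is not reproduced here.

```python
from collections import Counter


def category_share(sessions):
    """Fraction of all clicks that fall in each content category."""
    clicks = Counter(page for session in sessions for page in session)
    total = sum(clicks.values())
    return {category: n / total for category, n in clicks.items()}


def transition_counts(sessions):
    """Count how often one category is followed directly by another
    within a session (first-order path steps)."""
    return Counter(
        (src, dst)
        for session in sessions
        for src, dst in zip(session, session[1:])
    )


# Hypothetical sessions: each list is the ordered categories one visitor clicked.
sessions = [
    ["home", "blog", "blog", "editorial"],
    ["blog", "blog", "blog"],
    ["home", "editorial"],
]

share = category_share(sessions)
steps = transition_counts(sessions)
most_common_step = steps.most_common(1)[0]
```

Per-visitor vectors of such shares or transition counts are one plausible input for a clustering step that groups visitors into behavioral types.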
Abstract:
Analysis by reduction is a linguistically motivated method for checking correctness of a sentence. It can be modelled by restarting automata. In this paper we propose a method for learning restarting automata which are strictly locally testable (SLT-R-automata). The method is based on the concept of identification in the limit from positive examples only. Also we characterize the class of languages accepted by SLT-R-automata with respect to the Chomsky hierarchy.
Abstract:
Conceptual Graphs and Formal Concept Analysis have in common basic concerns: the focus on conceptual structures, the use of diagrams for supporting communication, the orientation by Peirce's Pragmatism, and the aim of representing and processing knowledge. These concerns open rich possibilities of interplay and integration. We discuss the philosophical foundations of both disciplines, and analyze their specific qualities. Based on this analysis, we discuss some possible approaches of interplay and integration.
Abstract:
In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin.
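For readers unfamiliar with Formal Concept Analysis, the sketch below enumerates the formal concepts of a tiny context by brute force. It is a textbook-style illustration and not part of TOSCANA or of the paper's database marketing application. The toy context and all function names are hypothetical, and the exhaustive enumeration is only practical for very small contexts.

```python
from itertools import combinations


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """All (extent, intent) pairs closed under the two derivation operators,
    found by closing every subset of objects."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Toy context (assumed data): object -> set of attributes.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}

concepts = formal_concepts(context)
```

Ordering these concepts by inclusion of their extents gives the concept lattice, the kind of diagram that conceptual information systems present to the analyst.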