9 resultados para applicazione, business analysis, data mining, Facebook, PRIN, relazioni sociali, social network
em Helda - Digital Repository of University of Helsinki
Resumo:
Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices from all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-pace decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements decision-making sets for knowledge discovery and data mining tools and methods, and I study resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated to the existing network operations infrastructure.
Resumo:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
Resumo:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
Resumo:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we consider also a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user s own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
Resumo:
Helicobacter pylori infection is a risk factor for gastric cancer, which is a major health issue worldwide. Gastric cancer has a poor prognosis due to the unnoticeable progression of the disease and surgery is the only available treatment in gastric cancer. Therefore, gastric cancer patients would greatly benefit from identifying biomarker genes that would improve diagnostic and prognostic prediction and provide targets for molecular therapies. DNA copy number amplifications are the hallmarks of cancers in various anatomical locations. Mechanisms of amplification predict that DNA double-strand breaks occur at the margins of the amplified region. The first objective of this thesis was to identify the genes that were differentially expressed in H. pylori infection as well as the transcription factors and signal transduction pathways that were associated with the gene expression changes. The second objective was to identify putative biomarker genes in gastric cancer with correlated expression and copy number, and the last objective was to characterize cancers based on DNA copy number amplifications. DNA microarrays, an in vitro model and real-time polymerase chain reaction were used to measure gene expression changes in H. pylori infected AGS cells. In order to identify the transcription factors and signal transduction pathways that were activated after H. pylori infection, gene expression profiling data from the H. pylori experiments and a bioinformatics approach accompanied by experimental validation were used. Genome-wide expression and copy number microarray analysis of clinical gastric cancer samples and immunohistochemistry on tissue microarray were used to identify putative gastric cancer genes. Data mining and machine learning techniques were applied to study amplifications in a cross-section of cancers. FOS and various stress response genes were regulated by H. pylori infection. H. pylori regulated genes were enriched in the chromosomal regions that are frequently changed in gastric cancer, suggesting that molecular pathways of gastric cancer and premalignant H. pylori infection that induces gastritis are interconnected. 16 transcription factors were identified as being associated with H. pylori infection induced changes in gene expression. NF-κB transcription factor and p50 and p65 subunits were verified using elecrophoretic mobility shift assays. ERBB2 and other genes located in 17q12- q21 were found to be up-regulated in association with copy number amplification in gastric cancer. Cancers with similar cell type and origin clustered together based on the genomic localization of the amplifications. Cancer genes and large genes were co-localized with amplified regions and fragile sites, telomeres, centromeres and light chromosome bands were enriched at the amplification boundaries. H. pylori activated transcription factors and signal transduction pathways function in cellular mechanisms that might be capable of promoting carcinogenesis of the stomach. Intestinal and diffuse type gastric cancers showed distinct molecular genetic profiles. Integration of gene expression and copy number microarray data allowed the identification of genes that might be involved in gastric carcinogenesis and have clinical relevance. Gene amplifications were demonstrated to be non-random genomic instabilities. Cell lineage, properties of precursor stem cells, tissue microenvironment and genomic map localization of specific oncogenes define the site specificity of DNA amplifications, whereas labile genomic features define the structures of amplicons. These conclusions suggest that the definition of genomic changes in cancer is based on the interplay between the cancer cell and the tumor microenvironment.
Resumo:
In the markets-as-networks approach business networks are conceived as dynamic actor structures, giving focus to exchange relationships and actors’ capabilities to control and co-ordinate activities and resources. Researchers have shared an understanding that actors’ actions are crucial for the development of business networks and for network dynamics. However, researchers have mainly studied firms as business actors and excluded individuals, although both firms and individuals can be seen as business actors. This focus on firms as business actors has resulted in a paucity of research on human action and the exchange of intangible resources in business networks, e.g. social exchange between individuals in social networks. Consequently, the current conception of business networks fails to appreciate the richness of business actors, the human character of business action and the import of social action in business networks. The central assumption in this study is that business actors are multidimensional and that their specific constitution in any given situation is determined by human interaction in social networks. Multidimensionality is presented as a concept for exploring how business actors act in different situations and how actors simultaneously manage multiple identities: individual, organisational, professional, business and network identities. The study presents a model that describes the multidimensionality of actors in business networks and conceptualises the connection between social exchange and human action in business networks. Empirically the study explores the change that has taken place in pharmaceutical retailing in Finland during recent years. The phenomenon of emerging pharmacy networks is highly contemporary in the Nordic countries, where the traditional license-based pharmacy business is changing. The study analyses the development of two Finnish pharmacy chains, one integrated and one voluntary chain, and the network structures and dynamics in them. Social Network Analysis is applied to explore the social structures within the pharmacy networks. The study shows that emerging pharmacy networks are multifaceted phenomena where political, economic, social, cultural, and historical elements together contribute to the observed changes. Individuals have always been strongly present in the pharmacy business and the development of pharmacy networks provides an interesting example of human actors’ influence in the development of business networks. The dynamics or forces driving the network development can be linked to actors’ own economic and social motives for developing the business. The study highlights the central role of individuals and social networks in the development of the two studied pharmacy networks. The relation between individuals and social networks is reciprocal. The social context of every individual enables multidimensional business actors. The mix of various identities, both individual and collective identities, is an important part of network dynamics. Social networks in pharmacy networks create a platform for exchange and social action, and social networks enable and support business network development.
Resumo:
The new paradigm of connectedness and empowerment brought by the interactivity feature of the Web 2.0 has been challenging the traditional centralized performance of mainstream media. The corporation has been able to survive the strong winds by transforming itself into a global multimedia business network embedded in the network society. By establishing networks, e.g. networks of production and distribution, the global multimedia business network has been able to sight potential solutions by opening the doors to innovation in a decentralized and flexible manner. Under this emerging context of re-organization, traditional practices like sourcing need to be re- explained and that is precisely what this thesis attempts to tackle. Based on ICT and on the network society, the study seeks to explain within the Finnish context the particular case of Helsingin Sanomat (HS) and its relations with the youth news agency, Youth Voice Editorial Board (NÄT). In that sense, the study can be regarded as an explanatory embedded single case study, where HS is the principal unit of analysis and NÄT its embedded unit of analysis. The thesis was able to reach explanations through interrelated steps. First, it determined the role of ICT in HS’s sourcing practices. Then it mapped an overview of the HS’s sourcing relations and provided a context in which NÄT was located. And finally, it established conceptualized institutional relational data between HS and NÄT for their posterior measurement through social network analysis. The data set was collected via qualitative interviews addressed to online and offline editors of HS as well as interviews addressed to NÄT’s personnel. The study concluded that ICT’s interactivity and User Generated Content (UGC) are not sourcing tools as such but mechanism used by HS for getting ideas that could turn into potential news stories. However, when it comes to visual communication, some exemptions were found. The lack of official sources amidst the immediacy leads HS to rely on ICT’s interaction and UGC. More than meets the eye, ICT’s input into the sourcing practice may be more noticeable if the interaction and UGC is well organized and coordinated into proper and innovative networks of alternative content collaboration. Currently, HS performs this sourcing practice via two projects that differ, precisely, by the mode they are coordinated. The first project found, Omakaupunki, is coordinated internally by Sanoma Group’s owned media houses HS, Vartti and Metro. The second project found is coordinated externally. The external alternative sourcing network, as it was labeled, consists of three actors, namely HS, NÄT (professionals in charge) and the youth. This network is a balanced and complete triad in which the actors connect themselves in relations of feedback, recognition, creativity and filtering. However, as innovation is approached very reluctantly, this content collaboration is a laboratory of experiments; a ‘COLLABORATORY’.