10 results for Knowledge Discovery Tools
in Helda - Digital Repository of University of Helsinki
Abstract:
Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-paced decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements that decision making sets for knowledge discovery and data mining tools and methods, and I study the resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally, I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated into the existing network operations infrastructure.
Abstract:
We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped into supernodes and superedges, respectively, to obtain a smaller graph. We propose models and algorithms for weighted graphs. The interpretation (i.e. decompression) of a compressed, weighted graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge is approximated to be the weight of the superedge. The compression problem now consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized. In this paper, we formulate this task as the 'simple weighted graph compression problem'. We then propose a much wider class of tasks under the name of 'generalized weighted graph compression problem'. The generalized task extends the optimization to preserve longer-range connectivities between nodes, not just individual edge weights. We study the properties of these problems and propose a range of algorithms to solve them, with different balances between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.
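The decompression rule described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the partition into supernodes is simply given by hand, the superedge weight is assumed to be the mean weight over all node pairs it covers, and the error is assumed to be the Euclidean distance between original and decompressed weights. The function names `compress`, `decompress` and `compression_error` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def _superkey(node_to_super, u, v):
    """Return the (ordered) pair of supernode ids that the node pair maps to."""
    a, b = node_to_super[u], node_to_super[v]
    return (a, b) if a <= b else (b, a)


def compress(weights, partition):
    """Group nodes into supernodes and derive superedge weights.

    weights:   dict mapping frozenset({u, v}) to an edge weight.
    partition: list of node sets, one per supernode (chosen by hand here).
    Assumption: a superedge weight is the mean over all covered node pairs,
    counting missing edges as weight 0.
    """
    node_to_super = {n: i for i, group in enumerate(partition) for n in group}
    totals, pairs = {}, {}
    for u, v in combinations(sorted(node_to_super), 2):
        key = _superkey(node_to_super, u, v)
        totals[key] = totals.get(key, 0.0) + weights.get(frozenset((u, v)), 0.0)
        pairs[key] = pairs.get(key, 0) + 1
    superedges = {k: totals[k] / pairs[k] for k in totals if totals[k] > 0}
    return node_to_super, superedges


def decompress(node_to_super, superedges):
    """Connect two nodes iff their supernodes are connected, with that weight."""
    approx = {}
    for u, v in combinations(sorted(node_to_super), 2):
        key = _superkey(node_to_super, u, v)
        if key in superedges:
            approx[frozenset((u, v))] = superedges[key]
    return approx


def compression_error(weights, approx, node_to_super):
    """Euclidean distance between original and decompressed edge weights."""
    total = 0.0
    for u, v in combinations(sorted(node_to_super), 2):
        pair = frozenset((u, v))
        total += (weights.get(pair, 0.0) - approx.get(pair, 0.0)) ** 2
    return sqrt(total)


# Toy graph: a and b both link to c with different weights; c links to d.
graph = {frozenset("ac"): 1.0, frozenset("bc"): 0.5, frozenset("cd"): 0.25}
node_to_super, superedges = compress(graph, [{"a", "b"}, {"c"}, {"d"}])
approx = decompress(node_to_super, superedges)
err = compression_error(graph, approx, node_to_super)
```

Merging `a` and `b` replaces two edges with one superedge, at the cost of approximating both original weights by their mean. Choosing which nodes to merge so that this cost stays small is the optimization problem the abstract refers to, and it is not attempted here.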
Abstract:
During the past ten years, large-scale transcript analysis using microarrays has become a powerful tool to identify and predict functions for new genes. It allows simultaneous monitoring of the expression of thousands of genes and has become a routinely used tool in laboratories worldwide. Microarray analysis will, together with other functional genomics tools, take us closer to understanding the functions of all genes in genomes of living organisms. Flower development is a genetically regulated process which has mostly been studied in the traditional model species Arabidopsis thaliana, Antirrhinum majus and Petunia hybrida. The molecular mechanisms behind flower development in these species are partly applicable to other plant systems. However, not all biological phenomena can be approached with just a few model systems. In order to understand and apply the knowledge to ecologically and economically important plants, other species also need to be studied. Sequencing of 17 000 ESTs from nine different cDNA libraries of the ornamental plant Gerbera hybrida made it possible to construct a cDNA microarray with 9000 probes. The probes of the microarray represent all different ESTs in the database. Of the gerbera ESTs, 20% were unique to gerbera, while 373 were specific to the Asteraceae family of flowering plants. Gerbera has composite inflorescences with three different types of flowers that vary from each other morphologically. The marginal ray flowers are large, often pigmented and female, while the central disc flowers are smaller and more radially symmetrical perfect flowers. Intermediate trans flowers are similar to ray flowers but smaller in size. This feature, together with the molecular tools applied to gerbera, makes gerbera a unique system in comparison to the common model plants with only a single kind of flower in their inflorescence.
In the first part of this thesis, conditions for gerbera microarray analysis were optimised, including experimental design, sample preparation and hybridization, as well as data analysis and verification. Moreover, in the first study, the flower and flower organ-specific genes were identified. After the reliability and reproducibility of the method were confirmed, the microarrays were utilized to investigate transcriptional differences between ray and disc flowers. This study revealed novel information about the morphological development as well as the transcriptional regulation of early stages of development in various flower types of gerbera. The most interesting finding was differential expression of MADS-box genes, suggesting the existence of flower type-specific regulatory complexes in the specification of different types of flowers. The gerbera microarray was further used to profile changes in expression during petal development. Gerbera ray flower petals are large, which makes them an ideal model to study organogenesis. Six different stages were compared and specifically analysed. Expression profiles of genes related to cell structure and growth implied that during stage two, cells divide, a process which is marked by expression of histones, cyclins and tubulins. Stage 4 was found to be a transition stage between cell division and expansion, and by stage 6 cells had stopped division and instead underwent expansion. Interestingly, at the last analysed stage, stage 9, when cells did not grow any more, the highest number of upregulated genes was detected. The gerbera microarray is a fully functioning tool for large-scale studies of flower development, and correlation with real-time RT-PCR results shows that it is also highly sensitive and reliable. Gene expression data presented here will be a source for gene expression mining or marker gene discovery in future studies that will be performed in the Gerbera Laboratory.
The publicly available data will also serve the plant research community world-wide.
Abstract:
Mutation and recombination are the fundamental processes leading to genetic variation in natural populations. This variation forms the raw material for evolution through natural selection and drift. Therefore, studying mutation rates may reveal information about evolutionary histories as well as phylogenetic interrelationships of organisms. In this thesis, two molecular tools, DNA barcoding and the molecular clock, were examined. In the first part, the efficiency of mutations to delineate closely related species was tested and the implications for conservation practices were assessed. The second part investigated the proposition that a constant mutation rate exists within invertebrates, in the form of a metabolic-rate dependent molecular clock, which can be applied to accurately date speciation events. DNA barcoding aspires to be an efficient technique to not only distinguish between species but also reveal population-level variation, relying solely on mutations found on a short stretch of a single gene. In this thesis, barcoding was applied to discriminate between Hylochares populations from Russian Karelia and new Hylochares findings from the greater Helsinki region in Finland. Although barcoding failed to delineate the two reproductively isolated groups, their distinct morphological features and differing life-history traits led to their classification as two closely related, although separate, species. The lack of genetic differentiation appears to be due to a recent divergence event not yet reflected in the beetles' molecular make-up. Thus, the Russian Hylochares was described as a new species. The Finnish species, previously considered locally extinct, was recognized as endangered. Even if, due to their identical genetic make-up, the populations had been regarded as conspecific, conservation strategies based on prior knowledge from Russia would not have guaranteed the survival of the Finnish beetle.
Therefore, new conservation actions based on detailed studies of the biology and life-history of the Finnish Hylochares were conducted to protect this endemic rarity in Finland. The idea behind the strict molecular clock is that mutation rates are constant over evolutionary time and may thus be used to infer species divergence dates. However, one of the most recent theories argues that a strict clock does not tick per unit of time but that it has a constant substitution rate per unit of mass-specific metabolic energy. Therefore, according to this hypothesis, molecular clocks have to be recalibrated taking body size and temperature into account. This thesis tested the temperature effect on mutation rates in equally sized invertebrates. For the first dataset (family Eucnemidae, Coleoptera), the phylogenetic interrelationships and evolutionary history of the genus Arrhipis had to be inferred before the influence of temperature on substitution rates could be studied. Further, a second, larger invertebrate dataset (family Syrphidae, Diptera) was employed. Several methodological approaches, a number of genes and multiple molecular clock models revealed that there was no consistent relationship between temperature and mutation rate for the taxa under study. Thus, the body size effect, observed in vertebrates but controversial for invertebrates, rather than temperature may be the underlying driving force behind the metabolic-rate dependent molecular clock. Therefore, the metabolic-rate dependent molecular clock does not hold for the invertebrate groups studied here. This thesis emphasizes that molecular techniques relying on mutation rates have to be applied with caution. Whereas they may work satisfactorily under certain conditions for specific taxa, they may fail for others. The molecular clock as well as DNA barcoding should incorporate all the information and data available to obtain comprehensive estimations of the existing biodiversity and its evolutionary history.
Abstract:
The systemic autoinflammatory disorders are a group of rare diseases characterized by periodically recurring episodes of acute inflammation and a rise in serum acute phase proteins, but with no signs of autoimmunity. At present eight hereditary syndromes are categorized as autoinflammatory, although the definition has also occasionally been extended to other inflammatory disorders, such as Crohn's disease. One of the autoinflammatory disorders is the autosomally dominantly inherited tumour necrosis factor receptor-associated periodic syndrome (TRAPS), which is caused by mutations in the gene encoding the tumour necrosis factor type 1 receptor (TNFRSF1A). In patients of Nordic descent, cases of TRAPS and of three other hereditary fevers, hyperimmunoglobulinemia D with periodic fever syndrome (HIDS), chronic infantile neurologic, cutaneous and articular syndrome (CINCA) and familial cold autoinflammatory syndrome (FCAS), have been reported, TRAPS being the most common of the four. Clinical characteristics of TRAPS are recurrent attacks of high spiking fever, associated with inflammation of serosal membranes and joints, myalgia, migratory rash and conjunctivitis or periorbital cellulitis. Systemic AA amyloidosis may occur as a sequel of the systemic inflammation. The aim of this study was to investigate the genetic background of hereditary periodically occurring fever syndromes in Finnish patients, to explore the reliability of determining serum concentrations of soluble TNFRSF1A and metalloproteinase-induced TNFRSF1A shedding as helpful tools in differential diagnostics, as well as to study intracellular NF-κB signalling in an attempt to widen the knowledge of the pathomechanisms underlying TRAPS. Genomic sequencing revealed two novel TNFRSF1A mutations, F112I and C73R, in two Finnish families.
F112I was the first TNFRSF1A mutation to be reported in the third extracellular cysteine-rich domain of the gene, and C73R was the third novel mutation to be reported in a Finnish family, with only one other TNFRSF1A mutation having been reported in the Nordic countries. We also presented a differential diagnostic problem in a TRAPS patient, emphasizing for the clinician the importance of differential diagnostic vigilance in dealing with rare hereditary disorders. The underlying genetic disease of the patient served as a misleading factor, which possibly postponed arrival at the correct diagnosis, and may also have predisposed the patient to the pathologic condition that led to a critical state. Using a method of flow cytometric analysis modified for use on fresh whole blood, we studied intracellular signalling pathways in three Finnish TRAPS families with the F112I, C73R and the previously reported C88Y mutations. Evaluation of TNF-induced phosphorylation of NF-κB and p38 revealed low phosphorylation profiles in nine out of ten TRAPS patients in comparison to healthy control subjects. This study shows that TRAPS is a diagnostic possibility in patients of Nordic descent with symptoms of periodically recurring fever and inflammation of the serosa and joints. In particular, in the case of a family history of febrile episodes, the possibility of TRAPS should be considered if an etiology of autoimmune or infectious nature is excluded. The discovery of three different mutations in a population as small as the Finnish reinforces the notion that the extracellular domain of TNFRSF1A is prone to be mutated along the entire stretch of its cysteine-rich domains and not only at a limited number of sites, suggesting the absence of a founder effect in TRAPS.
This study also demonstrates the challenges of clinical work in differentiating the symptoms of rare genetic disorders from those of other pathologic conditions and presents the possibility of an autoinflammatory disorder as being the underlying cause of severe clinical complications. Furthermore, functional studies of fresh blood leukocytes show that TRAPS is often associated with a low NF-κB and p38 phosphorylation profile, although low phosphorylation levels are not a requirement for the development of TRAPS. The aberrant signalling would suggest that the hyperinflammatory phenotype of TRAPS is the result of compensatory NF-κB-mediated regulatory mechanisms triggered by a deficiency of the innate immune response.
Abstract:
This study investigates the role of social media as a form of organizational knowledge sharing. Social media is investigated in terms of the Web 2.0 technologies that organizations provide their employees as tools of internal communication. This study is anchored in the theoretical understanding of social media as technologies which enable both knowledge collection and knowledge donation. This study investigates the factors influencing employees’ use of social media in their working environment. The study presents the multidisciplinary research tradition concerning knowledge sharing. Social media is analyzed especially in relation to internal communication and knowledge sharing. Based on previous studies, it is assumed that personal, organizational, and technological factors influence employees’ use of social media in their working environment. The research is a case study focusing on the employees of the Finnish company Wärtsilä. Wärtsilä is a suitable case organization for this study, given that it uses several Web 2.0 tools in its intranet. The research is based on quantitative methods. In total, 343 answers were obtained with the aid of an online survey which was available on Wärtsilä’s intranet. The associations between the variables are analyzed with the aid of correlations. Finally, with the aid of multiple linear regression analysis, the causality between the assumed factors and the use of social media is tested. The analysis demonstrates that personal, organizational and technological factors influence the respondents’ use of social media. The benefits that respondents expect to receive from using social media and respondents’ experience in using Web 2.0 in their private lives emerge as strong predictive variables. Organizational factors, such as managers’ and colleagues’ activeness and organizational guidelines for using social media, also form a causal relationship with the use of social media.
In addition, respondents’ understanding of their responsibilities affects their use of social media. The more social media is considered a part of individual responsibilities, the more frequently it is used. Finally, technological factors must be recognized. The more user-friendly the social media tools are considered to be and the better the respondents’ technical skills, the more frequently social media is used in the working environment. The central references in relation to knowledge sharing include Chun Wei Choo’s (2006) work Knowing Organization, Ikujiro Nonaka and Hirotaka Takeuchi’s (1995) work The Knowledge Creating Company and Linda Argote’s (1999) work Organizational Learning.
Abstract:
Physics teachers are in a key position to form the attitudes and conceptions of future generations toward science and technology, as well as to educate future generations of scientists. Therefore, good teacher education is one of the key areas of physics departments' education programs. This dissertation is a contribution to the research-based development of high quality physics teacher education, designed to meet three central challenges of good teaching. The first challenge relates to the organization of physics content knowledge. The second challenge, connected to the first one, is to understand the role of experiments and models in (re)constructing the content knowledge of physics for purposes of teaching. The third challenge is to provide pre-service physics teachers with opportunities and resources for reflecting on or assessing their knowledge and experience about physics and physics education. This dissertation demonstrates how these challenges can be met when the content knowledge of physics, the relevant epistemological aspects of physics and the pedagogical knowledge of teaching and learning physics are combined. The theoretical part of this dissertation is concerned with designing two didactical reconstructions for purposes of physics teacher education: the didactical reconstruction of processes (DRoP) and the didactical reconstruction of structures (DRoS). This part starts with taking into account the required professional competencies of physics teachers, the pedagogical aspects of teaching and learning, and the benefits of the graphical ways of representing knowledge. Then it continues with the conceptual and philosophical analysis of physics, especially with the analysis of the role of experiments and models in constructing knowledge. This analysis is condensed in the form of the epistemological reconstruction of knowledge justification. Finally, these two parts are combined in the designing and production of the DRoP and DRoS.
The DRoP captures the knowledge formation of physical concepts and laws in concise and simplified form while still retaining authenticity from the processes of how concepts have been formed. The DRoS is used for representing the structural knowledge of physics, the connections between physical concepts, quantities and laws, to varying extents. Both DRoP and DRoS are represented in graphical form by means of flow charts consisting of nodes and directed links connecting the nodes. The empirical part discusses two case studies that show how the three challenges are met through the use of DRoP and DRoS and how the outcomes of teaching solutions based on them are evaluated. The research approach is qualitative; it aims at an in-depth evaluation and understanding of the usefulness of the didactical reconstructions. The data, which were collected from the advanced course for prospective physics teachers during 2001-2006, consisted of DRoP and DRoS flow charts made by students and student interviews. The first case study discusses how student teachers used DRoP flow charts to understand the process of forming knowledge about the law of electromagnetic induction. The second case study discusses how student teachers learned to understand the development of physical quantities as related to the temperature concept by using DRoS flow charts. In both studies, the attention is focused on the use of DRoP and DRoS to organize knowledge and on the role of experiments and models in this organization process. The results show that students' understanding of physics knowledge production improved and their knowledge became more organized and coherent. It is shown that the flow charts and the didactical reconstructions behind them had an important role in gaining these positive learning results.
On the basis of the results reported here, the designed learning tools have been adopted as a standard part of the teaching solutions used in the physics teacher education courses in the Department of Physics, University of Helsinki.
Abstract:
Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetic data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider the prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is on genetics, the concepts and algorithms can be applied to other domains as well.
We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
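To make the idea of a connection between two vertices in a random graph more concrete, the following sketch estimates how likely two vertices are to be connected. It does not reproduce the thesis's extraction algorithms. It assumes that each edge exists independently with a given probability and that the strength of a connection is the probability that a path between the two vertices exists (two-terminal reliability), estimated here by plain Monte Carlo sampling. The names `is_connected` and `estimate_reliability` are hypothetical.

```python
import random


def is_connected(edges, source, target):
    """Depth-first search: is there a path from source to target?"""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def estimate_reliability(edge_probs, source, target, samples=20000, seed=0):
    """Monte Carlo estimate of the probability that source reaches target.

    edge_probs: dict mapping (u, v) to the probability that the edge exists.
    Assumption: edges are sampled independently of each other.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(samples):
        realised = [e for e, p in edge_probs.items() if rng.random() < p]
        hits += is_connected(realised, source, target)
    return hits / samples


# Two independent routes from s to t: a strong one via a, a weak one via b.
random_graph = {
    ("s", "a"): 0.9, ("a", "t"): 0.9,
    ("s", "b"): 0.5, ("b", "t"): 0.5,
}
estimate = estimate_reliability(random_graph, "s", "t")
# Exact value for comparison: 1 - (1 - 0.9 * 0.9) * (1 - 0.5 * 0.5) = 0.8575
```

In this toy graph, dropping the weak route via `b` would leave a subgraph with reliability 0.81, which shows how a smaller subgraph can retain most of the connection strength. How to choose such a subgraph in a large graph is the extraction problem the abstract describes, and it is not addressed by this sketch.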