9 results for Annotation

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

20.00%

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

20.00%

Abstract:

The Baltic Sea is a unique environment that harbors genetically unique populations. Studying these populations at the genetic level requires basic molecular research. The aim of this thesis was to provide a basic genetic resource for population genomic studies by de novo assembling a transcriptome for the Baltic Sea isopod Idotea balthica. RNA was extracted from a single whole adult male isopod and sequenced using Illumina RNA-Seq (125 bp paired-end reads). The reads were preprocessed using FASTQC for quality control, TRIMMOMATIC for trimming, and RCORRECTOR for error correction. The preprocessed reads were then assembled with TRINITY, a de Bruijn graph-based assembler, using different k-mer sizes. The resulting assemblies were combined and clustered using CD-HIT. The assemblies were evaluated using TRANSRATE for quality and filtering, BUSCO for completeness, and TRANSDECODER for annotation potential. The 25-mer assembly was annotated using PANNZER (protein annotation with z-score) and BLASTX. The 25-mer assembly is the best first-draft assembly, since it contains the most information. However, it shows high levels of polymorphism that currently cannot be resolved into paralogs or allelic variants. Furthermore, the assembly is incomplete; this could be improved by sampling additional developmental stages.
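TRINITY's actual assembly procedure is far more involved, but the core de Bruijn graph idea, and why k-mer size matters, can be illustrated with a small sketch. Here each k-mer in a read adds an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a contig is extended while the path is unambiguous. The function names `build_de_bruijn` and `walk_unambiguous` and the toy reads are hypothetical; this is not TRINITY's implementation.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: every k-mer in a read adds an edge
    from its (k-1)-mer prefix node to its (k-1)-mer suffix node."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one
    successor (a simplified stand-in for contig extension)."""
    contig = start
    node = start
    seen = {start}
    # .get() avoids inserting new keys into the defaultdict
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop if the path loops back on itself
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGT", "GGCGTGC"]
print(walk_unambiguous(build_de_bruijn(reads, 4), "ATG"))  # ATGGCGTGC
print(walk_unambiguous(build_de_bruijn(reads, 3), "AT"))   # ATG
```

With k = 4 the two overlapping reads merge into one contig, whereas with k = 3 the repeated 2-mer `TG` creates a branch and the walk stops early. This toy case hints at why assemblies built with different k-mer sizes can differ and why combining them may help.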

Relevance:

20.00%

Abstract:

Relationships between organisms within an ecosystem are one of the main focuses in the study of ecology and evolution. For instance, host-parasite interactions have long been of close interest to ecology, evolutionary biology and conservation science, owing to the great variety of strategies and interaction outcomes. Monogenean ectoparasites make up a significant portion of flatworms. Gyrodactylus salaris is a monogenean freshwater ectoparasite of Atlantic salmon (Salmo salar) whose damage can make fish prone to further bacterial and fungal infections. G. salaris is the only one of these parasites whose genome has been studied so far. The RNA-seq data analyzed in this thesis had already been annotated using LAST. The data were obtained by Illumina sequencing, and the resulting reads were assembled into 15,777 transcripts. LAST annotated 46% of the transcripts, and the remainder were left unknown. This thesis work started from the whole data set and continued the annotation process using PANNZER, CDD and InterProScan. This annotation resulted in 56% of sequences being successfully annotated, and parasite-specific proteins were identified. This thesis presents the first transcriptomic information for monogeneans, which provides an important resource for further research on this species. Additionally, a comparison of annotation methods revealed that description- and domain-based methods perform better than simple similarity search methods. We therefore suggest using these tools and databases for functional annotation. These results also emphasize the need to use multiple methods and databases, and highlight the need for more genomic information on G. salaris.
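For scale, 46% and 56% of 15,777 transcripts correspond to roughly 7,257 and 8,835 transcripts (the percentages are rounded, so the counts are approximate). The benefit of combining methods comes from taking the union of what each method annotates. The sketch below computes per-method and combined coverage from sets of annotated transcript IDs; the function name `annotation_coverage` and the toy IDs are hypothetical and not taken from the thesis.

```python
def annotation_coverage(total, annotated_by_method):
    """Return (per-method coverage, combined coverage), both as fractions
    of `total`. Combined coverage counts a transcript once if any method
    annotated it (set union)."""
    per_method = {
        method: len(ids) / total for method, ids in annotated_by_method.items()
    }
    combined = set().union(*annotated_by_method.values())
    return per_method, len(combined) / total


# Hypothetical toy data: 10 transcripts, three methods with partial overlap
annotations = {
    "LAST": {"t1", "t2", "t3", "t4"},
    "PANNZER": {"t2", "t3", "t5"},
    "InterProScan": {"t4", "t6"},
}
per_method, combined = annotation_coverage(10, annotations)
print(per_method)  # {'LAST': 0.4, 'PANNZER': 0.3, 'InterProScan': 0.2}
print(combined)    # 0.6
```

No single method reaches 60% here, but their union does, which mirrors the abstract's point that multiple methods and databases complement each other.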

Relevance:

10.00%

Abstract:

Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. More specifically, the work described here aims to extract information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach combines full parsing (syntactic analysis of the entire structure of sentences) with machine learning, aiming to develop reliable methods that can also be generalized to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unifying diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus also contributing to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
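Dependency-based evaluation is commonly done by treating each analysis as a set of (head, dependent, label) triples and scoring the overlap with precision, recall and F-score. The sketch below shows that general scheme, with an option to ignore labels; it is an illustrative assumption, not the specific evaluation methodology developed in the thesis. The function name `dependency_prf` and the example sentence and labels are hypothetical.

```python
def dependency_prf(gold, predicted, labeled=True):
    """Score predicted dependencies against gold ones.

    Both arguments are iterables of (head_index, dependent_index, label)
    triples. With labeled=False, only the (head, dependent) pairs are
    compared. Returns (precision, recall, f1).
    """
    if labeled:
        gold, predicted = set(gold), set(predicted)
    else:
        gold = {(h, d) for h, d, _ in gold}
        predicted = {(h, d) for h, d, _ in predicted}
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# "RAD51 binds BRCA2": tokens 1=RAD51, 2=binds, 3=BRCA2
gold = {(2, 1, "nsubj"), (2, 3, "dobj")}
pred = {(2, 1, "nsubj"), (2, 3, "iobj")}  # right attachment, wrong label
print(dependency_prf(gold, pred))                 # (0.5, 0.5, 0.5)
print(dependency_prf(gold, pred, labeled=False))  # (1.0, 1.0, 1.0)
```

Scores of this kind are only comparable when both parsers' outputs are first converted into the same dependency scheme, which is why conversion between parse representations matters for formalism-neutral evaluation.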

Relevance:

10.00%

Abstract:

In this study, a new version of a previously developed tool for annotating images, mainly fundus (retinal) images, was implemented. The aim was to implement image-processing-based auxiliary functions for correcting image illumination and for highlighting, for the physician, possible findings associated with diabetic retinopathy. To facilitate image annotation, two illumination correction methods were implemented: a one-dimensional curve method and a method that exploits the properties of the color channels. In addition, an auxiliary function based on the distribution of the image's green channel was implemented, which aims to highlight possible findings associated with diabetic retinopathy.
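The abstract does not specify how the green-channel distribution is used, so the following is only a hedged sketch of one simple possibility: flag pixels whose green value lies more than k standard deviations from the mean green value of the image. The function name `highlight_green_outliers`, the threshold rule and the default k = 2 are assumptions for illustration, not the thesis's method; images are represented as nested lists of (R, G, B) tuples to keep the example dependency-free.

```python
from statistics import mean, pstdev


def highlight_green_outliers(image, k=2.0):
    """Return a boolean mask (same shape as `image`) marking pixels whose
    green value differs from the image's mean green value by more than
    k population standard deviations. `image` is a list of rows of
    (R, G, B) tuples."""
    greens = [px[1] for row in image for px in row]
    mu = mean(greens)
    sigma = pstdev(greens)
    if sigma == 0:  # uniform green channel: nothing stands out
        return [[False for _ in row] for row in image]
    return [[abs(px[1] - mu) > k * sigma for px in row] for row in image]


# Toy 3x3 image with one dark spot in the green channel
img = [[(50, 100, 30)] * 3 for _ in range(3)]
img[1][1] = (50, 10, 30)
mask = highlight_green_outliers(img)
print(mask[1][1])                         # True
print(sum(v for row in mask for v in row))  # 1
```

A global threshold like this is crude: real fundus images have uneven illumination, which is one reason to apply illumination correction before any distribution-based highlighting, and a practical tool would more likely use local statistics.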

Relevance:

10.00%

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

10.00%

Abstract:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

10.00%

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

10.00%

Abstract:

The aim of this Master's thesis was to develop the current design system of Dust Control Systems Oy in order to improve the quality of the company's operations and the efficiency of its design work. The biggest bottleneck in design is the production of manufacturing documents, so a key task was to improve the quality of these documents and speed up their production. It was also important to clarify the current management and maintenance of items. The development work focused largely on updating and customizing the design system's software environment. The theoretical part discussed modern design and the opportunities for developing it, and examined the benefits of design automation as well as various methods for improving production efficiency. The development part examined the company's current design system and software environment, the requirements set by its products, and the shortcomings of the system. This work involved close cooperation with the staff and the software vendor. Based on the study, development of a new overall system was started, incorporating the necessary software and updates. For deployment, the software was configured to meet the company's needs. The result of the work is a more advanced and modern design system and software environment. The designers' workload can be reduced through more extensive automation of repetitive work steps. The renewed software environment also establishes conditions and rules so that fewer errors occur. In addition, the lead time of design can be shortened through improved document production.