9 results for Biology - Computer programs

in Helda - Digital Repository of University of Helsinki


Relevance:

80.00%

Publisher:

Abstract:

The methodology of extracting information from texts has been widely described in the current literature. However, the methodology has been developed mainly for the purposes of fields other than terminology science. In addition, the research has been oriented towards the English language. Therefore, there are no satisfactory language-independent methods for extracting terminological information from texts. The aim of the present study is to form the basis for a further improvement of methods for extraction of terminological information. A further aim is to determine differences in term extraction between subject groups with or without knowledge of the special field in question. The study is based on the theory of terminology, and has mainly a qualitative approach. The research material consists of electronically readable specialized texts in the subject domain of maritime safety. Textbooks, conference papers, research reports and articles from professional journals in Finnish and in Russian are included. The thesis first deals with certain term extraction methods. These are manual term identification and semi-automatic term extraction, the latter of which was carried out by using three commercial computer programs. The results of term extraction were compared and the recall and precision of the methods were evaluated. The latter part of the study is dedicated to the identification of concept relations. Certain linguistic expressions, which some researchers call knowledge probes, were applied to identify concept relations. The results of the present thesis suggest that special field knowledge is an advantage in manual term identification. However, in the candidate term lists the variation between subject groups was not as remarkable as it was between individual subjects. The term extraction software tested here produces candidate term lists which can be useful, but only after some manual work. Therefore, the work emphasizes the need to further develop term extraction software.
Furthermore, the analyses indicate that there is a certain number of terms which were extracted by all the subjects and the software. We call these terms core terms. As the result of the experiment on linguistic expressions which signal concept relations, a proposal of Finnish and Russian knowledge probes in the field of maritime safety was made. The main finding was that it would be useful to combine the use of knowledge probes with semi-automatic term extraction since knowledge probes usually occur in the vicinity of terms.
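
The evaluation described in this abstract can be illustrated with a minimal Python sketch. It assumes that precision and recall are computed against a manually agreed reference term list and that core terms are the intersection of all candidate lists; the function names and the toy term lists are invented for illustration and do not come from the thesis.

```python
def precision_recall(candidates, reference):
    """Return (precision, recall) of a candidate term list against a reference list."""
    candidates, reference = set(candidates), set(reference)
    hits = candidates & reference
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


def core_terms(term_lists):
    """Terms found in every list, i.e. by all subjects and all programs."""
    sets = [set(terms) for terms in term_lists]
    return set.intersection(*sets) if sets else set()


# Invented toy data (assumption), not taken from the thesis.
reference = {"life raft", "distress signal", "muster station", "bulkhead"}
software = ["life raft", "distress signal", "the ship", "bulkhead", "page"]
subject_a = ["life raft", "bulkhead", "muster station"]
subject_b = ["life raft", "distress signal", "bulkhead"]

precision, recall = precision_recall(software, reference)
core = core_terms([software, subject_a, subject_b])
print(precision, recall, sorted(core))
```

In this toy case the software list contains two noise candidates ("the ship", "page"), which lowers precision but not recall; this is the kind of output that would need manual clean-up.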

Relevance:

30.00%

Publisher:

Abstract:

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, a suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of the suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example for the suffix tree of the human genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.
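
To make the quantities n, s and N concrete, here is a small Python sketch of a repetitive collection stored as one base sequence plus per-copy edit lists. This is not the self-index developed in the paper; it only illustrates why a representation can scale with n and s rather than with N. The helper `apply_edits`, the restriction to substitutions, and the sequences are assumptions made for the example.

```python
def apply_edits(base, edits):
    """Rebuild one copy from the base sequence and a list of substitutions.

    Each edit is (position, new_character). Only substitutions are modelled
    here; insertions and deletions are left out to keep the sketch short.
    """
    chars = list(base)
    for pos, ch in edits:
        chars[pos] = ch
    return "".join(chars)


base = "ACGTACGTACGTACGT"      # base sequence of length n
variants = [                   # invented per-copy edit lists
    [(3, "A")],
    [(7, "C"), (12, "G")],
    [],
]

copies = [apply_edits(base, edits) for edits in variants]

n = len(base)                              # length of the base sequence
s = sum(len(edits) for edits in variants)  # total number of variations
N = sum(len(copy) for copy in copies)      # total length of the collection
print(n, s, N)
```

Adding more near-identical copies grows N linearly, while the base-plus-edits description grows only with the number of new edits.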

Relevance:

30.00%

Publisher:

Abstract:

Mass spectrometry (MS) has become a standard tool for identifying metabolites in biological tissues, and metabolomics is slowly being acknowledged as a legitimate research discipline for characterizing biological conditions. The computational analyses of metabolomics, however, lag behind the rapid advances in analytical aspects for two reasons. The first is the lack of a standardized data repository for mass spectra: each research institution is flooded with gigabytes of mass-spectral data from its own analytical groups and cannot host a world-class repository for mass spectra. The second reason is the lack of informatics experts who are fully experienced with spectral analyses. The two barriers must be overcome to establish a publicly free data server for MS analysis in metabolomics, as GenBank does in genomics and UniProt in proteomics. The workshop brought together bioinformaticians working on mass spectral analyses in Finland and Japan with the goal of establishing a consortium to freely exchange and publicize (1) mass spectra of metabolites measured on various platforms, (2) computational tools to analyze spectra, and (3) spectral knowledge that is computationally predicted from standardized data. This book contains the abstracts of the presentations given in the workshop. The programme of the workshop consisted of oral presentations from Japan and Finland, invited lectures from Steffen Neumann (Leibniz Institute of Plant Biochemistry), Matej Oresic (VTT), Merja Penttila (VTT) and Nicola Zamboni (ETH Zurich) as well as free-form discussion among the participants. The event was funded by the Academy of Finland (grants 139203 and 118653), the Japan Society for the Promotion of Science (JSPS Japan-Finland Bilateral Seminar Program 2010) and the Department of Computer Science, University of Helsinki. We would like to thank all the people contributing to the technical programme and the sponsors for making the workshop possible. Helsinki, October 2010 Masanori Arita, Markus Heinonen and Juho Rousu

Relevance:

30.00%

Publisher:

Abstract:

In a max-min LP, the objective is to maximise ω subject to Ax ≤ 1, Cx ≥ ω1, and x ≥ 0 for nonnegative matrices A and C. We present a local algorithm (constant-time distributed algorithm) for approximating max-min LPs. The approximation ratio of our algorithm is the best possible for any local algorithm; there is a matching unconditional lower bound.
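
As a reading aid for the definition, the following Python sketch evaluates a candidate solution of a max-min LP: it checks x ≥ 0 and Ax ≤ 1, then returns the largest ω for which Cx ≥ ω1 holds, namely the smallest entry of Cx. It is a plain centralised evaluator, not the local algorithm of the paper; the function names, the tolerance and the small instance are assumptions chosen for illustration.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def max_min_value(A, C, x, tol=1e-9):
    """Return omega = min_k (Cx)_k if x is feasible, otherwise None.

    Feasible means x >= 0 and Ax <= 1 (componentwise, up to tol).
    """
    if any(xi < -tol for xi in x):
        return None
    if any(v > 1 + tol for v in matvec(A, x)):
        return None
    return min(matvec(C, x))


# Invented toy instance with nonnegative A and C.
A = [[1, 1, 0],
     [0, 1, 1]]
C = [[1, 0, 0],
     [0, 1, 1]]

omega_half = max_min_value(A, C, [0.5, 0.5, 0.5])  # feasible
omega_one = max_min_value(A, C, [1, 0, 1])         # feasible, larger omega
infeasible = max_min_value(A, C, [1, 1, 0])        # violates Ax <= 1
print(omega_half, omega_one, infeasible)
```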

Relevance:

30.00%

Publisher:

Abstract:

In a max-min LP, the objective is to maximise ω subject to Ax ≤ 1, Cx ≥ ω1, and x ≥ 0. In a min-max LP, the objective is to minimise ρ subject to Ax ≤ ρ1, Cx ≥ 1, and x ≥ 0. The matrices A and C are nonnegative and sparse: each row a_i of A has at most Δ_I positive elements, and each row c_k of C has at most Δ_K positive elements. We study the approximability of max-min LPs and min-max LPs in a distributed setting; in particular, we focus on local algorithms (constant-time distributed algorithms). We show that for any Δ_I ≥ 2, Δ_K ≥ 2, and ε > 0 there exists a local algorithm that achieves the approximation ratio Δ_I (1 − 1/Δ_K) + ε. We also show that this result is the best possible: no local algorithm can achieve the approximation ratio Δ_I (1 − 1/Δ_K) for any Δ_I ≥ 2 and Δ_K ≥ 2.
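
The bound in this abstract depends only on the two row-sparsity parameters of A and C. The Python sketch below reads the smallest valid parameters off a given pair of matrices and evaluates the threshold stated in the abstract. The function names and the example matrices are assumptions for illustration; the code computes the formula only and does not implement the local algorithm.

```python
def max_positive_per_row(M):
    """Largest number of positive entries in any row of M."""
    return max(sum(1 for v in row if v > 0) for row in M)


def local_ratio_threshold(delta_i, delta_k):
    """Evaluate delta_i * (1 - 1/delta_k), the threshold from the abstract.

    Per the abstract, a local algorithm can achieve this value plus any
    epsilon > 0, but not the value itself.
    """
    if delta_i < 2 or delta_k < 2:
        raise ValueError("the stated result assumes both parameters are >= 2")
    return delta_i * (1 - 1 / delta_k)


# Invented toy matrices with nonnegative entries.
A = [[1, 1, 1],
     [0, 1, 1]]
C = [[1, 1, 0],
     [0, 0, 1]]

delta_i = max_positive_per_row(A)   # sparsity bound for rows of A
delta_k = max_positive_per_row(C)   # sparsity bound for rows of C
threshold = local_ratio_threshold(delta_i, delta_k)
print(delta_i, delta_k, threshold)
```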