34 results for Automated extraction
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
Reduced microbial contact in affluent societies is considered to lie behind the rise in allergic and other chronic inflammatory diseases during recent decades. Indeed, deviations in intestinal microbiota composition and diversity have been associated with several diseases, such as atopic eczema. However, there is not yet a consensus on what would constitute a beneficial or harmful microbiota. The aim of this thesis was to study microbiota development in healthy infants and to characterize intestinal microbiota signatures associated with disease status and severity in infants with atopic eczema. The methodological aim was to compare and optimize methods for DNA extraction from fecal samples for use in high-throughput microbiota analyses. It was confirmed that the most critical step in successful microbial DNA extraction from fecal samples is the mechanical cell lysis procedure. Based on this finding, an efficient semi-automated extraction process was developed that can be scaled for use in high-throughput platforms such as the phylogenetic microarray used in this series of studies. By analyzing a longitudinal mother-child cohort over 3 years, it was observed that microbiota development is a gradual process in which some bacterial groups reach an adult-type pattern earlier than others. During the breast-feeding period the microbiota appeared relatively simple, while major diversification started during weaning. By the age of 3 years, the child's microbiota composition had begun to resemble that of an adult, but bacterial diversity had not yet reached adult levels, indicating that microbiota maturation extends beyond this age.
In addition, at three years of age, the child's microbiota was more similar to the mother's microbiota than to the microbiota of unrelated women. In infants with atopic eczema, high total microbiota diversity and an abundance of butyrate-producing bacteria were found to correlate with mild symptoms at 6 months. At 18 months, infants with mild eczema had significantly higher microbiota diversity and an aberrant microbiota composition compared with healthy controls of the same age. In conclusion, the comprehensive phylogenetic microarray analysis of early-life microbiota shows the synergistic effect of vertical transmission and shared environment on intestinal microbiota development. By the age of three years, the compositional development of the intestinal microbiota is close to adult level, but microbiota diversification continues beyond this age. In addition, specific microbiota signatures are associated with the presence and severity of atopic eczema, and the intestinal microbiota seems to have a role in alleviating the symptoms of this disease.
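The abstract does not say which diversity metric was used. As an illustration only, one common choice for microbiota diversity is the Shannon index, H = -Σ p_i ln p_i, computed over relative abundances p_i. The sketch below computes it from hypothetical per-taxon signal values, such as probe intensities from a phylogenetic microarray. The profiles are invented and are not data from the thesis.

```python
import math
from typing import Sequence

def shannon_diversity(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over relative abundances.

    Zero-abundance taxa are skipped, since p * ln p -> 0 as p -> 0.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a < 0:
            raise ValueError("abundances must be non-negative")
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h

# Hypothetical profiles: a simple, dominated community vs. an even one.
infant_like = [90.0, 5.0, 5.0, 0.0]
adult_like = [25.0, 25.0, 25.0, 25.0]
print(round(shannon_diversity(infant_like), 3))  # 0.394
print(round(shannon_diversity(adult_like), 3))   # 1.386 (= ln 4)
```

A community dominated by a few groups scores low, while an even community of the same richness scores ln(number of taxa). This matches the qualitative picture of a "relatively simple" early microbiota that diversifies later.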
Abstract:
Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations, such as protein-protein interactions (PPI), in scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as in a keyword search. In recent years, events have been proposed as a more detailed alternative to simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments, and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Once converted to such formal representations, the information in natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations, and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures.
We show that this event extraction system performs well, reaching first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks and showed competitive performance in the binary-relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail and making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well but is also generalizable and applicable to large-scale, real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks and in the DDIExtraction 2011 task are covered in four publications, and the sixth demonstrates the application of the system to PubMed-scale text mining.
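To make the graph formulation more concrete, the sketch below encodes the example CAUSE(A, BIND(B, C)) as a graph. Named entities and event triggers become nodes, and event arguments become typed, directed edges from a trigger. The nested structure is then read back out of the graph. This is an illustrative data structure only, not TEES's actual format. The role names (Cause, Theme, Theme2) and the class names are hypothetical. In a decomposition like the one described above, separate classifiers would roughly correspond to deciding which words are trigger nodes and which candidate edges are arguments.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    kind: str    # "entity" (named protein) or "trigger" (event word)
    label: str   # protein type or event type
    text: str    # surface text in the sentence

@dataclass
class EventGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (trigger id, argument id, role)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, src: str, dst: str, role: str) -> None:
        # Argument edges always start at an event trigger.
        if self.nodes[src].kind != "trigger":
            raise ValueError("argument edges must start at a trigger node")
        self.edges.append((src, dst, role))

    def to_term(self, node_id: str) -> str:
        """Recursively render a node as a nested event term."""
        node = self.nodes[node_id]
        if node.kind == "entity":
            return node.text
        args = [self.to_term(dst) for src, dst, _ in self.edges if src == node_id]
        return f"{node.label}({', '.join(args)})"

# "Protein A causes protein B to bind protein C"
g = EventGraph()
for nid, kind, label, text in [
    ("T1", "entity", "Protein", "A"),
    ("T2", "entity", "Protein", "B"),
    ("T3", "entity", "Protein", "C"),
    ("E1", "trigger", "CAUSE", "causes"),
    ("E2", "trigger", "BIND", "bind"),
]:
    g.add_node(Node(nid, kind, label, text))
# Hypothetical role names; nesting arises because trigger E1 takes trigger E2 as an argument.
g.add_edge("E1", "T1", "Cause")
g.add_edge("E1", "E2", "Theme")
g.add_edge("E2", "T2", "Theme")
g.add_edge("E2", "T3", "Theme2")
print(g.to_term("E1"))  # CAUSE(A, BIND(B, C))
```

Because an event can be the target of another event's argument edge, the same flat node and edge representation covers both simple pairwise relations and nested events.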
Abstract:
Summary: Estimating the degree of phosphorus saturation of aluminium and iron oxides in Finnish arable soils
Abstract:
This Master's thesis studies automated testing and how to make user interface testing easier on the Symbian operating system. The thesis introduces Symbian and the challenges encountered in Symbian application development. It also discusses testing strategies and methods, as well as automated testing. Finally, a tool is presented that makes it easier to create test cases for functional and system testing. Graphical user interfaces pose unique challenges for software testing. They are often built from complex components and are continually redesigned during software development. Capture-and-replay tools are often used to test graphical user interfaces. Designing and implementing test cases for user interface testing requires considerable effort. Because graphical user interfaces make up a large part of the code, a great deal of resources could be saved by making test case creation easier. The project implemented in the practical part aims at this by making test script creation visual. As a result, the testers do not need to understand the test scripting language itself, and the tests are also easier to comprehend.
Abstract:
This Master's thesis presents general principles of software testing and verification and discusses the verification of smartphone software in more detail. The thesis also introduces the Symbian operating system used in smartphones. In the practical part, a server running on the Symbian operating system was designed and implemented to monitor and record system resource usage. Verification is an important and costly task in the smartphone software development cycle. Costs can be reduced by automating part of the verification process. The implemented server automates the monitoring of system resources by recording data about them to a file during test runs. When the tests are run again, the new results are compared with a baseline recording. If the results are not within the error margins set by the user, the user is notified. Defining the error margins and the baseline recording may prove difficult. However, if they are defined appropriately, the server provides testers with useful information about deviations in system resource consumption.
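The baseline comparison described above can be sketched as follows. This Python sketch covers only the comparison logic, not the Symbian server itself. It assumes that each test run yields a mapping from resource name to measured value, and that each error margin is relative (a fraction of the baseline value). The resource names, the relative-margin rule and the function name are hypothetical; the original server's file format and margin semantics are not specified.

```python
from typing import Dict, List

def compare_to_baseline(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: Dict[str, float],
) -> List[str]:
    """Return a report line for every resource outside its error margin.

    tolerance[name] is a relative margin: 0.10 allows +/-10 % of the baseline.
    Resources missing from the current run are reported as well.
    """
    problems = []
    for name, base in baseline.items():
        if name not in current:
            problems.append(f"{name}: missing from current run")
            continue
        margin = abs(base) * tolerance.get(name, 0.0)
        value = current[name]
        if abs(value - base) > margin:
            problems.append(f"{name}: {value} outside {base} +/- {margin:g}")
    return problems

# Hypothetical recordings from two runs of the same test suite.
baseline = {"heap_kb": 2048.0, "handles": 40.0, "cpu_ms": 350.0}
current = {"heap_kb": 2600.0, "handles": 41.0, "cpu_ms": 360.0}
tolerance = {"heap_kb": 0.10, "handles": 0.05, "cpu_ms": 0.10}
for line in compare_to_baseline(baseline, current, tolerance):
    print(line)  # heap_kb: 2600.0 outside 2048.0 +/- 204.8
```

The difficulty the abstract mentions shows up directly in the `tolerance` table. Margins that are too tight flag normal run-to-run noise, while margins that are too loose hide real regressions.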
Abstract:
This Master's thesis studies techniques for embedding a watermark into a spectral image and methods for identifying and detecting watermarks in spectral images. The spectral dimension of the original images was reduced using the PCA (Principal Component Analysis) algorithm. The watermark was embedded into the spectral image in the transform domain. In the proposed model, one transform-domain component was replaced with a linear combination of the watermark and another transform-domain component. The set of parameters used in embedding was studied. The quality of the watermarked images was measured and analyzed, and recommendations for watermark embedding were presented. Several methods were used to identify the watermarks, and the identification results were analyzed. The robustness of the watermarks against various attacks was examined. A set of detection experiments was carried out, taking into account the parameters used in watermark embedding. ICA (Independent Component Analysis) is considered one possible alternative for watermark detection.
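A minimal sketch of the embedding idea follows. It assumes NumPy, a spectral image stored as an array of shape (rows, cols, bands), and a watermark of shape (rows, cols). Component `target` of the PCA scores is replaced by alpha·watermark + beta·(component `source`). The component indices, alpha and beta are hypothetical placeholders: the thesis studied this parameter set, but its values are not given here. Detection in this sketch is non-blind, since it reuses the stored PCA basis and mean.

```python
import numpy as np

def pca_embed(cube, watermark, n_components=3, target=2, source=1,
              alpha=0.1, beta=0.9):
    """Embed a 2-D watermark into a spectral image in the PCA domain.

    Only n_components principal components are kept (spectral dimension
    reduction); component `target` is replaced by
    alpha * watermark + beta * component `source`.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)   # one spectrum per row
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    # Principal axes are the rows of vt, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                         # (k, bands)
    scores = centered @ basis.T                       # transform domain
    scores[:, target] = alpha * watermark.ravel() + beta * scores[:, source]
    marked = scores @ basis + mean                    # back to spectral domain
    return marked.reshape(rows, cols, bands), basis, mean

def extract_component(cube, basis, mean, index):
    """Project a (possibly attacked) cube onto one stored principal axis."""
    rows, cols, bands = cube.shape
    scores = (cube.reshape(-1, bands) - mean) @ basis.T
    return scores[:, index].reshape(rows, cols)

rng = np.random.default_rng(0)
cube = rng.random((16, 16, 8))                        # synthetic test image
wm = rng.choice([-1.0, 1.0], size=(16, 16))           # synthetic +/-1 watermark
marked, basis, mean = pca_embed(cube, wm)
recovered = extract_component(marked, basis, mean, index=2)
# Correlation with the known watermark acts as a simple detection statistic.
corr = np.corrcoef(recovered.ravel(), wm.ravel())[0, 1]
print(round(corr, 3))
```

Raising alpha makes the mark easier to detect but changes the image more. This is the quality-versus-detectability trade-off that the parameter study addresses.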
Abstract:
Over 70% of the total costs of an end product are consequences of decisions made during the design process. If the geometry of a structure is fixed and the cross-sectional characteristics of its elements are properly designed by conventional methods, a search for optimal cross-sections will often have only a marginal effect on the amount of material used. In recent years, optimal geometry has become a central area of research in the automated design of structures. It is generally accepted that no single optimisation algorithm is suitable for all engineering design problems; an appropriate algorithm must therefore be selected individually for each optimisation situation. Modelling is the most time-consuming phase in the optimisation of steel and metal structures. The goal of this research was to develop a method and computer program that reduce the modelling and optimisation time in structural design. The program needed an optimisation algorithm suitable for various engineering design problems. Because finite element (FE) modelling is commonly used in the design of steel and metal structures, the interaction between an FE tool and the optimisation tool needed a practical solution. The developed method and computer programs were tested with standard optimisation tests and practical design optimisation cases. Three generations of computer programs were developed. The programs combine an optimisation problem modelling tool with an FE modelling program using three alternative methods. Modelling and optimisation were demonstrated in the design of a new boom construction and of the steel structures of flat and ridge roofs. This thesis demonstrates that modelling time, the most time-consuming part of the process, is significantly reduced. Modelling errors are reduced and the results are more reliable.
A new selection rule for the evolutionary algorithm, which eliminates the need for constraint weight factors, was tested with steel-structure optimisation cases that include hundreds of constraints. The results show that the tested algorithm can be used nearly as a black box, without parameter settings or penalty factors for the constraints.
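The abstract does not describe the new selection rule itself. For illustration, the sketch below uses a different, well-known rule with the same property: Deb's feasibility rules for tournament selection. The rule compares two candidates without any constraint weights:

- a feasible solution beats an infeasible one;
- two feasible solutions are compared by objective value;
- two infeasible solutions are compared by total constraint violation.

This is a swapped-in technique, not the thesis's rule. The toy objective and constraint are hypothetical stand-ins for a structural problem.

```python
import random
from typing import Callable, List, Sequence

Constraint = Callable[[Sequence[float]], float]   # g(x) <= 0 means satisfied

def violation(x: Sequence[float], constraints: List[Constraint]) -> float:
    """Total amount by which x violates the constraints (0.0 if feasible)."""
    return sum(max(0.0, g(x)) for g in constraints)

def better(a, b, objective, constraints) -> bool:
    """Feasibility-rule comparison: no weights or penalty factors involved."""
    va, vb = violation(a, constraints), violation(b, constraints)
    if va == 0.0 and vb == 0.0:
        return objective(a) <= objective(b)   # both feasible: lower cost wins
    if va == 0.0 or vb == 0.0:
        return va == 0.0                      # feasible beats infeasible
    return va <= vb                           # both infeasible: smaller violation wins

def tournament(pop, objective, constraints, rng):
    a, b = rng.sample(pop, 2)
    return a if better(a, b, objective, constraints) else b

def evolve(objective, constraints, dim=2, n=40, steps=2000, seed=1):
    """Steady-state evolutionary loop driven only by the comparison above."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.1, 3.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        parent = tournament(pop, objective, constraints, rng)
        child = [max(0.01, v + rng.gauss(0.0, 0.1)) for v in parent]  # Gaussian mutation
        i = rng.randrange(n)
        if better(child, pop[i], objective, constraints):
            pop[i] = child
    # Feasibility first, then cost: the same ordering that `better` uses.
    return min(pop, key=lambda x: (violation(x, constraints), objective(x)))

# Hypothetical toy problem: minimise x0 + x1 subject to x0 * x1 >= 1.
def cost(x):
    return x[0] + x[1]

def area_constraint(x):
    return 1.0 - x[0] * x[1]

best = evolve(cost, [area_constraint])
print(best, cost(best))   # approaches the optimum x0 = x1 = 1, cost 2
```

Adding more constraints only extends the list passed to `violation`. No new weight has to be tuned per constraint, which is the practical benefit the abstract describes for problems with hundreds of constraints.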
Abstract:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle to effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied, with emphasis on feature extraction based on the Hough transform and Gabor filtering. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both the Hough transform and Gabor filtering. A modified Hough transform technique is also presented, in which distortion tolerance is increased by incorporating local information. In addition, methods are introduced for decreasing the computational cost of the Hough transform by employing parallel processing and local information.
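For reference, the sketch below shows the standard global Hough transform for straight lines, which is the baseline that the modified technique above builds on. Each edge pixel (x, y) votes for every line ρ = x·cos θ + y·sin θ that passes through it, so collinear pixels accumulate votes in one (ρ, θ) cell. The local-information and parallel-processing extensions are not reproduced here. The sketch assumes NumPy and a binary edge map.

```python
import numpy as np

def hough_lines(edges: np.ndarray, n_theta: int = 180):
    """Accumulate votes in (rho, theta) space for a binary edge image."""
    h, w = edges.shape
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)   # [0, pi)
    diag = int(np.ceil(np.hypot(h, w)))                          # max |rho|
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_theta), dtype=np.int64)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # One vote per theta; rho is rounded to the nearest accumulator row.
        r = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[r, np.arange(n_theta)] += 1
    return acc, rhos, thetas

img = np.zeros((50, 50), dtype=bool)
img[20, 5:45] = True                    # horizontal segment on the line y = 20
acc, rhos, thetas = hough_lines(img)
r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
print(rhos[r_idx], np.rad2deg(thetas[t_idx]))   # peak at rho = 20, theta = 90 degrees
```

The inner loop costs O(number of edge pixels × n_theta), which is the computational load that parallel processing or locally restricted voting aim to reduce.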
Abstract:
In this thesis the author approaches the problem of automated text classification, which is one of the basic tasks in building an Intelligent Internet Search Agent. The work discusses various approaches to sub-problems of automated text classification, such as feature extraction and machine learning on text sources. The author also describes her own multiword approach to feature extraction and presents the results of testing it with two classifiers: one based on linear discriminant analysis, and one that combines unsupervised learning for etalon (reference prototype) extraction with supervised learning of a multilayer perceptron using the common backpropagation algorithm.
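The author's multiword approach is not detailed in the abstract. As a generic illustration of multiword features, the sketch below uses word n-grams: unigrams and bigrams are counted per document, n-grams that appear in too few documents are dropped, and each document is mapped to a count vector. A vector of this kind could feed a linear discriminant or a perceptron. The function names, the `min_df` threshold and the example documents are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())

def multiword_features(text: str, max_n: int = 2) -> Counter:
    """Count all word n-grams with 1 <= n <= max_n as candidate features."""
    tokens = tokenize(text)
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def build_vocabulary(docs: Iterable[str], min_df: int = 2, max_n: int = 2) -> List[str]:
    """Keep n-grams that occur in at least min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(multiword_features(doc, max_n)))
    return sorted(term for term, count in df.items() if count >= min_df)

def vectorize(text: str, vocab: List[str], max_n: int = 2) -> List[int]:
    feats = multiword_features(text, max_n)
    return [feats[term] for term in vocab]

docs = [
    "Search agents index web pages.",
    "An intelligent search agent ranks web pages.",
    "Neural networks learn from web pages.",
]
vocab = build_vocabulary(docs)
print(vocab)                          # ['pages', 'search', 'web', 'web pages']
print(vectorize(docs[1], vocab))      # [1, 1, 1, 1]
```

The multiword term "web pages" survives as a feature of its own, separate from its component words. This is the kind of information a pure bag-of-words representation loses.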
Abstract:
In this study we used market settlement prices of European call options on stock index futures to extract the implied probability density function (PDF). The method produces a PDF of the returns of the underlying asset at the expiration date from the implied volatility smile. With this method, the assumption of a lognormal distribution (Black-Scholes model) is tested. The market view of the asset price dynamics can then be used for various purposes (hedging, speculation). We used the so-called smoothing approach to implied PDF extraction presented by Shimko (1993). In our analysis we obtained implied volatility smiles from index futures markets (S&P 500 and DAX indices) and standardized them. The method introduced by Breeden and Litzenberger (1978) was then used for PDF extraction. The results show significant deviations from the assumption of lognormal returns for S&P 500 options, while DAX options mostly fit the lognormal distribution. A deviating subjective view of the PDF can be used to form a strategy, as discussed in the last section.
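The Breeden-Litzenberger step can be sketched numerically. The implied density is f(K) = e^{rT} ∂²C/∂K², where C(K) is the call price as a function of strike. The sketch approximates the second derivative by central differences on an even strike grid. Two simplifications are assumed:

- The Shimko smoothing step is omitted. That step fits a curve to the smile and converts it back to prices.
- A flat volatility is used. The recovered density should then be lognormal, which gives a self-check.

With a fitted smile in place of the flat `sigma`, the same function would yield the non-lognormal shapes discussed above. All parameter values are hypothetical.

```python
import math
import numpy as np

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def implied_density(strikes, calls, r, t):
    """Breeden-Litzenberger: f(K) = exp(rT) * d^2C/dK^2 by central differences.

    Assumes an evenly spaced strike grid; returns densities at interior strikes.
    """
    dk = strikes[1] - strikes[0]
    second = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / dk**2
    return strikes[1:-1], math.exp(r * t) * second

s0, t, r, sigma = 100.0, 0.25, 0.02, 0.2            # hypothetical market inputs
strikes = np.linspace(40.0, 200.0, 1601)            # strike spacing 0.1
calls = np.array([bs_call(s0, k, t, r, sigma) for k in strikes])
k_mid, density = implied_density(strikes, calls, r, t)
area = float(np.sum(density) * (strikes[1] - strikes[0]))
print(round(area, 3))   # close to 1 when the grid covers the distribution
```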
Abstract:
Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web; hence, web users relying only on search engines are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters that a user provides via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces give web users online access to myriad databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterizing the deep Web, finding and classifying deep web resources, and querying web databases. Characterizing the deep Web: Although the term deep Web was coined in 2000, which is long ago for any web-related concept or technology, many important characteristics of the deep Web are still unknown. Another concern is that existing surveys of the deep Web are predominantly based on studies of deep web sites in English. Findings from these surveys may therefore be biased, especially given the steady increase in non-English web content.
Surveying national segments of the deep Web is thus of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from the national segment of the Web. Finding deep web resources: The deep Web has been growing at a very fast pace; it has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches assumed that search interfaces to web databases of interest are already discovered and known to query systems. However, such assumptions mostly do not hold because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other existing approaches to the deep Web, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms. Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so because the interfaces of conventional search engines are also web forms.
At present, a user needs to manually provide input values to search interfaces and then extract the required data from the result pages. Filling out forms manually is cumbersome, and infeasible for complex queries, yet such queries are essential for many web searches, especially in e-commerce. The automation of querying and retrieving data behind search interfaces is therefore desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. In addition, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
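As a rough illustration of extracting field labels from a search interface, the sketch below parses static HTML with Python's standard `html.parser` module. It collects each form's fields, their types and `<select>` options, and attaches `<label for=...>` texts to fields by `id`. The `SearchForm`, `FormField` and `FormExtractor` classes form a deliberately simplified, hypothetical data model, not the one proposed in the thesis. JavaScript-rich and non-HTML forms, which the I-Crawler handles, are out of scope here.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, List, Optional

@dataclass
class FormField:
    name: str
    kind: str                          # input type, or "select" / "textarea"
    field_id: Optional[str] = None
    label: Optional[str] = None
    options: List[str] = field(default_factory=list)

@dataclass
class SearchForm:
    action: str
    method: str
    fields: List[FormField] = field(default_factory=list)

class FormExtractor(HTMLParser):
    """Collect forms, their named fields, and <label for=...> texts."""

    def __init__(self):
        super().__init__()
        self.forms: List[SearchForm] = []
        self.labels: Dict[str, str] = {}          # field id -> label text
        self._label_for: Optional[str] = None
        self._label_text: List[str] = []
        self._select: Optional[FormField] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            method = (a.get("method") or "get").lower()
            self.forms.append(SearchForm(a.get("action") or "", method))
        elif tag == "label":
            self._label_for, self._label_text = a.get("for"), []
        elif tag in ("input", "select", "textarea") and self.forms and a.get("name"):
            kind = (a.get("type") or "text") if tag == "input" else tag
            fld = FormField(a["name"], kind, a.get("id"))
            self.forms[-1].fields.append(fld)
            if tag == "select":
                self._select = fld
        elif tag == "option" and self._select is not None:
            self._select.options.append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "label" and self._label_for:
            self.labels[self._label_for] = "".join(self._label_text).strip()
            self._label_for = None
        elif tag == "select":
            self._select = None

    def handle_data(self, data):
        if self._label_for is not None:
            self._label_text.append(data)

    def result(self) -> List[SearchForm]:
        # Attach label texts to fields by id once the whole page is parsed.
        for form in self.forms:
            for fld in form.fields:
                if fld.field_id in self.labels:
                    fld.label = self.labels[fld.field_id]
        return self.forms

page = """
<form action="/search" method="GET">
  <label for="q">Book title</label> <input id="q" name="title" type="text">
  <label for="y">Year</label>
  <select id="y" name="year"><option value="2008">2008</option><option value="2009">2009</option></select>
  <input type="submit" value="Search">
</form>
"""
parser = FormExtractor()
parser.feed(page)
forms = parser.result()
for f in forms[0].fields:
    print(f.name, f.kind, f.label, f.options)
```

Unnamed controls such as the submit button are skipped because they contribute no query parameter. The resulting field list (name, type, label, allowed values) is the minimum an automated agent needs in order to fill in and submit the form.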
Abstract:
The amphiphilic nature of metal extractants causes the formation of micelles and other microscopic aggregates when they are in contact with water and an organic diluent. These phenomena and their effects on metal extraction were studied using carboxylic acid (Versatic 10) and organophosphorus acid (Cyanex 272) based extractants. Special emphasis was laid on the study of phase behaviour in a pre-neutralisation stage, in which the extractant is transformed into its sodium or ammonium salt form. The pre-neutralised extractants were used to extract nickel and to separate cobalt and nickel. Phase diagrams corresponding to the pre-neutralisation stage of a metal extraction process were determined. The maximal solubilisation of the components in the system water(NH3)/extractant/isooctane takes place when the molar ratio between the ammonium salt form and the free form of the extractant is 0.5 for the carboxylic acid and 1 for the organophosphorus acid extractant. These values correspond to the complex stoichiometries NH4A·HA and NH4A, respectively. When such a solution is contacted with water, a microemulsion is formed. If the aqueous phase also contains metal ions (e.g. Ni²⁺), complexation takes place at the microscopic interface of the micellar aggregates. Experimental evidence was obtained showing that the initial stage of nickel extraction with pre-neutralised Versatic 10 is a fast pseudohomogeneous reaction: about 90% of the metal was extracted in the first 15 s after initial contact. For nickel extraction with pre-neutralised Versatic 10, it was found that the highest metal loading and the lowest residual ammonia and water contents in the organic phase are achieved when the feeds are balanced so that the stoichiometry is 2NH4⁺(org) = Ni²⁺(aq).
In the case of Co/Ni separation using pre-neutralised Cyanex 272, the highest separation is achieved when the Co/extractant molar ratio in the feeds is 1:4 and, at the same time, the degree of neutralisation of the Cyanex 272 is about 50%, which was found to be optimal. The adsorption of extractants on solid surfaces may cause accumulation of fine solid particles at the interface between the aqueous and organic phases in metal extraction processes. Copper extraction processes are known to suffer from this problem. Experiments were carried out using model silica and mica particles. It was found that high copper loading, aromaticity of the diluent, modification agents and the presence of the aqueous phase decrease the adsorption of the hydroxyoxime on silica surfaces.
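The balanced-feed condition 2NH4⁺(org) = Ni²⁺(aq) can be written as a molar flow balance for a continuous contactor. The symbols below are introduced here for illustration only: Q_org and Q_aq are the organic and aqueous volumetric flows, c_{NH4+} is the ammonium concentration in the pre-neutralised organic feed, and c_{Ni2+} is the nickel concentration in the aqueous feed.

```latex
Q_{\mathrm{org}}\, c_{\mathrm{NH_4^+}} = 2\, Q_{\mathrm{aq}}\, c_{\mathrm{Ni^{2+}}}
\quad\Longrightarrow\quad
\frac{Q_{\mathrm{org}}}{Q_{\mathrm{aq}}} = \frac{2\, c_{\mathrm{Ni^{2+}}}}{c_{\mathrm{NH_4^+}}}
% Hypothetical example: c_Ni = 0.05 mol/L, c_NH4 = 0.25 mol/L
% gives Q_org / Q_aq = (2 \times 0.05) / 0.25 = 0.4
```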
Abstract:
Liquid-liquid extraction is a mass transfer process for recovering desired components from liquid streams by contacting them with an immiscible liquid solvent. The literature part of this thesis deals with the theory of liquid-liquid extraction and the main steps of extraction process design. The experimental part investigates the extraction of organic acids from aqueous solution. The aim was to find the optimal solvent for recovering the organic acids from aqueous solutions. The other objective was to test the selected solvent at pilot scale in a packed column. The comparisons covered the effectiveness of structured versus random packing, the effect of dispersed-phase selection, and the effect of the wettability of the packing material. Experiments showed that the selected solvent works well with dilute organic acid solutions. The random packing proved more efficient than the structured packing due to a higher hold-up of the dispersed phase. Dispersing the phase present in larger volume proved more efficient. With the random packing, the material that was wetted by the dispersed phase was more efficient, again due to a higher hold-up of the dispersed phase. According to the literature, the behavior is usually the opposite.
Abstract:
This study presents an automatic, computer-aided analytical method called Comparison Structure Analysis (CSA), which can be applied to different dimensions of music. The aim of CSA is first and foremost practical: to produce dynamic and understandable representations of musical properties by evaluating the prevalence of a chosen musical data structure throughout a musical piece. Such a comparison structure may be a mathematical vector, a set, a matrix, another type of data structure, or even a combination of data structures. CSA depends on an abstract systematic segmentation that allows for a statistical or mathematical survey of the data. Choosing a comparison structure tunes the apparatus to be sensitive to an exclusive set of musical properties. CSA sits somewhere between traditional music analysis and computer-aided music information retrieval (MIR). Theoretically defined musical entities, such as pitch-class sets, set-classes and particular rhythm patterns, are detected in compositions using pattern extraction and pattern comparison algorithms typical of MIR. In principle, comparison structure analysis can be applied to any time-series data and, in the music-analytical context, to polyphonic as well as homophonic music. Tonal trends, set-class similarities, invertible counterpoints, voice-leading similarities, short-term modulations, rhythmic similarities and multiparametric changes in musical texture were studied. Since CSA allows for a highly accurate classification of compositions, its methods may be applicable to symbolic music information retrieval as well. The strength of CSA lies especially in the possibility of comparing observations concerning different musical parameters and of combining CSA with statistical and perhaps other music-analytical methods. The results of CSA depend on the competence of the similarity measure.
New similarity measures for tonal stability, rhythmic similarity and set-class similarity were proposed. The most advanced results were attained by employing automated function generation (comparable to so-called genetic programming) to search for an optimal model for set-class similarity measurement. However, the results of CSA seem to agree strongly regardless of the type of similarity function employed in the analysis.
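A minimal sketch of the CSA idea for pitch material follows, under several assumptions. The piece is segmented into hypothetical per-bar pitch collections, and the comparison structure is the interval-class vector of a reference set. Each segment's prevalence score is the cosine similarity of interval-class vectors. That measure is a simple stand-in, not one of the similarity measures proposed in the thesis. Interval-class vectors do not uniquely identify set-classes: Z-related sets share a vector, and inversionally related sets such as major and minor triads also coincide.

```python
from itertools import combinations
from typing import Iterable, List, Sequence

def interval_class_vector(pcs: Iterable[int]) -> List[int]:
    """Count interval classes 1..6 among all pairs of distinct pitch classes."""
    unique = sorted({p % 12 for p in pcs})
    icv = [0] * 6
    for a, b in combinations(unique, 2):
        d = (b - a) % 12
        ic = min(d, 12 - d)          # fold intervals into classes 1..6
        icv[ic - 1] += 1
    return icv

def icv_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Cosine similarity of two interval-class vectors (1.0 = same profile)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sum(a * a for a in x) ** 0.5
    ny = sum(b * b for b in y) ** 0.5
    return dot / (nx * ny) if nx and ny else 0.0

def prevalence(segments: Sequence[Iterable[int]], reference: Iterable[int]) -> List[float]:
    """Similarity of each segment to the chosen comparison structure."""
    ref = interval_class_vector(reference)
    return [icv_similarity(interval_class_vector(seg), ref) for seg in segments]

# Hypothetical segmentation: one collection of MIDI note numbers per bar.
bars = [[60, 64, 67], [62, 65, 69], [60, 61, 62, 63], [59, 62, 67]]
major_triad = [0, 4, 7]
print([round(s, 2) for s in prevalence(bars, major_triad)])   # [1.0, 1.0, 0.15, 1.0]
```

Plotting such a score per segment gives the kind of time-series representation CSA produces. Swapping `icv_similarity` for another function changes the measure without touching the segmentation or the comparison structure.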