36 results for binary descriptor


Relevance: 10.00%

Abstract:

The purpose of this Master's thesis is to study the suitability of different calculation methods for modeling light sulfur compounds, and how binary interaction parameters fitted to measured vapor-liquid equilibrium data improve the calculation of vapor-liquid equilibria in simulations. The literature part focuses on light sulfur compounds and their physical properties, and also reviews the desulfurization methods currently used in oil refining together with new methods under development. The experimental part examines the suitability of different calculation methods for computing the vapor-liquid equilibria of sulfur compounds and light hydrocarbons. Binary interaction parameters are fitted to the measured vapor-liquid equilibrium data of sulfur compounds and hydrocarbons to refine the calculation methods used. Measurement results for binary mixtures are compared with the simulation results obtained with the different calculation methods, and conclusions are drawn from these comparisons about the suitability of the methods for modeling light hydrocarbons and sulfur compounds. The calculation of sulfur compounds is examined for two processes, a hydrogen sulfide stripper and a debutanizer column: mass balance runs are performed for both, and the resulting analysis data are compared with simulation results. The thesis also examines the calculation of water solubility and the effect that the choice of calculation method has on the modeling of sulfur compounds.
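
As a hedged illustration of how such parameters enter the calculation: cubic equations of state typically combine pure-component attraction parameters through a van der Waals type mixing rule, where the fitted binary interaction parameter $k_{ij}$ corrects the cross term,

\[ a_{\text{mix}} = \sum_i \sum_j x_i x_j \sqrt{a_i a_j}\,(1 - k_{ij}), \]

so a $k_{ij}$ regressed from measured vapor-liquid equilibrium data directly sharpens the predicted phase behavior of each binary pair. The exact mixing rule depends on the equation of state used; this is the common one-parameter form.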

Relevance: 10.00%

Abstract:

The purpose of this Master's thesis is to examine the liquid-liquid equilibria of the isobutene dimerization process. The goal is to determine the liquid-liquid equilibria between the components present in the process. The literature part reviews the theory of liquid-liquid equilibria, with particular attention to measurement methods and to apparatus described in the literature for determining the liquid-liquid equilibria of binary and ternary systems. Methods and apparatus are presented separately for measurements performed at low and at high pressure. Sampling and sample analysis methods are also reviewed. The literature part touches on the determination of vapor-liquid-liquid equilibria as well, but the actual focus of the work is the determination of liquid-liquid equilibria. In the experimental part, binary and ternary liquid-liquid equilibria were determined between components present in the isooctane process. The number of component pairs to be measured was first reduced, and the measurements for the remaining pairs were divided into determinations performed at low and at high pressure. Ternary measurements were required for component pairs of fully miscible liquids in which adding a third component produced two liquid phases; from such measurement data the parameters of liquid-liquid equilibrium models can be determined. In addition to the measurements, the experimental part examined sampling and sample analysis.
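
For reference, the thermodynamic criterion underlying such tie-line measurements is the isoactivity condition: each component $i$ has equal activity in the two coexisting liquid phases $\mathrm{I}$ and $\mathrm{II}$,

\[ x_i^{\mathrm{I}}\,\gamma_i^{\mathrm{I}} = x_i^{\mathrm{II}}\,\gamma_i^{\mathrm{II}}, \]

and the parameters of an activity-coefficient model (e.g. NRTL or UNIQUAC, named here only as typical choices, not as the models used in the thesis) are regressed so that the model reproduces the measured phase compositions.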

Relevance: 10.00%

Abstract:

This thesis deals with distance transforms, a fundamental tool in image processing and computer vision. Two new distance transforms for gray level images are presented, and as a new application, they are applied to gray level image compression.

Both new distance transforms extend the well known distance transform algorithm developed by Rosenfeld, Pfaltz and Lay. With some modification, their algorithm, which calculates a distance transform on binary images with a chosen kernel, has been made to calculate a chessboard-like distance transform with integer values (DTOCS) and a real-valued distance transform (EDTOCS) on gray level images. Both the DTOCS and the EDTOCS require only two passes over the gray level image and are extremely simple to implement. Only two image buffers are needed: the original gray level image and the binary image which defines the region(s) of calculation. No other image buffers are needed even if more than one iteration round is performed. For large neighborhoods and complicated images the two-pass algorithm has to be applied to the image more than once, typically 3 to 10 times. Different types of kernels can be adopted. It is important to notice that no other existing transform calculates the same kind of distance map as the DTOCS. All other gray-weighted distance algorithms, such as GRAYMAT, find the minimum path joining two points by the smallest sum of gray levels, or weight the distance values directly by the gray levels in some manner; the DTOCS does not weight them that way. The DTOCS gives a weighted version of the chessboard distance map, whose weights are not constant but the gray value differences of the original image. The difference between the DTOCS map and other distance transforms for gray level images is shown. The EDTOCS differs from the DTOCS in that it calculates these gray level differences in a different way: it propagates local Euclidean distances inside a kernel. Analytical derivations of some results concerning the DTOCS and the EDTOCS are presented.

Commonly distance transforms are used for feature extraction in pattern recognition and learning; their use in image compression is very rare. This thesis introduces a new application area for distance transforms. Three new image compression algorithms based on the DTOCS and one based on the EDTOCS are presented. Control points, i.e. points considered fundamental for the reconstruction of the image, are selected from the gray level image using the DTOCS and the EDTOCS. The first group of methods selects the maxima of the distance image as new control points, and the second group compares the DTOCS distance to the chessboard distance of the binary image. The effect of applying threshold masks of different sizes along the threshold boundaries is studied. The time complexity of the compression algorithms is analyzed both analytically and experimentally, and it is shown to be independent of the number of control points, i.e. of the compression ratio. A new morphological image decompression scheme, the 8 kernels' method, is also presented. Several decompressed images are shown; the best results are obtained using the Delaunay triangulation, and the obtained image quality equals that of the DCT images with a 4 x 4
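
A minimal sketch of the two-pass propagation described above, with the DTOCS-style step cost of one chessboard step plus the local gray-level difference; the function and its details are an illustration under stated assumptions, not the thesis implementation:

import numpy as np

def dtocs_sketch(gray, calc_region, iterations=3):
    """Two raster passes propagate distance values; for complicated
    images the pass pair is repeated until convergence."""
    h, w = gray.shape
    # pixels outside the calculation region act as sources (distance 0)
    dist = np.where(calc_region, np.inf, 0.0)
    forward = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]
    backward = [(1, 1), (1, 0), (1, -1), (0, 1)]
    for _ in range(iterations):
        for offsets, ys, xs in ((forward, range(h), range(w)),
                                (backward, range(h - 1, -1, -1),
                                 range(w - 1, -1, -1))):
            for y in ys:
                for x in xs:
                    if not calc_region[y, x]:
                        continue
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            # one chessboard step weighted by the gray
                            # value difference, as in the DTOCS
                            step = 1 + abs(int(gray[y, x]) - int(gray[ny, nx]))
                            dist[y, x] = min(dist[y, x], dist[ny, nx] + step)
    return dist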

Relevance: 10.00%

Abstract:

Streaming potential measurements were used for the surface charge characterisation of different filter media types and materials. The equipment was developed further so that measurements could be taken along the surfaces and so that tubular membranes could also be measured. The streaming potential proved to be a very useful tool in the charge analysis of both clean and fouled filter media. Adsorption and fouling, as well as flux, could be studied as functions of time. A module to determine the membrane potential was also constructed. The results collected from the experiments conducted with these devices were used in the study of the theory of the streaming potential as an electrokinetic phenomenon. Several correction factors, derived to take into account the surface conductance and the electrokinetic flow in very narrow capillaries, were tested in practice. The surface materials were studied using FTIR and the results compared with those from the streaming potentials. FTIR analysis was also found to be a useful tool in the characterisation of filters, as well as in the fouling studies: by examining the recorded spectra from different depths in a sample it was possible to determine the adsorption sites. The influence of an external electric field on the cross-flow microfiltration of a binary protein system was investigated using a membrane electrofiltration apparatus. The results showed that a significant improvement could be achieved in membrane filtration by using the measured electrochemical properties to help adjust the process conditions.
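
For context, streaming potential data are conventionally converted into zeta potentials with the Helmholtz-Smoluchowski equation,

\[ \zeta = \frac{\Delta U}{\Delta P}\,\frac{\eta\,\lambda}{\varepsilon_0\,\varepsilon_r}, \]

where $\Delta U/\Delta P$ is the measured streaming potential per unit pressure difference, $\eta$ the viscosity, $\lambda$ the conductivity and $\varepsilon_0 \varepsilon_r$ the permittivity of the electrolyte solution; the correction factors tested above modify this relation to account for surface conductance and very narrow capillaries.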

Relevance: 10.00%

Abstract:

Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without proper analysis, which has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data is commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but there are also other methods that should be considered; the more advanced ones include multi-block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, making the selection of the approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should differ from the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply; in this way the methods and the results can be compared and an approach selected that suits the intended purpose. In this thesis, differences between data analysis methods are compared with data from different fields of industry. In the first two papers, the multi-block method is considered for data originating from the oil and fertilizer industries, and the results are compared to those from PLS and priority PLS. The third paper considers the applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry: the response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.
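
A minimal scikit-learn sketch of the two baseline approaches named above, on synthetic stand-in data (the thesis itself does not prescribe an implementation):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                 # e.g. spectroscopic variables
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# PCA: latent components chosen by variance in X alone, ignoring y
scores = PCA(n_components=3).fit_transform(X)

# PLS: latent components chosen to covary with the response y
pls = PLSRegression(n_components=3).fit(X, y)
y_hat = pls.predict(X).ravel()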

Relevance: 10.00%

Abstract:

Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: the first two applications are binary classification and regression for the practical case of topic labeling and relevance ranking, and the third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding and is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to the processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
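
A hedged sketch of the fifth application type, multi-label classification of free-form reports; the data are invented and scikit-learn stands in for whatever toolkit was actually used:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# toy report texts and diagnosis labels, purely illustrative
reports = ["no acute cardiopulmonary process",
           "right lower lobe pneumonia with small effusion",
           "stable cardiomegaly, no effusion"]
labels = [[], ["pneumonia", "effusion"], ["cardiomegaly"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)            # one binary column per diagnosis code
X = TfidfVectorizer().fit_transform(reports)

clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)
print(mlb.inverse_transform(clf.predict(X)))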

Relevance: 10.00%

Abstract:

Local features are used in many computer vision tasks, including visual object categorization, content-based image retrieval and object recognition, to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of the extracted local features, the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis the performance of local feature detectors and descriptors is evaluated for the object class detection task. Features are extracted from image samples belonging to several object classes, and matching features are then searched using random image pairs of the same class. The goal of this thesis is to find out which detector and descriptor methods are best for such a task in terms of detector repeatability and descriptor matching rate.
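
One way such an evaluation step can look in practice, sketched with OpenCV's ORB (a binary descriptor) and brute-force Hamming matching with a ratio test; the detectors and descriptors actually compared in the thesis may differ, and the file names are placeholders:

import cv2

img1 = cv2.imread("class_sample_a.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
img2 = cv2.imread("class_sample_b.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)   # detect and describe local features
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)

# Lowe-style ratio test keeps only distinctive matches
good = [p[0] for p in matches
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
match_rate = len(good) / max(len(kp1), 1)
print(f"matching rate: {match_rate:.2f}")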

Relevance: 10.00%

Abstract:

Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share: the approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting we recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently.

The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the best established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using this approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study, and demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
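
A small sketch of the leave-pair-out idea for AUC estimation: each positive-negative pair is held out in turn, the model is trained on the rest, and AUC is estimated as the fraction of held-out pairs ranked correctly. This naive version retrains per pair; the model, data and helper names are illustrative, and real implementations rely on the computational shortcuts developed in the thesis:

import numpy as np
from sklearn.linear_model import LogisticRegression

def leave_pair_out_auc(X, y, make_model):
    """y is binary (0/1); returns the leave-pair-out estimate of AUC."""
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    correct, total = 0.0, 0
    for i in pos:
        for j in neg:
            keep = np.ones(len(y), dtype=bool)
            keep[[i, j]] = False              # hold out one pos-neg pair
            model = make_model().fit(X[keep], y[keep])
            si, sj = model.decision_function(X[[i, j]])
            # credit 1 for a correctly ordered pair, 0.5 for a tie
            correct += 1.0 if si > sj else (0.5 if si == sj else 0.0)
            total += 1
    return correct / total

# usage on toy data
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(int)
print(leave_pair_out_auc(X, y, lambda: LogisticRegression(max_iter=1000)))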

Relevance: 10.00%

Abstract:

Book review

Relevance: 10.00%

Abstract:

The three main topics of this work are independent systems and chains of word equations, parametric solutions of word equations on three unknowns, and unique decipherability in the monoid of regular languages. The most important result about independent systems is a new method giving an upper bound for their sizes in the case of three unknowns; the bound depends on the length of the shortest equation. This result has generalizations for decreasing chains and for more than three unknowns, and the method also leads to shorter proofs and generalizations of some old results. Hmelevskii's theorem states that every word equation on three unknowns has a parametric solution. We give a significantly simplified proof of this theorem. As a new result we estimate the lengths of parametric solutions and obtain a bound for the length of the minimal nontrivial solution and for the complexity of deciding whether such a solution exists. The unique decipherability problem asks whether given elements of some monoid form a code, that is, whether they satisfy a nontrivial equation. We give characterizations of when a collection of unary regular languages is a code, and we prove that it is undecidable whether a collection of binary regular languages is a code.
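
For the classical case of finite sets of plain words, unique decipherability is decided by the Sardinas-Patterson algorithm; a compact sketch is given below (the thesis treats the harder setting of regular languages, which this does not cover):

def is_code(words):
    """Sardinas-Patterson test: True iff no word over the set has
    two distinct factorizations into the given words."""
    C = set(words)

    def suffixes(A, B, proper=False):
        # words w such that a = b + w for some a in A, b in B
        return {a[len(b):] for a in A for b in B
                if a.startswith(b) and not (proper and a == b)}

    S = suffixes(C, C, proper=True)   # S1: proper dangling suffixes
    seen = set()
    while S:
        if "" in S:
            return False              # empty suffix: ambiguous factorization
        frozen = frozenset(S)
        if frozen in seen:
            return True               # iteration cycles without reaching ""
        seen.add(frozen)
        S = suffixes(C, S) | suffixes(S, C)
    return True                       # no dangling suffixes at all

print(is_code(["0", "01", "11"]))     # True
print(is_code(["a", "ab", "ba"]))     # False: "aba" = a|ba = ab|a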

Relevance: 10.00%

Abstract:

Binary probes are oligonucleotide probe pairs that hybridize adjacently to a complementary target nucleic acid. To detect this hybridization, the two probes can be modified with, for example, fluorescent molecules, chemically reactive groups or nucleic acid enzymes. The benefit of this kind of binary probe based approach is that hybridization elicits a detectable signal which is distinguishable from background noise even though unbound probes are not removed by washing before measurement. In addition, the requirement of two simultaneous binding events increases specificity. Similarly to binary oligonucleotide probes, certain enzymes and fluorescent proteins can also be divided into two parts and used in separation-free assays; split enzyme and fluorescent protein reporters have practical applications, among others, as tools to investigate protein-protein interactions within living cells. In this study, a novel label technology, switchable lanthanide luminescence, was introduced and used successfully in model assays for nucleic acid and protein detection. This label technology is based on a luminescent lanthanide chelate divided into two inherently non-luminescent moieties, an ion carrier chelate and a light-harvesting antenna ligand. These form a highly luminescent complex when brought into close proximity; i.e., the label moieties switch from a dark state to a luminescent state. This kind of mixed lanthanide complex has the same beneficial photophysical properties as the more typical lanthanide chelates and cryptates: sharp emission peaks, a long emission lifetime enabling time-resolved measurement, and a large Stokes' shift, which together minimize the background signal. Furthermore, the switchable lanthanide luminescence technique enables a homogeneous assay set-up. Here, switchable lanthanide luminescence label technology was first applied to sensitive, homogeneous, single-target nucleic acid and protein assays with picomolar detection limits and high signal-to-background ratios. Thereafter, a homogeneous four-plex nucleic acid array-based assay was developed. Finally, the label technology was shown to be effective in discriminating single nucleotide mismatched targets from fully matched targets, and the luminescent complex formation was analyzed more thoroughly. In conclusion, this study demonstrates that switchable lanthanide luminescence-based label technology can be used in various homogeneous bioanalytical assays.

Relevance: 10.00%

Abstract:

In this thesis we examine four well-known and traditional concepts of combinatorics on words. However, the contexts in which these topics are treated are not the traditional ones. More precisely, the question of avoidability is asked, for example, in terms of k-abelian squares. Two words are said to be k-abelian equivalent if they have the same number of occurrences of each factor up to length k; consequently, k-abelian equivalence can be seen as a sharpening of abelian equivalence. This fairly new concept is discussed more broadly than the other topics of this thesis. The second main subject concerns the defect property. The defect theorem is a well-known result for words; we analyze the property, for example, among sets of 2-dimensional words, i.e., polyominoes composed of labelled unit squares. From the defect effect we move to equations. We use a special way of defining a product operation for words and then solve a few basic equations over the constructed partial semigroup. We also consider the satisfiability question and the compactness property with respect to equations of this kind. The final topic of the thesis deals with palindromes. Some finite words, including all binary words, are uniquely determined up to word isomorphism by the positions and lengths of some of their palindromic factors. The famous Thue-Morse word has the property that for each positive integer n, there exists a factor which cannot be generated by fewer than n palindromes. We prove that, in general, every non-ultimately periodic word contains a factor which cannot be generated by fewer than 3 palindromes, and we obtain a classification of those binary words each of whose factors is generated by at most 3 palindromes. Surprisingly, these words are related to another much studied set of words, the Sturmian words.
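
A direct transcription of the k-abelian definition quoted above into code, as a naive sketch (more efficient tests exist but are beside the point here):

from collections import Counter

def factor_counts(w, k):
    # multiset of all factors of w of length 1..k
    return Counter(w[i:i + m]
                   for m in range(1, k + 1)
                   for i in range(len(w) - m + 1))

def k_abelian_equivalent(u, v, k):
    return factor_counts(u, k) == factor_counts(v, k)

print(k_abelian_equivalent("ab", "ba", 1))   # True: abelian equivalent
print(k_abelian_equivalent("ab", "ba", 2))   # False: factor "ab" vs "ba"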

Relevance: 10.00%

Abstract:

The usage of digital content, such as video clips and images, has increased dramatically during the last decade, and local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods varies significantly in object class matching, 2) local features can be applied in image alignment with results superior to the state of the art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method shows a promising new research direction. In conclusion, this thesis presents local features as a powerful tool in many applications, and immediate future work should concentrate on improving the quality of the local features.
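
As one concrete instance of point 2 above, local features support image alignment via robust homography estimation; a hedged OpenCV sketch with placeholder file names, not the thesis pipeline:

import cv2
import numpy as np

img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)  # hypothetical inputs
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)[:100]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC discards mismatched features while estimating the homography
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
aligned = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))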

Relevance: 10.00%

Abstract:

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative to simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. They are characterized by annotated trigger words, directed and typed arguments, and the ability to nest other events. For example, the sentence "Protein A causes protein B to bind protein C" can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information in natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction.

In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations, and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. We show that this event extraction system has good performance, reaching first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, and has shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks.

The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, we describe the application of the event extraction method to PubMed-scale text mining, demonstrating that the developed approach not only performs well but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications: the first introduces the analysis of dependency parses that leads to the development of TEES; four publications cover the entries in the three BioNLP Shared Tasks and the DDIExtraction 2011 task; and the sixth demonstrates the application of the system to PubMed-scale text mining.
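
A toy sketch of the kind of unified graph representation described above, for the example sentence; the class and argument names are illustrative, not the actual TEES data model:

from dataclasses import dataclass, field

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # id -> type label
    edges: list = field(default_factory=list)   # (head, dependent, argument type)

# "Protein A causes protein B to bind protein C" -> CAUSE(A, BIND(B, C))
g = Graph()
g.nodes.update({"A": "Protein", "B": "Protein", "C": "Protein",
                "E1": "Binding",        # trigger word "bind"
                "E2": "Cause"})         # trigger word "causes"
g.edges += [("E1", "B", "Theme"), ("E1", "C", "Theme2"),
            ("E2", "A", "Cause"), ("E2", "E1", "Theme")]

# extraction then decomposes into classification tasks: 1) which tokens
# are triggers (node labels), 2) which node pairs carry which argument
# edges, 3) how the predicted edges group into nested event structures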