32 results for Binary
Abstract:
The purpose of this Master's thesis is to examine the liquid–liquid equilibria of the isobutene dimerization process. The objective is to determine the liquid–liquid equilibria between the components present in the process. The literature part of the thesis reviews the theory of liquid–liquid equilibria. Particular attention is paid to measurement methods and to the apparatus reported in the literature for determining the liquid–liquid equilibria of binary and ternary systems. The methods and apparatus are presented separately for measurements carried out at low and at high pressure. In addition, sampling and sample analysis methods are reviewed. The literature part also touches on the determination of gas–liquid–liquid equilibria, but the actual subject of the work is the determination of liquid–liquid equilibria. In the experimental part, binary and ternary liquid–liquid equilibria between components present in the isooctane process were determined. The number of component pairs to be measured was narrowed down, and the measurements for the remaining pairs were divided into determinations carried out at low and at high pressure. Ternary measurements were relevant for component pairs where adding a third component to a system of two completely miscible liquids produced two liquid phases. From such measurement data, parameters of liquid–liquid equilibrium models can be determined. In addition to the measurements, the experimental part examined sampling and the analysis of the samples.
Abstract:
This thesis deals with distance transforms, which are a fundamental issue in image processing and computer vision. Two new distance transforms for gray level images are presented, and as a new application, they are applied to gray level image compression. Both new distance transforms extend the well-known distance transform algorithm developed by Rosenfeld, Pfaltz and Lay. With some modifications, their algorithm, which calculates a distance transform on binary images with a chosen kernel, has been made to calculate a chessboard-like distance transform with integer numbers (DTOCS) and a real-value distance transform (EDTOCS) on gray level images. Both distance transforms, the DTOCS and EDTOCS, require only two passes over the gray level image and are extremely simple to implement. Only two image buffers are needed: the original gray level image and the binary image which defines the region(s) of calculation. No other image buffers are needed even if more than one iteration round is performed. For large neighborhoods and complicated images the two-pass distance algorithm has to be applied to the image more than once, typically 3 to 10 times. Different types of kernels can be adopted. It is important to notice that no other existing transform calculates the same kind of distance map as the DTOCS. All other gray-weighted distance function algorithms (GRAYMAT etc.) find the minimum path joining two points by the smallest sum of gray levels, or weight the distance values directly by the gray levels in some manner. The DTOCS does not weight them that way: it gives a weighted version of the chessboard distance map, where the weights are not constant but are the gray value differences of the original image. The difference between the DTOCS map and other distance transforms for gray level images is shown. The difference between the DTOCS and EDTOCS is that the EDTOCS calculates these gray level differences in a different way.
It propagates local Euclidean distances inside a kernel. Analytical derivations of some results concerning the DTOCS and the EDTOCS are presented. Distance transforms are commonly used for feature extraction in pattern recognition and learning; their use in image compression is very rare. This thesis introduces a new application area for distance transforms. Three new image compression algorithms based on the DTOCS and one based on the EDTOCS are presented. Control points, i.e. points that are considered fundamental for the reconstruction of the image, are selected from the gray level image using the DTOCS and the EDTOCS. The first group of methods selects the maxima of the distance image as new control points, and the second group compares the DTOCS distance to the binary-image chessboard distance. The effect of applying threshold masks of different sizes along the threshold boundaries is studied. The time complexity of the compression algorithms is analyzed both analytically and experimentally. It is shown that the time complexity of the algorithms is independent of the number of control points, i.e. of the compression ratio. A new morphological image decompression scheme, the 8 kernels' method, is also presented. Several decompressed images are shown. The best results are obtained using the Delaunay triangulation. The obtained image quality equals that of the DCT images with a 4 x 4
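The two-pass idea described above can be sketched as follows. This is a minimal illustrative reconstruction, not the thesis' implementation: it assumes the integer DTOCS local step cost between 8-connected neighbors is 1 plus the absolute gray-level difference, and it uses a boolean seed mask in place of the thesis' binary calculation-region image.

```python
def dtocs(gray, seed):
    """One forward + one backward raster scan of a DTOCS-like gray-weighted
    chessboard distance transform (sketch under assumed conventions).
    gray: 2-D list of gray values; seed: 2-D list of bools, True where d = 0.
    For complicated images the two passes may have to be repeated until
    the distance map stops changing."""
    h, w = len(gray), len(gray[0])
    INF = float("inf")
    d = [[0 if seed[y][x] else INF for x in range(w)] for y in range(h)]

    def relax(y, x, offsets):
        # take the cheapest path through an already-visited neighbor
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                cost = d[ny][nx] + 1 + abs(gray[y][x] - gray[ny][nx])
                if cost < d[y][x]:
                    d[y][x] = cost

    fwd = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]   # neighbors above / to the left
    bwd = [(1, 1), (1, 0), (1, -1), (0, 1)]       # neighbors below / to the right
    for y in range(h):                            # forward pass
        for x in range(w):
            relax(y, x, fwd)
    for y in range(h - 1, -1, -1):                # backward pass
        for x in range(w - 1, -1, -1):
            relax(y, x, bwd)
    return d
```

On a constant image this reduces to the ordinary chessboard distance, while gray-level steps add to the path cost, matching the "weighted chessboard" description above.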
Abstract:
Streaming potential measurements were used for the surface charge characterisation of different types and materials of filter media. The equipment was developed further so that measurements could be taken along the surfaces, and so that tubular membranes could also be measured. The streaming potential proved to be a very useful tool in the charge analysis of both clean and fouled filter media. Adsorption and fouling could be studied, as could flux, as functions of time. A module to determine the membrane potential was also constructed. The results collected from the experiments conducted with these devices were used in the study of the theory of streaming potential as an electrokinetic phenomenon. Several correction factors, derived to take into account the surface conductance and the electrokinetic flow in very narrow capillaries, were tested in practice. The surface materials were studied using FTIR and the results compared with those from the streaming potentials. FTIR analysis was also found to be a useful tool in the characterisation of filters, as well as in the fouling studies. By examining the recorded spectra from different depths in a sample it was possible to determine the adsorption sites. The influence of an external electric field on the cross-flow microfiltration of a binary protein system was investigated using a membrane electrofiltration apparatus. The results showed that a significant improvement could be achieved in membrane filtration by using the measured electrochemical properties to help adjust the process conditions.
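Streaming potential data of this kind are conventionally converted to a zeta potential through the classical Helmholtz–Smoluchowski relation. The sketch below uses the standard textbook form of that relation, without the surface-conductance corrections the abstract mentions; the symbols and the example values are generic illustrations, not figures from the thesis.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def zeta_potential(dU_dP, eta, kappa, eps_r):
    """Zeta potential (V) via Helmholtz-Smoluchowski:
        zeta = (dU/dP) * eta * kappa / (eps_r * eps0)
    dU_dP: streaming-potential slope (V/Pa), eta: viscosity (Pa*s),
    kappa: bulk electrolyte conductivity (S/m), eps_r: relative permittivity.
    Ignores surface conductance, which matters in very narrow capillaries."""
    return dU_dP * eta * kappa / (eps_r * EPS0)

# illustrative values: dilute aqueous electrolyte at ~25 C
zeta = zeta_potential(-1e-8, 0.89e-3, 0.01, 78.5)  # about -0.13 mV
```

In practice the slope dU/dP is taken from a linear fit of measured streaming potential versus applied pressure difference.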
Abstract:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim in this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
Abstract:
Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how these techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
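The connection between the pairwise view and AUC mentioned in the abstract is direct: in bipartite ranking, AUC equals the fraction of (positive, negative) pairs the scoring function orders correctly. A small sketch of that pairwise computation (the function name and tie handling are my own choices, not the thesis'):

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as 1/2. Assumes labels are 0/1 and both classes occur."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))
```

Minimizing the expected probability of misordering a random pair, as in the pairwise approach described above, is then exactly maximizing this quantity.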
Abstract:
Book review
Abstract:
The three main topics of this work are independent systems and chains of word equations, parametric solutions of word equations on three unknowns, and unique decipherability in the monoid of regular languages. The most important result about independent systems is a new method giving an upper bound for their sizes in the case of three unknowns. The bound depends on the length of the shortest equation. This result has generalizations for decreasing chains and for more than three unknowns. The method also leads to shorter proofs and generalizations of some old results. Hmelevskii's theorem states that every word equation on three unknowns has a parametric solution. We give a significantly simplified proof of this theorem. As a new result we estimate the lengths of parametric solutions and obtain a bound on the length of the minimal nontrivial solution and on the complexity of deciding whether such a solution exists. The unique decipherability problem asks whether given elements of some monoid form a code, that is, whether they satisfy a nontrivial equation. We give characterizations for when a collection of unary regular languages is a code. We also prove that it is undecidable whether a collection of binary regular languages is a code.
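For finite sets of ordinary words, unique decipherability can be decided with the classical Sardinas–Patterson procedure; the regular-language setting studied in the thesis generalizes this question (and turns out to be undecidable for binary regular languages). A sketch of the word case:

```python
def is_code(words):
    """Sardinas-Patterson test: True iff no word over `words` has two
    distinct factorizations, i.e. the set is uniquely decipherable.
    Classical finite-word case only."""
    C = set(words)

    def quotient(A, B):
        # A^{-1}B = {w : a + w in B for some a in A}
        return {b[len(a):] for a in A for b in B if b.startswith(a)}

    U = quotient(C, C) - {""}        # dangling suffixes from distinct codewords
    seen = set()
    while U:
        if "" in U:
            return False             # empty word reached: double factorization
        fu = frozenset(U)
        if fu in seen:
            return True              # set of dangling suffixes cycles without ""
        seen.add(fu)
        U = quotient(U, C) | quotient(C, U)
    return True
```

For example, {0, 01, 10} is not a code because 0·10 and 01·0 both spell 010, whereas the prefix-free set {0, 10, 110} is one.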
Abstract:
Binary probes are oligonucleotide probe pairs that hybridize adjacently to a complementary target nucleic acid. In order to detect this hybridization, the two probes can be modified with, for example, fluorescent molecules, chemically reactive groups or nucleic acid enzymes. The benefit of this kind of binary probe based approach is that the hybridization elicits a detectable signal which is distinguishable from background noise even though unbound probes are not removed by washing before measurement. In addition, the requirement of two simultaneous binding events increases specificity. Similarly to binary oligonucleotide probes, certain enzymes and fluorescent proteins can also be divided into two parts and used in separation-free assays. Split enzyme and fluorescent protein reporters have practical applications, among others, as tools to investigate protein-protein interactions within living cells. In this study, a novel label technology, switchable lanthanide luminescence, was introduced and used successfully in model assays for nucleic acid and protein detection. This label technology is based on a luminescent lanthanide chelate divided into two inherently non-luminescent moieties, an ion carrier chelate and a light-harvesting antenna ligand. These form a highly luminescent complex when brought into close proximity; i.e., the label moieties switch from a dark state to a luminescent state. This kind of mixed lanthanide complex has the same beneficial photophysical properties as the more typical lanthanide chelates and cryptates: sharp emission peaks, a long emission lifetime enabling time-resolved measurement, and a large Stokes' shift, which minimizes the background signal. Furthermore, the switchable lanthanide luminescence technique enables a homogeneous assay set-up.
Here, switchable lanthanide luminescence label technology was first applied to sensitive, homogeneous, single-target nucleic acid and protein assays with picomolar detection limits and high signal-to-background ratios. Thereafter, a homogeneous four-plex nucleic acid array-based assay was developed. Finally, the label technology was shown to be effective in discriminating single-nucleotide mismatched targets from fully matched targets, and the luminescent complex formation was analyzed more thoroughly. In conclusion, this study demonstrates that the switchable lanthanide luminescence-based label technology can be used in various homogeneous bioanalytical assays.
Abstract:
In this thesis we examine four well-known and traditional concepts of combinatorics on words. However, the contexts in which these topics are treated are not the traditional ones. More precisely, the question of avoidability is asked, for example, in terms of k-abelian squares. Two words are said to be k-abelian equivalent if they have the same number of occurrences of each factor up to length k. Consequently, k-abelian equivalence can be seen as a sharpening of abelian equivalence. This fairly new concept is discussed more broadly than the other topics of this thesis. The second main subject concerns the defect property. The defect theorem is a well-known result for words. We analyze the property, for example, among sets of 2-dimensional words, i.e., polyominoes composed of labelled unit squares. From the defect effect we move to equations. We use a special way of defining a product operation for words and then solve a few basic equations over the constructed partial semigroup. We also consider the satisfiability question and the compactness property with respect to equations of this kind. The final topic of the thesis deals with palindromes. Some finite words, including all binary words, are uniquely determined up to word isomorphism by the positions and lengths of some of their palindromic factors. The famous Thue-Morse word has the property that for each positive integer n, there exists a factor which cannot be generated by fewer than n palindromes. We prove that, in general, every non-ultimately periodic word contains a factor which cannot be generated by fewer than 3 palindromes, and we obtain a classification of those binary words each of whose factors is generated by at most 3 palindromes. Surprisingly, these words are related to another much-studied set of words, Sturmian words.
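The definition of k-abelian equivalence quoted above is easy to make concrete; the helper below is my own small sketch using exactly the factor-counting formulation stated in the abstract:

```python
from collections import Counter

def k_abelian_equivalent(u, v, k):
    """True iff words u and v contain the same number of occurrences of
    every factor of length at most k (k-abelian equivalence as defined
    in the abstract; k = 1 is ordinary abelian equivalence)."""
    def factor_counts(w, m):
        return Counter(w[i:i + m] for i in range(len(w) - m + 1))
    return all(factor_counts(u, m) == factor_counts(v, m)
               for m in range(1, k + 1))
```

For instance, "aabab" and "abaab" are 2-abelian equivalent (same letter and bigram counts) but not 3-abelian equivalent, illustrating how the hierarchy sharpens as k grows.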
Abstract:
Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative to simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence "Protein A causes protein B to bind protein C" can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Once converted to such formal representations, the information in natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations, and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures.
We show that this event extraction system has good performance, achieving first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth demonstrates the application of the system to PubMed-scale text mining.
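The nested event structure from the example above, CAUSE(A, BIND(B, C)), can be sketched as a tiny data structure. This is only an illustration of the annotation idea, not the TEES graph format; the class and role names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A typed event whose arguments are (role, value) pairs; a value is
    either a plain entity name or a nested Event."""
    type: str
    args: list

def nested_events(e):
    """Yield this event and every event nested in its arguments."""
    yield e
    for _, value in e.args:
        if isinstance(value, Event):
            yield from nested_events(value)

# "Protein A causes protein B to bind protein C" -> CAUSE(A, BIND(B, C))
bind = Event("BIND", [("Theme", "Protein A' target B"), ("Theme2", "Protein C")])
bind = Event("BIND", [("Theme", "Protein B"), ("Theme2", "Protein C")])
cause = Event("CAUSE", [("Cause", "Protein A"), ("Theme", bind)])
```

Decomposing extraction into classification tasks then amounts to predicting, for each candidate trigger and argument edge of such a structure, its type independently.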
Abstract:
Preparative liquid chromatography is one of the most selective separation techniques in the fine chemical, pharmaceutical, and food industries. Several process concepts have been developed and applied for improving the performance of classical batch chromatography. The most powerful approaches include various single-column recycling schemes, counter-current and cross-current multi-column setups, and hybrid processes where chromatography is coupled with other unit operations such as crystallization, a chemical reactor, and/or a solvent removal unit. To fully utilize the potential of stand-alone and integrated chromatographic processes, efficient methods for selecting the best process alternative as well as optimal operating conditions are needed. In this thesis, a unified method is developed for the analysis and design of the following single-column fixed bed processes and corresponding cross-current schemes: (1) batch chromatography, (2) batch chromatography with an integrated solvent removal unit, (3) mixed-recycle steady state recycling chromatography (SSR), and (4) mixed-recycle steady state recycling chromatography with solvent removal from the fresh feed, the recycle fraction, or the column feed (SSR–SR). The method is based on the equilibrium theory of chromatography with an assumption of negligible mass transfer resistance and axial dispersion. The design criteria are given in a general, dimensionless form that is formally analogous to that applied widely in the so-called triangle theory of counter-current multi-column chromatography. Analytical design equations are derived for binary systems that follow the competitive Langmuir adsorption isotherm model. For this purpose, the existing analytic solution of the ideal model of chromatography for binary Langmuir mixtures is completed by deriving the missing explicit equations for the height and location of the pure first-component shock in the case of a small feed pulse.
It is thus shown that the entire chromatographic cycle at the column outlet can be expressed in closed form. The developed design method allows predicting the feasible range of operating parameters that lead to desired product purities. It can be applied for the calculation of first estimates of optimal operating conditions, the analysis of process robustness, and the early-stage evaluation of different process alternatives. The design method is used to analyse the possibility of enhancing the performance of conventional SSR chromatography by integrating it with a solvent removal unit. It is shown that the amount of fresh feed processed during a chromatographic cycle, and thus the productivity of the SSR process, can be improved by removing solvent. The maximum solvent removal capacity depends on the location of the solvent removal unit and on the physical solvent removal constraints, such as solubility, viscosity, and/or osmotic pressure limits. Usually, the most flexible option is to remove solvent from the column feed. The applicability of the equilibrium design to real, non-ideal separation problems is evaluated by means of numerical simulations. Due to the assumption of infinite column efficiency, the developed design method is most applicable to high-performance systems where thermodynamic effects are predominant, while significant deviations are observed under highly non-ideal conditions. The findings based on the equilibrium theory are applied to develop a shortcut approach for the design of chromatographic separation processes under strongly non-ideal conditions with significant dispersive effects. The method is based on a simple procedure applied to a single conventional chromatogram. The applicability of the approach to the design of batch and counter-current simulated moving bed processes is evaluated with case studies. It is shown that the shortcut approach performs better the higher the column efficiency and the lower the purity constraints are.
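The competitive Langmuir isotherm underlying the analytical design equations has the standard multicomponent form q_i = a_i c_i / (1 + sum_j b_j c_j). The sketch below uses generic parameter names (a_i for the Henry constants, b_i for the equilibrium parameters), which are common conventions rather than the thesis' own notation:

```python
def competitive_langmuir(c, a, b):
    """Solid-phase loadings q_i = a_i*c_i / (1 + sum_j b_j*c_j) for a
    binary (or general multicomponent) competitive Langmuir isotherm.
    c: fluid-phase concentrations; a: Henry constants; b: equilibrium
    parameters, all as equal-length sequences."""
    denom = 1.0 + sum(bj * cj for bj, cj in zip(b, c))
    return [ai * ci / denom for ai, ci in zip(a, c)]

# e.g. an equimolar binary feed with a = (2, 3), b = (0.5, 0.5)
q = competitive_langmuir([1.0, 1.0], [2.0, 3.0], [0.5, 0.5])  # [1.0, 1.5]
```

The shared denominator is what couples the two components: each solute suppresses the loading of the other, which is the source of the displacement effects the equilibrium-theory design exploits.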
Abstract:
Water geochemistry is a very important tool for studying the water quality in a given area. Geology and climate are the major natural factors controlling the chemistry of most natural waters, while anthropogenic impacts are secondary sources of contamination. This study presents the first integrative approach to the geochemistry and water quality of surface waters and Lake Qarun in the Fayoum catchment, Egypt. Moreover, geochemical modeling of Lake Qarun is presented here for the first time. The Nile River is the main source of water to the Fayoum watershed. To investigate the quality and geochemistry of this water, water samples from irrigation canals, drains and Lake Qarun were collected during the period 2010‒2013 from the whole Fayoum drainage basin to address the major processes and factors governing the evolution of water chemistry in the investigated area. About 34 physicochemical quality parameters, including major ions, oxygen isotopes, trace elements, nutrients and microbiological parameters, were investigated in the water samples. Multivariable statistical analysis was used to interpret the interrelationships between the different studied parameters. Geochemical modeling of Lake Qarun was carried out using Hardie and Eugster's evolutionary model and a model simulated with the PHREEQC software. The crystallization sequence during evaporation of Lake Qarun brine was also studied using a Jänecke phase diagram for the system Na‒K‒Mg‒Cl‒SO4‒H2O. The results show that the chemistry of surface water in the Fayoum catchment evolves from Ca‒Mg‒HCO3 at the headwaters to Ca‒Mg‒Cl‒SO4 and eventually to Na‒Cl downstream and at Lake Qarun. The main processes behind the high levels of Na, SO4 and Cl in downstream waters and in Lake Qarun are dissolution of evaporites from Fayoum soils followed by evapoconcentration. This was confirmed by binary plots between the different ions, the Piper plot, the Gibbs plot and δ18O results.
The modeled data proved that Lake Qarun brine evolves from drainage waters via an evaporation‒crystallization process. Through the precipitation of calcite and gypsum, the solution should reach the final composition Na‒Mg‒SO4‒Cl. As simulated by PHREEQC, further evaporation of lake brine can drive halite to precipitate in the final stages of evaporation. Significantly, the crystallization sequence during evaporation of the lake brine in the concentration ponds of the Egyptian Salts and Minerals Company (EMISAL) reflected the findings from both Hardie and Eugster's evolutionary model and the PHREEQC-simulated model. After crystallization of halite in the EMISAL ponds, the crystallization sequence during evaporation of the residual brine (bittern) was investigated using a Jänecke phase diagram at 35 °C. This diagram was more useful than PHREEQC for predicting the evaporation path, especially in the case of this highly concentrated brine (bittern). The predicted crystallization path using the Jänecke phase diagram at 35 °C showed that halite, hexahydrite, kainite and kieserite should appear during bittern evaporation, yet the minerals that actually crystallized were only halite and hexahydrite. The absence of kainite was due to its metastability, while the absence of kieserite was due to unfavourable relative humidity. The presence of a specific MgSO4·nH2O phase in ancient evaporite deposits can be used as a paleoclimatic indicator. Evaluation of surface water quality for agricultural purposes shows that some irrigation waters and all drainage waters have high salinities and therefore cannot be used for irrigation. Waters from irrigation canals used as a drinking water supply show elevated concentrations of Al and suffer from high levels of total coliform (TC), fecal coliform (FC) and fecal streptococcus (FS). These waters cannot be used for drinking or agricultural purposes without treatment, because of their high health risk.
Therefore, it is crucial that environmental protection agencies and the media increase public awareness of this issue, especially in rural areas.
Abstract:
The research in this Master's thesis project concerns Big Data transfer over parallel data links, and my main objective was to assist the Saint-Petersburg National Research University ITMO research team in accomplishing this project and in applying Green IT methods to the data transfer system. The goal of the team is to transfer Big Data over parallel data links using the SDN OpenFlow approach. My task as a team member was to compare existing data transfer applications in order to determine which achieves the highest transfer speed under which circumstances, and to explain the reasons. In the context of this thesis work, five different utilities were compared: Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts were developed to create random binary data (incompressible, so that the comparison between utilities is fair), execute the utilities with specified parameters, create log files with results and system parameters, and plot graphs comparing the results. Transferring such enormous volumes of data can take a long time, and hence the need arises to reduce energy consumption and make the transfers greener. As part of the Green IT approach, our team used the OpenStack cloud computing infrastructure: it is more efficient to allocate a specific amount of hardware resources for testing different scenarios than to use all the resources of our testbed. Testing our implementation on the OpenStack infrastructure ensured that the virtual channel carried no other traffic, so the highest possible throughput could be achieved. From the final results we can identify which utilities provide faster data transfer in different scenarios with specific TCP parameters, and these can then be used on real network data links.
Abstract:
A method to synthesize ethyl β-ᴅ-glucopyranoside (BEG) was sought, and the feasibility of different ion exchange resins for purifying the product from a synthetic binary solution of BEG and glucose was examined. The target was to produce at least 50 grams of 99 % pure BEG with a scaled-up process. Another target was to transfer the batch process into a steady-state recycling chromatography (SSR) process. BEG was synthesized enzymatically by reverse hydrolysis, using β-glucosidase as a catalyst; 65 % of the glucose reacted with ethanol to form BEG during the synthesis. Different ion exchanger based resins were examined for separating BEG from glucose. Based on batch chromatography experiments, the best adsorbent was chosen between styrene-based strong acid cation exchange resins (SAC) and acrylic-based weak acid cation exchange resins (WAC); the CA10GC WAC resin in Na+ form was selected for the further separation studies. To produce greater amounts of the product, the batch process was scaled up. The adsorption isotherms of the components were linear. The target purity could be reached already in batch mode without recycling when the flow rate and injection size were small enough, and 99 % pure product was produced with the scaled-up batch process. The batch process was then transferred to an SSR process using data from design pulse chromatograms and Matlab simulations, and the optimal operating conditions for the system were determined. Comparing the batch and SSR separation results, the SSR process yielded 98 % pure products with 40 % higher productivity and 40 % lower eluent consumption than a batch process producing equally pure products.