851 results for Topic discovery
Abstract:
This paper addresses the problem of discovering business process models from event logs. Existing approaches to this problem strike various tradeoffs between accuracy and understandability of the discovered models. With respect to the second criterion, empirical studies have shown that block-structured process models are generally more understandable and less error-prone than unstructured ones. Accordingly, several automated process discovery methods generate block-structured models by construction. However, these approaches intertwine the concern of producing accurate models with that of ensuring their structuredness, sometimes sacrificing the former to ensure the latter. In this paper we propose an alternative approach that separates these two concerns. Instead of directly discovering a structured process model, we first apply a well-known heuristic technique that discovers more accurate but sometimes unstructured (and even unsound) process models, and then transform the resulting model into a structured one. An experimental evaluation shows that our “discover and structure” approach outperforms traditional “discover structured” approaches with respect to a range of accuracy and complexity measures.
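The abstract does not name its "well-known heuristic technique". As an illustration only, assuming a Heuristics Miner-style discoverer, the first step starts from directly-follows counts |a > b| in the event log and turns them into a dependency value a ⇒ b = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1). The sketch below computes just that measure; the function names and the example log are hypothetical, and neither graph construction nor the structuring step is shown.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

def directly_follows(log: List[List[str]]) -> Counter:
    """Count |a > b|: how often activity a is directly followed by b in a trace."""
    counts: Counter = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def dependency_measure(log: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Heuristics-Miner-style dependency value a => b for every ordered pair (assumed technique)."""
    df = directly_follows(log)
    activities = sorted({act for trace in log for act in trace})
    dep: Dict[Tuple[str, str], float] = {}
    for a, b in product(activities, repeat=2):
        ab, ba = df[(a, b)], df[(b, a)]
        if a == b:
            # Self-loop variant: |a > a| / (|a > a| + 1)
            dep[(a, b)] = ab / (ab + 1)
        else:
            # Near +1 suggests a causal arc a -> b; near 0 or negative suggests none.
            dep[(a, b)] = (ab - ba) / (ab + ba + 1)
    return dep

# Hypothetical event log: each inner list is one trace (case) of activity labels.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "b", "c", "d"]]
dep = dependency_measure(log)
```

Here b and c occur in both orders, so b ⇒ c scores only 0.25 while a ⇒ b scores 2/3. A tunable dependency threshold then decides which arcs enter the model, and arcs chosen this way carry no guarantee of block structure, which is what the separate structuring step addresses.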
Abstract:
This thesis describes current and past n-in-one methods and presents three early experimental studies on the application of n-in-one in drug discovery, using mass spectrometry with a triple quadrupole instrument. The n-in-one strategy pools and mixes samples prior to measurement or analysis. This allows the most promising compounds to be identified rapidly and then analysed further. Nowadays, the properties of drug candidates are characterised earlier and in parallel with pharmacological efficacy. The studies presented here use in vitro methods, such as Caco-2 cells and immobilized artificial membrane chromatography, for drug absorption and lipophilicity measurements. The high sensitivity and selectivity of liquid chromatography-mass spectrometry are especially important for new analytical methods using n-in-one. In the first study, the fragmentation patterns of ten nitrophenoxy benzoate compounds forming a homologous series were characterised, and the presence of the compounds in a combinatorial library was determined. The influence of one or two nitro substituents and of alkyl chain length (methyl to pentyl) on collision-induced fragmentation was studied, and interesting structure-fragmentation relationships were detected. Compounds with two nitro groups fragmented more than those with one, whereas molecules with a longer alkyl chain fragmented less. The most abundant product ions were nitrophenoxy ions, which were also used in precursor ion screening of the combinatorial library. In the second study, the immobilized artificial membrane chromatographic method was transferred from ultraviolet detection to mass spectrometric detection, and a new method was developed. Mass spectra were scanned, and the chromatographic retention of the compounds was analysed using extracted ion chromatograms. The results correlated well when the detectors and buffers were changed and n-in-one was included in the method.
Finally, the results demonstrated that mass spectrometric detection with gradient elution can provide a rapid and convenient n-in-one method for ranking the lipophilic properties of several structurally diverse compounds simultaneously. In the final study, a new method was developed for Caco-2 samples. Compounds were separated by liquid chromatography and quantified by selected reaction monitoring using mass spectrometry. This method was applied to Caco-2 samples in which the absorption of ten chemically and physiologically different compounds was screened using both single-compound and n-in-one approaches. Together, these three studies used mass spectrometry for compound identification, method transfer and quantitation in mixture analysis. A different mass spectrometric scanning mode of the triple quadrupole instrument was used in each method. Early drug discovery with n-in-one is an area where mass spectrometric analysis, and an understanding of its possibilities and proper use, is especially important.
Abstract:
Market microstructure is “the study of the trading mechanisms used for financial securities” (Hasbrouck (2007)). It seeks to understand the sources of value and reasons for trade, in a setting with different types of traders, and different private and public information sets. The actual mechanisms of trade are a continually changing object of study. These include continuous markets, auctions, limit order books, dealer markets, or combinations of these operating as a hybrid market. Microstructure also has to allow for the possibility of multiple prices. At any given time an investor may be faced with a multitude of different prices, depending on whether he or she is buying or selling, the quantity he or she wishes to trade, and the required speed for the trade. The price may also depend on the relationship that the trader has with potential counterparties. In this research, I touch upon all of the above issues. I do this by studying three specific areas, all of which have both practical and policy implications. First, I study the role of information in trading and pricing securities in markets with a heterogeneous population of traders, some of whom are informed and some not, and who trade for different private or public reasons. Second, I study the price discovery of stocks in a setting where they are simultaneously traded in more than one market. Third, I make a contribution to the ongoing discussion about market design, i.e. the question of which trading systems and ways of organizing trading are most efficient. A common characteristic throughout my thesis is the use of high frequency datasets, i.e. tick data. These datasets include all trades and quotes in a given security, rather than just the daily closing prices, as in traditional asset pricing literature. This thesis consists of four separate essays. In the first essay I study price discovery for European companies cross-listed in the United States. 
I also study explanatory variables for differences in price discovery. In my second essay, I contribute to earlier research on two issues of broad interest in market microstructure: market transparency and informed trading. I examine the effects of a change to an anonymous market at the OMX Helsinki Stock Exchange. I broaden my focus slightly in the third essay to include releases of macroeconomic data in the United States. I analyze the effect of these releases on European cross-listed stocks. The fourth and last essay applies standard methodologies of price discovery analysis in a novel way. Specifically, I study price discovery within one market, between local and foreign traders.
Abstract:
The purpose of this thesis is to examine the role of trade durations in price discovery. One motivation for using trade durations in the study of price discovery is that durations are robust to many microstructure effects that bias the measurement of return volatility. Another motivation is that it is difficult to think of economic variables that are genuinely useful in determining the source of volatility at arbitrarily high frequencies. The dissertation contains three essays. In the first essay, the role of trade durations in price discovery is examined with respect to the volatility pattern of stock returns. The theory of volatility is associated with the theory of the information content of trades, which is central to market microstructure theory. The first essay documents that volatility per transaction is related to the intensity of trading, and it documents a strong relationship between the stochastic process of trade durations and trading variables. In the second essay, the role of trade durations in price discovery is examined with respect to quantifying the risk associated with a trading volume of a given size. The theory of volume is intrinsically associated with the pattern of stock volatility. The essay documents that volatility generally increases when traders choose to trade in large transactions. In the third essay, the role of trade durations in price discovery is examined with respect to the information content of a trade. The theory of the information content of a trade is associated with the theory of the rate of price revisions in the market. The essay documents that short durations are associated with information. Thus, traders are compensated for responding quickly to information.
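To make "volatility per transaction related to the intensity of trade" concrete, the sketch below pairs each trade's duration (time since the previous trade) with its log return and compares the mean squared return of short- versus long-duration trades. It is a purely descriptive illustration under stated assumptions, not the dissertation's econometric model; the tick data, the function names and the duration threshold are all hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

def durations_and_returns(times: Sequence[float], prices: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each trade's duration since the previous trade with its log return."""
    return [
        (times[i] - times[i - 1], math.log(prices[i] / prices[i - 1]))
        for i in range(1, len(times))
    ]

def volatility_by_duration(times: Sequence[float], prices: Sequence[float],
                           threshold: float) -> Tuple[float, float]:
    """Mean squared log return per trade for short (<= threshold) and long durations.

    Both groups must be non-empty, because statistics.mean raises on an empty list.
    """
    pairs = durations_and_returns(times, prices)
    short = [r * r for d, r in pairs if d <= threshold]
    long_ = [r * r for d, r in pairs if d > threshold]
    return mean(short), mean(long_)

# Hypothetical tick data: trade timestamps in seconds and transaction prices.
times = [0.0, 1.0, 2.0, 10.0, 20.0]
prices = [100.0, 101.0, 100.0, 100.5, 100.4]
short_vol, long_vol = volatility_by_duration(times, prices, threshold=2.0)
```

Working per transaction rather than per clock interval sidesteps the choice of sampling frequency, which is in the spirit of the motivation given above; any real analysis would of course need far more data and a proper model of the duration process.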
Abstract:
Enzymes offer many advantages in industrial processes, such as high specificity, mild treatment conditions and low energy requirements. Therefore, industry has exploited them in many sectors, including food processing. Enzymes can modify food properties by acting on small molecules or on polymers such as carbohydrates or proteins. Crosslinking enzymes such as tyrosinases and sulfhydryl oxidases catalyse the formation of novel covalent bonds between specific residues in proteins and/or peptides, thus forming or modifying the protein network of food. In this study, novel secreted fungal proteins with sequence features typical of tyrosinases and sulfhydryl oxidases were identified through genome mining. Representatives of both enzyme families were selected for heterologous production in the filamentous fungus Trichoderma reesei and for biochemical characterisation. First, a novel family of putative tyrosinases with shorter sequences than previously characterised tyrosinases was discovered. These proteins lacked the entire linker and the C-terminal domain, which may play a role in cofactor incorporation, folding or protein activity. One of these proteins, AoCO4 from Aspergillus oryzae, was produced in T. reesei at a level of about 1.5 g/l. AoCO4 was correctly folded and bound the copper cofactors in a type-3 copper centre. However, the enzyme showed only low activity on the phenolic substrates tested; the highest activity was obtained with 4-tert-butylcatechol. Since tyrosine was not a substrate for AoCO4, the enzyme was classified as a catechol oxidase. Second, genome analysis for secreted proteins with sequence features typical of flavin-dependent sulfhydryl oxidases pinpointed two previously uncharacterised proteins, AoSOX1 and AoSOX2, from A. oryzae. These two novel sulfhydryl oxidases were produced in T. reesei at levels of 70 and 180 mg/l, respectively, in shake flask cultivations.
AoSOX1 and AoSOX2 were dimeric FAD-dependent enzymes. Both showed activity on small sulfhydryl compounds such as glutathione and dithiothreitol, and both were drastically inhibited by zinc sulphate. AoSOX2 showed good stability to thermal and chemical denaturation, superior to that of AoSOX1. Third, the suitability of AoSOX1 as a baking improver was evaluated. The effect of AoSOX1, alone and in combination with the widely used improver ascorbic acid, was tested on yeasted wheat dough, both fresh and frozen, and on fresh water-flour dough. In all cases, AoSOX1 had no effect on the fermentation properties of fresh yeasted dough. AoSOX1 negatively affected the fermentation properties of frozen doughs and accelerated the damaging effects of frozen storage, giving a softer dough with poorer gas retention than the control. In combination with ascorbic acid, AoSOX1 gave harder doughs. Consistently, rheological studies on yeast-free dough showed that AoSOX1 alone resulted in a weaker and more extensible dough, whereas a dough with the opposite properties was obtained when ascorbic acid was also used. Doughs containing ascorbic acid and increasing amounts of AoSOX1 were harder in a dose-dependent manner. Sulfhydryl oxidase AoSOX1 thus enhanced the dough-hardening mechanism of ascorbic acid. This was ascribed mainly to the hydrogen peroxide produced in the SOX reaction, which can convert ascorbic acid to the actual improver, dehydroascorbic acid. In addition, AoSOX1 could possibly oxidise the free glutathione in the dough and thus prevent the loss of dough strength caused by the spontaneous reduction of the disulfide bonds that make up the dough protein network. Sulfhydryl oxidase AoSOX1 is therefore able to enhance the action of ascorbic acid in wheat dough and could potentially be applied in wheat dough baking.
Abstract:
The management and coordination of business-process collaboration is changing because of globalization, specialization and innovation. Service-oriented computing (SOC) is a means towards business-process automation, and many industry standards have recently emerged as part of the service-oriented architecture (SOA) stack. In a globalized world, organizations face new challenges in setting up and carrying out collaborations in semi-automated ecosystems of business services. To be efficient and effective, many companies express their services electronically in what we term business-process as a service (BPaaS). Companies then source BPaaS on the fly from third parties when they cannot create all service value in-house, for reasons such as lack of resources, lack of know-how, or the need to reduce cost and time. Thus, a need emerges for BPaaS-HUBs that not only store service offers and requests together with information about their issuing organizations and assigned owners, but also allow trust and reputation to be evaluated in an anonymized electronic service marketplace. In this paper, we analyze the requirements, architectural design and system behavior of such a BPaaS-HUB to enable fast setup and enactment of business-process collaboration. In a cloud-computing setting, the results of this paper allow system designers to quickly evaluate which services they need to instantiate the BPaaS-HUB architecture. The results also show the protocol of a backbone service bus that enables communication between the services implementing the BPaaS-HUB. Finally, the paper analyzes where an instantiation must allocate additional computing resources to avoid performance bottlenecks.
Abstract:
Bayesian networks are compact, flexible and interpretable representations of a joint distribution. When the network structure is unknown but observational data are at hand, one can try to learn the network structure; this is called structure discovery. This thesis contributes to two areas of structure discovery in Bayesian networks: space-time tradeoffs and learning ancestor relations. The fastest exact algorithms for structure discovery in Bayesian networks are based on dynamic programming and use excessive amounts of space. Motivated by this space usage, several schemes for trading space against time are presented. These schemes are presented in a general setting for a class of computational problems called permutation problems; structure discovery in Bayesian networks is seen as a challenging variant of the permutation problems. The main contribution in the area of space-time tradeoffs is the partial order approach, in which the standard dynamic programming algorithm is extended to run over partial orders. In particular, a family of partial orders called parallel bucket orders is considered. A partial order scheme that provably yields an optimal space-time tradeoff within parallel bucket orders is presented. Practical issues concerning parallel bucket orders are also discussed. Learning ancestor relations, that is, directed paths between nodes, is motivated by the need for robust summaries of the network structures when unobserved nodes are at work. Ancestor relations are nonmodular features, and hence learning them is more difficult than learning modular features. A dynamic programming algorithm is presented for computing the posterior probabilities of ancestor relations exactly. Empirical tests suggest that ancestor relations can be learned from observational data almost as accurately as arcs, even in the presence of unobserved nodes.
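A permutation problem asks for an ordering of n elements that minimises a sum of per-element costs, where each element's cost depends on the set of elements placed before it. The standard exact dynamic program fills one table entry per subset, f(S) = min over v in S of f(S \ {v}) + c(v, S \ {v}), which is where the 2^n space usage comes from. The sketch below is a minimal version of that generic recurrence; the `cost` callback and the toy costs are hypothetical. For Bayesian networks one would plug in, for example, the best local score of v with parents drawn from its predecessors, which is not implemented here, and the partial order scheme is not shown.

```python
from typing import Callable, List, Tuple

def solve_permutation_problem(n: int, cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Exact DP over all 2^n subsets (bitmasks) for a generic permutation problem.

    cost(v, mask) is the cost of placing element v directly after the elements
    in bitmask `mask`. Returns the optimal total cost and one optimal order.
    """
    full = (1 << n) - 1
    best = [float("inf")] * (1 << n)  # best[S]: optimal cost of ordering set S first
    choice = [-1] * (1 << n)          # last element of an optimal order of S
    best[0] = 0.0
    # Increasing numeric order guarantees every proper subset is solved first.
    for mask in range(1, full + 1):
        for v in range(n):
            if mask & (1 << v):
                rest = mask ^ (1 << v)
                value = best[rest] + cost(v, rest)
                if value < best[mask]:
                    best[mask], choice[mask] = value, v
    # Walk back through the stored choices to recover the order.
    order, mask = [], full
    while mask:
        v = choice[mask]
        order.append(v)
        mask ^= 1 << v
    return best[full], order[::-1]

# Hypothetical toy costs: node 1 "prefers" node 0 as a predecessor, node 2 prefers node 1.
PREFERRED = {1: 0, 2: 1}

def toy_cost(v: int, mask: int) -> float:
    p = PREFERRED.get(v)
    return 0.0 if p is None or mask & (1 << p) else 1.0

total, order = solve_permutation_problem(3, toy_cost)
```

Both tables have 2^n entries, so memory doubles with every added variable; this is the cost that the space-time tradeoff schemes described above aim to reduce.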
Abstract:
Service discovery is vital in ubiquitous applications, where a large number of devices and software components collaborate unobtrusively and provide numerous services without user intervention. Existing service discovery schemes use a service matching process to offer services of interest to users. The context information of users and of the surrounding environment can potentially be used to improve the quality of service matching. To make use of context information in service matching, a service discovery technique needs to address certain challenges. First, the context information must have an unambiguous representation. Second, the devices in the environment must be able to disseminate high-level and low-level context information seamlessly across different networks. Third, the dynamic nature of context information must be taken into account. We propose a C-IOB (Context-Information, Observation and Belief) based service discovery model that addresses these challenges by processing the context information and formulating beliefs based on observations. These beliefs are then used to provide the required services to users. The method has been tested with a typical ubiquitous museum guide application over different cases. The simulation results show that the method is time-efficient and are quite encouraging.
Abstract:
An understanding of application I/O access patterns is useful in several situations. First, gaining insight into what applications are doing with their data at a semantic level helps in designing efficient storage systems. Second, it helps create benchmarks that closely mimic realistic application behavior. Third, it enables autonomic systems, as the information obtained can be used to adapt the system in a closed loop. All these use cases require the ability to extract the application-level semantics of I/O operations. Methods such as modifying application code to associate I/O operations with semantic tags are intrusive. It is well known that network file system traces are an important source of information that can be obtained non-intrusively and analyzed either online or offline. These traces are a sequence of primitive file system operations and their parameters. In the general case, simple counting, statistical analysis or deterministic search techniques are inadequate for discovering application-level semantics, because of the inherent variation and noise in realistic traces. In this paper, we describe a trace analysis methodology based on Profile Hidden Markov Models. We show that the methodology has powerful discriminatory capabilities that enable it to recognize applications based on the patterns in the traces, and to mark out regions in a long trace that encapsulate sets of primitive operations representing higher-level application actions. It is robust enough to work around discrepancies between training and target traces, such as differences in length and interleaving with other operations. We demonstrate the feasibility of recognizing patterns based on a small sample of the trace, enabling faster trace analysis. Preliminary experiments show that the method is capable of learning accurate profile models on live traces in an online setting.
We present a detailed evaluation of this methodology in a UNIX environment using NFS traces of selected commonly used applications such as compilations as well as on industrial strength benchmarks such as TPC-C and Postmark, and discuss its capabilities and limitations in the context of the use cases mentioned above.
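Recognizing an application from a trace comes down to asking which trained model assigns the trace the highest likelihood. The sketch below shows that scoring step with the scaled forward algorithm on a plain discrete HMM. This is a deliberate simplification: a Profile HMM adds match, insert and delete states per position, which is not modelled here. The two models, their probabilities and the operation names are hypothetical.

```python
import math
from typing import Dict, Sequence

Model = Dict[str, Dict[str, float]]

def forward_log_likelihood(obs: Sequence[str], start: Dict[str, float],
                           trans: Model, emit: Model) -> float:
    """Scaled forward algorithm: returns log P(obs | HMM), or -inf if impossible."""
    states = list(start)
    log_like = 0.0
    alpha: Dict[str, float] = {}
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = {s: start[s] * emit[s].get(symbol, 0.0) for s in states}
        else:
            alpha = {
                s: sum(alpha[r] * trans[r].get(s, 0.0) for r in states) * emit[s].get(symbol, 0.0)
                for s in states
            }
        norm = sum(alpha.values())
        if norm == 0.0:
            return float("-inf")
        log_like += math.log(norm)  # log P(obs) is the sum of the per-step scaling factors
        alpha = {s: a / norm for s, a in alpha.items()}
    return log_like

# Hypothetical two-state "compile-like" model: a lookup/read phase, then a create/write phase.
COMPILE = dict(
    start={"R": 1.0, "W": 0.0},
    trans={"R": {"R": 0.7, "W": 0.3}, "W": {"R": 0.2, "W": 0.8}},
    emit={"R": {"lookup": 0.3, "read": 0.7}, "W": {"create": 0.4, "write": 0.6}},
)
# Hypothetical background model: one state emitting all four operations uniformly.
UNIFORM = dict(
    start={"U": 1.0},
    trans={"U": {"U": 1.0}},
    emit={"U": {op: 0.25 for op in ("lookup", "read", "create", "write")}},
)

trace = ["lookup", "read", "read", "create", "write", "write"]
scores = {name: forward_log_likelihood(trace, **m)
          for name, m in [("compile", COMPILE), ("uniform", UNIFORM)]}
```

The trace is attributed to whichever model scores highest. Rescaling alpha at every step keeps long traces from underflowing to zero, and an operation that no state can emit yields -inf instead of a spurious score.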
Abstract:
Segmental dynamic time warping (DTW) has been demonstrated to be a useful technique for finding acoustic similarity scores between segments of two speech utterances. Because of its high computational requirements, it has had to be computed offline, limiting the applications of the technique. In this paper, we present results of parallelizing this task by distributing the workload either statically or dynamically on an 8-processor cluster, and discuss the trade-offs among the different distribution schemes. We show that online unsupervised pattern discovery using segmental DTW is plausible with as few as 8 processors. This brings the task within reach of today's general-purpose multi-core servers. We also show results on a 32-processor system and discuss factors affecting the scalability of our methods.
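The cost that makes segmental DTW expensive is the alignment recurrence D(i, j) = d(x_i, y_j) + min(D(i−1, j), D(i, j−1), D(i−1, j−1)), evaluated for many pairs of utterances. The sketch below implements only that core recurrence, with an optional diagonal band. It does not implement the segmental procedure, which additionally restarts band-constrained alignments at multiple offsets, and it does not implement the paper's parallel scheme. Euclidean frame distance and the toy one-dimensional "frames" are assumptions for illustration.

```python
import math
from typing import Optional, Sequence

Frame = Sequence[float]

def dtw_distance(x: Sequence[Frame], y: Sequence[Frame], band: Optional[int] = None) -> float:
    """Classic DTW alignment cost between two sequences of feature frames.

    If `band` is given, cell (i, j) is only filled when |i - j| <= band
    (a Sakoe-Chiba-style diagonal constraint).
    """
    n, m = len(x), len(y)
    if band is not None:
        band = max(band, abs(n - m))  # widen so the end cell (n, m) stays reachable
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = (1, m) if band is None else (max(1, i - band), min(m, i + band))
        for j in range(lo, hi + 1):
            local = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance (assumed)
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Hypothetical one-dimensional feature frames; y repeats one frame of x.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [1.0], [1.0], [2.0]]
aligned = dtw_distance(x, y, band=1)
```

Each call is O(n·m) and independent of every other utterance pair, so the set of pairs can be split among processors either statically in advance or dynamically from a shared queue, which is the design space the abstract describes.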
Abstract:
It is increasingly recognized that the traditional closed-door, market-driven approach to drug discovery may not be the best-suited model for diseases of the developing world, such as tuberculosis and malaria, because most patients suffering from these diseases have poor paying capacity. To ensure that new drugs are created for patients suffering from these diseases, it is necessary to formulate an alternative paradigm for the drug discovery process. The current model, constrained by limited collaboration and by confidentiality restrictions on sharing resources, hampers opportunities to bring in expertise from diverse fields. These limitations hinder the possibilities of lowering the cost of drug discovery. The Open Source Drug Discovery project initiated by the Council of Scientific and Industrial Research, India, has adopted an open source model to enable wide participation across geographical borders. Open Source Drug Discovery emphasizes integrative science through collaboration, open sharing, multi-faceted approaches and accruing benefits from advances on different fronts of new drug discovery. Because the open source model is based on community participation, it has the potential to sustain continuous development by generating a storehouse of alternatives for the continued pursuit of new drugs. Since the inventions are community-generated, the new chemical entities developed by Open Source Drug Discovery will be taken up for clinical trials in a non-exclusive manner, with the participation of multiple companies and majority funding from Open Source Drug Discovery. This will ensure the availability of drugs, through a lower-cost, community-driven drug discovery process, for diseases afflicting people with poor paying capacity. Hopefully, Open Source Drug Discovery will do for drug discovery what Linux and the World Wide Web have done for information technology. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
The frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of episode frequency have been proposed, along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequency. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into the different frequencies, and we present quantitative relationships among them. Our unified view also helps in obtaining correctness proofs for various counting algorithms, as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, which are the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps in generalizing the algorithm to count episodes with general partial orders.
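As a concrete instance of one frequency notion, the sketch below counts non-overlapped occurrences of a single serial episode (for example A → B) in an event sequence using one finite-state automaton. It is not the paper's generic counting algorithm. It ignores event timestamps and expiry constraints, and it does not perform candidate generation. The function name and the example sequences are hypothetical.

```python
from typing import Iterable, Sequence

def count_non_overlapped(events: Iterable[str], episode: Sequence[str]) -> int:
    """Count non-overlapped occurrences of a serial episode in an event sequence.

    A single automaton waits for the next event type of the episode; on reaching
    the end it increments the count and restarts. Completing each occurrence as
    early as possible (earliest-end greedy) yields the maximum number of
    occurrences whose spans do not overlap.
    """
    if not episode:
        raise ValueError("episode must be non-empty")
    count, state = 0, 0
    for e in events:
        if e == episode[state]:
            state += 1
            if state == len(episode):
                count += 1
                state = 0
    return count

# Hypothetical event sequences, one character per event type.
ab_count = count_non_overlapped("AABAB", "AB")
```

In "AABAB", the second A is skipped while the automaton waits for B, so only two non-overlapped occurrences of A → B are counted. Other frequency notions, such as windows-based or minimal-occurrence counts, need different bookkeeping, and the unified view described above relates these notions to one another.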