895 results for Anchoring heuristic
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research that focuses on news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting new, previously unreported events, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to distinguish between the same event and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural-language temporal expressions occurring in the text and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system.
News reflects changes in the real world, and in order to keep up, the system has to adapt its behavior to the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments on three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
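To make the class-wise comparison concrete, here is a minimal sketch. It assumes each document has already been split into per-class bags of words and that every class is compared with plain cosine similarity; the abstract allows a separate measure per class (e.g. a geographical taxonomy for locations), and the class names, weights and toy documents below are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_wise_similarity(doc1: dict, doc2: dict, weights: dict) -> float:
    """Weighted combination of per-class similarities.

    doc1/doc2 map a semantic class name (hypothetical names such as 'terms',
    'persons', 'organizations', 'locations') to a Counter of the words in it.
    weights maps each class name to its (hypothetical) weight.
    """
    total = sum(weights.values())
    score = 0.0
    for cls, w in weights.items():
        score += w * cosine(doc1.get(cls, Counter()), doc2.get(cls, Counter()))
    return score / total


d1 = {"terms": Counter({"earthquake": 2, "rescue": 1}), "locations": Counter({"Kobe": 1})}
d2 = {"terms": Counter({"flood": 1}), "locations": Counter({"Kobe": 1})}
print(class_wise_similarity(d1, d2, {"terms": 0.5, "locations": 0.5}))  # 0.5
```

Replacing `cosine` with a taxonomy-aware function for the `locations` class would be the natural place to plug in an ontology; the temporal anchoring is not modelled in this sketch.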
Abstract:
Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem: searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classic example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine, i.e. they should also hold in future data. This is an important distinction from traditional association rules, which, in spite of their name and their similar appearance to dependency rules, do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search only for non-redundant rules, which express the real causes of dependence without any incidental extra factors. The extra factors add no new information on the dependence; they can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaved statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better but undiscovered dependencies.
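The search algorithm and its pruning theory are not reproduced here, but scoring a single candidate rule with Fisher's exact test can be sketched. This assumes rows are given as sets of the attributes that take value 1, uses the usual one-sided hypergeometric tail with fixed margins, and treats X->not A as X->A' with A' the complement of A; the attribute names are invented.

```python
from math import comb


def rule_counts(rows, X, A):
    """Counts needed to test X -> A on binary data given as sets of present attributes."""
    X = frozenset(X)
    n = len(rows)
    fx = sum(1 for r in rows if X <= r)
    fa = sum(1 for r in rows if A in r)
    fxa = sum(1 for r in rows if X <= r and A in r)
    return n, fx, fa, fxa


def fisher_p_positive(n, fx, fa, fxa):
    """One-sided Fisher p-value for X -> A: probability, under independence with
    fixed margins (hypergeometric model), of at least fxa rows containing both X and A."""
    hi = min(fx, fa)
    tail = sum(comb(fa, k) * comb(n - fa, fx - k) for k in range(fxa, hi + 1))
    return tail / comb(n, fx)


def fisher_p_negative(n, fx, fa, fxa):
    """X -> not A, tested as X -> A' where A' is the complement of A."""
    return fisher_p_positive(n, fx, n - fa, fx - fxa)


rows = [{"gene", "disease"}] * 4 + [set()] * 4
counts = rule_counts(rows, {"gene"}, "disease")
print(counts)                      # (8, 4, 4, 4)
print(fisher_p_positive(*counts))  # 1/70, about 0.0143
print(fisher_p_negative(*counts))  # 1.0
```

A small p-value signals a strong positive (or, with `fisher_p_negative`, negative) dependency; the contribution described in the abstract is making such scores searchable over the exponential rule space.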
Abstract:
This study presents a comprehensive mathematical model for a short-term open-pit mine block sequencing problem, which considers nearly all relevant technical aspects of open-pit mining. The proposed model aims to obtain the optimum extraction sequences of the original-size (smallest) blocks over short time intervals in the presence of real-life constraints, including precedence relationships, machine capacity, grade requirements, processing demands and stockpile management. A hybrid branch-and-bound and simulated annealing algorithm is developed to solve the problem. Computational experiments show that the proposed methodology is a promising way to provide quantitative recommendations for mine planning and scheduling engineers.
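The full model (grades, processing demands, stockpiles) and the hybrid branch-and-bound and simulated annealing solver are beyond a short example, but the two constraints every candidate sequence must pass can be sketched. This check assumes a block may be mined in the same period as its predecessors or later, and it reduces machine capacity to a single tonnage limit per period; both are simplifying assumptions, and the block names are hypothetical.

```python
from collections import defaultdict


def schedule_violations(period_of, predecessors, tonnage, capacity):
    """List constraint violations of a block extraction schedule.

    period_of:    block -> period in which it is extracted
    predecessors: block -> blocks that must be extracted no later than it
    tonnage:      block -> tonnes
    capacity:     maximum tonnes per period (one hypothetical machine limit)
    """
    problems = []
    load = defaultdict(float)
    for block, t in period_of.items():
        load[t] += tonnage[block]
        for pred in predecessors.get(block, ()):
            if pred not in period_of or period_of[pred] > t:
                problems.append(f"{block} (period {t}): predecessor {pred} is unscheduled or mined later")
    for t in sorted(load):
        if load[t] > capacity:
            problems.append(f"period {t} exceeds capacity: {load[t]} > {capacity}")
    return problems


# Hypothetical two-bench example: block 'd' lies under 'a', 'b' and 'c'.
preds = {"d": ["a", "b", "c"]}
tons = {"a": 10, "b": 10, "c": 10, "d": 10}
print(schedule_violations({"a": 1, "b": 1, "c": 2, "d": 2}, preds, tons, 30))  # []
print(schedule_violations({"a": 1, "b": 1, "c": 2, "d": 1}, preds, tons, 30))
```

A simulated annealing move (for example, shifting one block to another period) could call such a check to reject or penalise infeasible neighbours.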
Abstract:
This paper proposes a new multi-stage mine production timetabling (MMPT) model to optimise open-pit mine production operations, including drilling, blasting and excavating, under real-time mining constraints. The MMPT problem is formulated as a mixed integer programming model and can be solved to optimality for small MMPT instances with IBM ILOG-CPLEX. Because the problem is NP-hard, an improved shifting-bottleneck-procedure algorithm based on the extended disjunctive graph is developed to solve large MMPT instances effectively and efficiently. Extensive computational experiments are presented to validate the proposed algorithm, which is able to efficiently obtain near-optimal operational timetables for mining equipment units. Its advantages are demonstrated by sensitivity analysis under various real-life scenarios. The proposed MMPT methodology is a promising tool for the mining industry because it is formulated as a standard scheduling model, is solved efficiently by the heuristic algorithm, and can be extended flexibly with additional industrial constraints.
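As a much simplified illustration of the underlying scheduling structure, the sketch below treats drilling, blasting and excavating as three stages with one machine each and computes the makespan of a given block order (a permutation flow shop). This is a toy under stated assumptions, not the MMPT formulation or the shifting-bottleneck procedure; processing times and block names are hypothetical.

```python
from itertools import permutations


def makespan(sequence, proc):
    """Completion time of the last operation when blocks pass through the stages
    (e.g. drill -> blast -> excavate) in the same order, one machine per stage."""
    n_stages = len(proc[sequence[0]])
    finish = [0] * n_stages  # finish[s]: time stage s completed its latest block
    for block in sequence:
        for s in range(n_stages):
            ready = finish[s - 1] if s > 0 else 0  # this block has left stage s-1
            finish[s] = max(finish[s], ready) + proc[block][s]
    return finish[-1]


def best_sequence(proc):
    """Exhaustive search over all block orders; only viable for tiny instances."""
    return min(((seq, makespan(seq, proc)) for seq in permutations(proc)), key=lambda x: x[1])


proc = {"B1": [3, 1, 1], "B2": [1, 1, 3]}  # hypothetical [drill, blast, excavate] times
print(makespan(("B1", "B2"), proc))  # 8
print(best_sequence(proc))           # (('B2', 'B1'), 6)
```

`best_sequence` enumerates all n! orders, which illustrates why exact methods stop scaling and a heuristic is needed for large instances.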
Abstract:
Incursions of plant pests and diseases pose serious threats to food security, agricultural productivity and the natural environment. One of the challenges in confidently delimiting and eradicating incursions is how to choose from an arsenal of surveillance and quarantine approaches in order to best control multiple dispersal pathways. Anthropogenic spread (propagules carried on humans or transported on produce or equipment) can be controlled with quarantine measures, which in turn can vary in intensity. In contrast, environmental spread processes are more difficult to control, but often have a temporal signal (e.g. seasonality) which can introduce both challenges and opportunities for surveillance and control. This leads to complex decisions regarding when, where and how to search. Recent modelling investigations of surveillance performance have optimised the output of simulation models, and found that a risk-weighted randomised search can perform close to optimally. However, exactly how quarantine and surveillance strategies should change to reflect different dispersal modes remains largely unaddressed. Here we develop a spatial simulation model of a plant fungal-pathogen incursion into an agricultural region, and its subsequent surveillance and control. We include structural differences in dispersal via the interplay of biological, environmental and anthropogenic connectivity between host sites (farms). Our objective was to gain broad insights into the relative roles played by different spread modes in propagating an invasion, and how incorporating knowledge of these spread risks may improve approaches to quarantine restrictions and surveillance. We find that broad heuristic rules for quarantine restrictions fail to contain the pathogen due to residual connectivity between sites, but surveillance measures enable early detection and successfully lead to suppression of the pathogen in all farms. 
Alternative surveillance strategies attain similar levels of performance by incorporating environmental or anthropogenic dispersal risk in the prioritisation of sites. Our model provides the basis to develop essential insights into the effectiveness of different surveillance and quarantine decisions for fungal pathogen control. When parameterised for real-world settings, it will aid our understanding of how the extent and resolution of interventions should reflect the spatial structure of dispersal processes.
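A risk-weighted randomised search can be read as sampling sites without replacement, each draw proportional to a risk score. The sketch below is one generic way to do that; the farm names and risk values are hypothetical, and in the model described above the scores would come from environmental or anthropogenic connectivity rather than being set by hand.

```python
import random


def risk_weighted_sites(risk, k, rng):
    """Pick up to k distinct sites; each draw is proportional to the risk of
    the sites not yet picked. Zero-risk sites are never chosen."""
    remaining = dict(risk)
    picked = []
    while remaining and len(picked) < k:
        sites = list(remaining)
        weights = [remaining[s] for s in sites]
        if sum(weights) <= 0:  # only zero-risk sites are left
            break
        site = rng.choices(sites, weights=weights, k=1)[0]
        picked.append(site)
        del remaining[site]
    return picked


farm_risk = {"farm_A": 5.0, "farm_B": 0.0, "farm_C": 1.0, "farm_D": 2.0}
print(risk_weighted_sites(farm_risk, 3, random.Random(42)))
```

Passing a seeded `random.Random` keeps surveillance runs reproducible across simulation replicates.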
Abstract:
The work covered in this thesis focuses on developing technology for the bioconversion of glucose into D-erythorbic acid (D-EA) and 5-ketogluconic acid (5-KGA). The task was to demonstrate, at the proof-of-concept level, the functionality of the enzymatic conversion or one-step bioconversion of glucose to these acids. The feasibility of developing both approaches further into production processes was also evaluated. The glucose-to-D-EA bioconversion study was based on the use of a cloned gene encoding a D-EA-forming soluble flavoprotein, D-gluconolactone oxidase (GLO). GLO was purified from Penicillium cyaneo-fulvum and partially sequenced. The peptide sequences obtained were used to isolate a cDNA clone encoding the enzyme. The cloned gene (GenBank accession no. AY576053) is homologous to the other known eukaryotic lactone oxidases and also to some putative prokaryotic lactone oxidases. Analysis of the deduced protein sequence of GLO indicated a typical secretion signal sequence at the N-terminus of the enzyme. No other targeting/anchoring signals were found, suggesting that GLO is the first known lactone oxidase that is secreted rather than targeted to the membranes of the endoplasmic reticulum or mitochondria. Experimental evidence supports this analysis, as near-complete secretion of GLO was observed in two different yeast expression systems. The highest expression levels of GLO were obtained using Pichia pastoris as the expression host. Recombinant GLO was characterised, and the suitability of purified GLO for D-EA production was studied. Immobilised GLO was found to be rapidly inactivated during D-EA production. The feasibility of in vivo glucose-to-D-EA conversion using a P. pastoris strain co-expressing the genes of GLO and glucose oxidase (GOD, EC 1.1.3.4) of A. niger was demonstrated. The glucose-to-5-KGA bioconversion study followed a strategy similar to that used in the D-EA production research.
The rationale was based on the use of a cloned gene encoding a membrane-bound pyrroloquinoline quinone (PQQ)-dependent gluconate 5-dehydrogenase (GA 5-DH). GA 5-DH was purified to homogeneity from Gluconobacter suboxydans, the only source of this enzyme known in the literature, and partially sequenced. Using the amino acid sequence information, the GA 5-DH gene was cloned from a genomic library of G. suboxydans. The cloned gene was sequenced (GenBank accession no. AJ577472) and found to be an operon of two adjacent genes encoding the two subunits of GA 5-DH. GA 5-DH turned out to be a rather close homologue of a sorbitol dehydrogenase from another G. suboxydans strain, and it was also found to have significant polyol dehydrogenase activity. The G. suboxydans GA 5-DH gene was poorly expressed in E. coli: under optimised conditions, maximum expression levels of GA 5-DH did not exceed those found in wild-type G. suboxydans, and attempts to increase expression resulted in repression of growth and extensive cell lysis. However, the expression levels were sufficient to demonstrate bioconversion of glucose and gluconate into 5-KGA using recombinant E. coli strains. An uncharacterised homologue of GA 5-DH was identified in Xanthomonas campestris by in silico screening. This enzyme, encoded by chromosomal locus NP_636946, had been found by a sequencing project of X. campestris and annotated as a hypothetical glucose dehydrogenase. The gene encoding this uncharacterised enzyme was cloned, expressed in E. coli and found to encode a gluconate/polyol dehydrogenase without glucose dehydrogenase activity. Moreover, the X. campestris GA 5-DH gene was expressed in E. coli at nearly 30 times higher levels than the G. suboxydans GA 5-DH gene. The good expressibility of the X. campestris GA 5-DH gene makes it a valuable tool not only for 5-KGA production in the tartaric acid (TA) bioprocess, but possibly also for other bioprocesses (e.g. oxidation of sorbitol into L-sorbose).
In addition to glucose-to-5-KGA bioconversion, a preliminary study of the feasibility of enzymatic conversion of 5-KGA into TA was carried out. The efficacy of the first step of a prospective two-step conversion route, involving a transketolase and a dehydrogenase, was confirmed: transketolase was found to convert 5-KGA into TA semialdehyde. Succinic dehydrogenase was suggested as a candidate for the second step, but this was not tested. The analysis of the two subprojects indicated that bioconversion of glucose to TA using X. campestris GA 5-DH should be prioritised, and that future process development should focus on more efficient GA 5-DH production strains, obtained by screening for a more suitable production host and by protein engineering.
Abstract:
The Golgi complex is a central organelle of the secretory pathway, responsible for a range of post-translational modifications, as well as for membrane traffic to the plasma membrane and to the endosomal-lysosomal pathway. In addition, this organelle has roles in cell migration, in the regulation of traffic, and as a mitotic checkpoint. The structure of the Golgi complex is highly dynamic and able to respond to the amount of cargo being transported and to the stage of the cell cycle. The Golgi proteome reflects the functions and structure of this organelle and can be divided into three major groups: Golgi resident proteins (e.g. modification enzymes), Golgi matrix proteins (involved in structure and tethering events), and trafficking proteins (e.g. vesicle coat proteins and Rabs). The Golgi proteome has been studied on several occasions with proteomic approaches, using Golgi membranes from both rat liver and mammary gland, but still little more than half of the estimated Golgi proteome is known. Nevertheless, methodological improvements and the introduction of shotgun proteomics have increased the number of identified proteins, especially the number of identified transmembrane proteins. Cartilage, although not a typical tissue in which to study membrane traffic, secretes large amounts of extracellular matrix (ECM) proteins that are extensively modified, especially by amino acid hydroxylation, glycosylation and sulfation. Furthermore, the cartilage ECM contains several large oligomeric proteins (such as collagen II) that are difficult to assemble and transport. Indeed, cartilage has been shown to be susceptible to changes both in the secretory pathway (e.g. COPII coat assembly) and in post-translational modifications (e.g. heparan sulfate formation). The dental follicle and the periodontal ligament (PDL) that it forms are another type of connective tissue, with a role in anchoring teeth to bone.
This anchorage is achieved by numerous matrix fibres that connect the bone matrix with the cementum. These tissues have in common the secretion of large matrix molecules. In this study, the Golgi proteome was analysed from purified, stacked Golgi membranes isolated from rat liver. The extensive proteome identified included a protein similar to Ab2-095, or Golgi protein 49 kDa (GoPro49), which was shown to localise to the Golgi complex as an EGFP fusion protein. Surprisingly, in situ hybridisation showed GoPro49 expression to be highly restricted to various mesenchymal tissues, especially cartilage, and this expression pattern was clearly developmentally regulated. In addition to cartilage, GoPro49 was also expressed in the dental follicle but was not observed in the mature PDL. Importantly, GoPro49 is the first specific marker for the dental follicle. Endogenous GoPro49 protein co-localised with β-COP in both chondrosarcoma and primary dental follicle cell lines. COPI staining in these cells was highly dynamic, showing a number of tubules, which may reflect the type of cargo these cells secrete. Currently, GoPro49 is the only Golgi protein with such a restricted expression pattern.
Abstract:
We consider a modification of the three-dimensional Navier-Stokes equations and other hydrodynamical evolution equations with space-periodic initial conditions in which the usual Laplacian of the dissipation operator is replaced by an operator whose Fourier symbol grows exponentially as $e^{|k|/k_d}$ at high wavenumbers $|k|$. Using estimates in suitable classes of analytic functions, we show that the solutions with initially finite energy become immediately entire in the space variables and that the Fourier coefficients decay faster than $e^{-C(k/k_d)\ln(|k|/k_d)}$ for any $C < 1/(2\ln 2)$. The same result holds for the one-dimensional Burgers equation with exponential dissipation but can be improved: heuristic arguments and very precise simulations, analyzed by the method of asymptotic extrapolation of van der Hoeven, indicate that the leading-order asymptotics is precisely of the above form with $C = C_* = 1/\ln 2$. The same behavior with a universal constant $C_*$ is conjectured for the Navier-Stokes equations with exponential dissipation in any space dimension. This universality prevents the strong growth of intermittency in the far dissipation range which is obtained for ordinary Navier-Stokes turbulence. Possible applications to improved spectral simulations are briefly discussed.
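The effect of replacing the Laplacian symbol $k^2$ by $e^{|k|/k_d}$ can be seen on a single Fourier mode. The sketch below assumes a dissipation coefficient `nu` and a time step `dt` (neither appears in the abstract) and uses the exact integrating factor for the linear dissipative part, a common device in spectral codes but not necessarily the one the authors use. It also evaluates the decay envelope $e^{-C(|k|/k_d)\ln(|k|/k_d)}$ and the two constants quoted above.

```python
import math

C_PROVED_SUP = 1 / (2 * math.log(2))  # decay bound proved for any C below this
C_STAR = 1 / math.log(2)              # conjectured universal constant C_*


def exp_dissipation_factor(k, k_d, nu, dt):
    """Exact damping of mode k over dt for du_k/dt = -nu * exp(|k|/k_d) * u_k."""
    return math.exp(-nu * dt * math.exp(abs(k) / k_d))


def laplacian_factor(k, nu, dt):
    """Same, for ordinary dissipation du_k/dt = -nu * k**2 * u_k."""
    return math.exp(-nu * dt * k * k)


def decay_envelope(k, k_d, C):
    """exp(-C (|k|/k_d) ln(|k|/k_d)); only meaningful for |k| > k_d."""
    x = abs(k) / k_d
    return math.exp(-C * x * math.log(x))


# Hypothetical parameters: nu = 1.0, dt = 1e-3, k_d = 10.
for k in (10, 50, 100):
    print(k, laplacian_factor(k, 1.0, 1e-3), exp_dissipation_factor(k, 10, 1.0, 1e-3))
```

With these parameters the exponential symbol damps less than $k^2$ at moderate wavenumbers and overtakes it near $k = 90$, after which it wins by a rapidly growing margin. At $|k| = 2k_d$ with $C = C_*$ the envelope is exactly $e^{-2}$, since $C_* \cdot 2\ln 2 = 2$.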
Abstract:
We study the performance of greedy scheduling in multihop wireless networks where the objective is aggregate utility maximization. Following standard approaches, we consider the dual of the original optimization problem. Optimal scheduling requires selecting independent sets of maximum aggregate price, but this problem is known to be NP-hard. We propose and evaluate a simple greedy heuristic. Analytical bounds on performance are provided and simulations indicate that the greedy heuristic performs well in practice.
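The abstract does not spell out the greedy rule, so the sketch below assumes the most common one: visit links in decreasing order of price and activate a link if it does not conflict with any link already chosen. The link names, prices and conflict pairs are hypothetical.

```python
def greedy_independent_set(prices, conflicts):
    """Greedy heuristic for a maximum-price independent set of links.

    prices:    link -> price (e.g. a dual variable)
    conflicts: iterable of (link, link) pairs that cannot be active together
    """
    neighbours = {link: set() for link in prices}
    for u, v in conflicts:
        neighbours[u].add(v)
        neighbours[v].add(u)
    chosen = []
    for link in sorted(prices, key=prices.get, reverse=True):
        if neighbours[link].isdisjoint(chosen):  # no conflict with chosen links
            chosen.append(link)
    return chosen


path_prices = {"a": 5, "b": 4, "c": 3, "d": 1}
print(greedy_independent_set(path_prices, [("a", "b"), ("b", "c"), ("c", "d")]))  # ['a', 'c']

star_prices = {"hub": 5, "x": 3, "y": 3}
print(greedy_independent_set(star_prices, [("hub", "x"), ("hub", "y")]))  # ['hub']
```

The star example shows why this is only a heuristic: greedy takes the hub (price 5) and loses the two leaves whose combined price is 6.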
Abstract:
We present a generic theory for the dynamics of a stiff filament under tension, in an active medium with orientational correlations, such as a microtubule in contractile actin. In sharp contrast to the case of a passive medium, we find the filament can stiffen, and possibly oscillate or buckle, depending on both the contractile or tensile nature of the activity and the filament-medium anchoring interaction. We also demonstrate a strong violation of the fluctuation-dissipation (FD) relation in the effective dynamics of the filament, including a negative FD ratio. Our approach is also of relevance to the dynamics of axons, and our model equations bear a remarkable formal similarity to those in recent work [Martin P, Hudspeth AJ, Juelicher F (2001) Proc Natl Acad Sci USA 98: 14380-14385] on auditory hair cells. Detailed tests of our predictions can be made by using a single filament in actomyosin extracts or bacterial suspensions.
Abstract:
Persistent bacterial infections are responsible for a significant share of human morbidity and mortality. Unlike acute bacterial infections, persistent bacterial infections (e.g. tuberculosis) are very difficult to treat. Knowing where pathogenic bacteria reside during persistent infection will help in treating such conditions by designing novel drugs that can reach those locations. In this study, events of persistent bacterial infection were analyzed using game theory. A game was defined in which the pathogen and the host are two players with a conflict of interest. Criteria for the establishment of a Nash equilibrium were calculated for this game. This theoretical model, which is very simple and heuristic, predicts that during persistent infections pathogenic bacteria stay in both the intracellular and extracellular compartments of the host. The result implies that a bacterium should be able to survive in both compartments of the host in order to cause persistent infections. This explains why persistent infections are more often caused by intracellular pathogens such as Mycobacterium and Salmonella. Moreover, this prediction is consistent with the results of previous experimental studies.
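The payoffs of the host-pathogen game are not given in the abstract, so the sketch below uses invented numbers purely to show how a fully mixed Nash equilibrium of a 2x2 game is computed: each player's mixing probability is chosen to make the other player indifferent between its two strategies. Rows are the pathogen's location (intracellular, extracellular); columns are a hypothetical host choice of which compartment to target.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game, if one exists.

    A[i][j]: row player's payoff, B[i][j]: column player's payoff.
    Returns (p, q) with p = P(row plays 0), q = P(column plays 0), else None.
    """
    A = [[Fraction(x) for x in row] for row in A]
    B = [[Fraction(x) for x in row] for row in B]
    den_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    den_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if den_p == 0 or den_q == 0:
        return None
    p = (B[1][1] - B[1][0]) / den_p  # makes the column player indifferent
    q = (A[1][1] - A[0][1]) / den_q  # makes the row player indifferent
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None


# Invented payoffs. Pathogen rows: intracellular, extracellular.
# Host columns: target intracellular, target extracellular.
A = [[0, 3], [2, 0]]  # pathogen does badly where the host targets
B = [[2, 0], [0, 1]]  # host gains when it targets the pathogen's compartment
print(mixed_equilibrium_2x2(A, B))  # (Fraction(1, 3), Fraction(3, 5))
```

With these made-up payoffs the pathogen is intracellular with probability 1/3 and extracellular with probability 2/3, i.e. it occupies both compartments, which is the qualitative pattern the abstract reports; other payoffs may yield a pure equilibrium instead, in which case the function returns None.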
Abstract:
We extend the modeling heuristic of (Harsha et al. 2006, in IEEE IWQoS '06, pp. 178-187) to evaluate the performance of an IEEE 802.11e infrastructure network carrying packet telephone calls, streaming video sessions and TCP-controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We identify the time boundaries of activities on the channel (called channel slot boundaries) and derive a Markov Renewal Process (MRP) of the contending nodes on these epochs. This is achieved by using, as the attempt probabilities of the contending nodes, those obtained from the saturation fixed-point analysis of (Ramaiyan et al. 2005, in Proceedings of ACM SIGMETRICS '05; journal version accepted for publication in IEEE TON). Regenerative analysis of this MRP yields the desired steady-state performance measures. We then use the MRP model to develop an effective bandwidth approach for obtaining a bound on the buffer size required at the video queue of the AP such that the streaming video packet loss probability is kept below 1%. The results obtained match simulations with the network simulator ns-2 well. We find that, with the default IEEE 802.11e EDCA parameters for access categories AC 1, AC 2 and AC 3, the voice call capacity decreases if even one streaming video session and one TCP file download are initiated by some wireless station. Each voice call removed then increases the video downlink stream throughput by 0.38 Mbps and the file download capacity by 0.14 Mbps (for the 11 Mbps PHY). We find that a buffer size of 75 KB is sufficient to keep the video packet loss probability at the QAP within 1%.
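The buffer-sizing step of an effective bandwidth approach can be sketched under strong simplifying assumptions: i.i.d. per-slot arrivals with a known distribution and a constant service rate per slot, rather than the MRP-derived model above. The decay rate theta* solves alpha(theta*) = c, where alpha(theta) = log E[exp(theta A)] / theta is the effective bandwidth, and the large-buffer approximation P(Q > B) ~ exp(-theta* B) then gives the buffer for a loss target. The arrival distribution and service rate are hypothetical.

```python
import math


def effective_bandwidth(theta, arrivals):
    """alpha(theta) = log E[exp(theta * A)] / theta for i.i.d. per-slot arrivals A.

    arrivals: list of (amount, probability) pairs.
    """
    mgf = sum(p * math.exp(theta * a) for a, p in arrivals)
    return math.log(mgf) / theta


def decay_rate(arrivals, service, tol=1e-10):
    """theta* with alpha(theta*) = service, by bisection (alpha is nondecreasing in theta)."""
    mean = sum(a * p for a, p in arrivals)
    peak = max(a for a, p in arrivals if p > 0)
    if not mean < service < peak:
        raise ValueError("need mean arrival < service < peak arrival")
    lo, hi = 1e-9, 1.0
    while effective_bandwidth(hi, arrivals) <= service:  # grow the bracket
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if effective_bandwidth(mid, arrivals) <= service:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def buffer_for_loss(arrivals, service, loss_target):
    """Buffer B such that exp(-theta* B) equals loss_target."""
    return math.log(1 / loss_target) / decay_rate(arrivals, service)


arrivals = [(0, 0.5), (2, 0.5)]  # hypothetical: 0 or 2 packets per slot
theta = decay_rate(arrivals, service=1.5)
print(theta, buffer_for_loss(arrivals, 1.5, 0.01))
```

For this toy input, alpha(theta) = 1.5 reduces to x^3 - x^2 - x - 1 = 0 with x = exp(theta/2), so theta* is twice the logarithm of the tribonacci constant; the tests check the bisection against that closed form.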
Abstract:
A direct borohydride-hydrogen peroxide fuel cell employing carbon-supported Prussian Blue (PB) as a mediated electron-transfer cathode catalyst is reported. Operating at 30 °C, the cell with the carbon-supported PB cathode catalyst shows superior performance, with a maximum output power density of 68 mW cm−2 at an operating voltage of 1.1 V, compared with a direct borohydride-hydrogen peroxide fuel cell employing the conventional gold-based cathode, which reaches a maximum output power density of 47 mW cm−2 at an operating voltage of 0.7 V. X-ray diffraction (XRD), scanning electron microscopy (SEM), and energy-dispersive X-ray analysis (EDAX) suggest that anchoring cetyltrimethylammonium bromide (CTAB) as a surfactant moiety on carbon-supported PB affects the catalyst morphology. Polarization studies show that the cell with the carbon-supported CTAB-anchored PB cathode performs better, with a maximum output power density of 50 mW cm−2 at an operating voltage of 1 V, than the cell with carbon-supported PB without CTAB, which reaches a maximum output power density of 29 mW cm−2 at an operating voltage of 1 V.
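The reported peak points can be converted to current densities with j = P / V (mW cm−2 divided by V gives mA cm−2). The short calculation below uses only the numbers stated above.

```python
def current_density_mA_cm2(power_mW_cm2, voltage_V):
    """j = P / V; (mW cm^-2) / V = mA cm^-2."""
    return power_mW_cm2 / voltage_V


# (peak power density in mW cm^-2, voltage at that peak in V), as reported
peaks = {
    "carbon-supported PB": (68, 1.1),
    "gold-based cathode": (47, 0.7),
    "CTAB-anchored PB": (50, 1.0),
    "PB without CTAB": (29, 1.0),
}
for name, (p, v) in peaks.items():
    print(f"{name}: {current_density_mA_cm2(p, v):.1f} mA cm^-2")
```

The gold-based cathode reaches its peak at a slightly higher current density (about 67.1 vs 61.8 mA cm−2) but at a much lower voltage, so the power advantage of the PB cathode comes from its higher operating voltage.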
Abstract:
Experimental characterization of high-dimensional dynamic systems sometimes uses the proper orthogonal decomposition (POD). If there are many measurement locations and relatively few sensors, steady-state behavior can still be studied by sequentially taking several sets of simultaneous measurements. The number of such measurement sets required can be minimized by solving a combinatorial optimization problem. We aim to bring this problem to the attention of engineering audiences, summarize some known mathematical results about it, and present a heuristic (suboptimal) calculation that gives reasonable, if not stellar, results.
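One common way to state the combinatorial problem, taken here as an assumption since the abstract does not give it, is as a pair covering: with n locations and m sensors per measurement set, every pair of locations must be measured simultaneously at least once so that each entry of the spatial correlation matrix used by POD can be estimated. Under that reading, the Schönheim bound gives a lower limit on the number of sets, and a simple greedy construction gives a suboptimal count. This sketch is illustrative and is not the authors' heuristic.

```python
from itertools import combinations


def ceil_div(a, b):
    """Integer ceiling of a / b (avoids floating-point rounding)."""
    return -(-a // b)


def schonheim_bound(n, m):
    """Schönheim lower bound on the number of m-location sets covering all pairs of n locations."""
    return ceil_div(n * ceil_div(n - 1, m - 1), m)


def greedy_pair_cover(n, m):
    """Greedy (suboptimal) measurement sets of size m covering every pair; assumes 2 <= m <= n."""
    uncovered = set(combinations(range(n), 2))
    sets = []
    while uncovered:
        a, b = min(uncovered)  # start from a still-uncovered pair
        chosen = {a, b}
        while len(chosen) < m:
            # add the location that covers the most still-uncovered pairs with the current set
            best = max(
                (x for x in range(n) if x not in chosen),
                key=lambda x: sum((min(x, y), max(x, y)) in uncovered for y in chosen),
            )
            chosen.add(best)
        sets.append(sorted(chosen))
        uncovered -= set(combinations(sorted(chosen), 2))
    return sets


cover = greedy_pair_cover(7, 3)
print(len(cover), schonheim_bound(7, 3))
```

For n = 7 and m = 3 the bound is 7, which the Fano plane attains exactly; whether the greedy run matches it depends on tie-breaking, so `len(cover)` should be compared with the bound rather than assumed optimal.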
Abstract:
Virtual Machine (VM) management is an essential need in today's data centers for various management activities and is accomplished in two phases: finding an optimal VM placement plan and implementing that placement through live VM migrations. These phases give rise to two research problems: the VM placement problem (VMPP) and the VM migration scheduling problem (VMMSP). This research proposes and develops several evolutionary and heuristic algorithms to address the VMPP and VMMSP. Experimental results show the effectiveness and scalability of the proposed algorithms. Finally, a VM management framework has been proposed and developed to automate VM management in a cost-efficient way.
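As a point of reference for the placement phase, the classic first-fit decreasing (FFD) bin-packing heuristic can be sketched. This is not one of the proposed algorithms: it assumes a single resource dimension (e.g. CPU units), identical hosts and hypothetical demands, and it ignores the migration-scheduling phase.

```python
def first_fit_decreasing(demands, capacity):
    """Place VMs, largest demand first, on the first host with enough free capacity.

    demands:  vm -> resource demand (one hypothetical dimension, e.g. CPU units)
    capacity: capacity of each (identical) host
    Returns (placement: vm -> host index, number of hosts used).
    """
    free = []  # free[i]: remaining capacity of host i
    placement = {}
    for vm in sorted(demands, key=demands.get, reverse=True):
        need = demands[vm]
        if need > capacity:
            raise ValueError(f"{vm} does not fit on an empty host")
        for i, space in enumerate(free):
            if need <= space:
                free[i] -= need
                placement[vm] = i
                break
        else:  # no existing host fits: open a new one
            free.append(capacity - need)
            placement[vm] = len(free) - 1
    return placement, len(free)


vms = {"vm1": 50, "vm2": 70, "vm3": 30, "vm4": 20, "vm5": 40}
print(first_fit_decreasing(vms, 100))
```

The toy demands sum to 210 units, so at least 3 hosts of 100 units are needed and FFD's 3 hosts is optimal for this input.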