23 results for EFFICIENT SIMULATION

in Helda - Digital Repository of University of Helsinki


Relevance:

70.00%

Publisher:

Abstract:

Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetic data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider the prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary in order to focus on the most promising candidates. We show how to harness current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, in this unified schema using graph mining techniques. Finally, in the last part of the thesis, we define the concept of a reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is on genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
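
A minimal illustration of the connection reliability that reliable subgraphs aim to preserve: in a random graph where each edge exists independently with a given probability, the reliability of a vertex pair is the probability that they remain connected. The Monte Carlo sketch below is not the thesis's extraction algorithm; the edge list, probabilities and vertex names are made up for illustration.

import random
from collections import defaultdict

def connection_reliability(edges, s, t, rounds=10000):
    """Estimate P(s and t are connected) when each edge (u, v, p) is kept
    independently with probability p."""
    hits = 0
    for _ in range(rounds):
        adj = defaultdict(list)
        for u, v, p in edges:
            if random.random() < p:          # realise one random graph
                adj[u].append(v)
                adj[v].append(u)
        stack, seen = [s], {s}               # depth-first search from s
        while stack:
            node = stack.pop()
            if node == t:
                hits += 1
                break
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return hits / rounds

# two parallel two-edge paths between a gene vertex and a disease vertex
edges = [("gene", "x", 0.9), ("x", "disease", 0.8),
         ("gene", "y", 0.7), ("y", "disease", 0.7)]
print(connection_reliability(edges, "gene", "disease"))

A reliable subgraph keeps the edges that contribute most to this probability, which is what makes it useful for visualising strong and independent connections.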

Relevance:

20.00%

Publisher:

Abstract:

An important challenge in the forest industry is getting the appropriate raw material out of the forests and into the wood-processing industry. Growth and stem reconstruction simulators are therefore increasingly integrated into industrial conversion simulators in order to link the properties of wood products to the three-dimensional structure of stems and their growing conditions. Static simulators predict wood properties from stem dimensions at the end of a growth simulation period, whereas in dynamic approaches the structural components, e.g. branches, are incremented along with the growth processes. The dynamic approach can be applied to stem reconstruction by predicting the three-dimensional stem structure from external tree variables (e.g. age and height) as the result of growth to the current state. In this study, a dynamic growth simulator, PipeQual, and a stem reconstruction simulator, RetroSTEM, are adapted to Norway spruce (Picea abies [L.] Karst.) to predict the three-dimensional structure of stems (taper, branchiness, wood basic density) over time so that both simulators can be integrated into a sawing simulator. The parameterisation of the PipeQual and RetroSTEM simulators for Norway spruce relied on a theoretically based description of tree structure that develops in the growth process and follows certain conservative structural regularities while allowing for plasticity in crown development. The crown expressed both regularity and plasticity in its development, as the vertical foliage density peaked regularly at about 5 m from the stem apex, varying below that with tree age and dominance position (Study I). Conservative stem structure was characterised in terms of (1) the pipe ratios between foliage mass and branch and stem cross-sectional areas at crown base, (2) the allometric relationship between foliage mass and crown length, (3) mean branch length relative to crown length and (4) form coefficients in branches and stem (Study II). The pipe ratio between branch and stem cross-sectional area at crown base and the mean branch length relative to crown length may differ in trees before and after canopy closure, but the variation should be further analysed in stands of different ages and densities with varying site fertilities and climates. The predictions of the PipeQual and RetroSTEM simulators were evaluated by comparing the simulated values to measured ones (Studies III, IV). Both simulators predicted stem taper and branch diameter at the individual tree level with a small bias. RetroSTEM predictions of wood density were accurate. To obtain even more accurate predictions of stem diameters and branchiness along the stem, both simulators should be further improved by revising the following aspects: the relationship between foliage and stem sapwood area in the upper stem, the sources of error in branch sizes, crown base development and the height growth models in RetroSTEM. In Study V, the RetroSTEM simulator was integrated into the InnoSIM sawing simulator, and according to the pilot simulations this combination turned out to be an efficient tool for readily producing stand-scale information about stem sizes and structure when approximating the available assortments of wood products.
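
As a purely illustrative aid to the pipe-ratio idea mentioned above, the following sketch converts a foliage mass into a stem sapwood cross-sectional area at crown base and the corresponding diameter. It is not PipeQual or RetroSTEM code; the ratio value, units and foliage mass are hypothetical.

import math

ALPHA_S = 0.0006   # hypothetical pipe ratio: sapwood area (m^2) per kg of foliage

def sapwood_area_at_crown_base(foliage_mass_kg):
    """Cross-sectional sapwood area (m^2) implied by the pipe ratio."""
    return ALPHA_S * foliage_mass_kg

def diameter_from_area(area_m2):
    """Diameter (cm) of a circular cross-section with the given area."""
    return 2.0 * math.sqrt(area_m2 / math.pi) * 100.0

foliage = 12.0   # kg of foliage in the crown (illustrative value)
area = sapwood_area_at_crown_base(foliage)
print(f"sapwood area {area:.4f} m^2, diameter {diameter_from_area(area):.1f} cm")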

Relevance:

20.00%

Publisher:

Abstract:

Printing papers have been the main product of the Finnish paper industry. To improve the properties and economy of printing papers, this study examines the control of tracheid cross-sectional dimensions and of wood viscoelasticity. Control is understood as any procedure that yields raw material classes with distinct properties and small internal variation. Tracheid cross-sectional dimensions, i.e. cell wall thickness and radial and tangential diameters, can be controlled with methods such as sorting wood into pulpwood and sawmill chips, sorting logs according to tree social status and fractionating fibres. These control methods were analysed in this study with simulations based on measured tracheid cross-sectional dimensions. A SilviScan device was used to measure the data set from five Norway spruce (Picea abies) and five Scots pine (Pinus sylvestris) trunks. The simulation results indicate that the sawmill chips and top pulpwood assortments have quite similar cross-sectional dimensions. Norway spruce and Scots pine are on average also relatively similar in their cross-sectional dimensions. The distributions of these species are somewhat different, but from a practical point of view the differences are probably of minor importance. Tracheid cross-sectional dimensions can be controlled most efficiently with methods that separate fibres into earlywood and latewood. Sorting logs or partitioning logs into juvenile and mature wood was markedly less efficient than fractionating fibres. Wood viscoelasticity affects energy consumption in mechanical pulping and is thus an interesting control target when improving the energy efficiency of the process. A literature study was made to evaluate the possibility of using viscoelasticity in control. The study indicates that there is considerable variation in viscoelastic properties within tree species, but unfortunately the viscoelastic properties of important raw material lots such as top pulpwood or sawmill chips are not known. The viscoelastic properties of wood depend mainly on lignin, but also on microfibrillar angle, the width of cellulose crystals and tracheid cross-sectional dimensions.
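
The criterion behind these comparisons, small internal variation within each raw-material class, can be illustrated with a toy calculation: split fibres into earlywood-like and latewood-like classes by a wall-thickness threshold and compare the within-class spread to the pooled spread. The data values and the threshold below are made up, not taken from the SilviScan measurements.

import statistics

wall_thickness_um = [2.1, 2.3, 2.2, 2.4, 3.9, 4.2, 2.0, 4.0, 2.5, 4.1]

def within_class_sd(values, threshold):
    early = [v for v in values if v < threshold]    # earlywood-like fibres
    late = [v for v in values if v >= threshold]    # latewood-like fibres
    # weighted average of the class standard deviations
    return (len(early) * statistics.pstdev(early) +
            len(late) * statistics.pstdev(late)) / len(values)

print("pooled sd:", round(statistics.pstdev(wall_thickness_um), 3))
print("after fractionation:", round(within_class_sd(wall_thickness_um, 3.0), 3))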

Relevance:

20.00%

Publisher:

Abstract:

Forest management is facing new challenges under climate change. By adjusting thinning regimes, conventional forest management can be adapted to various objectives of utilization of forest resources, such as wood quality, forest bioenergy, and carbon sequestration. This thesis aims to develop and apply a simulation-optimization system as a tool for an interdisciplinary understanding of the interactions between wood science, forest ecology, and forest economics. In this thesis, the OptiFor software was developed for forest resources management. The OptiFor simulation-optimization system integrated the process-based growth model PipeQual, wood quality models, biomass production and carbon emission models, as well as energy wood and commercial logging models, into a single optimization model. Osyczka's direct and random search algorithm was employed to identify optimal values for a set of decision variables. The numerical studies in this thesis broadened our current knowledge and understanding of the relationships between wood science, forest ecology, and forest economics. The results for timber production show that optimal thinning regimes depend on site quality and initial stand characteristics. When wood properties were taken into account, our results show that increasing the intensity of thinning resulted in lower wood density and shorter fibers. The addition of nutrients accelerated volume growth, but lowered wood quality for Norway spruce. Integrating energy wood harvesting into conventional forest management showed that conventional forest management without energy wood harvesting was still superior in sparse stands of Scots pine. Energy wood from pre-commercial thinning turned out to be optimal for dense stands. When carbon balance is taken into account, our results show that changing carbon assessment methods leads to very different optimal thinning regimes and average carbon stocks. Raising the carbon price resulted in longer rotations and a higher mean annual increment, as well as a significantly higher average carbon stock over the rotation.
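
The kind of optimization loop involved can be sketched as a plain random search over thinning decision variables. This is not Osyczka's algorithm or the OptiFor objective; the bounds and the toy objective function below are stand-ins for the simulator-based evaluation.

import random

BOUNDS = [(0.0, 0.4)] * 3   # e.g. three thinning intensities over a rotation

def objective(x):
    """Placeholder for the simulator-based criterion (e.g. net present value)."""
    return sum(xi * (1.0 - xi) for xi in x)   # toy concave surrogate

def random_search(rounds=5000):
    best_x, best_f = None, float("-inf")
    for _ in range(rounds):
        x = [random.uniform(lo, hi) for lo, hi in BOUNDS]
        f = objective(x)
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f

x_opt, f_opt = random_search()
print([round(v, 2) for v in x_opt], round(f_opt, 3))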

Relevance:

20.00%

Publisher:

Abstract:

Event-based systems are seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous many-to-many information dissemination. Event systems are widely used because asynchronous messaging provides a flexible alternative to RPC (Remote Procedure Call). They are typically implemented using an overlay network of routers. A content-based router forwards event messages based on filters that are installed by subscribers and other routers. The filters are organized into a routing table in order to forward incoming events to the proper subscribers and neighbouring routers. This thesis addresses the optimization of content-based routing tables organized using the covering relation and presents novel data structures and configurations for improving local and distributed operation. Data structures are needed for organizing filters into a routing table that supports efficient matching and runtime operation. We present novel results on dynamic filter merging and the integration of filter merging with content-based routing tables. In addition, the thesis examines the cost of client mobility using different protocols and routing topologies. We also present a new matching technique called temporal subspace matching. The technique combines two new features. The first feature, temporal operation, supports notifications, or content profiles, that persist in time. The second feature, subspace matching, allows more expressive semantics, because notifications may contain intervals and be defined as subspaces of the content space. We also present an application of temporal subspace matching to metadata-based continuous collection and object tracking.
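
The covering relation can be illustrated with filters reduced to per-attribute numeric intervals (a simplification; real content-based filters support richer predicates): filter F1 covers filter F2 if every event matched by F2 is also matched by F1, so F2 need not be stored or forwarded separately. The attribute names and bounds below are made up for illustration.

def covers(f1, f2):
    """True if every event matching f2 also matches f1 (interval filters)."""
    for attr, (lo2, hi2) in f2.items():
        if attr not in f1:
            continue                       # f1 leaves attr unconstrained
        lo1, hi1 = f1[attr]
        if not (lo1 <= lo2 and hi2 <= hi1):
            return False
    # f1 must not constrain attributes that f2 leaves unconstrained
    return all(attr in f2 for attr in f1)

class RoutingTable:
    def __init__(self):
        self.filters = []

    def add(self, f):
        if any(covers(g, f) for g in self.filters):
            return False                   # already covered, nothing to forward
        self.filters = [g for g in self.filters if not covers(f, g)]
        self.filters.append(f)
        return True                        # new filter must be forwarded upstream

rt = RoutingTable()
print(rt.add({"price": (0, 100)}))         # True
print(rt.add({"price": (10, 50)}))         # False, covered by the first filter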

Relevance:

20.00%

Publisher:

Abstract:

The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and for continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
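
For concreteness, the NML distribution assigns a data set the maximized likelihood under the model class, normalized by the sum of maximized likelihoods over all possible data sets of the same size; the stochastic complexity is its negative logarithm. The sketch below spells this out for a Bernoulli model class, where the exponential sum collapses by grouping sequences with the same number of ones. It illustrates the definition only, not the dissertation's efficient algorithms.

from math import comb, log

def max_likelihood(k, n):
    """Maximized Bernoulli likelihood of a binary sequence with k ones out of n."""
    if k == 0 or k == n:
        return 1.0
    p = k / n
    return p**k * (1 - p)**(n - k)

def parametric_complexity(n):
    """Normalizer of the NML distribution: sum of maximized likelihoods over
    all 2^n sequences, grouped by their count of ones."""
    return sum(comb(n, k) * max_likelihood(k, n) for k in range(n + 1))

def stochastic_complexity(k, n):
    """-log P_NML of one particular sequence with k ones, in nats."""
    return -log(max_likelihood(k, n) / parametric_complexity(n))

print(round(stochastic_complexity(k=7, n=10), 3))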

Relevance:

20.00%

Publisher:

Abstract:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and devise cures for problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem: searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is provided by genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine, i.e. they should also hold in future data. This is an important distinction from traditional association rules, which, in spite of their name and their similar appearance to dependency rules, do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for rules using statistical significance measures. Another important objective is to search only for non-redundant rules, which express the real causes of the dependence without any incidental extra factors. Such extra factors add no new information about the dependence; they can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, or z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test, and it can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over existing solutions. In practice, this means that the user does not have to worry about whether the dependencies hold in future data or whether the data still contains better, undiscovered dependencies.
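
As an illustration of the significance measure only (not the search algorithm of the thesis), a one-sided Fisher's exact test for a single candidate rule X -> A can be computed from the 2x2 contingency table of X and A; the counts below are made up.

from math import comb

def fisher_one_sided(n_xa, n_xna, n_nxa, n_nxna):
    """P(at least n_xa co-occurrences of X and A by chance), under the
    hypergeometric null distribution with all table margins fixed."""
    n = n_xa + n_xna + n_nxa + n_nxna
    n_x = n_xa + n_xna           # rows where X holds
    n_a = n_xa + n_nxa           # rows where A holds
    denom = comb(n, n_x)
    upper = min(n_x, n_a)
    return sum(comb(n_a, i) * comb(n - n_a, n_x - i)
               for i in range(n_xa, upper + 1)) / denom

# X holds in 40 rows, A in 50 rows, both in 30 rows (n = 100)
print(f"p = {fisher_one_sided(30, 10, 20, 40):.3e}")

The smaller the p-value, the stronger the evidence that X genuinely increases the probability of A rather than co-occurring with it by chance.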

Relevance:

20.00%

Publisher:

Abstract:

In visual object detection and recognition, classifiers have two interesting characteristics: accuracy and speed. Accuracy depends on the complexity of the image features and classifier decision surfaces. Speed depends on the hardware and the computational effort required to use the features and decision surfaces. When attempts to increase accuracy lead to increases in complexity and effort, it is necessary to ask how much we are willing to pay for increased accuracy. For example, if increased computational effort implies quickly diminishing returns in accuracy, then those designing inexpensive surveillance applications cannot aim for maximum accuracy at any cost. It becomes necessary to find trade-offs between accuracy and effort. We study efficient classification of images depicting real-world objects and scenes. Classification is efficient when a classifier can be controlled so that the desired trade-off between accuracy and effort (speed) is achieved and unnecessary computations are avoided on a per-input basis. A framework is proposed for understanding and modeling efficient classification of images. Classification is modeled as a tree-like process. In designing the framework, it is important to recognize what is essential and to avoid structures that are narrow in applicability. Earlier frameworks are lacking in this regard. The overall contribution is two-fold. First, the framework is presented, subjected to experiments, and shown to be satisfactory. Second, certain unconventional approaches are experimented with. This allows the separation of the essential from the conventional. To determine whether the framework is satisfactory, three categories of questions are identified: trade-off optimization, classifier tree organization, and rules for delegation and confidence modeling. Questions and problems related to each category are addressed and empirical results are presented. For example, related to trade-off optimization, we address the problem of computational bottlenecks that limit the range of trade-offs. We also ask whether accuracy-versus-effort trade-offs can be controlled after training. As another example, regarding classifier tree organization, we first consider the task of organizing a tree in a problem-specific manner. We then ask whether problem-specific organization is necessary.
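
The delegation and confidence-modelling idea can be sketched as a two-stage cascade: a cheap classifier answers when it is confident and otherwise delegates to a slower, more accurate one. The two classifiers, features and thresholds below are placeholders, not the framework of the thesis; the confidence threshold is what controls the accuracy-versus-effort trade-off.

def cheap_classifier(x):
    """Fast, rough decision: returns (label, confidence in [0, 1])."""
    score = x["edge_density"]              # hypothetical cheap feature
    return ("object", score) if score > 0.5 else ("background", 1.0 - score)

def expensive_classifier(x):
    """Slow but accurate decision (stand-in for a heavier model)."""
    return "object" if x["detector_score"] > 0.7 else "background"

def classify(x, confidence_threshold=0.8):
    label, confidence = cheap_classifier(x)
    if confidence >= confidence_threshold:
        return label, "cheap"
    return expensive_classifier(x), "delegated"   # extra effort only when needed

print(classify({"edge_density": 0.95, "detector_score": 0.2}))
print(classify({"edge_density": 0.55, "detector_score": 0.9}))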