39 results for tree similarity measure
in Helda - Digital Repository of University of Helsinki
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research that focuses on news events. The problems TDT deals with include segmenting news text into cohesive stories, detecting something new and previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems; in particular, it has been difficult to distinguish between the same event and merely similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms but also at how close their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system.
News reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
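The class-wise comparison described above can be illustrated with a minimal sketch. It assumes each document has already been split into semantic classes (the class names `terms` and `locations` below are illustrative), uses cosine similarity over term frequencies as the default per-class measure, and normalises the weighted sum by the total weight. The function names, weights, and the choice of cosine are assumptions for illustration, not the measures or weights used in the thesis; the temporal anchoring step is not modelled.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical representation: semantic class name -> tokens of that class,
# e.g. {"terms": [...], "persons": [...], "organizations": [...], "locations": [...]}.
Doc = Dict[str, List[str]]
Measure = Callable[[Counter, Counter], float]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classwise_similarity(doc1: Doc, doc2: Doc, weights: Dict[str, float],
                         measures: Optional[Dict[str, Measure]] = None) -> float:
    """Weighted average of per-class similarities.

    Each class may have its own measure; classes without an explicit
    measure fall back to cosine similarity.
    """
    measures = measures or {}
    total_weight = sum(weights.values())
    score = 0.0
    for cls, w in weights.items():
        sim = measures.get(cls, cosine)
        score += w * sim(Counter(doc1.get(cls, [])), Counter(doc2.get(cls, [])))
    return score / total_weight
```

Passing a different function in `measures` for a class is where a taxonomy-aware comparison (for example, one based on a geographical hierarchy) would plug in. For two documents sharing all general terms but no locations, weights of 0.6 and 0.4 give a similarity of about 0.6.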
Design and testing of stand-specific bucking instructions for use on modern cut-to-length harvesters
Abstract:
This study addresses three important issues in tree bucking optimization in the context of cut-to-length harvesting. (1) Would the fit between the log demand and log output distributions be better if the price and/or demand matrices controlling the bucking decisions on modern cut-to-length harvesters were adjusted to the unique conditions of each individual stand? (2) In what ways can we generate stand- and product-specific price and demand matrices? (3) What alternatives do we have to measure the fit between the log demand and log output distributions, and what would be an ideal goodness-of-fit measure? Three iterative search systems were developed for seeking stand-specific price and demand matrix sets: (1) a fuzzy logic control system for calibrating the price matrix of one log product for one stand at a time (the stand-level one-product approach); (2) a genetic algorithm system for adjusting the price matrices of one log product in parallel for several stands (the forest-level one-product approach); and (3) a genetic algorithm system for dividing the overall demand matrix of each of several log products into stand-specific sub-demands simultaneously for several stands and products (the forest-level multi-product approach). The stem material used for testing the performance of the stand-specific price and demand matrices against that of the reference matrices consisted of 9 155 Norway spruce (Picea abies (L.) Karst.) sawlog stems gathered by harvesters from 15 mature spruce-dominated stands in southern Finland. The reference price and demand matrices were either direct copies or slightly modified versions of those used by two Finnish sawmilling companies. Two types of stand-specific bucking matrices were compiled for each log product: one from the harvester-collected stem profiles and the other from the pre-harvest inventory data.
Four goodness-of-fit measures were analyzed for their appropriateness in determining the similarity between the log demand and log output distributions: (1) the apportionment degree (index), (2) the chi-square statistic, (3) Laspeyres quantity index, and (4) the price-weighted apportionment degree. The study confirmed that any improvement in the fit between the log demand and log output distributions can only be realized at the expense of log volumes produced. Stand-level pre-control of price matrices was found to be advantageous, provided the control is done with perfect stem data. Forest-level pre-control of price matrices resulted in no improvement in the cumulative apportionment degree. Cutting stands under the control of stand-specific demand matrices yielded a better total fit between the demand and output matrices at the forest level than was obtained by cutting each stand with non-stand-specific reference matrices. The theoretical and experimental analyses suggest that none of the three alternative goodness-of-fit measures clearly outperforms the traditional apportionment degree measure. Keywords: harvesting, tree bucking optimization, simulation, fuzzy control, genetic algorithms, goodness-of-fit
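The first two goodness-of-fit measures can be sketched as follows, assuming the demand and output are given as volumes (or counts) per log length-diameter class in the same order. The apportionment degree is commonly defined as the sum, over classes, of the smaller of the demand share and the output share, so 1.0 (100%) indicates a perfect match; the chi-square statistic compares the output against amounts expected from the demand proportions. The function names are hypothetical, and the Laspeyres quantity index and the price-weighted apportionment degree are not shown.

```python
from typing import List, Sequence


def shares(amounts: Sequence[float]) -> List[float]:
    """Convert per-class volumes or counts into relative frequencies."""
    total = sum(amounts)
    return [a / total for a in amounts]


def apportionment_degree(demand: Sequence[float], output: Sequence[float]) -> float:
    """Sum over classes of min(demand share, output share).

    1.0 (100 %) means the output distribution matches the demand distribution exactly.
    """
    return sum(min(d, o) for d, o in zip(shares(demand), shares(output)))


def chi_square(demand: Sequence[float], output: Sequence[float]) -> float:
    """Pearson chi-square of the output against demand-proportional expected amounts.

    Classes with zero demand are skipped in this sketch.
    """
    n_out = sum(output)
    expected = [d * n_out for d in shares(demand)]
    return sum((o - e) ** 2 / e for o, e in zip(output, expected) if e > 0)
```

For example, a demand split of 50/30/20 against an output split of 40/40/20 gives an apportionment degree of 0.9 (90%).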
Abstract:
A straightforward computation of the list of the words (the 'tail words' of the list) that are distributionally most similar to a given word (the 'head word' of the list) leads to the question: how semantically similar to the head word are the tail words, that is, how similar are their meanings to its meaning? And can we do better? The experiment was done on the nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, these variables overlap considerably across the semantic classes, but the trained classification-tree models have some success in using combinations of them to predict the semantic class. The training data consist of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
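The information radius mentioned above can be written as IRad(p, q) = D(p || m) + D(q || m), where m = (p + q)/2 and D is the Kullback-Leibler divergence; it is 0 for identical distributions and at most 2 bits. The sketch below assumes that each noun's dependency-relation co-occurrence counts have already been arranged into vectors over a shared, ordered set of contexts. The function names are illustrative, and some authors define the same quantity with an extra factor of 1/2 (the Jensen-Shannon divergence).

```python
import math
from typing import List, Sequence


def to_pmf(counts: Sequence[float]) -> List[float]:
    """Normalise co-occurrence counts into a probability mass function."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def information_radius(p: Sequence[float], q: Sequence[float]) -> float:
    """IRad(p, q) = D(p || m) + D(q || m) with m = (p + q) / 2.

    0 for identical distributions, 2 bits for distributions with disjoint support.
    Whenever p_i > 0 we also have m_i > 0, so no division by zero occurs.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl_divergence(p, m) + kl_divergence(q, m)
```

A smaller information radius means more similar distributional representations, so ranking candidate tail words by ascending IRad to the head word yields a list of the kind described above.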
Abstract:
The study of social phenomena on the World Wide Web has been rather fragmentary, and there is no coherent, research-based theory of sense of community in the Web environment. Sense of community refers to the part of one's self-concept that has to do with perceiving oneself as belonging to, and feeling affinity with, a certain social grouping. The present study aimed to find evidence for sense of community in the Web environment and, specifically, to find out what the most critical psychological factors of sense of community are. Based on known characteristics of real-life communities and sense of community, and a few occasional studies of Web-communities, it was hypothesized that the following factors would be the most critical ones and that they could be grouped as prerequisites, facilitators and consequences of sense of community: awareness and social presence (prerequisites); criteria for membership and borders, common purpose, social interaction and reciprocity, norms and conformity, and common history (facilitators); and trust and accountability (consequences). In addition to the critical factors, the present study aimed to find out whether this kind of grouping would be valid. Furthermore, the effect of Web-community members' background variables on sense of community was of interest. To answer these questions, an online questionnaire was created and tested. It included propositions reflecting factors that precede, facilitate and follow sense of community in the Web environment. A factor analysis was calculated to identify the critical factors, and analyses of variance were calculated to see whether the grouping into prerequisites, facilitators and consequences was right and how the background variables would affect sense of community in the Web environment. The results indicated that the psychological structure of sense of community in the Web environment could not be represented with critical variables grouped as prerequisites, facilitators and consequences.
Most factors did facilitate sense of community, but based on these data it could not be argued that some of the factors chronologically precede sense of community and some follow it. Instead, the factor analysis revealed that the most critical factors in sense of community in the Web environment are 1) reciprocal involvement, 2) basic trust in others, 3) similarity and common purpose of members, and 4) shared history of members. The most influential background variables were the member's own participation activity (indicated by reading and writing messages) and the phase in the membership life cycle (from visitor to leader). The more the member participated and the further along the membership life cycle he was, the more he felt a sense of community. There are many descriptions of sense of community, but the present study was one of the first to actually measure the phenomenon in the Web environment. It yielded well-documented, valid results based on a large data set, showing that sense of community in the Web environment is possible and clarifying its psychological structure, thus enhancing the understanding of sense of community in the Web environment. Keywords: sense of community, Web-community, psychology of the Internet
Abstract:
The usefulness of different prosopis species versus their status as weeds is a matter of heated debate around the world. The tree Prosopis juliflora had until 2000 been proclaimed weedy in its native range in South America and elsewhere in the dry tropics. P. juliflora, or mesquite, has a 90-year history in Sudan. During the early 1990s, popular opinion in central Sudan and the Sudanese Government had begun to consider prosopis a noxious weed and a problematic tree species due to its aggressive ability to invade farmlands and pastures, especially in and around irrigated agricultural lands. As a consequence, prosopis was officially declared an invasive alien species in Sudan as well, and in 1995 a presidential decree for its eradication was issued. Using a total economic valuation (TEV) approach, this study analysed the impacts of prosopis on local livelihoods in two contrasting irrigated agricultural schemes. A primarily problem-based approach was used, in which non-market values were derived with ecological economic tools. In the New Halfa Irrigation Scheme in Kassala State, four separate household surveys were conducted because of the diversity among the respective population groups. Here the main aim was to study the magnitude of the environmental economic benefits and costs derived from the prosopis invasion in a large agricultural irrigation scheme on clay soil. The other study site, the Gandato Irrigation Scheme in River Nile State, represented the impacts of prosopis on an irrigation scheme on sandy soil in the arid and semi-arid ecozones along the main River Nile. The two cases showed distinctly different effects of prosopis, but both indicated that the benefits exceeded the costs. The valuation on clay soil in New Halfa identified a benefit/cost ratio of 2.1, while this indicator equalled 46 on the sandy soils of Gandato. The valuation results were site-specific and based on local market prices.
The most important beneficial impacts of prosopis on local livelihoods were derived from free-grazing forage for livestock, environmental conservation of the native vegetation, wood and non-wood forest products, and shelterbelt effects. The main social costs of prosopis were derived from weeding and clearing it from farmlands and canal sides, from thorn injuries to humans and livestock, and from repair expenses for vehicle tyre punctures. Of the population groups, the tenants faced most of the detrimental impacts, while the landless population groups (originating from western and eastern Sudan) as well as the nomads were highly dependent on this tree resource. For the Gandato site, the monetized benefit-cost ratio of 46 still excluded several additional beneficial impacts of prosopis in the area that were difficult to quantify and monetize credibly. In River Nile State, the beneficial impacts could thus be seen as completely outweighing the costs of prosopis. The results can contribute to the formulation of national and local forest and agricultural policies related to prosopis in Sudan and can also be used in other countries facing similar impacts caused by this tree.
Variation in tracheid cross-sectional dimensions and wood viscoelasticity: extent and control methods
Abstract:
Printing papers have been the main product of the Finnish paper industry. To improve the properties and economy of printing papers, this study examines the control of tracheid cross-sectional dimensions and wood viscoelasticity. Control is understood as any procedure that yields raw material classes with distinct properties and small internal variation. Tracheid cross-sectional dimensions, i.e., cell wall thickness and radial and tangential diameters, can be controlled with methods such as sorting wood into pulpwood and sawmill chips, sorting logs according to tree social status, and fractionating fibres. These control methods were analysed in this study with simulations based on measured tracheid cross-sectional dimensions. A SilviScan device was used to measure the data set from five Norway spruce (Picea abies) and five Scots pine (Pinus sylvestris) trunks. The simulation results indicate that the sawmill chips and top pulpwood assortments have quite similar cross-sectional dimensions. On average, Norway spruce and Scots pine are also relatively similar in their cross-sectional dimensions. The distributions of these species are somewhat different, but from a practical point of view, the differences are probably of minor importance. Tracheid cross-sectional dimensions can be controlled most efficiently with methods that can separate fibres into earlywood and latewood. Sorting logs or partitioning logs into juvenile and mature wood were markedly less efficient control methods than fractionating fibres. Wood viscoelasticity affects energy consumption in mechanical pulping and is thus an interesting control target when improving the energy efficiency of the process. A literature study was made to evaluate the possibility of using viscoelasticity as a basis for control.
The study indicates that there is considerable variation in viscoelastic properties within tree species, but unfortunately, the viscoelastic properties of important raw material lots such as top pulpwood or sawmill chips are not known. Viscoelastic properties of wood depend mainly on lignin, but also on microfibrillar angle, width of cellulose crystals and tracheid cross-sectional dimensions.
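As a rough illustration of what "small internal variation" can mean for a control method, the sketch below computes the share of the total variation in a measured property (for example, cell wall thickness) that is removed by sorting the measurements into classes. This criterion, the function name, and any values passed to it are assumptions for illustration; the abstract does not state which efficiency measure the simulations used.

```python
from statistics import fmean
from typing import Sequence


def variance_explained(groups: Sequence[Sequence[float]]) -> float:
    """Share of the total sum of squares removed by sorting values into classes.

    1.0 means no variation is left inside the classes; 0.0 means the sorting
    did not reduce variation at all. Every class must be non-empty.
    """
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for g in groups:
        class_mean = fmean(g)  # mean of this raw material class
        within_ss += sum((v - class_mean) ** 2 for v in g)
    return 1.0 - within_ss / total_ss if total_ss else 0.0
```

Under this criterion, applying different sorting methods to the same measurements and comparing their scores would rank them by how much variation they leave inside each class; a method that separates earlywood from latewood fibres would score higher than log sorting if it leaves less within-class variation.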
Abstract:
There is an increasing need to compare the results obtained with different methods of estimation of tree biomass in order to reduce the uncertainty in the assessment of forest biomass carbon. In this study, tree biomass was investigated in a 30-year-old Scots pine (Pinus sylvestris) (Young-Stand) and a 130-year-old mixed Norway spruce (Picea abies)-Scots pine stand (Mature-Stand) located in southern Finland (61°50' N, 24°22' E). In particular, a comparison of the results of different estimation methods was conducted to assess the reliability and suitability of their applications. For the trees in Mature-Stand, annual stem biomass increment fluctuated following a sigmoid equation, and the fitting curves reached a maximum level (from about 1 kg/yr for understorey spruce to 7 kg/yr for dominant pine) when the trees were 100 years old. Tree biomass was estimated to be about 70 Mg/ha in Young-Stand and about 220 Mg/ha in Mature-Stand. In the region (58.00-62.13° N, 14-34° E, ≤ 300 m a.s.l.) surrounding the study stands, the tree biomass accumulation in Norway spruce and Scots pine stands followed a sigmoid equation with stand age, with a maximum of 230 Mg/ha at the age of 140 years. In Mature-Stand, lichen biomass on the trees was 1.63 Mg/ha with more than half of the biomass occurring on dead branches, and the standing crop of litter lichen on the ground was about 0.09 Mg/ha. There were substantial differences among the results estimated by different methods in the stands. These results imply that a possible estimation error should be taken into account when calculating tree biomass in a stand with an indirect approach.
Abstract:
Here I aimed to quantify the main components of deadwood dynamics, i.e., tree mortality, deadwood pools, and their decomposition, in late-successional boreal forests. I focused on standing dead trees in three stand types dominated by Picea mariana and Abies balsamea in eastern Canada, and on standing and downed dead trees in Picea abies-dominated stands in three areas in Northern Europe. Dead and living trees were measured on five 1.6-ha sample plots in each study area and stand type. Stem disks from dead trees were sampled to determine wood density and year of death, using dendrochronological methods. The results were applied to reconstruct past tree mortality and to model deadwood decay class dynamics. Site productivity, stand developmental stage, and the occurrence of episodic tree mortality influenced deadwood volume and quality. In all study areas tree mortality was continuous, leading to continuity in deadwood decay stage distribution. Episodic tree mortality due to either autogenic or allogenic causes influenced deadwood volume and quality in all but one study area. However, regardless of productivity and disturbance history, deadwood was abundant, accounting for 20-53% of total wood volume in the European study areas and 15-27% of total standing volume in eastern Canada. Deadwood was a persistent structural component, since its expected residence time in the early and middle stages of decay was 18 yr even in the area with the most rapid decomposition. The results indicated that in the absence of episodic tree mortality, stands may eventually develop to a steady state in which deadwood volume fluctuates around an equilibrium. However, in many forests deadwood is naturally variable due to recurrent moderate-severity disturbances. This variability, the continuous tree mortality, and variation in rates of wood decomposition determine the dynamics and availability of deadwood as a habitat and carbon storage medium in boreal coniferous forest ecosystems.
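Expected residence times of this kind can be derived from a stage-based model of decay-class transitions. The sketch below assumes a discrete-time Markov chain with a one-year step, in which `Q[i][j]` is a hypothetical probability that a dead tree in decay class i is in class j a year later, and leaving the modelled classes counts as absorption; it solves t = 1 + Qt for the expected number of years spent in the modelled classes. The study's actual decay-class model and transition probabilities are not given in the abstract.

```python
from typing import List, Sequence


def expected_residence_times(Q: Sequence[Sequence[float]], tol: float = 1e-12,
                             max_iter: int = 100_000) -> List[float]:
    """Expected number of steps before absorption, starting from each transient class.

    Q[i][j] is the per-step probability of moving from transient class i to
    transient class j; a row may sum to less than 1, the remainder being the
    probability of absorption. Solves t = 1 + Q t by fixed-point iteration, which
    converges when absorption is eventually certain from every class.
    """
    n = len(Q)
    t = [0.0] * n
    for _ in range(max_iter):
        new_t = [1.0 + sum(Q[i][j] * t[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    raise RuntimeError("no convergence: check that every class can reach absorption")
```

With the hypothetical matrix Q = [[0.5, 0.5], [0.0, 0.75]], a tree entering the first class is expected to remain in the two modelled classes for 6 years (4 years once it reaches the second class).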
Abstract:
The aim of this study was to explore soil microbial activities related to C and N cycling and the occurrence and concentrations of two important groups of plant secondary compounds, terpenes and phenolic compounds, under silver birch (Betula pendula Roth), Norway spruce (Picea abies (L.) Karst) and Scots pine (Pinus sylvestris L.) as well as to study the effects of volatile monoterpenes and tannins on soil microbial activities. The study site, located in Kivalo, northern Finland, included ca. 70-year-old adjacent stands dominated by silver birch, Norway spruce and Scots pine. Originally the soil was very probably similar in all three stands. All forest floor layers (litter (L), fermentation layer (F) and humified layer (H)) under birch and spruce showed higher rates of CO2 production, greater net mineralisation of nitrogen and higher amounts of carbon and nitrogen in microbial biomass than did the forest floor layers under pine. Concentrations of mono-, sesqui-, di- and triterpenes were higher under both conifers than under birch, while the concentration of total water-soluble phenolic compounds as well as the concentration of condensed tannins tended to be higher or at least as high under spruce as under birch or pine. In general, differences between tree species in soil microbial activities and in concentrations of secondary compounds were smaller in the H layer than in the upper layers. The rate of CO2 production and the amount of carbon in the microbial biomass correlated highly positively with the concentration of total water-soluble phenolic compounds and positively with the concentration of condensed tannins. Exposure of soil to volatile monoterpenes and tannins extracted and fractionated from spruce and pine needles affected carbon and nitrogen transformations in soil, but the effects were dependent on the compound and its molecular structure. 
Monoterpenes decreased net mineralisation of nitrogen and probably had a toxic effect on part of the microbial population in soil, while another part of the microbes seemed to be able to use monoterpenes as a carbon source. With tannins, low-molecular-weight compounds (also compounds other than tannins) increased soil CO2 production and nitrogen immobilisation by soil microbes while the higher-molecular-weight condensed tannins had inhibitory effects. In conclusion, plant secondary compounds may have a great potential in regulation of C and N transformations in forest soils, but the real magnitude of their significance in soil processes is impossible to estimate.