905 results for Tree Similarity
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, with applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
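As a rough illustration of the tree edit distance that the abstract above builds on, the following sketch computes a plain unit-cost edit distance between two ordered labeled trees. It is not the authors' framework: the sub-tree commonality and semantic modules are omitted, the unit costs are an assumption, and the names `tree_edit_distance`, `forest_distance`, `leaf`, `doc_a` and `doc_b` are hypothetical.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Both are hashable, so results can be memoized.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (assumed costs)."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete_v = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g; its children take its place.
    insert_w = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete_v, insert_w, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def leaf(label):
    return (label, ())


# Hypothetical XML-like documents modeled as ordered labeled trees.
doc_a = ("book", (leaf("title"), leaf("author")))
doc_b = ("book", (leaf("title"), leaf("editor"), leaf("year")))
print(tree_edit_distance(doc_a, doc_b))
```

The memoized recursion is easy to read but slow on large documents; dedicated algorithms such as Zhang-Shasha compute the same distance more efficiently.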
Abstract:
The rainforest of Mexico has been degraded and severely fragmented, and urgently requires restoration. However, the practice of restoration has been limited by the lack of species-specific data on survival and growth responses to local environmental variation. This study explores the differential performance of 14 wet tropical early-, mid- or late-successional tree species that were grown in two abandoned pastures with contrasting land-use histories. After 18 months, seedling survival and growth of at least 7 of the 14 tree species studied were significantly higher in the site with a much longer history of land use (site 2). Saplings of the three early-successional species showed exceptional growth rates. However, differences in performance were noted in relation to the differing soil properties of the experimental sites. Mid-successional species generally showed slow growth rates but high seedling survival, whereas late-successional species exhibited poor seedling survival at both study sites. Stepwise linear regressions revealed that the species integrated response index, which combines survivorship and growth measurements, was influenced mostly by differences in soil pH between the two abandoned pastures. Our results suggest that local environmental variation among abandoned pastures of contrasting land-use histories influences sapling survival and growth. Furthermore, the similarity of responses among species with the same successional status allowed us to make some preliminary site- and species-specific silvicultural recommendations. Future field experiments should extend the number of species and the range of environmental conditions to identify site generalists and more narrowly adapted species, which we would call sensitive.
Abstract:
Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of kernels in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
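To make the subtree kernel and the sparsity problem from the abstract above more concrete, here is a minimal sketch of a subtree kernel that counts pairs of identical complete subtrees (a node together with all of its descendants). It is only an illustration, not one of the thesis's proposed kernels. The tuple encoding, the function names and the example parse trees are assumptions, and node labels are assumed not to contain parentheses or commas.

```python
from collections import Counter


def subtree_counts(tree):
    """Count every complete subtree of `tree`, keyed by a canonical string.

    A tree is (label, children), with children a tuple of trees.
    """
    counts = Counter()

    def encode(node):
        label, children = node
        key = label + "(" + ",".join(encode(child) for child in children) + ")"
        counts[key] += 1
        return key

    encode(tree)
    return counts


def subtree_kernel(t1, t2):
    """Number of pairs of identical complete subtrees shared by t1 and t2."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(n * c2[key] for key, n in c1.items())


# Hypothetical parse trees that differ in a single leaf label.
t1 = ("S", (("NP", (("dog", ()),)), ("VP", (("runs", ()),))))
t2 = ("S", (("NP", (("cat", ()),)), ("VP", (("runs", ()),))))
print(subtree_kernel(t1, t2), subtree_kernel(t1, t1))
```

Changing one leaf label already removes every shared subtree that contains that leaf. With labels drawn from a large domain, most kernel values between different trees therefore drop to or near zero, which is the sparsity effect described above.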
Abstract:
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success in detecting coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; this signature is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
Abstract:
The aim of this study was to analyse the effects of climatic factors (i.e. monthly mean temperature and total precipitation) on radial growth (earlywood width, latewood width, and total ring width) and on latewood stable carbon isotope composition in a pedunculate oak (Quercus robur L.) stand in northeastern Hungary. Earlywood width showed the weakest common variance and no statistically significant relationship with monthly precipitation or temperature. Latewood width showed the strongest common chronological signal. Correlation analysis with the monthly climate series showed that June precipitation had the strongest positive correlation with latewood width and the strongest negative correlation with the stable carbon isotope ratio. These parameters also shared the strongest climatic response at the seasonal scale: the highest correlation coefficients, 0.49 for latewood width and -0.62 for the stable carbon isotope ratio, were both obtained with a 10-month precipitation total (from the previous November to August of the current growing season). A combined parameter, derived as the difference between the latewood width and stable carbon isotope indices, showed an improved statistical relationship with the hydroclimatic calibration target at both local and regional spatial scales. Spatial correlation analysis indicated that the hydroclimatic signal encoded in these moisture-sensitive tree-ring parameters from Bakta Forest is expected to be representative of the northeastern Carpathians and of a large part of the Great Hungarian Plain. In addition, the hydroclimatic signal of the latewood width chronology was compared to three independent records. Results showed that neither the strength nor the rank of the similarity of the local hydroclimate signals was stable throughout the past two centuries. Future palaeo(hydro)climatological efforts targeting the Carpathian(-Balkan) region are recommended to track carefully the spatial domains for which a given local, proxy-derived hydroclimate reconstruction might provide useful information.
Abstract:
In this paper, we propose a novel high-dimensional index method, the BM+-tree, to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a rotary binary hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the M+-tree. Compared with the key dimension concept in the M+-tree, the binary hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post processing step is used after index building to ensure effective filtration. Experimental results using two types of real data sets illustrate a significantly improved filtering efficiency.
Abstract:
Music similarity query based on acoustic content is becoming important with the ever-increasing growth of music information from emerging applications such as digital libraries and the WWW. However, related techniques are still in their infancy and far from satisfactory. In this paper, we present a novel index structure, called the Composite Feature tree (CF-tree), to facilitate efficient content-based music search using multiple musical features. Before constructing the tree structure, we use PCA to transform the extracted features into a new space sorted by the importance of acoustic features. The CF-tree is a balanced multi-way tree structure in which each level represents the data space at a different dimensionality. The PCA-transformed data and the reduced dimensions in the upper levels alleviate the curse of dimensionality. To accurately mimic human perception, an extension, named the CF+-tree, is proposed, which further applies multivariable regression to determine the weight of each individual feature. We conduct extensive experiments to evaluate the proposed structures against state-of-the-art techniques. The experimental results demonstrate the superiority of our technique.
Abstract:
This paper presents a novel approach to the computation of primitive geometrical structures, where no prior knowledge about the visual scene is available and a high level of noise is expected. We based our work on the grouping principles of proximity and similarity, of points and preliminary models. The former was realized using Minimum Spanning Trees (MST), on which we apply a stable alignment and goodness of fit criteria. As for the latter, we used spectral clustering of preliminary models. The algorithm can be generalized to various model fitting settings, without tuning of run parameters. Experiments demonstrate the significant improvement in the localization accuracy of models in plane, homography and motion segmentation examples. The efficiency of the algorithm is not dependent on fine tuning of run parameters like most others in the field.
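The proximity grouping mentioned in the abstract above relies on Minimum Spanning Trees. As a minimal sketch of that single ingredient, the code below builds an MST over 2D points with Kruskal's algorithm on the complete Euclidean graph. The paper's alignment and goodness-of-fit criteria and its spectral clustering step are not reproduced, and the function name and sample points are hypothetical.

```python
import math
from itertools import combinations


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, weight) edges, where i and j index `points`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


# Hypothetical noisy scene: three nearby points and one distant outlier.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
for i, j, weight in minimum_spanning_tree(points):
    print(points[i], points[j], round(weight, 2))
```

In this example the single long edge to the outlier stands out from the short edges among the nearby points, which is the kind of proximity cue an MST exposes.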
Abstract:
The taxonomic position of a bacterium isolated from water samples from the Rio Negro, in the Amazon, Brazil, was determined by using a polyphasic approach. The organism formed a distinct phyletic line in the Chromobacterium 16S rRNA gene tree and had chemotaxonomic and morphological properties consistent with its classification in this genus. It was found to be closely related to Chromobacterium vaccinii DSM 25150(T) (98.6 % 16S rRNA gene similarity) and shared 98.5 % 16S rRNA gene similarity with Chromobacterium piscinae LGM 3947(T). DNA-DNA relatedness studies showed that isolate CBMAI 310(T) belongs to a distinct genomic species. The isolate was readily distinguished from the type strains of these species using a combination of phenotypic and chemotaxonomic properties. Thus, based on genotypic and phenotypic data, it is proposed that isolate CBMAI 310(T) (=DSM 26508(T)) be classified in the genus Chromobacterium as the type strain of a novel species, namely, Chromobacterium amazonense sp. nov.
Abstract:
Trees from tropical montane cloud forest (TMCF) display very dynamic patterns of water use. They are capable of downwards water transport towards the soil during leaf-wetting events, likely a consequence of foliar water uptake (FWU), as well as high rates of night-time transpiration (Enight) during drier nights. These two processes might represent important sources of water losses and gains to the plant, but little is known about the environmental factors controlling these water fluxes. We evaluated how contrasting atmospheric and soil water conditions control diurnal, nocturnal and seasonal dynamics of sap flow in Drimys brasiliensis (Miers), a common Neotropical cloud forest species. We monitored the seasonal variation of soil water content, micrometeorological conditions and sap flow of D. brasiliensis trees in the field during wet and dry seasons. We also conducted a greenhouse experiment exposing D. brasiliensis saplings under contrasting soil water conditions to deuterium-labelled fog water. We found that during the night D. brasiliensis possesses heightened stomatal sensitivity to soil drought and vapour pressure deficit, which reduces night-time water loss. Leaf-wetting events had a strong suppressive effect on tree transpiration (E). Foliar water uptake increased in magnitude with drier soil and during longer leaf-wetting events. The difference between diurnal and nocturnal stomatal behaviour in D. brasiliensis could be attributed to an optimization of carbon gain when leaves are dry, as well as minimization of nocturnal water loss. The leaf-wetting events on the other hand seem important to D. brasiliensis water balance, especially during soil droughts, both by suppressing tree transpiration (E) and as a small additional water supply through FWU. Our results suggest that decreases in leaf-wetting events in TMCF might increase D. brasiliensis water loss and decrease its water gains, which could compromise its ecophysiological performance and survival during dry periods.
Abstract:
Approximately 7.2% of the Atlantic rainforest remains in Brazil, with only 16% of this forest remaining in the State of Rio de Janeiro, all of it distributed in fragments. This forest fragmentation can produce biotic and abiotic differences between edges and the fragment interior. In this study, we compared the structure and richness of tree communities in three habitats - an anthropogenic edge (AE), a natural edge (NE) and the fragment interior (FI) - of a fragment of Atlantic forest in the State of Rio de Janeiro, Brazil (22°50'S and 42°28'W). One thousand and seventy-six trees with a diameter at breast height > 4.8 cm, belonging to 132 morphospecies and 39 families, were sampled in a total study area of 0.75 ha. NE had the greatest basal area and the trees in this habitat had the greatest diameter:height allometric coefficient, whereas AE had a lower richness and greater variation in the height of the first tree branch. Tree density, diameter, height and the proportion of standing dead trees did not differ among the habitats. There was marked heterogeneity among replicates within each habitat. These results indicate that the forest interior and the fragment edges (natural or anthropogenic) do not differ markedly in the parameters studied. Other factors, such as the age of the edge, type of matrix and proximity of gaps, may play a more important role in plant community structure than proximity to edges.
Abstract:
The aim of this work was to evaluate the floristic composition, richness, and diversity of the upper and lower strata of a stretch of mixed rain forest near the city of Itaberá, in southeastern Brazil. We also investigated the differences between this conservation area and other stretches of mixed rain forest in southern and southeastern Brazil, as well as other nearby forest formations, in terms of their floristic relationships. For our survey of the upper stratum (diameter at breast height [DBH] > 15 cm), we established 50 permanent plots of 10 × 20 m. Within each of those plots, we designated five, randomly located, 1 × 1 m subplots, in order to survey the lower stratum (total height > 30 cm and DBH < 15 cm). In the upper stratum, we sampled 1429 trees and shrubs, belonging to 134 species, 93 genera, and 47 families. In the lower stratum, we sampled 758 trees and shrubs, belonging to 93 species, 66 genera, and 39 families. In our floristic and phytosociological surveys, we recorded 177 species, belonging to 106 genera and 52 families. The Shannon Diversity Index was 4.12 and 3.5 for the upper and lower strata, respectively. Cluster analysis indicated that nearby forest formations had the strongest floristic influence on the study area, which was therefore distinct from other mixed rain forests in southern Brazil and in the Serra da Mantiqueira mountain range.
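The Shannon Diversity Index reported in the abstract above is conventionally computed as H' = -Σ p_i ln p_i, where p_i is the proportion of individuals belonging to species i. The sketch below assumes the natural logarithm (the abstract does not state the base) and uses made-up abundance counts, not the study's data.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species with n_i > 0."""
    total = sum(abundances)
    return -sum(
        (n / total) * math.log(n / total)  # natural log is an assumption
        for n in abundances
        if n > 0
    )


# Hypothetical counts of individuals per species in one stratum.
counts = [40, 25, 20, 10, 5]
print(round(shannon_index(counts), 3))
```

For S equally abundant species the index reaches its maximum, ln S, so a value near 4.12 implies a community at least as rich as about 62 equally common species (e^4.12 ≈ 62).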
Abstract:
Mistletoe can have a major impact on the fitness of the host plant. If there is more than one species of mistletoe on the same host tree, the overall impact might be amplified. We report the occurrence of more than one species of mistletoe on the same host tree. Although it is not a rule in the field, to our knowledge, there have been no studies of this topic. In most cases, two species of mistletoe were recorded on the same host tree, although we recorded three species of mistletoe on one occasion. This demonstrates that different species of mistletoe can be compatible with the same host species. Therefore, compatibility (structural and physiological) might be an important factor for the occurrence of mistletoe. Recent studies have shown that if the mistletoe does not recognize the host species, the deposited seeds will germinate but the haustorium will not penetrate the host branch. This is probably the primary mechanism in the establishment of more than one species of mistletoe on the same host, which can trigger a cascade of harmful effects for the host species.
Abstract:
Considering the importance of water content for the conservation and storage of seeds, and the involvement of soluble carbohydrates and lipids in embryo development, a comparative study was carried out among the seeds of Inga vera (ingá) and Eugenia uniflora (pitanga), both classified as recalcitrant, and Caesalpinia echinata (brazilwood) and Erythrina speciosa (mulungu), considered orthodox. Low concentrations of cyclitols (0.3-0.5%), raffinose family oligosaccharides (ca. 0.05%) and unsaturated fatty acids (0-19%) were found in the seeds of ingá and pitanga, while larger amounts of cyclitols (2-3%) and raffinose (4.6-13%) were found in brazilwood and mulungu, respectively. These results, in addition to the higher proportions of unsaturated fatty acids (53-71%) in orthodox seeds, suggest that sugars and lipids play an important role in water movement, protecting the embryo cell membranes against injuries during dehydration.
Abstract:
The analysis of floristic similarity relationships among communities generally leads to the establishment of patterns, conditioned by diverse factors that determine whether or not species occur in different places. In search of such patterns, we analysed the floristic similarity relationships among forest communities located in the Ibiúna Plateau region, state of São Paulo, Brazil. The analysis included 21 forest fragments and six sites in a continuous Forest Reserve; the floristic composition and structure of the tree community (minimum DBH 5 cm) at each location were sampled using the point-centered quarter method. Two multivariate analysis methods were applied: 1) Detrended Correspondence Analysis (DCA), based on the Sørensen similarity index; and 2) Dichotomous Hierarchical Division (TWINSPAN). Floristic similarity was highest among communities in similar successional stages, especially if they were geographically close. There is a floristic gradient associated with latitude, indicating that this is a transition region between biomes. At the sites located on the northern face of the study region, species that also occur in cerradão and in seasonal semideciduous forest are present, whereas at the sites on the southern face, species characteristic of dense ombrophilous forest prevail.
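The Sørensen similarity index named in the abstract above compares two species lists as 2c / (a + b), where a and b are the numbers of species at each site and c is the number they share. The sketch below assumes presence/absence data; the site lists are placeholders, and returning 1.0 for two empty lists is a convention chosen here, not something taken from the study.

```python
def sorensen_similarity(species_a, species_b):
    """Sørensen index 2c / (a + b) for two presence/absence species lists."""
    a, b = set(species_a), set(species_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty lists are identical
    return 2 * len(a & b) / (len(a) + len(b))


# Placeholder species lists for two hypothetical forest fragments.
fragment_1 = ["sp1", "sp2", "sp3", "sp4"]
fragment_2 = ["sp3", "sp4", "sp5"]
print(round(sorensen_similarity(fragment_1, fragment_2), 3))
```

Computing this value for every pair of sites gives the similarity matrix on which ordination or clustering can then operate.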