793 results for Tree Representation
Abstract:
Data from various studies carried out in Italy over recent years on the problem of school dropout in secondary education show that difficulty in studying mathematics is one of the most frequently reported sources of discomfort among students. Nevertheless, it is unrealistic to think we can do without such knowledge in today's society: mathematics is widely taught in secondary school and is not confined to technical-scientific courses alone. It is reasonable to say that, although students may choose academic paths that are apparently far removed from mathematics, all of them will sooner or later have to come to terms with the subject. Among the reasons for the discomfort caused by the study of mathematics, some concern the very nature of the subject, in particular the complex symbolic language through which it is expressed. Mathematics is in fact a multimodal system composed of oral and written verbal texts, symbolic expressions such as formulae and equations, figures and graphs. For this reason, the study of mathematics represents a real challenge for those who suffer from dyslexia: a constitutional condition that limits performance in reading and writing and, in particular, in the study of mathematical content. Difficulties in working with verbal and symbolic codes entail, in turn, difficulties in comprehending the texts from which to deduce the operations that, combined together, lead to a problem's final solution. Information technology can support this learning disorder effectively; however, current tools have implementation limits that restrict their use in the study of scientific subjects. Word processors with speech synthesis are currently used to compensate for reading difficulties in the humanities, but they are not used for mathematics, because the speech synthesizer (or rather the screen reader driving it) cannot interpret anything that is not textual, such as symbols, images and graphs. The DISMATH software, which is the subject of this project, allows dyslexic users to read technical-scientific documents with the help of speech synthesis, to understand the spatial structure of formulae and matrices, and to write documents with technical-scientific content in a format compatible with the main scientific editors. The system uses LaTeX, a textual mathematical markup language, as its mediation system. It is set up as a LaTeX editor whose graphical interface, in line with the main commercial products, offers additional functions specifically designed to support users who cannot manage verbal and symbolic codes on their own. LaTeX is translated in real time into standard symbolic notation and read aloud by the speech synthesizer in natural language, so that the bimodal representation increases the user's ability to process information. Understanding a mathematical formula through listening is made possible by deconstructing the formula into a "tree" representation, which identifies the logical elements composing it. Even without knowing the LaTeX language, users can write whatever scientific document they need: symbolic elements are recalled from dedicated menus and automatically translated by the software, which manages the correct syntax.
The final aim of the project, therefore, is to implement an editor that enables dyslexic people (but not only them) to manage mathematical formulae effectively, through the integration of different software tools, thereby also improving teacher/learner interaction.
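To make the "tree" deconstruction concrete, here is a minimal sketch (not the DISMATH implementation, whose internals are not described in the abstract) of how a parsed LaTeX formula such as \frac{a+b}{c} can be held as a tree and walked to produce a natural-language rendering for a speech synthesizer; the node labels and wording are invented for illustration.

```python
# A minimal sketch of verbalizing a formula tree for speech synthesis.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # operator or symbol, e.g. "frac", "+", "a"
    children: list = field(default_factory=list)

def speak(node: Node) -> str:
    """Walk the formula tree and produce a natural-language rendering."""
    if not node.children:
        return node.label
    if node.label == "frac":
        num, den = node.children
        return f"the fraction with numerator {speak(num)} and denominator {speak(den)}"
    if node.label == "+":
        return " plus ".join(speak(c) for c in node.children)
    return f"{node.label} of " + ", ".join(speak(c) for c in node.children)

# \frac{a+b}{c} as a tree: frac(+(a, b), c)
formula = Node("frac", [Node("+", [Node("a"), Node("b")]), Node("c")])
print(speak(formula))   # -> the fraction with numerator a plus b and denominator c
```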
Abstract:
Spam emails (unsolicited or junk email) impose extremely heavy annual costs in terms of time, storage space and money on both private users and companies. To fight the spam problem effectively, it is not enough to stop spam messages from being delivered to the user's inbox. It is necessary either to try to find and prosecute the spammers, who usually hide behind complex networks of infected devices, or to analyse spammer behaviour in order to devise appropriate defence strategies. Such a task is difficult, however, because of the camouflage techniques involved, which require manual analysis of correlated spam messages to trace the spammers. To facilitate this analysis, which must be carried out on large quantities of unclassified emails, we propose a categorical clustering methodology, named CCTree, that divides a large volume of spam into campaigns on the basis of structural similarity. We show the effectiveness and efficiency of our proposed clustering algorithm through several experiments. A self-learning approach is then proposed to label spam campaigns according to the spammer's goal, for example phishing. The labelled spam campaigns are used to train a classifier, which can then be applied to classify new spam emails. In addition, the labelled campaigns, together with a set of four other ranking criteria, are ordered according to investigators' priorities. Finally, a semiring-based structure is proposed as an abstract representation of CCTree. This abstract scheme, named CCTree term, is applied to formalize the parallelization of CCTree. Through a number of mathematical analyses and experimental results, we show the efficiency and effectiveness of the proposed framework.
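As an illustration of the kind of categorical clustering tree described above, the following sketch splits a set of spam records on the attribute with the highest Shannon entropy until nodes are sufficiently pure. The split and stopping rules here are plausible assumptions for exposition; the precise CCTree rules are those defined in the thesis.

```python
# Sketch of a categorical clustering tree in the spirit of CCTree (hedged:
# split/stop rules are illustrative). Each node is split on the categorical
# attribute with the highest Shannon entropy; a node becomes a leaf (a
# "campaign") once its attribute entropies fall below a bound.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())

def cctree(records, attributes, max_entropy=0.5):
    """records: list of dicts mapping attribute name -> categorical value."""
    entropies = {a: entropy([r[a] for r in records]) for a in attributes}
    split_attr = max(entropies, key=entropies.get)
    if entropies[split_attr] <= max_entropy or len(attributes) == 1:
        return records                      # leaf: one spam campaign
    rest = [a for a in attributes if a != split_attr]
    groups = {}
    for r in records:
        groups.setdefault(r[split_attr], []).append(r)
    return {val: cctree(g, rest, max_entropy) for val, g in groups.items()}

spam = [{"lang": "en", "links": "many"}, {"lang": "en", "links": "many"},
        {"lang": "fr", "links": "none"}, {"lang": "fr", "links": "one"}]
print(cctree(spam, ["lang", "links"]))
```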
Abstract:
In this paper, we develop a new family of graph kernels where the graph structure is probed by means of a discrete-time quantum walk. Given a pair of graphs, we let a quantum walk evolve on each graph and compute a density matrix with each walk. With the density matrices for the pair of graphs to hand, the kernel between the graphs is defined as the negative exponential of the quantum Jensen–Shannon divergence between their density matrices. In order to cope with large graph structures, we propose to construct a sparser version of the original graphs using the simplification method introduced in Qiu and Hancock (2007). To this end, we compute the minimum spanning tree over the commute time matrix of a graph. This spanning tree representation minimizes the number of edges of the original graph while preserving most of its structural information. The kernel between two graphs is then computed on their respective minimum spanning trees. We evaluate the performance of the proposed kernels on several standard graph datasets and we demonstrate their effectiveness and efficiency.
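The kernel construction itself is compact enough to sketch. Assuming the density matrices rho and sigma from the two walks are already available (their computation from the discrete-time quantum walks is omitted), a hedged rendering of the negative exponential of the quantum Jensen–Shannon divergence is:

```python
# Sketch of the kernel step: k(G1, G2) = exp(-lam * QJSD(rho, sigma)), where
# QJSD is the quantum Jensen-Shannon divergence between density matrices.
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]            # drop numerically zero eigenvalues
    return -np.sum(evals * np.log2(evals))

def qjsd_kernel(rho, sigma, lam=1.0):
    qjsd = von_neumann_entropy((rho + sigma) / 2) \
           - 0.5 * (von_neumann_entropy(rho) + von_neumann_entropy(sigma))
    return np.exp(-lam * qjsd)

# Toy example with two 2x2 density matrices (unit trace, positive semidefinite).
rho   = np.array([[0.7, 0.0], [0.0, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])
print(qjsd_kernel(rho, sigma))
```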
Abstract:
Finding an adequate paraphrase representation formalism is a challenging issue in Natural Language Processing. In this paper, we analyse the performance of Tree Edit Distance as a paraphrase representation baseline. Our experiments using the Edit Distance Textual Entailment Suite show that, since Tree Edit Distance is a purely syntactic approach, paraphrase alternations that are not based on structural reorganization do not find an adequate representation. They also show that there is much scope for better modelling of the way trees are aligned.
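For readers wanting to experiment with such a baseline, the following sketch computes a tree edit distance between two toy dependency trees using the third-party zss package (a Zhang–Shasha implementation); the paper's own trees, cost functions and toolchain may differ.

```python
# Illustrative only: tree edit distance between two toy dependency trees.
from zss import Node, simple_distance

# "the cat ate"  vs  "the cat devoured"
t1 = Node("ate").addkid(Node("cat").addkid(Node("the")))
t2 = Node("devoured").addkid(Node("cat").addkid(Node("the")))

# One relabel (ate -> devoured) suffices, so the distance is 1.
print(simple_distance(t1, t2))   # -> 1.0
```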
Abstract:
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
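The simulated-annealing search loop described above can be sketched generically. In this skeleton, random_neighbour and parsimony_score are placeholder hooks: the paper's tree sampler draws its proposals from the GGS, which is not reproduced here.

```python
# Generic simulated-annealing skeleton over tree space (hooks are placeholders).
import math, random

def anneal(tree, parsimony_score, random_neighbour,
           t_start=10.0, t_end=0.01, cooling=0.999):
    best, best_score = tree, parsimony_score(tree)
    current, current_score, t = tree, best_score, t_start
    while t > t_end:
        cand = random_neighbour(current)
        cand_score = parsimony_score(cand)
        delta = cand_score - current_score          # lower scores are better
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current, current_score = cand, cand_score
            if current_score < best_score:
                best, best_score = current, current_score
        t *= cooling
    return best, best_score
```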
Abstract:
Topology optimization consists in finding the spatial distribution of a given total volume of material such that the resulting structure has some optimal property, for instance maximal structural stiffness or maximal fundamental eigenfrequency. In this paper a Genetic Algorithm (GA) employing a tree-based representation is developed to generate initial feasible individuals that remain feasible upon crossover and mutation, and which therefore require no repair operator to ensure feasibility. Several application examples are studied involving the topology optimization of structures, where the objective functions are the maximization of stiffness and the maximization of the first and second eigenfrequencies of a plate, all cases having a prescribed material volume constraint.
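One way to read the feasibility claim is that a tree-based encoding keeps the material region connected by construction. The following sketch is an illustrative interpretation, not the paper's exact encoding: occupied cells of the design grid form a tree rooted at the support, so a mutation that removes only leaf cells can never disconnect the structure.

```python
# Hedged illustration of tree-encoded feasibility, not the paper's encoding.
import random

def grow(root, neighbours, volume):
    """Grow a connected set of `volume` cells as a tree rooted at `root`."""
    parent = {root: None}
    frontier = [root]
    while len(parent) < volume and frontier:
        cell = random.choice(frontier)
        free = [n for n in neighbours(cell) if n not in parent]
        if not free:
            frontier.remove(cell)
            continue
        new = random.choice(free)
        parent[new] = cell                   # tree edge keeps connectivity
        frontier.append(new)
    return parent

def mutate(parent):
    """Remove a random leaf; the remaining cells stay one connected tree."""
    leaves = set(parent) - {p for p in parent.values() if p is not None}
    del parent[random.choice(sorted(leaves))]
    return parent

def grid_neighbours(cell, n=10):             # 4-neighbourhood on an n x n grid
    x, y = cell
    return [(a, b) for a, b in ((x+1, y), (x-1, y), (x, y+1), (x, y-1))
            if 0 <= a < n and 0 <= b < n]

design = mutate(grow((0, 0), grid_neighbours, volume=20))
print(len(design), "cells, still connected")
```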
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require a preliminary preprocessing step. Among the learning techniques for dealing with structured data, kernel methods are recognized as having a strong theoretical background and as being effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain: the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data; in fact, it tends to produce a discriminating function that behaves like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of tree kernels is the time complexity required in both the learning and classification phases; such complexity can sometimes prevent the kernels from being applied in scenarios involving large amounts of data. This thesis proposes three contributions towards resolving these issues. The first aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space, with the property that similar structures are mapped to similar encodings. By building kernel functions on this lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. The second contribution is a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure; the kernel function we propose is, instead, especially focused on this aspect. The third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels, and we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
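To fix ideas about the tree kernels discussed above, here is a minimal sketch of a subtree-style kernel that counts complete subtrees shared between two trees via canonical strings. Subset tree kernels in the Collins–Duffy style also count partial fragments with a decay factor; that refinement is omitted.

```python
# Minimal subtree-style kernel: count matching complete subtrees.
from collections import Counter

def subtrees(tree, bag):
    """tree = (label, [children]); returns a canonical string, filling `bag`."""
    label, children = tree
    canon = f"({label}" + "".join(subtrees(c, bag) for c in children) + ")"
    bag[canon] += 1
    return canon

def subtree_kernel(t1, t2):
    b1, b2 = Counter(), Counter()
    subtrees(t1, b1)
    subtrees(t2, b2)
    return sum(b1[s] * b2[s] for s in b1)   # dot product in subtree space

t1 = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
t2 = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", []), ("NP", [])])])
print(subtree_kernel(t1, t2))   # shared complete subtrees: D, N, V, NP(D,N) -> 4
```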
Abstract:
Histological serial sections, three-dimensional reconstructions and morphometry served to study the postnatal development of V1 in tree shrews. The main objectives were to evaluate the expansion of V1, the implications of its growth for the occipital cortex and, vice versa, the effects of the expanding neocortex on the topography of V1. The future V1 was identified on postnatal day 1 by its granular layer IV, covering the superior surface of the occipital cortices including the poles. A subdivision of layer IV, distinctive for the binocular part, was evident in the central region. V1 expanded continuously with age in all directions, followed by the maturation of layering. The monocular part was recognizable from day 15 onward, after the binocular part had reached its medial border. With reference to the retinotopic map of V1, regions emerged in a coherent temporo-spatial sequence delineating the retinal topography in a central-to-peripheral gradient, beginning with the visual streak representation. The growth of V1 was greatest until tree shrews opened their eyes, culminated during adolescence, and was complete after a subsequent decrease in the young adult. The simultaneous expansion of the neocortex induced a shifting of V1. Translation and elongation of V1 meant that the occipital cortex came to cover the superior colliculi, along with a downward rotation of the poles. The enlargement of the occipital part of the hemispheres was additionally associated with the formation of a small occipital horn in the lateral ventricles, indicating an incipient 'true' occipital lobe harbouring mainly cortices involved in visual functions.
Abstract:
External forcing and internal dynamics result in climate system variability ranging from sub-daily weather to multi-centennial trends and beyond (refs 1, 2). State-of-the-art palaeoclimatic methods routinely use hydroclimatic proxies to reconstruct temperature (for example, refs 3, 4), possibly blurring differences in the variability continuum of temperature and precipitation before the instrumental period. Here, we assess the spectral characteristics of temperature and precipitation fluctuations in observations, model simulations and proxy records across the globe. We find that whereas an ensemble of different general circulation models represents patterns captured in instrumental measurements, such as land–ocean contrasts and enhanced low-frequency tropical variability, the tree-ring-dominated proxy collection does not. The observed dominance of inter-annual precipitation fluctuations is not reflected in the annually resolved hydroclimatic proxy records. Likewise, temperature-sensitive proxies overestimate, on average, the ratio of low- to high-frequency variability. These spectral biases in the proxy records seem to propagate into multi-proxy climate reconstructions for which we observe an overestimation of low-frequency signals. Thus, a proper representation of the high- to low-frequency spectrum in proxy records is needed to reduce uncertainties in climate reconstruction efforts.
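A spectral diagnostic of the kind used above can be sketched as follows: estimate a series' power spectrum and compare low- to high-frequency variance. The cutoff and estimator (Welch's method, cutoff at 0.1 cycles per timestep) are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: ratio of low- to high-frequency spectral power in a time series.
import numpy as np
from scipy.signal import welch

def low_to_high_ratio(series, cutoff=0.1):
    freqs, power = welch(series, nperseg=min(256, len(series)))
    low = power[(freqs > 0) & (freqs <= cutoff)].sum()
    high = power[freqs > cutoff].sum()
    return low / high

rng = np.random.default_rng(0)
white = rng.normal(size=1000)                 # flat spectrum
red = np.cumsum(white) * 0.1                  # random walk: low-frequency heavy
print(low_to_high_ratio(white), low_to_high_ratio(red))
```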
Abstract:
Aboveground tropical tree biomass and carbon storage estimates commonly ignore tree height (H). We estimate the effect of incorporating H on tropics-wide forest biomass estimates in 327 plots across four continents, using 42,656 H and diameter measurements and harvested trees from 20 sites, to answer the following questions: 1. What is the best H-model form and geographic unit to include in biomass models to minimise site-level uncertainty in estimates of destructive biomass? 2. To what extent does including the H estimates derived in (1) reduce uncertainty in biomass estimates across all 327 plots? 3. What effect does accounting for H have on plot- and continental-scale forest biomass estimates? The mean relative error in biomass estimates of destructively harvested trees when including H (mean 0.06) was half that when excluding H (mean 0.13). Power- and Weibull-H models provided the greatest reduction in uncertainty, with regional Weibull-H models preferred because they reduce uncertainty in the smaller-diameter classes (≤40 cm D) that store about one-third of biomass per hectare in most forests. Propagating the relationships from destructively harvested tree biomass to each of the 327 plots from across the tropics shows that including H reduces errors from 41.8 Mg ha⁻¹ (range 6.6 to 112.4) to 8.0 Mg ha⁻¹ (−2.5 to 23.0). For all plots, aboveground live biomass was −52.2 Mg ha⁻¹ (−82.0 to −20.3 bootstrapped 95% CI), or 13%, lower when including H estimates, with the greatest relative reductions in estimated biomass in forests of the Brazilian Shield, east Africa and Australia, and relatively little change in the Guiana Shield, central Africa and southeast Asia. Appreciably different stand structure was observed among regions across the tropical continents, with some storing significantly more biomass in small-diameter stems, which affects the selection of the best height models to reduce uncertainty, and the biomass reductions due to H. After accounting for variation in H, total biomass per hectare is greatest in Australia, the Guiana Shield, Asia, central and east Africa, and lowest in east-central Amazonia, W. Africa, W. Amazonia and the Brazilian Shield (in descending order). Thus, if tropical forests span 1668 million ha and store 285 Pg C (estimate including H), then applying our regional relationships implies that carbon storage is overestimated by 35 Pg C (31–39 bootstrapped 95% CI) if H is ignored, assuming that the sampled plots are an unbiased statistical representation of all tropical forest in terms of biomass and height factors. Our results show that tree H is an important allometric factor that needs to be included in future forest biomass estimates to reduce error in estimates of tropical carbon stocks and emissions due to deforestation.
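As an illustration of the Weibull-H model form referred to above, H = a(1 − exp(−b·D^c)), the following sketch fits it to synthetic height-diameter data with scipy; the data and parameter values are invented stand-ins for the destructive-harvest measurements.

```python
# Sketch: fitting a Weibull height-diameter model to synthetic data.
import numpy as np
from scipy.optimize import curve_fit

def weibull_h(D, a, b, c):
    return a * (1.0 - np.exp(-b * D**c))

rng = np.random.default_rng(1)
D = rng.uniform(5, 120, size=200)                          # stem diameters (cm)
H = weibull_h(D, 50.0, 0.03, 0.9) + rng.normal(0, 2, 200)  # heights (m) + noise

params, _ = curve_fit(weibull_h, D, H, p0=(40.0, 0.05, 1.0))
print("fitted a, b, c:", params)
```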
Abstract:
The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages. Thus, it should be possible to root a universal phylogeny based on either protein using the second protein as an outgroup. This approach was originally taken independently with two separate gene duplication pairs, (i) the regulatory and catalytic subunits of the proton ATPases and (ii) the protein synthesis elongation factors EF-Tu and EF-G. Questions about the orthology of the ATPase genes have obscured the former results, and the elongation factor data have been criticized for inadequate taxonomic representation and alignment errors. We have expanded the latter analysis using a broad representation of taxa from all three domains of life. All phylogenetic methods used strongly place the root of the universal tree between two highly distinct groups, the archaeons/eukaryotes and the eubacteria. We also find that a combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota. This relationship is supported by bootstrap values of 60-89% with various distance and maximum likelihood methods, while unweighted parsimony gives 58% support for archaeal monophyly.
Abstract:
In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular, holistic framework implemented using Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility: it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation and intensity, using different techniques, and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (intelligibility and naturalness) evaluations of the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data, since it achieved better accuracy on the test data set. Our qualitative evaluation results show that the FDT model produces synthesised speech that is perceived as more natural than that of the CART model. In addition, we observed that the expressiveness of FDT is much better than that of CART, because the representation in FDT is not restricted to a set of piecewise or discrete constant approximations. We therefore conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications.
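The CART side of the comparison is easy to sketch with standard tooling. The following example fits a decision-tree regressor to invented syllable features and evaluates it with the RMSE and correlation measures mentioned above; it is illustrative only and uses neither the paper's Yorùbá corpus nor its feature set.

```python
# Illustrative CART-style duration model on invented placeholder data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
# Toy syllable features: [num_phones, is_stressed, position_in_phrase]
X = rng.integers(0, 5, size=(300, 3)).astype(float)
y = 50 + 20 * X[:, 0] + 15 * X[:, 1] + rng.normal(0, 5, 300)  # duration (ms)

X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]
cart = DecisionTreeRegressor(max_depth=5).fit(X_train, y_train)
pred = cart.predict(X_test)

rmse = np.sqrt(np.mean((pred - y_test) ** 2))
corr = np.corrcoef(pred, y_test)[0, 1]
print(f"RMSE = {rmse:.1f} ms, Corr = {corr:.2f}")
```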
Abstract:
In south Florida, tropical hardwood forests (hammocks) occur in Everglades tree islands and as more extensive forests in coastal settings in the nearby Florida Keys. Keys hammocks have been less disturbed by humans, and many qualify as “old-growth,” while Everglades hammocks have received much heavier use. With improvement of tree island condition an important element in Everglades restoration efforts, we examined stand structure in 23 Keys hammocks and 69 Everglades tree islands. Based on Stand Density Index and tree diameter distributions, many Everglades hammocks were characterized by low stocking and under-representation in the smaller size classes. In contrast, most Keys forests had the dense canopies and open understories usually associated with old-growth hardwood hammocks. Subject to the same caveats that apply to off-site references elsewhere, structural information from mature Keys hammocks can be helpful in planning and implementing forest restoration in Everglades tree islands. In many of these islands, such restoration might involve supplementing tree stocking by planting native trees to produce more complete site utilization and a more open understory.
Abstract:
One way to achieve amplification of distal synaptic inputs on a dendritic tree is to scale the amplitude and/or duration of the synaptic conductance with its distance from the soma. This is an example of what is often referred to as “dendritic democracy”. Although well studied experimentally, to date this phenomenon has not been thoroughly explored from a mathematical perspective. In this paper we adopt a passive model of a dendritic tree with distributed excitatory synaptic conductances and analyze a number of key measures of democracy. In particular, via moment methods we derive laws for the transport, from synapse to soma, of strength, characteristic time, and dispersion. These laws lead immediately to synaptic scalings that overcome attenuation with distance. We follow this with a Neumann approximation of Green’s representation that readily produces the synaptic scaling that democratizes the peak somatic voltage response. Results are obtained for both idealized geometries and for the more realistic geometry of a rat CA1 pyramidal cell. For each measure of democratization we produce and contrast the synaptic scaling associated with treating the synapse as either a conductance change or a current injection. We find that our respective scalings agree up to a critical distance from the soma and we reveal how this critical distance decreases with decreasing branch radius.
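A toy version of the democratizing scaling can be written down for an infinite passive cable, where a steady input at distance x from the soma is attenuated by exp(−x/λ): scaling the synaptic strength by exp(x/λ) then equalizes the somatic response. The sketch below illustrates only this idealized case; the paper's moment-method laws for realistic geometries are richer.

```python
# Toy illustration of "dendritic democracy" in an infinite passive cable.
import numpy as np

lam = 500.0                                   # length constant (micrometres)
distances = np.array([50.0, 250.0, 500.0, 1000.0])

attenuation = np.exp(-distances / lam)        # somatic response per unit input
scaling = np.exp(distances / lam)             # "democratic" synaptic scaling
somatic = scaling * attenuation               # equalized response (all 1.0)

for d, s, v in zip(distances, scaling, somatic):
    print(f"x = {d:6.0f} um   scale = {s:6.2f}   somatic response = {v:.2f}")
```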
Abstract:
Hevea brasiliensis (Willd. ex Adr. Juss.) Muell.-Arg. is the primary source of natural rubber and is native to the Amazon rainforest. The singular properties of natural rubber make it superior to, and competitive with, synthetic rubber for several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs); in total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful source of information on rubber tree bark genes and will be an important tool for the development of microsatellite and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.
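SSR detection of the kind mentioned above is commonly done with a repeat-motif scan. The following sketch finds motifs of 2–6 bp repeated at least five times; these thresholds are common defaults assumed here for illustration, not necessarily those used in the study.

```python
# Sketch: scan a DNA sequence for simple sequence repeats (microsatellites).
import re

SSR_RE = re.compile(r"(([ACGT]{2,6}?)\2{4,})")   # motif repeated >= 5 times

def find_ssrs(seq):
    """Yield (motif, repeat_count, start) for each simple sequence repeat."""
    for m in SSR_RE.finditer(seq.upper()):
        run, motif = m.group(1), m.group(2)
        yield motif, len(run) // len(motif), m.start()

seq = "TTGACACACACACACAGGTCAGTAGCTAGCTAGCTAGCTAGCTAACC"
for motif, n, pos in find_ssrs(seq):
    print(f"{motif} x{n} at position {pos}")
```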