966 results for Complete K-ary Tree
Abstract:
Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and two avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because data for 2 of the genomes in their work are no longer available in GenBank, we analyze the phylogenetic relationships of the remaining 70 complete genomes, corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species, using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polyomaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. Within the mammalian polyomavirus subgenus, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches, as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.
Abstract:
This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.
Abstract:
Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result, Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. Therefore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n^2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (document, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classification are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task.
Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a similarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a "ground truth" solution. This allows comparison between different approaches. As the "ground truth" is created by humans, it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.
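As a rough illustration of the intrinsic measure mentioned in this abstract, distortion can be sketched as the sum of squared distances from each document vector to its nearest cluster centre. The function name `distortion`, the use of squared Euclidean distance and the toy vectors are assumptions made for this example and are not taken from the thesis.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centroids):
    """Sum, over all points, of the squared distance to the nearest centroid.

    Assumption: squared Euclidean distance; other similarity functions
    would give values that are not comparable with this one.
    """
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: three 2-D "document" vectors and two cluster centres.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centroids = [(0.0, 1.0), (10.0, 0.0)]
total = distortion(points, centroids)
```

Because the value depends on the vector space the points live in, two clusterings built on different representations cannot be ranked by this number alone, which is the limitation the abstract describes.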
Abstract:
In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scale up to very large data sets. Pyktree is highly modular and well suited for rapid prototyping of novel distance measures and centroid representations. It is easy to install and provides a Python package for library use as well as command line tools.
Abstract:
To facilitate marketing and export, the Australian macadamia industry requires accurate crop forecasts. Each year, two levels of crop predictions are produced for this industry. The first is an overall longer-term forecast based on tree census data of growers in the Australian Macadamia Society (AMS). This data set currently accounts for around 70% of total production, and is supplemented by our best estimates of non-AMS orchards. Given these total tree numbers, average yields per tree are needed to complete the long-term forecasts. Yields from regional variety trials were initially used, but were found to be consistently higher than the average yields that growers were obtaining. Hence, a statistical model was developed using growers' historical yields, also taken from the AMS database. This model accounted for the effects of tree age, variety, year, region and tree spacing, and explained 65% of the total variation in the yield per tree data. The second level of crop prediction is an annual climate adjustment of these overall long-term estimates, taking into account the expected effects on production of the previous year's climate. This adjustment is based on relative historical yields, measured as the percentage deviance between expected and actual production. The dominant climatic variables are observed temperature, evaporation, solar radiation and modelled water stress. Initially, a number of alternate statistical models showed good agreement within the historical data, with jack-knife cross-validation R² values of 96% or better. However, forecasts varied quite widely between these alternate models. Exploratory multivariate analyses and nearest-neighbour methods were used to investigate these differences. For 2001-2003, the overall forecasts were in the right direction (when compared with the long-term expected values), but were over-estimates.
In 2004 the forecast was well under the observed production, and in 2005 the revised models produced a forecast within 5.1% of the actual production. Over the first five years of forecasting, the absolute deviance for the climate-adjustment models averaged 10.1%, just outside the targeted objective of 10%.
Abstract:
Purpose: We investigated the effects of weed control and fertilization at early establishment on foliar stable carbon (δ13C) and nitrogen (N) isotope (δ15N) compositions, foliar N concentration, tree growth and biomass, relative weed cover and other physiological traits in a 2-year-old F1 hybrid (Pinus elliottii var. elliottii (Engelm) × Pinus caribaea var. hondurensis (Barr. ex Golf.)) plantation grown on a yellow earth in southeast Queensland of subtropical Australia. Materials and methods: Treatments included routine weed control, luxury weed control, intermediate weed control, mechanical weed control, nil weed control, and routine and luxury fertilization in a randomised complete block design. Initial soil nutrition and soil fertility parameters included hot water extractable organic carbon (C) and total nitrogen (N), total C and N, C/N ratio, labile N pools (nitrate (NO3−) and ammonium (NH4+)), extractable potassium (K+), soil δ15N and δ13C. Relative weed cover, foliar N concentrations, tree growth rate and physiological parameters including photosynthesis, stomatal conductance, photosynthetic nitrogen use efficiency, foliar δ15N and foliar δ13C were also measured at early establishment. Results and discussion: Foliar N concentration at 1.25 years was significantly different amongst the weed control treatments and was negatively correlated to the relative weed cover at 1.1 years. Foliar N concentration was also positively correlated to foliar δ15N and foliar δ13C, tree height, height growth rates and tree biomass. Foliar δ15N was negatively correlated to the relative weed cover at 0.8 and 1.1 years. The physiological measurements indicated that luxury fertilization and increasing weed competition on these soils decreased leaf xylem pressure potential (Ψxpp) when compared to the other treatments.
Conclusions: These results indicate how increasing N resources and weed competition have implications for tree N and water use at establishment in F1 hybrid plantations of southeast Queensland, Australia. These results suggest the desirability of weed control, in the inter-planting row, in the first year to maximise site N and water resources available for seedling growth. They also show the need to avoid over-fertilisation, which interfered with the balance between available N and water on these soils.
Abstract:
The high species richness of tropical forests has long been recognized, yet there remains substantial uncertainty regarding the actual number of tropical tree species. Using a pantropical tree inventory database from closed canopy forests, consisting of 657,630 trees belonging to 11,371 species, we use a fitted value of Fisher's alpha and an approximate pantropical stem total to estimate the minimum number of tropical forest tree species to fall between ~40,000 and ~53,000, i.e., at the high end of previous estimates. Contrary to common assumption, the Indo-Pacific region was found to be as species-rich as the Neotropics, with both regions having a minimum of ~19,000-25,000 tree species. Continental Africa is relatively depauperate, with a minimum of ~4,500-6,000 tree species. Very few species are shared among the African, American, and Indo-Pacific regions. We provide a methodological framework for estimating species richness in trees that may help refine species richness estimates of tree-dependent taxa.
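To make the role of Fisher's alpha in this abstract concrete, the sketch below fits alpha from the standard log-series relation S = alpha * ln(1 + N / alpha), using only the two totals quoted above (N = 657,630 trees, S = 11,371 species). The bisection solver and the function names are assumptions made for this example. The authors' actual fitting procedure and their pantropical stem total are not given in the abstract, so no extrapolated species count is reproduced here.

```python
import math


def expected_species(alpha, n_individuals):
    """Log-series expectation: S = alpha * ln(1 + N / alpha)."""
    return alpha * math.log(1.0 + n_individuals / alpha)


def fit_fisher_alpha(n_individuals, n_species, iterations=200):
    """Solve expected_species(alpha, N) = S for alpha by bisection.

    expected_species is increasing in alpha, so a bracketing interval
    [lo, hi] can be grown by doubling and then halved repeatedly.
    """
    lo, hi = 1e-9, 1.0
    while expected_species(hi, n_individuals) < n_species:
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_species(mid, n_individuals) < n_species:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Totals quoted in the abstract; treating them as one pooled sample is an
# assumption made only for this illustration.
alpha = fit_fisher_alpha(657_630, 11_371)
```

Given a fitted alpha, calling `expected_species(alpha, n)` with a larger stem total `n` gives the corresponding log-series expectation, which is the kind of extrapolation the abstract refers to.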
Abstract:
One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
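The static-partitioning idea can be sketched with the plain data-parallel form of one k-Means iteration: the data set is split into fixed chunks, each worker computes per-cluster partial sums, and the partial results are reduced into new centroids. This is not the authors' method. It omits the KD-Tree pruning entirely, runs the "workers" sequentially, and every name in it (`partial_sums`, `kmeans_step`, `n_workers`) is hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partial_sums(chunk, centroids):
    """Work done by one worker: per-cluster coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans_step(points, centroids, n_workers):
    """One k-Means iteration over a static round-robin partition."""
    chunks = [points[i::n_workers] for i in range(n_workers)]
    # Simulated workers; a real system would run these on separate nodes.
    results = [partial_sums(chunk, centroids) for chunk in chunks]
    dim = len(centroids[0])
    new_centroids = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in results)
        if total == 0:
            new_centroids.append(tuple(old))  # keep an empty cluster in place
            continue
        new_centroids.append(
            tuple(sum(sums[j][d] for sums, _ in results) / total for d in range(dim))
        )
    return new_centroids


# Hypothetical toy data.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (12.0, 0.0)]
updated = kmeans_step(points, [(1.0, 1.0), (9.0, 0.0)], n_workers=2)
```

In this plain form every point costs the same, so a static split is balanced. With KD-Tree pruning the work per region of the data varies, which is the load imbalance the abstract sets out to address.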
Abstract:
In the hot and dry conditions in which seeds of the tree legume Peltophorum pterocarpum develop and mature in Vietnam, seed moisture content declined rapidly on the mother plant from 87% at 42 d after flowering (DAF) to 15% at 70 DAF. Dry weight of the pods attained a maximum value at about 42 DAF, but seed mass maturity (i.e. the end of the seed-filling phase) occurred at about 62 DAF, at which time seed moisture content was about 45-48%. The onset of the ability of freshly collected seeds to germinate (in 63-d tests at 28-34 °C) occurred at 42 DAF, i.e. about 20 d before mass maturity. Full germination (98%) was attained at 70 DAF, i.e. at about 8 d after mass maturity. Thereafter, germination of fresh seeds declined, due to the imposition of a hard seed coat. Tolerance of desiccation to 10% moisture content was first detected at 56 DAF and was complete within the seed population by 84 DAF, i.e. about 22 d after mass maturity. Hardseededness began to be induced when seeds were dried to about 15% moisture content and below, with a negative logarithmic relation between hardseededness and moisture content below this value.
Abstract:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects, PD can have a profound effect on speech and voice. The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed hypokinetic dysarthria. The disease can be difficult to diagnose accurately, especially in its early stages. For this reason, automatic techniques based on Artificial Intelligence should increase diagnostic accuracy and help doctors make better decisions. The aim of the thesis work is to predict PD based on audio files collected from various patients. The audio files are preprocessed in order to extract the features. The preprocessed data contain 23 attributes and 195 instances, with on average six voice recordings per person. The number of instances can be reduced using a data compression technique such as the Discrete Cosine Transform (DCT). After data compression, attribute selection is done using several WEKA built-in methods such as ChiSquared, GainRatio and InfoGain. After identifying the important attributes, we evaluate the attributes one by one using stepwise regression. Based on the selected attributes, we process the data in WEKA using a cost-sensitive classifier with various algorithms such as MultiPass LVQ, Logistic Model Tree (LMT) and K-Star. The classification results show an average of 80%. By using these features, approximately 95% classification of PD is achieved. This shows that, using the audio dataset, PD could be predicted with a higher level of accuracy.
Abstract:
Mode of access: Internet.
Abstract:
This activity book is designed to supplement the information provided in the A to Z From a Tree, Illinois Fall Colors, Illinois' Forestry Industry and Illinois Trees : Seeds and Leaves posters from the Illinois Department of Natural Resources (IDNR). When using this activity book, students will become familiar with many characteristics of trees, industries related to trees and products made from trees. The information and activities included can assist your students of grades kindergarten through three in meeting the Illinois Learning Standards listed below. Although it is not necessary to have a copy of the posters named above to complete this activity book, if you would like them, they can be ordered online. Go to http://dnr.state.il.us then click on the "Education" button in the right side box. You'll find the link to the online order form.