56 results for agglomerative clustering
Abstract:
Multiple sclerosis and idiopathic dilated cardiomyopathy are two conditions in which an autoimmune process is implicated in the pathogenesis. There is evidence to support clustering of autoimmune diseases in patients with multiple sclerosis and their families. To our knowledge, this is the first report of idiopathic dilated cardiomyopathy occurring in a patient with multiple sclerosis.
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information in large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is used mainly for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); and example-based methods such as k-nearest neighbors (Duda & Hart, 1973) and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
Abstract:
Liver samples from rabbits killed by RHDV, collected from five States in Australia in 1996 and 1997, were analysed by RT-PCR. A 398 bp fragment of the capsid protein (VP60) gene was amplified by PCR and directly sequenced. The alignment of the nucleotide and amino acid sequences and their comparison with the original strain of the virus released in Australia indicated that genetic changes after two years have been small, with 98.2% to 100% identity. The constructed phylogenetic tree suggests slight differences in nucleotide substitutions in various States, but there is no clear evidence of clustering of sequences according to their geographic origin. In practical terms, sequencing of viral RNA provides a means of testing the efficacy of further releases and subsequent spread of the virus if such a strategy is employed as a means of enhancing RHD as a biological control of the wild rabbit in Australia.
Abstract:
Cylindrospermopsis raciborskii is a toxic-bloom-forming cyanobacterium that is commonly found in tropical to subtropical climatic regions worldwide, but it is also recognized as a common component of cyanobacterial communities in temperate climates. Genetic profiles of C. raciborskii were examined in 19 cultured isolates originating from geographically diverse regions of Australia and represented by two distinct morphotypes. A 609-bp region of rpoC1, a DNA-dependent RNA polymerase gene, was amplified by PCR from these isolates with cyanobacterium-specific primers. Sequence analysis revealed that all isolates belonged to the same species, including morphotypes with straight or coiled trichomes. Additional rpoC1 gene sequences obtained for a range of cyanobacteria highlighted clustering of C. raciborskii with other heterocyst-producing cyanobacteria (orders Nostocales and Stigonematales). In contrast, randomly amplified polymorphic DNA and short tandemly repeated repetitive sequence profiles revealed a greater level of genetic heterogeneity among C. raciborskii isolates than did rpoC1 gene analysis, and unique band profiles were also found among each of the cyanobacterial genera examined. A PCR test targeting a region of the rpoC1 gene unique to C. raciborskii was developed for the specific identification of C. raciborskii from both purified genomic DNA and environmental samples. The PCR was evaluated with a number of cyanobacterial isolates, but a PCR-positive result was only achieved with C. raciborskii. This method provides an accurate alternative to traditional morphological identification of C. raciborskii.
Abstract:
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
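For readers unfamiliar with the model named in this abstract, the standard form of a multivariate t mixture density can be written as below. The notation (g components, mixing weights w_i, dimension p, degrees of freedom ν_i) is assumed here for illustration and is not taken from the abstract itself.

```latex
f(y;\Psi) = \sum_{i=1}^{g} w_i \, f_t(y;\mu_i,\Sigma_i,\nu_i),
\qquad
f_t(y;\mu,\Sigma,\nu) =
\frac{\Gamma\!\left(\frac{\nu+p}{2}\right) |\Sigma|^{-1/2}}
     {(\pi\nu)^{p/2}\,\Gamma\!\left(\frac{\nu}{2}\right)
      \left\{1 + \delta(y,\mu;\Sigma)/\nu\right\}^{(\nu+p)/2}},
\qquad
\delta(y,\mu;\Sigma) = (y-\mu)^{\top}\Sigma^{-1}(y-\mu).
```

As ν tends to infinity each component approaches a normal density, while for small ν an observation with a large Mahalanobis distance δ receives the weight (ν + p)/(ν + δ) in the updates of μ and Σ, which is how atypical points are downweighted.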
Abstract:
This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.
Abstract:
Examples from the Murray-Darling basin in Australia are used to illustrate different methods of disaggregation of reconnaissance-scale maps. One approach for disaggregation revolves around the de-convolution of the soil-landscape paradigm elaborated during a soil survey. The descriptions of soil map units and block diagrams in a soil survey report detail soil-landscape relationships or soil toposequences that can be used to disaggregate map units into component landscape elements. Toposequences can be visualised on a computer by combining soil maps with digital elevation data. Expert knowledge or statistics can be used to implement the disaggregation. Use of a restructuring element and k-means clustering are illustrated. Another approach to disaggregation uses training areas to develop rules to extrapolate detailed mapping into other, larger areas where detailed mapping is unavailable. A two-level decision tree example is presented. At one level, the decision tree method is used to capture mapping rules from the training area; at another level, it is used to define the domain over which those rules can be extrapolated. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
Using data from the H I Parkes All Sky Survey (HIPASS), we have searched for neutral hydrogen in galaxies in a region ~25 × 25 deg² centred on NGC 1399, the nominal centre of the Fornax cluster. Within a velocity search range of 300-3700 km s⁻¹ and to a 3σ lower flux limit of ~40 mJy, 110 galaxies with H I emission were detected, one of which is previously uncatalogued. None of the detections has early-type morphology. Previously unknown velocities for 14 galaxies have been determined, with a further four velocity measurements being significantly dissimilar to published values. Identification of an optical counterpart is relatively unambiguous for more than ~90 per cent of our H I galaxies. The galaxies appear to be embedded in a sheet at the cluster velocity which extends for more than 30° across the search area. At the nominal cluster distance of ~20 Mpc, this corresponds to an elongated structure more than 10 Mpc in extent. A velocity gradient across the structure is detected, with radial velocities increasing by ~500 km s⁻¹ from south-east to north-west. The clustering of galaxies evident in optical surveys is only weakly suggested in the spatial distribution of our H I detections. Of 62 H I detections within a 10° projected radius of the cluster centre, only two are within the core region (projected radius
Abstract:
The habit of inducing plant galls has evolved multiple times among insects but most species diversity occurs in only a few groups, such as gall midges and gall wasps. This phylogenetic clustering may reflect adaptive radiations in insect groups in which the trait has evolved. Alternatively, multiple independent origins of galling may suggest a selective advantage to the habit. We use DNA sequence data to examine the origins of galling among the most speciose group of gall-inducing scale insects, the eriococcids. We determine that the galling habit has evolved multiple times, including four times in Australian taxa, suggesting that there has been a selective advantage to galling in Australia. Additionally, although most gall-inducing eriococcid species occur on Myrtaceae, we found that lineages feeding on Myrtaceae are no more likely to have evolved the galling habit than those feeding on other plant groups. However, most gall-inducing species-richness is clustered in only two clades (Apiomorpha and Lachnodius + Opisthoscelis), all of which occur exclusively on Eucalyptus s.s. The Eriococcidae and the large genus Eriococcus were determined to be non-monophyletic and each will require revision. (C) 2004 The Linnean Society of London.
Abstract:
Objective: To examine the quality of diabetes care and prevention of cardiovascular disease (CVD) in Australian general practice patients with type 2 diabetes and to investigate its relationship with coronary heart disease absolute risk (CHDAR). Methods: A total of 3286 patient records were extracted from registers of patients with type 2 diabetes held by 16 divisions of general practice (250 practices) across Australia for the year 2002. CHDAR was estimated using the United Kingdom Prospective Diabetes Study algorithm with higher CHDAR set at a 10 year risk of >15%. Multivariate multilevel logistic regression investigated the association between CHDAR and diabetes care. Results: 47.9% of diabetic patient records had glycosylated haemoglobin (HbA1c) >7%, 87.6% had total cholesterol >= 4.0 mmol/l, and 73.8% had blood pressure (BP) >= 130/85 mm Hg. 57.6% of patients were at a higher CHDAR, 76.8% of whom were not on lipid modifying medication and 66.2% were not on antihypertensive medication. After adjusting for clustering at the general practice level and age, lipid modifying medication was negatively related to CHDAR (odds ratio (OR) 0.84) and total cholesterol. Antihypertensive medication was positively related to systolic BP but negatively related to CHDAR (OR 0.88). Referral to ophthalmologists/optometrists and attendance at other health professionals were not related to CHDAR. Conclusions: At the time of the study the diabetes and CVD preventive care in Australian general practice was suboptimal, even after a number of national initiatives. The Australian Pharmaceutical Benefits Scheme (PBS) guidelines need to be modified to improve CVD preventive care in patients with type 2 diabetes.
Abstract:
Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leucoencephalopathy (CADASIL) is a recently described cause of stroke or stroke-like episodes. It is caused by mutations in the Notch3 gene on chromosome 19p. We sought to demonstrate mutations of the Notch3 gene in Australian patients suspected of having CADASIL. Patients from several families were referred to the study. A diagnosis was determined clinically and by neuroimaging. Those suspected of having CADASIL had sequencing of exons 3 and 4 of the Notch3 gene. Eight patients, two of whom were siblings, were suspected of having CADASIL. Five patients (including the siblings) had mutations. Because of strong clustering of Notch3 mutations in CADASIL, this has potential as a reliable test for the disease in Australian patients. (C) 2001 Harcourt Publishers Ltd.
Abstract:
When the data consist of certain attributes measured on the same set of items in different situations, they would be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e., the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.
Three-dimensional structure of RTD-1, a cyclic antimicrobial defensin from rhesus macaque leukocytes
Abstract:
Most mammalian defensins are cationic peptides of 29-42 amino acids in length, stabilized by three disulfide bonds. However, recently Tang et al. (1999, Science 286, 498-502) reported the isolation of a new defensin type found in the leukocytes of rhesus macaques. In contrast to all the other defensins found so far, rhesus theta defensin-1 (RTD-1) is composed of just 18 amino acids with the backbone cyclized through peptide bonds. Antibacterial activities of both the native cyclic peptide and a linear form were examined, showing that the cyclic form was 3-fold more active than the open chain analogue [Tang et al. (1999) Science 286, 498-502]. To elucidate the three-dimensional structure of RTD-1 and its open chain analogue, both peptides were synthesized using solid-phase peptide synthesis and tert-butyloxycarbonyl chemistry. The structures of both peptides in aqueous solution were determined from two-dimensional ¹H NMR data recorded at 500 and 750 MHz. Structural constraints consisting of interproton distances and dihedral angles were used as input for simulated-annealing calculations and water refinement with the program CNS. RTD-1 and its open chain analogue oRTD-1 adopt very similar structures in water. Both comprise an extended β-hairpin structure with turns at one or both ends. The turns are well defined within themselves and seem to be flexible with respect to the extended regions of the molecules. Although the two strands of the β-sheet are connected by three disulfide bonds, this region displays a degree of flexibility. The structural similarity of RTD-1 and its open chain analogue oRTD-1, as well as their comparable degree of flexibility, support the theory that the additional charges at the termini of the open chain analogue, rather than overall differences in structure or flexibility, are the cause of oRTD-1's lower antimicrobial activity.
In contrast to numerous other antimicrobial peptides, RTD-1 does not display any amphiphilic character, even though surface models of RTD-1 exhibit a certain clustering of positive charges. Some amide protons of RTD-1 that should be solvent-exposed in monomeric β-sheet structures show low temperature coefficients, suggesting the possible presence of weak intermolecular hydrogen bonds.
Abstract:
A multivariate model using hierarchical clustering and discriminant analysis is used to identify clusters of community opportunity and community vulnerability across Australia's mega metropolitan regions. Variables used in the model measure aspects of structural economic change, occupational change, human capital, income, unemployment, family/household disadvantage, and housing stress. A nine-cluster solution is used to categorise communities across metropolitan space. Significant between-city variations in the incidence of these clusters of opportunity and vulnerability are apparent, suggesting the emergence of marked differentiation between Australia's mega metropolitan regions in their adjustments to changing economic and social conditions. JEL classification: C49, R11, R12.
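As a rough illustration of the hierarchical (agglomerative) clustering step mentioned in this abstract, the sketch below merges the two closest groups repeatedly until a requested number of clusters remains. The abstract does not state which linkage rule or distance was used, so average linkage with Euclidean distance is an assumption here, and the function name `agglomerative` and the toy community indicators are hypothetical.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Starts with one cluster per point and repeatedly merges the pair of
    clusters with the smallest mean pairwise Euclidean distance until
    n_clusters remain. Returns a list of clusters, each a sorted list of
    indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean distance over all cross-cluster pairs of points.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical toy data: (unemployment rate in %, median income in $1000s).
communities = [(4.0, 62.0), (4.5, 60.0), (11.0, 35.0), (12.0, 33.0), (7.5, 48.0)]
groups = agglomerative(communities, 3)
print(groups)
```

In practice the indicators would normally be standardised before clustering so that no single variable dominates the distance; that step is omitted here to keep the sketch short.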
Abstract:
1. Schizophrenia is a chronic, disabling brain disease that affects approximately 1% of the world's population. It is characterized by delusions, hallucinations and formal thought disorder, together with a decline in socio-occupational functioning. While the causes of schizophrenia remain unknown, evidence from family, twin and adoption studies clearly demonstrates that it aggregates in families, with this clustering largely attributable to genetic rather than cultural or environmental factors. Identifying the genes involved, however, has proven to be a difficult task because schizophrenia is a complex trait characterized by an imprecise phenotype, the existence of phenocopies and the presence of low disease penetrance.
2. The current working hypothesis for schizophrenia causation is that multiple genes of small to moderate effect confer compounding risk through interactions with each other and with non-genetic risk factors. The same genes may be commonly involved in conferring risk across populations or they may vary in number and strength between different populations. To search for evidence of such genetic loci, both candidate gene and genome-wide linkage studies have been used in clinical cohorts collected from a variety of populations. Collectively, these works provide some evidence for the involvement of a number of specific genes (e.g. the 5-hydroxytryptamine (5-HT) type 2a receptor (5-HT2a) gene and the dopamine D-3 receptor gene) and as yet unidentified factors localized to specific chromosomal regions, including 6p, 6q, 8p, 13q and 22q. These data provide suggestive, but not conclusive, evidence for causative genes.
3. To enable further progress there is a need to: (i) collect fine-grained clinical datasets while searching the schizophrenia phenotype for subgroups or dimensions that may provide a more direct route to causative genes; and (ii) integrate recent refinements in molecular genetic technology, including modern composite marker maps, DNA expression assays and relevant animal models, while using the latest analytical techniques to extract maximum information in order to help distinguish a true result from a false-positive finding.