812 results for Hierarchical clustering
Abstract:
While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in noisy environments. The use of microphone arrays, in contrast to close talking microphones, alleviates the feeling of discomfort and distraction for the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users' environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied to ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following the investigation of these two major approaches in this thesis, a novel clustered approach is proposed, which clusters the microphones and selects clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
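The clustered approach described above (group the microphones, then pick the group nearest the speaker) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which clustering algorithm or proximity measure the thesis uses, so the single-linkage agglomerative clustering, the pairwise dissimilarity matrix `dissim`, and the per-microphone `proximity` scores below are all hypothetical stand-ins, and the numbers are made up.

```python
from itertools import combinations


def single_linkage_clusters(dissim, n_clusters):
    """Agglomerative (single-linkage) clustering on a dissimilarity matrix.

    Assumption: dissim[i][j] is some blind estimate of how far apart
    microphones i and j are; the source does not specify one.
    """
    clusters = [[i] for i in range(len(dissim))]
    while len(clusters) > n_clusters:
        # Merge the two clusters whose closest members are least dissimilar.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dissim[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


def select_subarray(dissim, proximity, n_clusters):
    """Return the cluster with the highest mean proximity score.

    Assumption: proximity[i] is a hypothetical per-microphone proxy for
    closeness to the speaker (higher means closer).
    """
    clusters = single_linkage_clusters(dissim, n_clusters)
    return max(clusters, key=lambda c: sum(proximity[i] for i in c) / len(c))


# Illustrative data: microphones 0-1 sit together, 2-4 sit together.
example_dissim = [
    [0.0, 0.1, 0.9, 0.8, 0.9],
    [0.1, 0.0, 0.8, 0.9, 0.9],
    [0.9, 0.8, 0.0, 0.2, 0.3],
    [0.8, 0.9, 0.2, 0.0, 0.2],
    [0.9, 0.9, 0.3, 0.2, 0.0],
]
example_proximity = [0.2, 0.3, 0.9, 0.8, 0.7]
subarray = select_subarray(example_dissim, example_proximity, n_clusters=2)
```

With this toy input the selected subarray is the three-microphone group; beamforming would then be applied to those channels only.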
Abstract:
Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology including suspect identification, tracking terrorists and detecting a person's presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: modelling mismatch and modelling uncertainty. To address the performance degradation incurred through mismatched conditions, it was proposed to model this mismatch directly. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions.
Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.
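The idea of stopping early once a confidence interval on the verification score clears the decision threshold can be sketched as below. The abstract does not describe how the interval is estimated, so this sketch assumes, purely for illustration, that per-frame scores are independent and that a normal approximation to the mean applies; the function name and parameters (`z`, `min_frames`) are hypothetical.

```python
import math


def sequential_decision(frame_scores, threshold, z=1.96, min_frames=10):
    """Stop once the confidence interval on the mean score clears the threshold.

    Returns (decision, frames_used), where decision is "accept", "reject",
    or "undecided" if the scores run out first.
    Assumptions: independent frame scores, normal approximation, and
    min_frames >= 2 so the sample variance is defined.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for score in frame_scores:
        n += 1
        delta = score - mean
        mean += delta / n
        m2 += delta * (score - mean)
        if n < min_frames:
            continue
        # Half-width of the interval: z times the standard error of the mean.
        half_width = z * math.sqrt(m2 / (n - 1) / n)
        if mean - half_width > threshold:
            return "accept", n
        if mean + half_width < threshold:
            return "reject", n
    return "undecided", n
```

The saving in speech comes from returning as soon as the whole interval lies on one side of the threshold, rather than always consuming the full utterance.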
Abstract:
Artificial neural network (ANN) learning methods provide a robust and non-linear approach to approximating the target function for many classification, regression and clustering problems. ANNs have demonstrated good predictive performance in a wide variety of practical problems. However, there are strong arguments as to why ANNs are not sufficient for the general representation of knowledge. The arguments are the poor comprehensibility of the learned ANN, and the inability to represent explanation structures. The overall objective of this thesis is to address these issues by: (1) explanation of the decision process in ANNs in the form of symbolic rules (predicate rules with variables); and (2) provision of explanatory capability by mapping the general conceptual knowledge that is learned by the neural networks into a knowledge base to be used in a rule-based reasoning system. A multi-stage methodology, GYAN, is developed and evaluated for the task of extracting knowledge from the trained ANNs. The extracted knowledge is represented in the form of restricted first-order logic rules, and subsequently allows user interaction by interfacing with a knowledge-based reasoner. The performance of GYAN is demonstrated using a number of real-world and artificial data sets. The empirical results demonstrate that: (1) an equivalent symbolic interpretation is derived describing the overall behaviour of the ANN with high accuracy and fidelity, and (2) a concise explanation is given (in terms of rules, facts and predicates activated in a reasoning episode) as to why a particular instance is being classified into a certain category.
Abstract:
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that with a combined statistical/bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.
Abstract:
During the last decade many cities have sought to promote creativity by encouraging creative industries as drivers for economic and spatial growth. Among the creative industries, the film industry plays an important role in the economic and spatial development of cities by fostering endogenous creativeness, attracting exogenous talent, and contributing to the formation of places that creative cities require. The paper aims to scrutinize the role of creative industries in general and the film industry in particular for place making, spatial development, tourism, and the formation of creative cities, their clustering and locational decisions. This paper investigates the positive effects of the film industry on tourism such as incubating creativity potential, increasing place recognition through locations of movies filmed and film festivals hosted, attracting visitors and establishing interaction among visitors, places and their cultures. This paper reveals the preliminary findings of two case studies from Beyoglu, Istanbul and Soho, London, examines the relation between creativity, tourism, culture and the film industry, and discusses their effects on place-making and tourism.
Abstract:
This overview focuses on the application of chemometrics techniques for the investigation of soils contaminated by polycyclic aromatic hydrocarbons (PAHs) and metals because these two important and very diverse groups of pollutants are ubiquitous in soils. The salient features of various studies carried out in the micro- and recreational environments of humans are highlighted in the context of the various multivariate statistical techniques available across discipline boundaries that have been effectively used in soil studies. Particular attention is paid to techniques employed in the geosciences that may be effectively utilized for environmental soil studies; classical multivariate approaches that may be used in isolation or as complementary methods to these are also discussed. Chemometrics techniques widely applied in atmospheric studies for identifying sources of pollutants or for determining the importance of contaminant source contributions to a particular site have seen little use in soil studies, but may be effectively employed in such investigations. Suitable programs are also available for suggesting mitigating measures in cases of soil contamination, and these are also considered. Specific techniques reviewed include pattern recognition techniques such as Principal Components Analysis (PCA), Fuzzy Clustering (FC) and Cluster Analysis (CA); geostatistical tools include variograms, Geographical Information Systems (GIS), contour mapping and kriging; source identification and contribution estimation methods reviewed include Positive Matrix Factorisation (PMF), and Principal Component Analysis on Absolute Principal Component Scores (PCA/APCS). Mitigating measures to limit or eliminate pollutant sources may be suggested through the use of ranking analysis and multi-criteria decision-making (MCDM) methods.
These methods are mainly represented in this review by studies employing the Preference Ranking Organisation Method for Enrichment Evaluation (PROMETHEE) and its associated graphic output, Geometrical Analysis for Interactive Aid (GAIA).
Abstract:
An investigation into the effects of changes in urban traffic characteristics due to rapid urbanisation and the predicted changes in rainfall characteristics due to climate change on the build-up and wash-off of heavy metals was carried out in Gold Coast, Australia. The study sites encompassed three different urban land uses. Nine heavy metals commonly associated with traffic emissions were selected. The results were interpreted using multivariate data analysis and decision making tools, such as principal component analysis (PCA), fuzzy clustering (FC), PROMETHEE and GAIA. Initial analyses established high, low and moderate traffic scenarios as well as low, low to moderate, moderate, high and extreme rainfall scenarios for build-up and wash-off investigations. GAIA analyses established that moderate to high traffic scenarios could affect the build-up while moderate to high rainfall scenarios could affect the wash-off of heavy metals under changed conditions. However, in wash-off, metal concentrations in the 1-75 µm fraction were found to be independent of the changes to rainfall characteristics. In build-up, high traffic activities in commercial and industrial areas influenced the accumulation of heavy metal concentrations in the particulate size range from 75 - >300 µm, whereas metal concentrations in the finer size range of <1-75 µm were not affected. As practical implications, solids <1 µm and organic matter from 1 - >300 µm can be targeted for removal of Ni, Cu, Pb, Cd, Cr and Zn from build-up whilst organic matter from <1 - >300 µm can be targeted for removal of Cd, Cr, Pb and Ni from wash-off. Cu and Zn need to be removed as free ions from most fractions in wash-off.
Abstract:
This chapter evaluates the rise of creative industries from four standpoints: the growing interest in creativity in the early 21st century; the 'culturalisation' of economic life with the rise of service industries; clustering and uneven development in the cultural economic geography of the creative industries; and the future of arts and cultural policy.
Abstract:
After a brief personal orientation, this presentation offers an opening section on 'clash, cluster, complexity, cities', making the case that innovation (both creative and economic) proceeds not only from incremental improvements within an expert-pipeline process, but also from the clash of different systems, generations, and cultures. The argument is that cultural complexity arises from such clashes, and that clustering is the solution to problems of complexity. The classic, 10,000-year-old, institutional form taken by such clusters is … cities. Hence, a creative city is one where clashing and competitive complexity is clustered… and, latterly, networked.
Abstract:
Background: There has been a lack of investigation into the spatial distribution and clustering of suicide in Australia, where the population density is lower than many countries and varies dramatically among urban, rural and remote areas. This study aims to examine the spatial distribution of suicide at a Local Governmental Area (LGA) level and identify the LGAs with a high relative risk of suicide in Queensland, Australia, using geographical information system (GIS) techniques. Methods: Data on suicide and demographic variables in each LGA between 1999 and 2003 were acquired from the Australian Bureau of Statistics. An age standardised mortality (ASM) rate for suicide was calculated at the LGA level. GIS techniques were used to examine the geographical difference of suicide across different areas. Results: Far north and north-eastern Queensland (i.e., Cook and Mornington Shires) had the highest suicide incidence in both genders, while the south-western areas (i.e., Barcoo and Bauhinia Shires) had the lowest incidence in both genders. In different age groups (≤24 years, 25 to 44 years, 45 to 64 years, and ≥65 years), ASM rates of suicide varied with gender at the LGA level. Mornington and six other LGAs with low socioeconomic status in the upper Southeast had significant spatial clusters of high suicide risk. Conclusions: There was a notable difference in ASM rates of suicide at the LGA level in Queensland. Some LGAs had significant spatial clusters of high suicide risk. The determinants of the geographical difference of suicide should be addressed in future research.
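For readers unfamiliar with age standardisation, the calculation of an ASM rate can be illustrated with a minimal sketch. The abstract does not state whether direct or indirect standardisation was used, so direct standardisation is assumed here; the four age bands mirror those named in the abstract, while the function name and all counts are invented for illustration and are not data from the study.

```python
def age_standardised_rate(deaths, population, standard_population, per=100_000):
    """Directly age-standardised mortality rate per `per` persons.

    Each argument is a list with one entry per age band. The age-specific
    rate in each band is weighted by that band's share of the standard
    population. Assumption: direct standardisation (not confirmed by the source).
    """
    total_std = sum(standard_population)
    rate = sum(
        (d / n) * (w / total_std)
        for d, n, w in zip(deaths, population, standard_population)
    )
    return rate * per


# Hypothetical counts for one area, in bands <=24, 25-44, 45-64, >=65 years.
example_deaths = [2, 6, 3, 3]
example_population = [10_000, 20_000, 15_000, 5_000]
example_standard = [35_000, 30_000, 25_000, 10_000]
example_asm = age_standardised_rate(
    example_deaths, example_population, example_standard
)
```

Because every area is weighted by the same standard population, rates computed this way can be compared across areas with different age structures.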
Abstract:
Lactobacillus reuteri BR11 possesses an abundant cystine uptake (Cyu) ABC-transporter that was previously found to be involved in a novel mechanism of oxidative defence mediated by cystine. The current study aimed to elucidate this mechanism with a focus on the role of the co-transcribed cystathionine γ-lyase (Cgl). Growth studies of wild-type L. reuteri BR11 and mutants inactivated in cgl and the cystine-binding protein encoding gene cyuC showed that in contrast to the Cyu transporter, whose inactivation led to growth arrest in aerated cultures, Cgl is not crucial for oxidative defence. However, the role of Cgl in oxidative defence became apparent in the presence of severe oxidative damage and cysteine deprivation. Cysteine was found to be protective against oxidative stress, and the action of Cgl in both cysteine biosynthesis and degradation poses a seemingly futile pathway that deprives the intracellular cysteine pool. To further characterise the relationship between Cgl activity and cysteine and their roles in oxidative defence, enzymatic assays were performed on purified Cgl, and intracellular concentrations of cysteine, cystathionine and methionine were determined. Cgl was highly active towards cystine and cystathionine and less active towards cysteine in vitro, suggesting the main function of Cgl to be cysteine biosynthesis. Cysteine was found at high concentrations in the cell, but the levels were not significantly affected by inactivation of cgl or growth under aerobic conditions. It was concluded that both anabolic and catabolic activities of Cgl towards cysteine contribute to oxidative defence, the former by maintaining an intracellular reservoir of thiol analogous to glutathione, and the latter by producing H2S which is readily secreted, thus creating a reducing extracellular environment. The significance of the Cyu transporter to the physiology of L. reuteri BR11 prompted a phylogenetic study to determine its presence in bacteria.
The closest-matching orthologs of the Cyu transporter are limited to several species of Lactobacillus and Leuconostoc. Outside the Lactobacillales order, the closest matching orthologs belong to Proteobacteria, and there are more orthologs in Proteobacteria than non-Lactobacillales Firmicutes, suggesting that the Cyu transporter locus was present in the ancestor of the Proteobacteria and Firmicutes, and over evolutionary time has been lost or diverged in many Firmicutes. The clustering of the Cyu transporter locus with a gene encoding a Cgl family protein is even rarer. It was only found in L. reuteri, Lactobacillus vaginalis, Weissella paramesenteroides, the Lactobacillus casei group, and several Campylobacter sp. An accompanying phylogenetic study of L. reuteri BR11 using multi-locus sequence analysis showed that L. reuteri BR11 had diverged from more than 100 strains of L. reuteri isolated from various hosts and geographical locations. However, comparison with other Lactobacillus species supported the current classification of BR11 as L. reuteri. The most closely related species to L. reuteri is L. vaginalis or Lactobacillus antri, depending on the housekeeping gene used for analysis. The close evolutionary relationship of L. vaginalis to L. reuteri and the high degree of sequence identity between the cgl-cyuABC loci in both species suggest that the Cyu system is highly likely to perform similar functions in L. vaginalis. In search of other genes that function in oxidative defence, a number of mutants which were inactivated in genes that confer increased resistance to oxidative stress in other bacteria were constructed. The genes targeted were ahpC (peroxidase component of the alkyl hydroperoxide reductase system), tpx (thiol peroxidase), osmC (osmotically induced protein C), mntH (Mn2+/Fe2+ transporter), gshA (γ-glutamylcysteine synthetase) and msrA (methionine sulfoxide reductase).
The ahpC and mntH mutants had slightly lower minimum inhibitory concentrations of organic peroxides, suggesting these genes might be involved in resistance to organic peroxides in L. reuteri. However, none of the mutants exhibited growth defects in aerated cultures, in stark contrast to the cyuC mutant. This may be due to compensatory functions of other genes, a hypothesis which cannot be tested until a robust protocol for constructing markerless multiple gene deletion mutants in L. reuteri is developed. These results highlight the importance of the Cyu transporter in oxidative defence and provide a foundation for extending the research of this system in other bacteria.
Abstract:
Snakehead fishes in the family Channidae are obligate freshwater fishes represented by two extant genera, the African Parachanna and the Asian Channa. These species prefer still or slow flowing water bodies, where they are top predators that exercise high levels of parental care, have the ability to breathe air, can tolerate poor water quality, and interestingly, can aestivate or traverse terrestrial habitat in response to seasonal changes in freshwater habitat availability. These attributes suggest that snakehead fishes may possess high dispersal potential, irrespective of the terrestrial barriers that would otherwise constrain the distribution of most freshwater fishes. A number of biogeographical hypotheses have been developed to account for the modern distributions of snakehead fishes across two continents, including ancient vicariance during Gondwanan break-up, or recent colonisation tracking the formation of suitable climatic conditions. Taxonomic uncertainty also surrounds some members of the Channa genus, as geographical distributions for some taxa across southern and Southeast (SE) Asia are very large, and in one case are highly disjunct. The current study adopted a molecular genetics approach to gain an understanding of the evolution of this group of fishes, and in particular how the phylogeography of two Asian species may have been influenced by contemporary versus historical levels of dispersal and vicariance. First, a molecular phylogeny was constructed based on multiple DNA loci and calibrated with fossil evidence to provide a dated chronology of divergence events among extant species, and also within species with widespread geographical distributions. The data provide strong evidence that trans-continental distribution of the Channidae arose as a result of dispersal out of Asia and into Africa in the mid-Eocene.
Among Asian Channa, deep divergence among lineages indicates that the Oligocene-Miocene boundary was a time of significant species radiation, potentially associated with historical changes in climate and drainage geomorphology. Mid-Miocene divergence among lineages suggests that a taxonomic revision is warranted for two taxa. Deep intra-specific divergence (~8 Mya) was also detected between C. striata lineages that occur sympatrically in the Mekong River Basin. The study then examined the phylogeography and population structure of two major taxa, Channa striata (the chevron snakehead) and C. micropeltes (the giant snakehead), across SE Asia. Species-specific microsatellite loci were developed and used in addition to a mitochondrial DNA marker (Cyt b) to screen neutral genetic variation within and among wild populations. C. striata individuals were sampled across SE Asia (n=988), with the major focus being the Mekong Basin, which is the largest drainage basin in the region. The distributions of two divergent lineages were identified and admixture analysis showed that where they co-occur they are interbreeding, indicating that after long periods of evolution in isolation, divergence has not resulted in reproductive isolation. One lineage is predominantly confined to upland areas of northern Lao PDR to the north of the Khorat Plateau, while the other, which is more closely related to individuals from southern India, has a widespread distribution across mainland SE Asia and Sumatra. The phylogeographical pattern recovered is associated with past river networks, and high diversity and divergence among all populations sampled reveal that contemporary dispersal is very low for this taxon, even where populations occur in contiguous freshwater habitats. C. micropeltes (n=280) were also sampled from across the Mekong River Basin, focusing on the lower basin where it constitutes an important wild fishery resource. In comparison with C. 
striata, allelic diversity and genetic divergence among populations were extremely low, suggesting very recent colonisation of the greater Mekong region. Populations were significantly structured into at least three discrete populations in the lower Mekong. Results of this study have implications for establishing effective conservation plans for managing both species, which represent economically important wild fishery resources for the region. For C. micropeltes, it is likely that a single fisheries stock in the Tonle Sap Great Lake is being exploited by multiple fisheries operations, and future management initiatives for this species in this region will need to account for this. For C. striata, conservation of natural levels of genetic variation will require management initiatives designed to promote population persistence at very localised spatial scales, as the high level of population structuring uncovered for this species indicates that significant unique diversity is present at this fine spatial scale.
Abstract:
Advances in data mining have provided techniques for automatically discovering underlying knowledge and extracting useful information from large volumes of data. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large complex databases. Application of data mining to manufacturing is relatively limited, mainly because of the complexity of manufacturing data. The growing self-organizing map (GSOM) algorithm has been proven to be an efficient algorithm for the unsupervised analysis of DNA data. However, it produced unsatisfactory clustering when used on some large manufacturing data. In this paper a data mining methodology is proposed using a GSOM tool which was developed using a modified GSOM algorithm. The proposed method is used to generate clusters for good and faulty products from a manufacturing dataset. The clustering quality (CQ) measure proposed in the paper is used to evaluate the performance of the cluster maps. The paper also proposes an automatic identification of variables to find the most probable causative factor(s) that discriminate between good and faulty products by quickly examining the historical manufacturing data. The proposed method helps manufacturers smooth the production flow and improve the quality of the products. Simulation results on small and large manufacturing data show the effectiveness of the proposed method.
Abstract:
We have recently demonstrated the geographic isolation of rice tungro bacilliform virus (RTBV) populations in the tungro-endemic provinces of Isabela and North Cotabato, Philippines. In this study, we examined the genetic structure of the virus populations at the tungro-outbreak sites of Lanao del Norte, a province adjacent to North Cotabato. We also analyzed the virus populations at the tungro-endemic sites of Subang, Indonesia, and Dien Khanh, Vietnam. Total DNA extracts from 274 isolates were digested with EcoRV restriction enzyme and hybridized with a full-length probe of RTBV. In the total population, 22 EcoRV-restricted genome profiles (genotypes) were identified. Although overlapping genotypes could be observed, the outbreak sites of Lanao del Norte had a genotype combination distinct from that of Subang or Dien Khanh but a genotype combination similar to that identified earlier from North Cotabato, the adjacent endemic province. Sequence analysis of the intergenic region and part of the ORF1 RTBV genome from randomly selected genotypes confirms the geographic clustering of RTBV genotypes and, combined with restriction analysis, the results suggest a fragmented spatial distribution of RTBV local populations in the three countries. Because RTBV depends on rice tungro spherical virus (RTSV) for transmission, the population dynamics of both tungro viruses were then examined at the endemic and outbreak sites within the Philippines. The RTBV genotypes and the coat protein RTSV genotypes were used as indicators for virus diversity. A shift in population structure of both viruses was observed at the outbreak sites with a reduced RTBV but increased RTSV gene diversity