997 resultados para complete linkage clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and 2 avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because some data of 2 genomes in their work are now not available in GenBank, in this work, we analyze the phylogenetic relationship of the remaining 70 complete genomes corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polymaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. In the mammalian polyomavirus subgenera, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor to replace the Bayesian Information Criterion (BIC) as a criterion for speaker clustering within a speaker diarization system. The BIC is one of the most popular decision criteria used in speaker diarization systems today. However, it will be shown in this paper that the BIC is only an approximation to the Bayes factor of marginal likelihoods of the data given each hypothesis. This paper uses the Bayes factor directly as a decision criterion for speaker clustering, thus removing the error introduced by the BIC approximation. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, leading to a 14.7% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a clustered approach for blind beamfoming from ad-hoc microphone arrays. In such arrangements, microphone placement is arbitrary and the speaker may be close to one, all or a subset of microphones at a given time. Practical issues with such a configuration mean that some microphones might be better discarded due to poor input signal to noise ratio (SNR) or undesirable spatial aliasing effects from large inter-element spacings when beamforming. Large inter-microphone spacings may also lead to inaccuracies in delay estimation during blind beamforming. In such situations, using a cluster of microphones (ie, a sub-array), closely located both to each other and to the desired speech source, may provide more robust enhancement than the full array. This paper proposes a method for blind clustering of microphones based on the magnitude square coherence function, and evaluates the method on a database recorded using various ad-hoc microphone arrangements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aims: To describe a local data linkage project to match hospital data with the Australian Institute of Health and Welfare (AIHW) National Death Index (NDI) to assess longterm outcomes of intensive care unit patients. Methods: Data were obtained from hospital intensive care and cardiac surgery databases on all patients aged 18 years and over admitted to either of two intensive care units at a tertiary-referral hospital between 1 January 1994 and 31 December 2005. Date of death was obtained from the AIHW NDI by probabilistic software matching, in addition to manual checking through hospital databases and other sources. Survival was calculated from time of ICU admission, with a censoring date of 14 February 2007. Data for patients with multiple hospital admissions requiring intensive care were analysed only from the first admission. Summary and descriptive statistics were used for preliminary data analysis. Kaplan-Meier survival analysis was used to analyse factors determining long-term survival. Results: During the study period, 21 415 unique patients had 22 552 hospital admissions that included an ICU admission; 19 058 surgical procedures were performed with a total of 20 092 ICU admissions. There were 4936 deaths. Median follow-up was 6.2 years, totalling 134 203 patient years. The casemix was predominantly cardiac surgery (80%), followed by cardiac medical (6%), and other medical (4%). The unadjusted survival at 1, 5 and 10 years was 97%, 84% and 70%, respectively. The 1-year survival ranged from 97% for cardiac surgery to 36% for cardiac arrest. An APACHE II score was available for 16 877 patients. In those discharged alive from hospital, the 1, 5 and 10-year survival varied with discharge location. Conclusions: ICU-based linkage projects are feasible to determine long-term outcomes of ICU patients

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mobile ad-hoc networks (MANETs) are temporary wireless networks useful in emergency rescue services, battlefields operations, mobile conferencing and a variety of other useful applications. Due to dynamic nature and lack of centralized monitoring points, these networks are highly vulnerable to attacks. Intrusion detection systems (IDS) provide audit and monitoring capabilities that offer the local security to a node and help to perceive the specific trust level of other nodes. We take benefit of the clustering concept in MANETs for the effective communication between nodes, where each cluster involves a number of member nodes and is managed by a cluster-head. It can be taken as an advantage in these battery and memory constrained networks for the purpose of intrusion detection, by separating tasks for the head and member nodes, at the same time providing opportunity for launching collaborative detection approach. The clustering schemes are generally used for the routing purposes to enhance the route efficiency. However, the effect of change of a cluster tends to change the route; thus degrades the performance. This paper presents a low overhead clustering algorithm for the benefit of detecting intrusion rather than efficient routing. It also discusses the intrusion detection techniques with the help of this simplified clustering scheme.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied for ad-hoc microphone arrays. A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach which includes clustering the microphones and selecting the clusters based on their proximity to the speaker is proposed. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (ie, a subarray), closely located both to each other and to the desired speech source, may in fact provide a more robust speech enhancement and recognition than the full array could.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. There- fore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (doc- ument, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classifica- tion are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a sim- ilarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The complete nucleotide sequence of rice tungro spherical virus (RTSV) strain Vt6, originally from Mindanao, the Philippines, with higher virulence to resistant rice cultivars, was determined and compared with the published sequence for the Philippine-type strain A (RTSV-A-Shen). It was reported that RTSV-A was not able to infect a rice resistant cultivar TKM 6 (10). RTSV-Vt6 and RTSV-A-Shen share 90% and 95% homology at nucleotide and amino-acid levels, respectively. The N-terminal leader sequence of RTSV-Vt6 contained a 39-amino acids-region (positions 65 to 103) which was totally different from that of RTSV-A-Shen; the difference resulted from frame shifting by nucleotide insertions and deletions. To confirm the amino-acid sequence differences of the leader polypeptide, the same region was cloned and sequenced using a newly obtained variant of RTSV-type 6, which had been collected in the field of IRRI, and seven field isolates from Mindanao, the Philippines. Since all the sequences of the target region are identical to that of the Vt6 leader polypeptide, the sequence difference in the leader region seems not to correlate with the virulence of Vt6.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A hierarchical structure is used to represent the content of the semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence in this paper, we introduce a novel method of representing the XML documents in Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable for a real-life dataset as well as the factorized matrices produced from the proposed method helps to improve the quality of clusters due to the enriched document representation with both the structure and the content information.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an overview of the experiments conducted using Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents. It also presents the experimental study using several data representations such as the structure-only, content-only and using both the structure and the content of XML documents for the purpose of clustering them. Unlike previous years, this year the XML documents were marked up using the Wiki tags and contains categories derived by using the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In February 2010, the Delhi High Court delivered its decision in Bayer Corp v Union of India in which Bayer had appealed against an August 2009 decision of the same court. Both decisions prevented Bayer from introducing the concept of patent linkage into India’s drug regulatory regime. Bayer appealed to the Indian Supreme Court, the highest court in India, which agreed on 2 March 2010 to hear the appeal. Given that India is regarded as a global pharmaceutical manufacturer of generic medications, how its judiciary and government perceive their international obligations has a significant impact on the global access to medicines regime. In rejecting the application of patent linkage, the case provides an opportunity for India to further acknowledge its international human rights obligations.