785 resultados para clustering and QoS-aware routing
Resumo:
For many applications, it is necessary to produce speech transcriptions in a causal fashion. To produce high quality transcripts, speaker adaptation is often used. This requires online speaker clustering and incremental adaptation techniques to be developed. This paper presents an integrated approach to online speaker clustering and adaptation which allows efficient clustering of speakers using the same accumulated statistics that are normally used for adaptation. Using a consistent criterion for both clustering and adaptation should yield gains for both stages. The proposed approach is evaluated on a meetings transcription task using audio from multiple distant microphones. Consistent gains over standard clustering and adaptation were obtained. Copyright © 2011 ISCA.
Resumo:
The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree structured covariate spaces. We define a partition-valued process on an arbitrary covariate space using Gaussian processes. We use the process to construct a multitask clustering model which partitions datapoints in a similar way across multiple data sources, and a time series model of network data which allows cluster assignments to vary over time. We describe sampling algorithms for inference and apply our method to defining cancer subtypes based on different types of cellular characteristics, finding regulatory modules from gene expression data from multiple human populations, and discovering time varying community structure in a social network.
Resumo:
This paper proposes a novel protocol which uses the Internet Domain Name System (DNS) to partition Web clients into disjoint sets, each of which is associated with a single DNS server. We define an L-DNS cluster to be a grouping of Web Clients that use the same Local DNS server to resolve Internet host names. We identify such clusters in real-time using data obtained from a Web Server in conjunction with that server's Authoritative DNS―both instrumented with an implementation of our clustering algorithm. Using these clusters, we perform measurements from four distinct Internet locations. Our results show that L-DNS clustering enables a better estimation of proximity of a Web Client to a Web Server than previously proposed techniques. Thus, in a Content Distribution Network, a DNS-based scheme that redirects a request from a web client to one of many servers based on the client's name server coordinates (e.g., hops/latency/loss-rates between the client and servers) would perform better with our algorithm.
Resumo:
Spectral methods of graph partitioning have been shown to provide a powerful approach to the image segmentation problem. In this paper, we adopt a different approach, based on estimating the isoperimetric constant of an image graph. Our algorithm produces the high quality segmentations and data clustering of spectral methods, but with improved speed and stability.
Resumo:
Laying hens generally choose to aggregate, but the extent to which the environments in which we house them impact on social group dynamics is not known. In this paper the effect of pen environment on spatial clustering is considered. Twelve groups of four laying hens were studied under three environmental conditions: wire floor (W), shavings (Sh) and perches, peat, nestbox and shavings (PPN). Groups experienced each environment twice, for five weeks each time, in a systematic order that varied from group to group. Video recordings were made one day per week for 30 weeks. To determine level of clustering, we recorded positional data from a randomly selected 20-min excerpt per video (a total of 20 min x 360 videos analysed). On screen, pens were divided into six equal areas. In addition, PPN pens were divided into an additional four (sub) areas, to account for the use of perches (one area per half perch). Every 5 s, we recorded the location of each bird and calculated location use over time, feeding synchrony and cluster scores for each environment. Feeding synchrony and cluster scores were compared against unweighted and weighted (according to observed proportional location use) Poisson distributions to distinguish between resource and social attraction.
Resumo:
In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. The problem of how to account for background damage from experimental handling is also carefully considered. We have shown that the conventional procedure of subtracting the background damage from the experimental data may lead to erroneous conclusions during the analysis of both initial fragmentation and DSB rejoining. Despite its relative simplicity, the method presented allows both the quantitative and qualitative description of radiation-induced DNA fragmentation and subsequent rejoining of double-stranded DNA fragments. (C) 2004 by Radiation Research Society.
Resumo:
Massive multiple-input multiple-output (MIMO) systems are cellular networks where the base stations (BSs) are equipped with unconventionally many antennas, deployed on colocated or distributed arrays. Huge spatial degrees-of-freedom are achieved by coherent processing over these massive arrays, which provide strong signal gains, resilience to imperfect channel knowledge, and low interference. This comes at the price of more infrastructure; the hardware cost and circuit power consumption scale linearly/affinely with the number of BS antennas N. Hence, the key to cost-efficient deployment of large arrays is low-cost antenna branches with low circuit power, in contrast to today’s conventional expensive and power-hungry BS antenna branches. Such low-cost transceivers are prone to hardware imperfections, but it has been conjectured that the huge degrees-of-freedom would bring robustness to such imperfections. We prove this claim for a generalized uplink system with multiplicative phasedrifts, additive distortion noise, and noise amplification. Specifically, we derive closed-form expressions for the user rates and a scaling law that shows how fast the hardware imperfections can increase with N while maintaining high rates. The connection between this scaling law and the power consumption of different transceiver circuits is rigorously exemplified. This reveals that one can make the circuit power increase as p N, instead of linearly, by careful circuit-aware system design.
Resumo:
Increasingly semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced srt of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge regression, LASSO and Forward Selection Ridge Regression (FCRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models. © 2012 IEEE.
Resumo:
In Mobile Ad hoc NETworks (MANETs), where cooperative behaviour is mandatory, there is a high probability for some nodes to become overloaded with packet forwarding operations in order to support neighbor data exchange. This altruistic behaviour leads to an unbalanced load in the network in terms of traffic and energy consumption. In such scenarios, mobile nodes can benefit from the use of energy efficient and traffic fitting routing protocol that better suits the limited battery capacity and throughput limitation of the network. This PhD work focuses on proposing energy efficient and load balanced routing protocols for ad hoc networks. Where most of the existing routing protocols simply consider the path length metric when choosing the best route between a source and a destination node, in our proposed mechanism, nodes are able to find several routes for each pair of source and destination nodes and select the best route according to energy and traffic parameters, effectively extending the lifespan of the network. Our results show that by applying this novel mechanism, current flat ad hoc routing protocols can achieve higher energy efficiency and load balancing. Also, due to the broadcast nature of the wireless channels in ad hoc networks, other technique such as Network Coding (NC) looks promising for energy efficiency. NC can reduce the number of transmissions, number of re-transmissions, and increase the data transfer rate that directly translates to energy efficiency. However, due to the need to access foreign nodes for coding and forwarding packets, NC needs a mitigation technique against unauthorized accesses and packet corruption. Therefore, we proposed different mechanisms for handling these security attacks by, in particular by serially concatenating codes to support reliability in ad hoc network. As a solution to this problem, we explored a new security framework that proposes an additional degree of protection against eavesdropping attackers based on using concatenated encoding. Therefore, malicious intermediate nodes will find it computationally intractable to decode the transitive packets. We also adopted another code that uses Luby Transform (LT) as a pre-coding code for NC. Primarily being designed for security applications, this code enables the sink nodes to recover corrupted packets even in the presence of byzantine attacks.
Resumo:
The growing importance and influence of new resources connected to the power systems has caused many changes in their operation. Environmental policies and several well know advantages have been made renewable based energy resources largely disseminated. These resources, including Distributed Generation (DG), are being connected to lower voltage levels where Demand Response (DR) must be considered too. These changes increase the complexity of the system operation due to both new operational constraints and amounts of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. Addressing these issues, this paper proposes a methodology to support VPP actions when these act as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined using data mining techniques applied to a database which is obtained for a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios considering a diversity of distributed resources in a 33 bus distribution network.
Resumo:
Cloud computing is increasingly being adopted in different scenarios, like social networking, business applications, scientific experiments, etc. Relying in virtualization technology, the construction of these computing environments targets improvements in the infrastructure, such as power-efficiency and fulfillment of users’ SLA specifications. The methodology usually applied is packing all the virtual machines on the proper physical servers. However, failure occurrences in these networked computing systems can induce substantial negative impact on system performance, deviating the system from ours initial objectives. In this work, we propose adapted algorithms to dynamically map virtual machines to physical hosts, in order to improve cloud infrastructure power-efficiency, with low impact on users’ required performance. Our decision making algorithms leverage proactive fault-tolerance techniques to deal with systems failures, allied with virtual machine technology to share nodes resources in an accurately and controlled manner. The results indicate that our algorithms perform better targeting power-efficiency and SLA fulfillment, in face of cloud infrastructure failures.
Resumo:
In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.