953 results for Annotation Tag
Speaker attribution of multiple telephone conversations using a complete-linkage clustering approach
Abstract:
In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
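The complete-linkage idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' system: the function name `complete_linkage`, the stopping `threshold`, and the toy one-dimensional data are assumptions made for illustration. The sketch only shows the defining rule, that the distance between two clusters is the distance between their two farthest members, and that merging needs no model retraining.

```python
from itertools import combinations


def complete_linkage(points, threshold, dist):
    """Agglomerative clustering with complete linkage (hypothetical helper).

    Clusters are merged greedily until the smallest inter-cluster
    distance exceeds `threshold`. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (x, y), best = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    toy = [0, 1, 2, 10, 11]  # toy "speaker" scores, purely illustrative
    print(complete_linkage(toy, threshold=3, dist=lambda p, q: abs(p - q)))
```

In a real attribution setting the points would be speaker models or segment embeddings and `dist` a suitable speaker-similarity score; both are left abstract here.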
Abstract:
Poem.
Abstract:
Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help with data analysis, the reliability of contributed data varies widely. Data reliability issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. Participants may submit low-quality, misleading, inaccurate, or even malicious data. Therefore, finding a way to improve data reliability has become an urgent demand. This study aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects, especially acoustic sensing projects. In particular, we propose to design a reputation framework to enhance data reliability, and we also investigate some critical elements that designers should be aware of when developing and designing new reputation systems.
Abstract:
This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
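A sequence k-mer representation, as mentioned in the abstract above, can be sketched as follows. The function names `kmer_counts` and `kmer_frequencies`, the choice of k, and the example read are illustrative assumptions, not details from the paper; the resulting frequency dictionary is the kind of feature vector that could then be passed to a classifier such as an SVM.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in a read."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(sequence, k):
    """Normalise k-mer counts so reads of different lengths are comparable."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    read = "ACGTAC"  # made-up read, for illustration only
    print(kmer_counts(read, 2))
    print(kmer_frequencies(read, 2))
```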
Abstract:
The growth of online services such as eBanks and WebMails, in which users are verified by a username and password, is increasingly exploited for Identity Theft. Identity Theft is a fraud in which someone pretends to be someone else in order to steal money or gain other benefits. To overcome the problem of Identity Theft, an additional security layer is required. Within the last decades, verifying users based on their keystroke dynamics during login was proposed. Thus, the impostor has to be able to type in a similar way to the real user in addition to having the username and password. However, verifying users upon login is not enough, since a logged-in station or mobile device is vulnerable to impostors when the user leaves her machine. Thus, verifying users continuously based on their activities is required. Within the last decade there has been growing interest in and use of biometric tools; however, these are often costly and require additional hardware. Behavioral biometrics, in which users are verified based on their keyboard and mouse activities, potentially present a good solution. In this paper we discuss the problem of Identity Theft and propose behavioral biometrics as a solution. We survey existing studies, list the challenges, and propose solutions.
Abstract:
Several approaches have been introduced in the literature for active noise control (ANC) systems. Since the FxLMS algorithm appears to be the best choice as a controller filter, researchers tend to improve the performance of ANC systems by enhancing and modifying this algorithm. This paper proposes a new version of the FxLMS algorithm. In many ANC applications, an online secondary path modelling method using white noise as a training signal is required to ensure convergence of the system. This paper also proposes a new approach for online secondary path modelling in feedforward ANC systems. The proposed algorithm stops injection of the white noise at the optimum point and reactivates the injection during operation, if needed, to maintain the performance of the system. Using the new version of the FxLMS algorithm and avoiding continual injection of white noise makes the system more desirable and improves the noise attenuation performance. Comparative simulation results indicate the effectiveness of the proposed approach.
Abstract:
Presented are the material and gas sensing properties of graphene-like nano-sheets deposited on 36° YX lithium tantalate (LiTaO3) surface acoustic wave (SAW) transducers. The graphene-like nano-sheets were characterized via scanning electron microscopy (SEM), atomic force microscopy (AFM) and X-ray photoelectron spectroscopy (XPS). The graphene-like nano-sheet/SAW sensors were exposed to different concentrations of hydrogen (H2) gas in synthetic air at room temperature. The developed sensors exhibit good sensitivity towards low concentrations of H2 in ambient conditions, as well as excellent dynamic performance towards H2 at room temperature.
Abstract:
This paper examines patterns of political activity and campaigning on Twitter in the context of the 2012 election in the Australian state of Queensland. Social media have been a visible component of political campaigning in Australia at least since the 2007 federal election, with Twitter, in particular, rising to greater prominence in the 2010 federal election. At state level, however, they have remained comparatively less important thus far. In this paper, we track uses of Twitter in the Queensland campaign from its unofficial start in February through to the election day of 24 March 2012. We examine the overall patterns of activity in the hashtag #qldvotes, and we track specific interactions between politicians and other users by following some 80 Twitter accounts of sitting members of parliament and alternative candidates. Such analysis provides new insights into the different approaches to social media campaigning which were embraced by specific candidates and party organisations, as well as an indication of the relative importance of social media activities, at present, for state-level election campaigns.
Abstract:
Availability has become a primary goal of information security and is as significant as other goals, in particular, confidentiality and integrity. Maintaining availability of essential services on the public Internet is an increasingly difficult task in the presence of sophisticated attackers. Attackers may abuse the limited computational resources of a service provider, and thus managing computational costs is a key strategy for achieving the goal of availability. In this thesis we focus on cryptographic approaches for managing computational costs, in particular computational effort. We focus on two cryptographic techniques: computational puzzles in cryptographic protocols and secure outsourcing of cryptographic computations. This thesis contributes to the area of cryptographic protocols in the following ways. First, we propose the most efficient puzzle scheme based on modular exponentiations which, unlike previous schemes of the same type, involves only a few modular multiplications for solution verification; our scheme is provably secure. We then introduce a new efficient gradual authentication protocol by integrating a puzzle into a specific signature scheme. Our software implementation results for the new authentication protocol show that our approach is more efficient and effective than the traditional RSA signature-based one and improves the DoS resilience of the Secure Socket Layer (SSL) protocol, the most widely used security protocol on the Internet. Our next contributions are related to capturing a specific property that enables secure outsourcing of cryptographic tasks in partial decryption. We formally define the property of (non-trivial) public verifiability for general encryption schemes, key encapsulation mechanisms (KEMs), and hybrid encryption schemes, encompassing public-key, identity-based, and tag-based encryption flavors.
We show that some generic transformations and concrete constructions enjoy this property and then present a new public-key encryption (PKE) scheme having this property and proof of security under the standard assumptions. Finally, we combine puzzles with PKE schemes for enabling delayed decryption in applications such as e-auctions and e-voting. For this we first introduce the notion of effort-release PKE (ER-PKE), encompassing the well-known timed-release encryption and encapsulated key escrow techniques. We then present a security model for ER-PKE and a generic construction of ER-PKE complying with our security notion.
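The general idea of a computational puzzle, costly to solve but cheap to verify, can be illustrated with a minimal sketch. Note that this uses a hash-based puzzle, which is a different and simpler technique than the modular-exponentiation puzzles proposed in the thesis; it is swapped in here only because it is short. The function names, the 8-byte nonce encoding, and the difficulty value are all assumptions for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def verify(challenge, nonce, difficulty):
    """Server side: a single hash evaluation checks a claimed solution."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = b"example-challenge"  # hypothetical server-issued value
    nonce = solve(challenge, difficulty=8)
    print(nonce, verify(challenge, nonce, difficulty=8))
```

The asymmetry between `solve` and `verify` is what lets a server shift work onto clients during a suspected denial-of-service attack.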
Abstract:
The giant freshwater prawn (Macrobrachium rosenbergii) or GFP is one of the most important freshwater crustacean species in the inland aquaculture sector of many tropical and subtropical countries. Since the 1990s, there has been rapid global expansion of freshwater prawn farming, especially in Asian countries, with an average annual rate of increase of 48% between 1999 and 2001 (New, 2005). In Vietnam, GFP is cultured in a variety of culture systems, typically in integrated or rotational rice-prawn culture (Phuong et al., 2006), and has become one of the most commonly farmed aquatic species in the country due to its ability to grow rapidly and to attract a high market price and high demand. Despite potential for expanded production, sustainability of freshwater prawn farming in the region is currently threatened by low production efficiency and vulnerability of farmed stocks to disease. Commercial large-scale and small-scale GFP farms in Vietnam have experienced relatively low stock productivity, large size and weight variation, a low proportion of edible meat (large head to body ratio), and a scarcity of good quality seed stock. The current situation highlights the need for a systematic stock improvement program for GFP in Vietnam aimed at improving economically important traits in this species. This study reports on the breeding program for fast growth employing combined (between and within) family selection in giant freshwater prawn in Vietnam. The base population was synthesized using a complete diallel cross including 9 crosses from two local stocks (DN and MK strains) and a third exotic stock (Malaysian strain - MY). In the next three selection generations, matings were conducted between genetically unrelated brood stock to produce full-sib and (paternal) half-sib families. All families were produced and reared separately until juveniles in each family were tagged as a batch using visible implant elastomer (VIE) at a body size of approximately 2 g.
After tags were verified, 60 to 120 juveniles chosen randomly from each family were released into two common earthen ponds of 3,500 m² for a grow-out period of 16 to 18 weeks. Selection applied at harvest on body weight was a combined (between and within) family selection approach. In total, 81, 89, 96 and 114 families were produced for the Selection line in the F0, F1, F2 and F3 generations, respectively. In addition to the Selection line, 17 to 42 families were produced for the Control group in each generation. Results reported here are based on a data set consisting of 18,387 body and 1,730 carcass records, as well as full pedigree information collected over four generations. Variance and covariance components were estimated by restricted maximum likelihood fitting a multi-trait animal model. Experiments assessing the performance of VIE tags in juvenile GFP of different size classes, and in individuals tagged with different numbers of tags, showed that juvenile GFP at 2 g were of suitable size for VIE tags, with no negative effects evident on growth or survival. Tag retention rates were above 97.8% and tag readability rates were 100%, with a correct assignment rate of 95% through to mature animal size of up to 170 g. Across generations, estimates of heritability for body traits (body weight, body length, cephalothorax length, abdominal length, cephalothorax width and abdominal width) and carcass weight traits (abdominal weight, skeleton-off weight and telson-off weight) were moderate and ranged from 0.14 to 0.19 and 0.17 to 0.21, respectively. Body trait heritabilities estimated for females were significantly higher than for males, whereas carcass weight trait heritabilities estimated for females and males were not significantly different (P > 0.05). Maternal and common environmental effects for body traits accounted for 4 to 5% of the total variance and were greater in females (7 to 10%) than in males (4 to 5%).
Genetic correlations among body traits were generally high in both sexes. Genetic correlations between body and carcass weight traits were also high in the mixed sexes. Average selection response (% per generation) for body weight (transformed to square root), estimated as the difference between the Selection and the Control group, was 7.4% calculated from least squares means (LSMs), 7.0% from estimated breeding values (EBVs) and 4.4% calculated from EBVs between two consecutive generations. Favourable correlated selection responses (estimated from LSMs) were detected for other body traits (12.1%, 14.5%, 10.4%, 15.5% and 13.3% for body length, cephalothorax length, abdominal length, cephalothorax width and abdominal width, respectively) over three selection generations. Data in the second selection generation showed positive correlated responses for carcass weight traits (8.8%, 8.6% and 8.8% for abdominal weight, skeleton-off weight and telson-off weight, respectively). Data in the third selection generation showed that heritabilities for body traits were moderate and ranged from 0.06 to 0.11 and 0.11 to 0.22 at weeks 10 and 18, respectively. Body trait heritabilities estimated at week 10 were not significantly lower than at week 18. Genetic correlations between body traits within age and genetic correlations for body traits between ages were generally high. Overall, our results suggest that growth rate responds well to the application of family selection and that carcass weight traits can also be improved in parallel using this approach. Moreover, selection for high growth rate in GFP can be undertaken successfully before full market size has been reached. The outcome of this study was production of an improved culture strain of GFP for the Vietnamese culture industry that will be trialed in real farm production environments to confirm the genetic gains identified in the experimental stock improvement program.
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
Abstract:
The representation of business process models has been a continuing research topic for many years now. However, many process model representations have not developed beyond minimally interactive 2D icon-based representations of directed graphs and networks, with little or no annotation for information overlays. In addition, very few of these representations have undergone a thorough analysis or design process with reference to psychological theories on data and process visualization. This dearth of visualization research, we believe, has led to problems with BPM uptake in some organizations, as the representations can be difficult for stakeholders to understand, and thus remains an open research question for the BPM community. In addition, business analysts and process modeling experts themselves need visual representations that are able to assist with key BPM life cycle tasks in the process of generating optimal solutions. With the rise of desktop computers and commodity mobile devices capable of supporting rich interactive 3D environments, we believe that much of the research performed in computer-human interaction, virtual reality, games and interactive entertainment has great potential in areas of BPM: to engage, to provide insight, and to promote collaboration amongst analysts and stakeholders alike. We believe this is a timely topic, with research emerging in a number of places around the globe, relevant to this workshop. This is the second TAProViz workshop being run at BPM. The intention this year is to build on the results of last year's successful workshop by further developing this important topic and identifying the key research topics of interest to the BPM visualization community.
Abstract:
Prior to the completion of the human genome project, the human genome was thought to have a greater number of genes, as it seemed structurally and functionally more complex than those of simpler organisms. This assumption, along with the belief in “one gene, one protein”, was demonstrated to be incorrect. The inequality in the ratio of gene to protein formation gave rise to the theory of alternative splicing (AS). AS is a mechanism by which one gene gives rise to multiple protein products. Numerous databases and online bioinformatic tools are available for the detection and analysis of AS. Bioinformatics provides an important approach to studying mRNA and protein diversity through various tools such as expressed sequence tag (EST) sequences obtained from completely processed mRNA. Microarrays and deep sequencing approaches also aid in the detection of splicing events. Initially it was postulated that AS occurred in only about 5% of all genes, but it was later found to be more abundant. Using bioinformatic approaches, the level of AS in human genes was found to be fairly high, with 35-59% of genes having at least one AS form. Our ability to determine and predict AS is important, as disorders in splicing patterns may lead to abnormal splice variants resulting in genetic diseases. In addition, the diversity of proteins produced by AS poses a challenge for successful drug discovery, and therefore a greater understanding of AS would be beneficial.
Abstract:
Background: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, the knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. Methods: We used a discovery GWAS dataset (8,844 samples, 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to calculate and estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was implemented to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analysis was done using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. Results: In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, together with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with classification sensitivity 62.3% and specificity 75.9%. No differences in disease progression or T2-lesion volumes were observed among four levels of predicted genetic risk groups (high, medium, low, misclassified).
On the other hand, a significant difference (F = 2.75, P = 0.04) was detected for age of disease onset between the affected individuals misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years). Conclusions: The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and the completeness of current knowledge in MS genetics.
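The classification sensitivity and specificity reported in the abstract above follow the standard definitions: sensitivity is the fraction of true cases classified as cases, and specificity is the fraction of true controls classified as controls. A minimal sketch is given below; the function name and the toy labels are assumptions for illustration and are not data from the study.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels, 1 = case, 0 = control."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1  # case correctly classified as case
        elif truth == 1 and pred == 0:
            fn += 1  # case misclassified as control
        elif truth == 0 and pred == 0:
            tn += 1  # control correctly classified as control
        else:
            fp += 1  # control misclassified as case
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]      # toy labels, not study data
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(sensitivity_specificity(truth, predicted))
```

In the study's terms, the cases counted as false negatives here correspond to the "misclassified" group of affected individuals predicted to be controls.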
Abstract:
We have explored the potential of deep Raman spectroscopy, specifically surface enhanced spatially offset Raman spectroscopy (SESORS), for non-invasive detection from within animal tissue, by employing SERS-barcoded nanoparticle (NP) assemblies as the diagnostic agent. This concept has been experimentally verified in a clinically relevant backscattered Raman system with an excitation line of 785 nm under ex vivo conditions. We have shown that our SORS system, with a fixed offset of 2-3 mm, offered sensitive probing of injected QTH-barcoded NP assemblies through animal tissue containing both protein and lipid. In comparison to non-aggregated SERS-barcoded gold NPs, we have demonstrated that the tailored SERS-barcoded aggregated NP assemblies have significantly higher detection sensitivity. We report that these NP assemblies can be readily detected at depths of 7-8 mm from within animal proteinaceous tissue with a high signal-to-noise (S/N) ratio. In addition, they could also be detected from beneath 1-2 mm of animal tissue with high lipid content, which generally poses a challenge due to the high absorption of lipids in the near-infrared region. We have also shown that the signal intensity and S/N ratio at a particular depth are functions of the SERS tag concentration used and that our SORS system has a QTH detection limit of 10⁻⁶ M. Higher detection depths may possibly be obtained with optimization of the NP assemblies, along with improvements in the instrumentation. Such NP assemblies offer prospects for in vivo, non-invasive detection of tumours, along with scope for incorporation of drugs and their targeted and controlled release at tumour sites. These diagnostic agents combined with drug delivery systems could serve as a “theranostic agent”, an integration of diagnostics and therapeutics into a single platform.