364 results for Dataset


Relevance: 10.00%

Abstract:

A database will be protected under Australian law if it is a literary work; expressed in material form; meets the originality test; and has a relevant connection with Australia. Facts and data in themselves are not protected by copyright. However, a collection of data, a dataset, or a database may be protected by copyright if it is sufficiently original. Whether a work is sufficiently original to be protected by copyright depends on whether it has been produced by the application of independent intellectual effort by the author/s, which may involve the exercise of skill, judgement, or creativity in selecting, presenting, or arranging the information. This summary synthesises recent cases regarding originality in factual compilations.

Relevance: 10.00%

Abstract:

Despite recent methodological advances in inferring the time-scale of biological evolution from molecular data, the fundamental question of whether our substitution models are sufficiently well specified to accurately estimate branch-lengths has received little attention. I examine this implicit assumption of all molecular dating methods using a vertebrate mitochondrial protein-coding dataset. Comparison with analyses in which the data are RY-coded (A, G → R; C, T → Y) suggests that even rates-across-sites maximum likelihood greatly under-compensates for multiple substitutions among the standard (ACGT) NT-coded data, which have been subject to greater phylogenetic signal erosion. Accordingly, the fossil record indicates that branch-lengths inferred from the NT-coded data translate into divergence time overestimates when calibrated from deeper in the tree. Intriguingly, RY-coding led to the opposite result. The underlying NT and RY substitution model misspecifications likely relate respectively to "hidden" rate heterogeneity and changes in substitution processes across the tree, for which I provide simulated examples. Given the magnitude of the inferred molecular dating errors, branch-length estimation biases may partly explain current conflicts with some palaeontological dating estimates.
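
RY-coding as described above keeps only the purine (R) or pyrimidine (Y) class of each base, so transitions (A↔G, C↔T) become invisible and only transversions remain. A minimal Python sketch; treating gaps and ambiguity codes as missing data ('?') is an assumption, not something the abstract specifies.

```python
# Map each nucleotide to its purine (R) or pyrimidine (Y) class.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_code(sequence: str) -> str:
    """Return the RY-coded version of a DNA sequence.

    Assumption: anything other than A, C, G or T (gaps, N and other IUPAC
    ambiguity codes) is treated as missing data and written as '?'.
    """
    return "".join(RY_MAP.get(base, "?") for base in sequence.upper())


def count_transitions(seq_a: str, seq_b: str) -> int:
    """Count sites that differ in NT coding but agree after RY-coding.

    These are transitions (A<->G or C<->T): the substitutions RY-coding hides.
    """
    count = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in RY_MAP and y in RY_MAP and x != y and RY_MAP[x] == RY_MAP[y]:
            count += 1
    return count


print(ry_code("ACGTN-"))  # RYRY??
```

Because RY-coding discards the fast-saturating transitions, the RY-coded and NT-coded versions of the same alignment can be analysed side by side, which is the comparison the abstract describes.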

Relevance: 10.00%

Abstract:

Background Evolutionary biologists are often misled by convergence of morphology, and this has been common in the study of bird evolution. However, molecular datasets have their own problems, and phylogenies based on short DNA sequences can mislead us too. The relationships among clades and the timing of the evolution of modern birds (Neoaves) have not yet been well resolved, and evidence of morphological convergence remains controversial. With six new bird mitochondrial genomes (hummingbird, swift, kagu, rail, flamingo and grebe), we test the proposed Metaves/Coronaves division within Neoaves and the parallel radiations in this primary avian clade. Results Our mitochondrial trees did not recover the Metaves clade that had been proposed on the basis of one nuclear intron sequence. We suggest that the high number of indels within the seventh intron of the β-fibrinogen gene at this phylogenetic level, which left a dataset in which not a single site across the alignment was shared by all taxa, resulted in artifacts during analysis. With respect to the overall avian tree, we find that the flamingo and grebe are sister taxa and basal to the shorebirds (Charadriiformes). Using a novel site-stripping technique for noise reduction, we found this relationship to be stable. The hummingbird/swift clade is outside the large and very diverse group of raptors, shore and sea birds. Unexpectedly, the kagu is not closely related to the rail in our analysis, but because neither the kagu nor the rail has close affinity to any taxa within this dataset of 41 birds, their placement is not yet resolved. Conclusion Our phylogenetic hypothesis based on 41 avian mitochondrial genomes (13,229 bp) rejects monophyly of seven Metaves species, and we therefore conclude that the members of Metaves do not share a common evolutionary history within the Neoaves.
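
The point about the β-fibrinogen intron alignment, that not a single site was shared by all taxa, can be made concrete by counting alignment columns in which every taxon has a residue rather than a gap. A minimal sketch; the '-' gap character and the toy alignment are assumptions for illustration only.

```python
def complete_sites(alignment: dict[str, str], gap: str = "-") -> list[int]:
    """Return the 0-based column indices at which no taxon has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    return [
        col
        for col in range(n_cols)
        if all(seq[col] != gap for seq in alignment.values())
    ]


# Toy alignment (hypothetical sequences, not data from the study).
toy = {
    "taxon_1": "AC-TG",
    "taxon_2": "A-GTG",
    "taxon_3": "ACGT-",
}
print(complete_sites(toy))  # [0, 3]
```

An indel-rich alignment for which this list is empty gives every site missing data in at least one taxon, which is the situation the authors suggest produced artifacts.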

Relevance: 10.00%

Abstract:

Background The genus Rattus is highly speciose and has a complex taxonomy that is not fully resolved. As shown previously, there are two major groups within the genus: an Asian and an Australo-Papuan group. This study focuses on the Australo-Papuan group and particularly on the Australian rats. There are uncertainties regarding the number of species within the group and the relationships among them. We analysed 16 mitochondrial genomes, including seven novel genomes from six species, to help elucidate the evolutionary history of the Australian rats. We also demonstrate, from a larger dataset, the usefulness of short regions of the mitochondrial genome for identifying these rats at the species level. Results Analyses of 16 mitochondrial genomes representing species sampled from the Australo-Papuan and Asian clades of Rattus indicate that these two groups diverged ~2.7 million years ago (Mya). Subsequent diversification of at least 4 lineages within the Australo-Papuan clade was rapid and occurred over the period from ~0.9-1.7 Mya, a finding that explains the difficulty in resolving some relationships within this clade. Phylogenetic analyses of our 126-taxon Rattus database of shorter sequences (1,952 nucleotides long) generally give well-supported species clades. Conclusions Our whole mitochondrial genome analyses are concordant with a taxonomic division that places the native Australian rats into the Rattus fuscipes species group. We suggest the following order of divergence of the Australian species. R. fuscipes is the oldest lineage among the Australian rats and is not part of a New Guinean radiation. R. lutreolus is also within this Australian clade and is shallower than R. tunneyi, while the R. sordidus group is the shallowest lineage in the clade. The divergences within the R. sordidus and R. leucopus lineages, occurring about half a million years ago, support the hypotheses of more recent interchanges of rats between Australia and New Guinea.
While problematic for inference of deeper divergences, we report that the analysis of shorter mitochondrial sequences is very useful for species identification in rats.
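
Species identification from a short mitochondrial region can be illustrated with a nearest-reference classifier based on uncorrected p-distance. This is a minimal sketch under stated assumptions: the abstract reports phylogenetic analyses rather than this distance rule, the sequences are assumed to be pre-aligned to equal length, and the reference library and species labels below are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: the share of compared sites that differ.

    Sites where either sequence has a gap ('-') or an unknown base ('N') are skipped.
    Assumes the two sequences are aligned and of equal length.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in ("-", "N") or y in ("-", "N"):
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def identify(query: str, references: dict[str, list[str]]) -> tuple[str, float]:
    """Assign the query to the species of its nearest reference sequence."""
    best_species, best_distance = "", float("inf")
    for species, sequences in references.items():
        for seq in sequences:
            d = p_distance(query, seq)
            if d < best_distance:
                best_species, best_distance = species, d
    return best_species, best_distance


# Hypothetical reference library; real use would hold many sequences per species.
references = {
    "species_A": ["ACGTACGTAC"],
    "species_B": ["ACGTTCGAAC"],
}
print(identify("ACGTACGTAA", references))  # ('species_A', 0.1)
```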

Relevance: 10.00%

Abstract:

In the medical and healthcare arena, patients' data are not just their own personal history but also a valuable large dataset for finding solutions for diseases. While electronic medical records are becoming popular and are used in healthcare workplaces such as hospitals, by insurance companies, and by major stakeholders such as physicians and their patients, access to such information must be handled in a way that preserves privacy and security. Finding the best way to keep the data secure has therefore become an important issue in database security. Sensitive medical data should be encrypted in databases. There are many encryption/decryption techniques and algorithms for preserving privacy and security, and their performance is currently an important factor when medical data are managed in databases. Another important factor is that stakeholders need more cost-effective ways to reduce the total cost of ownership. As an alternative, DAS (Data as Service) is a popular outsourcing model that satisfies cost-effectiveness, but it requires the encryption/decryption modules to be handled by trustworthy stakeholders. This research project focuses on query response times in a DAS model (AES-DAS) and compares this outsourcing model with an in-house model that uses the Microsoft built-in encryption scheme in SQL Server. The project includes building a prototype of medical database schemas and is carried out through two types of simulation. The first stage uses 6 databases to compare the performance of plain text, Microsoft built-in encryption and AES-DAS (Data as Service). In particular, AES-DAS incorporates symmetric-key encryption such as AES (Advanced Encryption Standard) and a Bucket index processor using a Bloom filter.
The results are categorised by query type: character type, numeric type, range queries, range queries using the Bucket index, and aggregate queries. The second stage is a scalability test from 5K to 2560K records. The main result of these simulations is that, as an outsourcing model, AES-DAS using the Bucket index is around 3.32 times faster than AES-DAS without it, with 70 partitions and 10K-record databases. In AES-DAS, retrieving numeric data takes less time than retrieving character data. Aggregate query response times in AES-DAS are less consistent than those of the MS built-in encryption scheme. The scalability test shows that once the DBMS reaches a certain threshold, query response time degrades rapidly. Further investigation is needed to build on these simulations and to construct a secure EMR (Electronic Medical Record) more efficiently.
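
The abstract names two server-side ingredients of AES-DAS, a Bucket index and a Bloom filter, without describing how they fit together. The sketch below is one plausible arrangement under stated assumptions, not the project's implementation: numeric values are assigned to equal-width buckets so a range query becomes a set of bucket IDs; each bucket keeps a Bloom filter of keyed hashes (HMAC) so equality lookups can skip buckets cheaply; and the AES encryption itself is omitted (Python's standard library has no AES), with a client-side plaintext dictionary standing in for decryption. All names (`CLIENT_KEY`, `BucketIndex`, and so on) are hypothetical.

```python
import hashlib
import hmac

# Hypothetical client-side key; in a DAS setting only the trusted client holds it.
CLIENT_KEY = b"hypothetical-client-key"


def keyed_tag(value: int) -> bytes:
    """Keyed hash of a value so the server never stores the plaintext."""
    return hmac.new(CLIENT_KEY, str(value).encode(), hashlib.sha256).digest()


class BloomFilter:
    """Set membership test: false positives are possible, false negatives are not."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class BucketIndex:
    """Server-side index: equal-width buckets over [lo, hi], one Bloom filter per bucket."""

    def __init__(self, lo: float, hi: float, n_buckets: int) -> None:
        self.lo, self.hi, self.n = lo, hi, n_buckets
        self.rows = [[] for _ in range(n_buckets)]  # bucket -> row ids
        self.filters = [BloomFilter() for _ in range(n_buckets)]

    def bucket_of(self, value: float) -> int:
        width = (self.hi - self.lo) / self.n
        return min(max(int((value - self.lo) / width), 0), self.n - 1)

    def insert(self, row_id: int, value: int) -> None:
        b = self.bucket_of(value)
        self.rows[b].append(row_id)
        self.filters[b].add(keyed_tag(value))

    def range_candidates(self, low: float, high: float) -> list[int]:
        """Rows in every bucket overlapping [low, high]; may include false hits."""
        first, last = self.bucket_of(low), self.bucket_of(high)
        return [r for b in range(first, last + 1) for r in self.rows[b]]

    def equality_candidates(self, value: int) -> list[int]:
        """Skip the bucket entirely when its Bloom filter rules the value out."""
        b = self.bucket_of(value)
        if not self.filters[b].might_contain(keyed_tag(value)):
            return []
        return list(self.rows[b])


# Client-side view: stands in for decrypting the AES ciphertexts (not implemented here).
plaintext = {0: 12, 1: 18, 2: 27, 3: 55, 4: 91}
index = BucketIndex(0, 100, 10)
for row_id, value in plaintext.items():
    index.insert(row_id, value)

candidates = index.range_candidates(15, 25)  # buckets 1 and 2
exact = [r for r in candidates if 15 <= plaintext[r] <= 25]  # client-side filtering
```

Because buckets return a superset of the matching rows, the trusted client has to decrypt and filter the candidates; the number of partitions controls how many false hits come back.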

Relevance: 10.00%

Abstract:

This paper investigates the effects of limited speech data on speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Reducing the amount of speech data required is important for developing automatic speaker verification systems for real-world applications. Previous research has shown that, when sufficient speech is available, heavy-tailed PLDA (HTPLDA) modeling of speakers in the i-vector space provides state-of-the-art performance; however, the robustness of HTPLDA to limited speech resources in development, enrolment and verification is an important issue that has not yet been investigated. In this paper, we analyze speaker verification performance with regard to the duration of utterances used both for speaker evaluation (enrolment and verification) and for score normalization and PLDA modeling during development. Two different approaches to total-variability representation are analyzed within the PLDA approach to show improved performance in short-utterance mismatched evaluation conditions and in conditions where insufficient speech resources are available for adequate system development. The results presented in this paper, using the NIST 2008 Speaker Recognition Evaluation dataset, suggest that the HTPLDA system can continue to achieve better performance than Gaussian PLDA (GPLDA) as evaluation utterance lengths decrease. We also highlight the importance of matching the durations used for score normalization and PLDA modeling to the expected evaluation conditions. Finally, we found that a pooled total-variability approach to PLDA modeling can achieve better performance than the traditional concatenated total-variability approach for short utterances in mismatched evaluation conditions and in conditions where insufficient speech resources are available for adequate system development.
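
Score normalization rescales a raw verification score using statistics of impostor (cohort) scores, which is why the durations of the cohort utterances matter. The abstract does not name the normalization used; the sketch below assumes symmetric normalization (s-norm), the average of a z-norm term (enrolment side) and a t-norm term (test side). The cohort scores are hypothetical.

```python
from statistics import mean, stdev


def s_norm(raw: float, enrol_cohort: list[float], test_cohort: list[float]) -> float:
    """Symmetric score normalization (assumed variant, see lead-in).

    enrol_cohort: scores of the enrolled speaker model against impostor utterances
    test_cohort:  scores of the test utterance against impostor speaker models
    """
    z = (raw - mean(enrol_cohort)) / stdev(enrol_cohort)  # z-norm term
    t = (raw - mean(test_cohort)) / stdev(test_cohort)    # t-norm term
    return 0.5 * (z + t)


# Hypothetical cohort scores.
print(s_norm(4.0, [1.0, 2.0, 3.0], [0.0, 2.0, 4.0]))  # 1.5
```

If the cohort utterances are much longer than the evaluation utterances, their score statistics will not reflect the short-utterance conditions being normalized, which is the mismatch the abstract warns against.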

Relevance: 10.00%

Abstract:

This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA) and weighted median Fisher discriminant analysis (WMFD) before probabilistic linear discriminant analysis (PLDA) modeling, with the aim of improving speaker verification performance in the presence of high inter-session variability. It was recently shown that WLDA techniques can improve on traditional linear discriminant analysis (LDA) for channel compensation in i-vector based speaker verification systems. We show in this paper that the speaker-discriminative information available in the distances between pairs of speakers clustered in the development i-vector space can also be exploited in heavy-tailed PLDA modeling by applying the weighted discriminant approaches prior to PLDA modeling. Based on the results presented in this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that WLDA and WMFD projections before PLDA modeling can provide an improved approach compared with uncompensated PLDA modeling for i-vector based speaker verification systems.
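
Weighted LDA exploits the pairwise distances between speakers by building the between-class scatter from pairs of speaker means, each pair weighted by a function of its distance, so that close, easily confused speakers count more. A minimal pure-Python sketch of that scatter matrix, assuming the pairwise form S_b = Σ_{i<j} p_i p_j w(d_ij)(m_i − m_j)(m_i − m_j)^T and a Euclidean weighting w(d) = d^(−2n); the abstract does not give the exact weighting functions, and the eigen-decomposition and PLDA steps that follow are omitted. With w ≡ 1 this reduces to the ordinary LDA between-class scatter.

```python
import math
from itertools import combinations
from typing import Callable

Vector = list[float]
Matrix = list[list[float]]


def mean_vector(vectors: list[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def zeros(dim: int) -> Matrix:
    return [[0.0] * dim for _ in range(dim)]


def add_outer(acc: Matrix, u: Vector, scale: float) -> None:
    """acc += scale * u u^T, in place."""
    for r, ur in enumerate(u):
        for c, uc in enumerate(u):
            acc[r][c] += scale * ur * uc


def weighted_between_scatter(classes: dict[str, list[Vector]],
                             weight: Callable[[float], float]) -> Matrix:
    """Pairwise weighted between-class scatter (assumes distinct speaker means)."""
    total = sum(len(v) for v in classes.values())
    means = {k: mean_vector(v) for k, v in classes.items()}
    priors = {k: len(v) / total for k, v in classes.items()}
    dim = len(next(iter(means.values())))
    sb = zeros(dim)
    for a, b in combinations(classes, 2):
        diff = [x - y for x, y in zip(means[a], means[b])]
        dist = math.sqrt(sum(d * d for d in diff))
        add_outer(sb, diff, priors[a] * priors[b] * weight(dist))
    return sb


def lda_between_scatter(classes: dict[str, list[Vector]]) -> Matrix:
    """Ordinary LDA between-class scatter: sum_i p_i (m_i - m)(m_i - m)^T."""
    all_vectors = [v for vs in classes.values() for v in vs]
    global_mean = mean_vector(all_vectors)
    sb = zeros(len(global_mean))
    for vs in classes.values():
        diff = [x - y for x, y in zip(mean_vector(vs), global_mean)]
        add_outer(sb, diff, len(vs) / len(all_vectors))
    return sb


def euclidean_weight(n: int) -> Callable[[float], float]:
    """w(d) = d^(-2n): nearby speaker pairs receive larger weights."""
    return lambda d: d ** (-2 * n)


# Toy 2-D "i-vectors" for three hypothetical speakers.
speakers = {
    "spk1": [[0.0, 0.0], [2.0, 0.0]],
    "spk2": [[4.0, 1.0], [4.0, 3.0]],
    "spk3": [[0.0, 4.0]],
}
plain = weighted_between_scatter(speakers, lambda d: 1.0)  # matches ordinary LDA
wlda = weighted_between_scatter(speakers, euclidean_weight(1))
```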

Relevance: 10.00%

Abstract:

In this paper we use a sequence-based visual localization algorithm to reveal surprising answers to the question: how much visual information is actually needed for effective navigation? The algorithm actively searches for the best local image matches within a sliding window of short route segments, or 'sub-routes', and matches sub-routes by searching for coherent sequences of local image matches. In contrast to many existing techniques, it requires no pre-training or camera parameter calibration. We compare the algorithm's performance with the state-of-the-art FAB-MAP 2.0 algorithm on a 70 km benchmark dataset. Performance matches or exceeds that of the state-of-the-art feature-based localization technique using images as small as 4 pixels, fields of view reduced by a factor of 250, and pixel bit depths reduced to 2 bits. We present further results demonstrating the system localizing in an office environment with near 100% precision using two 7-bit Lego light sensors, as well as using 16- and 32-pixel images from a motorbike race and a mountain rally car stage. By demonstrating how little image information is required to achieve localization along a route, we hope to stimulate future 'low fidelity' approaches to visual navigation that complement probabilistic feature-based techniques.
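
The sub-route idea, scoring a coherent sequence of image matches rather than a single best frame, can be sketched as follows. This is a simplified illustration rather than the authors' algorithm: it assumes images are already downsampled to tiny grey-level vectors, uses mean absolute difference as the image distance, and assumes the query and database routes were traversed at the same speed, so each candidate sub-route is scored along a single diagonal of the difference matrix.

```python
def image_difference(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two equally sized tiny images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def difference_matrix(database: list[list[float]], query: list[list[float]]) -> list[list[float]]:
    """D[i][j] = difference between database image i and query image j."""
    return [[image_difference(d, q) for q in query] for d in database]


def best_subroute_match(D: list[list[float]], window: int) -> tuple[int, float]:
    """Match the most recent `window` query frames against every database sub-route.

    Returns (database index aligned with the latest query frame, sequence score).
    Assumes equal speed on both traverses, so each candidate is a single diagonal.
    """
    n_db, n_query = len(D), len(D[0])
    if window > min(n_db, n_query):
        raise ValueError("window is longer than the available frames")
    q0 = n_query - window
    best_end, best_score = -1, float("inf")
    for start in range(n_db - window + 1):
        score = sum(D[start + k][q0 + k] for k in range(window))
        if score < best_score:
            best_end, best_score = start + window - 1, score
    return best_end, best_score


# Hypothetical 4-pixel "images" along a 10-frame route; the query re-drives frames 3-6.
database = [[i, (2 * i) % 7, (3 * i) % 5, i % 2] for i in range(10)]
query = database[3:7]
print(best_subroute_match(difference_matrix(database, query), 4))  # (6, 0.0)
```

Summing differences over a window lets a sequence of individually ambiguous low-resolution frames produce one distinctive match, which is why very small images can still localize.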

Relevance: 10.00%

Abstract:

The popularity of Bayesian Network modelling of complex domains using expert elicitation has raised questions of how one might validate such a model given that no objective dataset exists for the model. Past attempts at delineating a set of tests for establishing confidence in an entirely expert-elicited model have focused on single types of validity stemming from individual sources of uncertainty within the model. This paper seeks to extend the frameworks proposed by earlier researchers by drawing upon other disciplines where measuring latent variables is also an issue. We demonstrate that even in cases where no data exist at all there is a broad range of validity tests that can be used to establish confidence in the validity of a Bayesian Belief Network.

Relevance: 10.00%

Abstract:

Background: Kallikrein 15 (KLK15)/Prostinogen is a plausible candidate for prostate cancer susceptibility. Elevated KLK15 expression has been reported in prostate cancer, and it has been described as an unfavorable prognostic marker for the disease. Objectives: We performed a comprehensive analysis of the association of variants in the KLK15 gene with prostate cancer risk and aggressiveness by genotyping tagSNPs, as well as putative functional SNPs identified by extensive bioinformatics analysis. Methods and Data Sources: Twelve out of 22 SNPs, selected on the basis of linkage disequilibrium pattern, were analyzed in an Australian sample of 1,011 histologically verified prostate cancer cases and 1,405 ethnically matched controls. Replication was sought from two existing genome-wide association studies (GWAS): the Cancer Genetic Markers of Susceptibility (CGEMS) project and a UK GWAS. Results: Two KLK15 SNPs, rs2659053 and rs3745522, showed evidence of association (p < 0.05) but were not present on the GWAS platforms. KLK15 SNP rs2659056 was found to be associated with prostate cancer aggressiveness and showed evidence of association in a replication cohort of 5,051 patients from the UK, Australia, and the CGEMS dataset of US samples. A highly significant association with Gleason score was observed when the data from these three studies were combined, with an Odds Ratio (OR) of 0.85 (95% CI = 0.77-0.93; p = 2.76 × 10⁻⁴). The rs2659056 SNP is predicted to alter binding of the RORalpha transcription factor, which has a role in the control of cell growth and differentiation and has been suggested to control the metastatic behavior of prostate cancer cells. Conclusions: Our findings suggest a role for KLK15 genetic variation in the etiology of prostate cancer among men of European ancestry, although further studies in very large sample sets are necessary to confirm effect sizes.
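
For readers checking reported effect sizes, an odds ratio and its Woolf (log-scale) 95% confidence interval from a single 2×2 table can be computed as below. This only illustrates the arithmetic: the counts are hypothetical, and the combined estimate quoted above came from the study's own analysis across three datasets, which the abstract does not detail.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: cases carrying the allele,    b: cases not carrying it
    c: controls carrying the allele, d: controls not carrying it
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from the study.
print(odds_ratio_ci(40, 60, 20, 80))
```

The interval is symmetric on the log scale, which is why a reported CI such as 0.77-0.93 sits roughly evenly around log(0.85) rather than around 0.85 itself.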

Relevance: 10.00%

Abstract:

KLK15 over-expression is reported to be a significant predictor of reduced progression-free survival and overall survival in ovarian cancer. Our aim was to analyse the KLK15 gene for putative functional single nucleotide polymorphisms (SNPs) and assess the association of these and KLK15 HapMap tag SNPs with ovarian cancer survival. Results In silico analysis was performed to identify KLK15 regulatory elements and to classify potentially functional SNPs in these regions. After SNP validation and identification by DNA sequencing of ovarian cancer cell lines and aggressive ovarian cancer patients, 9 SNPs were shortlisted and genotyped using the Sequenom iPLEX Mass Array platform in a cohort of Australian ovarian cancer patients (N = 319). In the Australian dataset we observed significantly worse survival for the KLK15 rs266851 SNP in a dominant model (Hazard Ratio (HR) 1.42, 95% CI 1.02-1.96). This association was observed in the same direction in two independent datasets, with a combined HR for the three studies of 1.16 (1.00-1.34). This SNP lies 15 bp downstream of a novel exon and is predicted to be involved in mRNA splicing. The mutant allele is also predicted to abrogate an HSF-2 binding site. Conclusions We provide evidence of association for the SNP rs266851 with ovarian cancer survival. Our results provide the impetus for downstream functional assays and additional independent validation studies to assess the role of KLK15 regulatory SNPs and KLK15 isoforms with alternative intracellular functional roles in ovarian cancer survival.
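
The abstract reports a combined hazard ratio across three studies without naming the pooling method. One common approach is fixed-effect inverse-variance pooling of log hazard ratios, with each study's standard error recovered from its reported 95% CI; the sketch below assumes that approach and uses placeholder inputs rather than the three studies' estimates.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def pooled_hazard_ratio(studies: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling of (HR, lower 95% CI, upper 95% CI) tuples."""
    weighted_sum = total_weight = 0.0
    for hr, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)  # SE of log(HR) from the CI width
        weight = 1 / se ** 2
        weighted_sum += weight * math.log(hr)
        total_weight += weight
    log_pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1 / total_weight)
    return (math.exp(log_pooled),
            math.exp(log_pooled - Z95 * se_pooled),
            math.exp(log_pooled + Z95 * se_pooled))


# Placeholder inputs: two hypothetical studies with identical estimates.
print(pooled_hazard_ratio([(1.2, 1.0, 1.44), (1.2, 1.0, 1.44)]))
```

Pooling narrows the interval because the weights add up, so a pooled CI can exclude 1 even when individual studies do not.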

Relevance: 10.00%

Abstract:

Background Cohort studies can provide valuable evidence of cause and effect relationships but are subject to loss of participants over time, limiting the validity of findings. Computerised record linkage offers a passive and ongoing method of obtaining health outcomes from existing routinely collected data sources. However, the quality of record linkage is reliant upon the availability and accuracy of common identifying variables. We sought to develop and validate a method for linking a cohort study to a state-wide hospital admissions dataset with limited availability of unique identifying variables. Methods A sample of 2000 participants from a cohort study (n = 41 514) was linked to a state-wide hospitalisations dataset in Victoria, Australia using the national health insurance (Medicare) number and demographic data as identifying variables. Availability of the health insurance number was limited in both datasets; linkage was therefore undertaken both with and without this number, and agreement between the two algorithms was tested. Sensitivity was calculated for a sub-sample of 101 participants with a hospital admission confirmed by medical record review. Results Of the 2000 study participants, 85% were found to have a record in the hospitalisations dataset when the national health insurance number and sex were used as linkage variables and 92% when demographic details only were used. When agreement between the two methods was tested the disagreement fraction was 9%, mainly due to "false positive" links when demographic details only were used. A final algorithm that used multiple combinations of identifying variables resulted in a match proportion of 87%. Sensitivity of this final linkage was 95%. Conclusions High quality record linkage of cohort data with a hospitalisations dataset that has limited identifiers can be achieved using combinations of a national health insurance number and demographic data as identifying variables.
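
A final algorithm that uses "multiple combinations of identifying variables" can be pictured as stepwise deterministic linkage: try the most specific key first, then fall back to looser keys for participants who are still unmatched. A minimal sketch; the field names, the pass order and the exact-match rule are hypothetical, since the abstract does not list them. The sensitivity helper applies sensitivity = confirmed admissions found by linkage / all confirmed admissions to a validated sub-sample.

```python
from typing import Optional

# Hypothetical linkage passes, most specific first.
PASSES = [
    ("medicare_no", "sex"),
    ("surname", "given_name", "birth_date", "sex"),
    ("surname", "birth_date", "postcode"),
]


def make_key(record: dict, fields: tuple) -> Optional[tuple]:
    """Return the linkage key, or None if any field is missing or blank."""
    values = tuple(record.get(f) for f in fields)
    if any(v is None or v == "" for v in values):
        return None
    return values


def stepwise_link(cohort: list[dict], hospital: list[dict]) -> dict[str, list[str]]:
    """Map cohort ids to matching hospital record ids, pass by pass."""
    links: dict[str, list[str]] = {}
    for fields in PASSES:
        index: dict[tuple, list[str]] = {}
        for rec in hospital:
            key = make_key(rec, fields)
            if key is not None:
                index.setdefault(key, []).append(rec["id"])
        for person in cohort:
            if person["id"] in links:
                continue  # already linked by a more specific pass
            key = make_key(person, fields)
            if key is not None and key in index:
                links[person["id"]] = index[key]
    return links


def sensitivity(links: dict[str, list[str]], confirmed: dict[str, list[str]]) -> float:
    """Share of participants with a confirmed admission whose admission was linked."""
    found = sum(1 for pid, rec_ids in confirmed.items()
                if set(rec_ids) & set(links.get(pid, [])))
    return found / len(confirmed)


# Made-up records for illustration.
hospital = [
    {"id": "H1", "medicare_no": "123", "sex": "F", "surname": "Lee",
     "given_name": "Ann", "birth_date": "1950-01-01", "postcode": "3000"},
    {"id": "H2", "sex": "M", "surname": "Ng", "given_name": "Bo",
     "birth_date": "1948-05-05", "postcode": "3050"},
]
cohort = [
    {"id": "C1", "medicare_no": "123", "sex": "F"},
    {"id": "C2", "sex": "M", "surname": "Ng", "given_name": "Bo", "birth_date": "1948-05-05"},
    {"id": "C3", "sex": "M", "surname": "Smith", "given_name": "Al", "birth_date": "1960-02-02"},
]
links = stepwise_link(cohort, hospital)
```

Looser later passes raise the match proportion but also the risk of false-positive links, which is the trade-off the abstract reports when demographic details alone were used.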

Relevance: 10.00%

Abstract:

The chief challenge facing persistent robotic navigation using vision sensors is the recognition of previously visited locations under different lighting and illumination conditions. The majority of successful approaches to outdoor robot navigation use active sensors such as LIDAR, but the associated weight and power draw of these systems make them unsuitable for widespread deployment on mobile robots. In this paper we investigate methods to combine representations for visible and long-wave infrared (LWIR) thermal images with time information to combat the time-of-day-based limitations of each sensing modality. We calculate appearance-based match likelihoods using the state-of-the-art FAB-MAP [1] algorithm to analyse loop closure detection reliability across different times of day. We present preliminary results on a dataset of 10 successive traverses of a combined urban-parkland environment, recorded at 2-hour intervals from before dawn to after dusk. Improved location recognition throughout an entire day is demonstrated using the combined system compared with methods that use visible or thermal sensing alone.

Relevance: 10.00%

Abstract:

Traditional recommendation methods make one-way recommendations of inanimate items to users. Emerging applications such as online dating and job recruitment require reciprocal people-to-people recommendations, which are animate and two-way. In this paper, we propose a reciprocal collaborative method based on the concepts of users' similarities and common neighbors. The dataset used for the experiment was gathered from a real-life online dating network. The proposed method is compared with baseline methods that use traditional collaborative algorithms. Results show that the proposed method can achieve noticeably better performance than the baseline methods.
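
A reciprocal recommender has to score a pair in both directions, since a match only succeeds if each person is likely to respond to the other. The sketch below is one simple way to combine user similarity and common neighbours, assuming Jaccard similarity over the sets of people each user has contacted and a harmonic mean to combine the two directions; the abstract does not give the paper's formulas, and the contact graph is made up.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two neighbour sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest(x: str, y: str, contacts: dict[str, set[str]]) -> float:
    """One-way score for x -> y: the share of x's similarity mass held by users who contacted y.

    Two users are similar when they contacted the same people (common neighbours).
    """
    others = [u for u in contacts if u != x]
    total = sum(jaccard(contacts[x], contacts[u]) for u in others)
    if total == 0:
        return 0.0
    toward_y = sum(jaccard(contacts[x], contacts[u]) for u in others if y in contacts[u])
    return toward_y / total


def reciprocal_score(x: str, y: str, contacts: dict[str, set[str]]) -> float:
    """Harmonic mean of both directions, so a pair scores high only if interest is mutual."""
    xy, yx = interest(x, y, contacts), interest(y, x, contacts)
    return 2 * xy * yx / (xy + yx) if xy + yx > 0 else 0.0


# Made-up contact graph: who has messaged whom.
contacts = {
    "m1": {"f1", "f2"},
    "m2": {"f1", "f2", "f3"},
    "m3": {"f4"},
    "f1": {"m1", "m2"},
    "f2": {"m2"},
    "f3": {"m1", "m2"},
    "f4": set(),
}
print(round(reciprocal_score("m1", "f3", contacts), 3))  # 0.8
```

The harmonic mean is chosen over the arithmetic mean because it stays low when either direction is weak, which suits two-way recommendations.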

Relevance: 10.00%

Abstract:

Background China has one of the highest suicide rates in the world; however, recent trends in suicide have not been adequately studied. This study aimed to examine potential changes in suicide rates and characteristics in a Chinese population. Methods Data on suicide deaths in 1991–2010 were extracted from the Shandong Disease Surveillance Point (DSP) mortality dataset based on ICD-10 codes. The temporal trend in age-adjusted suicide rates for each subpopulation was tested using log-linear Poisson regression analysis. Results From 1991 to 2010, there was a marked decrease in the overall suicide rate in Shandong, with an average reduction of 8% per year. The decreasing trend was stronger in rural than in urban areas and more evident in females than in males. Similar decreases were observed for all age groups. Pesticide ingestion and hanging remained the top two methods of suicide. Limitations There are likely quality concerns in the mortality data, such as underreporting and misclassification, as well as low accuracy in determining the underlying causes of death. The representativeness of the DSP system may also be problematic due to rapid economic and demographic change. Conclusions Completed suicides in Shandong have sharply declined over the past 20 years. Higher rates in females versus males and in rural versus urban areas, which were previously considered to be distinguishing features of suicide in China, are becoming less pronounced.
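
In a log-linear Poisson trend model the year coefficient translates directly into the "average reduction per year" quoted above. Assuming the usual form, with deaths $D_t$ in year $t$ and population $P_t$ as an offset (the abstract does not list the model's other terms, such as the age adjustment), the annual percent change (APC) follows from the year coefficient $\beta_1$:

```latex
\log \mathbb{E}[D_t] = \log P_t + \beta_0 + \beta_1 t,
\qquad
\mathrm{APC} = 100\,\bigl(e^{\beta_1} - 1\bigr)\%
```

An average reduction of 8% per year therefore corresponds to $e^{\beta_1} = 0.92$, i.e. $\beta_1 = \ln 0.92 \approx -0.083$.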