942 resultados para Blog datasets
Resumo:
Modularity has been suggested to be connected to evolvability because a higher degree of independence among parts allows them to evolve as separate units. Recently, the Escoufier RV coefficient has been proposed as a measure of the degree of integration between modules in multivariate morphometric datasets. However, it has been shown, using randomly simulated datasets, that the value of the RV coefficient depends on sample size. Also, so far there is no statistical test for the difference in the RV coefficient between a priori defined groups of observations. Here, we (1), using a rarefaction analysis, show that the value of the RV coefficient depends on sample size also in real geometric morphometric datasets; (2) propose a permutation procedure to test for the difference in the RV coefficient between a priori defined groups of observations; (3) show, through simulations, that such a permutation procedure has an appropriate Type I error; (4) suggest that a rarefaction procedure could be used to obtain sample-size-corrected values of the RV coefficient; and (5) propose a nearest-neighbor procedure that could be used when studying the variation of modularity in geographic space. The approaches outlined here, readily extendable to non-morphometric datasets, allow study of the variation in the degree of integration between a priori defined modules. A Java application – that will allow performance of the proposed test using a software with graphical user interface – has also been developed and is available at the Morphometrics at Stony Brook Web page (http://life.bio.sunysb.edu/morph/).
Resumo:
Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55-85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or -4.0 to -1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD-associated genes. Additionally, we report suggestive association of a further six new genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies. © 2011 Duncan et al.
Resumo:
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci only explain a fraction of the heritability of MS. Some missing heritability may be caused by rare variants that have been suggested to play an important role in the aetiology of complex diseases such as MS. However current genetic and statistical methods for detecting rare variants are expensive and time consuming. 'Population-based linkage analysis' (PBLA) or so called identity-by-descent (IBD) mapping is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10-6). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association as there are more large networks of cases sharing haplotypes than controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allowed identification of rare variants in large unrelated population with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping still may be a promising way to narrow down the region of interest for sequencing priority. © 2013 Lin et al.
Resumo:
This project was a step forward in applying statistical methods and models to provide new insights for more informed decision-making at large spatial scales. The model has been designed to address complicated effects of ecological processes that govern the state of populations and uncertainties inherent in large spatio-temporal datasets. Specifically, the thesis contributes to better understanding and management of the Great Barrier Reef.
Resumo:
Antigen selection of B cells within the germinal center reaction generally leads to the accumulation of replacement mutations in the complementarity-determining regions (CDRs) of immunoglobulin genes. Studies of mutations in IgE-associated VDJ gene sequences have cast doubt on the role of antigen selection in the evolution of the human IgE response, and it may be that selection for high affinity antibodies is a feature of some but not all allergic diseases. The severity of IgE-mediated anaphylaxis is such that it could result from higher affinity IgE antibodies. We therefore investigated IGHV mutations in IgE-associated sequences derived from ten individuals with a history of anaphylactic reactions to bee or wasp venom or peanut allergens. IgG sequences, which more certainly experience antigen selection, served as a control dataset. A total of 6025 unique IgE and 5396 unique IgG sequences were generated using high throughput 454 pyrosequencing. The proportion of replacement mutations seen in the CDRs of the IgG dataset was significantly higher than that of the IgE dataset, and the IgE sequences showed little evidence of antigen selection. To exclude the possibility that 454 errors had compromised analysis, rigorous filtering of the datasets led to datasets of 90 core IgE sequences and 411 IgG sequences. These sequences were present as both forward and reverse reads, and so were most unlikely to include sequencing errors. The filtered datasets confirmed that antigen selection plays a greater role in the evolution of IgG sequences than of IgE sequences derived from the study participants.
Resumo:
Objective The Nintendo Wii Fit integrates virtual gaming with body movement, and may be suitable as an adjunct to conventional physiotherapy following lower limb fractures. This study examined the feasibility and safety of using the Wii Fit as an adjunct to outpatient physiotherapy following lower limb fractures, and reports sample size considerations for an appropriately powered randomised trial. Methodology Ambulatory patients receiving physiotherapy following a lower limb fracture participated in this study (n = 18). All participants received usual care (individual physiotherapy). The first nine participants also used the Wii Fit under the supervision of their treating clinician as an adjunct to usual care. Adverse events, fracture malunion or exacerbation of symptoms were recorded. Pain, balance and patient-reported function were assessed at baseline and discharge from physiotherapy. Results No adverse events were attributed to either the usual care physiotherapy or Wii Fit intervention for any patient. Overall, 15 (83%) participants completed both assessments and interventions as scheduled. For 80% power in a clinical trial, the number of complete datasets required in each group to detect a small, medium or large effect of the Wii Fit at a post-intervention assessment was calculated at 175, 63 and 25, respectively. Conclusions The Nintendo Wii Fit was safe and feasible as an adjunct to ambulatory physiotherapy in this sample. When considering a likely small effect size and the 17% dropout rate observed in this study, 211 participants would be required in each clinical trial group. A larger effect size or multiple repeated measures design would require fewer participants.
Resumo:
There is strong evidence from twin and family studies indicating that a substantial proportion of the heritability of susceptibility to ankylosing spondylitis (AS) and its clinical manifestations is encoded by non-major-histocompatibility-complex genes. Efforts to identify these genes have included genomewide linkage studies and candidate gene association studies. One region, the interleukin (IL)-1 gene complex on chromosome 2, has been repeatedly associated with AS in both Caucasians and Asians. It is likely that more than one gene in this complex is involved in AS, with the strongest evidence to date implicating IL-1A. Identifying the genes underlying other linkage regions has been difficult due to the lack of obvious candidates and the low power of most studies to date to identify genes of the small to moderate magnitude that are likely to be involved. The field is moving towards genomewide association analysis, involving much larger datasets of unrelated cases and controls. Early successes using this approach in other diseases indicates that it is likely to identify genes in common diseases like AS, but there remains the risk that the common-variant, common-disease hypothesis will not hold true in AS. Nonetheless, it is appropriate for the field to be cautiously optimistic that the next few years will bring great advances in our understanding of the genetics of this condition.
Resumo:
Grass pollen is a major trigger for allergic rhinitis and asthma, yet little is known about the timing and levels of human exposure to airborne grass pollen across Australasian urban environments. The relationships between environmental aeroallergen exposure and allergic respiratory disease bridge the fields of ecology, aerobiology, geospatial science and public health. The Australian Aerobiology Working Group comprised of experts in botany, palynology, biogeography, climate change science, plant genetics, biostatistics, ecology, pollen allergy, public and environmental health, and medicine, was established to systematically source, collate and analyse atmospheric pollen concentration data from 11 Australian and six New Zealand sites. Following two week-long workshops, post-workshop evaluations were conducted to reflect upon the utility of this analysis and synthesis approach to address complex multidisciplinary questions. This Working Group described i) a biogeographically dependent variation in airborne pollen diversity, ii) a latitudinal gradient in the timing, duration and number of peaks of the grass pollen season, and iii) the emergence of new methodologies based on trans-disciplinary synthesis of aerobiology and remote sensing data. Challenges included resolving methodological variations between pollen monitoring sites and temporal variations in pollen datasets. Other challenges included “marrying” ecosystem and health sciences and reconciling divergent expert opinion. The Australian Aerobiology Working Group facilitated knowledge transfer between diverse scientific disciplines, mentored students and early career scientists, and provided an uninterrupted collaborative opportunity to focus on a unifying problem globally. The Working Group provided a platform to optimise the value of large existing ecological datasets that have importance for human respiratory health and ecosystems research. Compilation of current knowledge of Australasian pollen aerobiology is a critical first step towards the management of exposure to pollen in patients with allergic disease and provides a basis from which the future impacts of climate change on pollen distribution can be assessed and monitored.
Resumo:
Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, heterogeneity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers near equivalent answers compared with analyses of the full dataset under a controlled error rate. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally, it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient sub-samples.
Resumo:
Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we will design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall accuracies of prediction for five types are 93.25% and 96.61% via the jackknife test and independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php
Resumo:
We propose a novel multiview fusion scheme for recognizing human identity based on gait biometric data. The gait biometric data is acquired from video surveillance datasets from multiple cameras. Experiments on publicly available CASIA dataset show the potential of proposed scheme based on fusion towards development and implementation of automatic identity recognition systems.
Resumo:
Whole genome sequences are generally accepted as excellent tools for studying evolutionary relationships. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignments could not be directly applied to the whole-genome comparison and phylogenomic studies. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. The “distances” used in these alignment-free methods are not proper distance metrics in the strict mathematical sense. In this study, we first review them in a more general frame — dissimilarity. Then we propose some new dissimilarities for phylogenetic analysis. Last three genome datasets are employed to evaluate these dissimilarities from a biological point of view.
Resumo:
In the field of face recognition, sparse representation (SR) has received considerable attention during the past few years, with a focus on holistic descriptors in closed-set identification applications. The underlying assumption in such SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such an assumption is easily violated in the face verification scenario, where the task is to determine if two faces (where one or both have not been seen before) belong to the same person. In this study, the authors propose an alternative approach to SR-based face verification, where SR encoding is performed on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which then form an overall face descriptor. Owing to the deliberate loss of spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment and various image deformations. Within the proposed framework, they evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN) and an implicit probabilistic technique based on Gaussian mixture models. Thorough experiments on AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, on both the traditional closed-set identification task and the more applicable face verification task. The experiments also show that l1-minimisation-based encoding has a considerably higher computational cost when compared with SANN-based and probabilistic encoding, but leads to higher recognition rates.
Resumo:
In this paper, we aim at predicting protein structural classes for low-homology data sets based on predicted secondary structures. We propose a new and simple kernel method, named as SSEAKSVM, to predict protein structural classes. The secondary structures of all protein sequences are obtained by using the tool PSIPRED and then a linear kernel on the basis of secondary structure element alignment scores is constructed for training a support vector machine classifier without parameter adjusting. Our method SSEAKSVM was evaluated on two low-homology datasets 25PDB and 1189 with sequence homology being 25% and 40%, respectively. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies on these two data sets are 86.3% and 84.5%, respectively, which are higher than those obtained by other existing methods. Especially, our method achieves higher accuracies (88.1% and 88.5%) for differentiating the α + β class and the α/β class compared to other methods. This suggests that our method is valuable to predict protein structural classes particularly for low-homology protein sequences. The source code of the method in this paper can be downloaded at http://math.xtu.edu.cn/myphp/math/research/source/SSEAK_source_code.rar.
Resumo:
Background This paper examines changing patterns in the utilisation and geographic access to health services in Great Britain using National Travel Survey data (1985-2012). The National Travel Survey (NTS) is a series of household surveys designed to provide data on personal travel and monitor changes in travel behaviour over time. The utilisation rate was derived using the proportion of journeys made to access health services. Geographic access was analysed by separating the concept into its accessibility and mobility dimensions. Methods Variables from the PSU, households, and individuals datasets were used as explanatory variables. Whereas, variables extracted from the journeys dataset were used as dependent variables to identify patterns of utilisation i.e. the proportion of journeys made by different groups to access health facilities in a particular journey distance or time band or by mode of transport; and geographic access to health services. A binary logistic regression analysis was conducted to identify the utilisation rate over the different time periods between different groups. This analysis shows the Odds Ratios (ORs) for different groups making a trip to utilise health services compared to their respective counterparts. Linear multiple regression analyses were conducted to then identify patterns of change in the accessibility and mobility level. Results Analysis of the data has shown that that journey distances to health facilities were signi fi cantly shorter and also gradually reduced over the period in question for Londoners, females, those without a car or on low incomes, and older people. Although rates of utilisation of health services we re Oral Abstracts / Journal of Transport & Health 2 (2015) S5 – S63 S43 signi fi cantly lower because of longer journey times. These fi ndings indicate that the rate of utilisation of health services largely depends on mobility level although previous research studies have traditionally overlooked the mobility dimension. Conclusions This fi nding, therefore, suggests the need to improve geographic access to services together with an enhanced mobility option for disadvantaged groups in order for them to have improved levels of access to health facilities. This research has also found that the volume of car trips to health services also increased steadily over the period 1985-2012 while all other modes accounted for a smaller number of trips. However, it is dif fi cult to conclude from this research whether this increase in the volume of car trips was due to a lack of alternative transport or due to an increase in the level of car-ownership.