957 results for Imbalanced datasets
Abstract:
To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the following spatial and statistical analyses were undertaken in multiple steps, involving many time-consuming importing/exporting phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.
Abstract:
The university course timetabling problem involves assigning a given number of events into a limited number of timeslots and rooms under a given set of constraints; the objective is to satisfy the hard constraints (essential requirements) and minimize the violation of soft constraints (desirable requirements). In this study we employed a Dual-sequence Simulated Annealing (DSA) algorithm as an improvement algorithm. The Round Robin (RR) algorithm is used to control the selection of neighbourhood structures within DSA. The performance of our approach is tested over eleven benchmark datasets. Experimental results show that our approach is able to generate competitive results when compared with other state-of-the-art techniques.
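The idea of cycling through neighbourhood structures in round-robin order inside simulated annealing can be sketched as follows. This is a minimal illustration under stated assumptions, not the Dual-sequence Simulated Annealing algorithm from the abstract: it uses plain simulated annealing with geometric cooling, a toy cost that counts clashing event pairs, and two hypothetical neighbourhoods (`move_event`, `swap_events`). All names and parameter values are illustrative.

```python
import math
import random
from itertools import cycle


def cost(assignment, conflicts):
    """Count conflicting event pairs that share a timeslot (toy hard constraint)."""
    return sum(1 for a, b in conflicts if assignment[a] == assignment[b])


def move_event(assignment, n_slots, rng):
    """Neighbourhood 1 (hypothetical): move one random event to a random timeslot."""
    new = list(assignment)
    new[rng.randrange(len(new))] = rng.randrange(n_slots)
    return new


def swap_events(assignment, n_slots, rng):
    """Neighbourhood 2 (hypothetical): swap the timeslots of two random events."""
    new = list(assignment)
    i, j = rng.sample(range(len(new)), 2)
    new[i], new[j] = new[j], new[i]
    return new


def anneal(n_events, n_slots, conflicts, steps=5000, t0=2.0, alpha=0.999, seed=0):
    """Plain simulated annealing; neighbourhoods are visited in round-robin order."""
    rng = random.Random(seed)
    current = [rng.randrange(n_slots) for _ in range(n_events)]
    current_cost = cost(current, conflicts)
    best, best_cost = current, current_cost
    temperature = t0
    neighbourhoods = cycle([move_event, swap_events])  # round-robin selection
    for _ in range(steps):
        candidate = next(neighbourhoods)(current, n_slots, rng)
        candidate_cost = cost(candidate, conflicts)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= alpha  # geometric cooling (an assumption)
    return best, best_cost
```

Calling `anneal(6, 3, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])` returns a timeslot assignment for six events together with the number of remaining clashes.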
Abstract:
This article offers a replication for Britain of Brown and Heywood's analysis of the determinants of performance appraisal in Australia. Although there are some important limiting differences between our two datasets - the Australian Workplace Industrial Relations Survey (AWIRS) and the Workplace Employment Relations Survey (WERS) - we reach one central point of agreement and one intriguing shared insight. First, performance appraisal is negatively associated with tenure: where employers cannot rely on the carrot of deferred pay or the stick of dismissal to motivate workers, they will tend to rely more on monitoring, ceteris paribus. Second, employer monitoring and performance pay may be complementary. However, consonant with the disparate results from the wider literature, there is more modest agreement on the contribution of specific human resource management practices, and still less on the role of job control.
Abstract:
Geographically referenced databases of species records are becoming increasingly available. Doubts over the heterogeneous quality of the underlying data may restrict analyses of such collated databases. We partitioned the spatial variation in species richness of littoral algae and molluscs from the UK National Biodiversity Network database into a smoothed mesoscale component and a local component. Trend surface analysis (TSA) was used to define the mesoscale patterns of species richness, leaving a local residual component that lacked spatial autocorrelation. The analysis was based on 10 km grid squares with 115035 records of littoral algae (729 species) and 66879 records of littoral molluscs (569 species). The TSA identified variation in algal and molluscan species richness with a characteristic length scale of approximately 120 km. Locations of the most species-rich grid squares were consistent with the southern and western bias of species richness in the UK marine flora and fauna. The TSA also identified areas which showed significant changes in the spatial pattern of species richness: breakpoints, which correspond to major headlands along the south coast of England. Patterns of algal and molluscan species richness were broadly congruent. Residual variability was strongly influenced by proxies of collection effort, but local environmental variables including length of the coastline and variability in wave exposure were also important. Relative to the underlying trend, local species richness hotspots occurred on all coasts. While there is some justification for scepticism in analyses of heterogeneous datasets, our results indicate that the analysis of collated datasets can be informative.
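The partition into a smooth trend and a local residual can be illustrated with a small trend-surface fit. The sketch below assumes a second-order polynomial surface fitted by ordinary least squares through the normal equations. The abstract does not state the polynomial order or the fitting procedure, so both are assumptions, and the function names are hypothetical.

```python
def design_row(x, y):
    """Second-order polynomial terms for one location (the order is an assumption)."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend_surface(xs, ys, zs):
    """Least-squares polynomial surface; returns coefficients, trend and residuals."""
    rows = [design_row(x, y) for x, y in zip(xs, ys)]
    k = len(rows[0])
    # Normal equations: (A^T A) beta = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(k)]
    coeffs = solve(ata, atz)
    trend = [sum(c * t for c, t in zip(coeffs, r)) for r in rows]
    residuals = [z - t for z, t in zip(zs, trend)]  # the "local" component
    return coeffs, trend, residuals
```

Here `xs` and `ys` would be grid-square coordinates and `zs` the species richness per square. The residuals are what would then be related to local covariates such as collection effort.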
Abstract:
An experiment to quantify intra- and interobserver error in anatomical measurements found that interobserver measurements can vary by over 14% of mean specimen length; disparity in measurement increases logarithmically with the number of contributors; instructions did not reduce variation or measurement disparity; scale of the specimen influenced the precision of measurement (relative error increasing with specimen size); different methods of taking a measurement yielded different results, although they did not differ in terms of precision, and topographical complexity of the elements being considered may potentially influence error (error increasing with complexity). These results highlight concerns about introduction of noise and potential bias that should be taken into account when compiling composite datasets and meta-analyses.
Abstract:
Connectivity mapping is a recently developed technique for discovering the underlying connections between different biological states based on gene-expression similarities. The sscMap method has been shown to provide enhanced sensitivity in mapping meaningful connections leading to testable biological hypotheses and in identifying drug candidates with particular pharmacological and/or toxicological properties. Challenges remain, however, as to how to prioritise the large number of discovered connections in an unbiased manner such that the success rate of any follow-up investigation can be maximised. We introduce a new concept, gene-signature perturbation, which aims to test whether an identified connection is stable enough against systematic minor changes (perturbation) to the gene-signature. We applied the perturbation method to three independent datasets obtained from the GEO database: acute myeloid leukemia (AML), cervical cancer, and breast cancer treated with letrozole. We demonstrate that the perturbation approach helps to identify meaningful biological connections which suggest the most relevant candidate drugs. In the case of AML, we found that the prevalent compounds were retinoic acids and PPAR activators. For cervical cancer, our results suggested that potential drugs are likely to involve the EGFR pathway; and with the breast cancer dataset, we identified candidates that are involved in prostaglandin inhibition. Thus the gene-signature perturbation approach added real value to the whole connectivity mapping process, allowing for increased specificity in the identification of possible therapeutic candidates.
Abstract:
Background/Aims: The NOS3 gene is a biological and positional candidate for diabetic nephropathy. However, the relationship between NOS3 polymorphisms and renal disease is inconclusive. This study aimed to clarify the association of NOS3 variants with nephropathy in individuals with type 1 diabetes. Methods: We conducted a case-control study examining all common SNPs in the NOS3 gene by a tag SNP approach. Individuals with type 1 diabetes and persistent proteinuria (cases, n = 718) were compared with individuals with type 1 diabetes but no evidence of renal disease (controls, n = 749). Our replication collection comprised 1,105 individuals with type 1 diabetes recruited to a nephropathy case group and 862 control individuals with normal urinary albumin excretion rates. Meta-analysis was conducted for SNPs where more than three genotype datasets were available. Results: A novel association was identified in the discovery collection (rs1800783, p(genotype) = 0.006, p(allele) = 0.002, OR = 1.26, 95% CI: 1.08-1.47) and supported by independent replication using a tag SNP (rs4496877, pairwise r(2) = 0.96 with rs1800783) in the replication collection (p(genotype) = 0.002, p(allele) = 0.0006, OR = 1.27, 95% CI: 1.10-1.45). Conclusion: The A allele of rs1800783 is a significant risk factor for nephropathy in individuals with type 1 diabetes, and further comprehensive studies are warranted to confirm the definitive functional variant in the NOS3 gene. Copyright (C) 2010 S. Karger AG, Basel
Abstract:
Schizophrenia is a common psychotic mental disorder that is believed to result from the effects of multiple genetic and environmental factors. In this study, we explored gene-gene interactions and main effects in both case-control (657 cases and 411 controls) and family-based (273 families, 1350 subjects) datasets of English or Irish ancestry. Fifty-three markers in 8 genes were genotyped in the family sample and 44 markers in 7 genes were genotyped in the case-control sample. The Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) was used to examine epistasis in the family dataset and a 3-locus model was identified (permuted p=0.003). The 3-locus model involved the IL3 (rs2069803), RGS4 (rs2661319), and DTNBP1 (rs21319539) genes. We used MDR to analyze the case-control dataset containing the same markers typed in the RGS4, IL3 and DTNBP1 genes and found evidence of a joint effect between IL3 (rs31400) and DTNBP1 (rs760761) (cross-validation consistency 4/5, balanced prediction accuracy=56.84%, p=0.019). While this is not a direct replication, the results obtained from both the family and case-control samples collectively suggest that IL3 and DTNBP1 are likely to interact and jointly contribute to increased risk for schizophrenia. We also observed a significant main effect in DTNBP1, which survived correction for multiple comparisons, and numerous nominally significant effects in several genes. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
Background: Late Onset Alzheimer's disease (LOAD) is the leading cause of dementia. Recent large genome-wide association studies (GWAS) identified the first strongly supported LOAD susceptibility genes since the discovery of the involvement of APOE in the early 1990s. We have now exploited these GWAS datasets to uncover key LOAD pathophysiological processes. Methodology: We applied a recently developed tool for mining GWAS data for biologically meaningful information to a LOAD GWAS dataset. The principal findings were then tested in an independent GWAS dataset.
Abstract:
We sought to identify new susceptibility loci for Alzheimer's disease through a staged association study (GERAD+) and by testing suggestive loci reported by the Alzheimer's Disease Genetic Consortium (ADGC) in a companion paper. We undertook a combined analysis of four genome-wide association datasets (stage 1) and identified ten newly associated variants with P = 1 × 10(-5). We tested these variants for association in an independent sample (stage 2). Three SNPs at two loci replicated and showed evidence for association in a further sample (stage 3). Meta-analyses of all data provided compelling evidence that ABCA7 (rs3764650, meta P = 4.5 × 10(-17); including ADGC data, meta P = 5.0 × 10(-21)) and the MS4A gene cluster (rs610932, meta P = 1.8 × 10(-14); including ADGC data, meta P = 1.2 × 10(-16)) are new Alzheimer's disease susceptibility loci. We also found independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance: CD2AP (GERAD+, P = 8.0 × 10(-4); including ADGC data, meta P = 8.6 × 10(-9)), CD33 (GERAD+, P = 2.2 × 10(-4); including ADGC data, meta P = 1.6 × 10(-9)) and EPHA1 (GERAD+, P = 3.4 × 10(-4); including ADGC data, meta P = 6.0 × 10(-10)).
Abstract:
Segregation measures have been applied in the study of many societies, and traditionally such measures have been used to assess the degree of division between social and cultural groups across urban areas, wider regions, or perhaps national areas. The degree of segregation can vary substantially from place to place even within very small areas. In this paper the substantive concern is with religious/political segregation in Northern Ireland—particularly the proportion of Protestants (often taken as an indicator of those who wish to retain the union with Britain) to Catholics (often taken as an indicator of those who favour union with the Republic of Ireland). Traditionally, segregation is measured globally—that is, across all units in a given area. A recent trend in spatial data analysis generally, and in segregation analysis specifically, is to assess local features of spatial datasets. The rationale behind such approaches is that global methods may obscure important spatial variations in the property of interest, and thus prevent full use of the data. In this paper the utility of local measures of residential segregation is assessed with reference to the religious/political composition of Northern Ireland. The paper demonstrates marked spatial variations in the degree and nature of residential segregation across Northern Ireland. It is argued that local measures provide highly useful information in addition to that provided in maps of the raw variables and in standard global segregation measures.
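The contrast between a single global figure and per-unit detail can be made concrete with the index of dissimilarity, a standard global segregation measure. The sketch below computes the global index and each areal unit's contribution to it. The per-unit contributions are a simplification for illustration only and are not the local measures assessed in the paper, which the abstract does not specify. Function names and the example counts are hypothetical.

```python
def dissimilarity_terms(group_a, group_b):
    """Per-unit contributions 0.5 * |a_i / A - b_i / B| to the index of dissimilarity."""
    total_a = sum(group_a)
    total_b = sum(group_b)
    return [0.5 * abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)]


def dissimilarity_index(group_a, group_b):
    """Global index of dissimilarity D, from 0 (even mix) to 1 (complete separation)."""
    return sum(dissimilarity_terms(group_a, group_b))


# Illustrative counts per areal unit (made-up numbers, not data from the paper).
protestants = [120, 40, 10]
catholics = [15, 60, 110]
global_d = dissimilarity_index(protestants, catholics)
per_unit = dissimilarity_terms(protestants, catholics)
```

Two areas with the same global value can have very different `per_unit` profiles, which is the kind of variation a global summary hides.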
Abstract:
To improve the performance of classification using Support Vector Machines (SVMs) while reducing the model selection time, this paper introduces Differential Evolution, a heuristic method for model selection in two-class SVMs with a RBF kernel. The model selection method and related tuning algorithm are both presented. Experimental results from application to a selection of benchmark datasets for SVMs show that this method can produce an optimized classification in less time and with higher accuracy than a classical grid search. Comparison with a Particle Swarm Optimization (PSO) based alternative is also included.
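A search of this kind over the two RBF-SVM hyperparameters (C and gamma) can be sketched with a basic differential evolution loop. The code below is a generic DE/rand/1/bin variant under stated assumptions, not the tuning algorithm from the paper. In place of a real cross-validated SVM error it minimises `placeholder_cv_error`, a made-up smooth function of (log C, log gamma); in practice that function would train and validate an SVM. All names and parameter values are illustrative.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.7, cr=0.9,
                           generations=150, seed=0):
    """Generic DE/rand/1/bin minimiser over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def placeholder_cv_error(params):
    """Stand-in for cross-validated SVM error over (log C, log gamma); made up."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2
```

Calling `differential_evolution(placeholder_cv_error, [(-5.0, 5.0), (-5.0, 5.0)])` returns the best (log C, log gamma) pair found and its score. Unlike a grid search, the number of objective evaluations is fixed by `pop_size` and `generations` rather than by grid resolution.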
Abstract:
Patterns of residential segregation in Northern Ireland reflect historic sectarian conflict as well as current animosities. A number of indices of segregation are examined in this paper and their relative merits in capturing localised societal divisions are discussed. The implications of such divisions on health as mediated through conflict-related stress are then considered. Costed datasets of hospital, community and anxiety/depression prescribing data have been assembled and attributed to local geographies. The association between geographical variations in these costs and levels of segregation was modelled using regression analysis. It was found that the level of segregation does not help to explain variations in costed utilisation of acute and elderly services but does explain variations in the costs of prescribing for anxiety and depression with controls for socio-economic deprivation included. Results in this paper would indicate that strategies to promote good relations in Northern Ireland have positive implications for mental health.
Abstract:
Purpose. Keratoconus is a progressive disorder of the cornea that can lead to severe visual impairment or blindness. Although several genomic regions have been linked to rare familial forms of keratoconus, no genes have yet been definitively identified for common forms of the disease. Methods. Two genome-wide association scans were undertaken in parallel. The first used pooled DNA from an Australian cohort, followed by typing of top-ranked single-nucleotide polymorphisms (SNPs) in individual DNA samples. The second was conducted in individually genotyped patients and controls from the USA. Tag SNPs around the hepatocyte growth factor (HGF) gene were typed in three additional replication cohorts. Serum levels of HGF protein in normal individuals were assessed with ELISA and correlated with genotype. Results. The only SNP observed to be associated in both the pooled discovery and primary replication cohort was rs1014091, located upstream of the HGF gene. The nearby SNP rs3735520 was found to be associated in the individually typed discovery cohort (P = 6.1 × 10 ). Genotyping of tag SNPs around HGF revealed association at rs3735520 and rs17501108/rs1014091 in four of the five cohorts. Meta-analysis of all five datasets together yielded suggestive P values for rs3735520 (P = 9.9 × 10 ) and rs17501108 (P = 9.9 × 10 ). In addition, SNP rs3735520 was found to be associated with serum HGF level in normal individuals (P = 0.036). Conclusions. Taken together, these results implicate genetic variation at the HGF locus in keratoconus susceptibility. © 2011 The Association for Research in Vision and Ophthalmology, Inc.
Abstract:
Polyphosphate is a ubiquitous linear homopolymer of phosphate residues linked by high-energy bonds similar to those found in ATP. It has been associated with many processes including pathogenicity, DNA uptake and multiple stress responses across all domains. Bacteria have also been shown to use polyphosphate as a way to store phosphate when transferred from phosphate-limited to phosphate-rich media - a process exploited in wastewater treatment and other environmental contaminant remediation. Despite this, there has, to date, been little research into the role of polyphosphate in the survival of marine bacterioplankton in oligotrophic environments. The three main proteins involved in polyphosphate metabolism, Ppk1, Ppk2 and Ppx are multi-domain and have differential inter-domain and inter-gene conservation, making unbiased analysis of relative abundance in metagenomic datasets difficult. This paper describes the development of a novel Isofunctional Homolog Annotation Tool (IHAT) to detect homologs of genes with a broad range of conservation without bias of traditional expect-value cutoffs. IHAT analysis of the Global Ocean Sampling (GOS) dataset revealed that genes associated with polyphosphate metabolism are more abundant in environments where available phosphate is limited, suggesting an important role for polyphosphate metabolism in marine oligotrophs.