152 results for Datasets
Abstract:
The analysis of chironomid taxa and environmental datasets from 46 New Zealand lakes identified temperature (February mean air temperature) and lake production (chlorophyll a (Chl a)) as the main drivers of chironomid distribution. Temperature was the strongest driver of chironomid distribution and consequently produced the most robust inference models. We present two possible temperature transfer functions from this dataset. The most robust model (weighted averaging-partial least squares (WA-PLS), n = 36) was based on a dataset with the most productive (Chl a > 10 µg l⁻¹) lakes removed. This model produced a coefficient of determination (r²jack) of 0.77, and a root mean squared error of prediction (RMSEPjack) of 1.31 °C. The Chl a transfer function (partial least squares (PLS), n = 37) was far less reliable, with an r²jack of 0.49 and an RMSEPjack of 0.46 log10 µg l⁻¹. Both of these transfer functions could be improved by a revision of the taxonomy for the New Zealand chironomid taxa, particularly the genus Chironomus. The Chironomus morphotype was common in high altitude, cool, oligotrophic lakes and lowland, warm, eutrophic lakes. This could reflect the widespread distribution of one eurythermic species, or the collective distribution of a number of different Chironomus species with more limited tolerances. The Chl a transfer function could also be improved by inputting mean Chl a values into the inference model rather than the spot measurements that were available for this study.
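The jackknifed statistics quoted in this abstract (the r² and RMSEP with the "jack" subscript) come from leave-one-out cross-validation: each lake is removed in turn, the model is refitted on the remaining lakes, and the removed lake's temperature is predicted. The sketch below illustrates that procedure with plain weighted averaging (WA) without deshrinking, which is a simpler stand-in and not the WA-PLS model used in the study. The function names and the toy data are hypothetical.

```python
import math

def wa_optima(abundances, env):
    """Abundance-weighted mean of the environmental variable per taxon."""
    optima = []
    for k in range(len(abundances[0])):
        total = sum(row[k] for row in abundances)
        if total == 0:
            optima.append(None)  # taxon absent from the training lakes
        else:
            weighted = sum(row[k] * x for row, x in zip(abundances, env))
            optima.append(weighted / total)
    return optima

def wa_predict(sample, optima):
    """Infer the environmental value as the abundance-weighted mean of optima."""
    pairs = [(y, u) for y, u in zip(sample, optima) if u is not None and y > 0]
    total = sum(y for y, _ in pairs)
    if total == 0:
        raise ValueError("sample shares no taxa with the training set")
    return sum(y * u for y, u in pairs) / total

def jackknife_stats(abundances, env):
    """Leave-one-out RMSEP and squared correlation of predicted vs observed."""
    preds = []
    for i in range(len(env)):
        train_y = abundances[:i] + abundances[i + 1:]
        train_x = env[:i] + env[i + 1:]
        preds.append(wa_predict(abundances[i], wa_optima(train_y, train_x)))
    n = len(env)
    rmsep = math.sqrt(sum((p - x) ** 2 for p, x in zip(preds, env)) / n)
    mean_p, mean_x = sum(preds) / n, sum(env) / n
    cov = sum((p - mean_p) * (x - mean_x) for p, x in zip(preds, env))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_x = sum((x - mean_x) ** 2 for x in env)
    return rmsep, cov ** 2 / (var_p * var_x)

# Toy data, for illustration only: 4 lakes x 3 taxa, with temperatures in °C.
toy_abundances = [[8, 2, 0], [5, 5, 0], [1, 6, 3], [0, 3, 7]]
toy_temps = [8.0, 10.0, 13.0, 16.0]
rmsep, r2 = jackknife_stats(toy_abundances, toy_temps)
print(f"RMSEP(jack) = {rmsep:.2f}, r2(jack) = {r2:.2f}")
```

Because every prediction is made for a lake the model has not seen, these statistics are less optimistic than the apparent (training-set) fit.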
Abstract:
PURPOSE. To examine internal consistency, refine the response scale, and obtain a linear scoring system for the visual function instrument, the Daily Living Tasks Dependent on Vision (DLTV). METHODS. Data were available from 186 participants with a clinical diagnosis of AMD who completed the 22-item DLTV (DLTV-22) according to a four-point ordinal response scale. An independent group of 386 participants with AMD were administered a reduced version of the DLTV with 11 items (DLTV-11), according to a five-point response scale. Rasch analysis was performed on both datasets and used to generate item statistics for measure order, response odds ratios per item and per person, and infit and outfit mean square statistics. The Rasch output from the DLTV-22 was examined to identify redundant items and for factorial validity and person item measure separation reliabilities. RESULTS. The average rating for the DLTV-22 changed monotonically with the magnitude of the latent person trait. The expected versus observed average measures were extremely close, with step calibrations evenly separated for the four-point ordinal scale. In the case of the DLTV-11, step calibrations were not as evenly separated, suggesting that the five-point scale should be reduced to either a four- or three-point scale. Five items in the DLTV-22 were removed, and all 17 remaining items had good infit and outfit mean squares. PCA with residuals from Rasch analysis identified two domains containing 7 and 10 items each. The domains had high person separation reliabilities (0.86 and 0.77 for domains 1 and 2, respectively) and item measure reliabilities (0.99 and 0.98 for domains 1 and 2, respectively). CONCLUSIONS. With the improved internal consistency, the establishment of the accuracy and precision of the rating scale for the DLTV, and the establishment of a valid domain structure, we believe that it constitutes a useful instrument for assessing visual function in older adults with age-related macular degeneration.
Abstract:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities has been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts in achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. Developments in convex optimisation-based model construction algorithms, including the support vector regression algorithms, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
Abstract:
PEGS (Production and Environmental Generic Scheduler) is a generic production scheduler that produces good schedules over a wide range of problems. It is centralised, using search strategies with the Shifting Bottleneck algorithm. We have also developed an alternative distributed approach using software agents. In some cases this reduces run times by a factor of 10 or more. In most cases, the agent-based program also produces good solutions for published benchmark data, and the short run times make our program useful for a large range of problems. Test results show that the agents can produce schedules comparable to the best found so far for some benchmark datasets and actually better schedules than PEGS on our own random datasets. The flexibility that agents can provide for today's dynamic scheduling is also appealing. We suggest that in this sort of generic or commercial system, the agent-based approach is a good alternative.
Abstract:
Exam timetabling is one of the most important administrative activities that takes place in academic institutions. In this paper we present a critical discussion of the research on exam timetabling in the last decade or so. The last ten years have seen an increased level of attention on this important topic. There has been a range of significant contributions to the scientific literature both in terms of theoretical and practical aspects. The main aim of this survey is to highlight the new trends and key research achievements that have been carried out in the last decade. We also aim to outline a range of relevant important research issues and challenges that have been generated by this body of work.
We first define the problem and review previous survey papers. Algorithmic approaches are then classified and discussed. These include early techniques (e.g. graph heuristics) and state-of-the-art approaches including meta-heuristics, constraint based methods, multi-criteria techniques, hybridisations, and recent new trends concerning neighbourhood structures, which are motivated by raising the generality of the approaches. Summarising tables are presented to provide an overall view of these techniques. We discuss some issues on decomposition techniques, system tools and languages, models and complexity. We also present and discuss some important issues which have come to light concerning the public benchmark exam timetabling data. Different versions of problem datasets with the same name have been circulating in the scientific community in the last ten years, which has generated a significant amount of confusion. We clarify the situation and present a re-naming of the widely studied datasets to avoid future confusion. We also highlight which research papers have dealt with which dataset. Finally, we draw upon our discussion of the literature to present a (non-exhaustive) range of potential future research directions and open issues in exam timetabling research.
Abstract:
DN (diabetic nephropathy) is the leading cause of end-stage renal disease worldwide and develops in 25-40% of patients with Type 1 or Type 2 diabetes mellitus. Elevated blood glucose over long periods together with glomerular hypertension leads to progressive glomerulosclerosis and tubulointerstitial fibrosis in susceptible individuals. Central to the pathology of DN are cytokines and growth factors such as TGF-beta (transforming growth factor beta) superfamily members, including BMPs (bone morphogenetic proteins) and TGF-beta 1, which play key roles in fibrogenic responses of the kidney, including podocyte loss, mesangial cell hypertrophy, matrix accumulation and tubulointerstitial fibrosis. Many of these responses can be mimicked in in vitro models of cells cultured in high glucose. We have applied differential gene expression technologies to identify novel genes expressed in in vitro and in vivo models of DN and, importantly, in human renal tissue. By mining these datasets and probing the regulation of expression and actions of specific molecules, we have identified novel roles for molecules such as Gremlin, IHG-1 (induced in high glucose-1) and CTGF (connective tissue growth factor) in DN and potential regulators of their bioactions.
Abstract:
This paper proposes a new hierarchical learning structure, namely the holistic triple learning (HTL), for extending the binary support vector machine (SVM) to multi-classification problems. For an N-class problem, a HTL constructs a decision tree up to a depth of A leaf node of the decision tree is allowed to be placed with a holistic triple learning unit whose generalisation abilities are assessed and approved. Meanwhile, the remaining nodes in the decision tree each accommodate a standard binary SVM classifier. The holistic triple classifier is a regression model trained on three classes, whose training algorithm is originated from a recently proposed implementation technique, namely the least-squares support vector machine (LS-SVM). A major novelty with the holistic triple classifier is the reduced number of support vectors in the solution. For the resultant HTL-SVM, an upper bound of the generalisation error can be obtained. The time complexity of training the HTL-SVM is analysed, and is shown to be comparable to that of training the one-versus-one (1-vs.-1) SVM, particularly on small-scale datasets. Empirical studies show that the proposed HTL-SVM achieves competitive classification accuracy with a reduced number of support vectors compared to the popular 1-vs-1 alternative.
Abstract:
To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the following spatial and statistical analyses were undertaken in multiple steps, involving many time-consuming importing/exporting phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.
Abstract:
The university course timetabling problem involves assigning a given number of events into a limited number of timeslots and rooms under a given set of constraints; the objective is to satisfy the hard constraints (essential requirements) and minimize the violation of soft constraints (desirable requirements). In this study we employed a Dual-sequence Simulated Annealing (DSA) algorithm as an improvement algorithm. The Round Robin (RR) algorithm is used to control the selection of neighbourhood structures within DSA. The performance of our approach is tested over eleven benchmark datasets. Experimental results show that our approach is able to generate competitive results when compared with other state-of-the-art techniques.
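The idea of cycling through neighbourhood structures in round-robin order inside a simulated annealing loop can be sketched as follows. This is plain simulated annealing on a toy conflict-minimisation problem, not the Dual-sequence Simulated Annealing (DSA) algorithm, whose details the abstract does not give. The cost function, the two neighbourhood moves, the cooling parameters, and all names are assumptions made for illustration.

```python
import itertools
import math
import random

def cost(assignment, conflicts):
    """Number of conflicting event pairs placed in the same timeslot."""
    return sum(1 for a, b in conflicts if assignment[a] == assignment[b])

def move_event(assignment, n_slots, rng):
    """Neighbourhood 1: reassign one random event to a random timeslot."""
    new = list(assignment)
    new[rng.randrange(len(new))] = rng.randrange(n_slots)
    return new

def swap_events(assignment, n_slots, rng):
    """Neighbourhood 2: swap the timeslots of two random events."""
    new = list(assignment)
    i, j = rng.sample(range(len(new)), 2)
    new[i], new[j] = new[j], new[i]
    return new

def anneal(n_events, n_slots, conflicts, steps=5000, t0=2.0, alpha=0.999, seed=0):
    rng = random.Random(seed)
    current = [rng.randrange(n_slots) for _ in range(n_events)]
    current_cost = cost(current, conflicts)
    best, best_cost = current, current_cost
    # Round robin: the neighbourhood structures take turns, one per iteration.
    neighbourhoods = itertools.cycle([move_event, swap_events])
    temp = t0
    for _ in range(steps):
        if best_cost == 0:
            break
        candidate = next(neighbourhoods)(current, n_slots, rng)
        delta = cost(candidate, conflicts) - current_cost
        # Accept improvements always, worsening moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha  # geometric cooling
    return best, best_cost

# Toy instance, for illustration only: 6 events, 3 timeslots.
toy_conflicts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 2)]
best, best_cost = anneal(6, 3, toy_conflicts)
print(best, best_cost)
```

Here only a single clash count is minimised. A real course timetabling model would distinguish hard constraints from soft constraints and would also assign rooms.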
Abstract:
This article offers a replication for Britain of Brown and Heywood's analysis of the determinants of performance appraisal in Australia. Although there are some important limiting differences between our two datasets - the Australia Workplace Industrial Relations Survey (AWIRS) and the Workplace Employment Relations Survey (WERS) - we reach one central point of agreement and one intriguing shared insight. First, performance appraisal is negatively associated with tenure: where employers cannot rely on the carrot of deferred pay or the stick of dismissal to motivate workers, they will tend to rely more on monitoring, ceteris paribus. Second, employer monitoring and performance pay may be complementary. However, consonant with the disparate results from the wider literature, there is more modest agreement on the contribution of specific human resource management practices, and still less on the role of job control.
Abstract:
Geographically referenced databases of species records are becoming increasingly available. Doubts over the heterogeneous quality of the underlying data may restrict analyses of such collated databases. We partitioned the spatial variation in species richness of littoral algae and molluscs from the UK National Biodiversity Network database into a smoothed mesoscale component and a local component. Trend surface analysis (TSA) was used to define the mesoscale patterns of species richness, leaving a local residual component that lacked spatial autocorrelation. The analysis was based on 10 km grid squares with 115035 records of littoral algae (729 species) and 66879 records of littoral molluscs (569 species). The TSA identified variation in algal and molluscan species richness with a characteristic length scale of approximately 120 km. Locations of the most species-rich grid squares were consistent with the southern and western bias of species richness in the UK marine flora and fauna. The TSA also identified areas which showed significant changes in the spatial pattern of species richness: breakpoints, which correspond to major headlands along the south coast of England. Patterns of algal and molluscan species richness were broadly congruent. Residual variability was strongly influenced by proxies of collection effort, but local environmental variables including length of the coastline and variability in wave exposure were also important. Relative to the underlying trend, local species richness hotspots occurred on all coasts. While there is some justification for scepticism in analyses of heterogeneous datasets, our results indicate that the analysis of collated datasets can be informative.
Abstract:
An experiment to quantify intra- and interobserver error in anatomical measurements found that interobserver measurements can vary by over 14% of mean specimen length; disparity in measurement increases logarithmically with the number of contributors; instructions did not reduce variation or measurement disparity; scale of the specimen influenced the precision of measurement (relative error increasing with specimen size); different methods of taking a measurement yielded different results, although they did not differ in terms of precision, and topographical complexity of the elements being considered may potentially influence error (error increasing with complexity). These results highlight concerns about introduction of noise and potential bias that should be taken into account when compiling composite datasets and meta-analyses.
Abstract:
Connectivity mapping is a recently developed technique for discovering the underlying connections between different biological states based on gene-expression similarities. The sscMap method has been shown to provide enhanced sensitivity in mapping meaningful connections leading to testable biological hypotheses and in identifying drug candidates with particular pharmacological and/or toxicological properties. Challenges remain, however, as to how to prioritise the large number of discovered connections in an unbiased manner such that the success rate of any follow-up investigation can be maximised. We introduce a new concept, gene-signature perturbation, which aims to test whether an identified connection is stable enough against systematic minor changes (perturbation) to the gene-signature. We applied the perturbation method to three independent datasets obtained from the GEO database: acute myeloid leukemia (AML), cervical cancer, and breast cancer treated with letrozole. We demonstrate that the perturbation approach helps to identify meaningful biological connections which suggest the most relevant candidate drugs. In the case of AML, we found that the prevalent compounds were retinoic acids and PPAR activators. For cervical cancer, our results suggested that potential drugs are likely to involve the EGFR pathway; and with the breast cancer dataset, we identified candidates that are involved in prostaglandin inhibition. Thus the gene-signature perturbation approach added real value to the whole connectivity mapping process, allowing for increased specificity in the identification of possible therapeutic candidates.
Abstract:
Background/Aims: The NOS3 gene is a biological and positional candidate for diabetic nephropathy. However, the relationship between NOS3 polymorphisms and renal disease is inconclusive. This study aimed to clarify the association of NOS3 variants with nephropathy in individuals with type 1 diabetes. Methods: We conducted a case-control study examining all common SNPs in the NOS3 gene by a tag SNP approach. Individuals with type 1 diabetes and persistent proteinuria (cases, n = 718) were compared with individuals with type 1 diabetes but no evidence of renal disease (controls, n = 749). Our replication collection comprised 1,105 individuals with type 1 diabetes recruited to a nephropathy case group and 862 control individuals with normal urinary albumin excretion rates. Meta-analysis was conducted for SNPs where more than three genotype datasets were available. Results: A novel association was identified in the discovery collection (rs1800783, p(genotype) = 0.006, p(allele) = 0.002, OR = 1.26, 95% CI: 1.08-1.47) and supported by independent replication using a tag SNP (rs4496877, pairwise r(2) = 0.96 with rs1800783) in the replication collection (p(genotype) = 0.002, p(allele) = 0.0006, OR = 1.27, 95% CI: 1.10-1.45). Conclusion: The A allele of rs1800783 is a significant risk factor for nephropathy in individuals with type 1 diabetes, and further comprehensive studies are warranted to confirm the definitive functional variant in the NOS3 gene. Copyright (C) 2010 S. Karger AG, Basel
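Odds ratios with 95% confidence intervals, such as those reported in this abstract, can be computed from a 2x2 table of allele counts. The sketch below uses Woolf's log-based interval. The abstract does not state which method the authors used and does not give the underlying counts, so the function name and the example counts are hypothetical.

```python
import math

def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's method) from allele counts."""
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts, for illustration only (not taken from the study).
or_, lo, hi = odds_ratio_ci(case_risk=20, case_other=10, ctrl_risk=10, ctrl_other=20)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as both reported intervals do, indicates an association at roughly the 5% significance level.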
Abstract:
Schizophrenia is a common psychotic mental disorder that is believed to result from the effects of multiple genetic and environmental factors. In this study, we explored gene-gene interactions and main effects in both case-control (657 cases and 411 controls) and family-based (273 families, 1350 subjects) datasets of English or Irish ancestry. Fifty-three markers in 8 genes were genotyped in the family sample and 44 markers in 7 genes were genotyped in the case-control sample. The Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) was used to examine epistasis in the family dataset and a 3-locus model was identified (permuted p=0.003). The 3-locus model involved the IL3 (rs2069803), RGS4 (rs2661319), and DTNBP1 (rs21319539) genes. We used MDR to analyze the case-control dataset containing the same markers typed in the RGS4, IL3 and DTNBP1 genes and found evidence of a joint effect between IL3 (rs31400) and DTNBP1 (rs760761) (cross-validation consistency 4/5, balanced prediction accuracy=56.84%, p=0.019). While this is not a direct replication, the results obtained from both the family and case-control samples collectively suggest that IL3 and DTNBP1 are likely to interact and jointly contribute to increased risk for schizophrenia. We also observed a significant main effect in DTNBP1, which survived correction for multiple comparisons, and numerous nominally significant effects in several genes. (C) 2008 Elsevier B.V. All rights reserved.