962 results for data complexity
Abstract:
The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Quickly obtaining the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation based on the object-relational model of data. Complexity was measured using three Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
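The Halstead measures named above are defined purely in terms of operator and operand counts, so they can be computed mechanically from a query's token stream. Below is a minimal Python sketch of that computation for a SQL-like query; the operator/operand classification is a naive heuristic for illustration, not the scheme used in the study.

```python
import math
import re

# SQL keywords and symbols treated as operators; identifiers and numeric
# literals as operands. This split is a simplifying assumption, not the
# classification scheme used in the study.
OPERATORS = {"select", "from", "where", "join", "on", "group", "by",
             "having", "order", "and", "or", "not", "in",
             "=", "<", ">", "<=", ">=", "<>", "(", ")", ",", "*", "."}

def halstead(query):
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|<=|>=|<>|[<>=(),.*]", query)
    ops = [t.lower() for t in tokens if t.lower() in OPERATORS]
    operands = [t for t in tokens if t.lower() not in OPERATORS]
    n1, n2 = len(set(ops)), len(set(operands))  # distinct operators / operands
    N1, N2 = len(ops), len(operands)            # total occurrences
    length = N1 + N2                            # Halstead program length
    volume = length * math.log2(n1 + n2)        # program volume
    difficulty = (n1 / 2) * (N2 / n2)           # Halstead difficulty
    effort = difficulty * volume                # Halstead effort
    return {"length": length, "difficulty": round(difficulty, 2),
            "effort": round(effort, 2)}

print(halstead("SELECT name FROM employee WHERE salary > 50000"))
```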
Abstract:
The increased data complexity and task interdependency associated with servitization represent significant barriers to its adoption. The outline of a business game is presented which demonstrates the increasing complexity of the management problem when moving through Base, Intermediate and Advanced levels of servitization. Linked data is proposed as an agile set of technologies, based on well-established standards, for data exchange both in the game and more generally in supply chains.
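For a sense of what such an exchange looks like in practice, here is a minimal sketch using the rdflib library; the vocabulary (the ex: namespace and its terms) is invented for illustration, since the abstract does not specify the game's data model.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical vocabulary for illustration only; the game's actual
# data model is not given in the abstract.
EX = Namespace("http://example.org/servitization/")

g = Graph()
g.bind("ex", EX)

# Describe an advanced-level service contract as linked data so that
# any supply-chain partner can consume it with standard RDF tooling.
g.add((EX.contract42, RDF.type, EX.ServiceContract))
g.add((EX.contract42, EX.servitizationLevel, Literal("Advanced")))
g.add((EX.contract42, EX.provider, EX.acmePumps))

print(g.serialize(format="turtle"))
```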
Abstract:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
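For reference, the single-component case has a closed-form maximum-likelihood solution (the mixture case replaces it with responsibility-weighted EM updates). A minimal numpy sketch, assuming the standard Tipping and Bishop form of the latent variable model:

```python
import numpy as np

def ppca_ml(X, q):
    """Maximum-likelihood PPCA via eigendecomposition of the sample covariance.

    Single-component sketch; the mixture model in the paper fits several
    such analysers jointly with EM.
    """
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)          # sample covariance
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]          # sort eigenpairs descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    sigma2 = eigval[q:].mean()                # ML noise variance: mean of discarded eigenvalues
    W = eigvec[:, :q] * np.sqrt(eigval[:q] - sigma2)  # ML loadings (up to rotation)
    return mu, W, sigma2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) \
    + 0.1 * rng.normal(size=(500, 10))        # data near a 2-D subspace
mu, W, sigma2 = ppca_ml(X, q=2)
print(W.shape, round(sigma2, 4))
```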
Abstract:
Motivation: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. Results: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
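A schematic sketch of such a filter is shown below: for each allele at a candidate site, it requires a minimum number of distinct read start positions (read complexity) and caps the fraction of clonal reads. The data representation and thresholds are illustrative assumptions, not the paper's exact statistical test.

```python
from collections import Counter

def passes_clonality_filter(read_starts_by_allele,
                            min_distinct_starts=5,
                            max_clonal_fraction=0.5):
    """Flag allele-specific sites whose signal is driven by PCR clones.

    Illustrative thresholds; the paper's actual filter is a statistical
    test for a significant excess of clonal reads.
    """
    for allele, starts in read_starts_by_allele.items():
        counts = Counter(starts)              # reads per distinct start position
        distinct = len(counts)                # read complexity at this site
        clonal = sum(c - 1 for c in counts.values())  # reads beyond one per start
        if distinct < min_distinct_starts:
            return False                      # low read complexity
        if clonal / len(starts) > max_clonal_fraction:
            return False                      # excess clonal reads on one allele
    return True

# Hypothetical site: allele A has few distinct starts and mostly clonal reads.
site = {"A": [100, 100, 100, 100, 102, 105], "G": [100, 101, 103, 104, 107]}
print(passes_clonality_filter(site))          # False: site is filtered out
```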
Abstract:
Maximum entropy modeling (Maxent) is a widely used algorithm for predicting species distributions across space and time. Properly assessing the uncertainty in such predictions is non-trivial and requires validation with independent datasets. Notably, model complexity (number of model parameters) remains a major concern in relation to overfitting and, hence, transferability of Maxent models. An emerging approach is to validate the cross-temporal transferability of model predictions using paleoecological data. In this study, we assess the effect of model complexity on the performance of Maxent projections across time using two European plant species (Alnus glutinosa (L.) Gaertn. and Corylus avellana L.) with an extensive late Quaternary fossil record in Spain as a study case. We fit 110 models with different levels of complexity under present-day conditions and tested model performance using AUC (area under the receiver operating characteristic curve) and AICc (corrected Akaike Information Criterion) through the standard procedure of randomly partitioning current occurrence data. We then compared these results to an independent validation by projecting the models to mid-Holocene (6000 years before present) climatic conditions in Spain to assess their ability to predict fossil pollen presence-absence and abundance. We find that calibrating Maxent models with default settings results in overly complex models. While model performance increased with model complexity when predicting current distributions, it was highest at intermediate complexity when predicting mid-Holocene distributions. Hence, models of intermediate complexity offered the best trade-off for predicting species distributions across time. Reliable temporal model transferability is especially relevant for forecasting species distributions under future climate change. Consequently, species-specific model tuning should be used to find the best modeling settings to control for complexity, notably with paleoecological data to independently validate model projections. For cross-temporal projections of species distributions for which paleoecological data are not available, models of intermediate complexity should be selected.
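The complexity control described rests on a standard quantity: AICc penalizes the parameter count k relative to the sample size n. A minimal sketch with invented numbers, chosen only to illustrate how an intermediate-complexity model can win on AICc despite a lower raw likelihood:

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion (AICc); lower is better.

    k: number of model parameters (e.g., Maxent features retained),
    n: number of occurrence records used for calibration.
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction

# Hypothetical candidate models of increasing complexity; log-likelihoods
# are illustrative, not values from the study.
candidates = [
    {"name": "simple",       "k": 5,  "logL": -210.0},
    {"name": "intermediate", "k": 15, "logL": -185.0},
    {"name": "complex",      "k": 40, "logL": -180.0},
]
n = 120  # calibration occurrence records (hypothetical)
for m in candidates:
    print(m["name"], round(aicc(m["logL"], m["k"], n), 1))
# The intermediate model scores lowest: its fit gain outweighs its
# parameter penalty, while the complex model's does not.
```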
Abstract:
Complexity in time series is an intriguing feature of living dynamical systems, with potential use for identification of system state. Although various methods have been proposed for measuring physiologic complexity, uncorrelated time series are often assigned high values of complexity, erroneously classifying them as complex physiological signals. Here, we propose and discuss a method for complex system analysis based on generalized statistical formalism and surrogate time series. Sample entropy (SampEn) was rewritten, inspired by Tsallis generalized entropy, as a function of the q parameter (qSampEn). qSDiff curves were calculated, which consist of the differences between the qSampEn of the original and surrogate series. We evaluated qSDiff for 125 real heart rate variability (HRV) series, divided into groups of 70 healthy, 44 congestive heart failure (CHF), and 11 atrial fibrillation (AF) subjects, and for simulated series of stochastic and chaotic processes. The evaluations showed that, for nonperiodic signals, qSDiff curves have a maximum point (qSDiff(max)) for q ≠ 1. Values of q where the maximum point occurs and where qSDiff is zero were also evaluated. Only qSDiff(max) values were capable of distinguishing the HRV groups (p-values 5.10 × 10^-3, 1.11 × 10^-7, and 5.50 × 10^-7 for healthy vs. CHF, healthy vs. AF, and CHF vs. AF, respectively), consistent with the concept of physiologic complexity, and suggest a potential use for chaotic system analysis. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4758815]
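One way to read the construction: take the usual sample entropy, SampEn = -ln(A/B), and replace the natural logarithm with the Tsallis q-logarithm, which recovers ln as q approaches 1. The sketch below implements that reading; the paper's exact qSampEn definition may generalize further terms, so treat this as an assumption-laden illustration.

```python
import numpy as np

def q_log(x, q):
    """Tsallis q-logarithm; reduces to the natural log as q -> 1."""
    return np.log(x) if np.isclose(q, 1.0) else (x**(1 - q) - 1) / (1 - q)

def sampen(x, m=2, r=0.2, q=1.0):
    """Sample entropy with the log replaced by the q-logarithm.

    Brute-force implementation for short series; r is the usual
    fraction of the series' standard deviation.
    """
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def count_matches(mm):
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        # Chebyshev distance between every pair of templates (i < j)
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        return np.sum(d[np.triu_indices(len(templates), k=1)] <= tol)

    A, B = count_matches(m + 1), count_matches(m)
    return -q_log(A / B, q)

rng = np.random.default_rng(1)
noise = rng.normal(size=300)
print(sampen(noise, q=1.0), sampen(noise, q=1.5))
```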
Abstract:
Intensity modulated radiation therapy (IMRT) is a technique that delivers a highly conformal dose distribution to a target volume while attempting to maximally spare the surrounding normal tissues. IMRT is a common treatment modality for head and neck (H&N) cancers, and the presence of many critical structures in this region requires accurate treatment delivery. The Radiological Physics Center (RPC) acts as both a remote and on-site quality assurance agency that credentials institutions participating in clinical trials. To date, about 30% of all IMRT participants have failed the RPC's remote audit using the IMRT H&N phantom. The purpose of this project is to evaluate possible causes of H&N IMRT delivery errors observed by the RPC, specifically IMRT treatment plan complexity and the use of improper dosimetry data from machines that were thought to be matched but in reality were not. Eight H&N IMRT plans with a range of complexity defined by total MU (1460-3466), number of segments (54-225), and modulation complexity scores (MCS) (0.181-0.609) were created in Pinnacle v.8m. These plans were delivered to the RPC's H&N phantom on a single Varian Clinac. One of the IMRT plans (1851 MU, 88 segments, and MCS = 0.469) was equivalent to the median H&N plan from 130 previous RPC H&N phantom irradiations. This average IMRT plan was also delivered on four matched Varian Clinac machines, and the dose distribution was calculated using a different 6 MV beam model. Radiochromic film and TLD within the phantom were used to analyze the dose profiles and absolute doses, respectively. The measured and calculated doses were compared to evaluate the dosimetric accuracy. All deliveries met the RPC acceptance criteria of ±7% absolute dose difference and 4 mm distance-to-agreement (DTA). Additionally, gamma index analysis was performed for all deliveries using ±7%/4 mm and ±5%/3 mm criteria. Increasing the treatment plan complexity by varying the MU, number of segments, or MCS resulted in no clear trend toward an increase in dosimetric error as determined by the absolute dose difference, DTA, or gamma index. Varying the delivery machine as well as the beam model (a Clinac 6EX 6 MV beam model vs. a Clinac 21EX 6 MV model) also did not show any clear trend toward increased dosimetric error using the same criteria.
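For context, the gamma index combines the dose-difference and distance-to-agreement criteria into a single pass/fail quantity per point (Low et al.). A simplified 1-D numpy sketch of a ±7%/4 mm analysis, assuming global dose normalization and no interpolation of the reference profile:

```python
import numpy as np

def gamma_1d(ref_pos, ref_dose, meas_pos, meas_dose,
             dose_tol=0.07, dta_mm=4.0):
    """1-D gamma index of a measured profile against a reference.

    Simplified sketch: brute-force minimum over reference points,
    dose criterion taken globally as a fraction of the reference max.
    """
    norm = dose_tol * ref_dose.max()          # global dose criterion (e.g., 7% of max)
    gammas = []
    for p, d in zip(meas_pos, meas_dose):
        dd = (ref_dose - d) / norm            # dose differences vs. criterion
        dr = (ref_pos - p) / dta_mm           # distances vs. DTA criterion
        gammas.append(np.min(np.sqrt(dd**2 + dr**2)))
    return np.array(gammas)

x = np.linspace(-50, 50, 201)                 # positions in mm (synthetic profile)
ref = np.exp(-(x / 30.0)**2)                  # reference dose profile
meas = 1.02 * np.exp(-((x - 1.0) / 30.0)**2)  # 2% hotter, shifted 1 mm
g = gamma_1d(x, ref, x, meas)
print(f"pass rate: {np.mean(g <= 1.0):.1%}")  # fraction of points with gamma <= 1
```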
Abstract:
Assessment of the temporal transferability of species distribution models for present-day application using paleobotanical data for Corylus avellana and Alnus glutinosa.
Abstract:
Background: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration, and miscommunication can have a negative impact on research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology, looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Abstract:
During the early Holocene, two main Paleoamerican cultures thrived in Brazil: the Tradição Nordeste in the semi-arid Sertão and the Tradição Itaparica in the high plains of the Planalto Central. Here we report on paleodietary signals of a Paleoamerican found in a third Brazilian ecological setting: a riverine shellmound, or sambaqui, located in the Atlantic forest. Most sambaquis are found along the coast, and the peoples associated with them subsisted on marine resources. We are reporting a different situation from the oldest recorded riverine sambaqui, called Capelinha. Capelinha is a relatively small sambaqui established along a river 60 km from the Atlantic Ocean coast. It contained the well-preserved remains of a Paleoamerican known as Luzio, dated to 9,945 ± 235 years ago, the oldest sambaqui dweller so far. Luzio's bones were remarkably well preserved and allowed for stable isotopic analysis of diet. Although artifacts found at this riverine site show connections with the Atlantic coast, we show that he represents a population that was dependent on inland resources as opposed to marine coastal resources. After comparing Luzio's paleodietary data with that of other extant and prehistoric groups, we discuss where his group could have come from, whether a terrestrial diet persisted in riverine sambaquis, and how Luzio fits within the discussion of the replacement of Paleoamerican by Amerindian morphology. This study adds to the evidence of a greater complexity in the prehistory of the colonization of, and adaptations to, the New World.