942 results for Data Driven Modeling
Abstract:
Background: During the late 1990s the chance of surviving breast cancer increased. Changes in survival functions reflect a mixture of effects: both the introduction of adjuvant treatments and early screening with mammography played a role in the decline in mortality. Evaluating the contribution of these interventions using mathematical models requires survival functions before and after their introduction. Furthermore, the required survival functions may differ by age group and are related to disease stage at diagnosis. Sometimes detailed information is not available, as was the case for the region of Catalonia (Spain); one may then derive the functions using information from other geographical areas. This work presents the methodology used to estimate age- and stage-specific Catalan breast cancer survival functions from scarce Catalan survival data by adapting the age- and stage-specific US functions. Methods: Cubic splines were used to smooth the data and obtain continuous hazard rate functions. We then fitted a Poisson model, with time as a covariate, to derive hazard ratios. The hazard ratios were applied to US survival functions detailed by age and stage to obtain Catalan estimates. Results: We first estimated the hazard ratios for Catalonia versus the USA before and after the introduction of screening. The hazard ratios were then multiplied by the age- and stage-specific breast cancer hazard rates from the USA to obtain the Catalan hazard rates. We also compared breast cancer survival in Catalonia and the USA in two time periods, before cancer control interventions (USA 1975–79, Catalonia 1980–89) and after (USA and Catalonia 1990–2001). Survival in Catalonia in the 1980–89 period was worse than in the USA during 1975–79, but the differences disappeared in 1990–2001. Conclusion: Our results suggest that access to better treatments and quality of care contributed to large improvements in survival in Catalonia. In addition, we obtained detailed breast cancer survival functions that will be used to model the effect of screening and adjuvant treatments in Catalonia.
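A minimal sketch of the hazard-ratio transfer step described in this abstract: a Poisson regression with a log person-time offset estimates a Catalonia-versus-US hazard ratio (with time since diagnosis as a covariate), which is then applied to US age- and stage-specific hazards. The data layout, column names, and all numbers are hypothetical placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical grouped survival data: events and person-years per follow-up year.
df = pd.DataFrame({
    "events":       [30, 25, 20, 40, 28, 22],
    "person_years": [1000, 900, 800, 1500, 1300, 1200],
    "time":         [1, 2, 3, 1, 2, 3],            # years since diagnosis
    "region":       ["CAT", "CAT", "CAT", "US", "US", "US"],
})
df["is_cat"] = (df["region"] == "CAT").astype(int)

# Poisson model for the hazard, with time as a covariate; the offset turns
# event counts into rates so exp(coefficient) is a hazard ratio.
model = smf.glm(
    "events ~ is_cat + time",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["person_years"]),
).fit()
hr_cat_vs_us = np.exp(model.params["is_cat"])

# Apply the hazard ratio to a (hypothetical) US age- and stage-specific hazard
# curve to obtain the Catalan estimate, as in the adaptation step above.
us_hazard = np.array([0.030, 0.025, 0.021])          # per-year hazards by follow-up year
catalan_hazard = hr_cat_vs_us * us_hazard
catalan_survival = np.exp(-np.cumsum(catalan_hazard))
print(round(hr_cat_vs_us, 3), np.round(catalan_survival, 3))
```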
Abstract:
Approximate models (proxies) can be employed to reduce the computational costs of estimating uncertainty. The price to pay is that the approximations introduced by the proxy model can lead to a biased estimation. To avoid this problem and ensure a reliable uncertainty quantification, we propose to combine functional data analysis and machine learning to build error models that allow us to obtain an accurate prediction of the exact response without solving the exact model for all realizations. We build the relationship between proxy and exact model on a learning set of geostatistical realizations for which both exact and approximate solvers are run. Functional principal components analysis (FPCA) is used to investigate the variability in the two sets of curves and reduce the dimensionality of the problem while maximizing the retained information. Once obtained, the error model can be used to predict the exact response of any realization on the basis of the sole proxy response. This methodology is purpose-oriented as the error model is constructed directly for the quantity of interest, rather than for the state of the system. Also, the dimensionality reduction performed by FPCA allows a diagnostic of the quality of the error model to assess the informativeness of the learning set and the fidelity of the proxy to the exact model. The possibility of obtaining a prediction of the exact response for any newly generated realization suggests that the methodology can be effectively used beyond the context of uncertainty quantification, in particular for Bayesian inference and optimization.
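A sketch of the proxy-to-exact error model described above, under the simplifying assumption that FPCA on regularly sampled curves can be approximated by ordinary PCA of the sampled values: on a learning set where both solvers were run, reduce both families of curves, learn a map from proxy scores to exact scores, and predict the exact response of a new realization from its proxy response alone. All curves are synthetic stand-ins; the geostatistical solvers themselves are outside the scope of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_learn, n_times = 100, 200                        # learning-set size, time discretization
t = np.linspace(0.0, 1.0, n_times)

# Synthetic stand-ins for proxy and exact responses (exact = distorted proxy + noise).
proxy = np.array([np.sin(2 * np.pi * (1 + 0.3 * rng.normal()) * t) for _ in range(n_learn)])
exact = 0.8 * proxy + 0.1 * np.cos(4 * np.pi * t) + 0.02 * rng.normal(size=proxy.shape)

# Dimensionality reduction of both sets of curves (discretized FPCA ~ PCA here).
pca_proxy = PCA(n_components=5).fit(proxy)
pca_exact = PCA(n_components=5).fit(exact)
scores_proxy = pca_proxy.transform(proxy)
scores_exact = pca_exact.transform(exact)

# Error model: map proxy scores to exact scores (any regressor could be used).
error_model = RandomForestRegressor(n_estimators=200, random_state=0)
error_model.fit(scores_proxy, scores_exact)

# Prediction for a newly generated realization: only the proxy solver is run.
new_proxy = np.sin(2 * np.pi * 1.1 * t)[None, :]
pred_scores = error_model.predict(pca_proxy.transform(new_proxy))
pred_exact_curve = pca_exact.inverse_transform(pred_scores)
print(pred_exact_curve.shape)
```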
Abstract:
OBJECTIVES: Blood pressures in persons of African descent exceed those of other racial/ethnic groups in the United States. Whether this trait is attributable to genetic factors in African-origin populations or to inadequately measured environmental exposures, such as racial discrimination, is not known. To study this question, we conducted a multisite comparative study of communities in the African diaspora, drawn from metropolitan Chicago, Kingston (Jamaica), rural Ghana, Cape Town (South Africa), and the Seychelles. METHODS: At each site, 500 participants between the ages of 25 and 49 years, with approximately equal numbers of men and women, were enrolled in a longitudinal study of energy expenditure and weight gain. In this study, we describe the patterns of blood pressure and hypertension observed at baseline across the sites. RESULTS: Mean SBP and DBP were very similar in the United States and South Africa in both men and women, although among women the prevalence of hypertension was higher in the United States (24 vs. 17%, respectively). After adjustment for multiple covariates, relative to participants in the United States, SBP was significantly higher among the South Africans by 9.7 mmHg (P < 0.05) and significantly lower for each of the other sites: Jamaica, -7.9 mmHg (P = 0.06); Ghana, -12.8 mmHg (P < 0.01); and Seychelles, -11.1 mmHg (P = 0.01). CONCLUSION: These data are consistent with prior findings of a blood pressure gradient in societies of the African diaspora and confirm that African-origin populations with lower social status in multiracial societies, such as the United States and South Africa, experience more hypertension than anticipated on the basis of anthropometric and measurable socioeconomic risk factors.
Abstract:
In this paper we provide a formal account of the underapplication of vowel reduction to schwa in Majorcan Catalan loanwords and learned words. By comparing these data with those concerning productive derivation and verbal inflection, which show analogous patterns, we also explore a previously unacknowledged correlation between processes that behave differently in loanword phonology than in the native phonology of the language, processes that show lexical exceptions, and processes that underapply for morphological reasons. In light of the analysis of the same data, and taking this correlation into account, we show that there may be a natural diachronic relation between two kinds of Optimality Theory constraints that are commonly used but, in principle, mutually exclusive: positional faithfulness and contextual markedness constraints. Overall, phonological productivity proves crucial in three respects: first, as a context of the grammar, given that «underapplication» is systematically found in what we call the productive phonology of the dialect (including loanwords, learned words, productive derivation and verbal inflection); second, as a trigger or blocker of processes, in that the productivity or lack of productivity of a specific process or constraint explains whether or not it is challenged in any of the situations described; and, third, as a guiding principle that can explain the transition from the historical to the synchronic phonology of a linguistic variety.
Abstract:
Maximum entropy modeling (Maxent) is a widely used algorithm for predicting species distributions across space and time. Properly assessing the uncertainty in such predictions is non-trivial and requires validation with independent datasets. Notably, model complexity (number of model parameters) remains a major concern in relation to overfitting and, hence, transferability of Maxent models. An emerging approach is to validate the cross-temporal transferability of model predictions using paleoecological data. In this study, we assess the effect of model complexity on the performance of Maxent projections across time using two European plant species (Alnus giutinosa (L.) Gaertn. and Corylus avellana L) with an extensive late Quaternary fossil record in Spain as a study case. We fit 110 models with different levels of complexity under present time and tested model performance using AUC (area under the receiver operating characteristic curve) and AlCc (corrected Akaike Information Criterion) through the standard procedure of randomly partitioning current occurrence data. We then compared these results to an independent validation by projecting the models to mid-Holocene (6000 years before present) climatic conditions in Spain to assess their ability to predict fossil pollen presence-absence and abundance. We find that calibrating Maxent models with default settings result in the generation of overly complex models. While model performance increased with model complexity when predicting current distributions, it was higher with intermediate complexity when predicting mid-Holocene distributions. Hence, models of intermediate complexity resulted in the best trade-off to predict species distributions across time. Reliable temporal model transferability is especially relevant for forecasting species distributions under future climate change. Consequently, species-specific model tuning should be used to find the best modeling settings to control for complexity, notably with paleoecological data to independently validate model projections. For cross-temporal projections of species distributions for which paleoecological data is not available, models of intermediate complexity should be selected.
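A hedged illustration of the complexity/transferability trade-off this study examines, using an L1-penalized logistic regression as a lightweight stand-in for Maxent (the study itself varied Maxent feature classes and regularization multipliers, which this sketch does not reproduce). Predictors, the response, and the complexity grid are simulated placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 6))                        # six hypothetical climate predictors
p = 1 / (1 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))   # true response uses only two of them
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:            # larger C = weaker penalty = more complex
    m = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X_tr, y_tr)
    k = np.count_nonzero(m.coef_) + 1              # retained parameters (+ intercept)
    ll = -log_loss(y_tr, m.predict_proba(X_tr)[:, 1], normalize=False)
    aicc = -2 * ll + 2 * k + (2 * k * (k + 1)) / max(len(y_tr) - k - 1, 1)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"C={C:6.2f}  params={k}  AICc={aicc:8.1f}  test AUC={auc:.3f}")

# In the study, the decisive test was cross-temporal: projecting each calibrated
# model onto mid-Holocene climate and scoring it against fossil-pollen records.
```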
Abstract:
This study examines the possibilities for financial modeling in a business unit of the forest industry. The objective is to design and build a financial model for the business unit that makes it possible to analyze and forecast its results. The study follows a constructive research approach. The theoretical framework examines how existing information can be shaped, focusing on the needs of information refinement, the requirements imposed by decision-making, and modeling. Second, the theory presents the requirements placed on information from the perspective of organizational control. The empirical data are collected through participant observation, drawing on informal discussions, information systems and accounting documents. The results show that forecasting operating profit with a model is difficult because of the large number of underlying variables. Consequently, the model must be built to examine operating profit at as detailed a level as possible. In testing, the model's accuracy proved to be better the more detailed the level at which forecasting was performed. Testing also showed that the model is useful for short-term business control, and it thus provides a basis for long-term forecasting as well.
Abstract:
Low-copy-number molecules are involved in many functions in cells. The intrinsic fluctuations of these numbers can enable stochastic switching between multiple steady states, inducing phenotypic variability. Herein we present a theoretical and computational study, based on master equations and Fokker-Planck and Langevin descriptions, of stochastic switching in a genetic autoactivation circuit. We show that in this circuit the intrinsic fluctuations arising from low copy numbers, which are inherently state-dependent, drive asymmetric switching. These theoretical results are consistent with experimental data reported for the bistable galactose signaling network in yeast. Our study reveals that intrinsic fluctuations, while not required to describe bistability, are fundamental to understanding stochastic switching and the relative dynamical stability of multiple states.
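A minimal sketch of state-dependent intrinsic noise in a self-activating gene: a Gillespie (stochastic simulation algorithm) realization of a birth-death process whose production rate follows a Hill-type positive feedback. The parameter values are illustrative, chosen only so the deterministic system is bistable; they are not the fitted parameters of the galactose network discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)

def propensities(x, a0=4.0, a=40.0, K=30.0, n=4, d=1.0):
    """Production (Hill-type autoactivation) and degradation propensities."""
    prod = a0 + a * x**n / (K**n + x**n)
    deg = d * x
    return np.array([prod, deg])

def gillespie(x0=5, t_end=200.0):
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        props = propensities(x)
        total = props.sum()
        t += rng.exponential(1.0 / total)           # time to next reaction
        x += 1 if rng.random() < props[0] / total else -1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

# A trajectory started near the low state occasionally jumps to the high state;
# because the noise amplitude depends on the state, the two switching directions
# are not symmetric, which is the point made in the abstract.
t, x = gillespie()
print(f"final copy number: {x[-1]}, trajectory mean: {x.mean():.1f}")
```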
Abstract:
There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity to improve our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the South American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports a split of the MXL ancestors 12.2 kya, with a subsequent split of the ancestors of CLM and PUR 11.7 kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earliest migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence of relatedness among the European founders of the three populations.
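A sketch of one way to estimate ancestry-specific allele frequencies from admixed genotypes, in the spirit of the methods summarized above: at a given site, the expected genotype is the individual's local-ancestry dosages times the per-ancestry allele frequencies, so the frequencies can be recovered by constrained least squares. This is an illustrative reconstruction under that assumption, not the authors' exact estimator, and every input below is simulated.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(3)
n_ind = 200
true_f = np.array([0.10, 0.45, 0.80])              # AFR, EUR, NAT frequencies (hypothetical)

# Local-ancestry dosages at this site: copies inherited from each ancestry (sum = 2).
dosages = rng.multinomial(2, [0.15, 0.35, 0.50], size=n_ind).astype(float)

# Genotypes: each ancestral copy carries the allele with its ancestry-specific frequency.
geno = np.array([sum(rng.binomial(int(d), f) for d, f in zip(row, true_f))
                 for row in dosages])

# E[geno_i] = dosages_i . f  =>  solve for f subject to 0 <= f <= 1.
fit = lsq_linear(dosages, geno, bounds=(0.0, 1.0))
print("estimated per-ancestry allele frequencies:", np.round(fit.x, 3))
```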
Abstract:
Species Distribution Models (SDMs) are now a widely used tool. Using different statistical approaches, these models reconstruct the realized niche of a species from presence data and a set of variables, often topoclimatic. Their range of applications is broad, from understanding the requirements of single species to designing nature reserves based on species hotspots or modeling the impact of climate change. Most of the time these models use variables at a resolution of 50 km x 50 km or 1 km x 1 km. In some cases, however, they are used at resolutions below the kilometer scale and are then called high-resolution models (100 m x 100 m or 25 m x 25 m). Quite recently a new kind of data has emerged that offers precision down to 1 m x 1 m and thus allows very-high-resolution modeling. These new variables are, however, costly and time-consuming to process, especially when they enter complex calculations such as model projections over large areas. Moreover, the importance of very-high-resolution data in SDMs has not yet been assessed and is not well understood. Some basic knowledge of what drives species presences and absences is still missing. In particular, it is not clear whether, in mountain areas such as the Alps, coarse topoclimatic gradients drive species distributions, whether fine-scale temperature or topography matters more, or whether their importance can be neglected relative to competition or stochasticity. In this thesis I investigated the importance of very-high-resolution data (2-5 m) in species distribution models, using very-high-resolution topographic, climatic and edaphic variables over a 2000 m elevation gradient in the Western Swiss Alps. I also investigated more local responses to these variables for a subset of species living in this area at two specific elevation belts. I showed that high-resolution data require very good datasets (both species records and predictor variables) to produce satisfactory results. In mountain areas, temperature is the most important factor driving species distributions, and it needs to be modeled at very fine resolution rather than interpolated over large surfaces. Despite the intuitive expectation that topography should matter greatly at high resolution, the results are mixed, and assessing variable importance over a large gradient dilutes it: topographic factors proved highly important at the subalpine level, but their importance decreases at lower elevations. Whereas edaphic and land-use factors dominate at the montane level, high-resolution topographic data matter more at the subalpine level. Finally, the largest improvement in the models comes from adding edaphic variables: soil variables such as pH match or surpass the usual topographic variables in importance. To conclude, high resolution is very important in modeling but requires very good datasets. Merely increasing the resolution of the usual topoclimatic predictors is not sufficient, and the use of edaphic predictors proved fundamental to producing significantly better models. This is of primary importance, especially if these models are used to reconstruct communities or as a basis for biodiversity assessments.
-- In recent years, the use of species distribution models (SDMs) has grown steadily. These models use various statistical tools to reconstruct the realized niche of a species from variables, notably climatic or topographic ones, and from presence data collected in the field. Their applications cover many domains, from studying the ecology of a species to reconstructing communities or assessing the impact of climate warming. Most of the time, these models use occurrences from global databases at a rather coarse resolution (1 km or even 50 km). Some databases nevertheless allow work at high resolution, below the kilometer scale, with resolutions of 100 m x 100 m or 25 m x 25 m. Recently, a new generation of very-high-resolution data has appeared that makes it possible to work at the meter scale. The variables that can be generated from these new data are, however, very costly and require considerable processing time; any complex statistical computation, such as projecting species distributions over large areas, demands powerful computers and a great deal of time. Moreover, the factors governing species distributions at fine scales are still poorly known, and the importance in the models of high-resolution variables such as microtopography or temperature is uncertain. Other factors, such as competition or natural stochasticity, could have just as strong an influence. My thesis work is set in this context. I sought to understand the importance of high resolution in species distribution models, whether for temperature, microtopography or edaphic variables, along a large elevation gradient in the Vaud Prealps. I also sought to understand the local influence of certain variables that may be overlooked because of confounding effects along the elevation gradient. During this thesis, I was able to show that high-resolution variables, whether related to temperature or to microtopography, do not by themselves yield a substantial improvement of the models. To obtain a marked improvement, it is necessary to work with larger datasets, both for the species and for the variables used. For example, the usual interpolated climate layers must be replaced by temperature layers modeled at high resolution from field data. Working along a 2000 m temperature gradient naturally makes temperature very important in the models. The importance of microtopography is negligible compared with topography at a 25 m resolution. At a more local scale, however, high resolution becomes extremely important in the subalpine belt, whereas at the montane belt variables related to soil and land use are very important. Finally, the species distribution models were improved most by the addition of edaphic variables, chiefly pH, whose importance equals or surpasses that of the topographic variables when added to the usual species distribution models.
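A minimal sketch of the kind of comparison run in this thesis: fit a presence-absence model with the usual topoclimatic predictors, then refit it with an edaphic predictor (pH) added, and compare discrimination on held-out data. The data, predictor names and the chosen algorithm are placeholders; the thesis used real vegetation plots from the Western Swiss Alps and a range of SDM techniques.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1500
temperature = rng.normal(8, 4, n)                  # degrees C along an elevation gradient
slope = rng.uniform(0, 45, n)                      # topographic predictor
ph = rng.normal(6.0, 1.0, n)                       # edaphic predictor

# Simulated species responding to temperature and, strongly, to soil pH.
logit = -1.0 + 0.4 * temperature - 0.02 * slope - 1.5 * (ph - 6.0) ** 2
presence = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_topo = np.column_stack([temperature, slope])
X_full = np.column_stack([temperature, slope, ph])

for name, X in [("topoclimatic only", X_topo), ("plus soil pH", X_full)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, presence, test_size=0.3, random_state=4)
    model = RandomForestClassifier(n_estimators=300, random_state=4).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:18s} test AUC = {auc:.3f}")
```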
Abstract:
We have investigated the behavior of bistable cells made up of four quantum dots and occupied by two electrons, in the presence of realistic confinement potentials produced by depletion gates on top of a GaAs/AlGaAs heterostructure. Such a cell represents the basic building block for logic architectures based on the concept of quantum cellular automata (QCA) and of ground state computation, which have been proposed as an alternative to traditional transistor-based logic circuits. We have focused on the robustness of the operation of such cells with respect to asymmetries derived from fabrication tolerances. We have developed a two-dimensional model for the calculation of the electron density in a driven cell in response to the polarization state of a driver cell. Our method is based on the one-shot configuration-interaction technique, adapted from molecular chemistry. From the results of our simulations, we conclude that an implementation of QCA logic based on simple "hole arrays" is not feasible, because of the extreme sensitivity to fabrication tolerances. As an alternative, we propose cells defined by multiple gates, where geometrical asymmetries can be compensated for by adjusting the bias voltages. Even though not immediately applicable to the implementation of logic gates and not suitable for large scale integration, the proposed cell layout should allow an experimental demonstration of a chain of QCA cells.
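A toy reduced model of the QCA cell discussed above: four dots at the corners of a square occupied by two electrons of opposite spin, described by a minimal Hubbard-like Hamiltonian (on-site energies, nearest-neighbour tunnelling, inter-dot Coulomb repulsion), with the driver cell and a fabrication asymmetry entering as on-site energy shifts. This is an illustration of the cell-polarization response, not the paper's two-dimensional configuration-interaction calculation; all parameters are in arbitrary units.

```python
import numpy as np

# Dot positions (corners of a unit square), numbered 0..3 counter-clockwise.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

def cell_polarization(driver_P, t=0.05, V0=1.0, asym=0.0):
    """Ground-state polarization of a driven cell.

    driver_P : polarization of a neighbouring driver cell, modelled here as a
               shift of the on-site energies (electrostatic influence).
    asym     : fabrication-induced offset added to one dot's on-site energy.
    """
    # Single-particle Hamiltonian: on-site energies + tunnelling along the edges.
    eps = np.zeros(4)
    eps[0] += asym
    eps += -0.2 * driver_P * np.array([+1, -1, +1, -1])
    h1 = np.diag(eps)
    for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
        h1[a, b] = h1[b, a] = -t

    # Two opposite-spin electrons: H = h1 (x) I + I (x) h1 + Coulomb(i, j).
    I4 = np.eye(4)
    H = np.kron(h1, I4) + np.kron(I4, h1)
    for i in range(4):
        for j in range(4):
            d = np.linalg.norm(pos[i] - pos[j])
            H[4 * i + j, 4 * i + j] += V0 / d if d > 0 else 10.0 * V0  # on-site penalty

    # Ground state and its dot occupations (spin-up + spin-down).
    w, v = np.linalg.eigh(H)
    rho = (np.abs(v[:, 0]) ** 2).reshape(4, 4)
    occ = rho.sum(axis=1) + rho.sum(axis=0)
    # QCA polarization: diagonal pair (0, 2) versus (1, 3).
    return (occ[0] + occ[2] - occ[1] - occ[3]) / occ.sum()

# A small on-site asymmetry visibly distorts the cell's response to the driver.
for P_drv in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(f"driver P={P_drv:+.1f} -> cell P={cell_polarization(P_drv, asym=0.02):+.3f}")
```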
Abstract:
Many European states apply score systems to evaluate the disability severity of non-fatal motor victims under the law of third-party liability. The score is a non-negative integer with an upper bound of 100 that increases with severity. It may be automatically converted into financial terms and thus also reflects the compensation cost for disability. In this paper, discrete regression models are applied to analyze the factors that influence the disability severity score of victims. Standard and zero-altered regression models are compared from two perspectives: the interpretation of the data-generating process and the level of statistical fit. The results have implications for traffic safety policy decisions aimed at reducing accident severity. An application using data from Spain is provided.
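A sketch of the model comparison described above: a standard Poisson regression versus a zero-altered (hurdle) specification in which a logit part decides whether the severity score is zero and a zero-truncated Poisson part models the positive scores. The data, covariate and coefficients are simulated placeholders, not the Spanish motor-insurance data used in the paper, and the score's upper bound of 100 is ignored for simplicity.

```python
import numpy as np
import statsmodels.api as sm
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
n = 2000
age = rng.uniform(18, 80, n)
X = sm.add_constant((age - 40) / 10)

# Simulated hurdle-type data: many structural zeros plus a positive count part.
p_zero = 1 / (1 + np.exp(-(0.5 - 0.3 * X[:, 1])))
mu = np.exp(1.2 + 0.2 * X[:, 1])
y = np.where(rng.random(n) < p_zero, 0, rng.poisson(mu) + 1)

# Reference: standard Poisson regression.
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Zero-altered model, part 1: crossing the hurdle (score > 0) with a logit.
logit_fit = sm.Logit((y > 0).astype(int), X).fit(disp=0)

# Zero-altered model, part 2: zero-truncated Poisson likelihood for positive scores.
Xp, yp = X[y > 0], y[y > 0]
def negloglik(beta):
    m = np.exp(Xp @ beta)
    return -np.sum(yp * np.log(m) - m - gammaln(yp + 1) - np.log1p(-np.exp(-m)))
trunc_fit = minimize(negloglik, x0=np.zeros(X.shape[1]), method="BFGS")

print("Poisson AIC:", round(poisson_fit.aic, 1))
print("hurdle logit params:", np.round(logit_fit.params, 3))
print("truncated-Poisson params:", np.round(trunc_fit.x, 3))
```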
Abstract:
The objective of this thesis is to provide a business model framework that connects customer value to firm resources and explains the change logic of the business model. With strategic supply management, and especially dynamic value network management, as its scope, the dissertation is based on basic economic theories, transaction cost economics and the resource-based view. The main research question is how changing customer values should be taken into account when planning business in a networked environment. The main question is divided into sub-questions that form the research problems for the separate case studies presented in the five Publications. The research adopts the case study strategy, and the constructive research approach within it. The material consists of data from several Delphi panels and expert workshops, software pilot documents, company financial statements and information on investor relations on the companies’ web sites. The cases used in this study are a mobile multi-player game value network, smart phone and “Skype mobile” services, the business models of AOL, eBay, Google, Amazon and a telecom operator, a virtual city portal business system, and a multi-play offering. The main contribution of this dissertation is bridging the gap between firm resources and customer value. This has been done by theorizing the business model concept and connecting it to both the resource-based view and customer value. The thesis thereby contributes to the resource-based view, which deals with customer value and the firm resources needed to deliver that value but leaves a gap in explaining how changes in customer value should be connected to changes in key resources. The dissertation also provides tools and processes for analyzing the customer value preferences of ICT services, constructing and analyzing business models, business concept innovation, and conducting resource analysis.
Abstract:
Liver is unique in its capacity to regenerate in response to injury or tissue loss. Hepatocytes and other liver cells are able to proliferate and repopulate the liver. However, when this response is impaired, the contribution of hepatic progenitors becomes very relevant. Here, we present an update of recent studies on growth factors and cytokine-driven intracellular pathways that govern liver stem/progenitor cell expansion and differentiation, and the relevance of these signals in liver development, regeneration and carcinogenesis. Tyrosine kinase receptor signaling, in particular c-Met, epidermal growth factor receptors or fibroblast growth factor receptors, contributes to proliferation, survival and differentiation of liver stem/progenitor cells. Several lines of evidence suggest a dual role for the transforming growth factor (TGF)-β signaling pathway in liver stemness and differentiation. On the one hand, TGF-β mediates progression of differentiation from a progenitor stage, but on the other hand, it contributes to the expansion of liver stem cells. Hedgehog family ligands are necessary to promote hepatoblast proliferation but need to be shut off to permit subsequent hepatoblast differentiation. Along the same lines, the Wnt family and β-catenin/T-cell factor pathway is clearly involved in the maintenance of the liver stemness phenotype, and its repression is necessary for liver differentiation during development. Collectively, the data indicate that liver stem/progenitor cells follow their own rules and regulations. The same signals that are essential for their activation, expansion and differentiation are good candidates to contribute, under adequate conditions, to the paradigm of transformation from a pro-regenerative to a pro-tumorigenic role. From a clinical perspective, this is a fundamental issue for liver stem/progenitor cell-based therapies.
Abstract:
BACKGROUND: Most available pharmacotherapies for alcohol-dependent patients target abstinence; however, reduced alcohol consumption may be a more realistic goal. Using randomized clinical trial (RCT) data, a previous microsimulation model evaluated the clinical relevance of reduced consumption in terms of avoided alcohol-attributable events. Using real-life observational data, the current analysis aimed to adapt the model and confirm previous findings about the clinical relevance of reduced alcohol consumption. METHODS: Based on the prospective observational CONTROL study, evaluating daily alcohol consumption among alcohol-dependent patients, the model predicted the probability of drinking any alcohol during a given day. Predicted daily alcohol consumption was simulated in a hypothetical sample of 200,000 patients observed over a year. Individual total alcohol consumption (TAC) and number of heavy drinking days (HDD) were derived. Using published risk equations, probabilities of alcohol-attributable adverse health events (e.g., hospitalizations or death) corresponding to the simulated consumptions were computed and aggregated for categories of patients defined by HDDs and TAC (expressed per 100,000 patient-years). Sensitivity analyses tested model robustness. RESULTS: Shifting from >220 HDDs per year to 120-140 HDDs, and shifting from 36,000-39,000 g TAC per year (120-130 g/day) to 15,000-18,000 g TAC per year (50-60 g/day), substantially reduced the incidence of events (14,588 and 6148 events avoided per 100,000 patient-years, respectively). Results were robust to sensitivity analyses. CONCLUSIONS: This study corroborates the previous microsimulation modeling approach and, using real-life data, confirms RCT-based findings that reduced alcohol consumption is a relevant objective for consideration in alcohol dependence management to improve public health.
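A sketch of the microsimulation logic summarized above: simulate daily drinking over a year for a hypothetical cohort, aggregate each patient's total alcohol consumption (TAC) and number of heavy drinking days (HDD), and convert consumption into event rates through a risk equation. The drinking probabilities, gamma intake distribution, heavy-drinking threshold and risk coefficients below are invented for illustration only; the study used probabilities predicted from the CONTROL cohort and published risk equations.

```python
import numpy as np

rng = np.random.default_rng(6)
n_patients, n_days = 10_000, 365
HEAVY_THRESHOLD_G = 60.0                           # grams/day defining a heavy drinking day

def simulate_cohort(p_drink_day, mean_grams_when_drinking):
    """Return per-patient TAC (g/year) and HDD counts for one consumption scenario."""
    drinks = rng.random((n_patients, n_days)) < p_drink_day
    grams = drinks * rng.gamma(shape=4.0, scale=mean_grams_when_drinking / 4.0,
                               size=(n_patients, n_days))
    tac = grams.sum(axis=1)
    hdd = (grams >= HEAVY_THRESHOLD_G).sum(axis=1)
    return tac, hdd

def annual_event_probability(tac):
    """Toy risk equation: log-odds of an alcohol-attributable event rise with TAC."""
    logit = -4.0 + 0.00006 * tac
    return 1 / (1 + np.exp(-logit))

for label, p, g in [("heavier consumption scenario", 0.85, 110.0),
                    ("reduced consumption scenario", 0.55, 75.0)]:
    tac, hdd = simulate_cohort(p, g)
    events_per_100k = 1e5 * annual_event_probability(tac).mean()
    print(f"{label}: mean HDD={hdd.mean():5.0f}, mean TAC={tac.mean():7.0f} g, "
          f"events per 100,000 patient-years={events_per_100k:7.0f}")
```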
Abstract:
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.