67 results for Data Modeling
Abstract:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as the complex topographies of mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning, using digital elevation models in semi-supervised kernel methods. The range of tools and methodological issues discussed in the study includes feature selection and semi-supervised Support Vector algorithms. A real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
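A minimal sketch of the kind of semi-supervised workflow described above, assuming hypothetical DEM-derived features (elevation, slope, curvature) and using scikit-learn's self-training wrapper around an SVM; the paper's actual algorithms may differ.

```python
# Hypothetical illustration of semi-supervised SVM on DEM-derived features.
# Feature names and data are invented; the paper's exact algorithm may differ.
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Simulated terrain features: elevation, slope, curvature (stand-ins for DEM inputs).
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic target class

# Only a small fraction of samples are labeled; the rest are marked with -1.
y_semi = y.copy()
unlabeled = rng.random(500) > 0.1
y_semi[unlabeled] = -1

base = SVC(kernel="rbf", probability=True, gamma="scale")
model = SelfTrainingClassifier(base, threshold=0.9)
model.fit(X, y_semi)

print("accuracy on all points:", model.score(X, y))
```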
Abstract:
Diffusion MRI has evolved into an important clinical diagnostic and research tool. Though clinical routine mainly uses diffusion-weighted and tensor imaging approaches, Q-ball imaging and diffusion spectrum imaging techniques have become more widely available. They are frequently used in research-oriented investigations, in particular those aiming at measuring brain network connectivity. In this work, we assess the dependency of connectivity measurements on various diffusion encoding schemes in combination with appropriate data modeling. We process and compare the structural connection matrices computed from several diffusion encoding schemes, including diffusion tensor imaging, q-ball imaging, and high angular resolution schemes such as diffusion spectrum imaging, with a publicly available processing pipeline for data reconstruction, tracking and visualization of diffusion MR imaging. The results indicate that the high angular resolution schemes maximize the number of obtained connections when identical processing strategies are applied to the different diffusion schemes. Compared to conventional diffusion tensor imaging, the added connectivity is mainly found for pathways in the 50-100 mm range, corresponding to neighboring association fibers and long-range associative, striatal and commissural fiber pathways. The analysis of the major associative fiber tracts of the brain reveals striking differences between the applied diffusion schemes. More complex data modeling techniques (beyond the tensor model) are recommended 1) if the tracts of interest run through large fiber crossings such as the centrum semiovale, or 2) if non-dominant fiber populations, e.g. the neighboring association fibers, are the subject of investigation. An important finding of the study is that, because ground-truth sensitivity and specificity are not known, results arising from different strategies in data reconstruction and/or tracking are difficult to compare.
Abstract:
Odds ratios for head and neck cancer increase with greater cigarette and alcohol use and lower body mass index (BMI; weight (kg)/height² (m²)). Using data from the International Head and Neck Cancer Epidemiology Consortium, the authors conducted a formal analysis of BMI as a modifier of smoking- and alcohol-related effects. Analysis of never and current smokers included 6,333 cases, while analysis of never drinkers and consumers of ≤10 drinks/day included 8,452 cases. There were 8,000 or more controls, depending on the analysis. Odds ratios for all sites increased with lower BMI, greater smoking, and greater drinking. In polytomous regression, odds ratios for BMI (P = 0.65), smoking (P = 0.52), and drinking (P = 0.73) were homogeneous for oral cavity and pharyngeal cancers. Odds ratios for BMI and drinking were greater for oral cavity/pharyngeal cancer (P < 0.01), while smoking odds ratios were greater for laryngeal cancer (P < 0.01). Lower BMI enhanced smoking- and drinking-related odds ratios for oral cavity/pharyngeal cancer (P < 0.01), while BMI did not modify smoking and drinking odds ratios for laryngeal cancer. The increased odds ratios for all sites with low BMI may suggest related carcinogenic mechanisms; however, the fact that BMI modifies smoking and drinking odds ratios for cancer of the oral cavity/pharynx but not of the larynx suggests additional factors specific to oral cavity/pharynx cancer.
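A schematic illustration of a polytomous (multinomial) regression of cancer site on BMI, smoking and drinking, using scikit-learn on simulated data; the variable coding and the consortium's actual model specification are assumptions here.

```python
# Hypothetical sketch of polytomous (multinomial) logistic regression:
# outcome has three categories (control, oral cavity/pharynx, larynx),
# predictors are BMI, smoking and drinking. Data are simulated.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "bmi": rng.normal(26, 4, n),
    "smoking": rng.integers(0, 3, n),        # e.g. never / former / current
    "drinks_per_day": rng.integers(0, 11, n),
})
# Simulated outcome: 0 = control, 1 = oral cavity/pharynx, 2 = larynx.
logits = np.column_stack([
    np.zeros(n),
    0.4 * df["smoking"] + 0.2 * df["drinks_per_day"] - 0.1 * (df["bmi"] - 26),
    0.6 * df["smoking"] + 0.05 * df["drinks_per_day"],
])
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=row) for row in p])

# With more than two classes and the default lbfgs solver, scikit-learn fits
# a multinomial (polytomous) model.
model = LogisticRegression(max_iter=1000)
model.fit(df[["bmi", "smoking", "drinks_per_day"]], y)
# Exponentiated coefficients approximate odds ratios per unit change.
print(np.exp(model.coef_))
```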
Abstract:
Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can lead to incorrect inference about population parameters. When the missing-data process relates to the trait of interest, valid inference requires explicit modeling of the missingness process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missingness process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would have displayed large black spots; we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.
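A minimal simulation sketch of the shared-parameter structure described above. The paper itself fits an animal model plus a logistic missingness model with INLA in a Bayesian framework; here only the data-generating side is simulated, with invented parameters, to show why ignoring the missingness biases the variance.

```python
# Simulated data-generating side of a shared parameter model (illustration only;
# the paper fits an animal model + logistic missingness model with INLA).
import numpy as np

rng = np.random.default_rng(2)
n = 1000
sigma_a, sigma_e = 1.0, 1.0          # hypothetical additive genetic / residual SDs

a = rng.normal(0, sigma_a, n)        # additive genetic effects (breeding values)
phenotype = 10 + a + rng.normal(0, sigma_e, n)

# Missingness (e.g. death before trait expression) depends on the SAME genetic
# effect, so the data are missing not at random with respect to the phenotype.
gamma = -1.5                          # hypothetical selection strength
p_missing = 1 / (1 + np.exp(-(-1.0 + gamma * a)))
missing = rng.random(n) < p_missing

observed = phenotype[~missing]
# The naive variance of observed phenotypes underestimates the full phenotypic
# variance because selection preferentially removes one tail of the genetic
# distribution before the trait is measured.
print(phenotype.var(), observed.var())
```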
Abstract:
Despite their limited proliferation capacity, regulatory T cells (Tregs) constitute a population maintained over the entire lifetime of a human organism. The means by which Tregs sustain a stable pool in vivo are controversial. Using a mathematical model, we address this issue by evaluating several biological scenarios of the origins and the proliferation capacity of two subsets of Tregs: precursor CD4+CD25+CD45RO- and mature CD4+CD25+CD45RO+ cells. The lifelong dynamics of Tregs are described by a set of ordinary differential equations, driven by a stochastic process representing the major immune reactions involving these cells. The model dynamics are validated using data from human donors of different ages. Analysis of the data led to the identification of two properties of the dynamics: (1) the equilibrium in the CD4+CD25+FoxP3+ Tregs population is maintained over both precursor and mature Tregs pools together, and (2) the ratio between precursor and mature Tregs is inverted in the early years of adulthood. Then, using the model, we identified three biologically relevant scenarios that have the above properties: (1) the unique source of mature Tregs is the antigen-driven differentiation of precursors that acquire the mature profile in the periphery, and the proliferation of Tregs is essential for the development and the maintenance of the pool; there exist other sources of mature Tregs, such as (2) a homeostatic density-dependent regulation or (3) thymus- or effector-derived Tregs, and in both cases, antigen-induced proliferation is not necessary for the development of a stable pool of Tregs. This is the first time that a mathematical model built to describe the in vivo dynamics of regulatory T cells is validated using human data. The application of this model provides an invaluable tool for estimating the number of regulatory T cells as a function of time in the blood of patients who received a solid organ transplant or suffer from an autoimmune disease.
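A toy sketch of the kind of two-compartment ODE system described above (precursor and mature Tregs), with invented rate constants and without the stochastic antigen-driven forcing used in the paper.

```python
# Toy two-compartment ODE for precursor (P) and mature (M) regulatory T cells.
# Rate constants are invented; the paper's model also includes a stochastic
# process for antigen-driven immune reactions, omitted here.
import numpy as np
from scipy.integrate import solve_ivp

def treg_dynamics(t, y, s=1.0, k_diff=0.05, r=0.02, d_p=0.01, d_m=0.03, K=500.0):
    P, M = y
    dP = s - k_diff * P - d_p * P                      # thymic input, differentiation, death
    dM = k_diff * P + r * M * (1 - M / K) - d_m * M    # differentiation, homeostatic proliferation, death
    return [dP, dM]

sol = solve_ivp(treg_dynamics, (0, 80 * 365), [50.0, 10.0], dense_output=True)
years = np.linspace(0, 80, 200)
P, M = sol.sol(years * 365)
# Track how the precursor/mature ratio changes with age (cf. property 2 above).
i20, i60 = np.searchsorted(years, 20), np.searchsorted(years, 60)
print("precursor/mature ratio at age 20 and 60:", P[i20] / M[i20], P[i60] / M[i60])
```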
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important at first to define the influence of each factor. In particular, it was important to define the influence of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data but not proved to be the sole determinant for the spatial modeling. The statistical analysis of the data, at both the univariate and multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving-window methods. The use of the Quantité Morisita Index (QMI) as a procedure to evaluate data clustering as a function of the radon level was proposed. The existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, data partitioning was optimized in order to cope with the stationarity conditions of geostatistical models. Common methods of spatial modeling such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset. A bottom-to-top method-complexity approach was adopted and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests of data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multi-Gaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for hardening the data classification. Within the classification methods, probabilistic neural networks (PNN) proved better adapted for modeling high-threshold categorization and for automation. Support vector machines (SVM), on the contrary, performed well under balanced category conditions.
In general, it was concluded that no particular prediction or estimation method is better under all conditions of scale and neighborhood definitions. Simulations should be the basis, while other methods can provide complementary information to support efficient decision making on indoor radon.
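As one of the exploratory tools mentioned above (K Nearest Neighbors), the following is a minimal sketch of KNN spatial prediction with leave-one-out cross-validation to choose the number of neighbours; the coordinates and radon values are simulated, and this is not the thesis' actual implementation.

```python
# Minimal KNN spatial interpolation with leave-one-out cross-validation
# to pick k (an exploratory step mentioned in the abstract). Data are simulated.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(3)
xy = rng.uniform(0, 100, size=(300, 2))                              # fake measurement coordinates (km)
radon = np.exp(1.0 + 0.02 * xy[:, 0] + rng.normal(0, 0.5, 300))      # fake, lognormal-like values

best_k, best_score = None, -np.inf
for k in (1, 2, 4, 8, 16):
    model = KNeighborsRegressor(n_neighbors=k, weights="distance")
    score = cross_val_score(model, xy, radon, cv=LeaveOneOut(),
                            scoring="neg_mean_squared_error").mean()
    if score > best_score:
        best_k, best_score = k, score

print("chosen k:", best_k, "LOO MSE:", -best_score)
```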
Abstract:
The paper presents some contemporary approaches to spatial environmental data analysis. The main topics concentrate on decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, and the development and application of conditional stochastic simulation models. The innovative part of the paper presents an integrated/hybrid model, machine learning (ML) residuals sequential simulations (MLRSS). The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends and sequential simulations of the residuals. ML algorithms deliver nonlinear solutions for spatially non-stationary problems, which are difficult for the geostatistical approach. Geostatistical tools (variography) are used to characterize the performance of ML algorithms, by analyzing the quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide an efficient assessment of uncertainty and spatial variability. A case study from the Chernobyl fallout illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data-driven and geostatistical model-based approaches, can be efficiently used in the decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.
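A conceptual sketch of the hybrid scheme (ML trend plus simulation of residuals), using an MLP for the long-range trend and, as a deliberately oversimplified stand-in for the sequential simulations of the paper, independent Gaussian draws with the residual variance; coordinates and values are simulated.

```python
# Sketch of a hybrid "ML trend + residual simulation" workflow.
# The trend is an MLP on coordinates; the residual step here is a crude
# stand-in (independent Gaussian draws) for the sequential simulations
# used in the paper, which honour the residual variogram.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
xy = rng.uniform(0, 50, size=(400, 2))                                    # fake sampling locations
z = np.sin(xy[:, 0] / 10) + 0.1 * xy[:, 1] + rng.normal(0, 0.2, 400)      # fake contamination values

trend = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000, random_state=0)
trend.fit(xy, z)
residuals = z - trend.predict(xy)

# Grid of prediction locations.
gx, gy = np.meshgrid(np.linspace(0, 50, 25), np.linspace(0, 50, 25))
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Many realizations = trend + simulated residuals -> empirical uncertainty.
n_real = 100
sims = trend.predict(grid)[None, :] + rng.normal(0, residuals.std(), (n_real, grid.shape[0]))
prob_above = (sims > 1.0).mean(axis=0)    # probability map for a hypothetical threshold
print(prob_above.reshape(gx.shape)[:2, :2])
```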
Abstract:
BACKGROUND: Globally, Africans and African Americans experience a disproportionate burden of type 2 diabetes, compared to other race and ethnic groups. The aim of the study was to examine the association of plasma glucose with indices of glucose metabolism in young adults of African origin from 5 different countries. METHODS: We identified participants from the Modeling the Epidemiologic Transition Study, an international study of weight change and cardiovascular disease (CVD) risk in five populations of African origin: USA (US), Jamaica, Ghana, South Africa, and Seychelles. For the current study, we included 667 participants (34.8 ± 6.3 years), with measures of plasma glucose, insulin, leptin, and adiponectin, as well as moderate and vigorous physical activity (MVPA, minutes/day [min/day]), daily sedentary time (min/day), anthropometrics, and body composition. RESULTS: Body mass index (BMI) ranged from 22.1 to 29.6 kg/m² in the 282 men and from 25.8 to 34.8 kg/m² in the 385 women. MVPA ranged from 26.2 to 47.1 min/day in men and from 14.3 to 27.3 min/day in women, and correlated with adiposity (BMI, waist size, and % body fat) only among US males after controlling for age. Plasma glucose ranged from 4.6 ± 0.8 mmol/L in the South African men to 5.8 mmol/L in the US men, while the overall prevalence of diabetes was very low, except in the US men and women (6.7% and 12%, respectively). Using multivariate linear regression, glucose was associated with BMI, age, sex, smoking, hypertension, and daily sedentary time, but not daily MVPA. CONCLUSION: Obesity, metabolic risk, and other potential determinants vary significantly between populations at differing stages of the epidemiologic transition, requiring tailored public health policies to address local population characteristics.
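A schematic of the multivariate linear regression described above, using the statsmodels formula interface on hypothetical data; the variable names and coding are illustrative, not the study's actual specification.

```python
# Illustrative multivariate linear regression of glucose on the covariates
# named in the abstract; the data frame and variable coding are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 667
df = pd.DataFrame({
    "glucose": rng.normal(5.2, 0.8, n),
    "bmi": rng.normal(28, 5, n),
    "age": rng.normal(35, 6, n),
    "sex": rng.choice(["M", "F"], n),
    "smoking": rng.integers(0, 2, n),
    "hypertension": rng.integers(0, 2, n),
    "sedentary_min": rng.normal(600, 120, n),
    "mvpa_min": rng.normal(30, 15, n),
})
model = smf.ols(
    "glucose ~ bmi + age + C(sex) + smoking + hypertension + sedentary_min + mvpa_min",
    data=df,
).fit()
print(model.summary().tables[1])
```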
Ab initio modeling and molecular dynamics simulation of the alpha 1b-adrenergic receptor activation.
Abstract:
This work describes the ab initio procedure employed to build an activation model for the alpha 1b-adrenergic receptor (alpha 1b-AR). The first version of the model was progressively refined and extended by means of a multi-step iterative procedure in which the model was experimentally validated at each upgrading step. A combined simulated (molecular dynamics) and experimental mutagenesis approach was used to determine the structural and dynamic features characterizing the inactive and active states of alpha 1b-AR. The latest version of the model has been successfully challenged with respect to its ability to interpret and predict the functional properties of a large number of mutants. The iterative approach employed to describe alpha 1b-AR activation in terms of molecular structure and dynamics allows further refinement of the model to predict and interpret an ever-increasing amount of experimental data.
Abstract:
Among the largest resources for biological sequence data are the expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting these regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error-correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
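The paper's method is an HMM; the toy sketch below only illustrates the codon-usage idea behind the error correction, scoring the three reading frames of a sequence with hypothetical codon log-frequencies (a deliberately simplified stand-in, not the HMM itself).

```python
# Toy illustration of codon-usage-based scoring of reading frames.
# A real implementation (as in the paper) embeds such scores in an HMM with
# insertion/deletion states; the codon frequency table here is invented.
import math

# Hypothetical codon usage frequencies (only a few codons shown; others get a floor value).
codon_freq = {"ATG": 0.022, "GAA": 0.030, "AAA": 0.025, "CTG": 0.040, "TAA": 0.001}
FLOOR = 0.005

def frame_score(seq, frame):
    """Sum of log codon-usage frequencies for one reading frame."""
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        score += math.log(codon_freq.get(seq[i:i + 3], FLOOR))
    return score

seq = "ATGGAAAAACTGGAAAAACTGTAA"
scores = {f: frame_score(seq, f) for f in range(3)}
print(scores)   # a frameshift caused by a sequencing indel would change which frame scores best
```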
Abstract:
In recent years, multi-atlas fusion methods have gained significant attention in medical image segmentation. In this paper, we propose a general Markov Random Field (MRF) based framework that can perform edge-preserving smoothing of the labels at the time of fusing the labels itself. More specifically, we formulate the label fusion problem with MRF-based neighborhood priors, as an energy minimization problem containing a unary data term and a pairwise smoothness term. We present how the existing fusion methods like majority voting, global weighted voting and local weighted voting methods can be reframed to profit from the proposed framework, for generating more accurate segmentations as well as more contiguous segmentations by getting rid of holes and islands. The proposed framework is evaluated for segmenting lymph nodes in 3D head and neck CT images. A comparison of various fusion algorithms is also presented.
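A small sketch of the energy formulation described above on a toy 2D label grid: a unary term from hypothetical per-voxel vote counts and a Potts pairwise smoothness term, minimized with a few sweeps of iterated conditional modes (ICM); the paper's actual optimizer and 3D setting may differ.

```python
# Toy MRF label fusion: unary term = negative log of normalized atlas votes,
# pairwise term = Potts smoothness; optimized with ICM on a 2D grid.
# Vote maps are simulated; the paper works on 3D CT and may use another optimizer.
import numpy as np

rng = np.random.default_rng(6)
H, W, L = 40, 40, 2                      # grid size and number of labels
votes = rng.random((H, W, L))
votes[10:30, 10:30, 1] += 2.0            # a blob where label 1 dominates
unary = -np.log(votes / votes.sum(axis=2, keepdims=True) + 1e-9)

lam = 0.8                                 # smoothness weight
labels = unary.argmin(axis=2)             # initialize with vote-only (no smoothing)

def local_energy(labels, unary, i, j, l, lam):
    e = unary[i, j, l]
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            e += lam * (labels[ni, nj] != l)   # Potts penalty for disagreeing neighbours
    return e

for _ in range(5):                        # a few ICM sweeps
    for i in range(H):
        for j in range(W):
            labels[i, j] = min(range(L), key=lambda l: local_energy(labels, unary, i, j, l, lam))

print("label-1 voxels after smoothing:", int((labels == 1).sum()))
```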
Abstract:
PECUBE is a three-dimensional thermal-kinematic code capable of solving the heat production-diffusion-advection equation under a temporally varying surface boundary condition. It was initially developed to assess the effects of time-varying surface topography (relief) on low-temperature thermochronological datasets. Thermochronometric ages are predicted by tracking the time-temperature histories of rock particles ending up at the surface and by combining these with various age-prediction models. In the decade since its inception, the PECUBE code has been under continuous development as its use became wider and addressed different tectonic-geomorphic problems. This paper describes several major recent improvements in the code, including its integration with an inverse-modeling package based on the Neighborhood Algorithm, the incorporation of fault-controlled kinematics, several different ways to address topographic and drainage change through time, the ability to predict subsurface (tunnel or borehole) data, the prediction of detrital thermochronology data and a method to compare these with observations, and the coupling with landscape-evolution (or surface-process) models. Each new development is described together with one or several applications, so that the reader and potential user can clearly assess and make use of the capabilities of PECUBE. We end by describing some developments that are currently underway or should take place in the foreseeable future. (C) 2012 Elsevier B.V. All rights reserved.
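PECUBE itself is a 3D code; the toy sketch below only illustrates the governing heat production-diffusion-advection balance in 1D with an explicit finite-difference scheme and a fixed surface temperature, using invented parameter values.

```python
# Toy explicit finite-difference solution of a 1D heat
# production-diffusion-advection equation (the equation class PECUBE solves in 3D).
# All parameter values are invented for illustration.
import numpy as np

nz, depth = 101, 30e3                     # 30 km crustal column, z positive downward
dz = depth / (nz - 1)
kappa = 1e-6                              # thermal diffusivity, m^2/s
v = 0.5e-3 / 3.15e7                       # rock uplift (advection) rate: 0.5 mm/yr in m/s
H = 1e-14                                 # heat production / (rho * c), K/s
dt = 0.2 * dz**2 / kappa                  # stable explicit time step

T = np.linspace(0, 600, nz)               # initial geotherm: 0 C at surface, 600 C at base
for _ in range(20000):
    dTdz = (T[2:] - T[:-2]) / (2 * dz)
    d2Tdz2 = (T[2:] - 2 * T[1:-1] + T[:-2]) / dz**2
    T[1:-1] += dt * (kappa * d2Tdz2 + v * dTdz + H)
    T[0], T[-1] = 0.0, 600.0               # fixed (time-invariant) surface and basal temperatures

print("temperature at ~5 km depth (C):", T[int(round(5e3 / dz))])
```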
Abstract:
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
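A minimal sketch of the error treatment described in point 2: each occurrence coordinate is displaced by Gaussian noise with zero mean and a 5 km standard deviation before model calibration (projected coordinates in metres and simulated occurrence data are assumed here).

```python
# Degrade occurrence coordinates with Gaussian locational error
# (mean 0, standard deviation 5 km), as in the abstract's error treatment.
# Assumes projected coordinates in metres; occurrence data are simulated here.
import numpy as np

rng = np.random.default_rng(7)
occurrences = rng.uniform(0, 500e3, size=(200, 2))        # fake x, y in metres

def add_locational_error(xy, sd_m=5000.0, rng=rng):
    """Return coordinates displaced by N(0, sd_m) noise along each axis."""
    return xy + rng.normal(0.0, sd_m, size=xy.shape)

degraded = add_locational_error(occurrences)
# 'occurrences' would feed the control treatment, 'degraded' the error treatment,
# each then passed to the same distribution-modelling techniques for comparison.
print(np.abs(degraded - occurrences).mean())
```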