965 resultados para linear machine modeling
Resumo:
Tractography is a class of algorithms aiming at in vivo mapping the major neuronal pathways in the white matter from diffusion magnetic resonance imaging (MRI) data. These techniques offer a powerful tool to noninvasively investigate at the macroscopic scale the architecture of the neuronal connections of the brain. However, unfortunately, the reconstructions recovered with existing tractography algorithms are not really quantitative even though diffusion MRI is a quantitative modality by nature. As a matter of fact, several techniques have been proposed in recent years to estimate, at the voxel level, intrinsic microstructural features of the tissue, such as axonal density and diameter, by using multicompartment models. In this paper, we present a novel framework to reestablish the link between tractography and tissue microstructure. Starting from an input set of candidate fiber-tracts, which are estimated from the data using standard fiber-tracking techniques, we model the diffusion MRI signal in each voxel of the image as a linear combination of the restricted and hindered contributions generated in every location of the brain by these candidate tracts. Then, we seek for the global weight of each of them, i.e., the effective contribution or volume, such that they globally fit the measured signal at best. We demonstrate that these weights can be easily recovered by solving a global convex optimization problem and using efficient algorithms. The effectiveness of our approach has been evaluated both on a realistic phantom with known ground-truth and in vivo brain data. Results clearly demonstrate the benefits of the proposed formulation, opening new perspectives for a more quantitative and biologically plausible assessment of the structural connectivity of the brain.
Resumo:
BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.
Resumo:
Piecewise linear models systems arise as mathematical models of systems in many practical applications, often from linearization for nonlinear systems. There are two main approaches of dealing with these systems according to their continuous or discrete-time aspects. We propose an approach which is based on the state transformation, more particularly the partition of the phase portrait in different regions where each subregion is modeled as a two-dimensional linear time invariant system. Then the Takagi-Sugeno model, which is a combination of local model is calculated. The simulation results show that the Alpha partition is well-suited for dealing with such a system
Resumo:
BACKGROUND: Risks of significant infant drug exposurethrough breastmilk are poorly defined for many drugs, and largescalepopulation data are lacking. We used population pharmacokinetics(PK) modeling to predict fluoxetine exposure levels ofinfants via mother's milk in a simulated population of 1000 motherinfantpairs.METHODS: Using our original data on fluoxetine PK of 25breastfeeding women, a population PK model was developed withNONMEM and parameters, including milk concentrations, wereestimated. An exponential distribution model was used to account forindividual variation. Simulation random and distribution-constrainedassignment of doses, dosing time, feeding intervals and milk volumewas conducted to generate 1000 mother-infant pairs with characteristicssuch as the steady-state serum concentrations (Css) and infantdose relative to the maternal weight-adjusted dose (relative infantdose: RID). Full bioavailability and a conservative point estimate of1-month-old infant CYP2D6 activity to be 20% of the adult value(adjusted by weigth) according to a recent study, were assumed forinfant Css calculations.RESULTS: A linear 2-compartment model was selected as thebest model. Derived parameters, including milk-to-plasma ratios(mean: 0.66; SD: 0.34; range, 0 - 1.1) were consistent with the valuesreported in the literature. The estimated RID was below 10% in >95%of infants. The model predicted median infant-mother Css ratio was0.096 (range 0.035 - 0.25); literature reported mean was 0.07 (range0-0.59). Moreover, the predicted incidence of infant-mother Css ratioof >0.2 was less than 1%.CONCLUSION: Our in silico model prediction is consistent withclinical observations, suggesting that substantial systemic fluoxetineexposure in infants through human milk is rare, but further analysisshould include active metabolites. Our approach may be valid forother drugs. [supported by CIHR and Swiss National Science Foundation(SNSF)]
Resumo:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001.We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
Most research on single machine scheduling has assumedthe linearity of job holding costs, which is arguablynot appropriate in some applications. This motivates ourstudy of a model for scheduling $n$ classes of stochasticjobs on a single machine, with the objective of minimizingthe total expected holding cost (discounted or undiscounted). We allow general holding cost rates that are separable,nondecreasing and convex on the number of jobs in eachclass. We formulate the problem as a linear program overa certain greedoid polytope, and establish that it issolved optimally by a dynamic (priority) index rule,whichextends the classical Smith's rule (1956) for the linearcase. Unlike Smith's indices, defined for each class, ournew indices are defined for each extended class, consistingof a class and a number of jobs in that class, and yieldan optimal dynamic index rule: work at each time on a jobwhose current extended class has larger index. We furthershow that the indices possess a decomposition property,as they are computed separately for each class, andinterpret them in economic terms as marginal expected cost rate reductions per unit of expected processing time.We establish the results by deploying a methodology recentlyintroduced by us [J. Niño-Mora (1999). "Restless bandits,partial conservation laws, and indexability. "Forthcomingin Advances in Applied Probability Vol. 33 No. 1, 2001],based on the satisfaction by performance measures of partialconservation laws (PCL) (which extend the generalizedconservation laws of Bertsimas and Niño-Mora (1996)):PCL provide a polyhedral framework for establishing theoptimality of index policies with special structure inscheduling problems under admissible objectives, which weapply to the model of concern.
Resumo:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model simple k-nearest neighbor algorithm is considered. PNN is a neural network reformulation of well known nonparametric principles of probability density modeling using kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can be easily used in decision support systems dealing with problems of automatic classification. Support vector machine is an implementation of the principles of statistical learning theory for the classification tasks. Recently they were successfully applied for different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
Resumo:
BACKGROUND: The goal of this paper is to investigate the respective influence of work characteristics, the effort-reward ratio, and overcommitment on the poor mental health of out-of-hospital care providers. METHODS: 333 out-of-hospital care providers answered a questionnaire that included queries on mental health (GHQ-12), demographics, health-related information and work characteristics, questions from the Effort-Reward Imbalance Questionnaire, and items about overcommitment. A two-level multiple regression was performed between mental health (the dependent variable) and the effort-reward ratio, the overcommitment score, weekly number of interventions, percentage of non-prehospital transport of patients out of total missions, gender, and age. Participants were first-level units, and ambulance services were second-level units. We also shadowed ambulance personnel for a total of 416 hr. RESULTS: With cutoff points of 2/3 and 3/4 positive answers on the GHQ-12, the percentages of potential cases with poor mental health were 20% and 15%, respectively. The effort-reward ratio was associated with poor mental health (P < 0.001), irrespective of age or gender. Overcommitment was associated with poor mental health; this association was stronger in women (β = 0.054) than in men (β = 0.020). The percentage of prehospital missions out of total missions was only associated with poor mental health at the individual level. CONCLUSIONS: Emergency medical services should pay attention to the way employees perceive their efforts and the rewarding aspects of their work: an imbalance of those aspects is associated with poor mental health. Low perceived esteem appeared particularly associated with poor mental health. This suggests that supervisors of emergency medical services should enhance the value of their employees' work. Employees with overcommitment should also receive appropriate consideration. Preventive measures should target individual perceptions of effort and reward in order to improve mental health in prehospital care providers.
Resumo:
Because of the increase in workplace automation and the diversification of industrial processes, workplaces have become more and more complex. The classical approaches used to address workplace hazard concerns, such as checklists or sequence models, are, therefore, of limited use in such complex systems. Moreover, because of the multifaceted nature of workplaces, the use of single-oriented methods, such as AEA (man oriented), FMEA (system oriented), or HAZOP (process oriented), is not satisfactory. The use of a dynamic modeling approach in order to allow multiple-oriented analyses may constitute an alternative to overcome this limitation. The qualitative modeling aspects of the MORM (man-machine occupational risk modeling) model are discussed in this article. The model, realized on an object-oriented Petri net tool (CO-OPN), has been developed to simulate and analyze industrial processes in an OH&S perspective. The industrial process is modeled as a set of interconnected subnets (state spaces), which describe its constitutive machines. Process-related factors are introduced, in an explicit way, through machine interconnections and flow properties. While man-machine interactions are modeled as triggering events for the state spaces of the machines, the CREAM cognitive behavior model is used in order to establish the relevant triggering events. In the CO-OPN formalism, the model is expressed as a set of interconnected CO-OPN objects defined over data types expressing the measure attached to the flow of entities transiting through the machines. Constraints on the measures assigned to these entities are used to determine the state changes in each machine. Interconnecting machines implies the composition of such flow and consequently the interconnection of the measure constraints. This is reflected by the construction of constraint enrichment hierarchies, which can be used for simulation and analysis optimization in a clear mathematical framework. The use of Petri nets to perform multiple-oriented analysis opens perspectives in the field of industrial risk management. It may significantly reduce the duration of the assessment process. But, most of all, it opens perspectives in the field of risk comparisons and integrated risk management. Moreover, because of the generic nature of the model and tool used, the same concepts and patterns may be used to model a wide range of systems and application fields.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
The choice network revenue management (RM) model incorporates customer purchase behavioras customers purchasing products with certain probabilities that are a function of the offeredassortment of products, and is the appropriate model for airline and hotel network revenuemanagement, dynamic sales of bundles, and dynamic assortment optimization. The underlyingstochastic dynamic program is intractable and even its certainty-equivalence approximation, inthe form of a linear program called Choice Deterministic Linear Program (CDLP) is difficultto solve in most cases. The separation problem for CDLP is NP-complete for MNL with justtwo segments when their consideration sets overlap; the affine approximation of the dynamicprogram is NP-complete for even a single-segment MNL. This is in contrast to the independentclass(perfect-segmentation) case where even the piecewise-linear approximation has been shownto be tractable. In this paper we investigate the piecewise-linear approximation for network RMunder a general discrete-choice model of demand. We show that the gap between the CDLP andthe piecewise-linear bounds is within a factor of at most 2. We then show that the piecewiselinearapproximation is polynomially-time solvable for a fixed consideration set size, bringing itinto the realm of tractability for small consideration sets; small consideration sets are a reasonablemodeling tradeoff in many practical applications. Our solution relies on showing that forany discrete-choice model the separation problem for the linear program of the piecewise-linearapproximation can be solved exactly by a Lagrangian relaxation. We give modeling extensionsand show by numerical experiments the improvements from using piecewise-linear approximationfunctions.
Resumo:
Modeling the mechanisms that determine how humans and other agents choose among different behavioral and cognitive processes-be they strategies, routines, actions, or operators-represents a paramount theoretical stumbling block across disciplines, ranging from the cognitive and decision sciences to economics, biology, and machine learning. By using the cognitive and decision sciences as a case study, we provide an introduction to what is also known as the strategy selection problem. First, we explain why many researchers assume humans and other animals to come equipped with a repertoire of behavioral and cognitive processes. Second, we expose three descriptive, predictive, and prescriptive challenges that are common to all disciplines which aim to model the choice among these processes. Third, we give an overview of different approaches to strategy selection. These include cost‐benefit, ecological, learning, memory, unified, connectionist, sequential sampling, and maximization approaches. We conclude by pointing to opportunities for future research and by stressing that the selection problem is far from being resolved.
Resumo:
The objective of this study was to adapt a nonlinear model (Wang and Engel - WE) for simulating the phenology of maize (Zea mays L.), and to evaluate this model and a linear one (thermal time), in order to predict developmental stages of a field-grown maize variety. A field experiment, during 2005/2006 and 2006/2007 was conducted in Santa Maria, RS, Brazil, in two growing seasons, with seven sowing dates each. Dates of emergence, silking, and physiological maturity of the maize variety BRS Missões were recorded in six replications in each sowing date. Data collected in 2005/2006 growing season were used to estimate the coefficients of the two models, and data collected in the 2006/2007 growing season were used as independent data set for model evaluations. The nonlinear WE model accurately predicted the date of silking and physiological maturity, and had a lower root mean square error (RMSE) than the linear (thermal time) model. The overall RMSE for silking and physiological maturity was 2.7 and 4.8 days with WE model, and 5.6 and 8.3 days with thermal time model, respectively.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.