63 results for Data Modelling

in Universit


Relevance:

100.00%

Publisher:

Abstract:

This paper investigates the use of ensembles of predictors to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used as the base predictor. Several SVR instances are combined using two data sampling schemes, bagging and boosting. Bagging performs well: it reduces the error and proves more computationally efficient than training a single SVR model. Boosting, however, does not improve results on this specific problem.
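Bagging (bootstrap aggregating) trains each ensemble member on a resample drawn with replacement from the training data and averages their predictions. The sketch below illustrates only that sampling and averaging scheme. It is not the paper's setup: SVR is replaced by a hypothetical k-nearest-neighbour base learner so the example needs no third-party libraries, and the toy grid data and the parameters `n_models`, `k` and `seed` are illustrative assumptions.

```python
import random
from statistics import mean


def knn_regressor(train, k=3):
    """Hypothetical stand-in base learner: k-nearest-neighbour regression.

    The abstract uses SVR; k-NN is swapped in only to keep this sketch
    free of third-party dependencies.
    """
    def predict(x):
        # Sort training points by squared Euclidean distance to x.
        nearest = sorted(
            train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
        )[:k]
        return mean(y for _, y in nearest)
    return predict


def bagging_fit(data, n_models=10, fit=knn_regressor, seed=0):
    """Train n_models base learners, each on a bootstrap resample of data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: len(data) draws with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        models.append(fit(sample))
    return models


def bagging_predict(models, x):
    """Bagged prediction: average of the base learners' predictions."""
    return mean(m(x) for m in models)


# Usage with toy (location, value) pairs on a 5 x 5 grid.
data = [((i, j), float(i + j)) for i in range(5) for j in range(5)]
models = bagging_fit(data)
print(bagging_predict(models, (2, 2)))
```

Because each member sees a different resample, averaging reduces variance. The members are also independent, so they can be trained in parallel.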

Relevance:

70.00%

Publisher:

Abstract:

The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming to provide the best possible generalization and predictive abilities rather than concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows environmental data to be modelled simultaneously at several spatial scales. The joint influence of environmental processes that present different patterns at different scales is learned automatically from the data, providing the optimal mixture of short- and large-scale models. Because the method adapts to the spatial scale of the data, it can efficiently model local anomalies, which typically arise in the early phase of an environmental emergency. However, the approach still requires some prior knowledge of the possible existence of such short-scale patterns, which may limit its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use by mapping Cs137 activity from measurements taken in the Briansk region following the Chernobyl accident.
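One common way to let a kernel method capture several spatial scales is to use a weighted sum of kernels with different bandwidths. A non-negative weighted sum of valid kernels is itself a valid kernel. The sketch below assumes two Gaussian RBF components; the abstract does not give the paper's exact kernel form, and the weights and bandwidths in the usage lines are illustrative assumptions, not learned values.

```python
import math


def rbf(x, z, bandwidth):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * bandwidth^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def multiscale_kernel(x, z, w_short, w_large, h_short, h_large):
    """Assumed multi-scale kernel: weighted sum of short- and large-scale RBFs.

    Non-negative weights keep the sum positive semi-definite.
    """
    if w_short < 0 or w_large < 0:
        raise ValueError("kernel weights must be non-negative")
    return w_short * rbf(x, z, h_short) + w_large * rbf(x, z, h_large)


def gram_matrix(points, **params):
    """Kernel (Gram) matrix that an SVR solver would consume."""
    return [[multiscale_kernel(p, q, **params) for q in points] for p in points]


# Usage: two nearby sites and one distant site (illustrative coordinates).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
K = gram_matrix(points, w_short=0.3, w_large=0.7, h_short=0.5, h_large=20.0)
for row in K:
    print([round(v, 3) for v in row])
```

The short-bandwidth term only links very close sites, which lets it represent local anomalies. The long-bandwidth term carries the regional trend.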

Relevance:

60.00%

Publisher:

Abstract:

Interviewer performance in convincing sample members to participate in surveys is an important dimension of survey quality. However, unlike CAPI surveys, where each sample case 'belongs' to one interviewer, centralised CATI surveys have hardly any good measures of interviewer performance, because even individual contacts are assigned to interviewers at random. If more than one interviewer works a sample case, it is not clear how to attribute success or failure to the interviewers involved. In this article, we propose two related methods to measure interviewer contact performance in centralised CATI surveys. Modelling them must take into account complex multilevel clustering effects, which need not be hierarchical. Results are consistent with findings from CAPI data modelling, and when we compare our effects with a direct ('naive') measure of interviewer contact results, we find that the naive measure substantially underestimates interviewer random effects.
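The abstract does not define the direct ('naive') measure. The sketch below assumes it is each interviewer's raw share of successful contact attempts. It also flags the cases that were worked by more than one interviewer. These shared cases produce the crossed, non-hierarchical clustering that the naive rate ignores. The record layout `(case_id, interviewer, success)` and the toy records are hypothetical.

```python
from collections import defaultdict


def naive_contact_rates(contacts):
    """Assumed naive measure: share of each interviewer's contacts that succeeded."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for _case_id, interviewer, success in contacts:
        attempts[interviewer] += 1
        successes[interviewer] += int(success)
    return {iv: successes[iv] / attempts[iv] for iv in attempts}


def shared_cases(contacts):
    """Cases worked by more than one interviewer (crossed, not nested)."""
    by_case = defaultdict(set)
    for case_id, interviewer, _success in contacts:
        by_case[case_id].add(interviewer)
    return sorted(c for c, ivs in by_case.items() if len(ivs) > 1)


# Usage with hypothetical contact records.
contacts = [
    ("c1", "A", False), ("c1", "B", True),
    ("c2", "A", True),
    ("c3", "B", False), ("c3", "A", False),
]
print(naive_contact_rates(contacts))
print(shared_cases(contacts))
```

Here interviewer B gets credit for case c1 even though A contacted it first. Multilevel models with crossed random effects are designed to disentangle that kind of shared attribution.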

Relevance:

40.00%

Publisher:

Abstract:

Context: There are no evidence syntheses available to guide clinicians on when to titrate antihypertensive medication after initiation. Objective: To model the blood pressure (BP) response after initiating antihypertensive medication. Data sources: Electronic databases, including Medline, Embase and the Cochrane Register, and reference lists, up to December 2009. Study selection: Trials that initiated antihypertensive medication as single therapy in hypertensive patients who were either drug naive or had a placebo washout from previous drugs. Data extraction: Office BP measurements at a minimum of two-weekly intervals for at least 4 weeks. An asymptotic approach model of BP response was assumed, and non-linear mixed-effects modelling was used to calculate the model parameters. Results: Eighteen trials that recruited 4168 patients met the inclusion criteria. The time to reach 50% of the maximum estimated BP-lowering effect was 1 week (systolic 0.91 weeks, 95% CI 0.74 to 1.10; diastolic 0.95 weeks, 95% CI 0.75 to 1.15). Models incorporating drug class as a source of variability did not improve the fit to the data. Incorporating the presence of a titration schedule improved model fit for both systolic and diastolic pressure. Titration increased both the predicted maximum effect and the time taken to reach 50% of the maximum (systolic 1.2 vs 0.7 weeks; diastolic 1.4 vs 0.7 weeks). Conclusions: Estimates of the maximum efficacy of antihypertensive agents can be made early after starting therapy. This knowledge will guide clinicians in deciding whether a newly started antihypertensive agent is likely to be effective at controlling BP.
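The abstract names an asymptotic approach model but does not state its equation. A minimal sketch, assuming the usual form E(t) = Emax·(1 − e^(−kt)), shows how the reported half-time maps onto the whole response curve. Under this form, the time to reach 50% of the maximum is ln 2 / k. The Emax value in the usage lines is hypothetical; the 0.91-week half-time is the pooled systolic estimate quoted above.

```python
import math


def bp_lowering(t_weeks, e_max, t_half):
    """Assumed asymptotic approach model E(t) = e_max * (1 - exp(-k t)).

    The rate constant is derived from the half-time: k = ln 2 / t_half.
    """
    k = math.log(2) / t_half
    return e_max * (1.0 - math.exp(-k * t_weeks))


def fraction_of_max(t_weeks, t_half):
    """Fraction of the maximum effect reached after t_weeks (same model)."""
    return 1.0 - 0.5 ** (t_weeks / t_half)


# Usage: systolic half-time 0.91 weeks; e_max = 10 mmHg is hypothetical.
print(round(bp_lowering(0.91, 10.0, 0.91), 2))
print(round(fraction_of_max(4, 0.91), 3))
```

Under these assumptions, about 95% of the maximum effect would be reached by week 4. This is consistent with the conclusion that maximum efficacy can be estimated early.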

Relevance:

40.00%

Publisher:

Abstract:

The research considers the problem of spatial data classification using two machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). A simple k-nearest neighbor algorithm is used as a benchmark model. PNN is a neural network reformulation of well-known nonparametric principles: probability density modeling with a kernel density estimator, combined with Bayes-optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNNs is that they can easily be used in decision support systems dealing with automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. SVMs have recently been applied successfully to various environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. The present paper considers both simulated and real-data case studies, in low and high dimensions, with the main attention paid to how the applied algorithms detect and learn spatial patterns.
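The PNN principle described above can be sketched in a few lines. A Gaussian kernel density estimate is built per class, weighted by the class prior, and the maximum a posteriori class is chosen. This is a generic Parzen-window sketch, not the paper's implementation. The shared bandwidth `sigma`, the default priors taken from class frequencies, and the toy points are assumptions.

```python
import math
from collections import defaultdict


def pnn_classify(train, x, sigma=1.0, priors=None):
    """Parzen-window (PNN-style) classifier.

    train: list of (features, label) pairs. Returns (label, posteriors).
    The Gaussian normalising constant is omitted because, with one shared
    sigma, it is identical for every class and cancels in the posterior.
    """
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    if priors is None:
        # Assumed default: priors proportional to class frequencies.
        priors = {c: len(pts) / len(train) for c, pts in by_class.items()}
    scores = {}
    for c, pts in by_class.items():
        # Pattern and summation layers: mean kernel value = class density.
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, x)) / (2 * sigma ** 2))
            for p in pts
        ) / len(pts)
        scores[c] = priors[c] * density
    total = sum(scores.values())
    if total == 0.0:
        # Every kernel value underflowed: fall back to the priors alone.
        posteriors = dict(priors)
    else:
        posteriors = {c: s / total for c, s in scores.items()}
    # Decision layer: maximum a posteriori class.
    return max(posteriors, key=posteriors.get), posteriors


# Usage with two toy spatial classes.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(pnn_classify(train, (0.2, 0.5)))
```

The returned posteriors give the accuracy quantification that the abstract mentions. Passing explicit `priors` is one way to integrate prior information.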

Relevance:

40.00%

Publisher:

Abstract:

The proportion of the population living in or around cities is higher than ever and continues to grow. Urban sprawl and car dependence have supplanted the pedestrian-friendly compact city. Environmental problems such as air pollution, land waste and noise, as well as health problems, result from this still-continuing process. Urban planners have to find solutions to these complex problems while ensuring the economic performance of the city and its surroundings. At the same time, an increasing quantity of socio-economic and environmental data is being acquired. To better understand the processes and phenomena taking place in the complex urban environment, these data should be analysed. Numerous methods for modelling and simulating such a system exist and are still under development; urban geographers can exploit them to improve our understanding of the urban metabolism. Modern and innovative visualisation techniques help communicate the results of such models and simulations. This thesis covers several methods for the analysis, modelling, simulation and visualisation of problems related to urban geography. The analysis of high-dimensional socio-economic data using artificial neural network techniques, especially self-organising maps, is demonstrated with two examples at different scales. The problem of spatiotemporal modelling and data representation is addressed, and some possible solutions are outlined. The simulation of urban dynamics, more specifically the traffic generated by commuting to work, is illustrated using multi-agent micro-simulation techniques. A section on visualisation methods presents cartograms, which transform geographic space into a feature space, and the distance circle map, a centre-based map representation that is particularly useful for urban agglomerations. Issues concerning the importance of scale in urban analysis and the clustering of urban phenomena are also discussed.
A new approach to defining urban areas at different scales is developed, and its link with percolation theory is established. Fractal statistics, especially the lacunarity measure, and scaling laws are used to characterise urban clusters. In a final section, population evolution is modelled using a model close to the well-established gravity model. The work covers a wide range of methods useful in urban geography. These methods still need further development and, at the same time, should find their way into the daily work and decision-making of urban planners.
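Lacunarity measures how "gappy" a spatial pattern is at a given scale. The sketch below uses the standard gliding-box estimator. A box of side r is slid over a binary grid (1 = urbanised cell), the occupied cells in each position are counted as the box mass M, and the estimator computes Λ(r) = E[M²] / E[M]². The abstract does not say which lacunarity variant the thesis uses, so this choice and the toy grids are assumptions.

```python
def lacunarity(grid, r):
    """Gliding-box lacunarity of a binary grid for box size r."""
    rows, cols = len(grid), len(grid[0])
    if not 1 <= r <= min(rows, cols):
        raise ValueError("box size must fit inside the grid")
    masses = []
    # Slide an r x r box over every position and record its mass.
    for i in range(rows - r + 1):
        for j in range(cols - r + 1):
            masses.append(
                sum(grid[a][b] for a in range(i, i + r) for b in range(j, j + r))
            )
    n = len(masses)
    m1 = sum(masses) / n                  # first moment E[M]
    m2 = sum(m * m for m in masses) / n   # second moment E[M^2]
    if m1 == 0:
        raise ValueError("grid contains no occupied cells")
    return m2 / (m1 * m1)


# Usage: same number of occupied cells, clustered vs. evenly spread.
clustered = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
spread = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
print(lacunarity(clustered, 2), lacunarity(spread, 2))
```

A value of 1 means the pattern looks the same through every box. Larger values indicate clustering with large gaps, which is why the measure helps separate compact urban clusters from scattered development.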

Relevance:

40.00%

Publisher:

Abstract:

BACKGROUND: Most available pharmacotherapies for alcohol-dependent patients target abstinence; however, reduced alcohol consumption may be a more realistic goal. Using randomized clinical trial (RCT) data, a previous microsimulation model evaluated the clinical relevance of reduced consumption in terms of avoided alcohol-attributable events. Using real-life observational data, the current analysis aimed to adapt the model and confirm previous findings about the clinical relevance of reduced alcohol consumption. METHODS: Based on the prospective observational CONTROL study, which evaluated daily alcohol consumption among alcohol-dependent patients, the model predicted the probability of drinking any alcohol on a given day. Predicted daily alcohol consumption was simulated in a hypothetical sample of 200,000 patients observed over one year. Individual total alcohol consumption (TAC) and number of heavy drinking days (HDD) were derived. Using published risk equations, the probabilities of alcohol-attributable adverse health events (e.g., hospitalizations or death) corresponding to the simulated consumption were computed and aggregated for categories of patients defined by HDD and TAC (expressed per 100,000 patient-years). Sensitivity analyses tested model robustness. RESULTS: Shifting from >220 HDDs per year to 120-140 HDDs, and shifting from 36,000-39,000 g TAC per year (120-130 g/day) to 15,000-18,000 g TAC per year (50-60 g/day), substantially reduced the incidence of events (14,588 and 6,148 events avoided per 100,000 patient-years, respectively). Results were robust to sensitivity analyses. CONCLUSIONS: This study corroborates the previous microsimulation modeling approach and, using real-life data, confirms RCT-based findings that reduced alcohol consumption is a relevant objective for consideration in alcohol dependence management to improve public health.
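The pipeline above has three steps. Daily drinking is simulated, each patient's TAC and HDD are derived, and risk is converted into events per 100,000 patient-years. A minimal sketch of that pipeline follows, with loudly hypothetical inputs. A constant daily drinking probability stands in for the CONTROL-based model, and grams per drinking day are drawn from an exponential distribution. The 60 g heavy-drinking threshold and the `hypothetical_risk` function are placeholders for the published risk equations, which are not reproduced here.

```python
import random

HDD_THRESHOLD_G = 60  # assumption: grams of pure alcohol defining a heavy drinking day


def simulate_patient_year(p_drink, mean_grams, rng):
    """Simulate one patient-year; return (TAC in grams, number of HDDs)."""
    tac, hdd = 0.0, 0
    for _day in range(365):
        if rng.random() < p_drink:
            # Hypothetical amount distribution on drinking days.
            grams = rng.expovariate(1.0 / mean_grams)
            tac += grams
            hdd += grams >= HDD_THRESHOLD_G
    return tac, hdd


def events_per_100k(n_patients, p_drink, mean_grams, annual_risk, seed=0):
    """Expected events per 100,000 patient-years under a given risk function."""
    rng = random.Random(seed)
    expected_events = 0.0
    for _ in range(n_patients):
        tac, hdd = simulate_patient_year(p_drink, mean_grams, rng)
        # annual_risk returns the probability of an event in this patient-year.
        expected_events += annual_risk(tac, hdd)
    return expected_events / n_patients * 100_000


def hypothetical_risk(tac, hdd):
    """Placeholder risk equation (NOT a published one)."""
    return min(1.0, 0.0005 * hdd)


# Usage: events avoided by a consumption-reduction scenario (small n for speed).
baseline = events_per_100k(500, 0.9, 80.0, hypothetical_risk)
reduced = events_per_100k(500, 0.5, 80.0, hypothetical_risk)
print(round(baseline - reduced))
```

The study simulated 200,000 patients. A smaller sample keeps this pure-Python sketch fast, at the cost of more Monte Carlo noise.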

Relevance:

30.00%

Publisher:

Abstract:

1. Model-based approaches have been used increasingly in conservation biology over recent years. Species presence data used for predictive species distribution modelling are abundant in natural history collections, whereas reliable absence data are sparse, most notably for vagrant species such as butterflies and snakes. As predictive methods such as generalized linear models (GLM) require absence data, various strategies have been proposed to select pseudo-absence data. However, few studies have compared different approaches to generating these pseudo-absence data. 2. Natural history collection data are usually available for long periods of time (decades or even centuries), thus allowing historical analyses. However, this historical dimension has rarely been assessed in studies of species distribution, although it has great potential for understanding current patterns, i.e. the past is the key to the present. 3. We used GLM to model the distributions of three 'target' butterfly species, Melitaea didyma, Coenonympha tullia and Maculinea teleius, in Switzerland. We developed and compared four strategies for defining pools of pseudo-absence data and applied them to natural history collection data from the last 10, 30 and 100 years. Pools included: (i) sites without target species records; (ii) sites where butterfly species other than the target species were present; (iii) sites without butterfly species but with habitat characteristics similar to those required by the target species; and (iv) a combination of the second and third strategies. Models were evaluated and compared by the total deviance explained, the maximized Kappa and the area under the ROC curve (AUC). 4. Among the four strategies, model performance was best for strategy 3. Contrary to expectations, strategy 2 resulted in even lower model performance than models with pseudo-absence data simulated completely at random (strategy 1). 5.
Independent of the strategy, model performance was enhanced when sites with historical species presence data were not considered as pseudo-absence data. Therefore, the combination of strategy 3 with species records from the last 100 years achieved the highest model performance. 6. Synthesis and applications. The protection of suitable habitat for species survival or reintroduction in rapidly changing landscapes is a high priority among conservationists. Model-based approaches offer planning authorities the possibility of delimiting priority areas for species detection or habitat protection. The performance of these models can be enhanced by fitting them with pseudo-absence data that draw on large archives of natural history collection presence data, rather than with randomly sampled pseudo-absence data.
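Two of the evaluation criteria above can be computed directly from fitted probabilities and presence/pseudo-absence labels. AUC can be computed as the probability that a random presence site scores higher than a random absence site, with ties counted as one half (the Mann-Whitney formulation). Maximized Kappa is taken here as the highest Cohen's kappa over all thresholds drawn from the observed scores; the paper's exact threshold search is not stated, so that choice is an assumption. The toy scores are hypothetical.

```python
def auc(scores, labels):
    """AUC via pairwise comparison; labels are 1 (presence) or 0 (absence)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(pred, labels):
    """Cohen's kappa for binary predictions against binary labels."""
    n = len(labels)
    agree = sum(1 for p, y in zip(pred, labels) if p == y) / n
    p_pred = sum(pred) / n
    p_true = sum(labels) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = p_pred * p_true + (1 - p_pred) * (1 - p_true)
    return 0.0 if expected == 1 else (agree - expected) / (1 - expected)


def max_kappa(scores, labels):
    """Assumed 'maximized Kappa': best kappa over observed-score thresholds."""
    return max(
        cohen_kappa([1 if s >= t else 0 for s in scores], labels)
        for t in sorted(set(scores))
    )


# Usage with hypothetical GLM probabilities for four sites.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
print(auc(scores, labels), max_kappa(scores, labels))
```

Both metrics depend on which sites are labelled 0. This is why the choice of pseudo-absence pool affects the reported model performance.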