875 results for modeling of data sources
Abstract:
The number of people suffering from dementia will triple in the next 40 years, according to a new study by the World Health Organization, leading to catastrophic social and financial costs. Dementia, a brain illness that affects memory, behavior and the ability to perform even common tasks, affects mostly older people; Alzheimer's disease causes many cases. Read the report: Global burden of dementia in the year 2050: summary of methods and data sources
Abstract:
This paper presents a review of methodology for semi-supervised modeling with kernel methods when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as the complex topographies of mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning by using digital elevation models in semi-supervised kernel methods. The range of tools and methodological issues discussed in the study includes feature selection and semi-supervised Support Vector algorithms. A real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Abstract:
Yosemite Valley poses significant rockfall hazard and related risk due to its glacially steepened walls and approximately 4 million visitors annually. To assess rockfall hazard, it is necessary to evaluate the geologic structure that contributes to the destabilization of rockfall sources and locate the most probable future source areas. Coupling new remote sensing techniques (Terrestrial Laser Scanning, Aerial Laser Scanning) and traditional field surveys, we investigated the regional geologic and structural setting, the orientation of the primary discontinuity sets for large areas of Yosemite Valley, and the specific discontinuity sets present at active rockfall sources. This information, combined with better understanding of the geologic processes that contribute to the progressive destabilization and triggering of granitic rock slabs, contributes to a more accurate rockfall susceptibility assessment for Yosemite Valley and elsewhere.
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. Because this is a multivariate process, it was important first to define the influence of each factor. In particular, it was important to define the influence of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data but was not proved to be the sole determinant for the spatial modeling. The statistical analysis of the data, at both the univariate and the multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving-window methods. The use of the Quantité Morisita Index (QMI) was proposed as a procedure to evaluate data clustering as a function of the radon level. The existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase was accompanied by the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, the data partition was optimized to comply with the stationarity conditions of geostatistical models. Common methods of spatial modeling such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset.
A bottom-up approach to method complexity was adopted, and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests for data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for data classification hardening. Among the classification methods, probabilistic neural networks (PNN) were shown to be better adapted for modeling high-threshold categorization and for automation. Support vector machines (SVM), by contrast, performed well under balanced category conditions. In general, it was concluded that no single prediction or estimation method is best under all conditions of scale and neighborhood definition. Simulations should be the basis, while other methods can provide complementary information to support efficient decision making on indoor radon.
Abstract:
Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LF and WF). The two formats suggest two alternative model approaches for analyzing panel data: (i) univariate regression with varying intercept; and (ii) multivariate regression with latent variables (a particular case of structural equation model, SEM). The present paper compares the two approaches, showing in which circumstances they yield equivalent (in some cases, even numerically equal) results. We show that the univariate approach gives results equivalent to the multivariate approach when restrictions of time invariance (in the paper, the TI assumption) are imposed on the parameters of the multivariate model. It is shown that the restrictions implicit in the univariate approach can be assessed by chi-square difference testing of two nested multivariate models. In addition, common tests encountered in the econometric analysis of panel data, such as the Hausman test, are shown to have an equivalent representation as chi-square difference tests. Commonalities and differences between the univariate and multivariate approaches are illustrated using an empirical panel data set of firms' profitability as well as a simulated panel data set.
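The two layouts can be made concrete with a small Python sketch. It only illustrates the reshaping, not the regression models compared in the paper, and the firm labels, years, and profitability figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical panel in long format: one row per (firm, year) observation.
long_rows = [
    ("A", 2001, 0.10),
    ("A", 2002, 0.12),
    ("B", 2001, 0.05),
    ("B", 2002, 0.07),
]

def long_to_wide(rows):
    """Return {unit: {period: value}}, i.e. one record per unit."""
    wide = defaultdict(dict)
    for unit, period, value in rows:
        wide[unit][period] = value
    return dict(wide)

def wide_to_long(wide):
    """Invert long_to_wide, returning rows sorted by unit and then period."""
    return [
        (unit, period, value)
        for unit in sorted(wide)
        for period, value in sorted(wide[unit].items())
    ]

wide = long_to_wide(long_rows)
round_trip = wide_to_long(wide)
```

In the long layout each observation is a row, which suits a single regression equation with a unit-specific intercept; in the wide layout each unit is a row with one column per period, which suits a multivariate model with one equation per period.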
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility of using combined longitudinal survey register data. Data from waves 1–5 of the Finnish subset of the European Community Household Panel (FI ECHP) survey were linked at person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey register data can be used to analyse and compare the non-response and attrition processes, test the missingness mechanism type and estimate the size of the bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions, and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data.
Neither the Missing At Random (MAR) assumption about non-response and attrition mechanisms nor the classical assumptions about measurement errors turned out to be valid. Measurement errors in both spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest and weighted by the last wave weights displayed the largest bias. Using all the available data, including the spells of attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazard model estimators. The study discusses implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
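As a rough illustration of a weighted Kaplan-Meier estimator of the kind this abstract refers to, the Python sketch below gives each spell a single fixed positive weight. This is a simplifying assumption rather than the study's method: in IPCW the design weight would be multiplied by the inverse of an estimated probability of remaining uncensored, which generally varies over time and is not modeled here. The spell data and function name are hypothetical.

```python
def weighted_kaplan_meier(durations, events, weights):
    """Return [(time, survival)] at each distinct event time.

    durations: observed spell lengths
    events:    True if the spell ended, False if it was censored
    weights:   one positive weight per spell (assumed fixed over time)
    """
    event_times = sorted({t for t, ended in zip(durations, events) if ended})
    survival = 1.0
    curve = []
    for t in event_times:
        # Weighted number of spells still at risk just before time t.
        at_risk = sum(w for d, w in zip(durations, weights) if d >= t)
        # Weighted number of spells that end exactly at time t.
        ending = sum(
            w for d, ended, w in zip(durations, events, weights)
            if ended and d == t
        )
        survival *= 1.0 - ending / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical spells: durations in months, with one censored spell.
durations = [2, 3, 3, 5]
events = [True, True, False, True]
unweighted = weighted_kaplan_meier(durations, events, [1.0, 1.0, 1.0, 1.0])
reweighted = weighted_kaplan_meier(durations, events, [2.0, 1.0, 1.0, 1.0])
```

With all weights equal, the function reduces to the ordinary Kaplan-Meier estimator; giving a spell a larger weight makes it count as more than one observation in both the at-risk and the event totals.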
Abstract:
Ignoring the existence of the physical mechanisms responsible for the structure of hydrological time series can lead to false conclusions about the presence of possible memory effects, i.e. persistence. This doctoral thesis traces low-frequency climatic variability through the hydrological cycle and, on this "journey", offers new insights into the transformation of the characteristic properties of time series with long-term memory. The study combines statistical methods of time series analysis with empirically based modeling techniques to develop operational models that are able to (1) model runoff dynamics, (2) forecast future runoff behavior, and (3) estimate runoff time series at ungauged sites. As such, the thesis presents a detailed investigation of the causes of low-frequency variability in hydrological time series in the German part of the Elbe River basin, the consequences of this variability, and the physically based responses of surface water and groundwater models to low-frequency precipitation input series. The thesis is organized as follows. Chapter 1 describes the Hurst phenomenon as background and gives a brief review of related studies. Chapter 2 discusses the influence of the presence of low-frequency periodic time series on the reliability of various Hurst parameter estimation techniques. Chapter 3 correlates low-frequency precipitation variability with the North Atlantic Oscillation (NAO) index. Chapters 4 to 6 focus on the German part of the Elbe River basin.
Chapter 4 examines the low-frequency variability of the various hydro-meteorological parameters and describes models that simulate the dynamics of these low frequencies and their future behavior. Chapter 5 discusses the possible application of the results on characteristic scales, and of the methods for analyzing temporal variability, to practical problems in hydraulic engineering and to the estimation of catchment runoff over time at ungauged sites. Chapter 6 follows the trace of low-frequency cycles in precipitation through the individual components of the hydrological cycle, namely direct runoff, baseflow, groundwater flow, and catchment runoff, by means of empirical modeling. The conclusions are presented in Chapter 7. An appendix describes technical details of the statistical methods used and the software tools developed.
Abstract:
In this study, we systematically compare a wide range of observational and numerical precipitation datasets for Central Asia. The data considered include two re-analyses, three datasets based on direct observations, and the output of a regional climate model simulation driven by a global re-analysis. These are validated and intercompared with respect to their ability to represent the Central Asian precipitation climate. In each of the datasets, we consider the mean spatial distribution and the seasonal cycle of precipitation, the amplitude of interannual variability, the representation of individual yearly anomalies, the precipitation sensitivity (i.e. the response to wet and dry conditions), and the temporal homogeneity of precipitation. Additionally, we carried out part of these analyses for datasets available in real time. The mutual agreement between the observations is used as an indication of how far these data can be used for validating precipitation data from other sources. In particular, we show that the observations usually agree qualitatively on anomalies in individual years, while it is not always possible to use them for the quantitative validation of the amplitude of interannual variability. The regional climate model is capable of improving the spatial distribution of precipitation. At the same time, it strongly underestimates summer precipitation and its variability, while interannual variations are well represented during the other seasons, in particular in the Central Asian mountains during winter and spring.
Abstract:
Specific choices about how to represent complex networks can have a substantial impact on the execution time required for the construction and analysis of those structures. In this work we report a comparison of the effects of representing complex networks statically by adjacency matrices or dynamically by adjacency lists. Three theoretical models of complex networks are considered: two types of Erdős-Rényi as well as the Barabási-Albert model. We investigated the effect of the different representations with respect to the construction and measurement of several topological properties (i.e. degree, clustering coefficient, shortest path length, and betweenness centrality). We found that different forms of representation generally have a substantial effect on the execution time, with the sparse representation frequently resulting in remarkably superior performance. (C) 2011 Elsevier B.V. All rights reserved.
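The two representations compared in this abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; it assumes an undirected simple graph with nodes numbered from 0, and the example edges and function names are hypothetical.

```python
def to_adjacency_matrix(n, edges):
    """Dense n x n matrix: entry [u][v] is 1 when u and v are connected."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix

def to_adjacency_list(n, edges):
    """Sparse form: one list of neighbors per node."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return neighbors

def degrees_from_matrix(matrix):
    # Each degree requires scanning a full row of n entries.
    return [sum(row) for row in matrix]

def degrees_from_list(neighbors):
    # Each degree is just the length of the node's neighbor list.
    return [len(adjacent) for adjacent in neighbors]

# Hypothetical four-node graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
matrix = to_adjacency_matrix(4, edges)
neighbors = to_adjacency_list(4, edges)
```

The matrix always stores n squared entries regardless of how many edges exist, while the list stores two entries per edge, which is one reason a sparse representation can be faster on networks with few edges per node.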
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Includes bibliography