918 results for Spatial analysis statistics -- Data processing


Relevance: 100.00%

Abstract:

Background: In a classical study, Durkheim mapped suicide rates, wealth, and low family density and realized that they clustered in northern France. Assessing other variables, such as religious society, he constructed a framework for the analysis of suicide that still allows international comparisons using the same basic methodology. The present study aims to identify possible significant clusters of suicide in the city of Sao Paulo and then verify their statistical associations with socioeconomic and cultural characteristics. Methods: A spatial scan statistic was used to analyze the geographical pattern of suicide deaths of residents in the city of Sao Paulo by Administrative District from 1996 to 2005. Relative risks and high- and/or low-risk clusters were calculated, accounting for gender and age as covariates. Logistic regression was used to estimate associations with socioeconomic variables, taking the spatial cluster of high suicide rates as the response variable. Drawing from Durkheim's original work, current World Health Organization (WHO) reports and recent reviews, the following independent variables were considered: marital status, income, education, religion, and migration. Results: The mean suicide rate was 4.1/100,000 inhabitant-years. Against this baseline, two clusters were identified: the first, of increased risk (RR = 1.66), comprising 18 districts in the central region; the second, of decreased risk (RR = 0.78), including 14 districts in the southern region. The downtown area toward the southwestern region of the city displayed the highest risk for suicide, and though the overall risk may be considered low, the rate climbs to an intermediate level in this region. One logistic regression analysis contrasted the risk cluster (18 districts) against the remaining 78 districts, testing the effects of socioeconomic and cultural variables. The following categories of proportion of persons within the clusters were identified as risk factors: singles (OR = 2.36), migrants (OR = 1.50), Catholics (OR = 1.37) and higher income (OR = 1.06). In a second logistic model, likewise conceived, the following categories were identified as protective factors: married (OR = 0.49) and Evangelical (OR = 0.60). Conclusions: This risk/protection profile is consistent with the interpretation that, as a social phenomenon, suicide is related to social isolation. Thus, the classical framework put forward by Durkheim seems to still hold, even though its categorical expression requires re-interpretation.
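
A minimal Python sketch of the kind of second-stage analysis described above: logistic regression with membership in the high-risk cluster as the response and district-level proportions as predictors. The file name and column names (districts.csv, in_high_risk_cluster, p_single, p_migrant, p_catholic, income) are hypothetical, and this is not the authors' code.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    districts = pd.read_csv("districts.csv")           # one row per Administrative District (assumed layout)
    y = districts["in_high_risk_cluster"]               # 1 = inside the 18-district high-risk cluster
    X = sm.add_constant(districts[["p_single", "p_migrant", "p_catholic", "income"]])

    fit = sm.Logit(y, X).fit()
    odds_ratios = np.exp(fit.params)                    # odds ratios, analogous to those reported above
    print(odds_ratios)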

Relevance: 100.00%

Abstract:

This PhD thesis focused on the development and application of an analytical methodology (Py-GC-MS) and of data-processing methods based on multivariate data analysis (chemometrics). The chromatographic and mass spectrometric data obtained with this technique are particularly suitable for interpretation by chemometric methods such as PCA (Principal Component Analysis) for data exploration and SIMCA (Soft Independent Modelling of Class Analogy) for classification. As a first approach, issues related to the field of cultural heritage were addressed, with particular attention to the differentiation of binders used in paintings. A marker of egg tempera, esterified phosphoric acid (a pyrolysis product of lecithin), was determined using HMDS (hexamethyldisilazane) rather than TMAH (tetramethylammonium hydroxide) as the derivatizing reagent. The validity of analytical pyrolysis as a tool to characterize and classify different types of bacteria was then verified. FAME chromatographic profiles represent an important tool for bacterial identification; however, because of the complexity of the chromatograms, the bacteria could be characterized only at the genus level, while differentiation at the species level was achieved by means of chemometric analysis. To perform this study, the normalized peak areas of the fatty acids were taken into account, and chemometric methods were applied to the experimental datasets. The results demonstrate the effectiveness of analytical pyrolysis and chemometric analysis for the rapid characterization of bacterial species. An application to samples of bacterial (Pseudomonas mendocina), fungal (Pleurotus ostreatus) and mixed biofilms was also performed. A comparison of the chromatographic profiles established the possibility to:
• differentiate the bacterial and fungal biofilms according to their FAME profiles;
• characterize the fungal biofilm by means of the typical pattern of pyrolytic fragments derived from saccharides present in the cell wall;
• identify the markers of bacterial and fungal biofilms in the same mixed-biofilm sample.
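
A minimal sketch, in Python rather than the software used in the thesis, of the chemometric exploration step: PCA on normalized fatty-acid peak areas. The file peak_areas.csv and its layout (one row per pyrogram, one column per fatty acid, a species label) are assumptions for illustration.

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    data = pd.read_csv("peak_areas.csv")
    X = data.drop(columns="species").values
    X = X / X.sum(axis=1, keepdims=True)        # normalize peak areas within each pyrogram
    X = StandardScaler().fit_transform(X)       # autoscale before PCA

    scores = PCA(n_components=2).fit_transform(X)
    for label, (pc1, pc2) in zip(data["species"], scores):
        print(f"{label}: PC1={pc1:.2f}, PC2={pc2:.2f}")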

Relevance: 100.00%

Abstract:

Advances in biomedical signal acquisition systems for motion analysis have led to low-cost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data, making it crucial to identify and extract the clinically relevant information. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders; the focus is on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson's disease subjects in different stages of the disease were considered. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques used in this thesis are feature selection (to find relevant information and discard useless or redundant data), classification, clustering, and regression. The aims were to identify subjects at high risk for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes, and automatically assess the severity of symptoms in the home setting.
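
A minimal sketch of a feature-selection-plus-classification pipeline of the kind described above, written with scikit-learn; the feature table gait_features.csv, its is_pd label column, and the choice of learner are hypothetical, not the thesis implementation.

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    features = pd.read_csv("gait_features.csv")        # assumed: one row of accelerometer features per subject
    y = features.pop("is_pd")                           # 1 = early PD, 0 = healthy control

    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=10),   # keep the 10 most informative features
                        SVC(kernel="linear"))
    print(cross_val_score(clf, features, y, cv=5).mean())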

Relevance: 100.00%

Abstract:

This thesis presents several data processing and compression techniques capable of addressing the strict requirements of wireless sensor networks. After a general overview of sensor networks, the energy problem is introduced, dividing the different energy reduction approaches according to the subsystem they try to optimize. To manage the complexity brought by these techniques, a quick overview of the most common middlewares for WSNs is given, describing in detail SPINE2, a framework for data processing in the node environment. The focus then shifts to in-network aggregation techniques, used to reduce the data sent by the network nodes in order to prolong the network lifetime as much as possible. Among the several techniques, the most promising approach is Compressive Sensing (CS). To investigate this technique, a practical implementation of the algorithm is compared against a simpler aggregation scheme, deriving a mixed algorithm able to successfully reduce the power consumption. The analysis then moves from compression implemented on single nodes to CS for signal ensembles, exploiting the correlations among sensors and nodes to improve compression and reconstruction quality. The two main techniques for signal ensembles, Distributed CS (DCS) and Kronecker CS (KCS), are introduced and compared against a common set of data gathered from real deployments, and the best trade-off between reconstruction quality and power consumption is investigated. The use of CS is also addressed when the signal of interest is sampled at a sub-Nyquist rate, evaluating the reconstruction performance. Finally, group-sparsity CS (GS-CS) is compared to another well-known technique for the reconstruction of signals from a highly sub-sampled version. These two frameworks are compared against a real dataset, and an insightful analysis of the trade-off between reconstruction quality and lifetime is given.
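
A minimal, self-contained Python sketch of the basic compressive sensing scheme discussed above: a sparse signal is compressed on the node side by a random projection and reconstructed at the sink with Orthogonal Matching Pursuit. Signal length, number of measurements, and sparsity are illustrative, and this is not the SPINE2 or thesis code.

    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    n, m, k = 256, 64, 8                      # signal length, measurements, sparsity

    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # k-sparse signal

    Phi = rng.standard_normal((m, n)) / np.sqrt(m)                # random sensing matrix
    y = Phi @ x                                                   # compressed measurements

    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(Phi, y)
    print("reconstruction error:", np.linalg.norm(omp.coef_ - x) / np.linalg.norm(x))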

Relevance: 100.00%

Abstract:

The advances that have characterized spatial econometrics in recent years are mostly theoretical and have not yet found extensive empirical application. In this work we aim to supply a review of the main tools of spatial econometrics and to show an empirical application of one of the most recently introduced estimators. Despite the numerous alternatives that econometric theory provides for the treatment of spatial (and spatiotemporal) data, empirical analyses are still limited by the lack of availability of the corresponding routines in statistical and econometric software. Spatiotemporal modeling represents one of the most recent developments in spatial econometric theory, and the finite sample properties of the estimators that have been proposed are currently being tested in the literature. We provide a comparison between some estimators (a quasi-maximum likelihood, QML, estimator and some GMM-type estimators) for a fixed effects dynamic panel data model under certain conditions, by means of a Monte Carlo simulation analysis. We focus on different settings, characterized either by fully stable or by quasi-unit root series. We also investigate the extent of the bias caused by a non-spatial estimation of a model when the data are characterized by different degrees of spatial dependence. Finally, we provide an empirical application of a QML estimator for a time-space dynamic model which includes a temporal, a spatial and a spatiotemporal lag of the dependent variable. This is done by choosing a relevant and prolific field of analysis in which spatial econometrics has so far found only limited space, in order to explore the value added of considering the spatial dimension of the data. In particular, we study the determinants of cropland values in the Midwestern U.S.A. over the years 1971-2009, taking the present value model (PVM) as the theoretical framework of analysis.
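
A minimal Python sketch of the data-generating process behind the time-space dynamic model mentioned above, with a temporal, a spatial and a spatiotemporal lag of the dependent variable; the weight matrix and coefficient values are illustrative assumptions, not those used in the Monte Carlo study.

    import numpy as np

    rng = np.random.default_rng(1)
    n, T = 25, 40                       # spatial units, time periods
    tau, rho, eta = 0.4, 0.3, 0.1       # temporal, spatial, spatiotemporal coefficients (stable case)

    W = rng.random((n, n)) * (rng.random((n, n)) < 0.2)      # random sparse "contiguity" weights
    np.fill_diagonal(W, 0.0)
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-standardize

    A = np.linalg.inv(np.eye(n) - rho * W)                   # resolves the simultaneity in y_t
    y = np.zeros((T, n))
    for t in range(1, T):
        eps = rng.standard_normal(n)
        y[t] = A @ (tau * y[t - 1] + eta * W @ y[t - 1] + eps)

    print("cross-sectional variance of the last periods:", y.var(axis=1)[-3:])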

Relevance: 100.00%

Abstract:

Analyzing and modeling relationships between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects in chemical datasets is a challenging task for scientific researchers in the field of cheminformatics. (Q)SAR model validation is therefore essential to ensure a model's predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities for approving the use of such models in real-world scenarios as an alternative testing method. At the same time, however, the question of how to validate a (Q)SAR model is still under discussion. In this work, we empirically compare k-fold cross-validation with external test set validation. The introduced workflow makes it possible to apply the built and validated models to large amounts of unseen data, and to compare the performance of the different validation approaches. Our experimental results indicate that cross-validation produces (Q)SAR models with higher predictivity than external test set validation and reduces the variance of the results. Statistical validation is important to evaluate the performance of (Q)SAR models, but it does not support the user in better understanding the properties of the model or the underlying correlations. We present the 3D molecular viewer CheS-Mapper (Chemical Space Mapper), which arranges compounds in 3D space such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, such as structural fragments as well as quantitative chemical descriptors. Comprehensive functionalities, including clustering, alignment of compounds according to their 3D structure, and feature highlighting, aid the chemist in better understanding patterns and regularities and in relating the observations to established scientific knowledge. Even though visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow for the investigation of model validation results are still lacking. We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. New functionalities in CheS-Mapper 2.0 facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. Our approach reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
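
A minimal scikit-learn sketch contrasting the two validation schemes compared in the paper, k-fold cross-validation versus a single external test set, on the same learner; the descriptor file descriptors.csv, its active label and the random forest learner are assumptions for illustration, not the authors' workflow.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, train_test_split

    data = pd.read_csv("descriptors.csv")       # assumed: chemical descriptors plus an "active" label
    y = data.pop("active")

    model = RandomForestClassifier(n_estimators=200, random_state=0)

    cv_scores = cross_val_score(model, data, y, cv=10)          # 10-fold cross-validation
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, test_size=0.3, random_state=0)
    external_score = model.fit(X_tr, y_tr).score(X_te, y_te)    # external test set validation

    print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
    print(f"External test accuracy: {external_score:.3f}")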

Relevance: 100.00%

Abstract:

In recent years, radar sensor networks for localization and tracking in indoor environments have generated more and more interest, especially for anti-intrusion security systems. These networks often use Ultra Wide Band (UWB) technology, which consists of sending very short (a few nanoseconds) impulse signals. This approach guarantees high resolution and accuracy, as well as other advantages such as low cost, low power consumption and robustness to narrow-band interference (jamming). In this thesis the overall data processing chain (implemented in the MATLAB environment) is discussed, starting from the experimental measurements provided by the sensor devices and ending with the 2D visualization of target movements over time, focusing mainly on detection and localization algorithms. Moreover, two different scenarios and both single- and multiple-target tracking are analyzed.
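
A minimal Python sketch of the localization step only (the thesis processing chain was implemented in MATLAB): given range estimates from a few UWB nodes at known positions, the 2D target position is recovered by non-linear least squares. Node positions and range values are illustrative.

    import numpy as np
    from scipy.optimize import least_squares

    nodes = np.array([[0.0, 0.0], [6.0, 0.0], [6.0, 4.0], [0.0, 4.0]])  # sensor positions (m)
    ranges = np.array([3.2, 4.7, 4.3, 2.5])                             # measured target distances (m)

    def residuals(p):
        # difference between predicted and measured target-to-node distances
        return np.linalg.norm(nodes - p, axis=1) - ranges

    estimate = least_squares(residuals, x0=np.array([3.0, 2.0])).x
    print("estimated target position:", estimate)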

Relevance: 100.00%

Abstract:

There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right-censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazards model with an unspecified baseline hazard, and that their joint distribution is obtained by transforming the survival outcomes to normal random variables whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of this class of semiparametric normal transformation models is that it provides a rich class of spatial survival models in which the regression coefficients have a population-average interpretation and the spatial dependence of survival times is conveniently modeled through the transformed variables by flexible normal random fields. We study the relationship between the spatial correlation structure of the transformed normal variables and the dependence measures of the original survival times. Direct nonparametric maximum likelihood estimation in such models is practically prohibitive due to the high-dimensional intractable integration of the likelihood function and the infinite-dimensional nuisance baseline hazard parameter. We hence develop a class of spatial semiparametric estimating equations, which conveniently estimate the population-level regression coefficients and the dependence parameters simultaneously. We study the asymptotic properties of the proposed estimators and show that they are consistent and asymptotically normal. The proposed method is illustrated with an analysis of data from the East Boston Asthma Study, and its performance is evaluated using simulations.
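
A minimal Python sketch of the normal transformation idea described above, using a simple exponential marginal in place of the Cox model with unspecified baseline hazard: spatially correlated latent normals are transformed into survival times with the desired marginals. Locations, correlation range and hazard rate are illustrative assumptions, and this is a simulation sketch, not the authors' estimating-equation method.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    coords = rng.uniform(0, 10, size=(50, 2))            # 50 subjects at random locations

    # exponential spatial correlation for the latent normal variables
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    Sigma = np.exp(-dists / 3.0)

    z = rng.multivariate_normal(np.zeros(len(coords)), Sigma)   # spatially correlated normals
    u = norm.cdf(z)                                             # transform to correlated uniforms
    rate = 0.1                                                  # marginal hazard (assumed)
    times = -np.log1p(-u) / rate                                # exponential marginal survival times

    print("first five correlated survival times:", np.round(times[:5], 2))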

Relevance: 100.00%

Abstract:

Dr. Rossi discusses the common errors that are made when fitting statistical models to data. He focuses on the planning, data analysis, and interpretation phases of a statistical analysis and highlights the errors that researchers commonly make in each of these phases. The implications of these errors are discussed, along with the methods that can be used to prevent them from occurring. A prescription for carrying out a correct statistical analysis is also presented.

Relevance: 100.00%

Abstract:

Questionnaire data may contain missing values because certain questions do not apply to all respondents. For instance, questions addressing particular attributes of a symptom, such as frequency, triggers or seasonality, are only applicable to those who have experienced the symptom, while for those who have not, responses to these items will be missing. This missing information does not fall into the category 'missing by design'; rather, the features of interest do not exist and cannot be measured regardless of survey design. Analysis of responses to such conditional items is therefore typically restricted to the subpopulation in which they apply. This article is concerned with joint multivariate modelling of responses to both unconditional and conditional items without restricting the analysis to this subpopulation. Such an approach is of interest when the distributions of both types of responses are thought to be determined by common parameters affecting the whole population. By integrating the conditional item structure into the model, inference can be based both on unconditional data from the entire population and on conditional data from the subjects for whom they exist. This approach opens new possibilities for multivariate analysis of such data. We apply it to latent class modelling and provide an example using data on respiratory symptoms (wheeze and cough) in children. Conditional data structures such as the one considered here are common in medical research settings and, although our focus is on latent class models, the approach can be applied to other multivariate models.
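
A minimal Python sketch of the likelihood idea for conditional items under an assumed two-class latent class model: the conditional item (here a symptom trigger) contributes to a subject's likelihood only when the unconditional symptom is present, so asymptomatic subjects stay in the analysis. Parameter values are illustrative and this is not the authors' implementation.

    def subject_likelihood(wheeze, trigger, class_weights, p_wheeze, p_trigger):
        """wheeze: 0/1; trigger: 0/1, or None when wheeze == 0 (item does not apply)."""
        lik = 0.0
        for k, w in enumerate(class_weights):
            contrib = p_wheeze[k] if wheeze else (1.0 - p_wheeze[k])
            if wheeze:   # conditional item exists only for symptomatic subjects
                contrib *= p_trigger[k] if trigger else (1.0 - p_trigger[k])
            lik += w * contrib
        return lik

    # illustrative parameter values for two latent classes
    print(subject_likelihood(1, 1, [0.6, 0.4], p_wheeze=[0.1, 0.7], p_trigger=[0.2, 0.8]))
    print(subject_likelihood(0, None, [0.6, 0.4], p_wheeze=[0.1, 0.7], p_trigger=[0.2, 0.8]))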

Relevance: 100.00%

Abstract:

Objective To determine the comparative effectiveness and safety of current maintenance strategies in preventing exacerbations of asthma. Design Systematic review and network meta-analysis using Bayesian statistics. Data sources Cochrane systematic reviews on chronic asthma, complemented by an updated search when appropriate. Eligibility criteria Trials of adults with asthma randomised to maintenance treatments of at least 24 weeks duration and that reported on asthma exacerbations in full text. Low dose inhaled corticosteroid treatment was the comparator strategy. The primary effectiveness outcome was the rate of severe exacerbations. The secondary outcome was the composite of moderate or severe exacerbations. The rate of withdrawal was analysed as a safety outcome. Results 64 trials with 59 622 patient years of follow-up comparing 15 strategies and placebo were included. For prevention of severe exacerbations, combined inhaled corticosteroids and long acting β agonists as maintenance and reliever treatment and combined inhaled corticosteroids and long acting β agonists in a fixed daily dose performed equally well and were ranked first for effectiveness. The rate ratios compared with low dose inhaled corticosteroids were 0.44 (95% credible interval 0.29 to 0.66) and 0.51 (0.35 to 0.77), respectively. Other combined strategies were not superior to inhaled corticosteroids and all single drug treatments were inferior to single low dose inhaled corticosteroids. Safety was best for conventional best (guideline based) practice and combined maintenance and reliever therapy. Conclusions Strategies with combined inhaled corticosteroids and long acting β agonists are most effective and safe in preventing severe exacerbations of asthma, although some heterogeneity was observed in this network meta-analysis of full text reports.
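
A minimal Python sketch of how a Bayesian rate ratio with a 95% credible interval can be computed for a simple two-arm comparison using a conjugate Gamma-Poisson model; the event counts and follow-up are invented for illustration, and the actual analysis above is a full network meta-analysis, not this two-arm calculation.

    import numpy as np

    rng = np.random.default_rng(3)

    # assumed example data: severe exacerbation counts and patient-years of follow-up
    events = {"combined ICS + LABA": 45, "low dose ICS": 90}
    pyears = {"combined ICS + LABA": 1000.0, "low dose ICS": 1000.0}

    # vague Gamma(0.001, 0.001) prior -> Gamma(a + events, rate = b + patient-years) posterior
    post = {arm: rng.gamma(0.001 + events[arm], 1.0 / (0.001 + pyears[arm]), 10000)
            for arm in events}

    ratio = post["combined ICS + LABA"] / post["low dose ICS"]
    print("rate ratio:", np.median(ratio),
          "95% CrI:", np.percentile(ratio, [2.5, 97.5]))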

Relevance: 100.00%

Abstract:

Many observed time series of the global radiosonde and PILOT networks exist as fragments distributed over different archives. Identifying and merging these fragments can enhance their value for studies on the three-dimensional spatial structure of climate change. The Comprehensive Historical Upper-Air Network (CHUAN version 1.7), which was substantially extended in 2013, and the Integrated Global Radiosonde Archive (IGRA) are the most important collections of upper-air measurements taken before 1958. CHUAN (tracked) balloon data start in 1900, with higher numbers from the late 1920s onward, whereas IGRA data start in 1937. However, a substantial fraction of those measurements were not taken at synoptic times (preferably 00:00 or 12:00 GMT) and were reported on altitude levels instead of standard pressure levels. To make them comparable with more recent data, the records have been brought to synoptic times and standard pressure levels using state-of-the-art interpolation techniques, employing geopotential information from the National Oceanic and Atmospheric Administration (NOAA) 20th Century Reanalysis (NOAA 20CR). From 1958 onward, the European Re-Analysis archives (ERA-40 and ERA-Interim) available at the European Centre for Medium-Range Weather Forecasts (ECMWF) are the main data sources. These are easier to use, but pilot data still have to be interpolated to standard pressure levels. Fractions of the same records distributed over different archives have been merged where necessary, taking care that the data remain traceable back to their original sources. Where possible, station IDs assigned by the World Meteorological Organization (WMO) have been allocated to the station records; for records which have never been identified by a WMO ID, a local ID above 100 000 has been assigned. The merged data set contains 37 wind records longer than 70 years and 139 temperature records longer than 60 years. It can be seen as a useful basis for further data processing steps, most notably homogenization and gridding, after which it should be a valuable resource for climatological studies. Homogeneity adjustments for wind using NOAA 20CR as a reference are described in Ramella Pralungo and Haimberger (2014). Reliable homogeneity adjustments for temperature beyond 1958 using a surface-data-only reanalysis such as NOAA 20CR as a reference have yet to be created. All the archives and metadata files are available in ASCII and netCDF format in the PANGAEA archive.
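
A minimal Python sketch of the interpolation step mentioned above: a temperature profile reported on arbitrary pressure levels is interpolated onto standard pressure levels in log-pressure. The sounding values are invented for illustration, and this is not the CHUAN/IGRA processing code.

    import numpy as np

    # assumed sounding: reported pressure levels (hPa) and temperatures (K)
    p_reported = np.array([1005.0, 920.0, 780.0, 640.0, 470.0, 305.0, 210.0])
    t_reported = np.array([288.1, 283.0, 274.5, 264.0, 248.2, 228.9, 218.5])

    p_standard = np.array([1000.0, 925.0, 850.0, 700.0, 500.0, 400.0, 300.0, 250.0])

    # np.interp needs increasing x, so interpolate in increasing log-pressure
    x = np.log(p_reported[::-1])
    t_standard = np.interp(np.log(p_standard[::-1]), x, t_reported[::-1])[::-1]

    for p, t in zip(p_standard, t_standard):
        print(f"{p:7.1f} hPa  {t:6.1f} K")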

Relevance: 100.00%

Abstract:

These data result from an investigation examining the interplay between dyadic rapport and consequential behavior-mirroring. Participants responded to a variety of interpersonally-focused pretest measures prior to their engagement in videotaped interdependent tasks (coded for interactional synchrony using Motion Energy Analysis [17,18]). A post-task evaluation of rapport and other related constructs followed each exchange. Four studies shared these same dependent measures, but asked distinct questions: Study 1 (Ndyad = 38) explored the influence of perceived responsibility and gender-specificity of the task; Study 2 (Ndyad = 51) focused on dyad sex-makeup; Studies 3 (Ndyad = 41) and 4 (Ndyad = 63) examined cognitive load impacts on the interactions. Versions of the data are structured with both individual and dyad as the unit of analysis. Our data possess strong reuse potential for theorists interested in dyadic processes and are especially pertinent to questions about dyad agreement and interpersonal perception / behavior association relationships.
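
A minimal Python sketch of one common way to quantify interactional synchrony from two motion-energy time series: the peak of the windowed, normalized cross-correlation. The toy series and lag window are assumptions, and this is not the Motion Energy Analysis coding pipeline used in the studies.

    import numpy as np

    rng = np.random.default_rng(4)
    frames = 1500                                   # e.g. one minute at 25 fps
    a = rng.standard_normal(frames).cumsum()        # motion energy, participant A (toy series)
    b = np.roll(a, 12) + rng.standard_normal(frames) * 0.5   # B loosely mirrors A with a lag

    def synchrony(x, y, max_lag=50):
        # peak Pearson correlation over lags in [-max_lag, max_lag]
        x = (x - x.mean()) / x.std()
        y = (y - y.mean()) / y.std()
        corrs = [np.corrcoef(x[max(0, l):len(x) + min(0, l)],
                             y[max(0, -l):len(y) - max(0, l)])[0, 1]
                 for l in range(-max_lag, max_lag + 1)]
        return max(corrs)

    print("synchrony index:", round(synchrony(a, b), 3))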

Relevance: 100.00%

Abstract:

The article proposes granular computing as a theoretical, formal and methodological basis for the newly emerging research field of human–data interaction (HDI). We argue that the ability to represent and reason with information granules is a prerequisite for data legibility. As such, it allows the research agenda of HDI to be extended to encompass the topic of collective intelligence amplification, which is seen as an opportunity afforded by today's increasingly pervasive computing environments. As an example of collective intelligence amplification in HDI, we introduce a collaborative urban planning use case in a cognitive city environment and show how an iterative process of user input and human-oriented automated data processing can support collective decision making. As a basis for automated human-oriented data processing, we use the spatial granular calculus of granular geometry.