914 results for statistical data analysis
Abstract:
BACKGROUND: Sudden cardiac death (SCD) among the young is a rare and devastating event, but its exact incidence in many countries remains unknown. An autopsy is recommended in every case because some of the cardiac pathologies may have a genetic origin, which can have an impact on the living family members. The aims of this retrospective study, completed in the canton of Vaud, Switzerland, were to determine both the incidence of SCD and the autopsy rate for individuals from 5 to 39 years of age. METHODS: The study was conducted from 2000 to 2007 on the basis of official statistics and analysis of the International Classification of Diseases codes for potential SCDs and other deaths that might have been due to cardiac disease. RESULTS: During the 8-year study period there was an average of 292,546 persons aged 5-39 and a total of 1122 deaths, certified as potential SCDs in 3.6% of cases. The calculated incidence is 1.71/100,000 person-years (2.73 for men and 0.69 for women). If all possible cases of SCD (unexplained deaths, drowning, traffic accidents, etc.) are included, the incidence increases to 13.67/100,000 person-years. However, the quality of the officially available data was insufficient to provide an accurate incidence of SCD as well as autopsy rates. The presumed autopsy rate of sudden deaths classified as diseases of the circulatory system is 47.5%. For deaths of unknown cause (11.1% of the deaths), an autopsy was conducted in 13.7% of the cases according to the codified data. CONCLUSIONS: The incidence of presumed SCD in the canton of Vaud, Switzerland, is comparable to the data published in the literature for other geographic regions but may be underestimated, as it does not take into account other potential SCDs such as unexplained deaths. Increasing the autopsy rate of SCD in the young, better management of the information obtained from autopsies, and the development of a structured registry could improve the reliability of the statistical data, optimize the diagnostic procedures, and strengthen the preventive measures for the family members.
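As a quick arithmetic check of the reported figures, the sketch below recomputes the incidence from the person-years and the proportion of deaths certified as potential SCD; the variable names are ours, and the small discrepancy with the published 1.71 reflects rounding of the 3.6% figure.

```python
# Illustrative check of the incidence figures quoted above; names are ours.
population_5_39 = 292_546        # average population aged 5-39, canton of Vaud
years = 8                        # study period 2000-2007
total_deaths = 1122
potential_scd_fraction = 0.036   # 3.6% of deaths certified as potential SCD

person_years = population_5_39 * years
potential_scd_deaths = total_deaths * potential_scd_fraction

incidence = potential_scd_deaths / person_years * 100_000
print(f"{incidence:.2f} per 100,000 person-years")
# Prints ~1.73; the abstract reports 1.71, the gap coming from rounding of
# the 3.6% figure (an exact count of 40 deaths would give 1.71).
```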
Abstract:
Background and purpose: Decision making (DM) has been defined as the process through which a person forms preferences, selects and executes actions, and evaluates the outcome related to a selected choice. This ability is an important factor for adequate behaviour in everyday life. DM impairment in multiple sclerosis (MS) has been reported previously. The purpose of the present study was to assess DM in patients with MS at the earliest clinically detectable time point of the disease. Methods: Patients with definite (n=109) or possible (clinically isolated syndrome, CIS; n=56) MS, a short disease duration (mean 2.3 years) and minor neurological disability (mean EDSS 1.8) were compared to 50 healthy controls aged 18 to 60 years (mean age 32.2) using the Iowa Gambling Task (IGT). Subjects had to select a card from any of 4 decks (A/B [disadvantageous]; C/D [advantageous]). The game consisted of 100 trials, grouped into blocks of 20 cards for data analysis. Skill in DM was assessed by means of a learning index (LI), defined as the difference between the average of the last three block indexes and the average of the first two block indexes (LI = [(BI3 + BI4 + BI5)/3 - (BI1 + BI2)/2]). Non-parametric tests were used for statistical analysis. Results: LI was higher in the control group (0.24, SD 0.44) than in the MS group (0.21, SD 0.38), but the difference did not reach statistical significance (p=0.7). Interesting differences were detected when MS patients were grouped according to phenotype. A trend towards a difference between MS subgroups and controls was observed for LI (p=0.06), which became significant between MS subgroups (p=0.03). CIS patients whose MS diagnosis was confirmed by a second relapse after study entry showed impaired IGT performance in comparison to the other CIS (p=0.01) and definite MS (p=0.04) patients. In contrast, CIS patients who did not entirely fulfil the McDonald criteria at inclusion and had no relapse during the study showed a normal learning pattern on the IGT. Finally, comparing MS patients who developed relapses after study entry, those who remained clinically stable, and controls, we observed impaired performance only in relapsing patients in comparison to stable patients (p=0.008) and controls (p=0.03). Discussion: These results suggest a role for both MS relapse activity and disease heterogeneity (i.e. subclinical severity or activity of MS) in the impairment of decision making.
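A minimal sketch of the learning index defined above, assuming five block indexes BI1-BI5 (one per block of 20 IGT trials); how each block index itself is scored is not specified here, so the example values are purely illustrative.

```python
# Sketch of the learning index (LI) described above.
# A block index is assumed to summarise advantageous vs. disadvantageous
# choices within one block of 20 trials; the exact scoring is our assumption.

def learning_index(block_indexes):
    """LI = mean of the last three block indexes minus mean of the first two."""
    bi1, bi2, bi3, bi4, bi5 = block_indexes
    return (bi3 + bi4 + bi5) / 3 - (bi1 + bi2) / 2

# Example: performance improving over the task yields a positive LI.
print(learning_index([-0.2, 0.0, 0.2, 0.3, 0.4]))  # 0.4
```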
Abstract:
This paper describes the development of an analytical technique for arsenic analysis that is based on genetically modified bioreporter bacteria bearing a gene encoding the production of a green fluorescent protein (gfp). Upon exposure to arsenic (in the aqueous form of arsenite), the bioreporter's production of the fluorescent reporter molecule is monitored spectroscopically. We compared the response measured as a function of time and concentration by steady-state fluorimetry (SSF) to that measured by epi-fluorescent microscopy (EFM). SSF is a bulk technique and as such inherently yields less information, whereas EFM monitors the response of many individual cells simultaneously, and the data can be processed in terms of population averages or subpopulations. For the bioreporter strain used here, as well as for the literature we cite, the two techniques exhibit similar performance characteristics. The results presented here show that the EFM technique can compete with SSF and shows substantially more promise for future improvement; developing optimized methods of EFM image analysis and statistical data treatment remains a matter of research interest. EFM is a conduit for understanding the dynamics of individual-cell versus population response, which is not only of research interest but is also promising in practical terms for developing micro-scale analysis.
Abstract:
This paper presents general problems and approaches for spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of machine learning algorithms is that they learn from empirical data and can be used in cases where the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most machine learning algorithms are universal and adaptive modelling tools developed to solve the basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of analysing and modelling geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in high-dimensional geo-feature spaces, i.e. when the dimension of the space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of the models concerns taking into account real-space constraints such as geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve the modelling of environmental phenomena by taking geo-manifolds into account. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different nonlinear feature selection/feature extraction tools. To demonstrate the application of machine learning algorithms, several interesting case studies are considered: digital soil mapping using SVM; automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides); assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of the spaces considered varies from 2 to more than 30. Figures 1, 2 and 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
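As an illustration only (not the authors' implementation), the sketch below shows the kind of workflow described: support vector regression on spatial coordinates plus an additional geo-feature, assuming scikit-learn and synthetic data.

```python
# Minimal sketch of SVM-based spatial mapping, assuming scikit-learn.
# The synthetic data stand in for, e.g., soil or pollution measurements
# with coordinates (x, y) plus an extra geo-feature such as elevation.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))            # columns: x, y, elevation
y = np.sin(X[:, 0]) + 0.1 * X[:, 2] + rng.normal(0, 0.05, 200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X, y)

# Predict on a regular grid at a fixed elevation to produce a map.
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel(), np.full(gx.size, 5.0)])
prediction_map = model.predict(grid).reshape(gx.shape)
```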
Abstract:
The geographic information system approach has permitted the integration of demographic, socio-economic and environmental data, allowing correlation of information from several databases. In the current work, the occurrence of human and canine visceral leishmaniasis and of the insect vector (Lutzomyia longipalpis), as well as biogeographic information related to the 9 areas that comprise the city of Belo Horizonte, Brazil, between April 2001 and March 2002, were correlated and georeferenced. Using this technique it was possible to define concentration loci of canine leishmaniasis in the following regions: East, Northeast, Northwest, West, and Venda Nova. For human leishmaniasis, however, it was not possible to perform the same analysis. Data analysis also showed that 84.2% of the human leishmaniasis cases were associated with canine leishmaniasis cases. Concerning the biogeographic analysis (altitude, area of vegetation influence, hydrography, and areas of poverty), only altitude was shown to influence the emergence of leishmaniasis cases. A total of 4673 canine leishmaniasis cases and 64 human leishmaniasis cases were georeferenced, of which 67.5% and 71.9%, respectively, were living between 780 and 880 m above sea level. At these same altitudes, a large number of phlebotomine sand flies were collected. We therefore suggest control measures for leishmaniasis in the city of Belo Horizonte that give priority to canine leishmaniasis foci and regions at altitudes between 780 and 880 m.
Abstract:
In this paper we look at how web-based social software can be used for qualitative data analysis of online peer-to-peer learning experiences. Specifically, we propose to use Cohere, a web-based social sense-making tool, to observe, track, annotate and visualize discussion group activities in online courses. We define a specific methodology for data observation and structuring, and present results of the analysis of peer interactions conducted in the discussion forum of a real case study of a P2PU course. Finally, we discuss how network visualization and analysis can be used to gain a better understanding of the peer-to-peer learning experience. To do so, we provide preliminary insights into the social, dialogical and conceptual connections that have been generated within one online discussion group.
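To make the network-analysis step concrete, here is a small sketch (independent of the Cohere tool) that builds a reply network from hypothetical forum interactions and computes simple centrality measures, assuming the networkx library.

```python
# Illustrative sketch only: a reply network for one discussion thread.
import networkx as nx

# Hypothetical (author, replies_to) pairs extracted from forum posts.
interactions = [("ana", "ben"), ("carl", "ana"), ("ben", "carl"),
                ("dana", "ana"), ("ben", "ana")]

G = nx.DiGraph()
G.add_edges_from(interactions)

# Simple measures often used to characterise peer-to-peer participation:
# who is most connected, and who bridges otherwise separate participants.
print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))
```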
Abstract:
Planners in public and private institutions would like coherent forecasts of the components of age-specific mortality, such as causes of death. This has been difficult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It has been shown that cause-specific mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and forecasts the density of deaths in the life table. Since these values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit sum constraint is honoured. The structure of the best-known single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan.
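The core compositional step described above can be sketched as follows: transform a death density to real space with the centred logratio (clr), model or forecast there, and back-transform so the unit-sum constraint is honoured. This is a generic CoDA illustration under our own simplifications, not the paper's Lee-Carter-style model.

```python
# Generic sketch of the transform / back-transform step for a death density.
import numpy as np

def clr(x):
    """Centred logratio transform of a positive composition."""
    logx = np.log(x)
    return logx - logx.mean()

def clr_inverse(z):
    """Back-transform to a positive composition summing to 1 (closure)."""
    e = np.exp(z)
    return e / e.sum()

d = np.array([0.05, 0.15, 0.30, 0.35, 0.15])   # toy life-table death density
z = clr(d)                                     # unconstrained real values
z_forecast = z + np.array([0.0, 0.1, -0.05, 0.02, 0.0])  # any real-space model
d_forecast = clr_inverse(z_forecast)           # sums to 1 again
print(d_forecast, d_forecast.sum())
```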
Abstract:
The theory of compositional data analysis is often focused on the composition only. However, in practical applications we often treat a composition together with covariables on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used. Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections which, on the one hand, show the compositional geometry and, on the other hand, are still comprehensible to a non-expert analyst and readable for all locations and scales of the data. This is done, e.g., by defining special balance displays with carefully selected axes. Following this idea, we need to ask systematically how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation and for how they might be taught to analysts who are not already experts in multivariate analysis.
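As a sketch of the kind of interpretable axis suggested above, the function below computes a single balance coordinate contrasting two hand-picked groups of parts; the grouping and data are invented for illustration.

```python
# One balance coordinate: the normalised logratio of the geometric means of
# two user-chosen groups of parts (an ilr-type coordinate with a readable axis).
import numpy as np

def balance(x, num_idx, den_idx):
    """Balance between the parts in num_idx and the parts in den_idx."""
    r, s = len(num_idx), len(den_idx)
    g_num = np.exp(np.mean(np.log(x[num_idx])))
    g_den = np.exp(np.mean(np.log(x[den_idx])))
    return np.sqrt(r * s / (r + s)) * np.log(g_num / g_den)

x = np.array([0.2, 0.3, 0.1, 0.4])                  # a 4-part composition
print(balance(x, num_idx=[0, 1], den_idx=[2, 3]))   # one axis for a balance display
```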
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA arises when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal component analysis (PCA) with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
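A rough sketch of one of the compared strategies, PCA after a clr-type transformation of densities discretised on a common grid; the discretisation and the synthetic densities are our assumptions, not the paper's data.

```python
# Discretise each density on a common grid, apply a clr-type transformation,
# then run ordinary PCA via SVD. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-4, 4, 100)

# Toy sample of densities: Gaussians with varying means and spreads.
means, sds = rng.normal(0, 1, 30), rng.uniform(0.8, 1.5, 30)
dens = np.array([np.exp(-(grid - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
                 for m, s in zip(means, sds)])
dens /= dens.sum(axis=1, keepdims=True)          # closure on the grid

clr_data = np.log(dens) - np.log(dens).mean(axis=1, keepdims=True)

centred = clr_data - clr_data.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U * S                                   # low-dimensional representation
explained = S ** 2 / np.sum(S ** 2)              # variance explained per component
```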
Abstract:
In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies of archaeological materials, will be approached from the standpoint of measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case of projective geometry, in which the projective coordinates lie in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study on the architecture of Early Christian churches dating back to the 5th-7th centuries AD.
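The equivalence-class point can be illustrated numerically: rescaling the raw measurements (moving along a ray in the positive orthant) leaves every logratio, and hence any logratio-based analysis, unchanged. The values below are invented.

```python
# Rescaled raw chemical measurements represent the same composition:
# all pairwise logratios are invariant under a positive scale factor.
import numpy as np

raw = np.array([12.0, 3.0, 45.0, 1.5])     # e.g. element concentrations
rescaled = 7.3 * raw                        # different units / different total

logratios = lambda x: np.log(x[:, None] / x[None, :])
print(np.allclose(logratios(raw), logratios(rescaled)))  # True
```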
Abstract:
This analysis was stimulated by a real data analysis problem involving household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities, etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household-level variables, so we need to be able to model this dependence. The other tricky question is that, with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables it may be a matter of chance whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually occur in situations where any non-zero expenditure is not small. While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
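A small pandas sketch of the conditional question posed above, with invented data and column names: close the subcomposition that excludes alcohol/tobacco and compare teetotal with non-teetotal households.

```python
# Invented expenditure data; a formal analysis would work with logratios.
import pandas as pd

df = pd.DataFrame({
    "housing": [400, 350, 500, 450],
    "food":    [200, 220, 180, 210],
    "alcohol": [0,    50,   0,  30],   # zeros here are structural (teetotal)
    "other":   [100, 120,  90, 110],
})

teetotal = df["alcohol"] == 0

sub = df[["housing", "food", "other"]]        # subcomposition without alcohol
sub = sub.div(sub.sum(axis=1), axis=0)        # closure to proportions

# Compare the subcomposition between teetotal and non-teetotal households.
print(sub.groupby(teetotal).mean())
```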
Abstract:
In standard multivariate statistical analysis, common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation, and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover, we can identify a number of hypotheses of interest involving the specification of particular perturbations or differences between perturbations, as well as hypotheses of subcompositional stability. We identify the two problems as the counterparts of the analysis of paired-comparison or split-plot experiments and of separate-sample comparative experiments in the jargon of standard multivariate analysis. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses.
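For readers unfamiliar with the simplicial operations mentioned above, a minimal sketch of perturbation and closure follows; the compositions are invented, and a formal test of the hypotheses would of course go further.

```python
# Perturbation is the closed componentwise product of two compositions; the
# compositional "change" taking x to y is the perturbation p = closure(y / x).
import numpy as np

def closure(x):
    return x / x.sum()

def perturb(x, p):
    """Perturbation of composition x by composition p."""
    return closure(x * p)

x = closure(np.array([0.2, 0.5, 0.3]))
y = closure(np.array([0.3, 0.4, 0.3]))
p = closure(y / x)                      # perturbation describing the change
print(np.allclose(perturb(x, p), y))    # True
```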
Abstract:
INTRODUCTION: According to several series, hospital hyponutrition affects 30-50% of hospitalized patients. This high prevalence justifies the need for early detection from admission. There are several classical screening tools, which show important limitations in their systematic application in daily clinical practice. OBJECTIVES: To analyze the relationship between hyponutrition, detected by our screening method, and mortality, hospital stay, or re-admissions; to analyze, as well, the relationship between hyponutrition and the prescription of nutritional support; to compare different nutritional screening methods at admission in a random sample of hospitalized patients; and to validate the INFORNUT method for nutritional screening. MATERIAL AND METHODS: In a phase prior to the study design, a retrospective analysis of data from the year 2003 was carried out in order to establish the situation of hyponutrition in the Virgen de la Victoria Hospital, Malaga, gathering data from the MBDS (Minimal Basic Data Set), laboratory analysis of nutritional risk (FILNUT filter), and prescription of nutritional support. In the experimental phase, a cross-sectional cohort study was done with a random sample of 255 patients in May 2004. An anthropometrical study, Subjective Global Assessment (SGA), Mini-Nutritional Assessment (MNA), Nutritional Risk Screening (NRS), Gassull's method, CONUT and INFORNUT were performed. The settings of the INFORNUT filter were: albumin < 3.5 g/dL, and/or total proteins < 5 g/dL, and/or prealbumin < 18 mg/dL, with or without total lymphocyte count < 1,600 cells/mm3 and/or total cholesterol < 180 mg/dL. In order to compare the different methods, a gold standard was created based on the recommendations of the SENPE on anthropometrical and laboratory data. The statistical association analysis was done with the chi-squared test (α: 0.05) and agreement with the kappa index. RESULTS: In the study performed in the previous phase, the prevalence of hospital hyponutrition was 53.9%. One thousand six hundred and forty-four patients received nutritional support, of whom 66.9% suffered from hyponutrition. We also observed that hyponutrition is one of the factors favoring an increase in mortality (hyponourished patients 15.19% vs. non-hyponourished 2.58%), hospital stay (hyponourished patients 20.95 days vs. non-hyponourished 8.75 days), and re-admissions (hyponourished patients 14.30% vs. non-hyponourished 6%). The results from the experimental study are as follows: the prevalence of hyponutrition obtained with the gold standard was 61%, and with INFORNUT 60%. Agreement between INFORNUT, CONUT, and Gassull is good or very good among themselves (κ: 0.67 INFORNUT with CONUT; κ: 0.94 INFORNUT with Gassull) and with the gold standard (κ: 0.83 INFORNUT; κ: 0.64 CONUT; κ: 0.89 Gassull). However, the structured tests (SGA, MNA, NRS) show low agreement indexes with the gold standard and with the laboratory or mixed tests (Gassull), although they show a low to intermediate level of agreement when compared with each other (κ: 0.489 NRS with SGA). INFORNUT shows a sensitivity of 92.3%, a positive predictive value of 94.1%, and a specificity of 91.2%. After the filter phase, a preliminary report is sent, to which anthropometrical and intake data are added, and a Nutritional Risk Report is produced. CONCLUSIONS: The hyponutrition prevalence in our study (60%) is similar to that found by other authors. Hyponutrition is associated with increased mortality, hospital stay, and re-admission rate.
No existing tool has proven effective for the early detection of hyponutrition in the hospital setting without important limitations to its applicability. FILNUT, as the first phase of the INFORNUT filter process, represents a valid tool: it is sensitive and specific for nutritional screening at admission. The main advantages of the process would be the early detection of patients at risk of hyponutrition; a teaching and awareness-raising function for health care staff, involving them in the nutritional assessment of their patients; and the recording of the diagnosis of hyponutrition and of the need for nutritional support in the discharge report, to be registered by the Clinical Documentation Department. Therefore, INFORNUT would be a universal screening method with a good cost-effectiveness ratio.
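The laboratory thresholds quoted in the abstract can be written down as a simple rule; the function below is our reading of the filter (in particular, we assume the lymphocyte and cholesterol criteria accompany, but do not by themselves trigger, the flag).

```python
# Sketch of the FILNUT/INFORNUT laboratory filter as quoted in the abstract.
# Thresholds come from the text; the combination logic is our interpretation.

def filnut_flag(albumin_g_dl=None, total_protein_g_dl=None, prealbumin_mg_dl=None):
    """True if at least one protein marker falls below its quoted threshold."""
    return any([
        albumin_g_dl is not None and albumin_g_dl < 3.5,              # g/dL
        total_protein_g_dl is not None and total_protein_g_dl < 5.0,  # g/dL
        prealbumin_mg_dl is not None and prealbumin_mg_dl < 18.0,     # mg/dL
    ])

# Total lymphocytes < 1,600 cells/mm3 and/or cholesterol < 180 mg/dL may
# accompany these markers ("with or without"); on our reading they do not
# trigger the flag on their own.

print(filnut_flag(albumin_g_dl=3.2))                          # True
print(filnut_flag(albumin_g_dl=3.8, prealbumin_mg_dl=20.0))   # False
```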
Abstract:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
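A toy sketch of the idea (not the paper's estimator): when the domain is split into known, effectively independent intervals, extract a leading direction within each interval and assemble components supported on a single interval.

```python
# Interval-restricted components for curves whose domain splits into two
# effectively independent halves. Data and partition are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 100)
blocks = [slice(0, 50), slice(50, 100)]        # assumed known interval partition

# Synthetic curves: independent random amplitudes on each half of the domain.
n = 40
X = np.zeros((n, grid.size))
X[:, blocks[0]] = rng.normal(size=(n, 1)) * np.sin(2 * np.pi * grid[blocks[0]])
X[:, blocks[1]] = rng.normal(size=(n, 1)) * np.cos(2 * np.pi * grid[blocks[1]])

components = []
for b in blocks:
    Xb = X[:, b] - X[:, b].mean(axis=0)
    _, _, Vt = np.linalg.svd(Xb, full_matrices=False)
    comp = np.zeros(grid.size)
    comp[b] = Vt[0]                            # leading direction, zero elsewhere
    components.append(comp)
```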
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zeros is usually understood as "a trace too small to measure", it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts (and thus the metric properties) should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is "natural" in the sense that it recovers the "true" composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, a substitution method for missing values in compositional data sets is introduced in the same paper.
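A minimal sketch of the multiplicative replacement described above: zeros are replaced by a small value below the detection limit and the non-zero parts are rescaled so that the total and the ratios among non-zero parts are preserved.

```python
# Multiplicative replacement of rounded zeros in a composition summing to kappa.
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace zeros by delta, rescaling the non-zero parts to keep the total."""
    x = np.asarray(x, dtype=float)
    kappa = x.sum()
    zeros = x == 0
    return np.where(zeros, delta, x * (1 - delta * zeros.sum() / kappa))

x = np.array([0.0, 0.25, 0.35, 0.40])          # composition with one rounded zero
r = multiplicative_replacement(x, delta=0.005)
print(r, r.sum())                              # total is still 1
print(r[2] / r[1], x[2] / x[1])                # ratios of non-zero parts preserved
```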