905 results for longitudinal data-analysis


Relevance:

90.00%

Abstract:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
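The quantity that an EM procedure for binned and truncated data drives upward can be sketched in the univariate case. This is a minimal illustration, not the authors' implementation: it assumes Gaussian components, and the function names and the convention that everything outside the outermost bin edges is truncated are chosen here for the example. Each bin probability is the integral of the mixture density over the bin, renormalised by the mass of the observed region. In one dimension that integral is a difference of CDF values; in the multivariate case it becomes the multidimensional integral whose cost the abstract discusses.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a univariate normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sigmas):
    """CDF of a Gaussian mixture evaluated at x."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


def bin_probabilities(edges, weights, means, sigmas):
    """Probability of each bin, conditional on falling inside [edges[0], edges[-1]].

    Data outside the outermost edges are assumed to be truncated (never observed),
    so the bin masses are renormalised by the mass of the observed region.
    """
    cdf = [mixture_cdf(e, weights, means, sigmas) for e in edges]
    observed_mass = cdf[-1] - cdf[0]
    return [(hi - lo) / observed_mass for lo, hi in zip(cdf[:-1], cdf[1:])]


def binned_truncated_loglik(edges, counts, weights, means, sigmas):
    """Multinomial log-likelihood of the bin counts (up to an additive constant)."""
    probs = bin_probabilities(edges, weights, means, sigmas)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


if __name__ == "__main__":
    edges = [-1.0, 0.0, 1.0]   # two bins; everything outside [-1, 1] is truncated
    counts = [5, 5]
    print(binned_truncated_loglik(edges, counts, [1.0], [0.0], [1.0]))
```

With a single standard normal component and bins symmetric about zero, each bin receives probability 0.5 after renormalisation, which gives a quick sanity check.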

Relevance:

90.00%

Abstract:

Objective: To explore relationships between physical activity and mental health cross-sectionally and longitudinally in a large cohort of older Australian women. Method: Women in their 70s participating in the Australian Longitudinal Study on Women's Health responded in 1996 (aged 70-75) and in 1999 (aged 73-78). Cross-sectional data were analyzed for 10,063 women and longitudinal data for 6472. Self-reports were used to categorize women into four categories of physical activity at each time point as well as to define four physical activity transition categories across the 3-year period. Outcome variables for the cross-sectional analyses were the mental health component score (MCS) and mental health subscales of the Medical Outcomes Study Short Form (SF-36). The longitudinal analyses focused on changes in these variables. Confounders included the physical health component scale (PCS) of the SF-36, marital status, body mass index (BMI) and life events. Adjustment for baseline scores was included for the longitudinal analyses. Results: Cross-sectionally, higher levels of physical activity were associated with higher scores on all dependent variables, both with and without adjustment for confounders. Longitudinally, the effects were weaker, but women who had made a transition from some physical activity to none generally showed more negative changes in emotional well-being than those who had always been sedentary, while those who maintained or adopted physical activity had better outcomes. Conclusion: Physical activity is associated with emotional well-being among a population cohort of older women both cross-sectionally and longitudinally, supporting the need for the promotion of appropriate physical activity in this age group. (C) 2003 Elsevier Science Inc. All rights reserved.

Relevance:

90.00%

Abstract:

An increasing number of studies show that glycogen-accumulating organisms (GAOs) can survive and may indeed proliferate under the alternating anaerobic/aerobic conditions found in EBPR systems, thus competing strongly with the polyphosphate-accumulating organisms (PAOs). Understanding their behavior in a mixed PAO and GAO culture under various operational conditions is essential for developing operating strategies that disadvantage the growth of this group of unwanted organisms. A model-based data analysis method is developed in this paper for the study of the anaerobic PAO and GAO activities in a mixed PAO and GAO culture. The method primarily makes use of the hydrogen ion production rate and the carbon dioxide transfer rate resulting from the acetate uptake processes by PAOs and GAOs, measured with a recently developed titration and off-gas analysis (TOGA) sensor. The method is demonstrated using the data from a laboratory-scale sequencing batch reactor (SBR) operated under alternating anaerobic and aerobic conditions. The data analysis using the proposed method strongly indicates a coexistence of PAOs and GAOs in the system, which was independently confirmed by fluorescent in situ hybridization (FISH) measurement. The model-based analysis also allowed the identification of the respective acetate uptake rates by PAOs and GAOs, along with a number of kinetic and stoichiometric parameters involved in the PAO and GAO models. The excellent fit between the model predictions and the experimental data not involved in parameter identification shows that the parameter values found are reliable and accurate. It also demonstrates that the current anaerobic PAO and GAO models are able to accurately characterize the PAO/GAO mixed culture obtained in this study.
This is of major importance as no pure culture of either PAOs or GAOs has been reported to date, and hence the current PAO and GAO models were developed for the interpretation of experimental results of mixed cultures. The proposed method is readily applicable for detailed investigations of the competition between PAOs and GAOs in enriched cultures. However, the fermentation of organic substrates carried out by ordinary heterotrophs needs to be accounted for when the method is applied to the study of PAO and GAO competition in full-scale sludges. (C) 2003 Wiley Periodicals, Inc.

Relevance:

90.00%

Abstract:

Objectives: The purpose of this article is to identify differences between surveys using paper and online questionnaires. The author has extensive knowledge of opinion questions in survey-based research, e.g. the limits of postal and online questionnaires. Methods: Paper and online questionnaires were used in the physician studies carried out in 1995 (doctors graduated in 1982-1991), 2000 (doctors graduated in 1982-1996), 2005 (doctors graduated in 1982-2001) and 2011 (doctors graduated in 1977-2006), and in a study of 457 family doctors in 2000. The response rates were 64%, 68%, 64%, 49% and 73%, respectively. Results: The physician studies showed differences between the methods. These differences were related to the use of paper-based versus online questionnaires and to the response rate. The online survey gave a lower response rate than the postal survey. The major advantages of the online survey were the short response time, the very low financial resource needs, and the fact that data were loaded directly into the data analysis software, which saved the time and resources associated with data entry. Conclusions: This article helps researchers plan the study design and choose the right data collection method.

Relevance:

90.00%

Abstract:

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. The Attribution-NonCommercial (CC BY-NC) license lets others remix, tweak, and build upon the work non-commercially; new works must also acknowledge the original work and be non-commercial.

Relevance:

90.00%

Abstract:

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Relevance:

90.00%

Abstract:

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and the selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
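For contrast, the two-step baseline that the abstract compares against (fit one model per candidate number of clusters, then pick the best BIC) can be sketched as follows. This is not the MML-based algorithm of the paper. It assumes that each cluster is a product of independent categorical distributions, one per variable, and that the maximised log-likelihood of every candidate model is already available. The function names and the numbers in the usage block are hypothetical.

```python
import math


def n_free_parameters(n_clusters, categories_per_variable):
    """Free parameters of a mixture of independent categorical distributions.

    (n_clusters - 1) mixing proportions, plus (C_j - 1) probabilities for every
    variable j in every cluster.
    """
    mixing = n_clusters - 1
    per_cluster = sum(c - 1 for c in categories_per_variable)
    return mixing + n_clusters * per_cluster


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (Schwarz, 1978); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(candidates, categories_per_variable, n_obs):
    """candidates maps a number of clusters K to the maximised log-likelihood for K."""
    scores = {
        k: bic(ll, n_free_parameters(k, categories_per_variable), n_obs)
        for k, ll in candidates.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Hypothetical log-likelihoods for K = 1, 2, 3 on 50 observations of two
    # categorical variables with 3 and 2 categories.
    best_k, scores = select_by_bic({1: -120.0, 2: -100.0, 3: -99.0}, [3, 2], 50)
    print(best_k, scores)
```

Every candidate K requires its own full model fit before the comparison can be made, which is the cost the integrated approach is designed to avoid.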

Relevance:

90.00%

Abstract:

Catastrophic events, such as wars and terrorist attacks, tornadoes and hurricanes, earthquakes, tsunamis, floods and landslides, are always accompanied by a large number of casualties. The size distributions of these casualties have separately been shown to follow approximate power law (PL) distributions. In this paper, we analyze the statistical distributions of the number of victims of catastrophic phenomena, in particular terrorism, and find double PL behavior. This means that the data sets are better approximated by two PLs than by a single one. We plot the PL parameters corresponding to several events and observe an interesting pattern in the charts, where the lines that connect each pair of points defining the double PLs are almost parallel to each other. A complementary data analysis is performed by computing the entropy. The results reveal relationships hidden in the data that may trigger a future comprehensive explanation of these types of phenomena.
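The two quantities named in the abstract, a power-law exponent and an entropy, can be illustrated with standard estimators. This is a sketch under assumptions and not the authors' procedure: it uses the continuous maximum-likelihood estimator for a density proportional to x^(-alpha) above a threshold `xmin`, and the Shannon entropy of the empirical distribution. The abstract does not say how the PLs were fitted or which entropy was computed, and the double-PL fit itself is not reproduced here.

```python
import math
from collections import Counter


def power_law_alpha(values, xmin):
    """Maximum-likelihood exponent of a continuous power law p(x) ~ x**(-alpha), x >= xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail observations equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def shannon_entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of the values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    casualties = [1, 1, 2, 3, 5, 8, 13, 40, 120]   # made-up numbers for illustration
    print(power_law_alpha(casualties, xmin=1))
    print(shannon_entropy(casualties))
```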

Relevance:

90.00%

Abstract:

Estuaries are perhaps the most threatened environments on the coastal fringe; the coincidence of high natural value and attractiveness for human use has led to conflicts between conservation and development. These conflicts occur in the Sado Estuary because it lies near the industrialised zone of the Setúbal Peninsula while, at the same time, a large part of the estuary is classified as a Natural Reserve owing to its high biodiversity. These facts created the need to implement a model of environmental management and quality assessment, based on methodologies that enable the assessment of the Sado Estuary's quality and the evaluation of the human pressures on the estuary. These methodologies are based on indicators that best depict the state of the environment, not necessarily on everything that could be measured or analysed. Sediments have always been considered an important temporary source of some compounds, a sink for other types of material, and an interface where a great diversity of biogeochemical transformations occur. For these reasons they are of great importance in the formulation of coastal management systems. Many authors have used sediments to monitor aquatic contamination, showing great advantages compared with traditional sampling of the water column. The main objective of this thesis was to develop an estuary environmental management framework applied to the Sado Estuary using the DPSIR model (EMMSado), including data collection, data processing and data analysis. The support infrastructure of EMMSado was a set of spatially contiguous regions that are homogeneous in sediment structure (management units). The environmental quality of the estuary was assessed through sediment quality assessment and integrated, at a preliminary stage, with the human pressure for development.
Besides the advantages explained above, studying the quality of the estuary mainly through the indicators and indices of the sediment compartment also makes this methodology easier and faster, and saves human and financial resources. These are essential factors for efficient environmental management of coastal areas. Data management, visualization, processing and analysis were achieved through the combined use of indicators and indices, sampling optimization techniques, Geographical Information Systems, remote sensing, statistics for spatial data, Global Positioning Systems and best expert judgment. As an overall conclusion, of the nineteen management units delineated and analyzed, three showed no ecological risk (18.5% of the study area). The areas of most concern (5.6% of the study area) are located in the North Channel and are under strong human pressure, mainly due to industrial activities. These areas also have low hydrodynamics and are thus associated with high levels of deposition. In particular, the areas near the Lisnave and Eurominas industries can also accumulate contamination coming from the Águas de Moura Channel, since particles coming from that channel can settle in that area due to residual flow. In these areas the contaminants of concern, among those analyzed, are the heavy metals and metalloids (Cd, Cu, Zn and As exceeded the PEL guidelines) and the pesticides BHC isomers, heptachlor, isodrin, DDT and metabolites, endosulfan and endrin. In the remaining management units (76% of the study area) there is a moderate impact potential for the occurrence of adverse ecological effects, and in some of these areas no stress agents could be identified. This emphasizes the need for further research, since unmeasured chemicals may be causing or contributing to these adverse effects. Special attention must be paid to the units with a moderate impact potential for the occurrence of adverse ecological effects that are located inside the natural reserve.
Non-point source pollution from agriculture and aquaculture activities also seems to contribute an important pollution load to the estuary, entering from the Águas de Moura Channel. This pressure is expressed in a moderate impact potential for ecological risk in the areas near the entrance of this channel. Pressures may also come from the Alcácer Channel, although they were not quantified in this study. The management framework presented here, including all the methodological tools, may be applied and tested in other estuarine ecosystems, which will also allow comparison between estuarine ecosystems in other parts of the globe.

Relevance:

90.00%

Abstract:

OBJECTIVE: To analyze conditional and unconditional healthy life expectancy among older Brazilian women. METHODS: This cross-sectional study used the intercensal technique to estimate, in the absence of longitudinal data, healthy life expectancy that is conditional and unconditional on the individual’s current health status. The data used were obtained from the Pesquisa Nacional por Amostra de Domicílios (National Household Sample Survey) of 1998, 2003, and 2008. The samples comprised 11,171; 13,694; and 16,259 women aged 65 years or more, respectively. Complete mortality tables from the Brazilian Institute of Geography and Statistics for the years 2001 and 2006 were also used. The definition of health status was based on the difficulty in performing activities of daily living. RESULTS: The remaining lifetime was strongly dependent on the current health status of the older women. Between 1998 and 2003, the amount of time lived with disability for healthy women at age 65 was 9.8%. This percentage increased to 66.2% when the women already presented some disability at age 65. Temporal analysis showed that the active life expectancy of the women at age 65 increased between 1998-2003 (19.3 years) and 2003-2008 (19.4 years). However, life years gained have been mainly concentrated in the unhealthy state. CONCLUSIONS: Analysis of conditional and unconditional life expectancy indicated that life years gained are a result of the decline of mortality in unhealthy states. This pattern suggests that there has been no reduction in morbidity among older women in Brazil between 1998 and 2008.

Relevance:

90.00%

Abstract:

Eight depositional sequences (DS) delimited by regional disconformities have been recognized in the Miocene of the Lisbon and Setúbal Peninsula areas. On the western coast of the Setúbal Peninsula, outcrops consisting of Lower Burdigalian to Lower Tortonian sediments were studied. The stratigraphic zonation and the environmental considerations are supported mainly by data on foraminifera, ostracoda, vertebrates and palynomorphs. The first mineralogical and geochemical data determined for the Foz da Fonte, Penedo Sul and Penedo Norte sedimentary sequences are presented. These analytical data correspond mainly to the fine fractions of the sediments. Mineralogical data are based on X-ray diffraction (XRD), carried out on both the less than 38 µm and the less than 2 µm fractions. Qualitative and semi-quantitative determinations of clay and non-clay minerals were obtained for both fractions. The clay mineral assemblages complement the lithostratigraphic and paleoenvironmental data obtained by stratigraphic and palaeontological studies. Some palaeomagnetic and isotopic data are discussed and correlated with the mineralogical data. Multivariate data analysis (Principal Components Analysis) of the mineralogical data was carried out using both R-mode and Q-mode factor analysis.

Relevance:

90.00%

Abstract:

Master's degree in Informatics Engineering - Specialization Area in Knowledge and Decision Technologies

Relevance:

90.00%

Abstract:

This paper presents the Realistic Scenarios Generator (RealScen), a tool that processes data from real electricity markets to generate realistic scenarios that enable the modeling of electricity market players’ characteristics and strategic behavior. The proposed tool provides significant advantages to the decision making process in an electricity market environment, especially when coupled with a multi-agent electricity markets simulator. The generation of realistic scenarios is performed using mechanisms for intelligent data analysis, which are based on artificial intelligence and data mining algorithms. These techniques allow the study of realistic scenarios, adapted to the existing markets, and improve the representation of market entities as software agents, enabling a detailed modeling of their profiles and strategies. This work contributes significantly to the understanding of the interactions between the entities acting in electricity markets by increasing the capability and realism of market simulations.

Relevance:

90.00%

Abstract:

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems.

Relevance:

90.00%

Abstract:

Harnessing the idle CPU cycles, storage space and other resources of networked computers for collaborative work is the main focus of all major grid computing research projects. Most university computer labs are nowadays equipped with powerful desktop PCs. Most of the time these machines sit idle, wasting computing power that could be put to good use. However, complex problems and the analysis of very large amounts of data require large computational resources. For such problems, one may run the analysis algorithms on very powerful and expensive computers, which reduces the number of users who can afford such data analysis tasks. Instead of relying on single expensive machines, distributed computing systems offer the possibility of using a set of much less expensive machines to do the same task. The BOINC and Condor projects have been used successfully to solve real scientific research problems around the world at low cost. The main goal of this work is to explore both distributed computing systems, Condor and BOINC, and to use their power to harness idle PC resources for academic researchers to use in their research work. In this thesis, data mining tasks were performed by implementing several machine learning algorithms in the distributed computing environment.