Abstract:
This project proposes to extend and generalize the estimation and inference procedures for multivariate generalized additive models for non-Gaussian random variables, which describe the behavior of biological and social phenomena and whose representations give rise to longitudinal series and aggregated (clustered) data. Its immediate applied objective is the development of modeling methodology for understanding biological, environmental and social processes in the areas of Health and the Social Sciences that are conditioned by the presence of specific phenomena, such as disease. The proposed plan thus seeks to strengthen the relationship between Applied Mathematics, approached under uncertainty, and the Biological and Social Sciences in general, generating new tools to analyze and explain the many problems for which ever more experimental and/or observational information is available. We propose, sequentially, beginning with discrete random variables (Yi, with variance function smaller than an even power of the expected value E(Y)), to build a unified class of generalized additive models (parametric and nonparametric) containing as particular cases generalized linear models, generalized nonlinear models, generalized additive models and generalized marginal mean models (the GEE1 approach of Liang and Zeger, 1986, and the GEE2 approaches of Zhao and Prentice, 1990, Zeger and Qaqish, 1992, and Yan and Fine, 2004), and initiating a connection with generalized linear latent and mixed models (GLLAMM; Skrondal and Rabe-Hesketh, 2004), starting from correlated data structures. This will make it possible to define conditional distributions of the responses given the covariates and the latent variables (LVs), and to estimate structural equations for the LVs, including regressions of LVs on covariates, regressions of LVs on other LVs, and specific models to account for recognized hierarchies of variation. How to define models that accommodate spatial or temporal structures while allowing for hierarchical factors, fixed or random, measured with error, as in the situations that arise in the Social Sciences and in Epidemiology, is a statistical challenge. This sequential construction of both estimation and inference methodology is planned to start with Poisson and Bernoulli random variables and to range from existing GLMs to current hierarchical generalized models, connecting with GLLAMMs, again starting from correlated data structures. This family of models will be developed for structures of variables/vectors, covariates and hierarchical random components that describe phenomena in the Social Sciences and Epidemiology.
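For orientation, the GEE1 approach cited above (Liang and Zeger, 1986) can be summarized by its standard estimating equations; the notation below is the conventional textbook one, not taken from the project itself.

```latex
% GEE1 estimating equations (Liang and Zeger, 1986), standard notation:
% clusters i = 1, ..., K with responses y_i, marginal means
% \mu_i(\beta) = E(y_i | x_i), and a working covariance matrix V_i.
\[
  U(\beta) \;=\; \sum_{i=1}^{K} D_i^{\top} V_i^{-1}\bigl(y_i - \mu_i(\beta)\bigr) \;=\; 0,
  \qquad
  D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
  \qquad
  V_i = A_i^{1/2} R_i(\alpha)\, A_i^{1/2},
\]
% where A_i is diagonal with the variance function evaluated at \mu_{ij},
% and R_i(\alpha) is a working correlation matrix indexed by \alpha.
```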
Abstract:
This work compares the forecasting efficiency of different methodologies applied to Brazilian consumer inflation (IPCA). We compare forecasting models based on disaggregated and aggregated data over horizons of up to twelve months ahead. The disaggregated models were estimated by SARIMA at different levels of disaggregation. The aggregated models were estimated by time series techniques such as SARIMA, state-space structural models and Markov-switching. Forecast accuracy was compared using the Model Confidence Set selection procedure and the Diebold-Mariano test. We find evidence of forecast accuracy gains in models using more disaggregated data.
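As a rough illustration of the Diebold-Mariano comparison mentioned above, here is a minimal sketch with simulated forecast errors; the squared-error loss, the horizon handling and the data are assumptions, not the study's code.

```python
import numpy as np

def diebold_mariano(e1, e2, h=1):
    """Diebold-Mariano statistic for equal predictive accuracy.

    e1, e2 : arrays of forecast errors from two competing models
    h      : forecast horizon; the long-run variance uses h-1 autocovariances
    """
    d = e1**2 - e2**2                  # loss differential (squared-error loss)
    T = len(d)
    # HAC estimate of the long-run variance of d (rectangular kernel, h-1 lags);
    # with a rectangular kernel the estimate can be negative in small samples.
    gamma = [np.cov(d[k:], d[:T - k])[0, 1] if k > 0 else d.var(ddof=0)
             for k in range(h)]
    lrv = gamma[0] + 2.0 * sum(gamma[1:])
    return d.mean() / np.sqrt(lrv / T)  # approximately N(0,1) under H0

# Toy usage with simulated 12-step-ahead errors (illustration only):
rng = np.random.default_rng(0)
e_agg = rng.normal(0, 1.0, 120)        # "aggregated model" errors
e_disagg = rng.normal(0, 0.9, 120)     # "disaggregated model" errors
print(diebold_mariano(e_agg, e_disagg, h=12))
```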
Abstract:
The current state of health and biomedicine includes an enormous number of heterogeneous data 'silos', collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale, meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content through a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, it uses a reference terminology and associated 'maps' to transform heterogeneous data into a standard representation for comparability and secondary use. Capturing the quality and precision of the 'maps' between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, one that can be applied and extended to other problems where heterogeneous data need to be merged.
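A minimal sketch of the kind of map-based transformation the process describes; the schema, field names and quality labels are illustrative assumptions, not the paper's actual reference terminology.

```python
# Illustrative terminology map: local source codes -> reference concepts.
# The "quality" field records how faithful each mapping is, so downstream
# queries over the aggregated data can filter or weight records accordingly.
TERM_MAP = {
    ("siteA", "ASTHMA-MILD"):   {"concept": "195967001", "quality": "exact"},
    ("siteB", "asthma_mld"):    {"concept": "195967001", "quality": "narrower"},
    ("siteB", "wheeze_unspec"): {"concept": "418992006", "quality": "approximate"},
}

def to_reference(record):
    """Rewrite one source record into the uniform representation."""
    key = (record["site"], record["local_code"])
    m = TERM_MAP.get(key)
    if m is None:
        return {**record, "concept": None, "quality": "unmapped"}
    return {**record, "concept": m["concept"], "quality": m["quality"]}

rows = [{"site": "siteA", "local_code": "ASTHMA-MILD", "age": 9},
        {"site": "siteB", "local_code": "wheeze_unspec", "age": 11}]
print([to_reference(r) for r in rows])
```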
Abstract:
Wireless Sensor Networks (WSN) are being used for a number of applications involving infrastructure monitoring, building energy monitoring and industrial sensing. The difficulty of programming individual sensor nodes and the associated overhead have encouraged researchers to design macro-programming systems which can help program the network as a whole or as a combination of subnets. Most current macro-programming schemes do not support multiple users seamlessly deploying diverse applications on the same shared sensor network. As WSNs become more common, it is important to provide such support, since it enables higher-level optimizations such as code reuse, energy savings, and traffic reduction. In this paper, we propose a macro-programming framework called Nano-CF, which, in addition to supporting in-network programming, allows multiple applications written by different programmers to be executed simultaneously on a sensor networking infrastructure. This framework enables the use of a common sensing infrastructure for a number of applications without the users having to worry about the applications already deployed on the network. The framework also supports timing constraints and resource reservations using the Nano-RK operating system. Nano-CF is effective at improving WSN performance by (a) combining multiple user programs, (b) aggregating packets for data delivery, and (c) satisfying timing and energy specifications using Rate-Harmonized Scheduling. Using representative applications, we demonstrate that Nano-CF achieves 90% reduction in Source Lines-of-Code (SLoC) and 50% energy savings from aggregated data delivery.
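To illustrate why aggregated packet delivery saves energy, a back-of-envelope sketch follows; the per-packet overhead and the radio cost model are invented numbers for illustration, not Nano-CF parameters.

```python
# Back-of-envelope energy comparison: sending k small readings separately
# vs. batching them into one aggregated packet. All costs are assumptions.
HEADER_BYTES = 12      # per-packet overhead (assumed)
COST_PER_BYTE = 1.0    # radio energy per byte, arbitrary units (assumed)

def energy(readings, batched):
    """Total transmission cost for a list of byte-string readings."""
    payload = sum(len(r) for r in readings)
    packets = 1 if batched else len(readings)
    return (packets * HEADER_BYTES + payload) * COST_PER_BYTE

readings = [b"t=23.5"] * 10   # ten 6-byte sensor samples
separate = energy(readings, batched=False)
batched = energy(readings, batched=True)
print(f"separate={separate}, batched={batched}, "
      f"saving={1 - batched / separate:.0%}")
```

With these assumed numbers the batched delivery saves 60% of the radio energy, the same order of magnitude as the 50% savings reported above.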
Abstract:
A new data set of daily gridded observations of precipitation, computed from over 400 stations in Portugal, is used to assess the performance of 12 regional climate models at 25 km resolution, from the ENSEMBLES set, all forced by ERA-40 boundary conditions, for the 1961-2000 period. Standard point error statistics, calculated from grid point and basin aggregated data, and precipitation-related climate indices are used to analyze the performance of the different models in representing the main spatial and temporal features of the regional climate and its extreme events. As a whole, the ENSEMBLES models are found to achieve a good representation of those features, with good spatial correlations with observations. There is a small but relevant negative bias in precipitation, especially in the driest months, leading to systematic errors in related climate indices. The underprediction of precipitation occurs in most percentiles, although this deficiency is partially corrected at the basin level. Interestingly, some of the conclusions concerning the performance of the models differ from what has been found for the contiguous territory of Spain; in particular, the ENSEMBLES models appear too dry over Portugal and too wet over Spain. Finally, models behave quite differently in the simulation of some important aspects of local climate, from the mean climatology to high precipitation regimes in localized mountain ranges and in the adjacent drier regions.
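As an illustration of the standard point error statistics mentioned above, the following sketch computes a mean bias and a spatial (pattern) correlation on simulated grids; it is not the ENSEMBLES evaluation code.

```python
import numpy as np

# Toy evaluation of a model precipitation grid against observations:
# mean bias and spatial (pattern) correlation. Arrays are simulated.
rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 1.5, size=(40, 30))           # observed grid (mm/day)
model = obs * 0.9 + rng.normal(0, 0.3, obs.shape)  # a slightly dry model

bias = (model - obs).mean()                        # negative => dry bias
spatial_corr = np.corrcoef(model.ravel(), obs.ravel())[0, 1]
print(f"mean bias = {bias:+.2f} mm/day, spatial r = {spatial_corr:.2f}")
```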
Abstract:
This study first describes the epidemiology of malaria in Roraima, in the Brazilian Amazon Basin, from 1991 to 1993: the predominance of Plasmodium species, the distribution of blood slides examined, malaria risk and seasonality. It then investigates whether population growth from 1962 to 1993 was associated with an increasing risk of malaria. The frequency of malaria varied significantly by municipality. Marginally more malaria cases were reported during the dry season (October to April), even after controlling for year and municipality. P. vivax was the predominant species in all municipalities, but the ratio of Plasmodium species varied between municipalities. No direct association between population growth and increasing risk of malaria from 1962 to 1993 was detected. Malaria in Roraima is of the "frontier" epidemiological type, with high epidemic potential.
Abstract:
Introduction: Health expenditure has risen considerably over the last decades in most industrialized countries. At the same time, health indicators have improved. The empirical evidence on the relationship between health expenditure and population health has been inconclusive. This study addresses the relationship between health expenditure and population health using aggregated data for 34 countries over the period 1980-2010. Methods: Pearson's correlation coefficient was used to assess the correlation between the explanatory variables and the health indicators. A multivariate panel-data regression was then run for each health indicator used as dependent variable: life expectancy at birth and at age 65 for women and men, potential years of life lost for women and men, and infant mortality. The main explanatory variable was health expenditure, but several confounding factors were also considered, namely wealth, lifestyle factors, and care supply. Results: Per capita health expenditure has an impact on health indicators, but once the GDP per capita variable is added it is no longer statistically significant. Other factors have a significant impact on almost all health indicators used: alcohol and tobacco consumption, fat intake, the number of physicians, and immunization, confirming several findings in the literature. Conclusion: The results are in line with studies reporting a marginal impact of health expenditure and medical progress on health outcomes in industrialized countries since the 1980s.
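A minimal sketch of the kind of fixed-effects panel regression described in the Methods, using simulated data; the variable names and specification details are illustrative assumptions, not the study's dataset or exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated country-year panel: life expectancy regressed on health
# spending plus GDP per capita and a lifestyle confounder, with country
# fixed effects via dummies. Clustered errors etc. omitted for brevity.
rng = np.random.default_rng(42)
n_countries, n_years = 34, 31
df = pd.DataFrame({
    "country": np.repeat(np.arange(n_countries), n_years),
    "year": np.tile(np.arange(1980, 2011), n_countries),
})
df["gdp_pc"] = rng.normal(30, 8, len(df))                 # thousands (assumed)
df["health_exp"] = 0.1 * df["gdp_pc"] + rng.normal(0, 0.5, len(df))
df["alcohol"] = rng.normal(10, 2, len(df))
df["life_exp"] = (70 + 0.2 * df["gdp_pc"] - 0.1 * df["alcohol"]
                  + rng.normal(0, 1, len(df)))

fit = smf.ols("life_exp ~ health_exp + gdp_pc + alcohol + C(country)",
              data=df).fit()
print(fit.params[["health_exp", "gdp_pc", "alcohol"]])
```

In this simulated setup, life expectancy depends on GDP rather than spending itself, so the spending coefficient loses significance once GDP per capita is included, mirroring the pattern reported in the Results.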
Abstract:
Background: As part of the second-generation surveillance system for HIV/Aids in Switzerland, repeated cross-sectional surveys were conducted in 1993, 1994, 1996, 2000, 2006 and 2011 among attenders of all low threshold facilities (LTFs) with needle exchange programmes and/or supervised drug consumption rooms for injection or inhalation. The number of syringes distributed to injectors has also been measured annually since 2000. Distribution in other settings, such as pharmacies, is also monitored nationally. Methods: Periodic surveys of LTFs have been conducted using an interviewer/self-administered questionnaire structured along four themes: socio-demographic characteristics, drug consumption, risk/preventive behaviour and health. Analysis is restricted to attenders who had injected drugs during their lifetime (IDUs). Pearson's chi-square test and trend analysis were conducted on annual aggregated data. Trend significance was assessed using Stata's non-parametric nptrend test. Results: The median age of IDUs increased from 26 years in 1993 to 40 in 2011; most are men (78%). The total yearly number of syringes distributed by LTFs has decreased by 44% in 10 years. Use of cocaine has increased (Table 1). Injection, regular use of heroin and borrowing of syringes/needles have decreased, while sharing of other material remains stable. There are fewer new injectors; more IDUs report substitution treatment. Most attenders had been tested for HIV at least once (90% in 1993, 94% in 2011). Reported prevalence of HIV remained stable at around 10%; that of HCV decreased from 62% in 2000 to 42% in 2011. Conclusions: Overall, the findings indicate a decrease in injection as a means of drug consumption in this population. This interpretation is supported by data from other sources, such as a national decrease in distribution from other delivery points. Switzerland's behavioural surveillance system is sustainable and allows the HIV epidemic to be monitored among this hard-to-reach population, providing information for planning and evaluation.
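Since Stata's nptrend is mentioned, the following sketch shows an analogous trend test on aggregated survey counts; it uses a Cochran-Armitage-style statistic as a stand-in, with invented counts rather than the Swiss surveillance data.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Cochran-Armitage-style trend test on aggregated survey counts,
# as an analogue of Stata's nptrend. Counts below are invented.
years = np.array([1993, 1994, 1996, 2000, 2006, 2011])
n = np.array([940, 910, 880, 850, 820, 790])   # respondents per survey
r = np.array([620, 590, 540, 470, 380, 300])   # e.g. current injectors

p_bar = r.sum() / n.sum()
s = years - years.mean()                       # centred year scores
T = np.sum(s * (r - n * p_bar))
var_T = p_bar * (1 - p_bar) * (np.sum(n * s**2) - np.sum(n * s)**2 / n.sum())
z = T / np.sqrt(var_T)
print(f"trend z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.2g}")

# Pearson chi-square across all surveys (any difference, not trend):
chi2, p, dof, _ = chi2_contingency(np.array([r, n - r]))
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.2g}")
```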
Abstract:
BACKGROUND: Measuring syringe availability and coverage is essential in the assessment of HIV/AIDS risk reduction policies. Estimates of syringe availability and coverage were produced for the years 1996 and 2006, based on all relevant available national-level aggregated data from published sources. METHODS: We defined availability as the total monthly number of syringes provided by the harm reduction system divided by the estimated number of injecting drug users (IDU), and coverage as the proportion of injections performed with a new syringe at the national level (total supply over total demand). Estimates of the supply of syringes were derived from the national monitoring system, including needle and syringe programmes (NSP), pharmacies, and medically prescribed heroin programmes. Estimates of syringe demand were based on the number of injections performed by IDU, derived from surveys of low threshold facilities for drug users (LTF) with NSP combined with the number of IDU. This number was estimated by two methods combining estimates of heroin users (multiple estimation method) with either (a) the number of IDU in methadone treatment (MT) (non-injectors) or (b) the proportion of injectors amongst LTF attendees. Central estimates and ranges were obtained for availability and coverage. RESULTS: The estimated number of IDU decreased markedly according to both methods. The MT-based method (from 14,818 to 4809) showed a much greater decrease and a smaller IDU population than the LTF-based method (from 24,510 to 12,320). Availability and coverage estimates are higher with the MT-based method. For 1996, central estimates of syringe availability were 30.5 and 18.4 per IDU per month; for 2006, they were 76.5 and 29.9. There were four central estimates of coverage: for 1996 they ranged from 24.3% to 43.3%, and for 2006 from 50.5% to 134.3%. CONCLUSION: Although the 2006 estimates overlap the 1996 estimates, the results suggest a shift to improved syringe availability and coverage over time.
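In symbols, the two indicators defined above can be written as follows; the notation is introduced here for clarity and is not taken from the paper.

```latex
% Availability: syringes supplied per injecting drug user per month.
% Coverage: share of injections performed with a new syringe (supply/demand).
\[
  \text{availability} \;=\; \frac{S}{N},
  \qquad
  \text{coverage} \;=\; \frac{S}{I \cdot N},
\]
% where S is the total monthly number of syringes supplied (NSP, pharmacies,
% prescribed-heroin programmes), N the estimated number of injecting drug
% users, and I the average number of injections per user per month.
% A coverage above 100% (as in some 2006 estimates) means that supply
% exceeded the estimated number of injections.
```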
Abstract:
Perceptual maps have been used for decades by market researchers to shed light on the similarity between brands in terms of a set of attributes, to position consumers relative to brands in terms of their preferences, or to study how demographic and psychometric variables relate to consumer choice. Invariably these maps are two-dimensional and static. As we enter the era of electronic publishing, the possibilities for dynamic graphics are opening up. We demonstrate the usefulness of introducing motion into perceptual maps through four examples. The first example shows how a perceptual map can be viewed in three dimensions, and the second moves between two analyses of data that were collected according to different protocols. In a third example we move from the best view of the data at the individual level to one which focuses on between-group differences in aggregated data. A final example considers the case when several demographic variables or market segments are available for each respondent, showing an animation with increasingly detailed demographic comparisons. These examples of dynamic maps use several data sets from marketing and social science research.
Abstract:
The consumption of antibiotics in the inpatient setting in Switzerland was assessed to determine possible differences between linguistic regions and to compare the results with European figures. Data on antibiotic consumption were obtained from a sentinel network representing 54% of national acute care hospitals and from a private drug market monitoring company. Aggregated data were converted into defined daily doses (DDD). The total consumption density in Switzerland was close to the median consumption reported in European surveys. Between 2004 and 2008, the total consumption of systemic antibiotics rose from 46.1 to 54.0 DDD per 100 occupied bed-days hospital-wide, and from 101.6 to 114.3 DDD per 100 occupied bed-days in intensive care units. Regional differences were observed for total consumption and among antibiotic classes. Hospitals in the Italian-speaking region showed a significantly higher consumption density, followed by the French- and German-speaking regions. Hospitals in the Italian-speaking region also had a higher consumption of fluoroquinolones, in line with the reported differences between Italy, Germany and France. Antibiotic consumption in acute care hospitals in Switzerland is close to the European median, with relatively low consumption in intensive care units. Some of the patterns of variation in consumption levels noticed among European countries are also observed among the cultural regions of Switzerland.
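A worked example of the DDD normalization described above; the drugs, DDD values and consumption figures below are chosen for illustration (real DDD values come from the WHO ATC/DDD index).

```python
# Convert raw antibiotic consumption into DDD per 100 occupied bed-days,
# the density measure used above. Figures are invented for illustration.
DDD_GRAMS = {"amoxicillin": 1.5, "ciprofloxacin": 1.0}   # grams per DDD (assumed)

grams_used = {"amoxicillin": 8100.0, "ciprofloxacin": 2600.0}
occupied_bed_days = 20_000

ddd_total = sum(grams_used[d] / DDD_GRAMS[d] for d in grams_used)
density = 100 * ddd_total / occupied_bed_days
print(f"{density:.1f} DDD per 100 occupied bed-days")   # -> 40.0 here
```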
Abstract:
The ecological fallacy (EF) is a common problem that regional scientists have to deal with when using aggregated data in their analyses. Although a large number of studies have considered different aspects of this problem, little attention has been paid to the potential negative effects of the EF in a time series context. Using Spanish regional unemployment data, this paper shows that EF effects are observed not only at the cross-section level but also in a time series framework. The empirical evidence obtained shows that analytical regional configurations are the least susceptible to time effects relative to both normative and random regional configurations, while normative configurations are an improvement over random ones.
Abstract:
The recent explosion in the number of centenarians in low-mortality countries is closely linked to the proliferation of studies on longevity, and more specifically on its determinants and repercussions. While some researchers attempt to discover the genes that may be responsible for extreme longevity, others examine the social, economic and political impact of population aging and increasing life expectancy, or the existence of a biological limit to human life. In this thesis, we first analyze the demographic situation of Quebec centenarians since the beginning of the 20th century using aggregated data (census data, vital statistics, population estimates). Second, we assess the quality of Quebec data at advanced ages using a nominative list of deaths of centenarians from the 1870-1894 birth cohorts. Among other things, we examine mortality trajectories beyond age 100. Finally, we analyze the survival of the siblings and parents of a sample of semi-supercentenarians (aged 105 and over) born between 1890 and 1900 in order to assess the familial component of longevity. This thesis consists of three articles. In the first, we examine the evolution of the number of centenarians in Quebec since the 1920s. On the basis of demographic indicators such as the centenarian ratio, survival probabilities and the mean maximum age at death, we highlight the remarkable progress achieved in survival at advanced ages. We also decompose the factors responsible for the increase in the number of centenarians in Quebec. Among the factors identified, the increase in the probability of surviving from age 80 to age 100 stands out as the main determinant of the growth in the number of Quebec centenarians. The second article deals with the validation of the ages at death of centenarians from the 1870-1894 birth cohorts of French-Canadian origin and Catholic denomination, born and deceased in Quebec. At the end of this validation process, we can state that Quebec data at advanced ages are of excellent quality. The mortality trajectories of centenarians based on the raw data are therefore representative of reality. The evolution of death probabilities from age 100 onward shows a deceleration of mortality. For both men and women, death probabilities plateau at around 45%. Finally, in the third article, we turn to the familial component of longevity. We compare the survival of the siblings and parents of semi-supercentenarians who died between 1995 and 2004 with that of their respective birth cohorts. The survival differences between the siblings and parents of the semi-supercentenarians under observation and their "control" generation are statistically significant at the 0.01% level. Moreover, the brothers, sisters, fathers and mothers of semi-supercentenarians are between 1.7 (sisters) and 3 times (mothers) more likely to reach age 90 than the members of their corresponding birth cohort. These analyses leave no doubt that longevity clusters within certain families.
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
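A toy sketch of the gossip-averaging principle behind a decentralised K-Means of this kind; it illustrates why pairwise averaging of local cluster statistics approximates the centralised solution, and is not the paper's actual protocol.

```python
import numpy as np

# Each node holds local per-cluster (sum, count) statistics for one
# K-Means step. Repeated pairwise averaging (gossip) drives every node's
# statistics toward the network average, so each node's sum/count ratio
# converges to the global centroids without any coordinator.
rng = np.random.default_rng(7)
K, DIM, NODES = 3, 2, 20

sums = rng.normal(0, 1, (NODES, K, DIM))            # local cluster sums
counts = rng.integers(1, 50, (NODES, K)).astype(float)  # local cluster counts

global_centroids = sums.sum(0) / counts.sum(0)[:, None]

for _ in range(200):                    # gossip rounds
    i, j = rng.choice(NODES, 2, replace=False)
    for arr in (sums, counts):
        avg = (arr[i] + arr[j]) / 2.0   # pairwise averaging preserves the mean
        arr[i] = arr[j] = avg

node0_centroids = sums[0] / counts[0][:, None]
print(np.abs(node0_centroids - global_centroids).max())  # -> near 0
```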
Abstract:
The issue of diversification in direct real estate investment portfolios has been widely studied in academic and practitioner literature. Most work, however, has been done using either partially aggregated data or data for small samples of individual properties. This paper reports results from tests of both risk reduction and diversification that use the records of 10,000+ UK properties tracked by Investment Property Databank. It provides, for the first time, robust estimates of the diversification gains attainable given the returns, risks and cross‐correlations across the individual properties available to fund managers. The results quantify the number of assets and amount of money needed to construct both ‘balanced’ and ‘specialist’ property portfolios by direct investment. Target numbers will vary according to the objectives of investors and the degree to which tracking error is tolerated. The top‐level results are consistent with previous work, showing that a large measure of risk reduction can be achieved with portfolios of 30–50 properties, but full diversification of specific risk can only be achieved in very large portfolios. However, the paper extends previous work by demonstrating on a single, large dataset the implications of different methods of calculating risk reduction, and also by showing more disaggregated results relevant to the construction of specialist, sector‐focussed funds.
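For context, the scale of risk reduction reported here can be read against the textbook variance formula for an equally weighted portfolio of n assets; the notation below is standard and is added here, not taken from the paper.

```latex
% Variance of an equally weighted portfolio of n assets with average
% individual variance \bar{\sigma}^2 and average pairwise covariance \bar{c}:
\[
  \sigma_p^2 \;=\; \frac{\bar{\sigma}^2}{n}
  \;+\; \Bigl(1 - \frac{1}{n}\Bigr)\bar{c}
  \;\longrightarrow\; \bar{c} \quad (n \to \infty),
\]
% so specific risk (the first term) falls quickly, with most of it gone by
% n = 30-50 properties, while the systematic component \bar{c} can only be
% diversified away in very large portfolios, consistent with the findings above.
```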