864 results for Data analysis
Abstract:
Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct errors in NGS reads. We demonstrated the effectiveness of MTM both on single-cell data with highly non-uniform coverage and on normal data with uniformly high coverage, showing that MTM’s performance does not depend on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data, respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM performed better than Quake and comparably to Hammer. More accurate error correction with MTM also improved the quality of downstream analyses such as mapping and SNP detection. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
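The abstract does not say how MTM works, so the following is not MTM. It is a minimal sketch of the k-mer counting idea behind spectrum-based correctors such as Quake: k-mers that occur fewer than a chosen number of times across all reads are flagged as likely to contain a sequencing error. The functions `count_kmers` and `flag_untrusted_positions`, the k-mer length and the count threshold are hypothetical illustration choices.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every length-k substring (k-mer) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def flag_untrusted_positions(read: str, counts: Counter, k: int, min_count: int) -> list[int]:
    """Start positions of k-mers in `read` seen fewer than `min_count` times overall.

    Low-count ("untrusted") k-mers suggest the read contains an error nearby;
    a corrector would then try base substitutions that make them trusted.
    """
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


if __name__ == "__main__":
    # Hypothetical toy data: the last read carries one substitution (G -> T at index 6).
    reads = ["ACGTACGGA"] * 5 + ["ACGTACTGA"]
    counts = count_kmers(reads, k=4)
    print(flag_untrusted_positions(reads[-1], counts, k=4, min_count=2))  # [3, 4, 5]
```

A single global threshold like `min_count` assumes roughly uniform coverage. With the highly non-uniform coverage of single-cell data, genuine k-mers from poorly covered regions can fall below it, which is one reason such data are harder to correct.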
Abstract:
In this dissertation, we propose a continuous-time Markov chain model to examine longitudinal data with a three-category outcome variable. The advantage of this model is that it permits a different number of measurements for each subject, and the intervals between consecutive measurements can be irregular. Using the maximum likelihood principle, we can estimate the transition probability between any two time points. By using the information provided by the independent variables, the model can also estimate subject-specific transition probabilities. Monte Carlo simulation will be used to compare the goodness of fit of this model with that of other models. A public health example will be used to demonstrate the application of this method.
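A minimal sketch of how such a model turns irregular measurement times into a likelihood, assuming a three-state generator matrix `Q` whose off-diagonal entries are transition rates and whose rows sum to zero. The transition probabilities over an interval of length t are P(t) = exp(Qt), so each subject contributes the product of the P(Δt) entries for its observed consecutive categories, however many measurements it has and however they are spaced. The function names and rate values are hypothetical. Covariates, which are commonly introduced by making each rate a log-linear function of them, are omitted, as is the numerical maximization.

```python
import math

Matrix = list[list[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_expm(a: Matrix, terms: int = 20) -> Matrix:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)  # maximum absolute row sum
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2.0 ** s for x in row] for row in a]  # norm of scaled matrix <= 0.5
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]  # scaled^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # exp(A) = exp(A / 2^s) ** (2^s)
        result = mat_mul(result, result)
    return result


def transition_matrix(q: Matrix, t: float) -> Matrix:
    """P(t) = exp(Q t); entry [i][j] is Pr(category j at time t | category i at time 0)."""
    return mat_expm([[x * t for x in row] for row in q])


def log_likelihood(q: Matrix, times: list[float], states: list[int]) -> float:
    """Log-likelihood of one subject's categories observed at (possibly irregular) times."""
    ll = 0.0
    for i in range(len(times) - 1):
        p = transition_matrix(q, times[i + 1] - times[i])
        ll += math.log(p[states[i]][states[i + 1]])
    return ll


if __name__ == "__main__":
    # Hypothetical transition rates among outcome categories 0, 1, 2; rows sum to zero.
    q = [[-0.30, 0.20, 0.10],
         [0.10, -0.20, 0.10],
         [0.05, 0.15, -0.20]]
    # One subject measured four times at irregular intervals.
    print(log_likelihood(q, [0.0, 1.5, 4.0, 4.5], [0, 1, 1, 2]))
```

Summing `log_likelihood` over subjects and maximizing it over the rates, for example with a general-purpose optimizer, would give maximum likelihood estimates under these assumptions.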
Abstract:
In September 1999, the International Monetary Fund (IMF) established the Poverty Reduction and Growth Facility (PRGF) to make poverty reduction and the enhancement of economic growth the fundamental objectives of lending operations in its poorest member countries. This paper studies the spending and absorption of aid in PRGF-supported programs, verifies whether the use of aid is programmed to be smoothed over time, and analyzes how considerations of macroeconomic stability influence the programmed use of aid. The paper shows that PRGF-supported programs allow countries to use all increases in aid within a few years, smoothing the use of aid inflows over time. Spending is higher than absorption in both the short-run and long-run use of aid, a robust finding of the study. Furthermore, long-run spending exceeds the increase in aid inflows injected into the economy. Finally, the presence of a PRGF-supported program does not influence the actual absorption or spending of aid.
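The abstract does not define absorption and spending. A common convention in the IMF aid literature, assumed here, measures both relative to the aid increase: absorption is the widening of the current-account deficit excluding aid, and spending is the widening of the government fiscal deficit excluding aid. The sketch below computes both ratios. The function names and the figures in the example are hypothetical.

```python
def absorption_ratio(aid_increase: float, noaid_ca_deficit_increase: float) -> float:
    """Widening of the current-account deficit excluding aid, per unit of extra aid."""
    if aid_increase == 0:
        raise ValueError("aid_increase must be nonzero")
    return noaid_ca_deficit_increase / aid_increase


def spending_ratio(aid_increase: float, noaid_fiscal_deficit_increase: float) -> float:
    """Widening of the government fiscal deficit excluding aid, per unit of extra aid."""
    if aid_increase == 0:
        raise ValueError("aid_increase must be nonzero")
    return noaid_fiscal_deficit_increase / aid_increase


if __name__ == "__main__":
    # Hypothetical changes, all in percent of GDP.
    aid_up, ca_gap_up, fiscal_gap_up = 2.0, 1.0, 1.6
    a = absorption_ratio(aid_up, ca_gap_up)
    s = spending_ratio(aid_up, fiscal_gap_up)
    print(f"absorption={a:.2f}, spending={s:.2f}, spending > absorption: {s > a}")
    # absorption=0.50, spending=0.80, spending > absorption: True
```

Under these definitions, "spending is higher than absorption" means the spending ratio exceeds the absorption ratio, and long-run spending exceeding the aid increase corresponds to a spending ratio above 1.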
Abstract:
The paper examines the recent pattern of government consumption expenditure in developing countries and estimates its determinants. Using a panel data set for 111 developing countries from 1984 to 2004, this study finds evidence that political, institutional and governance variables significantly influence government expenditure. Among other results, the paper finds new evidence for Wagner's law, which states that people's demand for public services and their willingness to pay for them are income-elastic, so that the public economy expands as a nation becomes more affluent (Cameron 1978). Corruption is also found to help explain the public expenditure of developing countries. By contrast, the size of the economy and fractionalization have a significant negative association with government expenditure. In addition, the study finds evidence that public expenditure shrinks significantly under military dictatorship compared with other forms of government.
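Wagner's law, as stated here, says that government expenditure is income-elastic, meaning the elasticity of expenditure with respect to income exceeds 1. A minimal sketch of that quantity is the slope of a log-log regression of expenditure on income. This is only an illustration: the paper's panel specification, with its political, institutional and governance controls, is not given in the abstract. The function name and data are hypothetical.

```python
import math


def log_log_elasticity(income: list[float], expenditure: list[float]) -> float:
    """OLS slope of ln(expenditure) on ln(income), i.e. a constant income elasticity."""
    xs = [math.log(v) for v in income]
    ys = [math.log(v) for v in expenditure]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical per-capita income and government expenditure for one country over time.
    income = [800.0, 950.0, 1100.0, 1400.0, 1700.0]
    expenditure = [96.0, 118.0, 140.0, 185.0, 232.0]
    e = log_log_elasticity(income, expenditure)
    print(f"elasticity={e:.2f}; income-elastic (Wagner's law): {e > 1}")
```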
Abstract:
In recent years, significant efforts have been devoted to developing advanced data analysis tools, both to predict the occurrence of disruptions and to investigate the operational spaces of devices, with the long-term goals of advancing the understanding of the physics of these events and preparing for ITER. On JET, the latest generation of the disruption predictor, APODIS, has been deployed in the real-time network during the last campaigns with the new metallic wall. Although it was trained only on discharges with the carbon wall, it has performed very well, with both missed alarms and false alarms on the order of a few percent (and strategies to improve its performance have already been identified). Since predicting the type of disruption is also considered very important for optimising mitigation measures, a new clustering method, based on the geodesic distance on a probabilistic manifold, has been developed. This technique classifies an incoming disruption automatically with a success rate better than 85%. Various other manifold learning tools, particularly Principal Component Analysis and Self-Organising Maps, are also producing promising results in the comparative analysis of JET and ASDEX Upgrade (AUG) operational spaces, on the way to developing predictors capable of extrapolating from one device to another.
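The abstract does not detail the clustering method. For illustration, the sketch below assumes that each diagnostic signal of a disruption is summarized by a univariate Gaussian (mean, standard deviation). It uses the closed-form Fisher-Rao geodesic distance between Gaussians and combines independent signals by the root sum of squares, which is the distance on the product manifold. An incoming disruption is then assigned to the nearest class prototype. The class labels, the signals and the nearest-prototype rule are hypothetical and do not reproduce the JET implementation.

```python
import math

Summary = list[tuple[float, float]]  # one (mean, std) pair per diagnostic signal


def rao_distance(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2); sigmas > 0."""
    dm2 = (mu1 - mu2) ** 2
    delta = math.sqrt((dm2 + 2 * (sigma1 - sigma2) ** 2) / (dm2 + 2 * (sigma1 + sigma2) ** 2))
    return 2 * math.sqrt(2) * math.atanh(delta)


def manifold_distance(a: Summary, b: Summary) -> float:
    """Distance on the product manifold, treating the signals as independent."""
    return math.sqrt(sum(rao_distance(m1, s1, m2, s2) ** 2 for (m1, s1), (m2, s2) in zip(a, b)))


def classify(sample: Summary, prototypes: dict[str, Summary]) -> str:
    """Assign `sample` to the label of the nearest class prototype."""
    return min(prototypes, key=lambda label: manifold_distance(sample, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical disruption classes, each summarized by two signals.
    prototypes = {
        "class_A": [(0.0, 1.0), (5.0, 1.0)],
        "class_B": [(3.0, 0.5), (1.0, 2.0)],
    }
    print(classify([(0.1, 1.1), (4.8, 0.9)], prototypes))  # class_A
```

For equal means the distance reduces to √2·|ln(σ1/σ2)|, so it measures differences in spread on a logarithmic scale rather than the absolute gap in standard deviations.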
Abstract:
Tolls have become an increasingly common mechanism for funding road projects in recent decades. Improving knowledge of demand behavior is therefore key for stakeholders managing toll roads. However, the literature on demand elasticity estimates for interurban toll roads is still limited, because such roads are relatively scarce internationally. Furthermore, several aspects remain to be investigated, among them the choice of GDP as the most common socioeconomic variable to explain traffic growth over time. This paper aims to determine the variables that best explain the evolution of light vehicle demand on toll roads over the years. To that end, we apply a dynamic panel data methodology to identify the key socioeconomic variables explaining changes in light vehicle demand over time. The results show that, despite some usefulness, GDP is not the most appropriate explanatory variable, while other variables such as employment or GDP per capita lead to more stable and consistent results. The methodology is applied to Spanish toll roads for the 1990–2011 period, a particularly interesting case of variation in toll road use, as road demand has decreased significantly since the beginning of the economic crisis in 2008.
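The abstract does not give the model specification. A common dynamic panel form, assumed here, is the partial-adjustment model ln Q_it = α ln Q_i,t-1 + β ln X_it + u_it, where Q is light vehicle demand and X an explanatory variable such as employment. β is then the short-run elasticity and β / (1 - α) the long-run elasticity, provided |α| < 1. The sketch computes the long-run value and checks it by accumulating the response to a permanent unit change in ln X. The coefficient values are hypothetical.

```python
def long_run_elasticity(beta: float, alpha: float) -> float:
    """Long-run elasticity beta / (1 - alpha) of a partial-adjustment model."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("the lagged-demand coefficient must satisfy |alpha| < 1")
    return beta / (1.0 - alpha)


def cumulative_response(beta: float, alpha: float, periods: int) -> float:
    """Change in ln(demand) after `periods` years of a permanent unit rise in ln(X)."""
    response = 0.0
    for _ in range(periods):
        response = alpha * response + beta  # this year's effect plus carried-over adjustment
    return response


if __name__ == "__main__":
    beta, alpha = 0.4, 0.6  # hypothetical short-run elasticity and lag coefficient
    print(long_run_elasticity(beta, alpha))
    print([round(cumulative_response(beta, alpha, n), 3) for n in (1, 2, 5, 50)])
```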
Abstract:
Contents:
- Center for Open Middleware
- POSDATA project
- User modeling
- Some early results
- @posdata service
Abstract:
In recent years, society has been undergoing a series of changes. One of these changes is datafication, which can be defined as the systematic transformation of aspects of people's everyday lives into computer-processed data. Every day, every minute and every second, whenever someone uses a digital device, data are being stored somewhere. This may be the content of an email, but it may also be the number of steps that person has walked or their medical history. Simply storing data adds no value by itself. Extracting knowledge from data, and thereby giving it value, requires data analysis. Data science, together with data analysis, is becoming increasingly popular. Today there are millions of statistical web APIs, which make it possible to analyze trends or sentiments present in social networks or on the internet in general. Twitter, one of the most popular social networks, is public: every message, or tweet, that is posted can be seen by anyone in the world with an internet connection. This makes Twitter an interesting medium for analyzing social habits or consumer profiles. This project is set in that context. Combining statistical data analysis and content analysis, this work attempts to extract knowledge from public tweets. In particular, it seeks to establish whether gender is an influential factor in the relationships between Twitter users. To do so, a database containing almost 2,000 tweets will be analyzed. First, the users' gender will be determined using web APIs. Second, hypothesis testing will be used to determine whether gender influences how users interact with other users.
Finally, a statistical model will be built to predict the behavior of Twitter users in relation to their gender.
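The abstract does not name the hypothesis test. One standard choice for asking whether gender is associated with who interacts with whom is a chi-square test of independence on a contingency table. This minimal sketch assumes that test for the 2×2 case (author gender × gender of the user they mention or reply to), without continuity correction. With one degree of freedom, the p-value is erfc(√(χ²/2)). The table layout, function name and counts are hypothetical.

```python
import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p-value); with 1 degree of freedom, p = erfc(sqrt(stat / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts. Rows: author is female / male.
    # Columns: mentioned user is female / male.
    table = [[320, 180],
             [210, 290]]
    stat, p = chi_square_2x2(table)
    print(f"chi2={stat:.2f}, p={p:.2g}")
```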
Abstract:
Acknowledgements: We thank Andrew Spink (Noldus Information Technology) and the Blogging Birds team members Peter Kindness and Abdul Adeniyi for their valuable contributions to this paper. John Fryxell, Chris Thaxter and Arjun Amar provided valuable comments on an earlier version. The study was part of the Digital Conservation project of dot.rural, the University of Aberdeen’s Digital Economy Research Hub, funded by RCUK (grant reference EP/G066051/1).
Abstract:
Understanding spatial distributions, and how environmental conditions influence catch-per-unit-effort (CPUE), is important for improving fishing efficiency and for sustainable fisheries management. This study investigated the relationship between CPUE, spatial factors, temperature and depth using generalized additive models. The best models frequently included combinations of factors rather than a single factor. The parameters that best described CPUE varied by geographic region. The deviance explained by the best models ranged from a low of 29% (halibut, Charlotte region) to a high of 94% (sablefish, Charlotte region). Depth, latitude and longitude influenced most species in several regions. On the broad geographic scale, depth was associated with CPUE for every species except dogfish. Latitude and longitude influenced most species, except halibut (Areas 4 A/D), sablefish and cod. Temperature was important for describing the distributions of halibut in Alaska, arrowtooth flounder in British Columbia, dogfish, Alaska skate and Aleutian skate. The species-habitat relationships revealed in this study can be used to develop improved fishing and management strategies.
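Deviance explained, the statistic GAM software commonly reports, is 1 minus the ratio of the fitted model's deviance to that of an intercept-only (null) model. For a Gaussian family, the deviance is the residual sum of squares, so the measure reduces to R². For other families, such as Poisson, it does not, which is why deviance explained is distinct from variance explained. The sketch computes it from observed and fitted values for both families. Fitting the GAM itself is out of scope, and the data are hypothetical.

```python
import math
from typing import Callable

Deviance = Callable[[list[float], list[float]], float]


def gaussian_deviance(y: list[float], mu: list[float]) -> float:
    """Gaussian deviance: the residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def poisson_deviance(y: list[float], mu: list[float]) -> float:
    """Poisson deviance 2 * sum(y * ln(y / mu) - (y - mu)), with y * ln(y / mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    return 2.0 * total


def deviance_explained(y: list[float], fitted: list[float], deviance: Deviance) -> float:
    """1 - D(model) / D(null), where the null model predicts the mean of y everywhere."""
    null_fit = [sum(y) / len(y)] * len(y)
    return 1.0 - deviance(y, fitted) / deviance(y, null_fit)


if __name__ == "__main__":
    # Hypothetical CPUE observations and fitted values from some already-fitted model.
    cpue = [1.0, 2.0, 3.0, 4.0]
    fitted = [1.5, 1.5, 3.5, 3.5]
    print(deviance_explained(cpue, fitted, gaussian_deviance))
    print(deviance_explained(cpue, fitted, poisson_deviance))
```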
Abstract:
To obtain basic demographic data on 6-hour ultra-marathoners and on where these races take place, participation trends and the association between nationality and race performance were investigated in all 6-hour races held worldwide between 1991 and 2010. Participation increased linearly in both women and men across years. The annual number of finishes was significantly higher in men than in women (P=0.013). The male-to-female ratio has remained stable at ~4 since 1991. Runners in the 45–49 age group showed the largest increase in participation for both men (800 participants in 18 years) and women (208 participants in 16 years). Europe attracted more runners from other continents (166 runners) than all other continents combined (55 runners). European runners also achieved the best top-ten performances (73±3 km for women and 77±11 km for men), while African (65±9 km for men) and South American (54±4 km for women and 65±2 km for men) runners achieved the weakest. In summary, participation in 6-hour ultra-marathons increased across years. Most of this growth took place in Europe and among athletes aged 45–49 years. Europe also attracted the most diverse field of athletes, with runners from all other continents. European runners accounted for the most runners and achieved the best top-ten performances.
Abstract:
Causal relationships in complex systems are known to be circular rather than linear: a particular result is not produced by a single cause, but through both positive and negative feedback processes. However, although interpreting systemic interrelationships requires a language formed by circles, such a language has so far been developed only at the diagram level and not from an axiomatic point of view. The first difficulty encountered when analysing any complex system is that usually the only available data relate to the individual variables, so the first objective was to transform these data into cause-and-effect relationships. Once this step was taken, our discrete chaos theory could be applied to find the causal circles that form part of the system attractor, allowing the system's behavior to be interpreted. As an application of the technique, we analyzed the system associated with the transcription factors of inflammatory diseases.
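The abstract does not describe the authors' algorithm. As an illustration only, suppose the data have been turned into a signed cause-and-effect graph, where an edge sign of +1 or -1 marks a positive or negative influence. The causal circles are then the graph's elementary directed cycles. The product of a circle's edge signs indicates whether it is a positive (reinforcing) or negative (balancing) feedback loop. This minimal sketch enumerates the circles by depth-first search, rooting each one at its lowest-ranked variable so it is reported once. The variable names are hypothetical. For large graphs, Johnson's algorithm is the usual efficient alternative.

```python
Edges = dict[str, list[tuple[str, int]]]  # cause -> [(effect, +1 or -1), ...]


def causal_circles(edges: Edges) -> list[tuple[list[str], int]]:
    """Return every elementary directed cycle with the product of its edge signs."""
    nodes = sorted(set(edges) | {dst for outs in edges.values() for dst, _ in outs})
    rank = {v: i for i, v in enumerate(nodes)}
    circles: list[tuple[list[str], int]] = []

    def extend(start: str, node: str, path: list[str], sign: int) -> None:
        for nxt, s in edges.get(node, []):
            if nxt == start:
                circles.append((list(path), sign * s))  # closed a circle back to start
            elif rank[nxt] > rank[start] and nxt not in path:
                path.append(nxt)
                extend(start, nxt, path, sign * s)
                path.pop()

    for start in nodes:  # root each circle at its lowest-ranked variable
        extend(start, start, [start], 1)
    return circles


if __name__ == "__main__":
    # Hypothetical signed cause-effect graph among three variables.
    graph: Edges = {"A": [("B", 1)], "B": [("C", 1), ("A", 1)], "C": [("A", -1)]}
    for circle, sign in causal_circles(graph):
        print(" -> ".join(circle + circle[:1]), "positive" if sign > 0 else "negative")
```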
Abstract:
The aim of this paper is to propose a mathematical model to determine invariant sets, set covering, orbits and, in particular, attractors in the set of tourism variables. The analysis was based on a pre-designed algorithm and on our interpretation of chaos theory, developed in the context of General Systems Theory. This article sets out the causal relationships associated with tourist flows in order to enable the formulation of appropriate strategies. Our results can be applied to numerous cases. For example, in the analysis of tourist flows, they can be used to determine whether the behaviour of certain groups affects that of other groups and to analyse tourist behaviour in terms of the most relevant variables. Unlike statistical analyses that merely describe current data, our method uses orbit analysis to forecast the behaviour of tourist variables in the immediate future, provided that attractors are found.
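As an illustration only, and not the paper's algorithm, suppose the tourism system is modeled as a hypothetical map that sends each discrete state to its successor. The orbit of a starting state is then the sequence of its iterates. On a finite set, every orbit eventually enters a cycle, which is the attractor reached from that start. A set is invariant when the map sends it into itself. The state names and transition map below are hypothetical.

```python
def orbit_and_attractor(f: dict[str, str], start: str) -> tuple[list[str], list[str]]:
    """Iterate f from `start` until a state repeats.

    Returns the orbit (states in visiting order, up to the first repeat) and the
    cycle the orbit falls into, which on a finite set is the attractor reached from start.
    """
    first_visit: dict[str, int] = {}
    orbit: list[str] = []
    state = start
    while state not in first_visit:
        first_visit[state] = len(orbit)
        orbit.append(state)
        state = f[state]
    return orbit, orbit[first_visit[state]:]


def is_invariant(f: dict[str, str], subset: set[str]) -> bool:
    """A set is invariant under f if f maps every member back into the set."""
    return all(f[x] in subset for x in subset)


if __name__ == "__main__":
    # Hypothetical discretized states of a tourist-flow variable and their successors.
    f = {"low": "rising", "rising": "high", "high": "saturated", "saturated": "high"}
    orbit, attractor = orbit_and_attractor(f, "low")
    print(orbit, attractor, is_invariant(f, set(attractor)))
```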
Abstract:
This paper examines the determinants of labour out-migration from agriculture across 149 EU regions over the 1990–2008 period. The central aim is to shed light on the role played by payments from the common agricultural policy (CAP) in this important adjustment process. Using static and dynamic panel data estimators, we show that standard neoclassical drivers, such as relative income and the relative labour share, are significant determinants of the intersectoral migration of agricultural labour. Overall, CAP payments contributed significantly to job creation in agriculture, although the magnitude of the economic effect was rather moderate. We also find that pillar I subsidies had an effect approximately twice as large as that of pillar II payments.
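A minimal sketch of the fixed-effects (within) estimator, the usual static panel estimator, for a single regressor. Each region's observations are demeaned, which removes time-invariant regional effects, and the slope is estimated from the demeaned data. The abstract does not give the paper's actual specification, with its several regressors and dynamic estimators. The function name, regions and numbers are hypothetical.

```python
from collections import defaultdict


def within_slope(panel: list[tuple[str, float, float]]) -> float:
    """Fixed-effects (within) slope of y on x from (region, x, y) observations.

    Demeaning x and y inside each region removes any time-invariant regional
    effect before the slope is estimated.
    """
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for region, x, y in panel:
        groups[region].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mx) * (y - my) for x, y in obs)
        sxx += sum((x - mx) ** 2 for x, _ in obs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical data: y = regional effect + 0.5 * x, with regional effects 1 and 10.
    panel = [("R1", x, 1.0 + 0.5 * x) for x in (1.0, 2.0, 3.0)]
    panel += [("R2", x, 10.0 + 0.5 * x) for x in (4.0, 5.0, 6.0)]
    print(within_slope(panel))
```

Pooled OLS on these toy data would give a much steeper slope (about 2.8), because the region with the larger fixed effect also has larger x. The within transformation removes that confounding and recovers 0.5.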