942 results for Blog datasets
Abstract:
Critical reflection is central to improving practices in early years services. It is also a learned skill.
Abstract:
Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.
Abstract:
Gene expression is arguably the most important indicator of biological function. Thus identifying differentially expressed genes is one of the main aims of high-throughput studies that use microarray and RNAseq platforms to study deregulated cellular pathways. There are many tools for analysing differential gene expression from transcriptomic datasets. The major challenge of this topic is to estimate gene expression variance in the presence of the large amount of 'background noise' generated by laboratory equipment and the lack of biological replicates. Bayesian inference has been widely used in the bioinformatics field. In this work, we show that the prior knowledge employed in the Bayesian framework also helps to improve the accuracy of differential gene expression analysis when using a small number of replicates. We have developed a differential analysis tool that uses Bayesian estimation of the variance of gene expression for use with small numbers of biological replicates. Our method is more consistent than the widely used Cyber-T tool, which first introduced the Bayesian framework to differential analysis. We also provide a user-friendly web-based graphical user interface for biologists to use with microarray and RNAseq data. Bayesian inference can compensate for the instability of variance estimates caused by small numbers of biological replicates by using pseudo-replicates as prior knowledge. We also show that our new strategy for selecting pseudo-replicates improves the performance of the analysis.
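The variance-shrinkage idea at the core of such Bayesian estimators can be sketched in a few lines. The weighting below follows the Cyber-T-style posterior point estimate; `prior_var` and `prior_df` are illustrative stand-ins for the pseudo-replicate prior, since the abstract does not specify the tool's actual model or interface:

```python
def moderated_variance(samples, prior_var, prior_df):
    """Shrink the sample variance toward a prior ("pseudo-replicate")
    variance, in the style of Cyber-T's Bayesian t-test.  With few
    replicates the raw variance is unstable; the weighted combination
    trades a little bias for a much more stable estimate."""
    n = len(samples)
    mean = sum(samples) / n
    s2 = sum((x - mean) ** 2 for x in samples) / (n - 1)
    # Posterior point estimate: prior pseudo-counts plus observed counts.
    return (prior_df * prior_var + (n - 1) * s2) / (prior_df + n - 1)
```

With only three replicates the raw variance is noisy; the prior pulls it toward a stable value, which is what makes the downstream test statistics better behaved.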
Abstract:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation termed loci. Clusters' loci are efficiently calculated using documents' ranking scores generated by a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters' loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions of promising quality and is substantially faster than several benchmark centroid-based semi-supervised document clustering methods.
Abstract:
Neural data are inevitably contaminated by noise. When such noisy data are subjected to statistical analysis, misleading conclusions can be reached. Here we attempt to address this problem by applying a state-space smoothing method, based on the combined use of Kalman filter theory and the Expectation–Maximization algorithm, to denoise two datasets of local field potentials recorded from monkeys performing a visuomotor task. For the first dataset, it was found that the analysis of high gamma band (60–90 Hz) neural activity in the prefrontal cortex is highly susceptible to the effect of noise, and denoising led to markedly improved, physiologically interpretable results. For the second dataset, Granger causality between primary motor and primary somatosensory cortices was not consistent across two monkeys and the effect of noise was suspected. After denoising, the discrepancy between the two subjects was significantly reduced.
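A minimal scalar version of the filter-smoother pipeline looks like the following. It is a sketch only: the paper's state-space model and its EM-estimated parameters are not given in the abstract, so a local-level (random-walk) model with known noise variances `q` and `r` stands in for them:

```python
def kalman_smooth(y, q, r, x0=0.0, p0=1.0):
    """Denoise a 1-D series with a local-level state-space model:
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (hidden state)
        y_t = x_t     + v_t,  v_t ~ N(0, r)   (noisy observation)
    Forward Kalman filter followed by a Rauch-Tung-Striebel smoother."""
    n = len(y)
    xf, pf = [0.0] * n, [0.0] * n   # filtered mean / variance
    xp, pp = [0.0] * n, [0.0] * n   # one-step predictions
    x, p = x0, p0
    for t in range(n):
        xp[t], pp[t] = x, p + q          # predict
        k = pp[t] / (pp[t] + r)          # Kalman gain
        x = xp[t] + k * (y[t] - xp[t])   # update with observation
        p = (1 - k) * pp[t]
        xf[t], pf[t] = x, p
    xs = xf[:]                           # backward RTS smoothing pass
    for t in range(n - 2, -1, -1):
        g = pf[t] / pp[t + 1]
        xs[t] = xf[t] + g * (xs[t + 1] - xp[t + 1])
    return xs
```

In the full method, `q` and `r` (and the initial state) would themselves be estimated from the data by iterating EM over the smoothed state estimates.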
Abstract:
The development of techniques for scaling up classifiers so that they can be applied to problems with large datasets of training examples is one of the objectives of data mining. Recently, AdaBoost has become popular in the machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options for reducing dimensionality, namely principal component analysis and random projection, are briefly examined. Random projection subject to a probabilistic length-preserving transformation is explored further as a computationally light preprocessing step. The experimental results demonstrate the effectiveness of the proposed training process for handling high-dimensional large datasets.
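A length-preserving random projection of the kind examined here can be sketched with a Gaussian matrix scaled by 1/sqrt(k), the standard Johnson-Lindenstrauss construction; the paper's exact transformation may differ:

```python
import math
import random

def random_projection(X, k, seed=0):
    """Project d-dimensional rows of X down to k dimensions with a
    Gaussian random matrix scaled by 1/sqrt(k).  Squared norms (and
    pairwise distances) are preserved in expectation, so a booster such
    as AdaBoost can be trained on the compressed data."""
    rng = random.Random(seed)
    d = len(X[0])
    # d x k projection matrix with N(0, 1/k) entries.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)]
         for _ in range(d)]
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(k)]
            for x in X]
```

The projection is data-independent, which is what makes it so cheap compared with PCA: no covariance matrix or eigendecomposition is needed before training.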
Abstract:
In this digital age, as social media is emerging as a central site where information is shared and interpreted, it is essential to study information construction issues on social media sites in order to understand how social reality is constructed. While there are a number of studies taking an information-as-objective point of view, this proposed study emphasizes the constructed and interpretive nature of information and explores the processes through which information surrounding acute events comes into being on micro-blogs. In order to conduct this analysis systematically and theoretically, the concept of interpretive communities will be deployed. This research investigates whether micro-blog-based social groups can serve as interpretive communities and, if so, what role they might play in the construction of information and what social impacts may arise. To understand how this process is entangled with the surrounding social, political and technical contexts, cases from both China (focusing on Sina Weibo) and Australia (focusing on Twitter) will be analysed.
Abstract:
Objective: To explore the potential for using a basic text search of routine emergency department (ED) data to identify product-related injury in infants, and to compare the patterns from routine ED data and specialised injury surveillance data.
Methods: Data were sourced from the Emergency Department Information System (EDIS) and the Queensland Injury Surveillance Unit (QISU) for all injured infants between 2009 and 2011. A basic text search was developed to identify the top five infant products in QISU. Sensitivity, specificity and positive predictive value were calculated, and a refined search was used with EDIS. Results were manually reviewed to assess validity. Descriptive analysis was conducted to examine patterns between datasets.
Results: The basic text search for all products showed high sensitivity and specificity, and most searches showed high positive predictive value. EDIS patterns were similar to QISU patterns, with strikingly similar month-of-age injury peaks, admission proportions and types of injuries.
Conclusions: This study demonstrated a capacity to identify a sample of valid cases of product-related injuries for specified products using simple text searching of routine ED data.
Implications: As the capacity for large datasets grows and the capability to reliably mine text improves, opportunities for expanded sources of injury surveillance data increase. This will ultimately assist stakeholders such as consumer product safety regulators and child safety advocates to appropriately target prevention initiatives.
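The three validation metrics reported above are simple functions of the confusion counts obtained by checking the text-search hits against manually reviewed (QISU-coded) cases; a sketch:

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and positive predictive value for a
    text-search case-identification rule validated against a manually
    reviewed gold standard (e.g. QISU-coded product-related injuries).
    tp/fp/fn/tn are true-positive, false-positive, false-negative and
    true-negative counts."""
    sensitivity = tp / (tp + fn)   # share of real cases the search finds
    specificity = tn / (tn + fp)   # share of non-cases it leaves out
    ppv = tp / (tp + fp)           # share of flagged records that are real
    return sensitivity, specificity, ppv
```

High PPV is what makes the refined search usable on EDIS, where no injury coding exists: most records the search flags really are product-related injuries.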
Abstract:
Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotic distribution of the correlation information criterion (CIC) is derived, and a new method for selecting a working ICS is proposed by standardizing the selection criterion as a p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS, while with respect to working ICS selection, the standardized CIC test is also shown to have satisfactory performance. Simulation studies and applications to two real longitudinal datasets illustrate how these criteria and tests can be used.
Abstract:
We consider the development of statistical models for prediction of constituent concentrations of riverine pollutants, which is a key step in load estimation from frequent flow rate data and less frequently collected concentration data. We consider how to capture the impacts of past flow patterns via the average discounted flow (ADF), which discounts the past flux based on the time elapsed: more recent fluxes are given more weight. However, the effectiveness of ADF depends critically on the choice of the discount factor, which reflects the unknown environmental accumulation process of the concentration compounds. We propose to choose the discount factor by maximizing the adjusted R² value or the Nash-Sutcliffe model efficiency coefficient. The R² values are adjusted to take account of the number of parameters in the model fit. The resulting optimal discount factor can be interpreted as a measure of the constituent exhaustion rate during flood events. To evaluate the performance of the proposed regression estimators, we examine two different sampling scenarios by resampling fortnightly and opportunistically from two real daily datasets, which come from two United States Geological Survey (USGS) gaging stations located in the Des Plaines River and Illinois River basins. The generalized rating-curve approach produces biased estimates of the total sediment loads, from -30% to 83%, whereas the new approaches produce much lower biases, ranging from -24% to 35%. This substantial improvement in the estimates of the total load is due to the fact that the predictability of concentration is greatly improved by the additional predictors.
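A plausible recursive form of the ADF, together with the adjusted-R² penalty used to pick the discount factor, can be sketched as follows. The exponential-smoothing normalisation below is an assumption, as the abstract does not give the paper's exact discounted sum:

```python
def average_discounted_flow(flows, delta):
    """Exponentially discounted average of past flow rates: recent
    fluxes get more weight.  One plausible recursive form (assumed):
        ADF_t = delta * ADF_{t-1} + (1 - delta) * Q_t
    where delta in (0, 1) is the discount factor."""
    adf, out = flows[0], []
    for q in flows:
        adf = delta * adf + (1 - delta) * q
        out.append(adf)
    return out

def adjusted_r2(r2, n, p):
    """Penalise R^2 for the number of fitted parameters p, so that a
    richer model must earn its extra predictors (n = sample size)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

In practice the discount factor would be chosen by a grid search: refit the concentration regression for each candidate delta and keep the one with the largest adjusted R² (or Nash-Sutcliffe coefficient).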
Abstract:
The 2008 US election has been heralded as the first presidential election of the social media era, but took place at a time when social media were still in a state of comparative infancy; so much so that the most important platform was not Facebook or Twitter, but the purpose-built campaign site my.barackobama.com, which became the central vehicle for the most successful electoral fundraising campaign in American history. By 2012, the social media landscape had changed: Facebook and, to a somewhat lesser extent, Twitter are now well-established as the leading social media platforms in the United States, and were used extensively by the campaign organisations of both candidates. As third-party spaces controlled by independent commercial entities, however, their use necessarily differs from that of home-grown, party-controlled sites: from the point of view of the platform itself, a @BarackObama or @MittRomney is technically no different from any other account, except for the very high follower count and an exceptional volume of @mentions. In spite of the significant social media experience which Democrat and Republican campaign strategists had already accumulated during the 2008 campaign, therefore, the translation of such experience to the use of Facebook and Twitter in their 2012 incarnations still required a substantial amount of new work, experimentation, and evaluation. This chapter examines the Twitter strategies of the leading accounts operated by both campaign headquarters: the ‘personal’ candidate accounts @BarackObama and @MittRomney as well as @JoeBiden and @PaulRyanVP, and the campaign accounts @Obama2012 and @TeamRomney. 
Drawing on datasets which capture all tweets from and at these accounts during the final months of the campaign (from early September 2012 to the immediate aftermath of the election night), we reconstruct the campaigns’ approaches to using Twitter for electioneering from the quantitative and qualitative patterns of their activities, and explore the resonance which these accounts have found with the wider Twitter userbase. A particular focus of our investigation in this context will be on the tweeting styles of these accounts: the mixture of original messages, @replies, and retweets, and the level and nature of engagement with everyday Twitter followers. We will examine whether the accounts chose to respond (by @replying) to the messages of support or criticism which were directed at them, whether they retweeted any such messages (and whether there was any preferential retweeting of influential or – alternatively – demonstratively ordinary users), and/or whether they were used mainly to broadcast and disseminate prepared campaign messages. Our analysis will highlight any significant differences between the accounts we examine, trace changes in style over the course of the final campaign months, and correlate such stylistic differences with the respective electoral positioning of the candidates. Further, we examine the use of these accounts during moments of heightened attention (such as the presidential and vice-presidential debates, or in the context of controversies such as that caused by the publication of the Romney “47%” video; additional case studies may emerge over the remainder of the campaign) to explore how they were used to present or defend key talking points, and exploit or avert damage from campaign gaffes. 
A complementary analysis of the messages directed at the campaign accounts (in the form of @replies or retweets) will also provide further evidence for the extent to which these talking points were picked up and disseminated by the wider Twitter population. Finally, we also explore the use of external materials (links to articles, images, videos, and other content on the campaign sites themselves, in the mainstream media, or on other platforms) by the campaign accounts, and the resonance which these materials had with the wider follower base of these accounts. This provides an indication of the integration of Twitter into the overall campaigning process, by highlighting how the platform was used as a means of encouraging the viral spread of campaign propaganda (such as advertising materials) or of directing user attention towards favourable media coverage. By building on comprehensive, large datasets of Twitter activity (as of early October, our combined datasets comprise some 3.8 million tweets) which we process and analyse using custom-designed social media analytics tools, and by using our initial quantitative analysis to guide further qualitative evaluation of Twitter activity around these campaign accounts, we are able to provide an in-depth picture of the use of Twitter in political campaigning during the 2012 US election which will provide detailed new insights into social media use in contemporary elections. This analysis will then also be able to serve as a touchstone for the analysis of social media use in subsequent elections, in the USA as well as in other developed nations where Twitter and other social media platforms are utilised in electioneering.
Abstract:
Robust estimation often relies on a dispersion function that is more slowly varying at large values than the square function. However, the choice of tuning constant in dispersion functions may impact the estimation efficiency to a great extent. For a given family of dispersion functions such as the Huber family, we suggest obtaining the "best" tuning constant from the data so that the asymptotic efficiency is maximized. This data-driven approach can automatically adjust the value of the tuning constant to provide the necessary resistance against outliers. Simulation studies show that substantial efficiency can be gained by this data-dependent approach compared with the traditional approach in which the tuning constant is fixed. We briefly illustrate the proposed method using two datasets.
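For the Huber family, the data-driven choice can be sketched by minimising the empirical estimate of the M-estimator's asymptotic variance, E[psi^2] / (E[psi'])^2, over a grid of candidate constants. The grid search is an assumption here, since the abstract does not state how the maximisation is carried out:

```python
def huber_psi(u, c):
    """Huber score function: linear in the middle, clipped at +/- c."""
    return max(-c, min(c, u))

def asymptotic_variance(residuals, c):
    """Empirical estimate of the asymptotic variance of a Huber
    M-estimator, E[psi^2] / (E[psi'])^2; smaller means more efficient."""
    n = len(residuals)
    num = sum(huber_psi(r, c) ** 2 for r in residuals) / n
    # psi'(u) = 1 inside [-c, c], 0 outside.
    den = sum(1.0 if abs(r) <= c else 0.0 for r in residuals) / n
    return num / den ** 2

def best_tuning_constant(residuals, grid):
    """Data-driven choice: the constant on the grid with minimum
    estimated asymptotic variance (i.e. maximum efficiency)."""
    return min(grid, key=lambda c: asymptotic_variance(residuals, c))
```

With a gross outlier among the residuals a small constant wins, because clipping the outlier shrinks E[psi^2] far more than it shrinks E[psi']; with clean Gaussian residuals a larger constant is selected, recovering near-least-squares efficiency.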
Abstract:
The Macroscopic Fundamental Diagram (MFD) relates space-mean density and flow. Since the MFD represents area-wide network traffic performance, studies on perimeter control strategies and network-wide traffic state estimation utilising the MFD concept have been reported. Most previous works have used data from fixed sensors, such as inductive loops, to estimate the MFD, which can cause biased estimation in urban networks due to queue spillovers at intersections. To overcome this limitation, recent literature reports the use of trajectory data obtained from probe vehicles. However, these studies have been conducted using simulated datasets; few works have discussed the limitations of real datasets and their impact on variable estimation. This study compares two methods for estimating the traffic state variables of signalised arterial sections: a method based on cumulative vehicle counts (CUPRITE), and one based on vehicle trajectories from taxi Global Positioning System (GPS) logs. The comparisons reveal some characteristics of the taxi trajectory data available in Brisbane, Australia. The current trajectory data are limited in quantity (i.e., the penetration rate), due to which the traffic state variables tend to be underestimated. Nevertheless, the trajectory-based method successfully captures the features of the traffic states, which suggests that taxi trajectories can be a good estimator of network-wide traffic states.
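One standard way to compute space-mean flow and density from probe trajectories is Edie's generalised definitions over a time-space region; the abstract does not name the paper's exact trajectory-based estimator, so the sketch below is illustrative:

```python
def edie_traffic_state(trajectories, section_length, period):
    """Space-mean flow and density from probe-vehicle trajectories,
    using Edie's generalised definitions over a time-space region of
    area section_length * period:
        flow    = total distance travelled / area   (veh/s)
        density = total time spent         / area   (veh/m)
    Each trajectory is a (distance_travelled, time_spent) pair measured
    within the region."""
    area = section_length * period
    total_dist = sum(d for d, t in trajectories)
    total_time = sum(t for d, t in trajectories)
    return total_dist / area, total_time / area
```

If only a fraction p of vehicles are probes, both numerators scale by roughly p, so flow and density are underestimated by the same factor; that is the penetration-rate limitation noted above.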
Abstract:
The oncogene MDM4, also known as MDMX or HDMX, contributes to cancer susceptibility and progression through its capacity to negatively regulate a range of genes with tumour-suppressive functions. As part of a recent genome-wide association study, it was determined that the A-allele of the rs4245739 SNP (A>C), located in the 3'-UTR of MDM4, is associated with an increased risk of prostate cancer. Computational predictions revealed that the rs4245739 SNP is located within a predicted binding site for three microRNAs (miRNAs): miR-191-5p, miR-887 and miR-3669. Herein, we show using reporter gene assays and endogenous MDM4 expression analyses that miR-191-5p and miR-887 have a specific affinity for the rs4245739 SNP C-allele in prostate cancer. These miRNAs do not affect MDM4 mRNA levels; rather, they inhibit its translation in C-allele-containing PC3 cells but not in LNCaP cells homozygous for the A-allele. By analysing gene expression datasets from patient cohorts, we found that MDM4 is associated with metastasis and prostate cancer progression, and that targeting this gene with miR-191-5p or miR-887 decreases PC3 cell viability. This study is the first, to our knowledge, to demonstrate regulation of the MDM4 rs4245739 SNP C-allele by two miRNAs in prostate cancer, and thereby to identify a mechanism by which the MDM4 rs4245739 SNP A-allele may be associated with an increased risk of prostate cancer.
Abstract:
This Master's thesis examines two opposing nationalistic discourses on the revolution of Zanzibar. Chama cha Mapinduzi (CCM), the party in power since the 1964 revolution, defends its revolutionary and "African" heritage in the current multi-party system. New nationalists, including among others the main opposition party Civic United Front (CUF), question both the 1964 revolution and the post-revolution period and blame CCM for empty promises, corruption and ethnic discrimination. This study analyzes the role of a significant historical event in the creation of nationalistic ideology and national identity. The 1964 revolution forms the nucleus of various debates related to the history of Zanzibar: slavery, colonialism, racial discrimination and political violence. Representations of these debates are examined in both discourses. Social constructivist principles form the basis of this study, and the central concepts in the theoretical framework are nationalism, national identity, ethnicity and race. I use critical discourse analysis as my research method, leaning on the work of Teun A. van Dijk and Norman Fairclough as the most significant researchers in this field. I examine in particular the ways in which linguistic devices, such as stereotypes and metaphors, are used to form in- and out-groups ("us" vs. "others"). My material, in both Swahili and English, was collected mainly in Tanzania in the fall of 2007 and from online sources in the spring of 2009. It includes publications by the Zanzibari government between 1964 and 2000 (12), official speeches for the Revolution Day or the Union Day (12), articles from Tanzanian newspapers from the 1990s until 2009 (15), memoirs and political pamphlets (10), blog posts and opinion pieces from four different websites (8), and interviews or personal communication in Zanzibar, Dar es Salaam and Uppsala (8). Nationalistic rhetoric often creates enemy images by using binary good-bad oppositions.
Both discourses in this study build identities on the basis of "otherness" and exclusion, with the intent of emphasizing the particularity of the own group and placing "evilness" outside the own reference group. These opposite views on the 1964 revolution as the main axis of the history of Zanzibar build different portraits of the nation and of Zanzibari-ness (Uzanzibari). CCM still relies on the pre-revolutionary enemy images of Arabs as selfish rulers and cruel slave traders. For CCM, Zanzibar is primarily an "African" nation and a part of Tanzania which is threatened by "Arabs", the outsiders. In contrast, the new nationalists stress the long history of Zanzibar as a multi-racial, cosmopolitan and formerly independent country with its own culture and identity, separate from mainland Tanzania. Heshima (honour/respect), one of the basic values of Swahili culture, occupies a central role in both discourses: the ruling party emphasizes that the revolution returned "heshima" to the Zanzibari Africans after centuries of humiliation, whereas the new nationalists claim that ever since the revolution all "non-Africans" have been humiliated and have lost their "heshima". According to the new nationalists, true Zanzibari values, which include tolerance and harmony between different "races", were lost when the "foreign" revolutionaries arrived from the mainland. Consequently, they see the 1964 revolution as Tanganyikan colonialism which began with the help of Western countries, and maintain that this "colonialism" still continues in the violent multi-party elections.