942 results for Blog datasets
Abstract:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four-dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four-dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants, thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four-dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead, CAR models were used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model; the applied contribution was the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals.
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days’ data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
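The neighbourhood rule behind the CAR layered model (neighbours only within the same depth layer) can be illustrated with a small sketch. It assumes a regular grid of sites and edge-sharing (rook) adjacency, neither of which is stated in the abstract, and the function name `layered_neighbours` is hypothetical. It builds only the adjacency structure, not the model itself.

```python
from itertools import product


def layered_neighbours(n_rows, n_cols, n_depths):
    """Map each site (row, col, depth) to its neighbours in the same depth layer.

    Assumption: sites sit on a regular grid and neighbours share an edge.
    The actual sampling layout in the thesis may differ.
    """
    neighbours = {}
    for r, c, d in product(range(n_rows), range(n_cols), range(n_depths)):
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        neighbours[(r, c, d)] = [
            (rr, cc, d)  # the depth index is never changed: no links across layers
            for rr, cc in candidates
            if 0 <= rr < n_rows and 0 <= cc < n_cols
        ]
    return neighbours


nb = layered_neighbours(n_rows=3, n_cols=3, n_depths=2)
print(nb[(1, 1, 0)])  # interior site: four neighbours, all at depth 0
print(nb[(0, 0, 1)])  # corner site: two neighbours, all at depth 1
```

Because no site is linked to a site at another depth, each layer can carry its own spatially structured and unstructured variance, which is the flexibility the abstract describes.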
Abstract:
This paper proposes an innovative instance-similarity-based evaluation metric that reduces the search map for clustering to be performed. An aggregate global score is calculated for each instance using the novel idea of the Fibonacci series. The use of Fibonacci numbers separates the instances effectively; hence, the intra-cluster similarity is increased and the inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm is able to handle datasets with numerical attributes, categorical attributes, and a mix of both types. Results obtained with FIBCLUS are compared with the results of existing algorithms such as k-means, x-means, expectation maximization and hierarchical algorithms that are widely used to cluster numeric, categorical and mixed data types. Empirical analysis shows that FIBCLUS is able to produce better clustering solutions in terms of entropy, purity and F-score in comparison to the above-described existing algorithms.
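Two of the evaluation measures named above, purity and entropy, can be sketched as follows. This is a minimal illustration using the common textbook definitions, which may differ in normalisation from those used in the paper. The function names and the toy labels are hypothetical, and the sketch does not implement FIBCLUS itself.

```python
from collections import Counter
from math import log2


def _class_counts(clusters, labels):
    """Count how many instances of each true class fall in each cluster."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of instances that carry the majority class of their cluster."""
    by_cluster = _class_counts(clusters, labels)
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy inside each cluster, in bits."""
    by_cluster = _class_counts(clusters, labels)
    total = len(labels)
    result = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        h = -sum((n / size) * log2(n / size) for n in counts.values())
        result += (size / total) * h
    return result


# Toy example (invented): six instances, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels))
print(entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that agree more closely with the known classes.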
Abstract:
With the growth of the Web, E-commerce activities are also becoming popular. Product recommendation is an effective way of marketing a product to potential customers. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items. Such items are then recommended to a user. Furthermore, too many irrelevant recommendations worsen the information overload problem for a user. This happens because such models, based on vectors and matrices, are unable to find the latent relationships that exist between users and searches. Identifying user behaviour is a complex process, and usually involves comparing the searches that user has made. In most cases, traditional vector- and matrix-based methods are used to find the prominent features searched by a user. In this research we employ tensors to find the relevant features searched by users. Such relevant features are then used for making recommendations. Evaluation on real datasets shows the effectiveness of such recommendations over vector- and matrix-based methods.
Abstract:
Search log data is multidimensional data consisting of a number of searches by multiple users with many searched parameters. This data can be used to identify a user’s interest in an item or object being searched. Identifying the highest interests of a Web user from search log data is a complex process. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items. Such items are then recommended to a user. Two-dimensional data models, when used to mine knowledge from such multidimensional data, may not be able to give good mappings of users to their searches. The major problem with such models is that they are unable to find the latent relationships that exist between different searched dimensions. In this research work, we utilize tensors to model the various searches made by a user. This high-dimensional data model is then used to extract the relationship between various dimensions, and to find the prominent searched components. To achieve this, we have used popular tensor decomposition methods such as PARAFAC, Tucker and HOSVD. All experiments and evaluations were carried out on real datasets, which clearly show the effectiveness of tensor models in finding prominent searched components in comparison to other widely used two-dimensional data models. Such top-rated searched components are then given as recommendations to users.
Abstract:
Light plays a unique role for plants as it is both a source of energy for growth and a signal for development. Light captured by the pigments in the light harvesting complexes is used to drive the synthesis of the chemical energy required for carbon assimilation. The light perceived by photoreceptors activates effectors, such as transcription factors (TFs), which modulate the expression of light-responsive genes. Recently, it has been speculated that increasing the photosynthetic rate could further improve the yield potential of three carbon (C3) crops such as wheat. However, little is currently known about the transcriptional regulation of photosynthesis genes, particularly in crop species. Nuclear factor Y (NF-Y) TF is a functionally diverse regulator of growth and development in the model plant species, with demonstrated roles in embryo development, stress response, flowering time and chloroplast biogenesis. Furthermore, a light-responsive NF-Y binding site (CCAAT-box) is present in the promoter of a spinach photosynthesis gene. As photosynthesis genes are co-regulated by light and co-regulated genes typically have similar regulatory elements in their promoters, it seems likely that other photosynthesis genes would also have light-responsive CCAAT-boxes. This provided the impetus to investigate the NF-Y TF in bread wheat. This thesis is focussed on wheat NF-Y members that have roles in light-mediated gene regulation with an emphasis on their involvement in the regulation of photosynthesis genes. NF-Y is a heterotrimeric complex, comprised of the three subunits NF-YA, NF-YB and NF-YC. Unlike the mammalian and yeast counterparts, each of the three subunits is encoded by multiple genes in Arabidopsis. The initial step taken in this study was the identification of the wheat NF-Y family (Chapter 3). A search of the current wheat nucleotide sequence databases identified 37 NF-Y genes (10 NF-YA, 11 NF-YB, 14 NF-YC & 2 Dr1). 
Phylogenetic analysis revealed that each of the three wheat NF-Y (TaNF-Y) subunit families could be divided into 4-5 clades based on their conserved core regions. Outside of the core regions, eleven motifs were identified to be conserved between Arabidopsis, rice and wheat NF-Y subunit members. The expression profiles of TaNF-Y genes were constructed using quantitative real-time polymerase chain reaction (RT-PCR). Some TaNF-Y subunit members had little variation in their transcript levels among the organs, while others displayed organ-predominant expression profiles, including those expressed mainly in the photosynthetic organs. To investigate their potential role in light-mediated gene regulation, the light responsiveness of the TaNF-Y genes was examined (Chapters 4 and 5). Two TaNF-YB and five TaNF-YC members were markedly upregulated by light in both the wheat leaves and seedling shoots. To identify the potential target genes of the light-upregulated NF-Y subunit members, a gene expression correlation analysis was conducted using publicly available Affymetrix Wheat Genome Array datasets. This analysis revealed that the transcript expression levels of TaNF-YB3 and TaNF-YC11 were significantly correlated with those of photosynthesis genes. These correlated expression profiles were also observed in the quantitative RT-PCR dataset from wheat plants grown under light and dark conditions. Sequence analysis of the promoters of these wheat photosynthesis genes revealed that they were enriched with potential NF-Y binding sites (CCAAT-box). The potential role of TaNF-YB3 in the regulation of photosynthetic genes was further investigated using a transgenic approach (Chapter 5).
Transgenic wheat lines constitutively expressing TaNF-YB3 were found to have significantly increased expression levels of photosynthesis genes, including those encoding light harvesting chlorophyll a/b-binding proteins, photosystem I reaction centre subunits, a chloroplast ATP synthase subunit and glutamyl-tRNA reductase (GluTR). GluTR is a rate-limiting enzyme in the chlorophyll biosynthesis pathway. In association with the increased expression of the photosynthesis genes, the transgenic lines had a higher leaf chlorophyll content, an increased photosynthetic rate and a more rapid early growth rate compared to the wild-type wheat. In addition to its role in the regulation of photosynthesis genes, TaNF-YB3 overexpression lines flowered on average 2 days earlier than the wild-type (Chapter 6). Quantitative RT-PCR analysis showed that there was a 13-fold increase in the expression level of the floral integrator, TaFT. The transcript levels of other downstream genes (TaFT2 and TaVRN1) were also increased in the transgenic lines. Furthermore, the transcript levels of TaNF-YB3 were significantly correlated with those of constans (CO), constans-like (COL) and timing of chlorophyll a/b-binding (CAB) expression 1 [TOC1; (CCT)] domain-containing proteins known to be involved in the regulation of flowering time. To summarise the key findings of this study, 37 NF-Y genes were identified in the crop species wheat. An in-depth analysis of TaNF-Y gene expression profiles revealed that the potential role of some light-upregulated members was in the regulation of photosynthetic genes. The involvement of TaNF-YB3 in the regulation of photosynthesis genes was supported by data obtained from transgenic wheat lines with increased constitutive expression of TaNF-YB3. The overexpression of TaNF-YB3 in the transgenic lines revealed this NF-YB member is also involved in the fine-tuning of flowering time.
These data suggest that the NF-Y TF plays an important role in light-mediated gene regulation in wheat.
Abstract:
In late 2007, Gold Coast City Council libraries embarked on an online library project, designed to ramp up libraries’ online services to customers. As part of this project, the Young People’s team identified a need to connect with youth aged 12 to 16 in the online environment, in order to create a direct channel of communication with this market segment and encourage them to engage with the library. Blogging was identified as an appropriate means of communicating with both current and potential library customers from this age group. The Young People’s team consequently prepared a concept plan for a youth blog for launch in Children’s Book Week 2008 and are working towards development of management and administrative models and documentation and implementation of the blog itself. While many libraries have been quick to take up Web 2.0-style services, there has been little formal publication about the successes (or failures) of this type of project. Likewise, few libraries have published about the planning, management, and administration of such services. The youth blog currently in development at Gold Coast City Council libraries will be supported by a robust planning phase and will be rigorously evaluated as part of the project. This paper will report on the project (its aims, objectives and outputs), the planning process, and the evaluation activities and outcomes.
Abstract:
In 2009, Australia celebrated the introduction of a national Early Years Learning Framework. This is a critical component in a series of educational reforms designed to support quality pedagogy and practice in early childhood education and care (ECEC) and successful transition to school. As with any policy change, success in real terms relies upon building shared understanding and the capacity of educators to apply new knowledge and support change and improved practice within their service. With these outcomes in mind, a collaborative research project is investigating the efficacy of a new approach to professional learning in ECEC: The professional conversation. This paper provides an overview of the professional conversation approach, including underpinning principles and the design and use of reflective questions to support meaningful conversation and learning.
Abstract:
Flow regime transition criteria are of practical importance for two-phase flow analyses at reduced gravity conditions. Here, flow regime transition criteria which take the friction pressure loss effect into account were studied in detail. Criteria at reduced gravity conditions were developed by extending an existing model, and comparison with various experimental datasets taken at microgravity conditions showed satisfactory agreement. Sample computations of the model were performed at various gravity conditions, such as 0.196, 1.62, 3.71, and 9.81 m/s², corresponding to micro-gravity and lunar, Martian and Earth surface gravity, respectively. It was found that the effect of gravity on bubbly-slug and slug-annular (churn) transitions in a two-phase flow system was more pronounced at low liquid flow conditions, whereas the gravity effect could be ignored at high mixture volumetric flux conditions. For the annular flow transitions due to flow reversal and onset of droplet entrainment, a higher superficial gas velocity was obtained at higher gravity levels.
Abstract:
In this paper, we present a new algorithm for boosting visual template recall performance through a process of visual expectation. Visual expectation dynamically modifies the recognition thresholds of learnt visual templates based on recently matched templates, improving the recall of sequences of familiar places while keeping precision high, without any feedback from a mapping backend. We demonstrate the performance benefits of visual expectation using two 17-kilometer datasets gathered in an outdoor environment at two times separated by three weeks. The visual expectation algorithm provides up to a 100% improvement in recall. We also combine the visual expectation algorithm with the RatSLAM SLAM system and show how the algorithm enables successful mapping
Abstract:
The growth of technologies and tools branded as ‘new media’ or ‘Web 2.0’ has sparked much discussion about the internet and its place in all facets of social life. Such debate includes the potential for blogs and citizen journalism projects to replace or alter journalism and mainstream media practices. However, while the journalism-blog dynamic has attracted the most attention, the actual work of political bloggers, the roles they play in the mediasphere and the resources they use, has been comparatively ignored. This project will look at political blogging in Australia and France - sites commenting on or promoting political events and ideas, and run by citizens, politicians, and journalists alike. In doing so, the structure of networks formed by bloggers and the nature of communication within political blogospheres will be examined. Previous studies of political blogging around the world have focussed on individual nations, finding that in some cases the networks are divided between different political ideologies. By comparing two countries with different political representation (two-party dominated system vs. a wider political spectrum), this study will determine the structure of these political blogospheres, and correlate these structures with the political environment in which they are situated. The thesis adapts concepts from communication and media theories, including framing, agenda setting, and opinion leaders, to examine the work of political bloggers and their place within the mediasphere. As well as developing a hybrid theoretical base for research into blogs and other online communication, the project outlines new methodologies for carrying out studies of online activity through the analysis of several topical networks within the wider activity collected for this project. The project draws on hyperlink and textual data collected from a sample of Australian and French blogs between January and August 2009.
From this data, the thesis provides an overview of ‘everyday’ political blogging, showing posting patterns over several months of activity, away from national elections and their associated campaigns. However, while other work in this field has looked solely at cumulative networks, treating collected data as a static network, this project will also look at specific cases to see how the blogospheres change with time and topics of discussion. Three case studies are used within the thesis to examine how blogs cover politics, featuring an international political event (the Obama inauguration), and local political topics (the opposition to the ‘Création et Internet’, or HADOPI, law in France, the ‘Utegate’ scandal in Australia). By using a mixture of qualitative and quantitative methods, the study analyses data collected from a population of sites from both countries, looking at their linking patterns, relationship with mainstream media, and topics of interest. This project will subsequently help to further develop methodologies in this field and provide new and detailed information on both online networks and internet-based political communication in Australia and France.
Abstract:
Handling information overload online is, from the user's point of view, a big challenge, especially when the number of websites is growing rapidly due to growth in e-commerce and other related activities. Personalization based on user needs is the key to solving the problem of information overload. Personalization methods help in identifying relevant information, which may be liked by a user. User profile and object profile are the important elements of a personalization system. When creating user and object profiles, most of the existing methods adopt two-dimensional similarity methods based on vector or matrix models in order to find inter-user and inter-object similarity. Moreover, for recommending similar objects to users, personalization systems use the users-users, items-items and users-items similarity measures. In most cases, similarity measures such as Euclidean, Manhattan, cosine and many others based on vector or matrix methods are used to find the similarities. Web logs are high-dimensional datasets, consisting of multiple users and multiple searches, with many attributes to each. Two-dimensional data analysis methods may often overlook latent relationships that may exist between users and items. In contrast to other studies, this thesis utilises tensors, the high-dimensional data models, to build user and object profiles and to find the inter-relationships between users-users and users-items. To create an improved personalized Web system, this thesis proposes to build three types of profiles: individual user, group users and object profiles utilising decomposition factors of tensor data models. A hybrid recommendation approach utilising group profiles (forming the basis of a collaborative filtering method) and object profiles (forming the basis of a content-based method) in conjunction with individual user profiles (forming the basis of a model-based approach) is proposed for making effective recommendations.
A tensor-based clustering method is proposed that utilises the outcomes of popular tensor decomposition techniques such as PARAFAC, Tucker and HOSVD to group similar instances. An individual user profile, showing the user's highest interest, is represented by the top dimension values, extracted from the component matrix obtained after tensor decomposition. A group profile, showing similar users and their highest interest, is built by clustering similar users based on tensor decomposed values. A group profile is represented by the top association rules (containing various unique object combinations) that are derived from the searches made by the users of the cluster. An object profile is created to represent similar objects clustered on the basis of their similarity of features. Depending on the category of a user (known, anonymous or frequent visitor to the website), any of the profiles or their combinations is used for making personalized recommendations. A ranking algorithm is also proposed that utilizes the personalized information to order and rank the recommendations. The proposed methodology is evaluated on data collected from a real life car website. Empirical analysis confirms the effectiveness of recommendations made by the proposed approach over other collaborative filtering and content-based recommendation approaches based on two-dimensional data analysis methods.
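For context, the two-dimensional (vector-based) similarity measures that the abstract contrasts with tensor models can be sketched in a few lines. This is an illustrative baseline only, not the thesis's method. The function names and the example profile vectors are hypothetical.

```python
from math import sqrt


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented profile vectors: search counts per item category for two users.
user_a = [3, 0, 1]
user_b = [6, 0, 2]
print(euclidean(user_a, user_b))
print(manhattan(user_a, user_b))
print(cosine_similarity(user_a, user_b))
```

In this example the two users search in the same proportions but at different volumes, so the cosine measure treats them as near-identical while the distance measures do not. Each measure still compares only one flat vector per user, which is the limitation the abstract attributes to two-dimensional methods.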
Abstract:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered.
The first source of data concerns symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), and its analysis constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centres on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real-time neural activity in the brain.
Abstract:
“Turtle Twilight” is a two-screen video installation. Paragraphs of text adapted from a travel blog type across the left-hand screen. A computer-generated image of a tropical sunset is slowly animated on the right-hand screen. The two screens are accompanied by an atmospheric stock music track. This work examines how we construct, represent and deploy ‘nature’ in our contemporary lives. It mixes cinematic codes with image, text and sound gleaned from online sources. By extending Nicolas Bourriaud’s understanding of ‘postproduction’ and the creative and critical strategies of ‘editing’, it questions the relationship between contemporary screen culture, nature, desire and contemplation.
Abstract:
With the growing number of XML documents on the Web it becomes essential to effectively organise these XML documents in order to retrieve useful information from them. A possible solution is to apply clustering on the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these types of semi-structured documents due to their heterogeneity and structural irregularity. Most of the existing research on clustering techniques focuses only on one feature of the XML documents, this being either their structure or their content, due to scalability and complexity problems. The knowledge gained in the form of clusters based on the structure or the content is not suitable for real-life datasets. It therefore becomes essential to include both the structure and content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both these kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods to utilise frequent pattern mining techniques to reduce the dimension; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information.
The explicit model uses a higher-order model, namely a 3-order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose large-sized tensor models to utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in information retrieval on the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform the related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures for constraining the content shows an improvement in accuracy over content-only and structure-only clustering results. The scalability evaluation experiments conducted on large-scale datasets clearly show the strengths of the proposed methods over state-of-the-art methods. In particular, this thesis work contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. In addition, it also contributes by addressing the research gaps in frequent pattern mining to generate efficient and concise frequent subtrees with various node relationships that could be used in clustering.
Abstract:
Background: Known risk factors for secondary lymphedema only partially explain who develops lymphedema following cancer, suggesting that inherited genetic susceptibility may influence risk. Moreover, identification of molecular signatures could facilitate lymphedema risk prediction prior to surgery or lead to effective drug therapies for prevention or treatment. Recent advances in the molecular biology underlying development of the lymphatic system and related congenital disorders implicate a number of potential candidate genes to explore in relation to secondary lymphedema. Methods and Results: We undertook a nested case-control study, with participants who had developed lymphedema after surgical intervention within the first 18 months of their breast cancer diagnosis serving as cases (n=22) and those without lymphedema serving as controls (n=98), identified from a prospective, population-based, cohort study in Queensland, Australia. TagSNPs that covered all known genetic variation in the genes SOX18, VEGFC, VEGFD, VEGFR2, VEGFR3, RORC, FOXC2, LYVE1, ADM and PROX1 were selected for genotyping. Multiple SNPs within three receptor genes, VEGFR2, VEGFR3 and RORC, were associated with lymphedema defined by statistical significance (p<0.05) or extreme risk estimates (OR<0.5 or >2.0). Conclusions: These provocative, albeit preliminary, findings regarding possible genetic predisposition to secondary lymphedema following breast cancer treatment warrant further attention for potential replication using larger datasets.
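The screening rule quoted above (p<0.05, or OR<0.5 or >2.0) can be sketched as follows. The carrier counts in the example are invented for illustration and are not taken from the study; only the group sizes (22 cases, 98 controls) and the thresholds come from the abstract. The function names are hypothetical, and the p-value is passed in rather than computed.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 table of carrier status by case/control status."""
    denominator = case_noncarriers * control_carriers
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_carriers * control_noncarriers) / denominator


def is_flagged(or_value, p_value):
    """Apply the abstract's rule: p < 0.05, or an extreme risk estimate."""
    return p_value < 0.05 or or_value < 0.5 or or_value > 2.0


# Hypothetical counts for one SNP: 12 of 22 cases and 30 of 98 controls carry the allele.
or_value = odds_ratio(12, 10, 30, 68)
print(round(or_value, 2))
print(is_flagged(or_value, p_value=0.2))  # flagged by the OR threshold alone
```

With only 22 cases, an extreme odds ratio can arise by chance, which is consistent with the abstract's description of the findings as preliminary and needing replication in larger datasets.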