340 results for datasets


Relevance: 10.00%

Abstract:

The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large datasets; in addition, the factorized matrices produced by the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.

Relevance: 10.00%

Abstract:

Information overload has become a serious issue for web users. Personalisation can provide effective solutions to overcome this problem. Recommender systems are one popular personalisation tool to help users deal with this issue. As the base of personalisation, the accuracy and efficiency of web user profiling greatly affect the performance of recommender systems and other personalisation systems. In Web 2.0, the emerging user information provides new possible solutions to profile users. Folksonomy or tag information is a kind of typical Web 2.0 information. Folksonomy implies the users’ topic interests and opinion information. It becomes another source of important user information to profile users and to make recommendations. However, since tags are arbitrary words given by users, folksonomy contains a lot of noise such as tag synonyms, semantic ambiguities and personal tags. Such noise makes it difficult to profile users accurately or to make quality recommendations. This thesis investigates the distinctive features and multiple relationships of folksonomy and explores novel approaches to solve the tag quality problem and profile users accurately. Harvesting the wisdom of crowds and experts, three new user profiling approaches are proposed: a folksonomy-based user profiling approach, a taxonomy-based user profiling approach, and a hybrid user profiling approach based on folksonomy and taxonomy. The proposed user profiling approaches are applied to recommender systems to improve their performance. Based on the generated user profiles, the user- and item-based collaborative filtering approaches, combined with the content filtering methods, are proposed to make recommendations. The proposed new user profiling and recommendation approaches have been evaluated through extensive experiments. The effectiveness evaluation experiments were conducted on two real-world datasets collected from the Amazon.com and CiteULike websites. 
The experimental results demonstrate that the proposed user profiling and recommendation approaches outperform related state-of-the-art approaches. In addition, this thesis proposes a parallel, scalable user profiling implementation approach based on advanced cloud computing techniques such as Hadoop, MapReduce and Cascading. The scalability evaluation experiments were conducted on a large-scale dataset collected from the Del.icio.us website. This thesis contributes to the effective use of the wisdom of crowds and experts to help users overcome information overload by providing more accurate, effective and efficient user profiling and recommendation approaches. It also contributes to better use of the taxonomy information given by experts and the folksonomy information contributed by users in Web 2.0.

Relevance: 10.00%

Abstract:

Social tags are an important information source in Web 2.0. They can be used to describe users’ topic preferences as well as the content of items to make personalized recommendations. However, since tags are arbitrary words given by users, they contain a lot of noise such as tag synonyms, semantic ambiguities and personal tags. Such noise makes it difficult to improve the accuracy of item recommendations. To eliminate the noise of tags, in this paper we propose to use the multiple relationships among users, items and tags to find the semantic meaning of each tag for each user individually. With the proposed approach, the relevant tags of each item and the tag preferences of each user are determined. In addition, user- and item-based collaborative filtering combined with content filtering is explored. The effectiveness of the proposed approaches is demonstrated in experiments conducted on real-world datasets collected from the Amazon.com and CiteULike websites.
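To make the idea of tag-based user profiles feeding collaborative filtering more concrete, the following is a minimal sketch and not the method of the paper. It assumes each user profile is a plain mapping from tag to weight, compares users by cosine similarity, and recommends items held by the nearest neighbours. The function names, the toy users and the toy items are all hypothetical, and the noise-elimination step described in the abstract is not modelled.

```python
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target: str, profiles: dict, items: dict, k: int = 1) -> list[str]:
    """Rank items the target has not seen by the summed similarity
    of the k most similar users who hold them (user-based CF)."""
    neighbours = sorted(
        ((cosine(profiles[target], p), u) for u, p in profiles.items() if u != target),
        reverse=True,
    )[:k]
    scores: dict[str, float] = {}
    for sim, user in neighbours:
        for item in items[user] - items[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))


# Hypothetical toy data: tag weights per user and the items each user holds.
profiles = {
    "alice": {"python": 3, "ml": 2},
    "bob": {"python": 2, "ml": 3, "stats": 1},
    "carol": {"cooking": 4, "travel": 1},
}
items = {
    "alice": {"book_a"},
    "bob": {"book_a", "book_b"},
    "carol": {"book_c"},
}

print(recommend("alice", profiles, items))
```

With this toy data, bob is the only user who shares tags with alice, so the single unseen item he holds is the one recommended. An item-based variant would swap the roles of users and items in the same similarity computation.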

Relevance: 10.00%

Abstract:

Item folksonomy or tag information is a kind of typical and prevalent Web 2.0 information. Item folksonomy contains rich opinion information of users on item classifications and descriptions. It can be used as another important information source to conduct opinion mining. On the other hand, each item is associated with taxonomy information that reflects the viewpoints of experts. In this paper, we propose to mine users’ opinions on items based on the item taxonomy developed by experts and the folksonomy contributed by users. In addition, we explore how to make personalized item recommendations based on users’ opinions. The experiments conducted on real-world datasets collected from Amazon.com and CiteULike demonstrated the effectiveness of the proposed approaches.

Relevance: 10.00%

Abstract:

Distributed Denial-of-Service (DDoS) attacks continue to be one of the most pernicious threats to the delivery of services over the Internet. Not only are DDoS attacks present in many guises, they are also continuously evolving as new vulnerabilities are exploited. Hence accurate detection of these attacks still remains a challenging problem and a necessity for ensuring high-end network security. An intrinsic challenge in addressing this problem is to effectively distinguish these Denial-of-Service attacks from similar looking Flash Events (FEs) created by legitimate clients. A considerable overlap between the general characteristics of FEs and DDoS attacks makes it difficult to precisely separate these two classes of Internet activity. In this paper we propose parameters which can be used to explicitly distinguish FEs from DDoS attacks and analyse two real-world publicly available datasets to validate our proposal. Our analysis shows that even though FEs appear very similar to DDoS attacks, there are several subtle dissimilarities which can be exploited to separate these two classes of events.

Relevance: 10.00%

Abstract:

Unusual event detection in crowded scenes remains challenging because of the diversity of events and noise. In this paper, we present a novel approach for unusual event detection via sparse reconstruction of dynamic textures over an overcomplete basis set, with the dynamic texture described by local binary patterns from three orthogonal planes (LBPTOP). The overcomplete basis set is learnt from training data in which only normal items are observed. In the detection process, given a new observation, we compute the sparse coefficients using the Dantzig Selector algorithm, which was proposed in the compressed sensing literature. The reconstruction errors are then computed, and abnormal items are detected based on these errors. Our application can be used to detect both local and global abnormal events. We evaluate our algorithm on the UCSD Abnormality Datasets for local anomaly detection, where it is shown to outperform current state-of-the-art approaches, and we also obtain promising results for rapid escape detection using the PETS2009 dataset.

Relevance: 10.00%

Abstract:

This paper describes a scene invariant crowd counting algorithm that uses local features to monitor crowd size. Unlike previous algorithms that require each camera to be trained separately, the proposed method uses camera calibration to scale between viewpoints, allowing a system to be trained and tested on different scenes. A pre-trained system could therefore be used as a turn-key solution for crowd counting across a wide range of environments. The use of local features allows the proposed algorithm to calculate local occupancy statistics, and Gaussian process regression is used to scale to conditions which are unseen in the training data, also providing confidence intervals for the crowd size estimate. A new crowd counting database is introduced to the computer vision community to enable a wider evaluation over multiple scenes, and the proposed algorithm is tested on seven datasets to demonstrate scene invariance and high accuracy. To the authors' knowledge this is the first system of its kind due to its ability to scale between different scenes and viewpoints.

Relevance: 10.00%

Abstract:

Micro aerial vehicles (MAVs) are a rapidly growing area of research and development in robotics. For autonomous robot operations, localization has typically been calculated using GPS, external camera arrays, or onboard range or vision sensing. In cluttered indoor or outdoor environments, onboard sensing is the only viable option. In this paper we present an appearance-based approach to visual SLAM on a flying MAV using only low quality vision. Our approach consists of a visual place recognition algorithm that operates on 1000 pixel images, a lightweight visual odometry algorithm, and a visual expectation algorithm that improves the recall of place sequences and the precision with which they are recalled as the robot flies along a similar path. Using data gathered from outdoor datasets, we show that the system is able to perform visual recognition with low quality, intermittent visual sensory data. By combining the visual algorithms with the RatSLAM system, we also demonstrate how the algorithms enable successful SLAM.

Relevance: 10.00%

Abstract:

Virtual methods to assess the fitting of a fracture fixation plate were proposed recently, but with limitations such as simplified fit criteria or manual data processing. This study aims to automate a fit analysis procedure using clinical-based criteria, and then to analyse the results further for borderline fit cases. Three-dimensional (3D) models of 45 bones and of a precontoured distal tibial plate were utilized to assess the fitting of the plate automatically. A Matlab program was developed to automatically measure the shortest distance between the bone and the plate at three regions of interest and a plate-bone angle. The measured values, including the fit assessment results, were recorded in a spreadsheet as part of the batch-process routine. An automated fit analysis procedure will enable the processing of larger bone datasets in a significantly shorter time, which will provide more representative data of the target population for plate shape design and validation. As a result, better fitting plates can be manufactured and made available to surgeons, thereby reducing the risk and cost associated with complications or corrective procedures. This, in turn, is expected to improve patients' quality of life.
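The distance measurement described in this abstract can be illustrated with a small sketch. The study used a Matlab program whose details are not given here, so the following Python version is only an assumption-laden illustration: surfaces are treated as lists of 3D points, the shortest distance is found by brute force, and a fit is accepted when every region's gap is within a tolerance. The region names, the sample coordinates and the 2.0 mm tolerance are hypothetical, and the plate-bone angle is not computed.

```python
from math import dist, inf

Point = tuple[float, float, float]


def shortest_distance(bone: list[Point], plate: list[Point]) -> float:
    """Brute-force minimum Euclidean distance between two 3D point sets."""
    return min((dist(b, p) for b in bone for p in plate), default=inf)


def assess_fit(bone_regions: dict, plate_regions: dict, max_gap_mm: float):
    """Return the gap per region of interest and whether all gaps
    are within the (hypothetical) tolerance."""
    gaps = {
        name: shortest_distance(bone_regions[name], plate_regions[name])
        for name in plate_regions
    }
    return gaps, all(gap <= max_gap_mm for gap in gaps.values())


# Hypothetical sample points (in mm) for three regions of interest.
bone = {
    "proximal": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "middle": [(0.0, 5.0, 0.0)],
    "distal": [(0.0, 10.0, 0.0)],
}
plate = {
    "proximal": [(0.0, 0.0, 1.5)],
    "middle": [(0.0, 5.0, 3.0)],
    "distal": [(0.0, 10.0, 0.5)],
}

gaps, fits = assess_fit(bone, plate, max_gap_mm=2.0)
print(gaps, fits)
```

Brute force is quadratic in the number of points; for dense surface meshes a spatial index such as a k-d tree would be the more usual choice.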

Relevance: 10.00%

Abstract:

Soluble organic matter derived from exotic Pinus vegetation forms stronger complexes with iron (Fe) than the soluble organic matter derived from most native Australian species. This has led to concern about the environmental impacts related to the establishment of extensive exotic Pinus plantations in coastal southeast Queensland, Australia. It has been suggested that the Pinus plantations may enhance the solubility of Fe in soils by increasing the amount of organically complexed Fe. While this remains inconclusive, the environmental impacts of an increased flux of dissolved, organically complexed Fe from soils to the fluvial system and then to sensitive coastal ecosystems are potentially damaging. Previous work investigated a small number of samples, was largely laboratory based and had limited application to field conditions. These assessments lacked field-based studies, including the comparison of the soil water chemistry of sites associated with Pinus vegetation and undisturbed native vegetation. In addition, the main controls on the distribution and mobilisation of Fe in soils of this subtropical coastal region have not been determined. This information is required in order to better understand the relative significance of any Pinus-enhanced solubility of Fe. The main aim of this thesis is to determine the controls on Fe distribution and mobilisation in soils and soil waters of a representative coastal catchment in southeast Queensland (Poona Creek catchment, Fraser Coast) and to test the effect of Pinus vegetation on the solubility and speciation of Fe. The thesis is structured around three individual papers. The first paper identifies the main processes responsible for the distribution and mobilisation of labile Fe in the study area and takes a catchment scale approach. 
Physicochemical attributes of 120 soil samples distributed throughout the catchment are analysed, and a new multivariate data analysis approach (Kohonen’s self organising maps) is used to identify the conditions associated with high labile Fe. The second paper establishes whether Fe nodules play a major role as an iron source in the catchment, by determining the genetic mechanism responsible for their formation. The nodules are a major pool of Fe in much of the region and previous studies have implied that they may be involved in redox-controlled mobilisation and redistribution of Fe. This is achieved by combining a detailed study of a ferric soil profile (morphology, mineralogy and micromorphology) with the distribution of Fe nodules on a catchment scale. The third component of the thesis tests whether the concentration and speciation of Fe in soil solutions from Pinus plantations differ significantly from those in native vegetation soil solutions. Microlysimeters are employed to collect unaltered, in situ soil water samples. The redox speciation of Fe is determined spectrophotometrically and the interaction between Fe and dissolved organic matter (DOM) is modelled with the Stockholm Humic Model. The thesis provides a better understanding of the controls on the distribution, concentration and speciation of Fe in the soils and soil waters of southeast Queensland. Reductive dissolution is the main mechanism by which mobilisation of Fe occurs in the study area. Labile Fe concentrations are low overall, particularly in the sandy soils of the coastal plain. However, high labile Fe is common in seasonally waterlogged and clay-rich soils which are exposed to fluctuating redox conditions and in organic-rich soils adjacent to streams. Clay-rich soils are most common in the upper parts of the catchment. Fe nodules were shown to have a negligible role in the redistribution of dissolved iron in the catchment. 
They are formed by the erosion, colluvial transport and chemical weathering of iron-rich sandstones. The ferric horizons, in which nodules are commonly concentrated, subsequently form through differential biological mixing of the soil. Whereas dissolution/reprecipitation of the Fe cements is an important component of nodule formation, mobilised Fe reprecipitates locally. Dissolved Fe in the soil waters is almost entirely in the ferrous form. Vegetation type does not affect the concentration and speciation of Fe in soil waters, although Pinus DOM has greater acidic functional group site densities than DOM from native vegetation. Iron concentrations are highest in the high DOM soil waters collected from sandy podosols, where they are controlled by redox potential. Iron concentrations are low in soil solutions from clay and iron oxide rich soils, in spite of similar redox potentials. This is related to stronger sorption to the reactive clay and iron oxide mineral surfaces in these soils, which reduces the amount of DOM available for microbial metabolisation and reductive dissolution of Fe. Modelling suggests that Pinus DOM can significantly increase the amount of truly dissolved ferric iron remaining in solution in oxidising conditions. Thus, inputs of ferrous iron together with Pinus DOM to surface waters may reduce precipitation of hydrous ferric oxides and increase the flux of dissolved iron out of the catchment. Such inputs are most likely from the lower catchment, where podosols planted with Pinus are most widely distributed. Significant outcomes other than the main aims were also achieved. It is shown that mobilisation of Fe in podosols can occur as dissolved Fe(II) rather than as Fe(III)-organic complexes. This has implications for the large body of work which assumes that Fe(II) plays a minor role. 
Also, the first paper demonstrates that a data analysis approach based on Kohonen’s self organising maps can facilitate the interpretation of complex datasets and can help identify geochemical processes operating on a catchment scale.

Relevance: 10.00%

Abstract:

The research objectives of this thesis were to contribute to Bayesian statistical methodology, specifically to risk assessment statistical methodology and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four-dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four-dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four-dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. 
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals. 
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days’ data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first-stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.

Relevance: 10.00%

Abstract:

This paper proposes an innovative instance similarity based evaluation metric that reduces the search map for clustering to be performed. An aggregate global score is calculated for each instance using the Fibonacci series. The use of Fibonacci numbers separates the instances effectively and, hence, the intra-cluster similarity is increased and the inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm is able to handle datasets with numerical, categorical and a mix of both types of attributes. Results obtained with FIBCLUS are compared with the results of existing algorithms such as k-means, x-means, expectation maximization and hierarchical algorithms that are widely used to cluster numeric, categorical and mixed data types. Empirical analysis shows that FIBCLUS is able to produce better clustering solutions in terms of entropy, purity and F-score in comparison to the above described existing algorithms.
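Two of the evaluation measures named in this abstract, purity and entropy, can be sketched briefly. The exact formulas used in the paper are not given here, so the following assumes the common definitions: purity is the fraction of instances that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy in bits, without normalisation. The cluster assignments and class labels are hypothetical toy data, and FIBCLUS itself is not implemented.

```python
from collections import Counter
from math import log2


def class_counts(clusters: list, labels: list) -> dict:
    """Count the true class labels that fall into each cluster."""
    counts: dict = {}
    for cluster, label in zip(clusters, labels):
        counts.setdefault(cluster, Counter())[label] += 1
    return counts


def purity(clusters: list, labels: list) -> float:
    """Fraction of instances in the majority class of their cluster (higher is better)."""
    counts = class_counts(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(clusters: list, labels: list) -> float:
    """Size-weighted mean of per-cluster class entropy in bits (lower is better)."""
    total = len(labels)
    result = 0.0
    for c in class_counts(clusters, labels).values():
        size = sum(c.values())
        cluster_entropy = -sum((n / size) * log2(n / size) for n in c.values())
        result += (size / total) * cluster_entropy
    return result


# Hypothetical toy clustering: six instances, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]

print(purity(clusters, labels), entropy(clusters, labels))
```

Here cluster 1 is pure and contributes no entropy, while cluster 0 mixes two classes, so both measures are driven by that one cluster.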

Relevance: 10.00%

Abstract:

With the growth of the Web, E-commerce activities are also becoming popular. Product recommendation is an effective way of marketing a product to potential customers. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items. Such items are then recommended to a user. Furthermore, too many irrelevant recommendations worsen the information overload problem for a user. This happens because such models, based on vectors and matrices, are unable to find the latent relationships that exist between users and searches. Identifying user behaviour is a complex process, and usually involves comparing the searches made by the user. In most cases, traditional vector and matrix based methods are used to find prominent features as searched by a user. In this research we employ tensors to find relevant features as searched by users. Such relevant features are then used for making recommendations. Evaluation on real datasets shows the effectiveness of such recommendations over vector and matrix based methods.

Relevance: 10.00%

Abstract:

Search log data is multi-dimensional data consisting of a number of searches by multiple users with many searched parameters. This data can be used to identify a user’s interest in an item or object being searched. Identifying the highest interests of a Web user from search log data is a complex process. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items. Such items are then recommended to a user. Two-dimensional data models, when used to mine knowledge from such multi-dimensional data, may not be able to give good mappings of users and their searches. The major problem with such models is that they are unable to find the latent relationships that exist between different searched dimensions. In this research work, we utilize tensors to model the various searches made by a user. This high-dimensional data model is then used to extract the relationships between the various dimensions, and to find the prominent searched components. To achieve this, we have used popular tensor decomposition methods such as PARAFAC, Tucker and HOSVD. All experiments and evaluations are conducted on real datasets, which clearly show the effectiveness of tensor models in finding prominent searched components in comparison to other widely used two-dimensional data models. Such top-rated searched components are then given as recommendations to users.

Relevance: 10.00%

Abstract:

Light plays a unique role for plants as it is both a source of energy for growth and a signal for development. Light captured by the pigments in the light harvesting complexes is used to drive the synthesis of the chemical energy required for carbon assimilation. The light perceived by photoreceptors activates effectors, such as transcription factors (TFs), which modulate the expression of light-responsive genes. Recently, it has been speculated that increasing the photosynthetic rate could further improve the yield potential of three carbon (C3) crops such as wheat. However, little is currently known about the transcriptional regulation of photosynthesis genes, particularly in crop species. Nuclear factor Y (NF-Y) TF is a functionally diverse regulator of growth and development in the model plant species, with demonstrated roles in embryo development, stress response, flowering time and chloroplast biogenesis. Furthermore, a light-responsive NF-Y binding site (CCAAT-box) is present in the promoter of a spinach photosynthesis gene. As photosynthesis genes are co-regulated by light and co-regulated genes typically have similar regulatory elements in their promoters, it seems likely that other photosynthesis genes would also have light-responsive CCAAT-boxes. This provided the impetus to investigate the NF-Y TF in bread wheat. This thesis is focussed on wheat NF-Y members that have roles in light-mediated gene regulation with an emphasis on their involvement in the regulation of photosynthesis genes. NF-Y is a heterotrimeric complex, comprised of the three subunits NF-YA, NF-YB and NF-YC. Unlike the mammalian and yeast counterparts, each of the three subunits is encoded by multiple genes in Arabidopsis. The initial step taken in this study was the identification of the wheat NF-Y family (Chapter 3). A search of the current wheat nucleotide sequence databases identified 37 NF-Y genes (10 NF-YA, 11 NF-YB, 14 NF-YC & 2 Dr1). 
Phylogenetic analysis revealed that each of the three wheat NF-Y (TaNF-Y) subunit families could be divided into 4-5 clades based on their conserved core regions. Outside of the core regions, eleven motifs were identified to be conserved between Arabidopsis, rice and wheat NF-Y subunit members. The expression profiles of TaNF-Y genes were constructed using quantitative real-time polymerase chain reaction (RT-PCR). Some TaNF-Y subunit members had little variation in their transcript levels among the organs, while others displayed organ-predominant expression profiles, including those expressed mainly in the photosynthetic organs. To investigate their potential role in light-mediated gene regulation, the light responsiveness of the TaNF-Y genes was examined (Chapters 4 and 5). Two TaNF-YB and five TaNF-YC members were markedly upregulated by light in both the wheat leaves and seedling shoots. To identify the potential target genes of the light-upregulated NF-Y subunit members, a gene expression correlation analysis was conducted using publicly available Affymetrix Wheat Genome Array datasets. This analysis revealed that the transcript expression levels of TaNF-YB3 and TaNF-YC11 were significantly correlated with those of photosynthesis genes. These correlated expression profiles were also observed in the quantitative RT-PCR dataset from wheat plants grown under light and dark conditions. Sequence analysis of the promoters of these wheat photosynthesis genes revealed that they were enriched with potential NF-Y binding sites (CCAAT-box). The potential role of TaNF-YB3 in the regulation of photosynthetic genes was further investigated using a transgenic approach (Chapter 5). 
Transgenic wheat lines constitutively expressing TaNF-YB3 were found to have significantly increased expression levels of photosynthesis genes, including those encoding light harvesting chlorophyll a/b-binding proteins, photosystem I reaction centre subunits, a chloroplast ATP synthase subunit and glutamyl-tRNA reductase (GluTR). GluTR is a rate-limiting enzyme in the chlorophyll biosynthesis pathway. In association with the increased expression of the photosynthesis genes, the transgenic lines had a higher leaf chlorophyll content, an increased photosynthetic rate and a more rapid early growth rate compared to the wild-type wheat. In addition to its role in the regulation of photosynthesis genes, TaNF-YB3 overexpression lines flower on average 2 days earlier than the wild-type (Chapter 6). Quantitative RT-PCR analysis showed that there was a 13-fold increase in the expression level of the floral integrator, TaFT. The transcript levels of other downstream genes (TaFT2 and TaVRN1) were also increased in the transgenic lines. Furthermore, the transcript levels of TaNF-YB3 were significantly correlated with those of constans (CO), constans-like (COL) and timing of chlorophyll a/b-binding (CAB) expression 1 [TOC1; (CCT)] domain-containing proteins known to be involved in the regulation of flowering time. To summarise the key findings of this study, 37 NF-Y genes were identified in the crop species wheat. An in-depth analysis of TaNF-Y gene expression profiles revealed that the potential role of some light-upregulated members was in the regulation of photosynthetic genes. The involvement of TaNF-YB3 in the regulation of photosynthesis genes was supported by data obtained from transgenic wheat lines with increased constitutive expression of TaNF-YB3. The overexpression of TaNF-YB3 in the transgenic lines revealed this NF-YB member is also involved in the fine-tuning of flowering time. 
These data suggest that the NF-Y TF plays an important role in light-mediated gene regulation in wheat.
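The gene expression correlation analysis mentioned in this abstract can be illustrated with a short sketch. The abstract does not state which correlation statistic or significance test was used, so the following assumes a plain Pearson correlation between the expression level of one transcription factor and each candidate gene across samples. The gene names and expression values are hypothetical toy data, not values from the thesis, and no significance testing is performed.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical expression levels across four samples.
tf_expression = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "gene_up": [2.1, 3.9, 6.2, 8.0],    # rises with the transcription factor
    "gene_flat": [5.0, 5.1, 4.9, 5.0],  # roughly constant
}

# Rank candidate genes by the strength of their correlation.
ranked = sorted(
    ((pearson(tf_expression, values), name) for name, values in candidates.items()),
    key=lambda pair: -abs(pair[0]),
)
for r, name in ranked:
    print(f"{name}: r = {r:.3f}")
```

In a real analysis over array datasets, the correlation would be computed across many samples and corrected for multiple testing before any gene is called a potential target.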