Abstract:
Distributed Denial-of-Service (DDoS) attacks continue to be one of the most pernicious threats to the delivery of services over the Internet. Not only are DDoS attacks present in many guises, they are also continuously evolving as new vulnerabilities are exploited. Hence, accurate detection of these attacks remains a challenging problem and a necessity for ensuring high-end network security. An intrinsic challenge in addressing this problem is to effectively distinguish these Denial-of-Service attacks from similar-looking Flash Events (FEs) created by legitimate clients. A considerable overlap between the general characteristics of FEs and DDoS attacks makes it difficult to precisely separate these two classes of Internet activity. In this paper we propose parameters which can be used to explicitly distinguish FEs from DDoS attacks, and analyse two real-world publicly available datasets to validate our proposal. Our analysis shows that even though FEs appear very similar to DDoS attacks, there are several subtle dissimilarities which can be exploited to separate these two classes of events.
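The abstract does not name the proposed parameters, so the following is only an illustrative sketch, assuming two commonly discussed traffic features (the fraction of previously unseen source IPs per time window, and the mean request rate per source) as stand-ins:

```python
# Illustrative sketch only: the paper's actual discriminating parameters are
# not detailed in the abstract. Two commonly discussed traffic features are
# used here as stand-ins.
def traffic_features(events, window=60.0):
    """events: iterable of (timestamp_seconds, source_ip) tuples, time-ordered.
    Yields (window_start, new_source_fraction, requests_per_source)."""
    seen = set()
    win_start, win_sources, win_count = None, set(), 0
    for ts, ip in events:
        if win_start is None:
            win_start = ts
        if ts - win_start >= window:
            new = sum(1 for s in win_sources if s not in seen)
            yield (win_start, new / max(len(win_sources), 1),
                   win_count / max(len(win_sources), 1))
            seen |= win_sources
            win_start, win_sources, win_count = ts, set(), 0
        win_sources.add(ip)
        win_count += 1

# A flash event is typically dominated by genuinely new but well-behaved
# clients (moderate per-source rate); a DDoS attack often shows an abrupt
# jump in both new sources and per-source request rate.
```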
Abstract:
Unusual event detection in crowded scenes remains challenging because of the diversity of events and noise. In this paper, we present a novel approach for unusual event detection via sparse reconstruction of dynamic textures over an overcomplete basis set, with the dynamic texture described by local binary patterns from three orthogonal planes (LBP-TOP). The overcomplete basis set is learnt from training data in which only normal items are observed. In the detection process, given a new observation, we compute the sparse coefficients using the Dantzig Selector algorithm, which was proposed in the compressed sensing literature. The reconstruction errors are then computed, and abnormal items are detected on that basis. The approach can be used to detect both local and global abnormal events. We evaluate our algorithm on the UCSD Abnormality Datasets for local anomaly detection, where it outperforms current state-of-the-art approaches, and we also obtain promising results for rapid escape detection on the PETS2009 dataset.
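A minimal sketch of the detection-by-reconstruction-error idea, with scikit-learn's Lasso standing in for the Dantzig Selector used in the paper, and a toy dictionary in place of one learnt from normal LBP-TOP descriptors:

```python
# Minimal sketch of anomaly detection via sparse reconstruction error.
# Assumptions: `D` is an overcomplete basis learnt from normal LBP-TOP
# descriptors; the Lasso stands in for the paper's Dantzig Selector.
import numpy as np
from sklearn.linear_model import Lasso

def reconstruction_error(D, x, alpha=0.01):
    """D: (n_features, n_atoms) dictionary; x: (n_features,) descriptor."""
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(D, x)                      # sparse coefficients over the basis
    x_hat = D @ coder.coef_              # reconstruction from the sparse code
    return np.linalg.norm(x - x_hat)     # large error => likely abnormal

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))       # toy overcomplete dictionary
x = rng.standard_normal(64)              # toy test descriptor
is_abnormal = reconstruction_error(D, x) > 5.0   # threshold set on normal data
```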
Abstract:
This paper describes a scene invariant crowd counting algorithm that uses local features to monitor crowd size. Unlike previous algorithms that require each camera to be trained separately, the proposed method uses camera calibration to scale between viewpoints, allowing a system to be trained and tested on different scenes. A pre-trained system could therefore be used as a turn-key solution for crowd counting across a wide range of environments. The use of local features allows the proposed algorithm to calculate local occupancy statistics, and Gaussian process regression is used to scale to conditions which are unseen in the training data, also providing confidence intervals for the crowd size estimate. A new crowd counting database is introduced to the computer vision community to enable a wider evaluation over multiple scenes, and the proposed algorithm is tested on seven datasets to demonstrate scene invariance and high accuracy. To the authors' knowledge, this is the first system of its kind, due to its ability to scale between different scenes and viewpoints.
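A minimal sketch of the regression step, assuming local-feature statistics have already been extracted and scaled via camera calibration; scikit-learn's Gaussian process regressor provides the confidence intervals mentioned above:

```python
# Sketch: Gaussian process regression from (calibrated) local-feature
# statistics to crowd size, with confidence intervals. Feature extraction
# and camera-calibration scaling are assumed to have been done already.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_train = np.array([[12.0], [25.0], [40.0], [61.0]])  # e.g. scaled foreground area
y_train = np.array([5, 11, 18, 27])                   # annotated crowd counts

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

mean, std = gp.predict(np.array([[50.0]]), return_std=True)
print(f"estimated count: {mean[0]:.1f} +/- {1.96 * std[0]:.1f} (95% CI)")
```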
Abstract:
Micro aerial vehicles (MAVs) are a rapidly growing area of research and development in robotics. For autonomous robot operations, localization has typically been calculated using GPS, external camera arrays, or onboard range or vision sensing. In cluttered indoor or outdoor environments, onboard sensing is the only viable option. In this paper we present an appearance-based approach to visual SLAM on a flying MAV using only low-quality vision. Our approach consists of a visual place recognition algorithm that operates on 1000-pixel images, a lightweight visual odometry algorithm, and a visual expectation algorithm that improves the recall of place sequences, and the precision with which they are recalled, as the robot flies along a similar path. Using data gathered from outdoor datasets, we show that the system is able to perform visual recognition with low-quality, intermittent visual sensory data. By combining the visual algorithms with the RatSLAM system, we also demonstrate how the algorithms enable successful SLAM.
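A sketch of the kind of lightweight, low-resolution template matching described above; the image size, normalisation and threshold are illustrative assumptions, not the paper's values:

```python
# Sketch of lightweight appearance-based place matching on tiny images,
# in the spirit of the low-resolution matching described above.
import numpy as np

def match_template(image, templates, threshold=0.1):
    """image: flattened, intensity-normalised tiny image (~1000 pixels).
    Returns index of the best-matching stored template, or -1 (new place)."""
    if not templates:
        return -1
    errs = [np.mean(np.abs(image - t)) for t in templates]  # SAD per template
    best = int(np.argmin(errs))
    return best if errs[best] < threshold else -1

templates = []
frame = np.random.rand(1000)          # stand-in for a 40x25 downsampled frame
idx = match_template(frame, templates)
if idx < 0:
    templates.append(frame)           # learn the current view as a new place
```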
Abstract:
Virtual methods to assess the fit of a fracture fixation plate have recently been proposed, albeit with limitations such as simplified fit criteria or manual data processing. This study aims to automate a fit analysis procedure using clinically based criteria, and then to analyse the results further for borderline fit cases. Three-dimensional (3D) models of 45 bones and of a precontoured distal tibial plate were utilized to assess the fit of the plate automatically. A Matlab program was developed to automatically measure the shortest distance between the bone and the plate at three regions of interest, as well as a plate-bone angle. The measured values, including the fit assessment results, were recorded in a spreadsheet as part of the batch-process routine. An automated fit analysis procedure will enable the processing of larger bone datasets in a significantly shorter time, which will provide more representative data of the target population for plate shape design and validation. As a result, better fitting plates can be manufactured and made available to surgeons, thereby reducing the risk and cost associated with complications or corrective procedures. This, in turn, is expected to translate into improved quality of life for patients.
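A hedged sketch of the core measurement in Python (the study itself used Matlab); the 2 mm threshold below is a hypothetical stand-in for the clinical fit criteria:

```python
# Sketch of the core fit measurement: shortest distance between plate and
# bone surface points within a region of interest. The 2 mm fit threshold
# is a hypothetical stand-in for the clinical criteria, not the study's value.
import numpy as np
from scipy.spatial import cKDTree

def min_plate_bone_distance(bone_pts, plate_pts):
    """bone_pts, plate_pts: (N, 3) arrays of surface points (mm)."""
    tree = cKDTree(bone_pts)
    d, _ = tree.query(plate_pts)      # nearest bone point for each plate point
    return d.min()

bone = np.random.rand(5000, 3) * 100   # toy surfaces; real ones come from 3D models
plate = bone[:200] + np.array([0.0, 0.0, 1.5])
fits = min_plate_bone_distance(bone, plate) <= 2.0   # hypothetical criterion
```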
Abstract:
Soluble organic matter derived from exotic Pinus vegetation forms stronger complexes with iron (Fe) than the soluble organic matter derived from most native Australian species. This has led to concern about the environmental impacts related to the establishment of extensive exotic Pinus plantations in coastal southeast Queensland, Australia. It has been suggested that the Pinus plantations may enhance the solubility of Fe in soils by increasing the amount of organically complexed Fe. While this remains inconclusive, the environmental impacts of an increased flux of dissolved, organically complexed Fe from soils to the fluvial system and then to sensitive coastal ecosystems are potentially damaging. Previous work investigated a small number of samples, was largely laboratory-based and had limited application to field conditions. These assessments lacked field-based studies, including the comparison of the soil water chemistry of sites associated with Pinus vegetation and undisturbed native vegetation. In addition, the main controls on the distribution and mobilisation of Fe in soils of this subtropical coastal region have not been determined. This information is required in order to better understand the relative significance of any Pinus-enhanced solubility of Fe. The main aim of this thesis is to determine the controls on Fe distribution and mobilisation in soils and soil waters of a representative coastal catchment in southeast Queensland (Poona Creek catchment, Fraser Coast) and to test the effect of Pinus vegetation on the solubility and speciation of Fe. The thesis is structured around three individual papers. The first paper identifies the main processes responsible for the distribution and mobilisation of labile Fe in the study area and takes a catchment-scale approach. Physicochemical attributes of 120 soil samples distributed throughout the catchment are analysed, and a new multivariate data analysis approach (Kohonen’s self-organising maps) is used to identify the conditions associated with high labile Fe. The second paper establishes whether Fe nodules play a major role as an iron source in the catchment, by determining the genetic mechanism responsible for their formation. The nodules are a major pool of Fe in much of the region and previous studies have implied that they may be involved in redox-controlled mobilisation and redistribution of Fe. This is achieved by combining a detailed study of a ferric soil profile (morphology, mineralogy and micromorphology) with the distribution of Fe nodules on a catchment scale. The third component of the thesis tests whether the concentration and speciation of Fe in soil solutions from Pinus plantations differ significantly from native vegetation soil solutions. Microlysimeters are employed to collect unaltered, in situ soil water samples. The redox speciation of Fe is determined spectrophotometrically and the interaction between Fe and dissolved organic matter (DOM) is modelled with the Stockholm Humic Model. The thesis provides a better understanding of the controls on the distribution, concentration and speciation of Fe in the soils and soil waters of southeast Queensland. Reductive dissolution is the main mechanism by which mobilisation of Fe occurs in the study area. Labile Fe concentrations are low overall, particularly in the sandy soils of the coastal plain.
However, high labile Fe is common in seasonally waterlogged and clay-rich soils which are exposed to fluctuating redox conditions, and in organic-rich soils adjacent to streams. Clay-rich soils are most common in the upper parts of the catchment. Fe nodules were shown to have a negligible role in the redistribution of dissolved iron in the catchment. They are formed by the erosion, colluvial transport and chemical weathering of iron-rich sandstones. The ferric horizons, in which nodules are commonly concentrated, subsequently form through differential biological mixing of the soil. While dissolution/reprecipitation of the Fe cements is an important component of nodule formation, mobilised Fe reprecipitates locally. Dissolved Fe in the soil waters is almost entirely in the ferrous form. Vegetation type does not affect the concentration and speciation of Fe in soil waters, although Pinus DOM has greater acidic functional group site densities than DOM from native vegetation. Iron concentrations are highest in the high-DOM soil waters collected from sandy podosols, where they are controlled by redox potential. Iron concentrations are low in soil solutions from clay- and iron oxide-rich soils, in spite of similar redox potentials. This is related to stronger sorption to the reactive clay and iron oxide mineral surfaces in these soils, which reduces the amount of DOM available for microbial metabolisation and reductive dissolution of Fe. Modelling suggests that Pinus DOM can significantly increase the amount of truly dissolved ferric iron remaining in solution in oxidising conditions. Thus, inputs of ferrous iron together with Pinus DOM to surface waters may reduce precipitation of hydrous ferric oxides and increase the flux of dissolved iron out of the catchment. Such inputs are most likely from the lower catchment, where podosols planted with Pinus are most widely distributed. Significant outcomes other than the main aims were also achieved. It is shown that mobilisation of Fe in podosols can occur as dissolved Fe(II) rather than as Fe(III)-organic complexes. This has implications for the large body of work which assumes that Fe(II) plays a minor role. Also, the first paper demonstrates that a data analysis approach based on Kohonen’s self-organising maps can facilitate the interpretation of complex datasets and can help identify geochemical processes operating on a catchment scale.
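A sketch of the self-organising-map step, using the third-party MiniSom package as a stand-in implementation (the thesis does not name one); the attribute columns are illustrative:

```python
# Sketch of the self-organising-map analysis step, using MiniSom as a
# stand-in implementation. Columns are illustrative soil attributes;
# labile Fe would be one of them.
import numpy as np
from minisom import MiniSom

data = np.random.rand(120, 6)                # 120 soil samples x 6 attributes
data = (data - data.mean(0)) / data.std(0)   # standardise before training

som = MiniSom(8, 8, input_len=6, sigma=1.5, learning_rate=0.5, random_seed=1)
som.train_random(data, num_iteration=5000)

# Samples mapping to the same node share similar multivariate conditions;
# inspecting nodes rich in high labile-Fe samples reveals associated factors.
nodes = [som.winner(x) for x in data]
```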
Abstract:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four-dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four-dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants, thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset in a way which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four-dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead, CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth.
Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model; the applied contribution was the analysis of moisture over depth and the estimation of the contrast of interest together with its credible intervals. These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days’ data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
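A toy illustration of block updating in a Gibbs sampler, drawing a whole coefficient vector in one multivariate normal step rather than term by term; this is a generic conjugate-normal sketch, not the CAR layered model itself:

```python
# Toy illustration of block updating: the full coefficient vector of a
# Bayesian linear model is drawn in one multivariate normal step. A generic
# sketch under conjugate assumptions, not the thesis's CAR layered model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1., -2., 0.5, 0., 3.]) + rng.standard_normal(n)

tau = 1.0                        # known error precision, for simplicity
prior_prec = 0.01 * np.eye(p)    # vague normal prior on the coefficients
samples = []
for _ in range(2000):
    # Block update: the full conditional of beta is multivariate normal
    Q = prior_prec + tau * X.T @ X            # posterior precision
    m = np.linalg.solve(Q, tau * X.T @ y)     # posterior mean
    beta = rng.multivariate_normal(m, np.linalg.inv(Q))
    samples.append(beta)

post_mean = np.mean(samples[500:], axis=0)    # discard burn-in
```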
Abstract:
This paper proposes an innovative instance-similarity-based evaluation metric that reduces the search space over which clustering is performed. An aggregate global score is calculated for each instance using the novel idea of the Fibonacci series. The use of Fibonacci numbers is able to separate the instances effectively; hence, the intra-cluster similarity is increased and the inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm is able to handle datasets with numerical, categorical and a mix of both types of attributes. Results obtained with FIBCLUS are compared with the results of existing algorithms such as k-means, x-means, expectation maximization and hierarchical algorithms that are widely used to cluster numeric, categorical and mixed data types. Empirical analysis shows that FIBCLUS is able to produce better clustering solutions in terms of entropy, purity and F-score in comparison to the above described existing algorithms.
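The abstract does not spell out how the Fibonacci-based score is computed, so the following is only a hypothetical sketch of weighting encoded attribute values by successive Fibonacci numbers to form an aggregate global score:

```python
# Hypothetical sketch of a Fibonacci-weighted aggregate score per instance;
# the abstract does not specify FIBCLUS's exact scoring, so this only
# illustrates the general idea of Fibonacci-based score separation.
def fibonacci(n):
    fibs = [1, 2]
    while len(fibs) < n:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:n]

def global_score(instance, encoders):
    """instance: list of attribute values (numeric or categorical).
    encoders: per-attribute dicts mapping categorical values to ranks."""
    weights = fibonacci(len(instance))
    total = 0.0
    for value, enc, w in zip(instance, encoders, weights):
        numeric = enc[value] if enc else float(value)  # encode categoricals
        total += w * numeric
    return total

rows = [[1.2, "red"], [1.3, "red"], [9.8, "blue"]]
encoders = [None, {"red": 1, "blue": 2}]
scores = [global_score(r, encoders) for r in rows]  # feed these to a clusterer
```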
Abstract:
With the growth of the Web, E-commerce activities are also becoming popular. Product recommendation is an effective way of marketing a product to potential customers. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items. Such items are then recommended to a user. Further, too many irrelevant recommendations worsen the information overload problem for a user. This happens because such models, based on vectors and matrices, are unable to find the latent relationships that exist between users and searches. Identifying user behaviour is a complex process, and usually involves comparing the searches made by a user. In most cases, traditional vector- and matrix-based methods are used to find the prominent features searched by a user. In this research we employ tensors to find relevant features as searched by users. Such relevant features are then used for making recommendations. Evaluation on real datasets shows the effectiveness of such recommendations over vector- and matrix-based methods.
Abstract:
Search log data is multi-dimensional data consisting of the searches of multiple users, each with many searched parameters. This data can be used to identify a user’s interest in an item or object being searched. Identifying the highest interests of a Web user from search log data is a complex process. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items. Such items are then recommended to a user. Two-dimensional data models, when used to mine knowledge from such multi-dimensional data, may not be able to give good mappings of a user and his or her searches. The major problem with such models is that they are unable to find the latent relationships that exist between different searched dimensions. In this research work, we utilize tensors to model the various searches made by a user. Such a high-dimensional data model is then used to extract the relationships between the various dimensions and to find the prominent searched components. To achieve this, we have used popular tensor decomposition methods such as PARAFAC, Tucker and HOSVD. All experiments and evaluation are done on real datasets, which clearly show the effectiveness of tensor models in finding prominent searched components in comparison to other widely used two-dimensional data models. Such top-rated searched components are then given as recommendations to users.
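A minimal sketch of the decomposition step, using the TensorLy library as a stand-in implementation (the library choice and the toy tensor are assumptions):

```python
# Sketch of extracting prominent searched components with a CP (PARAFAC)
# decomposition, using TensorLy as a stand-in implementation (the choice
# of library is an assumption, not from the original work).
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Toy users x queries x items search tensor (co-occurrence counts)
tensor = tl.tensor(np.random.poisson(1.0, size=(20, 15, 30)).astype(float))

weights, factors = parafac(tensor, rank=3)
user_f, query_f, item_f = factors          # one factor matrix per dimension

# Highest-loading items on the first latent component: candidate recommendations
top_items = np.argsort(item_f[:, 0])[::-1][:5]
```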
Abstract:
Light plays a unique role for plants as it is both a source of energy for growth and a signal for development. Light captured by the pigments in the light harvesting complexes is used to drive the synthesis of the chemical energy required for carbon assimilation. The light perceived by photoreceptors activates effectors, such as transcription factors (TFs), which modulate the expression of light-responsive genes. Recently, it has been speculated that increasing the photosynthetic rate could further improve the yield potential of three-carbon (C3) crops such as wheat. However, little is currently known about the transcriptional regulation of photosynthesis genes, particularly in crop species. The nuclear factor Y (NF-Y) TF is a functionally diverse regulator of growth and development in the model plant species, with demonstrated roles in embryo development, stress response, flowering time and chloroplast biogenesis. Furthermore, a light-responsive NF-Y binding site (CCAAT-box) is present in the promoter of a spinach photosynthesis gene. As photosynthesis genes are co-regulated by light, and co-regulated genes typically have similar regulatory elements in their promoters, it seems likely that other photosynthesis genes would also have light-responsive CCAAT-boxes. This provided the impetus to investigate the NF-Y TF in bread wheat. This thesis is focussed on wheat NF-Y members that have roles in light-mediated gene regulation, with an emphasis on their involvement in the regulation of photosynthesis genes. NF-Y is a heterotrimeric complex, comprised of the three subunits NF-YA, NF-YB and NF-YC. Unlike the mammalian and yeast counterparts, each of the three subunits is encoded by multiple genes in Arabidopsis. The initial step taken in this study was the identification of the wheat NF-Y family (Chapter 3). A search of the current wheat nucleotide sequence databases identified 37 NF-Y genes (10 NF-YA, 11 NF-YB, 14 NF-YC & 2 Dr1). Phylogenetic analysis revealed that each of the three wheat NF-Y (TaNF-Y) subunit families could be divided into 4-5 clades based on their conserved core regions. Outside of the core regions, eleven motifs were identified as conserved between Arabidopsis, rice and wheat NF-Y subunit members. The expression profiles of TaNF-Y genes were constructed using quantitative real-time polymerase chain reaction (RT-PCR). Some TaNF-Y subunit members had little variation in their transcript levels among the organs, while others displayed organ-predominant expression profiles, including those expressed mainly in the photosynthetic organs. To investigate their potential role in light-mediated gene regulation, the light responsiveness of the TaNF-Y genes was examined (Chapters 4 and 5). Two TaNF-YB and five TaNF-YC members were markedly upregulated by light in both the wheat leaves and seedling shoots. To identify the potential target genes of the light-upregulated NF-Y subunit members, a gene expression correlation analysis was conducted using publicly available Affymetrix Wheat Genome Array datasets. This analysis revealed that the transcript expression levels of TaNF-YB3 and TaNF-YC11 were significantly correlated with those of photosynthesis genes. These correlated expression profiles were also observed in the quantitative RT-PCR dataset from wheat plants grown under light and dark conditions. Sequence analysis of the promoters of these wheat photosynthesis genes revealed that they were enriched with potential NF-Y binding sites (CCAAT-box).
The potential role of TaNF-YB3 in the regulation of photosynthesis genes was further investigated using a transgenic approach (Chapter 5). Transgenic wheat lines constitutively expressing TaNF-YB3 were found to have significantly increased expression levels of photosynthesis genes, including those encoding light harvesting chlorophyll a/b-binding proteins, photosystem I reaction centre subunits, a chloroplast ATP synthase subunit and glutamyl-tRNA reductase (GluTR). GluTR is a rate-limiting enzyme in the chlorophyll biosynthesis pathway. In association with the increased expression of the photosynthesis genes, the transgenic lines had a higher leaf chlorophyll content, an increased photosynthetic rate and a more rapid early growth rate compared to the wild-type wheat. In addition to its role in the regulation of photosynthesis genes, TaNF-YB3 overexpression lines flowered on average two days earlier than the wild-type (Chapter 6). Quantitative RT-PCR analysis showed that there was a 13-fold increase in the expression level of the floral integrator, TaFT. The transcript levels of other downstream genes (TaFT2 and TaVRN1) were also increased in the transgenic lines. Furthermore, the transcript levels of TaNF-YB3 were significantly correlated with those of constans (CO), constans-like (COL) and timing of chlorophyll a/b-binding (CAB) expression 1 (TOC1) (CCT) domain-containing proteins known to be involved in the regulation of flowering time. To summarise the key findings of this study, 37 NF-Y genes were identified in the crop species wheat. An in-depth analysis of TaNF-Y gene expression profiles suggested that some light-upregulated members have a potential role in the regulation of photosynthesis genes. The involvement of TaNF-YB3 in the regulation of photosynthesis genes was supported by data obtained from transgenic wheat lines with increased constitutive expression of TaNF-YB3. The overexpression of TaNF-YB3 in the transgenic lines revealed that this NF-YB member is also involved in the fine-tuning of flowering time. These data suggest that the NF-Y TF plays an important role in light-mediated gene regulation in wheat.
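A sketch of the expression-correlation screen using pandas; the gene names, toy expression matrix and 0.8 cut-off are illustrative, not the study's values:

```python
# Sketch of the expression-correlation screen: correlate a candidate TF's
# expression profile with photosynthesis genes across array samples.
# Gene names, data and the 0.8 cut-off are illustrative only.
import numpy as np
import pandas as pd

samples = [f"array_{i}" for i in range(12)]
expr = pd.DataFrame(np.random.rand(4, 12),
                    index=["TaNF-YB3", "Lhcb1", "PsaD", "GluTR"],
                    columns=samples)     # toy normalised expression matrix

tf_profile = expr.loc["TaNF-YB3"]
correlations = expr.drop("TaNF-YB3").T.corrwith(tf_profile)  # Pearson r per gene
candidates = correlations[correlations > 0.8].index.tolist()
```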
Abstract:
Flow regime transition criteria are of practical importance for two-phase flow analyses at reduced-gravity conditions. Here, flow regime transition criteria which take the friction pressure loss effect into account were studied in detail. Criteria for reduced-gravity conditions were developed by extending an existing model, and comparison with various experimental datasets taken at microgravity conditions showed satisfactory agreement. Sample computations of the model were performed at various gravity conditions, namely 0.196, 1.62, 3.71, and 9.81 m/s², corresponding to microgravity and lunar, Martian and Earth surface gravity, respectively. It was found that the effect of gravity on bubbly-slug and slug-annular (churn) transitions in a two-phase flow system was more pronounced at low liquid flow conditions, whereas the gravity effect could be ignored at high mixture volumetric flux conditions. For the annular flow transitions due to flow reversal and the onset of droplet entrainment, a higher superficial gas velocity was obtained at higher gravity levels.
Abstract:
In this paper, we present a new algorithm for boosting visual template recall performance through a process of visual expectation. Visual expectation dynamically modifies the recognition thresholds of learnt visual templates based on recently matched templates, improving the recall of sequences of familiar places while keeping precision high, without any feedback from a mapping backend. We demonstrate the performance benefits of visual expectation using two 17 km datasets gathered in an outdoor environment at two times separated by three weeks. The visual expectation algorithm provides up to a 100% improvement in recall. We also combine the visual expectation algorithm with the RatSLAM SLAM system and show how the algorithm enables successful mapping.
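A sketch of the visual expectation idea: after a template matches, the thresholds of templates learnt shortly after it are relaxed, since they are likely next on a repeated route. The lookahead and relaxation factor are assumptions:

```python
# Sketch of visual expectation: after template i matches, the thresholds of
# templates learnt just after i are relaxed, as they are likely next on a
# repeated route. Lookahead and relaxation factor are illustrative values.
def expected_thresholds(base_thresholds, last_match, lookahead=3, relax=1.5):
    """base_thresholds: per-template match thresholds (higher = easier match).
    Returns thresholds with templates following `last_match` made easier."""
    out = list(base_thresholds)
    if last_match is not None:
        for j in range(last_match + 1, min(last_match + 1 + lookahead, len(out))):
            out[j] = out[j] * relax    # boost recall along the expected sequence
    return out

thresholds = [0.1] * 10
active = expected_thresholds(thresholds, last_match=4)  # templates 5-7 relaxed
```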
Abstract:
Handling information overload online is, from the user's point of view, a big challenge, especially when the number of websites is growing rapidly due to growth in e-commerce and other related activities. Personalization based on user needs is the key to solving the problem of information overload. Personalization methods help in identifying relevant information which may be liked by a user. User profiles and object profiles are the important elements of a personalization system. When creating user and object profiles, most existing methods adopt two-dimensional similarity methods based on vector or matrix models in order to find inter-user and inter-object similarity. Moreover, for recommending similar objects to users, personalization systems use users-users, items-items and users-items similarity measures. In most cases similarity measures such as Euclidean, Manhattan, cosine and many others based on vector or matrix methods are used to find the similarities. Web logs are high-dimensional datasets, consisting of multiple users and multiple searches, each with many attributes. Two-dimensional data analysis methods may often overlook latent relationships that exist between users and items. In contrast to other studies, this thesis utilises tensors, which are high-dimensional data models, to build user and object profiles and to find the inter-relationships between users-users and users-items. To create an improved personalized Web system, this thesis proposes to build three types of profiles: individual user, group user and object profiles, utilising the decomposition factors of tensor data models. A hybrid recommendation approach utilising group profiles (forming the basis of a collaborative filtering method) and object profiles (forming the basis of a content-based method) in conjunction with individual user profiles (forming the basis of a model-based approach) is proposed for making effective recommendations. A tensor-based clustering method is proposed that utilises the outcomes of popular tensor decomposition techniques such as PARAFAC, Tucker and HOSVD to group similar instances. An individual user profile, showing the user's highest interest, is represented by the top dimension values extracted from the component matrix obtained after tensor decomposition. A group profile, showing similar users and their highest interest, is built by clustering similar users based on tensor-decomposed values. A group profile is represented by the top association rules (containing various unique object combinations) that are derived from the searches made by the users of the cluster. An object profile is created to represent similar objects, clustered on the basis of the similarity of their features. Depending on the category of a user (known, anonymous or frequent visitor to the website), any of the profiles or their combinations is used for making personalized recommendations. A ranking algorithm is also proposed that utilizes the personalized information to order and rank the recommendations. The proposed methodology is evaluated on data collected from a real-life car website. Empirical analysis confirms the effectiveness of recommendations made by the proposed approach over other collaborative filtering and content-based recommendation approaches based on two-dimensional data analysis methods.
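A sketch of one step, forming group profiles by clustering users on their rows of a decomposed user-mode factor matrix; the use of k-means and the sizes are illustrative assumptions:

```python
# Sketch of forming group profiles: cluster users on their rows of the
# user-mode factor matrix from a tensor decomposition. K-means and the
# rank/cluster counts are illustrative assumptions, not the thesis's choices.
import numpy as np
from sklearn.cluster import KMeans

user_factors = np.random.rand(100, 3)   # stand-in for a decomposed user-mode matrix

groups = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(user_factors)

# Each cluster of similar users seeds one group profile; association rules
# mined from that cluster's searches then describe the group's interests.
members_of_group0 = np.where(groups == 0)[0]
```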
Abstract:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely finite mixtures, Dirichlet process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects of uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this means comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data concerns symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real-time neural activity in the brain.
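A minimal sketch of a Dirichlet process mixture fitted by variational inference with scikit-learn; this approximates the fully Bayesian treatment described above, and the data are toy values:

```python
# Sketch of a Dirichlet process mixture fit via variational inference, using
# scikit-learn's BayesianGaussianMixture. This approximates the fully
# Bayesian (MCMC) treatment described above; the data are toy values.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(10, 2, 80),    # e.g. mild symptom scores
                         rng.normal(30, 4, 40)])   # e.g. severe symptom scores

dpm = BayesianGaussianMixture(n_components=10,     # upper bound on clusters
                              weight_concentration_prior_type="dirichlet_process",
                              random_state=0)
dpm.fit(scores.reshape(-1, 1))

# Posterior responsibilities quantify uncertainty in cluster membership
membership_probs = dpm.predict_proba(scores.reshape(-1, 1))
effective_k = (dpm.weights_ > 0.01).sum()          # clusters actually used
```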