55 results for Web Mining, Data Mining, User Topic Model, Web User Profiles


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial ‘mashups’ to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and ‘correlation’ of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control over, or knowledge about, background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced but full implementation is left for further research.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

1. Pearson's correlation coefficient only tests whether the data fit a linear model. With large numbers of observations, quite small values of r become significant and the X variable may only account for a minute proportion of the variance in Y. Hence, the value of r squared should always be calculated and included in a discussion of the significance of r. 2. The use of r assumes that a bivariate normal distribution is present and this assumption should be examined prior to the study. If Pearson's r is not appropriate, then a non-parametric correlation coefficient such as Spearman's rs may be used. 3. A significant correlation should not be interpreted as indicating causation especially in observational studies in which there is a high probability that the two variables are correlated because of their mutual correlations with other variables. 4. In studies of measurement error, there are problems in using r as a test of reliability and the ‘intra-class correlation coefficient’ should be used as an alternative. A correlation test provides only limited information as to the relationship between two variables. Fitting a regression line to the data using the method known as ‘least square’ provides much more information and the methods of regression and their application in optometry will be discussed in the next article.
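Points 1 and 2 above can be illustrated with a minimal sketch in plain Python. It computes Pearson's r, r squared, and Spearman's rs for a small made-up dataset; the data and all function names are assumptions for illustration and do not come from the article.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's rs: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y = x**3 is perfectly monotonic but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(f"rs = {spearman_rs(x, y):.3f}")
```

Because the relationship is monotonic but curved, rs reaches 1 while r stays below 1, which is the situation in which a rank-based coefficient may be the better summary.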

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we present a novel method for emulating a stochastic, or random output, computer model and show its application to a complex rabies model. The method is evaluated both in terms of accuracy and computational efficiency on synthetic data and the rabies model. We address the issue of experimental design and provide empirical evidence on the effectiveness of utilizing replicate model evaluations compared to a space-filling design. We employ the Mahalanobis error measure to validate the heteroscedastic Gaussian process based emulator predictions for both the mean and (co)variance. The emulator allows efficient screening to identify important model inputs and better understanding of the complex behaviour of the rabies model.
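The Mahalanobis error mentioned above compares held-out model outputs with the emulator's predictive mean and covariance. The sketch below shows the generic quantity D² = (y − μ)ᵀ Σ⁻¹ (y − μ) in plain Python. The function names and the numbers are hypothetical, and the paper's own validation procedure may differ in detail.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_error(observed, pred_mean, pred_cov):
    """Squared Mahalanobis distance between observations and predictions."""
    diff = [o - p for o, p in zip(observed, pred_mean)]
    z = solve(pred_cov, diff)  # z = pred_cov^{-1} @ diff
    return sum(d * v for d, v in zip(diff, z))


# Hypothetical validation set of three points (values are made up).
observed = [1.2, 0.7, 2.1]
pred_mean = [1.0, 1.0, 2.0]
pred_cov = [
    [0.10, 0.02, 0.00],
    [0.02, 0.20, 0.01],
    [0.00, 0.01, 0.05],
]
print(mahalanobis_error(observed, pred_mean, pred_cov))
```

If the predictive distribution is Gaussian and well calibrated, the expected value of D² equals the number of validation points, so values far above or below that number suggest that the predicted mean or covariance is off.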

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper examines the problems in the definition of the General Non-Parametric Corporate Performance (GNCP) model and introduces a multiplicative linear programming model as an alternative model for corporate performance. We tested for, and verified, a statistically significant difference between the two models in an application to 27 UK industries using six performance ratios. Our new model is found to be a more robust performance model than the previous standard Data Envelopment Analysis (DEA) model.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words, based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicate words whose usage is more topically diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Resource Space Model is a kind of data model which can effectively and flexibly manage the digital resources in a cyber-physical system from multidimensional and hierarchical perspectives. This paper focuses on constructing a resource space automatically. We propose a framework that organizes a set of digital resources according to different semantic dimensions, combining human background knowledge in WordNet and Wikipedia. The construction process includes four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities and generating the resource space. An unsupervised statistical language topic model (i.e., Latent Dirichlet Allocation) is applied to extract candidate keywords of the facets. To better interpret the meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets and construct corresponding semantic graphs. Moreover, semantic communities are identified by the GN algorithm. After extracting candidate axes based on the Wikipedia concept hierarchy, the final axes of the resource space are sorted and picked out through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively. © 2013 Published by Elsevier Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

To compare the accuracy of different forecasting approaches an error measure is required. Many error measures have been proposed in the literature; however, in practice there are some situations where different measures yield different decisions on forecasting approach selection and there is no agreement on which approach should be used. Generally, forecasting measures represent ratios or percentages providing an overall image of how well fitted the forecasting technique is to the observations. This paper proposes a multiplicative Data Envelopment Analysis (DEA) model in order to rank several forecasting techniques. We demonstrate the proposed model by applying it to the set of yearly time series of the M3 competition. The usefulness of the proposed approach has been tested using the M3-competition, where five error measures have been applied and aggregated into a single DEA score.
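The disagreement between error measures described above is easy to reproduce. The sketch below uses made-up observations and two hypothetical forecasts, A and B, and compares them under the mean absolute error (MAE) and the mean absolute percentage error (MAPE). It does not implement the DEA aggregation proposed in the paper.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical series and forecasts, chosen so the two measures disagree.
actual = [10, 100, 1000]
forecast_a = [15, 100, 1000]   # misses the small value by 5
forecast_b = [10, 100, 1040]   # misses the large value by 40

for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, f"MAE = {mae(actual, forecast):.2f}", f"MAPE = {mape(actual, forecast):.2f}%")
```

MAE favours forecast A because its absolute miss is small, while MAPE favours forecast B because its miss is small relative to the value being forecast. Conflicts of this kind are what motivate combining several measures into one score.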

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A Bayesian procedure for the retrieval of wind vectors over the ocean using satellite borne scatterometers requires realistic prior near-surface wind field models over the oceans. We have implemented carefully chosen vector Gaussian Process models; however in some cases these models are too smooth to reproduce real atmospheric features, such as fronts. At the scale of the scatterometer observations, fronts appear as discontinuities in wind direction. Due to the nature of the retrieval problem a simple discontinuity model is not feasible, and hence we have developed a constrained discontinuity vector Gaussian Process model which ensures realistic fronts. We describe the generative model and show how to compute the data likelihood given the model. We show the results of inference using the model with Markov Chain Monte Carlo methods on both synthetic and real data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis is developed from a real life application of performance evaluation of small and medium-sized enterprises (SMEs) in Vietnam. The thesis presents two main methodological developments on the evaluation of the impact of dichotomous environment variables on technical efficiency. Taking into account the selection bias, the thesis proposes a revised frontier separation approach for the seminal Data Envelopment Analysis (DEA) model which was developed by Charnes, Cooper, and Rhodes (1981). The revised frontier separation approach is based on nearest neighbour propensity score matching, pairing treated SMEs with their counterfactuals on the propensity score. The thesis develops an order-m frontier conditioning on the propensity score from the conditional order-m approach proposed by Cazals, Florens, and Simar (2002) and advocated by Daraio and Simar (2005). By this development, the thesis allows the application of the conditional order-m approach with a dichotomous environment variable, taking into account the existence of the self-selection problem of impact evaluation. Monte Carlo style simulations have been built to examine the effectiveness of the aforementioned developments. The methodological developments of the thesis are applied in empirical studies to evaluate the impact of training programmes on the performance of food processing SMEs and the impact of exporting on the technical efficiency of textile and garment SMEs in Vietnam. The analysis shows that training programmes have no significant impact on the technical efficiency of food processing SMEs. Moreover, the analysis confirms the conclusion of the export literature that exporters are self-selected into the sector. The thesis finds no significant impact from exporting activities on the technical efficiency of textile and garment SMEs. However, large bias has been eliminated by the proposed approach.
The results of the empirical studies contribute to the understanding of the impact of different environmental variables on the performance of SMEs and help policy makers design appropriate policies to support the development of Vietnamese SMEs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Edge blur is an important perceptual cue, but how does the visual system encode the degree of blur at edges? Blur could be measured by the width of the luminance gradient profile, peak-trough separation in the 2nd derivative profile, or the ratio of 1st-to-3rd derivative magnitudes. In template models, the system would store a set of templates of different sizes and find which one best fits the 'signature' of the edge. The signature could be the luminance profile itself, or one of its spatial derivatives. I tested these possibilities in blur-matching experiments. In a 2AFC staircase procedure, observers adjusted the blur of Gaussian edges (30% contrast) to match the perceived blur of various non-Gaussian test edges. In experiment 1, test stimuli were mixtures of 2 Gaussian edges (e.g. 10 and 30 min of arc blur) at the same location, while in experiment 2, test stimuli were formed from a blurred edge sharpened to different extents by a compressive transformation. Predictions of the various models were tested against the blur-matching data, but only one model was strongly supported. This was the template model, in which the input signature is the 2nd derivative of the luminance profile, and the templates are applied to this signature at the zero-crossings. The templates are Gaussian derivative receptive fields that covary in width and length to form a self-similar set (i.e. same shape, different sizes). This naturally predicts that shorter edges should look sharper. As edge length gets shorter, responses of longer templates drop more than shorter ones, and so the response distribution shifts towards shorter (smaller) templates, signalling a sharper edge. The data confirmed this, including the scale-invariance implied by self-similarity, and a good fit was obtained from templates with a length-to-width ratio of about 1. The simultaneous analysis of edge blur and edge location may offer a new solution to the multiscale problem in edge detection.
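One of the candidate blur measures listed above, peak-trough separation in the 2nd derivative profile, can be checked numerically for a Gaussian edge. The sketch below assumes that blur is expressed as the standard deviation σ of the Gaussian whose integral forms the edge; under that assumption the 2nd derivative has its peak and trough at −σ and +σ, so the separation is 2σ. The function names and the grid search are illustrative only and are not taken from the study.

```python
import math


def second_derivative(x, sigma):
    """2nd derivative of a Gaussian-blurred edge (an integrated Gaussian)."""
    gaussian = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
    return -x / (sigma * sigma) * gaussian


def peak_trough_separation(sigma, half_width=5.0, steps=20001):
    """Locate the peak and trough on a fine grid spanning +/- half_width * sigma."""
    xs = [(-half_width + 2 * half_width * i / (steps - 1)) * sigma for i in range(steps)]
    values = [second_derivative(x, sigma) for x in xs]
    x_peak = xs[values.index(max(values))]
    x_trough = xs[values.index(min(values))]
    return x_trough - x_peak


# Blur values in min of arc, matching the example magnitudes in the abstract.
for sigma in (10.0, 30.0):
    print(sigma, peak_trough_separation(sigma))
```

The separation scales linearly with σ for Gaussian edges, which is why it is a plausible blur code. The experiments described above used non-Gaussian test edges to separate this measure from the alternatives.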

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Liquid-liquid extraction has long been known as a unit operation that plays an important role in industry. This process is well known for its complexity and sensitivity to operating conditions. This thesis presents an attempt to explore the dynamics and control of this process using a systematic approach and state of the art control system design techniques. The process was first studied experimentally under carefully selected operating conditions, which resemble the ranges employed in practice under stable and efficient conditions. Data were collected at steady state conditions using adequate sampling techniques for the dispersed and continuous phases, as well as during the transients of the column with the aid of a computer-based online data logging system and online concentration analysis. A stagewise single stage backflow model was improved to mimic the dynamic operation of the column. The developed model accounts for the variation in hydrodynamics, mass transfer, and physical properties throughout the length of the column. End effects were treated by the addition of stages at the column entrances. Two parameters were incorporated in the model, namely a mass transfer weight factor, to correct for the assumption of no mass transfer in the settling zones at each stage, and backmixing coefficients, to handle the axial dispersion phenomena encountered in the course of column operation. The parameters were estimated by minimizing the differences between the experimental and the model predicted concentration profiles at steady state conditions using a non-linear optimisation technique. The estimated values were then correlated as functions of operating parameters and were incorporated in the model equations. The model equations comprise a stiff differential-algebraic system. This system was solved using the GEAR ODE solver. The calculated concentration profiles were compared to those experimentally measured.
Very good agreement between the two profiles was achieved, within a percent relative error of ±2.5%. The developed rigorous dynamic model of the extraction column was used to derive linear time-invariant reduced-order models that relate the input variables (agitator speed, solvent feed flowrate and concentration, feed concentration and flowrate) to the output variables (raffinate concentration and extract concentration) using the asymptotic method of system identification. The reduced-order models were shown to be accurate in capturing the dynamic behaviour of the process, with a maximum modelling prediction error of 1%. The simplicity and accuracy of the derived reduced-order models allow for control system design and analysis of such complicated processes. The extraction column is a typical multivariable process, with agitator speed and solvent feed flowrate considered as manipulated variables, raffinate concentration and extract concentration as controlled variables, and the feed concentration and feed flowrate as disturbance variables. The control system design of the extraction process was tackled as a multi-loop decentralised SISO (Single Input Single Output) system as well as a centralised MIMO (Multi-Input Multi-Output) system, using both conventional and model-based control techniques such as IMC (Internal Model Control) and MPC (Model Predictive Control). The control performance of each control scheme was studied in terms of stability, speed of response, sensitivity to modelling errors (robustness), setpoint tracking capabilities and load rejection. For decentralised control, multiple loops were assigned to pair each manipulated variable with each controlled variable according to the interaction analysis and other pairing criteria such as the relative gain array (RGA) and singular value analysis (SVD). The loops Rotor speed-Raffinate concentration and Solvent flowrate-Extract concentration showed weak interaction.
Multivariable MPC showed more effective performance than the other conventional techniques since it accounts for loop interaction, time delays, and constraints on the input and output variables.
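The relative gain array (RGA) used above as a pairing criterion has a simple closed form for a two-input, two-output process. The sketch below is a generic illustration: the steady-state gain matrix is made up, and the assignment of rows and columns to the column's variables is an assumption rather than data from the thesis.

```python
def rga_2x2(gain):
    """Relative gain array of a 2x2 steady-state gain matrix.

    For a 2x2 matrix the RGA is fully determined by
    lambda_11 = g11 * g22 / (g11 * g22 - g12 * g21).
    """
    (g11, g12), (g21, g22) = gain
    det = g11 * g22 - g12 * g21
    if det == 0:
        raise ValueError("gain matrix is singular")
    lam = g11 * g22 / det
    return [[lam, 1 - lam], [1 - lam, lam]]


# Hypothetical gains. Rows: raffinate and extract concentration.
# Columns: rotor speed and solvent flowrate.
gain = [
    [2.0, 0.3],
    [0.4, 1.5],
]
for row in rga_2x2(gain):
    print([round(v, 3) for v in row])
```

With these invented gains the diagonal relative gains are about 1.04. Values close to 1 on the diagonal indicate weak loop interaction and support the diagonal pairing, the same kind of conclusion the abstract reports for its two loops.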

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The thesis investigates the properties of two trends or time series which formed a part of the Co-Citation bibliometric model "X-Ray Crystallography and Protein Determination in 1978, 1980 and 1982". This model was one of several created for the 1983 ABRC Science Policy Study, which aimed to test the utility of bibliometric models in a national science policy context. The outcome of the validation part of that study proved to be especially favourable concerning the utility of trend data, which purport to model the development of speciality areas in science over time. This assessment could have important implications for the use of such data in policy formulation. However, one possible problem with the Science Policy Study's conclusions was that insufficient time was available in the study for an in-depth analysis of the data. The thesis aims to continue the validation begun in the ABRC study by providing a detailed examination of the characteristics of the data contained in the Trends numbered 11 and 44 in the model. A novel methodology for the analysis of the properties of the trends with respect to their literature content is presented. This is followed by an assessment, based on questionnaire and interview data, of the ability of Trend 44 to realistically model the historical development of the field of mobile genetic elements research over time, with respect to its scientific content and the activities of its community of researchers. The results of these various analyses are then used to evaluate the strengths and weaknesses of a trend or time series approach to the modelling of the activities of scientific fields. A critical evaluation of the origins of the discovered strengths and weaknesses in the assumptions underlying the techniques used to generate trends from co-citation data is provided. Possible improvements to the modelling techniques are discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Reversed-phase high-performance liquid chromatographic (HPLC) methods were developed for the assay of indomethacin, its decomposition products, ibuprofen and its (tetrahydro-2-furanyl)methyl-, (tetrahydro-2-(2H)pyranyl)methyl- and cyclohexylmethyl esters. The development and application of these HPLC systems were studied. A number of physico-chemical parameters that affect percutaneous absorption were investigated. The pKa values of indomethacin and ibuprofen were determined using the solubility method. Potentiometric titration and the Taft equation were also used for ibuprofen. The incorporation of ethanol or propylene glycol in the solvent resulted in an improvement in the aqueous solubility of these compounds. The partition coefficients were evaluated in order to establish the affinity of these drugs towards the stratum corneum. The stability of indomethacin and of ibuprofen esters was investigated and the effect of temperature and pH on the decomposition rates was studied. The effect of cetyltrimethylammonium bromide on the alkaline degradation of indomethacin was also followed. In the presence of alcohol, indomethacin alcoholysis was observed; the kinetics of decomposition were subjected to non-linear regression analysis and the rate constants for the various pathways were quantified. The non-isothermal, surfactant non-isoconcentration and non-isopH degradation of indomethacin was investigated. The analysis of the data was undertaken using NONISO, a BASIC computer program. The degradation profiles obtained from both non-iso and iso-kinetic studies show that there is close concordance in the results. The metabolic biotransformation of ibuprofen esters was followed using esterases from hog liver and rat skin homogenates. The results showed that the esters were very labile under these conditions. The presence of propylene glycol affected the rates of enzymic hydrolysis of the ester.
The hydrolysis is modelled using an equation involving the dielectric constant of the medium. The percutaneous absorption of indomethacin and of ibuprofen and its esters was followed from solutions using an in vitro excised human skin model. The absorption profiles followed first order kinetics. The diffusion process was related to their solubility and to the human skin/solvent partition coefficient. The percutaneous absorption of two ibuprofen esters from suspensions in 20% propylene glycol-water was also followed through rat skin, with only ibuprofen being detected in the receiver phase. The sensitivity of ibuprofen esters to enzymic hydrolysis compared to chemical hydrolysis may prove valuable in the formulation of topical delivery systems.
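First order kinetics, as reported for the absorption profiles, means the remaining amount follows C(t) = C0·exp(−kt), so ln C is linear in t with slope −k. The sketch below fits k by ordinary least squares on the log-transformed values. The data are synthetic, generated from an assumed k, and the function name is hypothetical. It is a generic illustration and not the NONISO program or the regression used in the thesis.

```python
import math


def fit_first_order(times, concentrations):
    """Least-squares fit of ln C = ln C0 - k * t; returns (k, C0)."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    slope = sum((t - mean_t) * (v - mean_log) for t, v in zip(times, logs)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    intercept = mean_log - slope * mean_t
    return -slope, math.exp(intercept)


# Synthetic data generated from assumed values k = 0.1 per hour and C0 = 50.
true_k, true_c0 = 0.1, 50.0
times = [0.0, 2.0, 4.0, 8.0, 12.0]
concentrations = [true_c0 * math.exp(-true_k * t) for t in times]

k, c0 = fit_first_order(times, concentrations)
half_life = math.log(2) / k
print(f"k = {k:.4f} per hour, C0 = {c0:.2f}, half-life = {half_life:.2f} hours")
```

With noise-free synthetic data the fit recovers the assumed rate constant. With real measurements, a curved plot of ln C against t would be a sign that the process is not first order.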

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Strontium has been substituted for calcium in the glass series (SiO2)49.46(Na2O)26.38(P2O5)1.07(CaO)23.08−x(SrO)x (where x = 0, 11.54, 23.08) to elucidate their underlying atomic-scale structural characteristics as a basis for understanding features related to the bioactivity. These bioactive glasses have been investigated using isomorphic neutron and X-ray diffraction, Sr K-edge EXAFS and solid state 17O, 23Na, 29Si, 31P and 43Ca magic-angle-spinning (MAS) NMR. An effective isomorphic substitution first-order difference function has been applied to the neutron diffraction data, confirming that Ca and Sr behave in a similar manner within the glass network, with residual differences attributed solely to the variation in ionic radius between the two species. The diffraction data provide the first direct experimental evidence of split Ca–O nearest-neighbour correlations in these melt-quench bioactive glasses, together with an analogous splitting of the Sr–O correlations; the correlations are attributed to the metal ions correlated either to bridging or to non-bridging oxygen atoms. Triple quantum (3Q) 43Ca MAS NMR corroborates the split Ca–O correlations. Successful simplification of the 2 < r (Å) < 3 region via the difference method has also revealed two distinct Na environments. These environments are attributed to sodium correlated either to bridging or to non-bridging oxygen atoms. Complementary multinuclear MAS NMR, Sr K-edge EXAFS and X-ray diffraction data support the structural model presented. The structural sites present will be intimately related to their release properties in physiological fluids such as plasma and saliva, and hence the bioactivity of the material. Detailed structural knowledge is therefore a prerequisite for optimising material design.