29 results for databases and data mining
at Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
In this project, research is carried out both on finding predictors via clustering techniques and on reviewing free Data Mining software. The research is based on a case study, through which, in addition to the free KDD software used by the scientific community, a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain, as the data from which they must be inferred are student qualifications from different e-learning environments. Through our case study, not only are clustering algorithms tested but additional goals are also proposed.
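Clustering student qualifications to derive predictors, as described above, can be sketched with k-means. This is an illustrative sketch only: the synthetic marks, the 0-10 scale, and the choice of two clusters are assumptions, not details from the project.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Synthetic stand-in for student qualifications: rows are students,
# columns are continuous-assessment marks on a 0-10 scale (invented data).
weak = rng.normal(3.5, 1.0, (60, 4))
strong = rng.normal(8.0, 1.0, (60, 4))
marks = np.clip(np.vstack([weak, strong]), 0, 10)

# Cluster students by their mark profiles; the cluster centroids can then
# serve as coarse predictors of the outcome for new students.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(marks)
print(np.round(np.sort(km.cluster_centers_.mean(axis=1)), 1))
```

With well-separated groups like these, the two centroids recover the weak and strong performance profiles, which is the mechanism by which cluster membership becomes a predictor.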
Abstract:
Research project that carries out a classificatory study of the courses enrolled in the Business Administration and Management degree at the UOC in relation to their outcomes. Different methods and models are proposed for understanding the environment in which the study takes place.
Abstract:
Marketing scholars have suggested a need for more empirical research on consumer response to malls, in order to better understand the variables that explain consumer behavior. The segmentation methodology CHAID (Chi-square Automatic Interaction Detection) was used to identify the profiles of consumers with regard to their activities at malls, on the basis of socio-demographic variables and behavioral variables (how and with whom they go to the malls). A sample of 790 subjects answered an online questionnaire. Among the variables analyzed, the transport used to go shopping and the frequency of visits to the centers are the main predictors of behavior in malls. The results provide guidelines for the development of effective strategies to attract consumers to malls and retain them there.
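CHAID's core step is choosing, at each node, the predictor whose cross-tabulation with the outcome yields the most significant chi-square statistic. That first split can be sketched on synthetic survey data; all variable names, categories and distributions below are illustrative assumptions, not the study's actual questionnaire.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
n = 790  # same sample size as the study, data itself is synthetic

# Invented survey variables (illustrative only).
transport = rng.choice(["car", "public", "walk"], n, p=[0.5, 0.3, 0.2])
frequency = rng.choice(["weekly", "monthly"], n)
gender = rng.choice(["f", "m"], n)
# Make the activity depend on transport so the chi-square test can find it.
p_shop = np.where(transport == "car", 0.7, 0.4)
activity = np.where(rng.random(n) < p_shop, "shopping", "leisure")

def best_chaid_split(predictors, outcome):
    """CHAID's first step: pick the predictor whose cross-tab with the
    outcome gives the smallest chi-square p-value."""
    best, best_p = None, 1.0
    for name, values in predictors.items():
        table = np.array([[np.sum((values == v) & (outcome == o))
                           for o in np.unique(outcome)]
                          for v in np.unique(values)])
        _, p, _, _ = chi2_contingency(table)
        if p < best_p:
            best, best_p = name, p
    return best

print(best_chaid_split(
    {"transport": transport, "frequency": frequency, "gender": gender},
    activity))
```

In this synthetic setup the transport variable drives the outcome, so it is selected as the root split, mirroring the study's finding that transport is a main predictor. A full CHAID also merges non-significant categories and recurses on each child node, which this sketch omits.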
Abstract:
Consider a model with parameter phi, and an auxiliary model with parameter theta. Let phi be randomly sampled from a given density over the known parameter space. Monte Carlo methods can be used to draw simulated data and compute the corresponding estimate of theta, say theta_tilde. A large set of tuples (phi, theta_tilde) can be generated in this manner. Nonparametric methods may be used to fit the function E(phi|theta_tilde=a) from these tuples. It is proposed to estimate phi using the fitted E(phi|theta_tilde=theta_hat), where theta_hat is the auxiliary estimate computed from the real sample data. Under certain assumptions, this estimator is consistent and asymptotically normally distributed. Monte Carlo results for dynamic panel data and vector autoregressions show that this estimator can have very attractive small-sample properties. Confidence intervals can be constructed using the quantiles of the phi for which theta_tilde is close to theta_hat. Such confidence intervals are found to have very accurate coverage.
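The simulation scheme above can be sketched for a toy problem: phi is the variance of a mean-zero normal model and the auxiliary estimate is the mean of squares. Every concrete choice here (the Uniform(0.5, 2) sampling density, sample size, nearest-neighbour fit) is an illustrative assumption, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30          # sample size per simulated data set
n_sims = 20000  # number of (phi, theta_tilde) tuples

# Draw phi from a density over the parameter space, simulate data, and
# record the auxiliary estimate theta_tilde for each draw.
phis = rng.uniform(0.5, 2.0, n_sims)
theta_tildes = np.array([
    np.mean(rng.normal(0.0, np.sqrt(p), n) ** 2) for p in phis
])

def estimate_phi(theta_hat, k=200):
    """Nonparametric fit of E(phi | theta_tilde = theta_hat): average phi
    over the k tuples whose theta_tilde lies closest to theta_hat."""
    idx = np.argsort(np.abs(theta_tildes - theta_hat))[:k]
    return phis[idx].mean()

# "Real" data with true phi = 1.2; theta_hat is its auxiliary estimate.
real = rng.normal(0.0, np.sqrt(1.2), n)
theta_hat = np.mean(real ** 2)
print(round(estimate_phi(theta_hat), 2))
```

The same nearest-neighbour window also yields the confidence intervals the abstract mentions: take quantiles of `phis[idx]` rather than their mean.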
Abstract:
In recent years, comprehensive representations of cell signalling pathways have been developed by manual curation from the literature, which requires huge effort and would benefit from information stored in databases and from automatic retrieval and integration methods. Once a reconstruction of the network of interactions is achieved, analysis of its structural features and its dynamic behaviour can take place. Mathematical modelling techniques are used to simulate the complex behaviour of cell signalling networks, which ultimately sheds light on the mechanisms leading to complex diseases or helps in the identification of drug targets. A variety of databases containing information on cell signalling pathways have been developed in conjunction with methodologies to access and analyse the data. In principle, the scenario is prepared to make the most of this information for the analysis of the dynamics of signalling pathways. However, are the knowledge repositories of signalling pathways ready to realize the systems biology promise? In this article we aim to initiate this discussion and to provide some insights on this issue.
Abstract:
The spectral efficiency achievable with joint processing of pilot and data symbol observations is compared with that achievable through the conventional (separate) approach of first estimating the channel on the basis of the pilot symbols alone, and subsequently detecting the data symbols. Studied on the basis of a mutual information lower bound, joint processing is found to provide a non-negligible advantage relative to separate processing, particularly for fast fading. It is shown that, regardless of the fading rate, only a very small number of pilot symbols (at most one per transmit antenna and per channel coherence interval) should be transmitted if joint processing is allowed.
Abstract:
This article reviews the methodology of studies on drug utilization, with particular emphasis on primary care. Population-based studies of drug inappropriateness can be carried out with microdata from electronic health records and e-prescriptions. Multilevel models estimate the influence of factors affecting the appropriateness of drug prescription at different hierarchical levels: patient, doctor, health care organization and regulatory environment. Work by the GIUMAP suggests that patient characteristics are the most important factor in the appropriateness of prescriptions, with significant effects at the general practitioner level.
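A two-level model of the kind described, patients nested within doctors, can be sketched with synthetic data. The variable names, effect sizes, and the use of statsmodels' MixedLM are assumptions for illustration, not the GIUMAP methodology.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_doctors, n_patients = 40, 25

# Invented data: each doctor sees n_patients patients.
doctor = np.repeat(np.arange(n_doctors), n_patients)
doctor_effect = rng.normal(0, 0.5, n_doctors)[doctor]
age = rng.uniform(20, 90, n_doctors * n_patients)
# Appropriateness score driven mostly by a patient-level variable (age),
# plus a smaller doctor-level random effect, echoing the GIUMAP finding.
score = 5 - 0.03 * age + doctor_effect + rng.normal(0, 1, len(age))
df = pd.DataFrame({"score": score, "age": age, "doctor": doctor})

# Multilevel (mixed-effects) model: fixed patient-level effect of age,
# random intercept per doctor.
model = smf.mixedlm("score ~ age", df, groups=df["doctor"]).fit()
print(model.params["age"])
```

The fitted fixed effect recovers the patient-level influence while the estimated group variance quantifies how much prescribing appropriateness varies between doctors, which is exactly the decomposition the multilevel approach provides.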
Abstract:
We study theoretical and empirical aspects of the mean exit time (MET) of financial time series. The theoretical modeling is done within the framework of the continuous time random walk. We empirically verify that the mean exit time follows a quadratic scaling law and has an associated prefactor which is specific to the analyzed stock. We perform a series of statistical tests to determine which kinds of correlation are responsible for this specificity. The main contribution is associated with the autocorrelation property of stock returns. We introduce and solve analytically both two-state and three-state Markov chain models. The analytical results obtained with the two-state Markov chain model allow us to collapse the 20 measured MET profiles onto a single master curve.
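The quadratic scaling law can be checked in the simplest uncorrelated case, a symmetric random walk, for which the mean exit time from the interval (-L, L), starting at 0, is exactly L**2 steps. This sketch is illustrative only and uses none of the stock data or Markov chain models of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_exit_time(L, n_paths=2000):
    """Average number of +/-1 steps before a symmetric random walk,
    started at 0, first leaves the interval (-L, L)."""
    total = 0
    for _ in range(n_paths):
        x, t = 0, 0
        while abs(x) < L:
            x += 1 if rng.random() < 0.5 else -1
            t += 1
        total += t
    return total / n_paths

# For an uncorrelated walk the MET is exactly L**2, so the estimates
# should grow quadratically with the barrier half-width L.
for L in (2, 4, 8):
    print(L, mean_exit_time(L))
```

Deviations of real stocks from this L**2 baseline, the stock-specific prefactor, are what the abstract attributes to return autocorrelations.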
Abstract:
In this correspondence, we propose applying hidden Markov model (HMM) theory to the problem of blind channel estimation and data detection. The Baum–Welch (BW) algorithm, which is able to estimate all the parameters of the model, is enriched by introducing some linear constraints emerging from a linear FIR hypothesis on the channel. Additionally, a version of the algorithm that is suitable for time-varying channels is also presented. Performance is analyzed in a GSM environment using standard test channels and is found to be close to that obtained with a nonblind receiver.
Abstract:
Development of methods to explore data from educational settings, in order to better understand the learning process.
Abstract:
Databases and data warehouses: design and implementation of a relational database for the maintenance of a company's equipment.
Abstract:
Construction and exploitation of a hydrological planning data warehouse for the Confederación Hidrográfica del Norte y Este.
Abstract:
Construction and exploitation of a hydrological planning data warehouse.
Abstract:
Background: To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data-rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. Results: To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain-specific knowledge representation models based on specific objects and their relations, supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. Conclusions: We generate the first semantically integrated COPD-specific public knowledge base and find that, for the integration of clinical and experimental data with pre-existing knowledge, the configuration-based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects.
The knowledge base enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data, which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification via a browser-based interface, which is currently under development.