48 results for Data Mining, Yield Improvement, Self Organising Map, Clustering Quality





Web application to facilitate the planning, control, and management of unloading logistics operations in a distribution warehouse that serves different clients with multiple origins and destinations. It allows concurrent access by several previously authorized users with different levels of access to data and updates. Using the application and keeping its data up to date maintains online control of the activity and gives clients visibility. The stored data become a single, shared, and exportable source for later use in data analysis and improvement processes. It includes activity reports and quality-indicator reports.





Our procedure to detect moving groups in the solar neighbourhood (Chen et al., 1997) in the four-dimensional space of the stellar velocity components and age has been improved. The method, which takes advantage of non-parametric estimators of density distribution to avoid any a priori knowledge of the kinematic properties of these stellar groups, now includes the effect of observational errors on the process to select moving group stars, uses a better estimation of the density distribution of the total sample and field stars, and classifies moving group stars using all the available information. It is applied here to an accurately selected sample of early-type stars with known radial velocities and Strömgren photometry. Astrometric data are taken from the HIPPARCOS catalogue (ESA, 1997), which results in an important decrease in the observational errors with respect to ground-based data, and ensures the uniformity of the observed data. Both the improvement of our method and the use of precise astrometric data have allowed us not only to confirm the existence of classical moving groups, but also to detect finer structures that in several cases can be related to kinematic properties of nearby open clusters or associations.
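As a rough illustration of what a non-parametric density estimator does, the sketch below builds a one-dimensional Gaussian kernel density estimate over a handful of velocity values. The abstract does not state which kernel, bandwidth, or dimensionality the authors use, so the Gaussian kernel, the `bandwidth` value, and the sample data here are all assumptions chosen for illustration only.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel.

    Assumption: a fixed bandwidth and a 1-D Gaussian kernel; the original
    method works in a four-dimensional space and may use a different kernel.
    """
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        # Sum the kernel contribution of every sample at position x.
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Made-up velocity component values (km/s) with two visible concentrations.
velocities = [-12.0, -11.5, -11.0, -10.8, 3.0, 20.0, 20.4, 21.0]
density = gaussian_kde(velocities, bandwidth=1.5)

# Concentrations of stars show up as local maxima of the estimated density.
print(density(-11.0) > density(3.0))
```

No functional form for the groups is assumed: overdensities emerge directly from the data, which is the property the abstract relies on.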





This article proposes analysing the interactions between Twitter users, covering both the activity generated around a specific user and the analysis of a given hashtag over a set period of time.





PURPOSE: Pharmacovigilance methods have advanced greatly during the last decades, making post-market drug assessment an essential drug evaluation component. These methods mainly rely on the use of spontaneous reporting systems and health information databases to collect expertise from huge amounts of real-world reports. The EU-ADR Web Platform was built to further facilitate accessing, monitoring and exploring these data, enabling an in-depth analysis of adverse drug reaction risks. METHODS: The EU-ADR Web Platform exploits the wealth of data collected within a large-scale European initiative, the EU-ADR project. Millions of electronic health records, provided by national health agencies, are mined for specific drug events, which are correlated with literature, protein and pathway data, resulting in a rich drug-event dataset. Next, advanced distributed computing methods are tailored to coordinate the execution of data-mining and statistical analysis tasks. This permits obtaining a ranked drug-event list, removing spurious entries and highlighting relationships with high risk potential. RESULTS: The EU-ADR Web Platform is an open workspace for the integrated analysis of pharmacovigilance datasets. Using this software, researchers can access a variety of tools provided by distinct partners in a single centralized environment. Besides performing standalone drug-event assessments, they can also control the pipeline for an improved batch analysis of custom datasets. Drug-event pairs can be substantiated and statistically analysed within the platform's innovative working environment. CONCLUSIONS: A pioneering workspace that helps in explaining the biological path of adverse drug reactions was developed within the EU-ADR project consortium. This tool, targeted at the pharmacovigilance community, is available online at https://bioinformatics.ua.pt/euadr/. Copyright © 2012 John Wiley & Sons, Ltd.





The aim of this article is to introduce the Spanish reader to some recent debates in the English-speaking digital humanities community. Rather than trying to define the discipline in absolute terms, it takes a diachronic approach, while emphasising certain principles such as interdisciplinarity and model building, values such as access and open source, and practices such as data mining and collaboration.





This presentation aims to briefly present the range of available tools, the terminology used and, in general, the methodological framework of exploratory statistics and data analysis, the paradigm of the discipline. Over recent years the discipline has not been turned upside down, but it nevertheless requires continual updating. Some tools that had been barely sketched out have been forged and tested, and new application domains have appeared. The relationship with its competing and dynamic neighbours (artificial intelligence, neural networks, Data Mining) needs to be clarified. The perspective I present on data analysis methods evidently stems from a particular point of view; other points of view may be equally valid.





The European Space Agency's Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm but without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.
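To make the idea of hypercube generation in a MapReduce style concrete, here is a minimal in-memory sketch in plain Python. It is not the framework described in the abstract and does not use Hadoop; the field names (`parallax`, `pm_ra`), the bin widths, and the sample records are hypothetical. The map step turns each record into a bin key with a count of one, and the reduce step merges counts for identical keys.

```python
from collections import Counter
from functools import reduce
from math import floor

# Hypothetical dimensions and bin widths; a real job would take these as config.
BIN_WIDTHS = {"parallax": 1.0, "pm_ra": 5.0}


def map_record(record):
    """Map step: emit a single (bin key -> 1) count for one star record."""
    key = tuple(floor(record[dim] / width) for dim, width in BIN_WIDTHS.items())
    return Counter({key: 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial histograms by summing matching bins."""
    merged = Counter(left)
    merged.update(right)
    return merged


def build_hypercube(records):
    """Run map and reduce sequentially; a cluster would distribute both steps."""
    return reduce(reduce_counts, map(map_record, records), Counter())


# Made-up records standing in for catalogue rows.
stars = [
    {"parallax": 0.4, "pm_ra": 2.0},
    {"parallax": 0.9, "pm_ra": 4.9},
    {"parallax": 1.2, "pm_ra": -3.0},
]
cube = build_hypercube(stars)
print(dict(cube))
```

Because the reduce step is associative and commutative, partial histograms can be merged in any order, which is what lets this kind of workload be spread across many nodes.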







In recent years, studies into the reasons for dropping out of higher education (including online education) have been undertaken with greater regularity, parallel to the rise in the relative weight of this type of education, compared with brick-and-mortar education. However, the work invested in characterising the students who drop out of education, compared with those who do not, appears not to have had the same relevance as that invested in the analysis of the causes. The definition of dropping out is very sensitive to the context. In this article, we reach a purely empirical definition of student dropping out, based on the probability of not continuing a specific academic programme following several consecutive semesters of "theoretical break". Dropping out should be properly defined before analysing its causes, as well as comparing the drop-out rates between the different online programmes, or between online and on-campus ones. Our results show that there are significant differences among programmes, depending on their theoretical extension, but not their domain of knowledge.
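The empirical definition described above can be sketched as follows: for each break length n, estimate the probability that a student who has accumulated n consecutive semesters without enrolment does not return within the observation window. The data layout (one list of 0/1 enrolment flags per student), the function names, and the sample histories are assumptions made for illustration; the article's actual procedure and thresholds are not given in the abstract.

```python
def break_runs(history):
    """Return (length, returned) for each run of non-enrolled semesters
    that follows at least one enrolled semester."""
    runs = []
    seen_enrolment = False
    length = 0
    for enrolled in history:
        if enrolled:
            if seen_enrolment and length > 0:
                runs.append((length, True))  # the break ended with a return
            seen_enrolment = True
            length = 0
        elif seen_enrolment:
            length += 1
    if seen_enrolment and length > 0:
        runs.append((length, False))  # still on break when the window ends
    return runs


def prob_not_returning(histories, n):
    """Share of breaks lasting at least n semesters that never end in a return."""
    reached = 0
    not_returned = 0
    for history in histories:
        for length, returned in break_runs(history):
            if length >= n:
                reached += 1
                if not returned:
                    not_returned += 1
    return not_returned / reached if reached else None


# Hypothetical enrolment histories: 1 = enrolled that semester, 0 = not enrolled.
histories = [
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
]
for n in range(1, 5):
    print(n, prob_not_returning(histories, n))
```

A drop-out threshold could then be chosen as the smallest n for which this probability exceeds an agreed level. Breaks cut off by the end of the observation window are counted as non-returns here, which overstates the probability for recent cohorts.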





DDM is a framework that combines intelligent agents with traditional artificial intelligence algorithms such as classifiers. The central idea of this project is to create a multi-agent system that allows different views to be compared and merged into a single one.





Background: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of Iron-Sulfur Clusters (ISC). Results: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role for the evaluation of the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted role of Arh1 and Yah1 and some of the interactions we predict for Grx5 both match experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is discarded from our analysis, which also agrees with published experimental results. Additionally, we propose a number of experiments for testing other predictions and further improving the identification of the network structure. Conclusion: We propose and apply an iterative in silico procedure for predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using the Iron-Sulfur cluster biogenesis in S. cerevisiae as a test case, we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the results obtained through this procedure allows devising new wet lab experiments to confirm its predictions or provide alternative explanations for further improving the models.





Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.





This study compares the impact of quality management tools on the performance of organisations utilising the ISO 9001:2000 standard as a basis for a quality-management system and those utilising the EFQM model for this purpose. A survey is conducted among 107 experienced and independent quality-management assessors. The study finds that organisations with quality-management systems based on the ISO 9001:2000 standard tend to use general-purpose qualitative tools, and that these do have a relatively positive impact on their general performance. In contrast, organisations adopting the EFQM model tend to use more specialised quantitative tools, which produce significant improvements in specific aspects of their performance. The findings of the study will enable organisations to choose the most effective quality-improvement tools for their particular quality strategy.





Open educational resources (OER) promise increased access, participation, quality, and relevance, in addition to cost reduction. These seemingly fantastic promises are based on the supposition that educators and learners will discover existing resources, improve them, and share the results, resulting in a virtuous cycle of improvement and re-use. By anecdotal metrics, existing web-scale search is not working for OER. This situation impairs the cycle underlying the promise of OER, endangering long-term growth and sustainability. While the scope of the problem is vast, targeted improvements in areas of curation, indexing, and data exchange can improve the situation, and create opportunities for further scale. I explore the ways in which the system is currently inadequate, discuss areas for targeted improvement, and describe a prototype system built to test these ideas. I conclude with suggestions for further exploration and development.





A decision tree is a graphical and analytical way of representing all the events that may arise from a decision taken at a given moment. Decision trees help us make the most "appropriate" decision, from a probabilistic point of view, when faced with a range of possible decisions. These trees make it possible to examine the results and see visually how the model flows. The visual results help in finding specific subgroups and relationships that we might not find with more traditional statistics. Decision trees are a statistical technique for segmentation, stratification, prediction, data reduction and variable screening, interaction identification, category merging, and discretization of continuous variables. The decision tree function (Tree) in SPSS creates classification and decision trees to identify groups, discover relationships between groups, and predict future events. Several tree-growing methods are available (CHAID, exhaustive CHAID, CRT, and QUEST), and the choice depends on which best fits our data.
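As a minimal sketch of how a classification tree chooses where to split a continuous variable, the Python code below scans candidate thresholds and keeps the one with the lowest weighted Gini impurity, a criterion commonly used by CART-style trees. It is not the SPSS Tree procedure and does not reproduce CHAID or QUEST; the function names and the small `ages`/`bought` dataset are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit group, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up data: customer age and whether a purchase was made.
ages = [22, 25, 30, 41, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)
```

This single split also illustrates the discretization of a continuous variable mentioned above: the chosen threshold divides age into two categories that separate the outcomes as cleanly as possible. A full tree repeats the search recursively within each resulting subgroup.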