864 resultados para pacs: data handling techniques
Resumo:
The research project is an extension of a series of administrative science and health care research projects evaluating the influence of external context, organizational strategy, and organizational structure upon organizational success or performance. The research will rely on the assumption that there is not one single best approach to the management of organizations (the contingency theory). As organizational effectiveness is dependent on an appropriate mix of factors, organizations may be equally effective based on differing combinations of factors. The external context of the organization is expected to influence internal organizational strategy and structure and in turn the internal measures affect performance (discriminant theory). The research considers the relationship of external context and organization performance.^ The unit of study for the research will be the health maintenance organization (HMO); an organization the accepts in exchange for a fixed, advance capitation payment, contractual responsibility to assure the delivery of a stated range of health sevices to a voluntary enrolled population. With the current Federal resurgence of interest in the Health Maintenance Organization (HMO) as a major component in the health care system, attention must be directed at maximizing development of HMOs from the limited resources available. Increased skills are needed in both Federal and private evaluation of HMO feasibility in order to prevent resource investment and in projects that will fail while concurrently identifying potentially successful projects that will not be considered using current standards.^ The research considers 192 factors measuring contextual milieu (social, educational, economic, legal, demographic, health and technological factors). Through intercorrelation and principle components data reduction techniques this was reduced to 12 variables. Two measures of HMO performance were identified, they are (1) HMO status (operational or defunct), and (2) a principle components factor score considering eight measures of performance. The relationship between HMO context and performance was analysed using correlation and stepwise multiple regression methods. In each case it has been concluded that the external contextual variables are not predictive of success or failure of study Health Maintenance Organizations. This suggests that performance of an HMO may rely on internal organizational factors. These findings have policy implications as contextual measures are used as a major determinant in HMO feasibility analysis, and as a factor in the allocation of limited Federal funds. ^
Resumo:
Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. ^ In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science- derived data analysis methods, predominantly machine learning approaches, were proposed and explored in this study. ^ The goal of this study was to develop a set of models using decision trees/ensemble and neural networks methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1 trichloroethane, methylene dichloride and, trichloroethylene were used. ^ When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine `expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for the independence from the expert judgment.^
Resumo:
The data collection "Deep Drilling of Glaciers: Soviet-Russian projects in Arctic, 1975-1995" was collected by the following basic considerations: - compilation of deep (>100 m) drilling projects on Arctic glaciers, using data of (a) publications; (b) archives of IGRAN; (c) personal communication of project participants; - documentation of parameters, references. Accuracy of data and techniques applied to determine different parameters are not evaluated. The accuracy of some geochemical parameters (up to 1984 and heavy metalls) is uncertain. Most reconstructions of ice core age and of annual layer thickness are discussed; - digitizing of published diagrams (in case, when original numerical data were lost) and subsequent data conversion to equal range series and adjustment to the common units. Therefore, the equal-range series were calculated from original data or converted from digitized chart values as indicated in the metadata. For the methodological purpose, the equal-range series obtained from original and reconstructed data were compared repeatedly; the systematic difference was less then 5-7%. Special attention should be given to the fact, that the data for individual ice core parameters varies, because some parameters were originally measured or registered. Parameters were converted in equal-range series using 2 m steps; - two or more parameter values were determined, then the mean-weighted (i.e. accounting the sample length) value is assigned to the entire interval; - one parameter value was determined, measured or registered independently from the parameter values in depth intervals which over- and underlie it, then the value is assigned to the entire interval; - one parameter value was determined, measured or registered for two adjoining depth intervals, then the specific value is assigned to the depth interval, which represents >75% of sample length ; if each of adjoining depth intervals represents <75% of sample length, then the correspondent parameter value is assigned to both intervals of depth. This collection of ice core data (version 2000) was made available through the EU funded QUEEN project by S.M. Arkhipov, Moscow.
Resumo:
Structural decomposition techniques based on input-output table have become a widely used tool for analyzing long term economic growth. However, due to limitations of data, such techniques have never been applied to China's regional economies. Fortunately, in 2003, China's Interregional Input-Output Table for 1987 and Multi-regional Input-Output Table for 1997 were published, making decomposition analysis of China's regional economies possible. This paper first estimates the interregional input-output table in constant price by using an alternative approach: the Grid-Search method, and then applies the standard input-output decomposition technique to China's regional economies for 1987-97. Based on the decomposition results, the contributions to output growth of different factors are summarized at the regional and industrial level. Furthermore, interdependence between China's regional economies is measured and explained by aggregating the decomposition factors into the intraregional multiplier-related effect, the feedback-related effect, and the spillover-related effect. Finally, the performance of China's industrial and regional development policies implemented in the 1990s is briefly discussed based on the analytical results of the paper.
Resumo:
Much work has been done in the áreas of and-parallelism and data parallelism in Logic Programs. Such work has proceeded to a certain extent in an independent fashion. Both types of parallelism offer advantages and disadvantages. Traditional (and-) parallel models offer generality, being able to exploit parallelism in a large class of programs (including that exploited by data parallelism techniques). Data parallelism techniques on the other hand offer increased performance for a restricted class of programs. The thesis of this paper is that these two forms of parallelism are not fundamentally different and that relating them opens the possibility of obtaining the advantages of both within the same system. Some relevant issues are discussed and solutions proposed. The discussion is illustrated through visualizations of actual parallel executions implementing the ideas proposed.
Resumo:
By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks.
Resumo:
Sensor networks are increasingly being deployed in the environment for many different purposes. The observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse this data, for other purposes than those for which they were originally set up. The authors propose an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. In this article, the authors describe the theoretical foundations and technologies that enable exposing semantically enriched sensor metadata, and querying sensor observations through SPARQL extensions, using query rewriting and data translation techniques according to mapping languages, and managing both pull and push delivery modes.
Resumo:
The origins for this work arise in response to the increasing need for biologists and doctors to obtain tools for visual analysis of data. When dealing with multidimensional data, such as medical data, the traditional data mining techniques can be a tedious and complex task, even to some medical experts. Therefore, it is necessary to develop useful visualization techniques that can complement the expert’s criterion, and at the same time visually stimulate and make easier the process of obtaining knowledge from a dataset. Thus, the process of interpretation and understanding of the data can be greatly enriched. Multidimensionality is inherent to any medical data, requiring a time-consuming effort to get a clinical useful outcome. Unfortunately, both clinicians and biologists are not trained in managing more than four dimensions. Specifically, we were aimed to design a 3D visual interface for gene profile analysis easy in order to be used both by medical and biologist experts. In this way, a new analysis method is proposed: MedVir. This is a simple and intuitive analysis mechanism based on the visualization of any multidimensional medical data in a three dimensional space that allows interaction with experts in order to collaborate and enrich this representation. In other words, MedVir makes a powerful reduction in data dimensionality in order to represent the original information into a three dimensional environment. The experts can interact with the data and draw conclusions in a visual and quickly way.
Resumo:
The use of data mining techniques for the gene profile discovery of diseases, such as cancer, is becoming usual in many researches. These techniques do not usually analyze the relationships between genes in depth, depending on the different variety of manifestations of the disease (related to patients). This kind of analysis takes a considerable amount of time and is not always the focus of the research. However, it is crucial in order to generate personalized treatments to fight the disease. Thus, this research focuses on finding a mechanism for gene profile analysis to be used by the medical and biologist experts. Results: In this research, the MedVir framework is proposed. It is an intuitive mechanism based on the visualization of medical data such as gene profiles, patients, clinical data, etc. MedVir, which is based on an Evolutionary Optimization technique, is a Dimensionality Reduction (DR) approach that presents the data in a three dimensional space. Furthermore, thanks to Virtual Reality technology, MedVir allows the expert to interact with the data in order to tailor it to the experience and knowledge of the expert.
Resumo:
Objective The main purpose of this research is the novel use of artificial metaplasticity on multilayer perceptron (AMMLP) as a data mining tool for prediction the outcome of patients with acquired brain injury (ABI) after cognitive rehabilitation. The final goal aims at increasing knowledge in the field of rehabilitation theory based on cognitive affectation. Methods and materials The data set used in this study contains records belonging to 123 ABI patients with moderate to severe cognitive affectation (according to Glasgow Coma Scale) that underwent rehabilitation at Institut Guttmann Neurorehabilitation Hospital (IG) using the tele-rehabilitation platform PREVIRNEC©. The variables included in the analysis comprise the neuropsychological initial evaluation of the patient (cognitive affectation profile), the results of the rehabilitation tasks performed by the patient in PREVIRNEC© and the outcome of the patient after a 3–5 months treatment. To achieve the treatment outcome prediction, we apply and compare three different data mining techniques: the AMMLP model, a backpropagation neural network (BPNN) and a C4.5 decision tree. Results The prediction performance of the models was measured by ten-fold cross validation and several architectures were tested. The results obtained by the AMMLP model are clearly superior, with an average predictive performance of 91.56%. BPNN and C4.5 models have a prediction average accuracy of 80.18% and 89.91% respectively. The best single AMMLP model provided a specificity of 92.38%, a sensitivity of 91.76% and a prediction accuracy of 92.07%. Conclusions The proposed prediction model presented in this study allows to increase the knowledge about the contributing factors of an ABI patient recovery and to estimate treatment efficacy in individual patients. The ability to predict treatment outcomes may provide new insights toward improving effectiveness and creating personalized therapeutic interventions based on clinical evidence.
Resumo:
Tradicionalmente, el uso de técnicas de análisis de datos ha sido una de las principales vías para el descubrimiento de conocimiento oculto en grandes cantidades de datos, recopilados por expertos en diferentes dominios. Por otra parte, las técnicas de visualización también se han usado para mejorar y facilitar este proceso. Sin embargo, existen limitaciones serias en la obtención de conocimiento, ya que suele ser un proceso lento, tedioso y en muchas ocasiones infructífero, debido a la dificultad de las personas para comprender conjuntos de datos de grandes dimensiones. Otro gran inconveniente, pocas veces tenido en cuenta por los expertos que analizan grandes conjuntos de datos, es la degradación involuntaria a la que someten a los datos durante las tareas de análisis, previas a la obtención final de conclusiones. Por degradación quiere decirse que los datos pueden perder sus propiedades originales, y suele producirse por una reducción inapropiada de los datos, alterando así su naturaleza original y llevando en muchos casos a interpretaciones y conclusiones erróneas que podrían tener serias implicaciones. Además, este hecho adquiere una importancia trascendental cuando los datos pertenecen al dominio médico o biológico, y la vida de diferentes personas depende de esta toma final de decisiones, en algunas ocasiones llevada a cabo de forma inapropiada. Ésta es la motivación de la presente tesis, la cual propone un nuevo framework visual, llamado MedVir, que combina la potencia de técnicas avanzadas de visualización y minería de datos para tratar de dar solución a estos grandes inconvenientes existentes en el proceso de descubrimiento de información válida. El objetivo principal es hacer más fácil, comprensible, intuitivo y rápido el proceso de adquisición de conocimiento al que se enfrentan los expertos cuando trabajan con grandes conjuntos de datos en diferentes dominios. Para ello, en primer lugar, se lleva a cabo una fuerte disminución en el tamaño de los datos con el objetivo de facilitar al experto su manejo, y a la vez preservando intactas, en la medida de lo posible, sus propiedades originales. Después, se hace uso de efectivas técnicas de visualización para representar los datos obtenidos, permitiendo al experto interactuar de forma sencilla e intuitiva con los datos, llevar a cabo diferentes tareas de análisis de datos y así estimular visualmente su capacidad de comprensión. De este modo, el objetivo subyacente se basa en abstraer al experto, en la medida de lo posible, de la complejidad de sus datos originales para presentarle una versión más comprensible, que facilite y acelere la tarea final de descubrimiento de conocimiento. MedVir se ha aplicado satisfactoriamente, entre otros, al campo de la magnetoencefalografía (MEG), que consiste en la predicción en la rehabilitación de lesiones cerebrales traumáticas (Traumatic Brain Injury (TBI) rehabilitation prediction). Los resultados obtenidos demuestran la efectividad del framework a la hora de acelerar y facilitar el proceso de descubrimiento de conocimiento sobre conjuntos de datos reales. ABSTRACT Traditionally, the use of data analysis techniques has been one of the main ways of discovering knowledge hidden in large amounts of data, collected by experts in different domains. Moreover, visualization techniques have also been used to enhance and facilitate this process. However, there are serious limitations in the process of knowledge acquisition, as it is often a slow, tedious and many times fruitless process, due to the difficulty for human beings to understand large datasets. Another major drawback, rarely considered by experts that analyze large datasets, is the involuntary degradation to which they subject the data during analysis tasks, prior to obtaining the final conclusions. Degradation means that data can lose part of their original properties, and it is usually caused by improper data reduction, thereby altering their original nature and often leading to erroneous interpretations and conclusions that could have serious implications. Furthermore, this fact gains a trascendental importance when the data belong to medical or biological domain, and the lives of people depends on the final decision-making, which is sometimes conducted improperly. This is the motivation of this thesis, which proposes a new visual framework, called MedVir, which combines the power of advanced visualization techniques and data mining to try to solve these major problems existing in the process of discovery of valid information. Thus, the main objective is to facilitate and to make more understandable, intuitive and fast the process of knowledge acquisition that experts face when working with large datasets in different domains. To achieve this, first, a strong reduction in the size of the data is carried out in order to make the management of the data easier to the expert, while preserving intact, as far as possible, the original properties of the data. Then, effective visualization techniques are used to represent the obtained data, allowing the expert to interact easily and intuitively with the data, to carry out different data analysis tasks, and so visually stimulating their comprehension capacity. Therefore, the underlying objective is based on abstracting the expert, as far as possible, from the complexity of the original data to present him a more understandable version, thus facilitating and accelerating the task of knowledge discovery. MedVir has been succesfully applied to, among others, the field of magnetoencephalography (MEG), which consists in predicting the rehabilitation of Traumatic Brain Injury (TBI). The results obtained successfully demonstrate the effectiveness of the framework to accelerate and facilitate the process of knowledge discovery on real world datasets.
Resumo:
Simulation of satellite subsystems behaviour is extramely important in the design at early stages. The subsystems are normally simulated in the both ways : isolated and as part of more complex simulation that takes into account imputs from other subsystems (concurrent design). In the present work, a simple concurrent simulation of the power subsystem of a microsatellite, UPMSat-2, is described. The aim of the work is to obtain the performance profile of the system (battery charging level, power consumption by the payloads, power supply from solar panels....). Different situations such as battery critical low or high level, effects of high current charging due to the low temperature of solar panels after eclipse,DoD margins..., were analysed, and different safety strategies studied using the developed tool (simulator) to fulfil the mission requirements. Also, failure cases were analysed in order to study the robustness of the system. The mentioned simulator has been programed taking into account the power consumption performances (average and maximum consumptions per orbit/day) of small part of the subsystem (SELEX GALILEO SPVS modular generators built with Azur Space solar cells, SAFT VES16 6P4S Li-ion battery, SSBV magnetometers, TECNOBIT and DATSI/UPM On Board Data Handling -OBDH-...). The developed tool is then intended to be a modular simulator, with the chance of use any other components implementing some standard data.
Resumo:
El presente trabajo tiene como objetivo diseñar un modelo de gestión de responsabilidad social sustentado en estándares internacionales para las empresas del sector petrolero venezolano. Esta investigación no se suscribe a un modelo epistémico en particular, como forma parcializada de ver la realidad. Por el contrario, se realizó un abordaje holístico de la investigación, entendiendo el evento de estudio, la gestión de la responsabilidad social, como un evento integrado por distintas visiones de la relación empresa – sociedad. La holística se refiere a una tendencia que permite entender la realidad desde el punto de vista de las múltiples interacciones que la caracterizan. Corresponde a una actitud integradora como también a una teoría explicativa que se orienta hacia una comprensión contextual de los procesos, de los protagonistas y de los eventos. Desde la concepción holística se determinó que la investigación es de tipo proyectiva. Este tipo de investigación propone soluciones a una situación determinada a partir de un proceso de indagación. Implica describir, comparar, explicar y proponer alternativas de cambios, lo que da lugar a los estadios de investigación. En cuanto al diseño de la investigación, aplicando el ciclo holístico, se tiene un diseño que es univariable, transeccional contemporáneo y de fuente mixta. Univariable, porque se enfoca en la gestión de responsabilidad social. Transeccional contemporáneo, porque el evento se estudia en la actualidad y se realiza una sola medición de los datos. De fuente mixta, porque en los estadios descriptivo y explicativo se aplica un diseño de campo, al recolectar los datos directamente en las empresas objeto de estudio, mientras que para los estadios analítico y comparativo se aplica un diseño documental. Las técnicas de recolección de la información estuvieron constituidas por fuentes primarias provenientes de la observación directa, la revisión documental y la aplicación de un cuestionario estructurado tipo escala Likert. El análisis de los datos comprendió el análisis estadístico descriptivo, la estimación de la fiabilidad y el análisis de coeficientes de correlación y análisis de ruta, a través del software estadístico SPSS v.19.0 y AMOS v.20. En los estadios descriptivo y explicativo se estudió la gestión de la responsabilidad social en las empresas del sector petrolero. Los resultados indicaron que las empresas del sector petrolero actúan bajo los lineamientos trazados en el Plan de Desarrollo Nacional y de acuerdo con las políticas, directrices, planes y estrategias para el sector de los hidrocarburos, dictadas por el Ministerio de Energía y Petróleo. También incluyen el compromiso social y la política ambiental en su filosofía de gestión. Tienen en su estructura organizacional una gerencia de desarrollo social que gestiona la responsabilidad social. Las actividades de inversión social se presentan poco estructuradas y en ocasiones se improvisan ya que atienden a los lineamientos políticos del Estado y no a una política interna de sostenibilidad del negocio petrolero. En cuanto a la integralidad de la gestión las empresas no consideran la responsabilidad social en todas las áreas, por lo que deben ampliar su concepción de una gestión responsable, redefiniendo estructuras, estrategias y procesos, con una orientación hacia una gestión sustentable. En cuanto a los estadios analítico y comparativo aplicados al estudio de las guías y estándares internacionales de responsabilidad social, se determinó que en términos de la integralidad de la gestión las iniciativas que destacan son: en cuanto a los principios, las directrices para empresas multinacionales según la OCDE y el Libro Verde de la Unión Europea. En relación con las guías de implementación y control, el Global Reporting Initiative y la norma ISO 26000. Y en cuanto a los sistemas de gestión el Sistema de Gestión Ética y Responsable (SGE 21) y el Sistema de Gestión de Responsabilidad Social IQNET SR10. Finalmente se diseñó una estructura para la gestión integral de responsabilidad social basada en los estándares internacionales y en el concepto de desarrollo sostenible. Por tanto abarca el desarrollo social, el equilibrio ecológico y el crecimiento económico, lo que permite un desarrollo sinérgico. La originalidad del enfoque consistió en la comprensión de la investigación desde una concepción holística, que permitió la integración de las teorías que tratan el tema de la responsabilidad social a través de un abordaje estructurado. ABSTRACT The present research aims to design a model of social responsibility management underpinned by international standards for companies in the Venezuelan oil sector. This research is not framed in a particular epistemic model as a biased way of looking at reality. Instead, a holistic approach to the research was conducted, understanding the event under study, the management of social responsibility as an event composed of different views of the relationship between corporation and society. The term holistic refers to a trend in understanding the reality from the point of view of the multiple interactions that characterize it. It corresponds to an integrative as well as an explanatory theory that is oriented towards a contextual understanding of the processes, of the participants and of the events. From the holistic conception it was determined that this research is of a projective type. The research proposes solutions to a given situation from a process of inquiry. It implies describing, comparing, explaining and proposing alternative changes, which results in the different research stages. Regarding the research design, applying the holistic cycle, an univariate, contemporary cross-sectional and mixed source design is obtained. It is univariate, because it focuses on the management of social responsibility. It is contemporary cross-sectional, because the event is studied in the present time and a single measurement of data is performed. It relies on mixed source, because in the descriptive and explanatory stages a field design is applied when collecting data directly from the companies under study, while for the analytical and comparative stages applies a documentary design is applied. The data collection techniques were constituted by primary sources from direct observation, document review and the implementation of a structured Likert scale questionnaire. The data analysis comprised descriptive statistical analysis, reliability estimates and analysis of correlation and the path analysis through the SPSS v.19.0 and AMOS V.20 statistical software. In the descriptive and explanatory stages social responsibility management in the oil sector companies was studied. The results indicated that the oil companies operate under the guidelines outlined in the National Development Plan and in accordance with the policies, guidelines, plans and strategies for the hydrocarbons sector, issued by the Ministry of Energy and Petroleum. They also include the social commitment and the environmental policy in their management philosophy. They have in their organizational structure a social development management which deals with social responsibility. Corporate social investment is presented poorly structured and is sometimes improvised since they follow the policy guidelines of the state and not the internal sustainability policy of the oil business. As for the integrity of management companies they do not consider social responsibility in all areas, so they need to expand their conception of responsible management, redefining structures, strategies and processes, with a focus on sustainable management. As for the analytical and comparative stages applied to the study of international guidelines and standards of social responsibility, it was determined that, in terms of the comprehensiveness of management, the initiatives that stand out are the following: With respect to principles, the guidelines for multinational enterprises as indicated by OECD and the Green Paper of the European Union. Regarding the implementation and control guides, the Global Reporting Initiative and the ISO 26000 standard are relevant. And as for management systems the Ethics and Responsible Management System (SGE 21) and the IQNet SR10 Social responsibility management system have to be considered. Finally a framework for the comprehensive management of social responsibility based on international standards and the concept of sustainable development was designed. Hence, social development, ecological balance and economic growth are included allowing therefore a synergistic development. The originality of this approach is the understanding of research in a holistic way, which allows the integration of theories that address the issue of social responsibility through a structured approximation.
Resumo:
Durante los últimos años, el imparable crecimiento de fuentes de datos biomédicas, propiciado por el desarrollo de técnicas de generación de datos masivos (principalmente en el campo de la genómica) y la expansión de tecnologías para la comunicación y compartición de información ha propiciado que la investigación biomédica haya pasado a basarse de forma casi exclusiva en el análisis distribuido de información y en la búsqueda de relaciones entre diferentes fuentes de datos. Esto resulta una tarea compleja debido a la heterogeneidad entre las fuentes de datos empleadas (ya sea por el uso de diferentes formatos, tecnologías, o modelizaciones de dominios). Existen trabajos que tienen como objetivo la homogeneización de estas con el fin de conseguir que la información se muestre de forma integrada, como si fuera una única base de datos. Sin embargo no existe ningún trabajo que automatice de forma completa este proceso de integración semántica. Existen dos enfoques principales para dar solución al problema de integración de fuentes heterogéneas de datos: Centralizado y Distribuido. Ambos enfoques requieren de una traducción de datos de un modelo a otro. Para realizar esta tarea se emplean formalizaciones de las relaciones semánticas entre los modelos subyacentes y el modelo central. Estas formalizaciones se denominan comúnmente anotaciones. Las anotaciones de bases de datos, en el contexto de la integración semántica de la información, consisten en definir relaciones entre términos de igual significado, para posibilitar la traducción automática de la información. Dependiendo del problema en el que se esté trabajando, estas relaciones serán entre conceptos individuales o entre conjuntos enteros de conceptos (vistas). El trabajo aquí expuesto se centra en estas últimas. El proyecto europeo p-medicine (FP7-ICT-2009-270089) se basa en el enfoque centralizado y hace uso de anotaciones basadas en vistas y cuyas bases de datos están modeladas en RDF. Los datos extraídos de las diferentes fuentes son traducidos e integrados en un Data Warehouse. Dentro de la plataforma de p-medicine, el Grupo de Informática Biomédica (GIB) de la Universidad Politécnica de Madrid, en el cuál realicé mi trabajo, proporciona una herramienta para la generación de las necesarias anotaciones de las bases de datos RDF. Esta herramienta, denominada Ontology Annotator ofrece la posibilidad de generar de manera manual anotaciones basadas en vistas. Sin embargo, aunque esta herramienta muestra las fuentes de datos a anotar de manera gráfica, la gran mayoría de usuarios encuentran difícil el manejo de la herramienta , y pierden demasiado tiempo en el proceso de anotación. Es por ello que surge la necesidad de desarrollar una herramienta más avanzada, que sea capaz de asistir al usuario en el proceso de anotar bases de datos en p-medicine. El objetivo es automatizar los procesos más complejos de la anotación y presentar de forma natural y entendible la información relativa a las anotaciones de bases de datos RDF. Esta herramienta ha sido denominada Ontology Annotator Assistant, y el trabajo aquí expuesto describe el proceso de diseño y desarrollo, así como algunos algoritmos innovadores que han sido creados por el autor del trabajo para su correcto funcionamiento. Esta herramienta ofrece funcionalidades no existentes previamente en ninguna otra herramienta del área de la anotación automática e integración semántica de bases de datos. ---ABSTRACT---Over the last years, the unstoppable growth of biomedical data sources, mainly thanks to the development of massive data generation techniques (specially in the genomics field) and the rise of the communication and information sharing technologies, lead to the fact that biomedical research has come to rely almost exclusively on the analysis of distributed information and in finding relationships between different data sources. This is a complex task due to the heterogeneity of the sources used (either by the use of different formats, technologies or domain modeling). There are some research proyects that aim homogenization of these sources in order to retrieve information in an integrated way, as if it were a single database. However there is still now work to automate completely this process of semantic integration. There are two main approaches with the purpouse of integrating heterogeneous data sources: Centralized and Distributed. Both approches involve making translation from one model to another. To perform this task there is a need of using formalization of the semantic relationships between the underlying models and the main model. These formalizations are also calles annotations. In the context of semantic integration of the information, data base annotations consist on defining relations between concepts or words with the same meaning, so the automatic translation can be performed. Depending on the task, the ralationships can be between individuals or between whole sets of concepts (views). This paper focuses on the latter. The European project p-medicine (FP7-ICT-2009-270089) is based on the centralized approach. It uses view based annotations and RDF modeled databases. The data retireved from different data sources is translated and joined into a Data Warehouse. Within the p-medicine platform, the Biomedical Informatics Group (GIB) of the Polytechnic University of Madrid, in which I worked, provides a software to create annotations for the RDF sources. This tool, called Ontology Annotator, is used to create annotations manually. However, although Ontology Annotator displays the data sources graphically, most of the users find it difficult to use this software, thus they spend too much time to complete the task. For this reason there is a need to develop a more advanced tool, which would be able to help the user in the task of annotating p-medicine databases. The aim is automating the most complex processes of the annotation and display the information clearly and easy understanding. This software is called Ontology Annotater Assistant and this book describes the process of design and development of it. as well as some innovative algorithms that were designed by the author of the work. This tool provides features that no other software in the field of automatic annotation can provide.
Resumo:
El abandono universitario es un problema que siempre ha afectado a las distintas universidades públicas. La Universidad Politécnica de Madrid, en concreto la Escuela Técnica Superior de Ingenieros Informáticos, pretende reducir dicho abandono aplicando técnicas de Data Mining para detectar aquellos alumnos que abandonan antes de que lo hagan. Este proyecto, a través de la metodología CRISP-DM (CRoss Industry Standard Process for Data Mining) y con los datos obtenidos del SIIU (Sistema Integrado de Información Universitaria), tiene como fin identificar a los alumnos que abandonan el primer añoo para poder aplicar distintas políticas en ellos y así evitarlo. ---ABSTRACT---Early university abandonment is a problem which has affected different public universities. The Universidad Polit´ecnica de Madrid (Polytechnic University of Madrid), inparticular the Escuela Técnica Superior de Ingenieros Informáticos (Computer Science Engineering School), wants to reduce this abandonment rate using Data Mining techniques to find students who are likely to leave before they do. The purpose of this project, through CRISP-DM (CRoss Industry Standard Process for Data Mining) methodology and using the data from SIIU (Sistema Integrado de Informaci´on Universitaria), is to identify those students who leave the first year in order to apply different policies and avoid it.