807 resultados para frequency based knowledge discovery
Resumo:
In this position paper, we claim that the need for time consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus building activities required for ontology building can be reduced through exploiting the interplay of data mining and ontology engineering. The aim is to obtain in a semi-automatic way new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative ?feedbackloop?, of (a) data mining techniques to make emerge implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.
Resumo:
Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.
Estudio de patrones de interacción entre los estudiantes y la Plataforma de Tele-Enseñanza en la UPM
Resumo:
Vivimos en una sociedad en la que la información ha adquirido una vital importancia. El uso de Internet y el desarrollo de nuevos sistemas de la información han generado un ferviente interés tanto de empresas como de instituciones en la búsqueda de nuevos patrones que les proporcione la clave del éxito. La Analítica de Negocio reúne un conjunto de herramientas, estrategias y técnicas orientadas a la explotación de la información con el objetivo de crear conocimiento útil dentro de un marco de trabajo y facilitar la optimización de los recursos tanto de empresas como de instituciones. El presente proyecto se enmarca en lo que se conoce como Gestión Educativa. Se aplicará una arquitectura y modelo de trabajo similar a lo que se ha venido haciendo en los últimos años en el entorno empresarial con la Inteligencia de Negocio. Con esta variante, se pretende mejorar la calidad de la enseñanza, agilizar las decisiones dentro de la institución académica, fortalecer las capacidades del cuerpo docente y en definitiva favorecer el aprendizaje del alumnado. Para lograr el objetivo se ha decidido seguir las etapas del Knowledge Discovery in Databases (KDD), una de las metodologías más conocidas dentro de la Inteligencia de Negocio, que describe el procedimiento que va desde la selección de la información y su carga en sistemas de almacenamiento, hasta la aplicación de técnicas de minería de datos para la obtención nuevo conocimiento. Los estudios se realizan a partir de la información de la activad de los usuarios dentro la plataforma de Tele-Enseñanza de la Universidad Politécnica de Madrid (Moodle). Se desarrollan trabajos de extracción y preprocesado de la base de datos en crudo y se aplican técnicas de minería de datos. En la aplicación de técnicas de minería de datos, uno de los factores más importantes a tener en cuenta es el tipo de información que se va a tratar. Por este motivo, se trabaja con la Minería de Datos Educativa, en inglés, Educational Data Mining (EDM) que consiste en la aplicación de técnicas de minería optimizadas para la información que se genera en entornos educativos. Dentro de las posibilidades que ofrece el EDM, se ha decidido centrar los estudios en lo que se conoce como analítica predictiva. El objetivo fundamental es conocer la influencia que tienen las interacciones alumno-plataforma en las calificaciones finales y descubrir nuevas reglas que describan comportamientos que faciliten al profesorado discriminar si un estudiante va a aprobar o suspender la asignatura, de tal forma que se puedan tomar medidas que mejoren su rendimiento. Toda la información tratada en el presente proyecto ha sido previamente anonimizada para evitar cualquier tipo de intromisión que atente contra la privacidad de los elementos participantes en el estudio. ABSTRACT. We live in a society dominated by data. The use of the Internet accompanied by developments in information systems has generated a sustained interest among companies and institutions to discover new patterns to succeed in their business ventures. Business Analytics (BA) combines tools, strategies and techniques focused on exploiting the available information, to optimize resources and create useful insight. The current project is framed under Educational Management. A Business Intelligence (BI) architecture and business models taught up to date will be applied with the aim to accelerate the decision-making in academic institutions, strengthen teacher´s skills and ultimately improve the quality of teaching and learning. The best way to achieve this is to follow the Knowledge Discovery in Databases (KDD), one of the best-known methodologies in B.I. This process describes data preparation, selection, and cleansing through to the application of purely Data Mining Techniques in order to incorporate prior knowledge on data sets and interpret accurate solutions from the observed results. The studies will be performed using the information extracted from the Universidad Politécnica de Madrid Learning Management System (LMS), Moodle. The stored data is based on the user-platform interaction. The raw data will be extracted and pre-processed and afterwards, Data Mining Techniques will be applied. One of the crucial factors in the application of Data Mining Techniques is the kind of information that will be processed. For this reason, a new Data Mining perspective will be taken, called Educational Data Mining (EDM). EDM consists of the application of Data Mining Techniques but optimized for the raw data generated by the educational environment. Within EDM, we have decided to drive our research on what is called Predictive Analysis. The main purpose is to understand the influence of the user-platform interactions in the final grades of students and discover new patterns that explain their behaviours. This could allow teachers to intervene ahead of a student passing or failing, in such a way an action could be taken to improve the student performance. All the information processed has been previously anonymized to avoid the invasion of privacy.
Resumo:
Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.
Resumo:
As a knowable object, the human body is highly complex. Evidence from several converging lines of research, including psychological studies, neuroimaging and clinical neuropsychology, indicates that human body knowledge is widely distributed in the adult brain, and is instantiated in at least three partially independent levels of representation. Sensori-motor body knowledge is responsible for on-line control and movement of one's own body and may also contribute to the perception of others' moving bodies; visuo-spatial body knowledge specifies detailed structural descriptions of the spatial attributes of the human body; and lexical-semantic body knowledge contains language-based knowledge about the human body. In the first chapter of this Monograph, we outline the evidence for these three hypothesized levels of human body knowledge, then review relevant literature on infants' and young children's human body knowledge in terms of the three-level framework. In Chapters II and III, we report two complimentary series of studies that specifically investigate the emergence of visuospatial body knowledge in infancy. Our technique is to compare infants' responses to typical and scrambled human bodies, in order to evaluate when and how infants acquire knowledge about the canonical spatial layout of the human body. Data from a series of visual habituation studies indicate that infants first discriminate scrambled from typical human body pictures at 15 to 18 months of age. Data from object examination studies similarly indicate that infants are sensitive to violations of three-dimensional human body stimuli starting at 15-18 months of age. The overall pattern of data supports several conclusions about the early development of human body knowledge: (a) detailed visuo-spatial knowledge about the human body is first evident in the second year of life, (b) visuo-spatial knowledge of human faces and human bodies are at least partially independent in infancy and (c) infants' initial visuo-spatial human body representations appear to be highly schematic, becoming more detailed and specific with development. In the final chapter, we explore these conclusions and discuss how levels of body knowledge may interact in early development.
Resumo:
Little is known about the extent of allelic diversity of genes in the complex polyploid, sugarcane. Using sucrose phosphate synthase (SPS) Gene (SPS) Family III as an example, we have amplified and sequenced a 400 nt region from this gene from two sugarcane lines that are parents of a mapping population. Ten single nucleotide polymorphisms (SNPs) were identified within the 400 nt region of which seven were present in both lines. In the elite commercial cultivar Q165(A), 10 sequence haplotypes were identified, with four haplotypes recovered at 9% or greater frequency. Based on SNP presence, two clusters of haplotypes were observed. In IJ76-514, a Saccharum officinarum accession, 8 haplotypes were identified with 4 haplotypes recovered at 13% or greater frequency. Again, two clusters of haplotypes were observed. The results suggest that there may be two SPS Gene Family III genes per genome in sugarcane, each with different numbers of different alleles. This suggestion is supported by sequencing results in an elite parental sorghum line, 403463-2-1, in which 4 haplotypes, corresponding to two broad types, were also identified. Primers were designed to the sugarcane SNPs and screened over bulked DNA from high and low Sucrose-containing progeny from a cross between Q165(A) and IJ76-514. The SNP frequency did not vary in the two bulked DNA samples, suggesting that these SNPs from this SPS gene family are not associated with variation in sucrose content. Using an ecotilling approach, two of the SPS Gene Family III haplotypes were mapped to two different linkage groups in homology group 1 in Q165(A). Both haplotypes mapped near QTLs for increased sucrose content but were not themselves associated with any sugar-related trait.
Resumo:
The concept of ordered weighted averaging (OWA) operator weights arises in uncertain decision making problems, however some weights may have a specific relationship with other. This information about the weights can be obtained from decision makers (DMs). This paper intends to introduce a theory of weight restrictions into the existing OWA operator weight models. Based on the DMs' value judgment the obtained OWA operator weights could be more realistic.
Resumo:
In a certain automobile factory, batch-painting of the body types in colours is controlled by an allocation system. This tries to balance production with orders, whilst making optimally-sized batches of colours. Sequences of cars entering painting cannot be optimised for easy selection of colour and batch size. `Over-production' is not allowed, in order to reduce buffer stocks of unsold vehicles. Paint quality is degraded by random effects. This thesis describes a toolkit which supports IKBS in an object-centred formalism. The intended domain of use for the toolkit is flexible manufacturing. A sizeable application program was developed, using the toolkit, to test the validity of the IKBS approach in solving the real manufacturing problem above, for which an existing conventional program was already being used. A detailed statistical analysis of the operating circumstances of the program was made to evaluate the likely need for the more flexible type of program for which the toolkit was intended. The IKBS program captures the many disparate and conflicting constraints in the scheduling knowledge and emulates the behaviour of the program installed in the factory. In the factory system, many possible, newly-discovered, heuristics would be awkward to represent and it would be impossible to make many new extensions. The representation scheme is capable of admitting changes to the knowledge, relying on the inherent encapsulating properties of object-centres programming to protect and isolate data. The object-centred scheme is supported by an enhancement of the `C' programming language and runs under BSD 4.2 UNIX. The structuring technique, using objects, provides a mechanism for separating control of expression of rule-based knowledge from the knowledge itself and allowing explicit `contexts', within which appropriate expression of knowledge can be done. Facilities are provided for acquisition of knowledge in a consistent manner.
Resumo:
Procedural knowledge is the knowledge required to perform certain tasks. It forms an important part of expertise, and is crucial for learning new tasks. This paper summarises existing work on procedural knowledge acquisition, and identifies two major challenges that remain to be solved in this field; namely, automating the acquisition process to tackle bottleneck in the formalization of procedural knowledge, and enabling machine understanding and manipulation of procedural knowledge. It is believed that recent advances in information extraction techniques can be applied compose a comprehensive solution to address these challenges. We identify specific tasks required to achieve the goal, and present detailed analyses of new research challenges and opportunities. It is expected that these analyses will interest researchers of various knowledge management tasks, particularly knowledge acquisition and capture.
Resumo:
In this paper, we propose a new similarity measure to compute the pair-wise similarity of text-based documents based on patterns of the words in the documents. First we develop a kappa measure for pair-wise comparison of documents then we use ordered weighting averaging operator to define a document similarity measure for a set of documents.
Resumo:
In this demonstration, we will present a semantic environment called the K-Box. The K-Box supports the lightweight integration of knowledge tools, with a focus on semantic tools, but with the flexibility to integrate natural language and conventional tools. We discuss the implementation of the framework, and two existing applications, including details of a new application for developers of semantic workflows. The demonstration will be of interest to developers and researchers of ontology-based knowledge management systems, and semantic desktops, and to analysts working with cross-media information. © 2011 ACM.
Resumo:
Information extraction or knowledge discovery from large data sets should be linked to data aggregation process. Data aggregation process can result in a new data representation with decreased number of objects of a given set. A deterministic approach to separable data aggregation means a lesser number of objects without mixing of objects from different categories. A statistical approach is less restrictive and allows for almost separable data aggregation with a low level of mixing of objects from different categories. Layers of formal neurons can be designed for the purpose of data aggregation both in the case of deterministic and statistical approach. The proposed designing method is based on minimization of the of the convex and piecewise linear (CPL) criterion functions.
Resumo:
The paper introduces a method for dependencies discovery during human-machine interaction. It is based on an analysis of numerical data sets in knowledge-poor environments. The driven procedures are independent and they interact on a competitive principle. The research focuses on seven of them. The application is in Number Theory.
Resumo:
We propose a family of attributed graph kernels based on mutual information measures, i.e., the Jensen-Tsallis (JT) q-differences (for q ∈ [1,2]) between probability distributions over the graphs. To this end, we first assign a probability to each vertex of the graph through a continuous-time quantum walk (CTQW). We then adopt the tree-index approach [1] to strengthen the original vertex labels, and we show how the CTQW can induce a probability distribution over these strengthened labels. We show that our JT kernel (for q = 1) overcomes the shortcoming of discarding non-isomorphic substructures arising in the R-convolution kernels. Moreover, we prove that the proposed JT kernels generalize the Jensen-Shannon graph kernel [2] (for q = 1) and the classical subtree kernel [3] (for q = 2), respectively. Experimental evaluations demonstrate the effectiveness and efficiency of the JT kernels.
Resumo:
School of thought analysis is an important yet not-well-elaborated scientific knowledge discovery task. This paper makes the first attempt at this problem. We focus on one aspect of the problem: do characteristic school-of-thought words exist and whether they are characterizable? To answer these questions, we propose a probabilistic generative School-Of-Thought (SOT) model to simulate the scientific authoring process based on several assumptions. SOT defines a school of thought as a distribution of topics and assumes that authors determine the school of thought for each sentence before choosing words to deliver scientific ideas. SOT distinguishes between two types of school-of-thought words for either the general background of a school of thought or the original ideas each paper contributes to its school of thought. Narrative and quantitative experiments show positive and promising results to the questions raised above © 2013 Association for Computational Linguistics. © 2013 Association for Computational Linguistics.