789 resultados para Multimedia Data Mining
Resumo:
In recent years, learning analytics (LA) has attracted a great deal of attention in technology-enhanced learning (TEL) research as practitioners, institutions, and researchers are increasingly seeing the potential that LA has to shape the future TEL landscape. Generally, LA deals with the development of methods that harness educational data sets to support the learning process. This paper provides a foundation for future research in LA. It provides a systematic overview on this emerging field and its key concepts through a reference model for LA based on four dimensions, namely data, environments, context (what?), stakeholders (who?), objectives (why?), and methods (how?). It further identifies various challenges and research opportunities in the area of LA in relation to each dimension.
Resumo:
Well-known data mining algorithms rely on inputs in the form of pairwise similarities between objects. For large datasets it is computationally impossible to perform all pairwise comparisons. We therefore propose a novel approach that uses approximate Principal Component Analysis to efficiently identify groups of similar objects. The effectiveness of the approach is demonstrated in the context of binary classification using the supervised normalized cut as a classifier. For large datasets from the UCI repository, the approach significantly improves run times with minimal loss in accuracy.
Resumo:
Biodiversity, a multidimensional property of natural systems, is difficult to quantify partly because of the multitude of indices proposed for this purpose. Indices aim to describe general properties of communities that allow us to compare different regions, taxa, and trophic levels. Therefore, they are of fundamental importance for environmental monitoring and conservation, although there is no consensus about which indices are more appropriate and informative. We tested several common diversity indices in a range of simple to complex statistical analyses in order to determine whether some were better suited for certain analyses than others. We used data collected around the focal plant Plantago lanceolata on 60 temperate grassland plots embedded in an agricultural landscape to explore relationships between the common diversity indices of species richness (S), Shannon's diversity (H'), Simpson's diversity (D-1), Simpson's dominance (D-2), Simpson's evenness (E), and Berger-Parker dominance (BP). We calculated each of these indices for herbaceous plants, arbuscular mycorrhizal fungi, aboveground arthropods, belowground insect larvae, and P.lanceolata molecular and chemical diversity. Including these trait-based measures of diversity allowed us to test whether or not they behaved similarly to the better studied species diversity. We used path analysis to determine whether compound indices detected more relationships between diversities of different organisms and traits than more basic indices. In the path models, more paths were significant when using H', even though all models except that with E were equally reliable. This demonstrates that while common diversity indices may appear interchangeable in simple analyses, when considering complex interactions, the choice of index can profoundly alter the interpretation of results. Data mining in order to identify the index producing the most significant results should be avoided, but simultaneously considering analyses using multiple indices can provide greater insight into the interactions in a system.
Resumo:
NH···π hydrogen bonds occur frequently between the amino acid side groups in proteins and peptides. Data-mining studies of protein crystals find that ~80% of the T-shaped histidine···aromatic contacts are CH···π, and only ~20% are NH···π interactions. We investigated the infrared (IR) and ultraviolet (UV) spectra of the supersonic-jet-cooled imidazole·benzene (Im·Bz) complex as a model for the NH···π interaction between histidine and phenylalanine. Ground- and excited-state dispersion-corrected density functional calculations and correlated methods (SCS-MP2 and SCS-CC2) predict that Im·Bz has a Cs-symmetric T-shaped minimum-energy structure with an NH···π hydrogen bond to the Bz ring; the NH bond is tilted 12° away from the Bz C₆ axis. IR depletion spectra support the T-shaped geometry: The NH stretch vibrational fundamental is red shifted by −73 cm⁻¹ relative to that of bare imidazole at 3518 cm⁻¹, indicating a moderately strong NH···π interaction. While the Sₒ(A1g) → S₁(B₂u) origin of benzene at 38 086 cm⁻¹ is forbidden in the gas phase, Im·Bz exhibits a moderately intense Sₒ → S₁ origin, which appears via the D₆h → Cs symmetry lowering of Bz by its interaction with imidazole. The NH···π ground-state hydrogen bond is strong, De=22.7 kJ/mol (1899 cm⁻¹). The combination of gas-phase UV and IR spectra confirms the theoretical predictions that the optimum Im·Bz geometry is T shaped and NH···π hydrogen bonded. We find no experimental evidence for a CH···π hydrogen-bonded ground-state isomer of Im·Bz. The optimum NH···π geometry of the Im·Bz complex is very different from the majority of the histidine·aromatic contact geometries found in protein database analyses, implying that the CH···π contacts observed in these searches do not arise from favorable binding interactions but merely from protein side-chain folding and crystal-packing constraints. The UV and IR spectra of the imidazole·(benzene)₂ cluster are observed via fragmentation into the Im·Bz+ mass channel. The spectra of Im·Bz and Im·Bz₂ are cleanly separable by IR hole burning. The UV spectrum of Im·Bz₂ exhibits two 000 bands corresponding to the Sₒ → S₁ excitations of the two inequivalent benzenes, which are symmetrically shifted by −86/+88 cm⁻¹ relative to the 000 band of benzene.
Resumo:
Intensive family preservation services (IFPS), designed to stabilize at-risk families and avert out-of-home care, have been the focus of many randomized, experimental studies. Employing a retrospective “clinical data-mining” (CDM) methodology (Epstein, 2001), this study makes use of available information extracted from client records in one IFPS agency over the course of two years. The primary goal of this descriptive and associational study was to gain a clearer understanding of IFPS service delivery and effectiveness. Interventions provided to families are delineated and assessed for their impact on improved family functioning, their impact on the reduction of family violence, as well as placement prevention. Findings confirm the use of a wide range of services consistent with IFPS program theory. Because the study employs a quasi-experimental, retrospective use of available information, clinical outcomes described cannot be causally attributed to interventions employed as with randomized controlled trials. With regard to service outcomes, findings suggest that family education, empowerment services and advocacy are most influential in placement prevention and in ameliorating unmanageable behaviors in children as well as the incidence of family violence.
Resumo:
Intensive family preservation services (IFPS), designed to stabilize at-risk families and avert out-of-home care, have been the focus of many randomized, experimental studies. The emphasis on "gold-standard" evaluation of IFPS has resulted in fewer "black box" studies that describe actual IFPS service patterns and the fidelity with which they adhere to IFPS program theory. Intervention research is important to the advancement of programs designed to protect the safety of children, improve family functioning, as well as prevent out-of-home placement. Employing a retrospective “clinical data-mining” (CDM) methodology, this exploratory study of Families First, an IFPS program, makes use of available information extracted from client records to describe interventions and service patterns provided over a two year period. This study uncovers actual IFPS service patterns, demonstrates IFPS program fidelity, as well as reveals the usefulness of CDM as a social work research methodology. These findings are particularly valuable for program planning and treatment, policy development and evidence-based practice research.
Resumo:
The recent development of in-situ monitoring devices, such as UV-spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine from these data nitrate behavior patterns. As a result, we observe a seasonality of nitrate diurnal cycles, that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer and the maximum and minimum shift to a later time in late summer/autumn. This is observed both for water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration on nitrate stream concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations.
Resumo:
This poster raises the issue of a research work oriented to the storage, retrieval, representation and analysis of dynamic GI, taking into account The ultimate objective is the modelling and representation of the dynamic nature of geographic features, establishing mechanisms to store geometries enriched with a temporal structure (regardless of space) and a set of semantic descriptors detailing and clarifying the nature of the represented features and their temporality. the semantic, the temporal and the spatiotemporal components. We intend to define a set of methods, rules and restrictions for the adequate integration of these components into the primary elements of the GI: theme, location, time [1]. We intend to establish and incorporate three new structures (layers) into the core of data storage by using mark-up languages: a semantictemporal structure, a geosemantic structure, and an incremental spatiotemporal structure. Thus, data would be provided with the capability of pinpointing and expressing their own basic and temporal characteristics, enabling them to interact each other according to their context, and their time and meaning relationships that could be eventually established
Resumo:
Multi-user videoconferencing systems offer communication between more than two users, who are able to interact through their webcams, microphones and other components. The use of these systems has been increased recently due to, on the one hand, improvements in Internet access, networks of companies, universities and houses, whose available bandwidth has been increased whilst the delay in sending and receiving packets has decreased. On the other hand, the advent of Rich Internet Applications (RIA) means that a large part of web application logic and control has started to be implemented on the web browsers. This has allowed developers to create web applications with a level of complexity comparable to traditional desktop applications, running on top of the Operating Systems. More recently the use of Cloud Computing systems has improved application scalability and involves a reduction in the price of backend systems. This offers the possibility of implementing web services on the Internet with no need to spend a lot of money when deploying infrastructures and resources, both hardware and software. Nevertheless there are not many initiatives that aim to implement videoconferencing systems taking advantage of Cloud systems. This dissertation proposes a set of techniques, interfaces and algorithms for the implementation of videoconferencing systems in public and private Cloud Computing infrastructures. The mechanisms proposed here are based on the implementation of a basic videoconferencing system that runs on the web browser without any previous installation requirements. To this end, the development of this thesis starts from a RIA application with current technologies that allow users to access their webcams and microphones from the browser, and to send captured data through their Internet connections. Furthermore interfaces have been implemented to allow end users to participate in videoconferencing rooms that are managed in different Cloud provider servers. To do so this dissertation starts from the results obtained from the previous techniques and backend resources were implemented in the Cloud. A traditional videoconferencing service which was implemented in the department was modified to meet typical Cloud Computing infrastructure requirements. This allowed us to validate whether Cloud Computing public infrastructures are suitable for the traffic generated by this kind of system. This analysis focused on the network level and processing capacity and stability of the Cloud Computing systems. In order to improve this validation several other general considerations were taken in order to cover more cases, such as multimedia data processing in the Cloud, as research activity has increased in this area in recent years. The last stage of this dissertation is the design of a new methodology to implement these kinds of applications in hybrid clouds reducing the cost of videoconferencing systems. Finally, this dissertation opens up a discussion about the conclusions obtained throughout this study, resulting in useful information from the different stages of the implementation of videoconferencing systems in Cloud Computing systems. RESUMEN Los sistemas de videoconferencia multiusuario permiten la comunicación entre más de dos usuarios que pueden interactuar a través de cámaras de video, micrófonos y otros elementos. En los últimos años el uso de estos sistemas se ha visto incrementado gracias, por un lado, a la mejora de las redes de acceso en las conexiones a Internet en empresas, universidades y viviendas, que han visto un aumento del ancho de banda disponible en dichas conexiones y una disminución en el retardo experimentado por los datos enviados y recibidos. Por otro lado también ayudó la aparación de las Aplicaciones Ricas de Internet (RIA) con las que gran parte de la lógica y del control de las aplicaciones web comenzó a ejecutarse en los mismos navegadores. Esto permitió a los desarrolladores la creación de aplicaciones web cuya complejidad podía compararse con la de las tradicionales aplicaciones de escritorio, ejecutadas directamente por los sistemas operativos. Más recientemente el uso de sistemas de Cloud Computing ha mejorado la escalabilidad y el abaratamiento de los costes para sistemas de backend, ofreciendo la posibilidad de implementar servicios Web en Internet sin la necesidad de grandes desembolsos iniciales en las áreas de infraestructuras y recursos tanto hardware como software. Sin embargo no existen aún muchas iniciativas con el objetivo de realizar sistemas de videoconferencia que aprovechen las ventajas del Cloud. Esta tesis doctoral propone un conjunto de técnicas, interfaces y algoritmos para la implentación de sistemas de videoconferencia en infraestructuras tanto públicas como privadas de Cloud Computing. Las técnicas propuestas en la tesis se basan en la realización de un servicio básico de videoconferencia que se ejecuta directamente en el navegador sin la necesidad de instalar ningún tipo de aplicación de escritorio. Para ello el desarrollo de esta tesis parte de una aplicación RIA con tecnologías que hoy en día permiten acceder a la cámara y al micrófono directamente desde el navegador, y enviar los datos que capturan a través de la conexión de Internet. Además se han implementado interfaces que permiten a usuarios finales la participación en salas de videoconferencia que se ejecutan en servidores de proveedores de Cloud. Para ello se partió de los resultados obtenidos en las técnicas anteriores de ejecución de aplicaciones en el navegador y se implementaron los recursos de backend en la nube. Además se modificó un servicio ya existente implementado en el departamento para adaptarlo a los requisitos típicos de las infraestructuras de Cloud Computing. Alcanzado este punto se procedió a analizar si las infraestructuras propias de los proveedores públicos de Cloud Computing podrían soportar el tráfico generado por los sistemas que se habían adaptado. Este análisis se centró tanto a nivel de red como a nivel de capacidad de procesamiento y estabilidad de los sistemas. Para los pasos de análisis y validación de los sistemas Cloud se tomaron consideraciones más generales para abarcar casos como el procesamiento de datos multimedia en la nube, campo en el que comienza a haber bastante investigación en los últimos años. Como último paso se ideó una metodología de implementación de este tipo de aplicaciones para que fuera posible abaratar los costes de los sistemas de videoconferencia haciendo uso de clouds híbridos. Finalmente en la tesis se abre una discusión sobre las conclusiones obtenidas a lo largo de este amplio estudio, obteniendo resultados útiles en las distintas etapas de implementación de los sistemas de videoconferencia en la nube.
Resumo:
By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks.
Resumo:
The goal of the W3C's Media Annotation Working Group (MAWG) is to promote interoperability between multimedia metadata formats on the Web. As experienced by everybody, audiovisual data is omnipresent on today's Web. However, different interaction interfaces and especially diverse metadata formats prevent unified search, access, and navigation. MAWG has addressed this issue by developing an interlingua ontology and an associated API. This article discusses the rationale and core concepts of the ontology and API for media resources. The specifications developed by MAWG enable interoperable contextualized and semantic annotation and search, independent of the source metadata format, and connecting multimedia data to the Linked Data cloud. Some demonstrators of such applications are also presented in this article.
Resumo:
Data mining, and in particular decision trees have been used in different fields: engineering, medicine, banking and finance, etc., to analyze a target variable through decision variables. The following article examines the use of the decision trees algorithm as a tool in territorial logistic planning. The decision tree built has estimated population density indexes for territorial units with similar logistics characteristics in a concise and practical way.
Resumo:
The origins for this work arise in response to the increasing need for biologists and doctors to obtain tools for visual analysis of data. When dealing with multidimensional data, such as medical data, the traditional data mining techniques can be a tedious and complex task, even to some medical experts. Therefore, it is necessary to develop useful visualization techniques that can complement the expert’s criterion, and at the same time visually stimulate and make easier the process of obtaining knowledge from a dataset. Thus, the process of interpretation and understanding of the data can be greatly enriched. Multidimensionality is inherent to any medical data, requiring a time-consuming effort to get a clinical useful outcome. Unfortunately, both clinicians and biologists are not trained in managing more than four dimensions. Specifically, we were aimed to design a 3D visual interface for gene profile analysis easy in order to be used both by medical and biologist experts. In this way, a new analysis method is proposed: MedVir. This is a simple and intuitive analysis mechanism based on the visualization of any multidimensional medical data in a three dimensional space that allows interaction with experts in order to collaborate and enrich this representation. In other words, MedVir makes a powerful reduction in data dimensionality in order to represent the original information into a three dimensional environment. The experts can interact with the data and draw conclusions in a visual and quickly way.
Resumo:
Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.
Resumo:
Los avances en el hardware permiten disponer de grandes volúmenes de datos, surgiendo aplicaciones que deben suministrar información en tiempo cuasi-real, la monitorización de pacientes, ej., el seguimiento sanitario de las conducciones de agua, etc. Las necesidades de estas aplicaciones hacen emerger el modelo de flujo de datos (data streaming) frente al modelo almacenar-para-despuésprocesar (store-then-process). Mientras que en el modelo store-then-process, los datos son almacenados para ser posteriormente consultados; en los sistemas de streaming, los datos son procesados a su llegada al sistema, produciendo respuestas continuas sin llegar a almacenarse. Esta nueva visión impone desafíos para el procesamiento de datos al vuelo: 1) las respuestas deben producirse de manera continua cada vez que nuevos datos llegan al sistema; 2) los datos son accedidos solo una vez y, generalmente, no son almacenados en su totalidad; y 3) el tiempo de procesamiento por dato para producir una respuesta debe ser bajo. Aunque existen dos modelos para el cómputo de respuestas continuas, el modelo evolutivo y el de ventana deslizante; éste segundo se ajusta mejor en ciertas aplicaciones al considerar únicamente los datos recibidos más recientemente, en lugar de todo el histórico de datos. En los últimos años, la minería de datos en streaming se ha centrado en el modelo evolutivo. Mientras que, en el modelo de ventana deslizante, el trabajo presentado es más reducido ya que estos algoritmos no sólo deben de ser incrementales si no que deben borrar la información que caduca por el deslizamiento de la ventana manteniendo los anteriores tres desafíos. Una de las tareas fundamentales en minería de datos es la búsqueda de agrupaciones donde, dado un conjunto de datos, el objetivo es encontrar grupos representativos, de manera que se tenga una descripción sintética del conjunto. Estas agrupaciones son fundamentales en aplicaciones como la detección de intrusos en la red o la segmentación de clientes en el marketing y la publicidad. Debido a las cantidades masivas de datos que deben procesarse en este tipo de aplicaciones (millones de eventos por segundo), las soluciones centralizadas puede ser incapaz de hacer frente a las restricciones de tiempo de procesamiento, por lo que deben recurrir a descartar datos durante los picos de carga. Para evitar esta perdida de datos, se impone el procesamiento distribuido de streams, en concreto, los algoritmos de agrupamiento deben ser adaptados para este tipo de entornos, en los que los datos están distribuidos. En streaming, la investigación no solo se centra en el diseño para tareas generales, como la agrupación, sino también en la búsqueda de nuevos enfoques que se adapten mejor a escenarios particulares. Como ejemplo, un mecanismo de agrupación ad-hoc resulta ser más adecuado para la defensa contra la denegación de servicio distribuida (Distributed Denial of Services, DDoS) que el problema tradicional de k-medias. En esta tesis se pretende contribuir en el problema agrupamiento en streaming tanto en entornos centralizados y distribuidos. Hemos diseñado un algoritmo centralizado de clustering mostrando las capacidades para descubrir agrupaciones de alta calidad en bajo tiempo frente a otras soluciones del estado del arte, en una amplia evaluación. Además, se ha trabajado sobre una estructura que reduce notablemente el espacio de memoria necesario, controlando, en todo momento, el error de los cómputos. Nuestro trabajo también proporciona dos protocolos de distribución del cómputo de agrupaciones. Se han analizado dos características fundamentales: el impacto sobre la calidad del clustering al realizar el cómputo distribuido y las condiciones necesarias para la reducción del tiempo de procesamiento frente a la solución centralizada. Finalmente, hemos desarrollado un entorno para la detección de ataques DDoS basado en agrupaciones. En este último caso, se ha caracterizado el tipo de ataques detectados y se ha desarrollado una evaluación sobre la eficiencia y eficacia de la mitigación del impacto del ataque. ABSTRACT Advances in hardware allow to collect huge volumes of data emerging applications that must provide information in near-real time, e.g., patient monitoring, health monitoring of water pipes, etc. The data streaming model emerges to comply with these applications overcoming the traditional store-then-process model. With the store-then-process model, data is stored before being consulted; while, in streaming, data are processed on the fly producing continuous responses. The challenges of streaming for processing data on the fly are the following: 1) responses must be produced continuously whenever new data arrives in the system; 2) data is accessed only once and is generally not maintained in its entirety, and 3) data processing time to produce a response should be low. Two models exist to compute continuous responses: the evolving model and the sliding window model; the latter fits best with applications must be computed over the most recently data rather than all the previous data. In recent years, research in the context of data stream mining has focused mainly on the evolving model. In the sliding window model, the work presented is smaller since these algorithms must be incremental and they must delete the information which expires when the window slides. Clustering is one of the fundamental techniques of data mining and is used to analyze data sets in order to find representative groups that provide a concise description of the data being processed. Clustering is critical in applications such as network intrusion detection or customer segmentation in marketing and advertising. Due to the huge amount of data that must be processed by such applications (up to millions of events per second), centralized solutions are usually unable to cope with timing restrictions and recur to shedding techniques where data is discarded during load peaks. To avoid discarding of data, processing of streams (such as clustering) must be distributed and adapted to environments where information is distributed. In streaming, research does not only focus on designing for general tasks, such as clustering, but also in finding new approaches that fit bests with particular scenarios. As an example, an ad-hoc grouping mechanism turns out to be more adequate than k-means for defense against Distributed Denial of Service (DDoS). This thesis contributes to the data stream mining clustering technique both for centralized and distributed environments. We present a centralized clustering algorithm showing capabilities to discover clusters of high quality in low time and we provide a comparison with existing state of the art solutions. We have worked on a data structure that significantly reduces memory requirements while controlling the error of the clusters statistics. We also provide two distributed clustering protocols. We focus on the analysis of two key features: the impact on the clustering quality when computation is distributed and the requirements for reducing the processing time compared to the centralized solution. Finally, with respect to ad-hoc grouping techniques, we have developed a DDoS detection framework based on clustering.We have characterized the attacks detected and we have evaluated the efficiency and effectiveness of mitigating the attack impact.