840 resultados para Data Mining, Clustering, PSA, Pavement Deflection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The increase in new electronic devices had generated a considerable increase in obtaining spatial data information; hence these data are becoming more and more widely used. As well as for conventional data, spatial data need to be analyzed so interesting information can be retrieved from them. Therefore, data clustering techniques can be used to extract clusters of a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object’s attributes. This paper presents an approach that enhances spatial data mining process, so they can use the semantic that exists within a region. A framework was developed, OntoSDM, which enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithm’s result. The experiments demonstrated a semantically improved result, generating more interesting clusters, therefore reducing manual analysis work of an expert.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to get rid of a user-defined parameter in a way that the clustering stage can be implemented more accurately while having reduced computational complexity

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[ES]En este artículo se describe la experiencia de la aplicación de técnicas de EDM (clustering) a un curso disponible en la plataforma Ude@ de la Universidad de Antioquia. El objetivo es clasificar los patrones de interacción de los estudiantes a partir de la información almacenada en la base de datos de la plataforma Moodle. Para ello, se generan informes sobre el uso de los recursos y la autoevaluación que permiten analizar el comportamiento y los patrones de navegación de los estudiantes durante el uso del LMS (Learning Management System).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in biomedical signal acquisition systems for motion analysis have led to lowcost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and to extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis will focus on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson subjects in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify high risk subjects for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes and automatically assess the severity of symptoms in the home setting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last few years there has been a heightened interest in data treatment and analysis with the aim of discovering hidden knowledge and eliciting relationships and patterns within this data. Data mining techniques (also known as Knowledge Discovery in Databases) have been applied over a wide range of fields such as marketing, investment, fraud detection, manufacturing, telecommunications and health. In this study, well-known data mining techniques such as artificial neural networks (ANN), genetic programming (GP), forward selection linear regression (LR) and k-means clustering techniques, are proposed to the health and sports community in order to aid with resistance training prescription. Appropriate resistance training prescription is effective for developing fitness, health and for enhancing general quality of life. Resistance exercise intensity is commonly prescribed as a percent of the one repetition maximum. 1RM, dynamic muscular strength, one repetition maximum or one execution maximum, is operationally defined as the heaviest load that can be moved over a specific range of motion, one time and with correct performance. The safety of the 1RM assessment has been questioned as such an enormous effort may lead to muscular injury. Prediction equations could help to tackle the problem of predicting the 1RM from submaximal loads, in order to avoid or at least, reduce the associated risks. We built different models from data on 30 men who performed up to 5 sets to exhaustion at different percentages of the 1RM in the bench press action, until reaching their actual 1RM. Also, a comparison of different existing prediction equations is carried out. The LR model seems to outperform the ANN and GP models for the 1RM prediction in the range between 1 and 10 repetitions. At 75% of the 1RM some subjects (n = 5) could perform 13 repetitions with proper technique in the bench press action, whilst other subjects (n = 20) performed statistically significant (p < 0:05) more repetitions at 70% than at 75% of their actual 1RM in the bench press action. Rate of perceived exertion (RPE) seems not to be a good predictor for 1RM when all the sets are performed until exhaustion, as no significant differences (p < 0:05) were found in the RPE at 75%, 80% and 90% of the 1RM. Also, years of experience and weekly hours of strength training are better correlated to 1RM (p < 0:05) than body weight. O'Connor et al. 1RM prediction equation seems to arise from the data gathered and seems to be the most accurate 1RM prediction equation from those proposed in literature and used in this study. Epley's 1RM prediction equation is reproduced by means of data simulation from 1RM literature equations. Finally, future lines of research are proposed related to the problem of the 1RM prediction by means of genetic algorithms, neural networks and clustering techniques. RESUMEN En los últimos años ha habido un creciente interés en el tratamiento y análisis de datos con el propósito de descubrir relaciones, patrones y conocimiento oculto en los mismos. Las técnicas de data mining (también llamadas de \Descubrimiento de conocimiento en bases de datos\) se han aplicado consistentemente a lo gran de un gran espectro de áreas como el marketing, inversiones, detección de fraude, producción industrial, telecomunicaciones y salud. En este estudio, técnicas bien conocidas de data mining como las redes neuronales artificiales (ANN), programación genética (GP), regresión lineal con selección hacia adelante (LR) y la técnica de clustering k-means, se proponen a la comunidad del deporte y la salud con el objetivo de ayudar con la prescripción del entrenamiento de fuerza. Una apropiada prescripción de entrenamiento de fuerza es efectiva no solo para mejorar el estado de forma general, sino para mejorar la salud e incrementar la calidad de vida. La intensidad en un ejercicio de fuerza se prescribe generalmente como un porcentaje de la repetición máxima. 1RM, fuerza muscular dinámica, una repetición máxima o una ejecución máxima, se define operacionalmente como la carga máxima que puede ser movida en un rango de movimiento específico, una vez y con una técnica correcta. La seguridad de las pruebas de 1RM ha sido cuestionada debido a que el gran esfuerzo requerido para llevarlas a cabo puede derivar en serias lesiones musculares. Las ecuaciones predictivas pueden ayudar a atajar el problema de la predicción de la 1RM con cargas sub-máximas y son empleadas con el propósito de eliminar o al menos, reducir los riesgos asociados. En este estudio, se construyeron distintos modelos a partir de los datos recogidos de 30 hombres que realizaron hasta 5 series al fallo en el ejercicio press de banca a distintos porcentajes de la 1RM, hasta llegar a su 1RM real. También se muestra una comparación de algunas de las distintas ecuaciones de predicción propuestas con anterioridad. El modelo LR parece superar a los modelos ANN y GP para la predicción de la 1RM entre 1 y 10 repeticiones. Al 75% de la 1RM algunos sujetos (n = 5) pudieron realizar 13 repeticiones con una técnica apropiada en el ejercicio press de banca, mientras que otros (n = 20) realizaron significativamente (p < 0:05) más repeticiones al 70% que al 75% de su 1RM en el press de banca. El ínndice de esfuerzo percibido (RPE) parece no ser un buen predictor del 1RM cuando todas las series se realizan al fallo, puesto que no existen diferencias signifiativas (p < 0:05) en el RPE al 75%, 80% y el 90% de la 1RM. Además, los años de experiencia y las horas semanales dedicadas al entrenamiento de fuerza están más correlacionadas con la 1RM (p < 0:05) que el peso corporal. La ecuación de O'Connor et al. parece surgir de los datos recogidos y parece ser la ecuación de predicción de 1RM más precisa de aquellas propuestas en la literatura y empleadas en este estudio. La ecuación de predicción de la 1RM de Epley es reproducida mediante simulación de datos a partir de algunas ecuaciones de predicción de la 1RM propuestas con anterioridad. Finalmente, se proponen futuras líneas de investigación relacionadas con el problema de la predicción de la 1RM mediante algoritmos genéticos, redes neuronales y técnicas de clustering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En esta memoria se presenta el diseño y desarrollo de una aplicación en la nube destinada a la compartición de objetos y servicios. El desarrollo de esta aplicación surge dentro del proyecto de I+D+i, SITAC: Social Internet of Things – Apps by and for the Crowd ITEA 2 11020, que trata de crear una arquitectura integradora y un “ecosistema” que incluya plataformas, herramientas y metodologías para facilitar la conexión y cooperación de entidades de distinto tipo conectadas a la red bien sean sistemas, máquinas, dispositivos o personas con dispositivos móviles personales como tabletas o teléfonos móviles. El proyecto innovará mediante la utilización de un modelo inspirado en las redes sociales para facilitar y unificar las interacciones tanto entre personas como entre personas y dispositivos. En este contexto surge la necesidad de desarrollar una aplicación destinada a la compartición de recursos en la nube que pueden ser tanto lógicos como físicos, y que esté orientada al big data. Ésta será la aplicación presentada en este trabajo, el “Resource Sharing Center”, que ofrece un servicio web para el intercambio y compartición de contenido, y un motor de recomendaciones basado en las preferencias de los usuarios. Con este objetivo, se han usado tecnologías de despliegue en la nube, como Elastic Beanstalk (el PaaS de Amazon Web Services), S3 (el sistema de almacenamiento de Amazon Web Services), SimpleDB (base de datos NoSQL) y HTML5 con JavaScript y Twitter Bootstrap para el desarrollo del front-end, siendo Python y Node.js las tecnologías usadas en el back end, y habiendo contribuido a la mejora de herramientas de clustering sobre big data. Por último, y de cara a realizar el estudio sobre las pruebas de carga de la aplicación se ha usado la herramienta ApacheJMeter.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data, based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining, as a heatedly discussed term, has been studied in various fields. Its possibilities in refining the decision-making process, realizing potential patterns and creating valuable knowledge have won attention of scholars and practitioners. However, there are less studies intending to combine data mining and libraries where data generation occurs all the time. Therefore, this thesis plans to fill such a gap. Meanwhile, potential opportunities created by data mining are explored to enhance one of the most important elements of libraries: reference service. In order to thoroughly demonstrate the feasibility and applicability of data mining, literature is reviewed to establish a critical understanding of data mining in libraries and attain the current status of library reference service. The result of the literature review indicates that free online data resources other than data generated on social media are rarely considered to be applied in current library data mining mandates. Therefore, the result of the literature review motivates the presented study to utilize online free resources. Furthermore, the natural match between data mining and libraries is established. The natural match is explained by emphasizing the data richness reality and considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method to overcome reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: whether data mining is feasible and applicable for reference service improvement. In this case, the daily visit from 2009 to 2015 in Turku Main Library is considered as the resource for data mining. In addition, corresponding weather conditions are collected from Weather Underground, which is totally free online. Before officially being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the independent variable and weather conditions, Discomfort Index and seven days in a week are dependent variables. In the end, four models in different seasons are established to predict visiting situations in each season. Patterns are realized in different seasons and implications are created based on the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visiting situation. Then the data mining result is interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians so as to collect professional opinions regarding the possibility of employing data mining to improve reference services. In the end, positive opinions are collected, which implies that it is feasible to utilizing data mining as a tool to enhance library reference service.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but just few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without touching upon the correlation among them. In order to overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, leading to the thoughtfulness that attributes correlations cannot be put aside. In fact, the clusters' compactness are 7 to 25 times better using our method. Our framework also proves to be an useful tool to assist meteorologists in understanding the climate behavior along a period of time.

Relevância:

100.00% 100.00%

Publicador: