979 resultados para clustering techniques


Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the last few years there has been a heightened interest in data treatment and analysis with the aim of discovering hidden knowledge and eliciting relationships and patterns within this data. Data mining techniques (also known as Knowledge Discovery in Databases) have been applied over a wide range of fields such as marketing, investment, fraud detection, manufacturing, telecommunications and health. In this study, well-known data mining techniques such as artificial neural networks (ANN), genetic programming (GP), forward selection linear regression (LR) and k-means clustering techniques, are proposed to the health and sports community in order to aid with resistance training prescription. Appropriate resistance training prescription is effective for developing fitness, health and for enhancing general quality of life. Resistance exercise intensity is commonly prescribed as a percent of the one repetition maximum. 1RM, dynamic muscular strength, one repetition maximum or one execution maximum, is operationally defined as the heaviest load that can be moved over a specific range of motion, one time and with correct performance. The safety of the 1RM assessment has been questioned as such an enormous effort may lead to muscular injury. Prediction equations could help to tackle the problem of predicting the 1RM from submaximal loads, in order to avoid or at least, reduce the associated risks. We built different models from data on 30 men who performed up to 5 sets to exhaustion at different percentages of the 1RM in the bench press action, until reaching their actual 1RM. Also, a comparison of different existing prediction equations is carried out. The LR model seems to outperform the ANN and GP models for the 1RM prediction in the range between 1 and 10 repetitions. At 75% of the 1RM some subjects (n = 5) could perform 13 repetitions with proper technique in the bench press action, whilst other subjects (n = 20) performed statistically significant (p < 0:05) more repetitions at 70% than at 75% of their actual 1RM in the bench press action. Rate of perceived exertion (RPE) seems not to be a good predictor for 1RM when all the sets are performed until exhaustion, as no significant differences (p < 0:05) were found in the RPE at 75%, 80% and 90% of the 1RM. Also, years of experience and weekly hours of strength training are better correlated to 1RM (p < 0:05) than body weight. O'Connor et al. 1RM prediction equation seems to arise from the data gathered and seems to be the most accurate 1RM prediction equation from those proposed in literature and used in this study. Epley's 1RM prediction equation is reproduced by means of data simulation from 1RM literature equations. Finally, future lines of research are proposed related to the problem of the 1RM prediction by means of genetic algorithms, neural networks and clustering techniques. RESUMEN En los últimos años ha habido un creciente interés en el tratamiento y análisis de datos con el propósito de descubrir relaciones, patrones y conocimiento oculto en los mismos. Las técnicas de data mining (también llamadas de \Descubrimiento de conocimiento en bases de datos\) se han aplicado consistentemente a lo gran de un gran espectro de áreas como el marketing, inversiones, detección de fraude, producción industrial, telecomunicaciones y salud. En este estudio, técnicas bien conocidas de data mining como las redes neuronales artificiales (ANN), programación genética (GP), regresión lineal con selección hacia adelante (LR) y la técnica de clustering k-means, se proponen a la comunidad del deporte y la salud con el objetivo de ayudar con la prescripción del entrenamiento de fuerza. Una apropiada prescripción de entrenamiento de fuerza es efectiva no solo para mejorar el estado de forma general, sino para mejorar la salud e incrementar la calidad de vida. La intensidad en un ejercicio de fuerza se prescribe generalmente como un porcentaje de la repetición máxima. 1RM, fuerza muscular dinámica, una repetición máxima o una ejecución máxima, se define operacionalmente como la carga máxima que puede ser movida en un rango de movimiento específico, una vez y con una técnica correcta. La seguridad de las pruebas de 1RM ha sido cuestionada debido a que el gran esfuerzo requerido para llevarlas a cabo puede derivar en serias lesiones musculares. Las ecuaciones predictivas pueden ayudar a atajar el problema de la predicción de la 1RM con cargas sub-máximas y son empleadas con el propósito de eliminar o al menos, reducir los riesgos asociados. En este estudio, se construyeron distintos modelos a partir de los datos recogidos de 30 hombres que realizaron hasta 5 series al fallo en el ejercicio press de banca a distintos porcentajes de la 1RM, hasta llegar a su 1RM real. También se muestra una comparación de algunas de las distintas ecuaciones de predicción propuestas con anterioridad. El modelo LR parece superar a los modelos ANN y GP para la predicción de la 1RM entre 1 y 10 repeticiones. Al 75% de la 1RM algunos sujetos (n = 5) pudieron realizar 13 repeticiones con una técnica apropiada en el ejercicio press de banca, mientras que otros (n = 20) realizaron significativamente (p < 0:05) más repeticiones al 70% que al 75% de su 1RM en el press de banca. El ínndice de esfuerzo percibido (RPE) parece no ser un buen predictor del 1RM cuando todas las series se realizan al fallo, puesto que no existen diferencias signifiativas (p < 0:05) en el RPE al 75%, 80% y el 90% de la 1RM. Además, los años de experiencia y las horas semanales dedicadas al entrenamiento de fuerza están más correlacionadas con la 1RM (p < 0:05) que el peso corporal. La ecuación de O'Connor et al. parece surgir de los datos recogidos y parece ser la ecuación de predicción de 1RM más precisa de aquellas propuestas en la literatura y empleadas en este estudio. La ecuación de predicción de la 1RM de Epley es reproducida mediante simulación de datos a partir de algunas ecuaciones de predicción de la 1RM propuestas con anterioridad. Finalmente, se proponen futuras líneas de investigación relacionadas con el problema de la predicción de la 1RM mediante algoritmos genéticos, redes neuronales y técnicas de clustering.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The mobile apps market is a tremendous success, with millions of apps downloaded and used every day by users spread all around the world. For apps’ developers, having their apps published on one of the major app stores (e.g. Google Play market) is just the beginning of the apps lifecycle. Indeed, in order to successfully compete with the other apps in the market, an app has to be updated frequently by adding new attractive features and by fixing existing bugs. Clearly, any developer interested in increasing the success of her app should try to implement features desired by the app’s users and to fix bugs affecting the user experience of many of them. A precious source of information to decide how to collect users’ opinions and wishes is represented by the reviews left by users on the store from which they downloaded the app. However, to exploit such information the app’s developer should manually read each user review and verify if it contains useful information (e.g. suggestions for new features). This is something not doable if the app receives hundreds of reviews per day, as happens for the very popular apps on the market. In this work, our aim is to provide support to mobile apps developers by proposing a novel approach exploiting data mining, natural language processing, machine learning, and clustering techniques in order to classify the user reviews on the basis of the information they contain (e.g. useless, suggestion for new features, bugs reporting). Such an approach has been empirically evaluated and made available in a web-­‐based tool publicly available to all apps’ developers. The achieved results showed that the developed tool: (i) is able to correctly categorise user reviews on the basis of their content (e.g. isolating those reporting bugs) with 78% of accuracy, (ii) produces clusters of reviews (e.g. groups together reviews indicating exactly the same bug to be fixed) that are meaningful from a developer’s point-­‐of-­‐view, and (iii) is considered useful by a software company working in the mobile apps’ development market.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Optimal currency area theory suggests that business cycle comovement is a sufficient condition for monetary union, particularly if there are low levels of labour mobility between potential members of the monetary union. Previous studies of co-movement of business cycle variables (mainly authored by Artis and Zhang in the late 1990s) found that there was a core of member states in the EU that could be grouped together as having similar business cycle comovements, but these studies always used Germany as the country against which to compare. In this study, the analysis of Artis and Zhang is extended and updated but correlating against both German and euro area macroeconomic aggregates and using more recent techniques in cluster analysis, namely model-based clustering techniques.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions to k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups, that show which objects are members of each cluster. Clustering techniques based on rough sets theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Clustering techniques such as k-means and hierarchical clustering are commonly used to analyze DNA microarray derived gene expression data. However, the interactions between processes underlying the cell activity suggest that the complexity of the microarray data structure may not be fully represented with discrete clustering methods.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The aims of the project were twofold: 1) To investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) To investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance, maximum likelihood. In addition to these the following algorithms were developed during the course of the research: deviant distance, look up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. Contextual enhancements investigated were: mode filters, small area replacement and Wharton's CONAN algorithm. Additionally methods for noise and edge based declassification and contextual reclassification, non-probabilitic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were investigated. The conclusions presented suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints. These were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an effective decision making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system is obtained by setting up an experimental pipeline fully operational distribution system. The system is also equipped with data logging for three variables; namely, inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed such that multi-operational conditions of the distribution system, including multi pressure and multi flow can be obtained. We then statistically tested and showed that pressure and flow variables can be used as signature of leak under the designed multi-operational conditions. It is then shown that the detection of leakages based on the training and testing of the proposed multi model decision system with pre data clustering, under multi operational conditions produces better recognition rates in comparison to the training based on the single model approach. This decision system is then equipped with the estimation of confidence limits and a method is proposed for using these confidence limits for obtaining more robust leakage recognition results.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the recent explosion in the complexity and amount of digital multimedia data, there has been a huge impact on the operations of various organizations in distinct areas, such as government services, education, medical care, business, entertainment, etc. To satisfy the growing demand of multimedia data management systems, an integrated framework called DIMUSE is proposed and deployed for distributed multimedia applications to offer a full scope of multimedia related tools and provide appealing experiences for the users. This research mainly focuses on video database modeling and retrieval by addressing a set of core challenges. First, a comprehensive multimedia database modeling mechanism called Hierarchical Markov Model Mediator (HMMM) is proposed to model high dimensional media data including video objects, low-level visual/audio features, as well as historical access patterns and frequencies. The associated retrieval and ranking algorithms are designed to support not only the general queries, but also the complicated temporal event pattern queries. Second, system training and learning methodologies are incorporated such that user interests are mined efficiently to improve the retrieval performance. Third, video clustering techniques are proposed to continuously increase the searching speed and accuracy by architecting a more efficient multimedia database structure. A distributed video management and retrieval system is designed and implemented to demonstrate the overall performance. The proposed approach is further customized for a mobile-based video retrieval system to solve the perception subjectivity issue by considering individual user's profile. Moreover, to deal with security and privacy issues and concerns in distributed multimedia applications, DIMUSE also incorporates a practical framework called SMARXO, which supports multilevel multimedia security control. SMARXO efficiently combines role-based access control (RBAC), XML and object-relational database management system (ORDBMS) to achieve the target of proficient security control. A distributed multimedia management system named DMMManager (Distributed MultiMedia Manager) is developed with the proposed framework DEMUR; to support multimedia capturing, analysis, retrieval, authoring and presentation in one single framework.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This dissertation establishes a novel data-driven method to identify language network activation patterns in pediatric epilepsy through the use of the Principal Component Analysis (PCA) on functional magnetic resonance imaging (fMRI). A total of 122 subjects’ data sets from five different hospitals were included in the study through a web-based repository site designed here at FIU. Research was conducted to evaluate different classification and clustering techniques in identifying hidden activation patterns and their associations with meaningful clinical variables. The results were assessed through agreement analysis with the conventional methods of lateralization index (LI) and visual rating. What is unique in this approach is the new mechanism designed for projecting language network patterns in the PCA-based decisional space. Synthetic activation maps were randomly generated from real data sets to uniquely establish nonlinear decision functions (NDF) which are then used to classify any new fMRI activation map into typical or atypical. The best nonlinear classifier was obtained on a 4D space with a complexity (nonlinearity) degree of 7. Based on the significant association of language dominance and intensities with the top eigenvectors of the PCA decisional space, a new algorithm was deployed to delineate primary cluster members without intensity normalization. In this case, three distinct activations patterns (groups) were identified (averaged kappa with rating 0.65, with LI 0.76) and were characterized by the regions of: (1) the left inferior frontal Gyrus (IFG) and left superior temporal gyrus (STG), considered typical for the language task; (2) the IFG, left mesial frontal lobe, right cerebellum regions, representing a variant left dominant pattern by higher activation; and (3) the right homologues of the first pattern in Broca's and Wernicke's language areas. Interestingly, group 2 was found to reflect a different language compensation mechanism than reorganization. Its high intensity activation suggests a possible remote effect on the right hemisphere focus on traditionally left-lateralized functions. In retrospect, this data-driven method provides new insights into mechanisms for brain compensation/reorganization and neural plasticity in pediatric epilepsy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This dissertation establishes a novel data-driven method to identify language network activation patterns in pediatric epilepsy through the use of the Principal Component Analysis (PCA) on functional magnetic resonance imaging (fMRI). A total of 122 subjects’ data sets from five different hospitals were included in the study through a web-based repository site designed here at FIU. Research was conducted to evaluate different classification and clustering techniques in identifying hidden activation patterns and their associations with meaningful clinical variables. The results were assessed through agreement analysis with the conventional methods of lateralization index (LI) and visual rating. What is unique in this approach is the new mechanism designed for projecting language network patterns in the PCA-based decisional space. Synthetic activation maps were randomly generated from real data sets to uniquely establish nonlinear decision functions (NDF) which are then used to classify any new fMRI activation map into typical or atypical. The best nonlinear classifier was obtained on a 4D space with a complexity (nonlinearity) degree of 7. Based on the significant association of language dominance and intensities with the top eigenvectors of the PCA decisional space, a new algorithm was deployed to delineate primary cluster members without intensity normalization. In this case, three distinct activations patterns (groups) were identified (averaged kappa with rating 0.65, with LI 0.76) and were characterized by the regions of: 1) the left inferior frontal Gyrus (IFG) and left superior temporal gyrus (STG), considered typical for the language task; 2) the IFG, left mesial frontal lobe, right cerebellum regions, representing a variant left dominant pattern by higher activation; and 3) the right homologues of the first pattern in Broca's and Wernicke's language areas. Interestingly, group 2 was found to reflect a different language compensation mechanism than reorganization. Its high intensity activation suggests a possible remote effect on the right hemisphere focus on traditionally left-lateralized functions. In retrospect, this data-driven method provides new insights into mechanisms for brain compensation/reorganization and neural plasticity in pediatric epilepsy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Parking is often underpriced and expanding its capacity is expensive; universities need a better way of reducing congestion outside of building costly parking garages. Demand based pricing mechanisms, such as auctions, offer a possible solution to the problem by promising to reduce parking at peak times. However, faculty, students, and staff at universities have systematically different parking needs, leading to different parking valuations. In this study, I determine the impact university affiliation has on predicting bid values cast in three Dutch Auctions of on-campus parking permits sold at Chapman University in Fall 2010. Using clustering techniques crosschecked with university demographic information to detect affiliation groups, I ran a log-linear regression, finding that university affiliation had a larger effect on bid amount than on lot location and fraction of auction duration. Generally, faculty were predicted to have higher bids whereas students were predicted to have lower bids.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Microsecond long Molecular Dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all atom MD simulations of protein folding are now becoming popular. To distill features of such large folding trajectories, we must develop methods that can both compress trajectory data to enable visualization, and that can yield themselves to further analysis, such as the finding of collective coordinates and reduction of the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering used in MD trajectory analysis suffers from various serious drawbacks, namely, (i) it is not data driven, (ii) it is unstable to noise and change in cutoff parameters, and (iii) since it does not take into account interrelationships amongst data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside, while still preserving their salient features. PCA has often been used for dimension reduction and while it is computationally inexpensive, being a linear method, it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear, and also provides a clear insight into the structural processes underlying MD trajectories. I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I make comparisons between nMDS, PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features like early formation of secondary structure, no commonalities between trajectories were found. There were no units of residues or atoms found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3 was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in all trajectories, indicating that it may not be a common transition in villin folding. The trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation in all trajectories. Our analysis on the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition. All the other trajectories get trapped in hydrophobically collapsed states. The NLE residues were found to be buried deeply into the core, compared to the corresponding lysines in the villin headpiece, thereby making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that the NLE may not be a fast folder as experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins, after most secondary structure elements have been formed, so that global rearrangement is easier. I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In recent years, Facebook and other social media have become key players in branding activities. However, empirical research on consumer–brand interactions on Facebook is still in its infancy. Therefore, the aim of this research is to provide additional insights to brand managers on how to adapt their approaches to increase consumers’ interactions with brands on Facebook. In this study, we apply the uses and gratification theory proposed by Katz to develop a new typology of consumers based on consumer motivations to interact with brands on Facebook, and explore the type and intensity of these interactions. We identify five main motivations that might influence consumers’ interactions with a brand on Facebook: (i) social influence, (ii) search for information, (iii) entertainment, (iv) trust and (v) reward. Building on these five motivations, a classification using clustering techniques reveals four different groups of consumers: (i) ‘brand detached’, (ii) ‘brand profiteers’, (iii) ‘brand companions’ and (iv) ‘brand reliants’. Our results provide valuable and applicable insights for social media marketing activities, which will assist brand managers to develop strategies for effectively reaching and influencing the most desirable groups of consumers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação (mestrado)—Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2015.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The paper catalogues the procedures and steps involved in agroclimatic classification. These vary from conventional descriptive methods to modern computer-based numerical techniques. There are three mutually independent numerical classification techniques, namely Ordination, Cluster analysis, and Minimum spanning tree; and under each technique there are several forms of grouping techniques existing. The vhoice of numerical classification procedure differs with the type of data set. In the case of numerical continuous data sets with booth positive and negative values, the simple and least controversial procedures are unweighted pair group method (UPGMA) and weighted pair group method (WPGMA) under clustering techniques with similarity measure obtained either from Gower metric or standardized Euclidean metric. Where the number of attributes are large, these could be reduced to fewer new attributes defined by the principal components or coordinates by ordination technique. The first few components or coodinates explain the maximum variance in the data matrix. These revided attributes are less affected by noise in the data set. It is possible to check misclassifications using minimum spanning tree.