840 resultados para Data Mining, Clustering, PSA, Pavement Deflection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Core Vector Machine(CVM) is suitable for efficient large-scale pattern classification. In this paper, a method for improving the performance of CVM with Gaussian kernel function irrespective of the orderings of patterns belonging to different classes within the data set is proposed. This method employs a selective sampling based training of CVM using a novel kernel based scalable hierarchical clustering algorithm. Empirical studies made on synthetic and real world data sets show that the proposed strategy performs well on large data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Over the past decade, many powerful data mining techniques have been developed to analyze temporal and sequential data. The time is now fertile for addressing problems of larger scope under the purview of temporal data mining. The fourth SIGKDD workshop on temporal data mining focused on the question: What can we infer about the structure of a complex dynamical system from observed temporal data? The goals of the workshop were to critically evaluate the need in this area by bringing together leading researchers from industry and academia, and to identify promising technologies and methodologies for doing the same. We provide a brief summary of the workshop proceedings and ideas arising out of the discussions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a methodology for identifying best features from a large feature space. In high dimensional feature space nearest neighbor search is meaningless. In this feature space we see quality and performance issue with nearest neighbor search. Many data mining algorithms use nearest neighbor search. So instead of doing nearest neighbor search using all the features we need to select relevant features. We propose feature selection using Non-negative Matrix Factorization(NMF) and its application to nearest neighbor search. Recent clustering algorithm based on Locally Consistent Concept Factorization(LCCF) shows better quality of document clustering by using local geometrical and discriminating structure of the data. By using our feature selection method we have shown further improvement of performance in the clustering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering has been the most popular method for data exploration. Clustering is partitioning the data set into sub-partitions based on some measures say the distance measure, each partition has its own significant information. There are a number of algorithms explored for this purpose, one such algorithm is the Particle Swarm Optimization(PSO) which is a population based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of the Particle Swarm Optimization where, each feature of the data set is given significance accordingly by adding some random weights, which also minimizes the distortions in the dataset if any. The performance of the above proposed algorithm is evaluated using some benchmark datasets from Machine Learning Repository. The experimental results shows that our proposed methodology performs significantly better than the previously performed experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of classification of time series data is an interesting problem in the field of data mining. Even though several algorithms have been proposed for the problem of time series classification we have developed an innovative algorithm which is computationally fast and accurate in several cases when compared with 1NN classifier. In our method we are calculating the fuzzy membership of each test pattern to be classified to each class. We have experimented with 6 benchmark datasets and compared our method with 1NN classifier.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Self-organizing maps (SOM) have been recognized as a powerful tool in data exploratoration, especially for the tasks of clustering on high dimensional data. However, clustering on categorical data is still a challenge for SOM. This paper aims to extend standard SOM to handle feature values of categorical type. A batch SOM algorithm (NCSOM) is presented concerning the dissimilarity measure and update method of map evolution for both numeric and categorical features simultaneously.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining addressing such challenges are presented, together with related research results from previous work, as well as our recent developments of data mining on text-based, web-based, image-based, and network-based construction databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clare, A., Williams, H. E. and Lester, N. M. (2004) Scalable Multi-Relational Association Mining. In proceedings of the 4th International Conference on Data Mining ICDM '04.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ferr?, S. and King, R. D. (2004) A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae. IOS Press. To appear

Relevância:

100.00% 100.00%

Publicador:

Resumo:

R. Jensen, Q. Shen, Data Reduction with Rough Sets, In: Encyclopedia of Data Warehousing and Mining - 2nd Edition, Vol. II, 2008.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this research, which focused on the Irish adult population, was to generate information for policymakers by applying statistical analyses and current technologies to oral health administrative and survey databases. Objectives included identifying socio-demographic influences on oral health and utilisation of dental services, comparing epidemiologically-estimated dental treatment need with treatment provided, and investigating the potential of a dental administrative database to provide information on utilisation of services and the volume and types of treatment provided over time. Information was extracted from the claims databases for the Dental Treatment Benefit Scheme (DTBS) for employed adults and the Dental Treatment Services Scheme (DTSS) for less-well-off adults, the National Surveys of Adult Oral Health, and the 2007 Survey of Lifestyle Attitudes and Nutrition in Ireland. Factors associated with utilisation and retention of natural teeth were analysed using count data models and logistic regression. The chi-square test and the student’s t-test were used to compare epidemiologically-estimated need in a representative sample of adults with treatment provided. Differences were found in dental care utilisation and tooth retention by Socio-Economic Status. An analysis of the five-year utilisation behaviour of a 2003 cohort of DTBS dental attendees revealed that age and being female were positively associated with visiting annually and number of treatments. Number of adults using the DTBS increased, and mean number of treatments per patient decreased, between 1997 and 2008. As a percentage of overall treatments, restorations, dentures, and extractions decreased, while prophylaxis increased. Differences were found between epidemiologically-estimated treatment need and treatment provided for those using the DTBS and DTSS. This research confirms the utility of survey and administrative data to generate knowledge for policymakers. Public administrative databases have not been designed for research purposes, but they have the potential to provide a wealth of knowledge on treatments provided and utilisation patterns.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.