5 results for Clustering a large document collection

in Digital Commons at Florida International University


Relevance:

100.00%

Publisher:

Abstract:

Due to rapid advances in computing and sensing technologies, enormous amounts of data are generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets and to discover hidden patterns. For both data mining and visualization to be effective, it is important to include visualization techniques in the mining process and to generate a more comprehensive visual view of the discovered patterns. In this dissertation, four related problems are studied to explore the integration of data mining and data visualization: dimensionality reduction for visualizing high-dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high-dimensional datasets; 2) present DClusterE, which integrates cluster validation with user interaction and provides rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems that involve users' efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework that organizes the different input clusterings into a hierarchical tree structure and allows interactive exploration of multiple clustering solutions.

Relevance:

40.00%

Publisher:

Abstract:

With the explosive growth in the volume and complexity of document data (e.g., news, blogs, web pages), it has become necessary to understand documents semantically and deliver meaningful information to users. The areas dealing with these problems span data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One underexplored problem in document clustering is how to generate a meaningful interpretation for each document cluster produced by the clustering process. Document summarization is another effective technique for document understanding; it generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To help people capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
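
To make the idea of summarization by sentence selection concrete, here is a minimal sketch in Python. It is not the dissertation's method: the frequency-based scoring, the tiny stopword list, and the function names `tokenize` and `summarize` are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "it", "by"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(sentences, k=1):
    """Pick the k sentences with the highest average word frequency.

    Word frequencies are counted over all input sentences, so a sentence
    scores highly when it uses words that recur across the collection.
    The selected sentences are returned in their original order.
    """
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    docs = [
        "Clustering groups similar documents.",
        "Document clustering helps users browse documents.",
        "The weather was pleasant.",
    ]
    print(summarize(docs, k=1))
```

Applied to the sentences of a single document cluster, a selector like this would give a rough textual interpretation of that cluster, which is the kind of clustering-plus-summarization combination the abstract describes.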

Relevance:

40.00%

Publisher:

Abstract:

Charles Edward Perry (Chuck), 1937-1999, was the founding president of Florida International University in Miami, Florida. He grew up in Logan County, West Virginia and received his bachelor's and master's degrees from Bowling Green State University. He married Betty Laird in 1960. In 1969, at the age of 32, Perry was the youngest president of any university in the nation. The name of the university reflects Perry’s desire for a title that would not limit the scope of the institution and would support his vision of having close ties to Latin America. Perry and a founding corps opened FIU to 5,667 students in 1972 with only one large building housing six different schools. Perry left the office of President of FIU in 1976 when the student body had grown to 10,000 students and the university had six buildings, offered 134 different degrees and was fully accredited. Charles Perry died on August 30, 1999 at his home in Rockwall, Texas. He is buried on the FIU campus in front of the Graham Center entrance.

Relevance:

40.00%

Publisher:

Abstract:

Charles Edward Perry (Chuck), 1937-1999, was the founding president of Florida International University in Miami, Florida. He grew up in Logan County, West Virginia and graduated from Bowling Green State University. He married Betty Laird in 1960. In 1969, at the age of 32, Perry was the youngest president of any university in the nation. The name of the university reflects Perry’s desire for a title that would not limit the scope of the institution and would support his vision of having close ties to Latin America. Perry and a founding corps opened FIU to 5,667 students in 1972 with only one large building housing six different schools. Perry left the office of President of FIU in 1976 when the student body had grown to 10,000 students and the university had six buildings, offered 134 different degrees and was fully accredited. Charles Perry died on August 30, 1999 at his home in Rockwall, Texas. He is buried on the FIU campus in front of the Graham Center entrance.

Relevance:

40.00%

Publisher:

Abstract:

Online Social Network (OSN) services provided by Internet companies bring people together to chat and to share and enjoy information. Meanwhile, huge amounts of data are generated by those services (which can be regarded as social media) every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, because of the large scale of OSN data, it is difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated contents. Specifically, it aims to address three problems related to social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in social media to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents of social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from social network contents. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.