15 resultados para Web Mining, Data Mining, User Topic Model, Web User Profiles

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation introduces the design of a multimodal, adaptive real-time assistive system as an alternate human computer interface that can be used by individuals with severe motor disabilities. The proposed design is based on the integration of a remote eye-gaze tracking system, voice recognition software, and a virtual keyboard. The methodology relies on a user profile that customizes eye gaze tracking using neural networks. The user profiling feature facilitates the notion of universal access to computing resources for a wide range of applications such as web browsing, email, word processing and editing. ^ The study is significant in terms of the integration of key algorithms to yield an adaptable and multimodal interface. The contributions of this dissertation stem from the following accomplishments: (a) establishment of the data transport mechanism between the eye-gaze system and the host computer yielding to a significantly low failure rate of 0.9%; (b) accurate translation of eye data into cursor movement through congregate steps which conclude with calibrated cursor coordinates using an improved conversion function; resulting in an average reduction of 70% of the disparity between the point of gaze and the actual position of the mouse cursor, compared with initial findings; (c) use of both a moving average and a trained neural network in order to minimize the jitter of the mouse cursor, which yield an average jittering reduction of 35%; (d) introduction of a new mathematical methodology to measure the degree of jittering of the mouse trajectory; (e) embedding an onscreen keyboard to facilitate text entry, and a graphical interface that is used to generate user profiles for system adaptability. ^ The adaptability nature of the interface is achieved through the establishment of user profiles, which may contain the jittering and voice characteristics of a particular user as well as a customized list of the most commonly used words ordered according to the user's preferences: in alphabetical or statistical order. This allows the system to successfully provide the capability of interacting with a computer. Every time any of the sub-system is retrained, the accuracy of the interface response improves even more. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Personalized recommender systems aim to assist users in retrieving and accessing interesting items by automatically acquiring user preferences from the historical data and matching items with the preferences. In the last decade, recommendation services have gained great attention due to the problem of information overload. However, despite recent advances of personalization techniques, several critical issues in modern recommender systems have not been well studied. These issues include: (1) understanding the accessing patterns of users (i.e., how to effectively model users' accessing behaviors); (2) understanding the relations between users and other objects (i.e., how to comprehensively assess the complex correlations between users and entities in recommender systems); and (3) understanding the interest change of users (i.e., how to adaptively capture users' preference drift over time). To meet the needs of users in modern recommender systems, it is imperative to provide solutions to address the aforementioned issues and apply the solutions to real-world applications. ^ The major goal of this dissertation is to provide integrated recommendation approaches to tackle the challenges of the current generation of recommender systems. In particular, three user-oriented aspects of recommendation techniques were studied, including understanding accessing patterns, understanding complex relations and understanding temporal dynamics. To this end, we made three research contributions. First, we presented various personalized user profiling algorithms to capture click behaviors of users from both coarse- and fine-grained granularities; second, we proposed graph-based recommendation models to describe the complex correlations in a recommender system; third, we studied temporal recommendation approaches in order to capture the preference changes of users, by considering both long-term and short-term user profiles. In addition, a versatile recommendation framework was proposed, in which the proposed recommendation techniques were seamlessly integrated. Different evaluation criteria were implemented in this framework for evaluating recommendation techniques in real-world recommendation applications. ^ In summary, the frequent changes of user interests and item repository lead to a series of user-centric challenges that are not well addressed in the current generation of recommender systems. My work proposed reasonable solutions to these challenges and provided insights on how to address these challenges using a simple yet effective recommendation framework.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The kaon electroproduction reaction H(e, e ′K+)Λ was studied as a function of the four momentum transfer, Q2, for different values of the virtual photon polarization parameter. Electrons and kaons were detected in coincidence in two High Resolution Spectrometers (HRS) at Jefferson Lab. Data were taken at electron beam energies ranging from 3.4006 to 5.7544 GeV. The kaons were identified using combined time of flight information and two Aerogel Čerenkov detectors used for particle identification. For different values of Q2 ranging from 1.90 to 2.35 GeV/c2 the center of mass cross sections for the Λ hyperon were determined for 20 kinematics and the longitudinal, σ L, and transverse, σT, terms were separated using the Rosenbluth separation technique. ^ Comparisons between available models and data have been studied. The comparison supports the t-channel dominance behavior for kaon electroproduction. All models seem to underpredict the transverse cross section. An estimate of the kaon form factor has been explored by determining the sensitivity of the separated cross sections to variations of the kaon EM form factor. From comparison between models and data we can conclude that interpreting the data using the Regge model is quite sensitive to a particular choice for the EM form factors. The data from the E98-108 experiment extends the range of the available kaon electroproduction cross section data to an unexplored region of Q2 where no separations have ever been performed. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Enterobacteriaceae, the transcriptional regulator AmpR, a member of the LysR family, regulates the expression of a chromosomal β-lactamase AmpC. The regulatory repertoire of AmpR is broader in Pseudomonas aeruginosa, an opportunistic pathogen responsible for numerous acute and chronic infections including cystic fibrosis. Previous studies showed that in addition to regulating ampC, P. aeruginosa AmpR regulates the sigma factor AlgT/U and production of some quorum sensing (QS)-regulated virulence factors. In order to better understand the ampR regulon, the transcriptional profiles generated using DNA microarrays and RNA-Seq of the prototypic P. aeruginosa PAO1 strain with its isogenic ampR deletion mutant, PAOΔampR were analyzed. Transcriptome analysis demonstrates that the AmpR regulon is much more extensive than previously thought influencing the differential expression of over 500 genes. In addition to regulating resistance to β-lactam antibiotics via AmpC, AmpR also regulates non-β-lactam antibiotic resistance by modulating the MexEF-OprN efflux pump. Virulence mechanisms including biofilm formation, QS-regulated acute virulence, and diverse physiological processes such as oxidative stress response, heat-shock response and iron uptake are AmpR-regulated. Real-time PCR and phenotypic assays confirmed the transcriptome data. Further, Caenorhabditis elegans model demonstrates that a functional AmpR is required for full pathogenicity of P. aeruginosa. AmpR, a member of the core genome, also regulates genes in the regions of genome plasticity that are acquired by horizontal gene transfer. The extensive AmpR regulon included other transcriptional regulators and sigma factors, accounting for the extensive AmpR regulon. Gene expression studies demonstrate AmpR-dependent expression of the QS master regulator LasR that controls expression of many virulence factors. Using a chromosomally tagged AmpR, ChIP-Seq studies show direct AmpR binding to the lasR promoter. The data demonstrates that AmpR functions as a global regulator in P. aeruginosa and is a positive regulator of acute virulence while negatively regulating chronic infection phenotypes. In summary, my dissertation sheds light on the complex regulatory circuit in P. aeruginosa to provide a better understanding of the bacterial response to antibiotics and how the organism coordinately regulates a myriad of virulence factors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Enterobacteriaceae, the transcriptional regulator AmpR, a member of the LysR family, regulates the expression of a chromosomal β-lactamase AmpC. The regulatory repertoire of AmpR is broader in Pseudomonas aeruginosa, an opportunistic pathogen responsible for numerous acute and chronic infections including cystic fibrosis. Previous studies showed that in addition to regulating ampC, P. aeruginosa AmpR regulates the sigma factor AlgT/U and production of some quorum sensing (QS)-regulated virulence factors. In order to better understand the ampR regulon, the transcriptional profiles generated using DNA microarrays and RNA-Seq of the prototypic P. aeruginosa PAO1 strain with its isogenic ampR deletion mutant, PAO∆ampR were analyzed. Transcriptome analysis demonstrates that the AmpR regulon is much more extensive than previously thought influencing the differential expression of over 500 genes. In addition to regulating resistance to β-lactam antibiotics via AmpC, AmpR also regulates non-β-lactam antibiotic resistance by modulating the MexEF-OprN efflux pump. Virulence mechanisms including biofilm formation, QS-regulated acute virulence, and diverse physiological processes such as oxidative stress response, heat-shock response and iron uptake are AmpR-regulated. Real-time PCR and phenotypic assays confirmed the transcriptome data. Further, Caenorhabditis elegans model demonstrates that a functional AmpR is required for full pathogenicity of P. aeruginosa. AmpR, a member of the core genome, also regulates genes in the regions of genome plasticity that are acquired by horizontal gene transfer. The extensive AmpR regulon included other transcriptional regulators and sigma factors, accounting for the extensive AmpR regulon. Gene expression studies demonstrate AmpR-dependent expression of the QS master regulator LasR that controls expression of many virulence factors. Using a chromosomally tagged AmpR, ChIP-Seq studies show direct AmpR binding to the lasR promoter. The data demonstrates that AmpR functions as a global regulator in P. aeruginosa and is a positive regulator of acute virulence while negatively regulating chronic infection phenotypes. In summary, my dissertation sheds light on the complex regulatory circuit in P. aeruginosa to provide a better understanding of the bacterial response to antibiotics and how the organism coordinately regulates a myriad of virulence factors.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Because some Web users will be able to design a template to visualize information from scratch, while other users need to automatically visualize information by changing some parameters, providing different levels of customization of the information is a desirable goal. Our system allows the automatic generation of visualizations given the semantics of the data, and the static or pre-specified visualization by creating an interface language. We address information visualization taking into consideration the Web, where the presentation of the retrieved information is a challenge. ^ We provide a model to narrow the gap between the user's way of expressing queries and database manipulation languages (SQL) without changing the system itself thus improving the query specification process. We develop a Web interface model that is integrated with the HTML language to create a powerful language that facilitates the construction of Web-based database reports. ^ As opposed to other papers, this model offers a new way of exploring databases focusing on providing Web connectivity to databases with minimal or no result buffering, formatting, or extra programming. We describe how to easily connect the database to the Web. In addition, we offer an enhanced way on viewing and exploring the contents of a database, allowing users to customize their views depending on the contents and the structure of the data. Current database front-ends typically attempt to display the database objects in a flat view making it difficult for users to grasp the contents and the structure of their result. Our model narrows the gap between databases and the Web. ^ The overall objective of this research is to construct a model that accesses different databases easily across the net and generates SQL, forms, and reports across all platforms without requiring the developer to code a complex application. This increases the speed of development. In addition, using only the Web browsers, the end-user can retrieve data from databases remotely to make necessary modifications and manipulations of data using the Web formatted forms and reports, independent of the platform, without having to open different applications, or learn to use anything but their Web browser. We introduce a strategic method to generate and construct SQL queries, enabling inexperienced users that are not well exposed to the SQL world to build syntactically and semantically a valid SQL query and to understand the retrieved data. The generated SQL query will be validated against the database schema to ensure harmless and efficient SQL execution. (Abstract shortened by UMI.)^

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The rapid growth of the Internet and the advancements of the Web technologies have made it possible for users to have access to large amounts of on-line music data, including music acoustic signals, lyrics, style/mood labels, and user-assigned tags. The progress has made music listening more fun, but has raised an issue of how to organize this data, and more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval, i.e., the issue of efficiently helping users in locating the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre->artist->album->track, to facilitate the users’ navigation. However, the intentions of the users are often hard to be captured in such a simply organized structure. The users may want to listen to music of a particular mood, style or topic; and/or any songs similar to some given music samples. This motivated us to work on user-centric music retrieval system to improve users’ satisfaction with the system. The traditional music information retrieval research was mainly concerned with classification, clustering, identification, and similarity search of acoustic data of music by way of feature extraction algorithms and machine learning techniques. More recently the music information retrieval research has focused on utilizing other types of data, such as lyrics, user-access patterns, and user-defined tags, and on targeting non-genre categories for classification, such as mood labels and styles. This dissertation focused on investigating and developing effective data mining techniques for (1) organizing and annotating music data with styles, moods and user-assigned tags; (2) performing effective analysis of music data with features from diverse information sources; and (3) recommending music songs to the users utilizing both content features and user access patterns.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media datausers and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media datausers and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help database queries can be distributed over both local and Web data sources within MSemODB framework. ^ Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold “custom wrapper” approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis and processing. ^ Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. ^ This study confirms feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling of complex cases. ^

Relevância:

70.00% 70.00%

Publicador:

Resumo:

An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system that is capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make a large amount of transit performance data available that are valuable for transit performance management and planning purposes. However, the difficulties of extracting useful information from the huge spatial-temporal database have hindered off-line applications of the AVL data. ^ In this study, a data mining process, including data integration, cluster analysis, and multiple regression, is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. The model-based cluster method is employed to investigate the spatial and temporal patterns of transit travel speeds, which may be easily translated into travel time. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening periods are determined based on analyses of transit travel speed variations for different times of day. The seasonal patterns of transit performance are investigated by using the analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using important factors identified as having significant effects on speed for different time-of-day periods. ^ It has been found that transit performance varied from different seasons and different time-of-day periods. The geographic location of a transit route segment also plays a role in the variation of the transit performance. The results of this research indicate that advanced data mining techniques have good potential in providing automated techniques of assisting transit agencies in service planning, scheduling, and operations control. ^

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help database queries can be distributed over both local and Web data sources within MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling of complex cases.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

With the exponential growth of the usage of web-based map services, the web GIS application has become more and more popular. Spatial data index, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-sourced an Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for the end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geo spatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU and I/O intensive tiers, which make it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly are techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands: 18.91% more accurate; and efficiently allocate resources to meet the QoS target: improves the QoS by 26.19% and saves resource usages by 20.83% compared to traditional peak load-based resource allocation.