5 results for Efficient dominating set

in Digital Commons at Florida International University


Relevance:

80.00%

Publisher:

Abstract:

In the last decade, large numbers of social media services have emerged and become widely used in people's daily lives as important tools for sharing and acquiring information. With a substantial amount of user-contributed text data on social media, it has become necessary to develop methods and tools for analyzing this emerging data, so that it can be used to deliver meaningful information to users. Previous work on text analytics over the last several decades has focused mainly on traditional types of text such as emails, news, and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) to minimize human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based framework can be applied to different types of summarization problems, while the learning-to-rank based framework helps use existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games, as an example of how to better utilize social media data.
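The abstract does not say how the dominating set based framework is built, so the following is only a minimal sketch of the general idea, not the dissertation's method. It assumes sentences are vertices, that two sentences are linked when their Jaccard word overlap reaches a hypothetical threshold of 0.3, and that a summary is a small set of sentences such that every other sentence is linked to at least one of them. The function names and the greedy heuristic are illustrative choices.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(sentences, threshold=0.3):
    """Link sentences whose token overlap meets the (assumed) threshold."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def greedy_dominating_set(neighbors):
    """Greedy approximation of a small dominating set.

    Repeatedly picks the vertex that covers the most not-yet-dominated
    vertices (itself plus its neighbors); ties go to the lowest index.
    """
    undominated = set(neighbors)
    chosen = []
    while undominated:
        best = max(
            sorted(neighbors),
            key=lambda v: len(({v} | neighbors[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | neighbors[best]
    return chosen


def summarize(sentences, threshold=0.3):
    """Return the sentences selected by the greedy dominating set."""
    graph = build_graph(sentences, threshold)
    return [sentences[i] for i in greedy_dominating_set(graph)]
```

Calling `summarize` on a list of posts returns one representative per cluster of near-duplicate sentences. Finding a minimum dominating set is NP-hard in general, which is why the sketch settles for a greedy approximation.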

Relevance:

80.00%

Publisher:

Abstract:

In the last decade, large numbers of social media services have emerged and become widely used in people's daily lives as important tools for sharing and acquiring information. With a substantial amount of user-contributed text data on social media, it has become necessary to develop methods and tools for analyzing this emerging data, so that it can be used to deliver meaningful information to users. Previous work on text analytics over the last several decades has focused mainly on traditional types of text such as emails, news, and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) to minimize human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based framework can be applied to different types of summarization problems, while the learning-to-rank based framework helps use existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games, as an example of how to better utilize social media data.

Relevance:

30.00%

Publisher:

Abstract:

Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in applications such as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem by using a new transaction scheduling technique that allows a large transaction to commit safely with a significantly greater probability, which can exceed that of regular optimistic concurrency algorithms by several orders of magnitude. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented. This dissertation also proposes a new query optimization technique, lazy queries. Lazy queries are an adaptive query execution scheme that optimizes itself as the query runs. They can be used to find an intersection of sub-queries very efficiently, requiring neither full execution of large sub-queries nor any statistical knowledge about the data. An efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be used effectively in a variety of database types. In particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. This algorithm ensures serializability and external consistency by using logical clocks and backward validation of transactional queries. A formal proof of correctness of the proposed algorithm is also presented.
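To make the terms "backward validation" and "logical clocks" concrete, here is a minimal sketch of textbook backward-validating optimistic concurrency. It is not the scheduling technique or the B-tree algorithm proposed in the dissertation; the `Store` and `Transaction` classes and their methods are hypothetical names introduced for illustration.

```python
class Store:
    """Toy key-value store with backward-validating optimistic commits."""

    def __init__(self):
        self.data = {}
        self.clock = 0        # logical clock, advanced on every commit
        self.committed = []   # (commit_time, frozenset of written keys)

    def begin(self):
        """Start a transaction stamped with the current logical time."""
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, start_time):
        self.store = store
        self.start_time = start_time
        self.read_set = set()
        self.writes = {}      # buffered writes, applied only on commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_set.add(key)
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Return True on success, False if validation fails."""
        # Backward validation: compare our read set with the write sets
        # of transactions that committed after we started.
        for commit_time, written in self.store.committed:
            if commit_time > self.start_time and written & self.read_set:
                return False  # conflict; the caller should retry
        self.store.clock += 1
        self.store.data.update(self.writes)
        self.store.committed.append(
            (self.store.clock, frozenset(self.writes))
        )
        return True
```

In this plain scheme every key a transaction reads is a chance to collide with a concurrent writer, so a transaction with a large read set aborts often. That is the weakness the abstract attributes to regular optimistic concurrency algorithms.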

Relevance:

30.00%

Publisher:

Abstract:

Cytochrome P450 monooxygenases, one of the most important classes of heme-thiolate proteins, have attracted considerable interest in the biochemical community because of their catalytic versatility, substrate diversity, and the large number of members in the superfamily. Although P450s are capable of catalyzing numerous difficult oxidation reactions, their relatively low stability, low turnover rates, and need for electron-donating cofactors have limited their practical biotechnological and pharmaceutical applications as isolated enzymes. The goal of this study is to tailor such heme-thiolate proteins into efficient biocatalysts with high specificity and selectivity by protein engineering, and to better understand the structure-function relationship in cytochromes P450. In the effort to engineer P450cam, the prototype member of the P450 superfamily, into an efficient peroxygenase that utilizes hydrogen peroxide via the “peroxide-shunt” pathway, site-directed mutagenesis has been used to elucidate the critical roles of hydrophobic residues in the active site. Various biophysical, biochemical, and spectroscopic techniques have been used to investigate the wild-type and mutant proteins. Three important P450cam variants were obtained, showing distinct structural and functional features. In the P450camV247H mutant, which exhibited spectral properties almost identical to those of the wild type, it is demonstrated that a single amino acid switch turned the monooxygenase into an efficient peroxidase by increasing the peroxidase activity nearly one thousand-fold. In order to tune the distal pocket of P450cam with polar residues, Leu 246 was replaced with a basic residue, lysine, resulting in a mutant with spectral features identical to P420, the inactive species of P450. However, this inactive-species-like mutant showed catalytic activity without the aid of any cofactors.
By substituting Gly 248 with a histidine, a novel Cys-Fe-His ligation set was obtained in P450cam, representing a very rare case of His ligation in heme-thiolate proteins. In addition to serving as a convenient model for hemoprotein structural studies, the G248H mutant also provided evidence about the nature of the axial ligand in cytochrome P420 and other engineered hemoproteins with thiolate ligations. Furthermore, attempts have been made to replace the proximal ligand in sperm whale myoglobin to construct a heme-thiolate protein model by mimicking the protein environment of cytochrome P450cam and chloroperoxidase.

Relevance:

30.00%

Publisher:

Abstract:

Cytochrome P450 monooxygenases, one of the most important classes of heme-thiolate proteins, have attracted considerable interest in the biochemical community because of their catalytic versatility, substrate diversity, and the large number of members in the superfamily. Although P450s are capable of catalyzing numerous difficult oxidation reactions, their relatively low stability, low turnover rates, and need for electron-donating cofactors have limited their practical biotechnological and pharmaceutical applications as isolated enzymes. The goal of this study is to tailor such heme-thiolate proteins into efficient biocatalysts with high specificity and selectivity by protein engineering, and to better understand the structure-function relationship in cytochromes P450. In the effort to engineer P450cam, the prototype member of the P450 superfamily, into an efficient peroxygenase that utilizes hydrogen peroxide via the “peroxide-shunt” pathway, site-directed mutagenesis has been used to elucidate the critical roles of hydrophobic residues in the active site. Various biophysical, biochemical, and spectroscopic techniques have been used to investigate the wild-type and mutant proteins. Three important P450cam variants were obtained, showing distinct structural and functional features. In the P450camV247H mutant, which exhibited spectral properties almost identical to those of the wild type, it is demonstrated that a single amino acid switch turned the monooxygenase into an efficient peroxidase by increasing the peroxidase activity nearly one thousand-fold. In order to tune the distal pocket of P450cam with polar residues, Leu 246 was replaced with a basic residue, lysine, resulting in a mutant with spectral features identical to P420, the inactive species of P450. However, this inactive-species-like mutant showed catalytic activity without the aid of any cofactors.
By substituting Gly 248 with a histidine, a novel Cys-Fe-His ligation set was obtained in P450cam, representing a very rare case of His ligation in heme-thiolate proteins. In addition to serving as a convenient model for hemoprotein structural studies, the G248H mutant also provided evidence about the nature of the axial ligand in cytochrome P420 and other engineered hemoproteins with thiolate ligations. Furthermore, attempts have been made to replace the proximal ligand in sperm whale myoglobin to construct a heme-thiolate protein model by mimicking the protein environment of cytochrome P450cam and chloroperoxidase.