20 results for Pattern Analysis Statistical Modeling and Computational Learning (PASCAL)
Abstract:
In the last decade, large numbers of social media services have emerged and become widely used in people's daily lives as important tools for sharing and acquiring information. Given the substantial amount of user-contributed text on social media, methods and tools for analyzing this emerging data are needed in order to deliver meaningful information to users. Previous work on text analytics over the last several decades has focused mainly on traditional types of text such as emails, news, and academic literature, and several issues critical to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) that minimizes human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating-set-based framework and a learning-to-rank-based framework. The dominating-set-based framework can be applied to different types of summarization problems, while the learning-to-rank-based framework uses existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games, as an example of how to better utilize social media data.
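To make the dominating-set idea concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes sentences are graph nodes, that two sentences are linked when their Jaccard word overlap reaches a chosen threshold, and that a standard greedy approximation picks the summary sentences. The function names, the similarity measure, and the threshold are all illustrative assumptions.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(sentences: List[str], threshold: float) -> Dict[int, Set[int]]:
    """Link sentence i and j when their word overlap reaches the threshold.

    The similarity measure and threshold are assumptions for illustration.
    """
    tokens = [set(s.lower().split()) for s in sentences]
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def greedy_dominating_set(graph: Dict[int, Set[int]]) -> List[int]:
    """Greedy approximation: repeatedly pick the node that covers the most
    still-uncovered nodes (itself plus its neighbours)."""
    uncovered = set(graph)
    chosen: List[int] = []
    while uncovered:
        # Ties are broken in favour of the lowest sentence index.
        best = max(sorted(graph), key=lambda v: len(({v} | graph[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | graph[best]
    return chosen


if __name__ == "__main__":
    posts = [
        "goal scored by messi",
        "messi scored a goal",
        "rain delayed the match",
        "the match delayed by rain",
        "fans cheered",
    ]
    summary = greedy_dominating_set(build_graph(posts, threshold=0.3))
    print([posts[i] for i in summary])
```

Every sentence left out of the summary is similar to at least one selected sentence, which is the property that makes a dominating set a plausible stand-in for a summary.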
Abstract:
Today, over 15,000 Ion Mobility Spectrometry (IMS) analyzers are employed at security checkpoints worldwide to detect explosives and illicit drugs. Current portal IMS instruments and other electronic nose technologies detect explosives and drugs by analyzing samples containing the headspace air and loose particles residing on a surface. Canines can outperform these systems at sampling and detecting low vapor pressure explosives and drugs, such as RDX, PETN, cocaine, and MDMA, because these biological detectors target the volatile signature compounds available in the headspace rather than the non-volatile parent compounds of explosives and drugs. In this dissertation research, volatile signature compounds available in the headspace over explosive and drug samples were detected using SPME as a headspace sampling tool coupled to an IMS analyzer. A Genetic Algorithm (GA) technique was developed to optimize the operating conditions of a commercial IMS (GE Itemizer 2), leading to the successful detection of plastic explosives (Detasheet, Semtex H, and C-4) and illicit drugs (cocaine, MDMA, and marijuana). Short sampling times (from 10 s to 5 min) were adequate to extract and preconcentrate sufficient analytes (> 20 ng) representing the volatile signatures in the headspace of a 15 mL glass vial or a quart-sized can containing ≤ 1 g of the bulk explosive or drug. Furthermore, a research-grade IMS with flexibility for changing operating conditions and physical configurations was designed and fabricated to accommodate future research into different analytes or physical configurations. The design and construction of the FIU-IMS were facilitated by computer modeling and simulation of ion behavior within an IMS. The simulation method developed uses SIMION/SDS and was evaluated with experimental data collected using a commercial IMS (PCP Phemto Chem 110). The FIU-IMS instrument has performance comparable to the GE Itemizer 2 (average resolving power of 14, resolution of 3 between two drugs and two explosives, and LODs ranging from 0.7 to 9 ng). The results from this dissertation further advance the concept of targeting volatile components to presumptively detect the presence of concealed bulk explosives and drugs by SPME-IMS, and the new FIU-IMS provides a flexible platform for future IMS research projects.
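For readers unfamiliar with the two figures of merit quoted above, the conventional definitions are sketched below. The abstract does not state which definitions were used, so these are an assumption: the symbols t_d (drift time), w_{1/2} (peak full width at half maximum), and w_b (peak width at base) are introduced here for illustration only.

```latex
% Single-peak resolving power: drift time over full width at half maximum
R_p = \frac{t_d}{w_{1/2}}

% Two-peak resolution (chromatography-style, using base widths)
R_{pp} = \frac{2\,\bigl(t_{d,2} - t_{d,1}\bigr)}{w_{b,1} + w_{b,2}}
```

Under the first definition, a resolving power of 14 would correspond to a peak whose half-height width is about one fourteenth of its drift time.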
Abstract:
Chloroperoxidase (CPO) is the most versatile heme-containing enzyme, catalyzing a broad spectrum of reactions. The remarkable feature of this enzyme is the high regio- and enantioselectivity exhibited in CPO-catalyzed oxidation reactions. The aim of this dissertation is to elucidate the structural basis for regio- and enantioselective transformations and to investigate the application of CPO in the biodegradation of synthetic dyes. To unravel the mechanism of CPO-catalyzed regioselective oxidation of indole, the dissertation explored the structure of the CPO-indole complex using paramagnetic relaxation and molecular modeling. The distances between the protons of indole and the heme iron revealed that the pyrrole ring of indole is oriented toward the heme with its 2-H pointing directly at the heme iron. This provides the first experimental and theoretical explanation for the "unexpected" regioselectivity of CPO-catalyzed indole oxidation. Furthermore, residues including Leu 70, Phe 103, Ile 179, Val 182, Glu 183, and Phe 186 were found to be essential to substrate binding to CPO. These results will help guide the design of CPO mutants with tailor-made activities for biotechnological applications. To understand the origin of the enantioselectivity of CPO-catalyzed oxidation reactions, the interactions of CPO with substrates such as 2-(methylthio)thiophene were investigated by nuclear magnetic resonance (NMR) spectroscopy and computational techniques. In particular, the enantioselectivity is partly explained by the binding orientation of the substrates. In the third facet of this dissertation, a green and efficient system for the degradation of synthetic dyes was developed. Several commercial dyes such as orange G were tested in the CPO-H2O2-Cl- system, where degradation of these dyes was found to be very efficient. The presence of halide ions and an acidic pH were found to be necessary for the decomposition of the dyes. Significantly, the results revealed that this degradation of azo dyes involves a ferric hypochlorite intermediate of CPO (Fe-OCl), compound X.
Abstract:
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at the design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built from user requirements using Petri nets and is formally verified using model checking. Second, Petri net models are automatically mined from the event traces generated by scientific workflows. Third, partial order models are automatically extracted from instrumented concurrent program executions, and potential atomicity violation bugs are automatically verified against the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the worldwide effort to develop a verified software repository. Our method for mining Petri net models automatically from provenance offers a new approach to building scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real-world systems, including one that evades several other existing tools. McPatom is efficient and scalable because it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at a time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.
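The restriction to one pair of threads and one shared variable can be illustrated with a short sketch. This is not McPatom's algorithm: it only scans a single observed interleaving, whereas McPatom predicts violations from partial order models using model checking. The sketch assumes that every two consecutive accesses by the local thread are meant to be atomic, and it flags the four well-known unserializable access patterns. The trace format and the function name are hypothetical.

```python
from typing import List, Tuple

# One access to the single shared variable: (thread id, "R" or "W").
Access = Tuple[str, str]

# (local access, interleaved remote access, local access) patterns that
# cannot be reordered into an equivalent serial execution.
UNSERIALIZABLE = {
    ("R", "W", "R"),  # two local reads see different values
    ("W", "W", "R"),  # local read does not see the local write
    ("W", "R", "W"),  # remote read sees an intermediate value
    ("R", "W", "W"),  # remote write is lost
}


def find_violations(trace: List[Access], local: str, remote: str) -> List[Tuple[int, int, int]]:
    """Return (local_1, remote, local_2) index triples in the observed trace
    that match an unserializable pattern.

    Assumption: each pair of consecutive local accesses is intended to be
    atomic. Accesses by threads other than `local` and `remote` are ignored.
    """
    violations: List[Tuple[int, int, int]] = []
    prev_local = None            # index of the most recent local access
    remote_between: List[int] = []  # remote accesses seen since prev_local
    for i, (thread, op) in enumerate(trace):
        if thread == local:
            if prev_local is not None:
                for r in remote_between:
                    if (trace[prev_local][1], trace[r][1], op) in UNSERIALIZABLE:
                        violations.append((prev_local, r, i))
            prev_local = i
            remote_between = []
        elif thread == remote and prev_local is not None:
            remote_between.append(i)
    return violations


if __name__ == "__main__":
    observed = [("T1", "R"), ("T2", "W"), ("T1", "R")]
    print(find_violations(observed, local="T1", remote="T2"))
```

Checking one thread pair and one variable at a time keeps each check small, which is consistent with the scalability argument made in the abstract.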