3 resultados para zeta regularization

em Illinois Digital Environment for Access to Learning and Scholarship Repository


Relevância:

10.00% 10.00%

Publicador:

Resumo:

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, in general, it is very important to incorporate it into analyzing and mining text data. In general, modeling the context in text, discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the ``first-class citizen.'' We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in a lot of real world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proved effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the ``context'' of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis is devoted to the development, synthesis, properties, and applications of nano materials for critical technologies, including three areas: (1) Microbial contamination of drinking water is a serious problem of global significance. About 51% of the waterborne disease outbreaks in the United States can be attributed to contaminated ground water. Development of metal oxide nanoparticles, as viricidal materials is of technological and fundamental scientific importance. Nanoparticles with high surface areas and ultra small particle sizes have dramatically enhanced efficiency and capacity of virus inactivation, which cannot be achieved by their bulk counterparts. A series of metal oxide nanoparticles, such as iron oxide nanoparticles, zinc oxide nanoparticles and iron oxide-silver nanoparticles, coated on fiber substrates was developed in this research for evaluation of their viricidal activity. We also carried out XRD, TEM, SEM, XPS, surface area measurements, and zeta potential of these nanoparticles. MS2 virus inactivation experiments showed that these metal oxide nanoparticle coated fibers were extremely powerful viricidal materials. Results from this research suggest that zinc oxide nanoparticles with diameter of 3.5 nm, showing an isoelectric point (IEP) at 9.0, were well dispersed on fiberglass. These fibers offer an increase in capacity by orders of magnitude over all other materials. Compared to iron oxide nanoparticles, zinc oxide nanoparticles didn’t show an improvement in inactivation kinetics but inactivation capacities did increase by two orders of magnitude to 99.99%. Furthermore, zinc oxide nanoparticles have higher affinity to viruses than the iron oxide nanoparticles in presence of competing ions. The advantages of zinc oxide depend on high surface charge density, small nanoparticle sizes and capabilities of generating reactive oxygen species. The research at its present stage of development appears to offer the best avenue to remove viruses from water. Without additional chemicals and energy input, this system can be implemented by both points of use (POU) and large-scale use water treatment technology, which will have a significant impact on the water purification industry. (2) A new family of aliphatic polyester lubricants has been developed for use in micro-electromechanical systems (MEMS), specifically for hard disk drives that operate at high spindle speeds (>15000rpm). Our program was initiated to address current problems with spin-off of the perfluoroether (PFPE) lubricants. The new polyester lubricant appears to alleviate spin-off problems and at the same time improves the chemical and thermal stability. This new system provides a low cost alternative to PFPE along with improved adhesion to the substrates. In addition, it displays a much lower viscosity, which may be of importance to stiction related problems. The synthetic route is readily scalable in case additional interest emerges in other areas including small motors. (3) The demand for increased signal transmission speed and device density for the next generation of multilevel integrated circuits has placed stringent demands on materials performance. Currently, integration of the ultra low-k materials in dual Damascene processing requires chemical mechanical polishing (CMP) to planarize the copper. Unfortunately, none of the commercially proposed dielectric candidates display the desired mechanical and thermal properties for successful CMP. A new polydiacetylene thermosetting polymer (DEB-TEB), which displays a low dielectric constant (low-k) of 2.7, was recently developed. This novel material appears to offer the only avenue for designing an ultra low k dielectric (1.85k), which can still display the desired modulus (7.7Gpa) and hardness (2.0Gpa) sufficient to withstand the process of CMP. We focused on further characterization of the thermal properties of spin-on poly (DEB-TEB) ultra-thin film. These include the coefficient of thermal expansion (CTE), biaxial thermal stress, and thermal conductivity. Thus the CTE is 2.0*10-5K-1 in the perpendicular direction and 8.0*10-6 K-1 in the planar direction. The low CTE provides a better match to the Si substrate which minimizes interfacial stress and greatly enhances the reliability of the microprocessors. Initial experiments with oxygen plasma etching suggest a high probability of success for achieving vertical profiles.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVM. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). This proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck to process large scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvement for categories with few or even no positive examples.