2 resultados para Real-world problem
em Illinois Digital Environment for Access to Learning and Scholarship Repository
Resumo:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, in general, it is very important to incorporate it into analyzing and mining text data. In general, modeling the context in text, discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the ``first-class citizen.'' We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in a lot of real world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proved effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the ``context'' of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
Resumo:
The U.S. railroad companies spend billions of dollars every year on railroad track maintenance in order to ensure safety and operational efficiency of their railroad networks. Besides maintenance costs, other costs such as train accident costs, train and shipment delay costs and rolling stock maintenance costs are also closely related to track maintenance activities. Optimizing the track maintenance process on the extensive railroad networks is a very complex problem with major cost implications. Currently, the decision making process for track maintenance planning is largely manual and primarily relies on the knowledge and judgment of experts. There is considerable potential to improve the process by using operations research techniques to develop solutions to the optimization problems on track maintenance. In this dissertation study, we propose a range of mathematical models and solution algorithms for three network-level scheduling problems on track maintenance: track inspection scheduling problem (TISP), production team scheduling problem (PTSP) and job-to-project clustering problem (JTPCP). TISP involves a set of inspection teams which travel over the railroad network to identify track defects. It is a large-scale routing and scheduling problem where thousands of tasks are to be scheduled subject to many difficult side constraints such as periodicity constraints and discrete working time constraints. A vehicle routing problem formulation was proposed for TISP, and a customized heuristic algorithm was developed to solve the model. The algorithm iteratively applies a constructive heuristic and a local search algorithm in an incremental scheduling horizon framework. The proposed model and algorithm have been adopted by a Class I railroad in its decision making process. Real-world case studies show the proposed approach outperforms the manual approach in short-term scheduling and can be used to conduct long-term what-if analyses to yield managerial insights. PTSP schedules capital track maintenance projects, which are the largest track maintenance activities and account for the majority of railroad capital spending. A time-space network model was proposed to formulate PTSP. More than ten types of side constraints were considered in the model, including very complex constraints such as mutual exclusion constraints and consecution constraints. A multiple neighborhood search algorithm, including a decomposition and restriction search and a block-interchange search, was developed to solve the model. Various performance enhancement techniques, such as data reduction, augmented cost function and subproblem prioritization, were developed to improve the algorithm. The proposed approach has been adopted by a Class I railroad for two years. Our numerical results show the model solutions are able to satisfy all hard constraints and most soft constraints. Compared with the existing manual procedure, the proposed approach is able to bring significant cost savings and operational efficiency improvement. JTPCP is an intermediate problem between TISP and PTSP. It focuses on clustering thousands of capital track maintenance jobs (based on the defects identified in track inspection) into projects so that the projects can be scheduled in PTSP. A vehicle routing problem based model and a multiple-step heuristic algorithm were developed to solve this problem. Various side constraints such as mutual exclusion constraints and rounding constraints were considered. The proposed approach has been applied in practice and has shown good performance in both solution quality and efficiency.