4 results for Data Migration Processes Modeling
in Illinois Digital Environment for Access to Learning and Scholarship Repository
Abstract:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and location at which a blog article is written or the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex contexts, such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how to improve the quality of a search engine by optimizing the search results for particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Because context information significantly affects the choices of topics and language made by authors, it is very important to incorporate it into the analysis and mining of text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task that we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, a new paradigm of text mining that treats context information as a "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text.
This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in many real-world applications. We further introduce general components of contextual topic analysis: adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements of the general contextual topic model naturally lead to a variety of probabilistic models that incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are shown to be effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns that generates meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further demonstrate the effectiveness of contextual text mining techniques quantitatively on large-scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
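As a rough illustration of the generative view described in this abstract, the sketch below assumes (this is not taken from the thesis) that each context has its own mixing weights over a shared set of topic word distributions, so that the probability of a word in a context is a weighted sum over topics. The topics, contexts, and probabilities are all invented for illustration.

```python
# Hypothetical toy data: topic names, contexts and numbers are made up.
TOPIC_WORDS = {
    "battery": {"battery": 0.6, "life": 0.3, "screen": 0.1},
    "display": {"battery": 0.1, "life": 0.1, "screen": 0.8},
}
CONTEXT_TOPICS = {
    "positive": {"battery": 0.2, "display": 0.8},
    "negative": {"battery": 0.7, "display": 0.3},
}


def word_prob(word, context):
    """p(word | context) = sum over topics of p(topic | context) * p(word | topic)."""
    weights = CONTEXT_TOPICS[context]
    return sum(weights[t] * TOPIC_WORDS[t].get(word, 0.0) for t in weights)


def topic_posterior(word, context):
    """p(topic | word, context) by Bayes' rule, as in one E-step of a mixture model."""
    weights = CONTEXT_TOPICS[context]
    joint = {t: weights[t] * TOPIC_WORDS[t].get(word, 0.0) for t in weights}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("word has zero probability under every topic")
    return {t: v / total for t, v in joint.items()}
```

In this toy setup the word "battery" is more probable in the negative context than in the positive one, which is the kind of contextual pattern the abstract refers to.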
Abstract:
The analysis of steel and composite frames has traditionally been carried out by idealizing beam-to-column connections as either rigid or pinned. Although some advanced analysis methods have been proposed to account for semi-rigid connections, the performance of these methods strongly depends on the proper modeling of connection behavior. The primary challenge of modeling beam-to-column connections is their inelastic response and continuously varying stiffness, strength, and ductility. In this dissertation, two distinct approaches, mathematical models and informational models, are proposed to account for the complex hysteretic behavior of beam-to-column connections. The performance of the two approaches is examined, followed by a discussion of their merits and deficiencies. To capitalize on the merits of both mathematical and informational representations, a new approach, a hybrid modeling framework, is developed and demonstrated through modeling beam-to-column connections. Component-based modeling is a compromise spanning two extremes in the field of mathematical modeling: simplified global models and finite element models. In the component-based modeling of angle connections, five critical components of excessive deformation are identified. Constitutive relationships of angles, column panel zones, and contact between angles and column flanges are derived by using only material and geometric properties and theoretical mechanics considerations. Those of slip and bolt hole ovalization are simplified by empirically suggested mathematical representations and expert opinions. A mathematical model is then assembled as a macro-element by combining rigid bars and springs that represent the constitutive relationships of the components. Lastly, the moment-rotation curves of the mathematical models are compared with those of experimental tests.
In the case of a top-and-seat angle connection with double web angles, a pinched hysteretic response is predicted quite well by complete mechanical models, which rely only on material and geometric properties. On the other hand, to exhibit the highly pinched behavior of a top-and-seat angle connection without web angles, a mathematical model requires components of slip and bolt hole ovalization, which are more amenable to informational modeling. An alternative method is informational modeling, which constitutes a fundamental shift from mathematical equations to data that contain the required information about underlying mechanics. The information is extracted from observed data and stored in neural networks. Two different training data sets, analytically generated and experimental, are tested to examine the performance of informational models. Both informational models show acceptable agreement with the moment-rotation curves of the experiments. Adding a degradation parameter improves the informational models when modeling highly pinched hysteretic behavior. However, informational models cannot represent the contribution of individual components and therefore do not provide insight into the underlying mechanics of components. In this study, a new hybrid modeling framework is proposed. In the hybrid framework, a conventional mathematical model is complemented by the informational methods. The basic premise of the proposed hybrid methodology is that not all features of system response are amenable to mathematical modeling, which motivates the use of informational alternatives. This may be because (i) the underlying theory is not available or not sufficiently developed, or (ii) the existing theory is too complex and therefore not suitable for modeling within building frame analysis. The role of informational methods is to model aspects that the mathematical model leaves out.
The autoprogressive algorithm and self-learning simulation extract the missing aspects from a system response. In a hybrid framework, experimental data is an integral part of modeling, rather than being used strictly for validation. The potential of the hybrid methodology is illustrated through modeling the complex hysteretic behavior of beam-to-column connections. Mechanics-based components of deformation, such as angles, flange-plates, and the column panel zone, are idealized in a mathematical model by using a complete mechanical approach. Although the mathematical model represents envelope curves in terms of initial stiffness and yielding strength, it is not capable of capturing the pinching effects. Pinching is caused mainly by separation between angles and column flanges as well as slip between angles/flange-plates and beam flanges. These components of deformation are suitable for informational modeling. Finally, the moment-rotation curves of the hybrid models are validated against those of the experimental tests. The comparison shows that the hybrid models are capable of representing the highly pinched hysteretic behavior of beam-to-column connections. In addition, the developed hybrid model is successfully used to predict the behavior of a newly designed connection.
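The assembly of component springs and rigid bars into a macro-element can be sketched for the initial (elastic) rotational stiffness only. The sketch below assumes, as a simplification not stated in the dissertation, that the components in one row act as linear springs in series and that each row sits at a lever arm z from the center of rotation, so that for small rotations the rotational stiffness is the sum of k_eq * z**2 over the rows. The function names and numbers are hypothetical, and no hysteresis or pinching is modeled.

```python
def series_stiffness(stiffnesses):
    """Equivalent stiffness of linear springs in series: 1/k_eq = sum(1/k_i)."""
    if not stiffnesses or any(k <= 0 for k in stiffnesses):
        raise ValueError("at least one positive stiffness is required")
    return 1.0 / sum(1.0 / k for k in stiffnesses)


def rotational_stiffness(rows):
    """Initial rotational stiffness of a macro-element.

    rows: list of (lever_arm, [component stiffnesses in that row]).
    A spring at lever arm z elongates by z * theta for a small rotation theta,
    so it contributes a moment of k_eq * z**2 * theta.
    """
    return sum(series_stiffness(ks) * z ** 2 for z, ks in rows)
```

For example, `rotational_stiffness([(0.5, [4.0, 4.0])])` combines two equal springs in series (equivalent stiffness 2.0) at a lever arm of 0.5, in whatever consistent units the caller chooses.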
Abstract:
The protein lysate array is an emerging technology for quantifying the protein concentration ratios in multiple biological samples. It is gaining popularity, and has the potential to answer questions about post-translational modifications and protein pathway relationships. Statistical inference for a parametric quantification procedure has been inadequately addressed in the literature, mainly due to two challenges: the increasing dimension of the parameter space and the need to account for dependence in the data. Each chapter of this thesis addresses one of these issues. In Chapter 1, an introduction to protein lysate array quantification is presented, followed by the motivations and goals for this thesis work. In Chapter 2, we develop a multi-step procedure for sigmoidal models, ensuring consistent estimation of the concentration level with full asymptotic efficiency. The results obtained in this chapter justify inferential procedures based on large-sample approximations. Simulation studies and real data analysis are used to illustrate the performance of the proposed method in finite samples. The multi-step procedure is simpler in both theory and computation than the single-step least squares method that has been used in current practice. In Chapter 3, we introduce a new model that accounts for the dependence structure of the errors through a nonlinear mixed effects model. We consider a method to approximate the maximum likelihood estimator of all the parameters. Using simulation studies on various error structures, we show that for data with non-i.i.d. errors the proposed method leads to more accurate estimates and better confidence intervals than the existing single-step least squares method.
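To make the idea of a sigmoidal response concrete, the sketch below uses a four-parameter logistic curve relating log concentration to observed intensity, and inverts it to recover a concentration once the curve parameters are known. The specific curve form and the parameter names (`low`, `high`, `slope`, `midpoint`) are assumptions made for illustration; this is not the thesis's multi-step estimator, only the kind of curve such a procedure might fit.

```python
import math


def sigmoid_response(log_conc, low, high, slope, midpoint):
    """Four-parameter logistic: intensity as a function of log concentration."""
    return low + (high - low) / (1.0 + math.exp(-slope * (log_conc - midpoint)))


def estimate_log_conc(intensity, low, high, slope, midpoint):
    """Invert the curve for known parameters; defined only for low < intensity < high."""
    if not low < intensity < high:
        raise ValueError("intensity lies outside the range of the curve")
    frac = (intensity - low) / (high - low)
    # frac / (1 - frac) equals exp(slope * (log_conc - midpoint)).
    return midpoint + math.log(frac / (1.0 - frac)) / slope
```

Intensities near `low` or `high` fall on the flat parts of the curve, where small measurement errors translate into large errors in the recovered concentration.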
Abstract:
Language provides an interesting lens through which to look at state-building processes because of its cross-cutting nature. For example, in addition to its symbolic value and appeal, a national language has other roles in the process, including: (a) becoming the primary medium of communication, which permits the nation to function efficiently in its political and economic life, (b) promoting social cohesion, allowing the nation to develop a common culture, and (c) forming a primordial basis for self-determination. Moreover, because of this cross-cutting nature, language interventions are rarely isolated activities. Languages are adopted by speakers, taking root in and spreading between communities because they are legitimated by legislation, and are then reproduced through institutions like the education and military systems. Pádraig Ó Riagáin (1997) makes a case for this, observing that “Language policy is formulated, implemented, and accomplishes its results within a complex interrelated set of economic, social, and political processes which include, inter alia, the operation of other non-language state policies” (p. 45). In the Turkish case, the foundational role of language in the formation of the Turkish nation-state, together with its linkages to human rights issues, raises interesting questions about how socio-cultural practices become reproduced through institutional infrastructure formation. This dissertation is a country-level case study looking at Turkey’s nation-state building process through the lens of its language and education policy development processes, with a focus on the early years of the Republic between 1927 and 1970. This project examines how different groups self-identified or were identified (as the case may be) in official Turkish statistical publications (e.g., the Turkish annual statistical yearbooks and the population censuses) during the period when language and ethnicity data was made publicly available.
The overarching questions this dissertation explores include: 1. What were the geo-political conditions surrounding the development of, and influencing, the Turkish government’s language and education policies? 2. Are there any observable patterns in the geo-spatial distribution of language, literacy, and education participation rates over time? In what ways are these traditionally linked variables (language, literacy, education participation) problematic? 3. What do changes in population identifiers, e.g., language and ethnicity, suggest about the government’s approach towards nation-state building through the construction of a civic Turkish identity and institution building? Archival secondary source data was digitized and aggregated by categories relevant to this project at national and provincial levels and over the course of time (primarily between 1927 and 2000). The data was then re-aggregated into values that could be longitudinally compared and then layered on aspatial administrative maps. This dissertation contributes to the existing body of social policy literature by taking an interdisciplinary approach to the larger socio-economic contexts in which language and education policies are produced.