Abstract:
This preliminary report describes work carried out as part of work package 1.2 of the MUCM research project. The report is split in two parts: the first part (Sections 1 and 2) summarises the state of the art in emulation of computer models, while the second presents some initial work on the emulation of dynamic models. In the first part, we describe the basics of emulation, introduce the notation and put together the key results for the emulation of models with single and multiple outputs, with or without the use of a mean function. In the second part, we present preliminary results on the chaotic Lorenz 63 model. We look at emulation of a single time step, and repeated application of the emulator for sequential prediction. After some design considerations, the emulator is compared with the exact simulator on a number of runs to assess its performance. Several general issues related to emulating dynamic models are raised and discussed. Current work on the larger Lorenz 96 model (40 variables) is presented in the context of dimension reduction, with results to be provided in a follow-up report. The notation used in this report is summarised in the appendix.
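The single-step strategy described above lends itself to a compact illustration. Below is a minimal sketch, assuming a Gaussian process emulator (the standard MUCM tool) fitted to one time step of Lorenz 63 and then iterated for sequential prediction; the design, kernel, step size and library (scikit-learn) are illustrative choices, not those of the report.

```python
# Minimal sketch: emulate one time step of Lorenz 63 with a Gaussian
# process, then iterate the emulator for sequential prediction.
# Parameters and design are hypothetical, not the report's.
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def lorenz63(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - y) - z, x * y - beta * z]

def step(s0, dt=0.05):
    # Advance the exact simulator by one time step of length dt.
    return solve_ivp(lorenz63, (0.0, dt), s0, rtol=1e-8).y[:, -1]

rng = np.random.default_rng(0)
X = rng.uniform([-20, -25, 0], [20, 25, 50], size=(200, 3))  # design points
Y = np.array([step(s) for s in X])                           # simulator output

gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), normalize_y=True)
gp.fit(X, Y)  # one multi-output emulator of the single-step map

# Sequential prediction: repeatedly feed the emulator its own output.
s = np.array([1.0, 1.0, 20.0])
for _ in range(50):
    s = gp.predict(s.reshape(1, -1))[0]
print("emulated state after 50 steps:", s)
```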
Abstract:
Secondary pyrolysis in fluidized bed fast pyrolysis of biomass is the focus of this work. A novel computational fluid dynamics (CFD) model coupled with a comprehensive chemistry scheme (134 species and 4169 reactions, in CHEMKIN format) has been developed to investigate this complex phenomenon. Previous results from a transient three-dimensional model of primary pyrolysis were used as the source terms of primary products in this model. A parametric study of reaction atmospheres (H2O, N2, H2, CO2, CO) has been performed. For the N2 and H2O atmospheres, the results of the model compared favorably to experimentally obtained yields after the temperature was adjusted to a value higher than that used in the experiments. Notable deviations from the experiments are the pyrolytic water yield and the yield of higher hydrocarbons. The model suggests that the reaction atmosphere has a modest overall impact; however, both chemical and physical effects were observed, most notably on the yields of various compounds, the temperature profile throughout the reactor system, the residence time, radical concentrations, and turbulent intensity. At the investigated temperature (873 K), turbulent intensity appeared to have the strongest influence on liquid yield. With the aid of acceleration techniques, most importantly dimension reduction, chemistry agglomeration, and in-situ tabulation, a converged solution could be obtained within a reasonable time (∼30 h). A new, potentially useful method has thus been suggested for the numerical analysis of fast pyrolysis.
Abstract:
Homogeneous secondary pyrolysis is a category of reactions that follows primary pyrolysis and is presumed important for fast pyrolysis. To couple the comprehensive chemistry with the fluid dynamics, a probability density function (PDF) approach is used, with a kinetic scheme comprising 134 species and 4169 reactions. With the aid of acceleration techniques, most importantly dimension reduction, chemistry agglomeration and in-situ adaptive tabulation (ISAT), a solution was obtained within a reasonable time. More work is required; however, a solution has been obtained for levoglucosan (C6H10O5) fed through the inlet with the fluidizing gas at 500 °C. 88.6% of the levoglucosan remained undecomposed, and 19 different decomposition product species were found above 0.01% by weight. The proposed homogeneous secondary pyrolysis scheme can thus be implemented in a CFD environment, and acceleration techniques can speed up the calculation for application in engineering settings.
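The tabulation idea behind ISAT can be sketched compactly: expensive chemistry integrations are cached in situ and reused for nearby compositions. The toy Python sketch below illustrates only this cache-and-reuse principle; the real ISAT algorithm also stores linearizations and ellipsoids of accuracy, and all names, tolerances and the toy "chemistry" here are illustrative, not taken from the work above.

```python
# Schematic illustration of in-situ tabulation: expensive reaction-step
# evaluations are cached and reused when a query lies within a tolerance
# of a previously tabulated composition. Illustrative only.
import numpy as np

class IsatTable:
    def __init__(self, react, tol=1e-2):
        self.react = react        # expensive map: composition -> new composition
        self.tol = tol
        self.keys, self.vals = [], []

    def query(self, phi):
        for key, val in zip(self.keys, self.vals):
            if np.linalg.norm(phi - key) < self.tol:
                return val                 # retrieve: reuse a tabulated result
        val = self.react(phi)              # grow the table: direct integration
        self.keys.append(phi.copy())
        self.vals.append(val)
        return val

# Toy 'chemistry': exponential decay of each species over one CFD time step.
table = IsatTable(lambda phi: phi * np.exp(-0.1))
rng = np.random.default_rng(1)
for _ in range(1000):
    table.query(rng.uniform(0, 1, size=4))
print("direct integrations performed:", len(table.keys))  # far fewer than 1000
```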
Abstract:
Popular dimension reduction and visualisation algorithms, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model, rely on the assumption that input dissimilarities are Euclidean. It is well known that this assumption does not hold for most datasets: high-dimensional data often sits on a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that we no longer assume that the manifold is Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures, allowing the relative responsibilities, under a linear combination, to drive the visualisation process.
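As a point of reference for the problem being addressed, the sketch below shows a common workaround: feeding an MDS embedding non-Euclidean, graph-geodesic dissimilarities instead of raw Euclidean distances. It is not the paper's Elastic MDS or its learned combination of measures; the dataset, neighbour count and libraries (scikit-learn, SciPy) are illustrative assumptions.

```python
# Sketch: driving an MDS embedding with non-Euclidean (graph-geodesic)
# dissimilarities computed over a k-nearest-neighbour graph.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

X, _ = make_swiss_roll(n_samples=400, random_state=0)
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")
D = shortest_path(graph, method="D", directed=False)  # geodesic dissimilarities

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)   # 2-D chart that respects the manifold geometry
print(Z.shape)             # (400, 2)
```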
Abstract:
As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, due both to the large size of the data sets and to their high dimensionality. This dissertation, in the same direction as other MapReduce-based research, develops effective techniques and applications using MapReduce that help solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system: aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up in MapReduce to factorize matrices with dimensions in the order of millions, based on different matrix multiplication implementations. Two algorithms, which compute the CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
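The nonnegative matrix factorization being scaled up reduces, in its classic multiplicative-update form, to a handful of matrix products, which is exactly the operation MapReduce implementations distribute. The single-machine NumPy sketch below shows only that computation (Lee-Seung updates); it is not the dissertation's distributed code, and all sizes are illustrative.

```python
# Sketch of NMF multiplicative updates: every step is a matrix product,
# the operation that gets distributed in a MapReduce implementation.
import numpy as np

def nmf(A, k, iters=200, eps=1e-9):
    rng = np.random.default_rng(0)
    m, n = A.shape
    W = rng.uniform(size=(m, k))
    H = rng.uniform(size=(k, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # multiplicative update of H
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # multiplicative update of W
    return W, H

A = np.abs(np.random.default_rng(1).normal(size=(100, 50)))  # toy data
W, H = nmf(A, k=5)
print("reconstruction error:", np.linalg.norm(A - W @ H))
```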
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, may be continuous values or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; under this assumption, the factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data and estimating population structure from genotype data.
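The low-rank structure mentioned above can be made concrete with the standard factor-analysis model, given here in its textbook form (the dissertation's own notation and extensions may differ):

```latex
% Standard factor model: a k-dimensional latent Gaussian vector f drives
% p observed variables x, yielding a low-rank-plus-diagonal covariance.
x = \Lambda f + \varepsilon, \qquad
f \sim \mathcal{N}(0, I_k), \qquad
\varepsilon \sim \mathcal{N}(0, \Psi),\ \Psi \ \text{diagonal},
\qquad\Longrightarrow\qquad
\operatorname{Cov}(x) = \Lambda\Lambda^{\top} + \Psi .
```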
Abstract:
This thesis introduces two related lines of study on the classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with its novel use of nonlinear classification methods, has reached beyond the state of the art in classification accuracy on commonly used benchmarking HSI data [6], [13]. More importantly, it provides a clutter map with respect to a predetermined set of classes, addressing real application situations where image pixels do not necessarily fall into the predetermined set of classes to be identified, detected or classified.
The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives, c) graph spectral transformation via locally linear embedding for dimension reduction, and d) statistical ensemble for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.
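Component c) in particular admits a very short sketch. The snippet below applies locally linear embedding (LLE) to pixel spectra for dimension reduction, with synthetic data standing in for a real HSI cube; the neighbour count, target dimension and library (scikit-learn) are illustrative assumptions, not the pipeline's actual settings.

```python
# Minimal sketch of component c): locally linear embedding (LLE) for
# dimension reduction of pixel spectra before classification.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
spectra = rng.normal(size=(1000, 200))       # 1000 pixels, 200 spectral bands

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=10)
features = lle.fit_transform(spectra)        # 10-D features fed to a classifier
print(features.shape)                        # (1000, 10)
```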
Secondly, the work extends the HSI classification pipeline from a single HSI data cube to multiple HSI data cubes, each of which, with feature variation, is to be classified into multiple classes. The main challenge is deriving a cube-wise classification from the pixel-wise classification. The thesis presents an initial attempt to address this and discusses the potential for further improvement.
Abstract:
Microsecond-long Molecular Dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all-atom MD simulations of protein folding are now becoming popular. To distill the features of such large folding trajectories, we must develop methods that can compress trajectory data to enable visualization and that lend themselves to further analysis, such as finding collective coordinates and reducing the dimensionality of the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering as used in MD trajectory analysis suffers from serious drawbacks: (i) it is not data driven, (ii) it is unstable to noise and to changes in cutoff parameters, and (iii) since it does not take into account interrelationships amongst data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside while still preserving their salient features. PCA has often been used for dimension reduction, and while it is computationally inexpensive, being a linear method it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear, and which also provides clear insight into the structural processes underlying MD trajectories. I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding trajectories and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I compare nMDS with PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features, like the early formation of secondary structure, no commonalities between trajectories were found, and no units of residues or atoms were found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3, was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in each trajectory, indicating that it may not be a common transition in villin folding. All trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation. Our analysis of the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition; all the others became trapped in hydrophobically collapsed states.
The NLE residues were found to be buried deeply in the core, compared to the corresponding lysines in the villin headpiece, making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that NLE may not be the fast folder that experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins after most secondary structure elements have formed, so that global rearrangement is easier. I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.
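The core nMDS step is compact enough to sketch: embed trajectory frames from a matrix of pairwise frame-frame distances using nonmetric MDS. In the sketch below random coordinates stand in for real MD frames, plain Euclidean frame distances stand in for a structural metric such as RMSD, and scikit-learn's nonmetric MDS stands in for the thesis's own implementation; all of these are assumptions for illustration.

```python
# Sketch of the core nMDS step: nonmetric MDS on a pairwise distance
# matrix of trajectory frames, producing a low-dimensional map.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
frames = rng.normal(size=(300, 3 * 35))      # 300 frames x 35 atoms (toy data)
D = squareform(pdist(frames))                # pairwise frame-frame distances

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
Z = nmds.fit_transform(D)                    # 2-D map of the trajectory
print(Z.shape)                               # (300, 2)
```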
Abstract:
Undoubtedly, statistics has become one of the most important subjects in the modern world, where its applications are ubiquitous. The importance of statistics is not limited to statisticians but also extends to non-statisticians who have to use statistics within their own disciplines. Several studies have indicated that most academic departments around the world have realized the importance of statistics to non-specialist students. Therefore, the number of students enrolled in statistics courses has vastly increased, coming from a variety of disciplines, and research within the scope of statistics education has developed throughout the last few years. One important issue is how statistics is best taught to, and learned by, non-specialist students. This issue is governed by several factors that affect the learning and teaching of statistics to non-specialist students, such as the use of technology, the role of the English language (especially for those whose first language is not English), the effectiveness of statistics teachers and their approach towards teaching statistics courses, students' motivation to learn statistics and the relevance of statistics courses to the main subjects of non-specialist students. Several studies focused on aspects of learning and teaching statistics have been conducted in different countries around the world, particularly in Western countries. Conversely, the situation in Arab countries, especially in Saudi Arabia, is different: there is very little research in this scope, and what there is does not meet those countries' needs for developing the learning and teaching of statistics to non-specialist students. This research was instituted in order to develop the field of statistics education. The purpose of this mixed-methods study was to generate new insights into this subject by investigating how statistics courses are currently taught to non-specialist students in Saudi universities; the study thereby contributes towards filling the knowledge gap that exists in Saudi Arabia. The study used multiple data collection approaches, including questionnaire surveys of 1053 non-specialist students who had completed at least one statistics course in different colleges of the universities in Saudi Arabia. These surveys were followed up with qualitative data collected via semi-structured interviews with 16 teachers of statistics from colleges within all six universities where statistics is taught to non-specialist students in Saudi Arabia's Eastern Region. The questionnaire data included several types, so different techniques were used in the analysis. Descriptive statistics were used to identify the demographic characteristics of the participants, and the chi-square test was used to determine associations between variables. Based on the main issues raised in the literature review, the questions (item scales) were grouped into five key groups: 1) Effectiveness of Teachers; 2) English Language; 3) Relevance of Course; 4) Student Engagement; 5) Using Technology. Exploratory data analysis was used to explore these issues in more detail. Furthermore, given the clustering in the data (students within departments, within colleges, within universities), multilevel generalized linear models for dichotomous outcomes were used to account for the effects of clustering at those levels.
Factor analysis was conducted, confirming the dimension reduction of the variables (item scales). The data from the teachers' interviews were analysed on an individual basis. The responses were assigned to one of eight themes that emerged from the data: 1) the lack of students' motivation to learn statistics; 2) students' participation; 3) students' assessment; 4) the effective use of technology; 5) the level of previous mathematical and statistical skills of non-specialist students; 6) the English language ability of non-specialist students; 7) the need for extra time for teaching and learning statistics; and 8) the role of administrators. All the data from students and teachers indicated that the learning and teaching of statistics to non-specialist students in Saudi universities needs to be improved in order to meet the needs of those students. The findings suggested a weakness in the use of statistical software applications in these courses: there is a lack of application of technology, such as statistical software programs, that would allow non-specialist students to consolidate their knowledge. The results also indicated that the English language is considered one of the main challenges in learning and teaching statistics, particularly in institutions where English is not the main language of instruction, and that students' weak mathematical skills are another major challenge. Additionally, the results indicated a need to tailor statistics courses to the needs of non-specialist students based on their main subjects, and that statistics teachers need to choose appropriate methods when teaching statistics courses.
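The dimension-reduction step mentioned above (factor analysis on the item scales) can be sketched in a few lines. The snippet below uses synthetic Likert-type responses and scikit-learn's FactorAnalysis purely for illustration; the study's actual data, item count and factor-analysis procedure are not reproduced.

```python
# Illustrative sketch: factor analysis on Likert-type item scales to
# check that items group into the five named factors. Synthetic data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(1053, 25)).astype(float)  # 25 items

fa = FactorAnalysis(n_components=5, random_state=0)
scores = fa.fit_transform(responses)        # factor scores per student
loadings = fa.components_.T                 # item-by-factor loading matrix
print(loadings.shape)                       # (25, 5)
```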
Abstract:
This thesis describes a discrete component of a larger mixed-method (survey and interview) study that explored the health-promotion and risk-reduction practices of younger premenopausal survivors of ovarian, breast and haematological cancers. This thesis outlines my distinct contribution to the larger study, which was to: (1) Produce a literature review that thoroughly explored all longer-term breast cancer treatment outcomes, and which outlined the health risks to survivors associated with these; (2) Describe and analyse the health-promotion and risk-reduction behaviours of nine younger female survivors of breast cancer as articulated in the qualitative interview dataset; and (3) Test the explanatory power of the Precede-Proceed theoretical framework underpinning the study in relation to the qualitative data from the breast cancer cohort. The thesis reveals that breast cancer survivors experienced many adverse outcomes as a result of treatment. While they generally engaged in healthy lifestyle practices, a lack of knowledge about many recommended health behaviours emerged throughout the interviews. The participants also described significant internal and external pressures to behave in certain ways because of the social norms surrounding the disease. This thesis also reports that the Precede-Proceed model is a generally robust approach to data collection, analysis and interpretation in the context of breast cancer survivorship. It provided plausible explanations for much of the data in this study. However, profound sociological and psychological implications arose during the analysis that were not effectively captured or explained by the theories underpinning the model. A sociological filter—such as Turner’s explanation of the meaning of the body and embodiment in the social sphere (Turner, 2008)—and the psychological concerns teased out in Mishel’s (1990) Uncertainty in Illness Theory, provided a useful dimension to the findings generated through the Precede-Proceed model. The thesis concludes with several recommendations for future research, clinical practice and education in this context.
Abstract:
We computed Higuchi's fractal dimension (FD) of resting, eyes-closed EEG recorded from 30 scalp locations in 18 male neuroleptic-naive, recent-onset schizophrenia (NRS) subjects and 15 male healthy control (HC) subjects, group-matched for age. Schizophrenia patients showed a diffuse reduction of FD except in the bilateral temporal and occipital regions, with the reduction being most prominent bifrontally. The positive-symptom (PS) schizophrenia subjects showed FD values similar to or even higher than HC in the bilateral temporo-occipital regions, along with the co-existent bifrontal FD reduction noted in the overall NRS sample. In contrast, this increase in FD values in the bilateral temporo-occipital region was absent in the negative-symptom (NS) subgroup. The regional differences in complexity suggested by these findings may reflect the aberrant brain dynamics underlying the pathophysiology of schizophrenia and its symptom dimensions. Higuchi's method of measuring FD directly in the time domain provides an alternative to the more computationally intensive nonlinear methods of estimating EEG complexity.
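Higuchi's method is computed directly in the time domain, as the abstract notes, and is short enough to sketch. Below is a standard textbook implementation of the algorithm (not the authors' exact code); the kmax value and test signal are illustrative.

```python
# Sketch of Higuchi's algorithm for the fractal dimension of a time
# series: curve lengths at multiple subsampling scales k, with FD taken
# as the slope of log L(k) against log(1/k).
import numpy as np

def higuchi_fd(x, kmax=8):
    x = np.asarray(x, dtype=float)
    N = len(x)
    lengths = []
    for k in range(1, kmax + 1):
        Lk = []
        for m in range(k):
            idx = np.arange(m, N, k)          # subsampled curve at scale k
            n = len(idx) - 1                  # number of increments
            if n < 1:
                continue
            # Total absolute increment, normalised for the subsampling.
            L = np.abs(np.diff(x[idx])).sum() * (N - 1) / (n * k * k)
            Lk.append(L)
        lengths.append(np.mean(Lk))
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)),
                          np.log(lengths), 1)
    return slope

rng = np.random.default_rng(0)
print(higuchi_fd(rng.normal(size=1000)))   # white noise: FD close to 2
```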
Abstract:
The spectra of molecules oriented in liquid crystalline media are dominated by partially averaged dipolar couplings. In the 13C–1H HSQC experiment, there is a considerable loss of resolution due to inefficient heteronuclear dipolar decoupling in the indirect dimension, normally carried out using a π pulse. Furthermore, in such strongly orienting media the 1H–1H and 13C–1H dipolar couplings lead to fast dephasing of the transverse magnetization, causing inefficient polarization transfer and hence a loss of sensitivity in the indirect dimension. In this study we have carried out a 13C–1H HSQC experiment with efficient polarization transfer from 1H to 13C for molecules aligned in liquid crystalline media. Homonuclear dipolar decoupling using FFLG during the INEPT transfer delays and during the evolution period, combined with π-pulse heteronuclear decoupling in the t1 period, has been applied. The studies showed a significant reduction in the partially averaged dipolar couplings and thereby an enhancement of the resolution and sensitivity in the indirect dimension. This has been demonstrated on pyridazine and pyrimidine oriented in the liquid crystal. The two closely resonating carbons in pyrimidine are better resolved in the present study than in earlier work [H.S. Vinay Deepak, Anu Joy, N. Suryaprakash, Determination of natural abundance 15N–1H and 13C–1H dipolar couplings of molecules in a strongly orienting media using two-dimensional inverse experiments, Magn. Reson. Chem. 44 (2006) 553–565].
Abstract:
A dynamical instability is observed in experimental studies on micro-channels of rectangular cross-section, with smallest dimension 100 and 160 μm, in which one of the walls is made of soft gel. There is a spontaneous transition from an ordered, laminar flow to a chaotic and highly mixed flow state when the Reynolds number increases beyond a critical value. The critical Reynolds number, which decreases as the elasticity modulus of the soft wall is reduced, is as low as 200 for the softest wall used here (in contrast to 1200 for a rigid-walled channel). The instability onset is observed through the breakup of a dye stream introduced in the centre of the micro-channel, as well as through the onset of wall oscillations detected by laser scattering from fluorescent beads embedded in the wall of the channel. The mixing time across a channel of width 1.5 mm, measured by dye-stream and outlet-conductance experiments, is smaller by a factor of 10^5 than that for laminar flow. The increased mixing rate comes at very little cost, because the pressure drop (the energy requirement to drive the flow) increases continuously and modestly at transition. The deformed shape is reconstructed numerically, and computational fluid dynamics (CFD) simulations are carried out to obtain the pressure gradient and the velocity fields for different flow rates. The pressure difference across the channel predicted by the simulations agrees with the experiments (within experimental errors) for flow rates where the dye stream is laminar, but the experimental pressure difference is higher than the simulation prediction after dye-stream breakup. A linear stability analysis is carried out using the parallel-flow approximation, in which the wall is modelled as a neo-Hookean elastic solid and the CFD results for the mean velocity and pressure gradient are used as inputs. The stability analysis accurately predicts the Reynolds number (based on flow rate) at which the instability is observed in the dye stream, and it also predicts that the instability first takes place at the downstream converging section of the channel, not at the upstream diverging section. The stability analysis further indicates that the destabilization is due to the modification of the flow and the local pressure gradient by the wall deformation; if a parabolic velocity profile is assumed, with the pressure gradient given by the plane Poiseuille law, the flow is always found to be stable.
Abstract:
Reduction approaches are presented for the vibration control of symmetric, cyclic periodic and linking structures. The condensation of generalized coordinates, the locations of sensors and actuators, and the relation between system inputs and control forces are assumed to be set in a symmetric way, so that the control system possesses the same symmetry as the structure considered. By employing proper transformations of the condensed generalized coordinates and the system inputs, the vibration control of the entire system can be implemented by controlling a number of sub-structures, and thus the dimension of the control problem can be significantly reduced.
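For the cyclic periodic case, the decoupling behind this reduction has a well-known concrete form: the system matrices are circulant and are diagonalised by the discrete Fourier transform, so the coupled equations split into independent single-substructure problems. The toy sketch below shows only this matrix fact; the paper's transformations of sensor/actuator placements and control inputs are not reproduced, and the stiffness values are illustrative.

```python
# Sketch: a circulant stiffness matrix of a cyclic periodic structure is
# diagonalised by the unitary DFT, decoupling the substructure equations.
import numpy as np
from scipy.linalg import circulant

n = 8                                      # identical substructures in a ring
K = circulant([2.0, -1.0] + [0.0] * (n - 3) + [-1.0])  # nearest-neighbour coupling

F = np.fft.fft(np.eye(n)) / np.sqrt(n)     # unitary DFT matrix
K_modal = F.conj().T @ K @ F               # diagonal in symmetry coordinates
print(np.round(np.abs(K_modal), 10))       # off-diagonal entries vanish
```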
Abstract:
In this thesis we study Galois representations corresponding to abelian varieties with certain reduction conditions. We show that these conditions force the image of the representations to be "big," so that the Mumford-Tate conjecture (:= MT) holds. We also prove that the set of abelian varieties satisfying these conditions is dense in a corresponding moduli space.
The main results of the thesis are the following two theorems.
Theorem A: Let A be an absolutely simple abelian variety with End⁰(A) = k, an imaginary quadratic field, and g = dim(A). Assume either dim(A) ≤ 4, or that A has bad reduction at some prime ϕ with the dimension of the toric part of the reduction equal to 2r, where gcd(r, g) = 1 and (r, g) ≠ (15, 56) or (m−1, m(m+1)/2). Then MT holds.
Theorem B: Let M be the moduli space of abelian varieties with fixed polarization, level structure and a k-action. It is defined over a number field F. The subset of M(Q) corresponding to absolutely simple abelian varieties with a prescribed stable reduction at a large enough prime ϕ of F is dense in M(C) in the complex topology. In particular, the set of simple abelian varieties having bad reductions with fixed dimension of the toric parts is dense.
Besides this, we also established the following results:
(1) MT holds for some other classes of abelian varieties with similar reduction conditions. For example, if A is an abelian variety with End⁰(A) = Q and the dimension of the toric part of its reduction is prime to dim(A), then MT holds.
(2) MT holds for Ribet-type abelian varieties.
(3) The Hodge and the Tate conjectures are equivalent for abelian 4-folds.
(4) MT holds for abelian 4-folds of type II, III, IV (Theorem 5.0(2)) and some 4-folds of type I.
(5) For some abelian varieties either MT or the Hodge conjecture holds.