811 results for dimension reduction


Relevance:

60.00%

Publisher:

Abstract:

This preliminary report describes work carried out as part of work package 1.2 of the MUCM research project. The report is split into two parts: the first part (Sections 1 and 2) summarises the state of the art in emulation of computer models, while the second presents some initial work on the emulation of dynamic models. In the first part, we describe the basics of emulation, introduce the notation and put together the key results for the emulation of models with single and multiple outputs, with or without the use of a mean function. In the second part, we present preliminary results on the chaotic Lorenz 63 model. We look at emulation of a single time step, and repeated application of the emulator for sequential prediction. After some design considerations, the emulator is compared with the exact simulator on a number of runs to assess its performance. Several general issues related to emulating dynamic models are raised and discussed. Current work on the larger Lorenz 96 model (40 variables) is presented in the context of dimension reduction, with results to be provided in a follow-up report. The notation used in this report is summarised in the appendix.
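The single-step emulation and sequential prediction described above can be sketched with a minimal Gaussian process emulator. This is an illustrative NumPy sketch, not the MUCM code: the squared-exponential kernel, the one-dimensional toy simulator `step`, and all hyperparameter values are assumptions for demonstration only.

```python
import numpy as np

def sq_exp_kernel(X1, X2, length=1.0, var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def fit_gp(X, y, noise=1e-6):
    """Precompute the prediction weights alpha = K^{-1} y via Cholesky."""
    K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return X, alpha

def gp_predict(model, Xs):
    """Posterior mean of the emulator at test inputs Xs."""
    X, alpha = model
    return sq_exp_kernel(Xs, X) @ alpha

# Toy simulator: one "time step" of a simple map (a stand-in for a Lorenz step).
step = lambda x: np.sin(x)

# Design: training runs of the simulator at chosen inputs.
X_train = np.linspace(-3, 3, 25)
y_train = step(X_train)
model = fit_gp(X_train, y_train)

# Sequential prediction: feed the emulator's output back in as the next input.
x = np.array([1.0])
for _ in range(5):
    x = gp_predict(model, x)
```

Each pass through the loop feeds the emulator's mean prediction back in as the next input, mirroring the repeated application of a single-step emulator for sequential prediction.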

Relevance:

60.00%

Publisher:

Abstract:

Secondary pyrolysis in fluidized bed fast pyrolysis of biomass is the focus of this work. A novel computational fluid dynamics (CFD) model coupled with a comprehensive chemistry scheme (134 species and 4169 reactions, in CHEMKIN format) has been developed to investigate this complex phenomenon. Previous results from a transient three-dimensional model of primary pyrolysis were used as the source terms of primary products in this model. A parametric study of reaction atmospheres (H2O, N2, H2, CO2, CO) has been performed. For the N2 and H2O atmospheres, results of the model compared favorably to experimentally obtained yields once the temperature was adjusted to a value higher than that used in the experiments. Notable deviations from experiments are the pyrolytic water yield and the yield of higher hydrocarbons. The model suggests that the impact of the reaction atmosphere is not overly strong; however, both chemical and physical effects were observed. Most notably, effects could be seen on the yields of various compounds, the temperature profile throughout the reactor system, residence time, radical concentration, and turbulent intensity. At the investigated temperature (873 K), turbulent intensity appeared to have the strongest influence on liquid yield. With the aid of acceleration techniques, most importantly dimension reduction, chemistry agglomeration, and in-situ tabulation, a converged solution could be obtained within a reasonable time (∼30 h). As such, a new, potentially useful method has been suggested for the numerical analysis of fast pyrolysis.

Relevance:

60.00%

Publisher:

Abstract:

Homogeneous secondary pyrolysis is a category of reactions following primary pyrolysis and is presumed important for fast pyrolysis. To couple the comprehensive chemistry with the fluid dynamics, a probability density function (PDF) approach is used, with a kinetic scheme comprising 134 species and 4169 reactions. With the aid of acceleration techniques, most importantly dimension reduction, chemistry agglomeration and in-situ tabulation (ISAT), a solution was obtained within a reasonable time. More work is required; however, a solution for levoglucosan (C6H10O5) fed through the inlet with fluidizing gas at 500 °C has been obtained. 88.6% of the levoglucosan remained non-decomposed, and 19 different decomposition product species were found above 0.01% by weight. The proposed homogeneous secondary pyrolysis scheme can thus be implemented in a CFD environment, and acceleration techniques can speed up the calculation for application in engineering settings.
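The in-situ tabulation (ISAT) acceleration mentioned above can be illustrated with a heavily simplified caching sketch. Real ISAT stores linearized mappings and grows ellipsoids of accuracy; this hypothetical `make_isat_cache` keeps only query/result pairs and a scalar tolerance, purely to show why tabulation cuts the number of expensive chemistry evaluations.

```python
import numpy as np

def make_isat_cache(f, tol=1e-2):
    """Very simplified in-situ tabulation: cache query/result pairs and
    reuse a stored result when a new query lies within `tol` of a stored
    one. (Real ISAT also stores sensitivities and regions of accuracy.)"""
    table = []  # list of (query, result) pairs

    def lookup(x):
        for xq, yq in table:
            if np.linalg.norm(x - xq) < tol:
                return yq, True          # retrieve from table
        y = f(x)                         # expensive direct evaluation
        table.append((x.copy(), y))
        return y, False                  # add to table

    return lookup, table

# "Expensive" stand-in for a chemical source-term evaluation.
calls = [0]
def source_term(x):
    calls[0] += 1
    return np.tanh(x)

lookup, table = make_isat_cache(source_term, tol=1e-3)
for x in [0.1, 0.1 + 1e-4, 0.5, 0.1]:
    lookup(np.array([x]))
# Only two direct evaluations are needed for the four queries.
```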

Relevance:

60.00%

Publisher:

Abstract:

Popular dimension reduction and visualisation algorithms, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model, rely on the assumption that input dissimilarities are Euclidean. It is well known that this assumption does not hold for most datasets: high-dimensional data often sits on a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that we no longer assume that the manifold is Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures, allowing the relative responsibilities, under a linear combination, to drive the visualisation process.
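The idea of letting a linear combination of dissimilarity measures drive the embedding can be sketched as follows. Classical MDS stands in here for the Elastic MDS used in the abstract, and the two dissimilarities and the weight `alpha` are arbitrary choices for illustration, not the method's learned responsibilities.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: double-centre the squared dissimilarities and embed
    with the top-k eigenvectors of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # largest eigenvalues first
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Two candidate dissimilarity measures on the same data.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
D_euc = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # Euclidean
D_man = np.abs(X[:, None] - X[None, :]).sum(axis=-1)       # Manhattan

# Linear combination: the weight plays the role of a responsibility.
alpha = 0.7
D = alpha * D_euc + (1 - alpha) * D_man
Y = classical_mds(D, k=2)                    # 2-D visualisation coordinates
```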

Relevance:

60.00%

Publisher:

Abstract:

As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, due both to the large size of the data sets and to their high dimensionality. This dissertation, in the same direction as other research based on MapReduce, develops effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute the CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
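The serial core of one of the scaled-up techniques, nonnegative matrix factorization, can be sketched with Lee–Seung multiplicative updates. This single-node NumPy version only illustrates the matrix products that the dissertation distributes via MapReduce; the matrix sizes, rank and iteration count below are arbitrary.

```python
import numpy as np

def nmf(V, r, iters=500, seed=0):
    """Lee–Seung multiplicative updates for V ≈ W @ H, all factors nonnegative.
    Each update is built from matrix products (W.T@V, V@H.T, ...), which is
    exactly what a MapReduce implementation parallelizes."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Exact rank-4 nonnegative test matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 15))
W, H = nmf(V, r=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)   # relative error
```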

Relevance:

60.00%

Publisher:

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, may be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, in a factor analysis model a latent Gaussian-distributed multivariate vector is assumed; with this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data and estimating population structure from genotype data.
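The factor analysis example can be made concrete: under the model x = Λz + ε, with z ~ N(0, I) and ε ~ N(0, Ψ) diagonal, the implied covariance is the low-rank-plus-diagonal matrix ΛΛᵀ + Ψ. The sketch below, with arbitrary dimensions and randomly drawn loadings, simulates from such a model and checks that the sample covariance approaches the implied one.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 10, 2, 200_000              # observed dim, latent dim, sample size

Lambda = rng.normal(size=(p, k))          # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.0, p))   # diagonal noise variances

# Generative model: x = Lambda z + eps, z ~ N(0, I_k), eps ~ N(0, Psi).
Z = rng.normal(size=(n, k))
eps = rng.normal(size=(n, p)) * np.sqrt(np.diag(Psi))
X = Z @ Lambda.T + eps

implied = Lambda @ Lambda.T + Psi         # low-rank + diagonal covariance
sample = np.cov(X, rowvar=False)
max_dev = np.abs(sample - implied).max()  # shrinks as n grows
```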

Relevance:

60.00%

Publisher:

Abstract:

This thesis introduces two related lines of study on the classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with its novel use of nonlinear classification methods, has reached beyond the state of the art in classification accuracy on commonly used benchmark HSI data [6], [13]. More importantly, it provides a clutter map, with respect to a predetermined set of classes, for real application situations where the image pixels do not necessarily fall into a predetermined set of classes to be identified, detected or classified.

The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives, c) graph spectral transformation via locally linear embedding for dimension reduction, and d) statistical ensemble for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.
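Component c), locally linear embedding (LLE) for dimension reduction, can be sketched in plain NumPy. This is the textbook Roweis–Saul algorithm applied to a toy curve, not the pipeline's implementation; the neighbourhood size and regularizer are assumed values.

```python
import numpy as np

def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Locally linear embedding: reconstruct each point from its neighbours,
    then find low-dimensional coordinates preserving those weights."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1)[:, :n_neighbors]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                  # neighbours centred on x_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()            # reconstruction weights

    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]         # skip the constant eigenvector

# Toy data: points along a 1-D manifold (a spiral segment) in 3-D.
t = np.linspace(0, 3, 60)
X = np.column_stack([np.cos(t), np.sin(t), t])
Y = lle(X, n_neighbors=8, n_components=1)      # 1-D embedding of the curve
```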

Secondly, the work extends the HSI classification pipeline from a single HSI data cube to multiple HSI data cubes. Each cube, with feature variation, is to be classified into one of multiple classes. The main challenge is deriving the cube-wise classification from the pixel-wise classification. The thesis presents an initial attempt to circumvent this and discusses the potential for further improvement.

Relevance:

60.00%

Publisher:

Abstract:

Microsecond-long Molecular Dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all-atom MD simulations of protein folding are now becoming popular. To distill features of such large folding trajectories, we must develop methods that can both compress trajectory data to enable visualization, and that lend themselves to further analysis, such as the finding of collective coordinates and reduction of the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering used in MD trajectory analysis suffers from various serious drawbacks, namely, (i) it is not data-driven, (ii) it is unstable to noise and changes in cutoff parameters, and (iii) since it does not take into account interrelationships amongst data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside, while still preserving their salient features. PCA has often been used for dimension reduction and, while it is computationally inexpensive, being a linear method it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear, and also provides a clear insight into the structural processes underlying MD trajectories.
I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding trajectories and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I make comparisons between nMDS, PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features like early formation of secondary structure, no commonalities between trajectories were found. There were no units of residues or atoms found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3, was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in each trajectory, indicating that it may not be a common transition in villin folding. All trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation. Our analysis of the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition. All the other trajectories got trapped in hydrophobically collapsed states. The NLE residues were found to be buried deeply in the core, compared to the corresponding lysines in the villin headpiece, thereby making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that the NLE mutant may not be as fast a folder as experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins, after most secondary structure elements have been formed, so that global rearrangement is easier.
I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.
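As a point of reference for the comparison above, the PCA baseline (the linear method the thesis argues compresses trajectory data poorly) can be sketched on synthetic trajectory-like data in which a few collective motions dominate the variance. All names, sizes and the synthetic "modes" here are illustrative, not derived from the villin trajectories.

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components; also return
    the per-component variances."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, S ** 2 / (len(X) - 1)

# Synthetic "trajectory": 200 frames of 30 coordinates, where most variance
# lies along two collective motions (stand-ins for folding coordinates).
rng = np.random.default_rng(0)
modes = rng.normal(size=(2, 30))                       # collective directions
amps = rng.normal(size=(200, 2)) * np.array([5.0, 3.0])  # their amplitudes
X = amps @ modes + 0.1 * rng.normal(size=(200, 30))    # plus small noise

Y, var = pca(X, k=2)
explained = var[:2].sum() / var.sum()   # fraction of variance kept in 2-D
```

When the dynamics really are dominated by a few linear modes, as here, PCA compresses well; the thesis's point is that folding trajectories are not of this form, which is where a nonlinear method like nMDS gains its advantage.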

Relevance:

60.00%

Publisher:

Abstract:

Undoubtedly, statistics has become one of the most important subjects in the modern world, where its applications are ubiquitous. The importance of statistics is not limited to statisticians, but also extends to non-statisticians who have to use statistics within their own disciplines. Several studies have indicated that most academic departments around the world have realized the importance of statistics to non-specialist students. Therefore, the number of students enrolled in statistics courses has vastly increased, coming from a variety of disciplines. Consequently, research within the scope of statistics education has been able to develop throughout the last few years. One important issue is how statistics is best taught to, and learned by, non-specialist students. This issue is controlled by several factors, such as the use of technology, the role of the English language (especially for those whose first language is not English), the effectiveness of statistics teachers and their approach towards teaching statistics courses, students' motivation to learn statistics and the relevance of statistics courses to the main subjects of non-specialist students. Several studies focused on aspects of learning and teaching statistics have been conducted in different countries around the world, particularly in Western countries. Conversely, the situation in Arab countries, especially in Saudi Arabia, is different: there is very little research in this scope, and what there is does not meet those countries' needs for developing the learning and teaching of statistics to non-specialist students. This research was instituted in order to develop the field of statistics education.
The purpose of this mixed-methods study was to generate new insights into this subject by investigating how statistics courses are currently taught to non-specialist students in Saudi universities. Hence, this study will contribute towards filling the knowledge gap that exists in Saudi Arabia. This study used multiple data collection approaches, including questionnaire surveys of 1053 non-specialist students who had completed at least one statistics course in different colleges of the universities in Saudi Arabia. These surveys were followed up with qualitative data collected via semi-structured interviews with 16 teachers of statistics from colleges within all six universities where statistics is taught to non-specialist students in Saudi Arabia's Eastern Region. The questionnaire data included several types, so different techniques were used in the analysis. Descriptive statistics were used to identify the demographic characteristics of the participants. The chi-square test was used to determine associations between variables. Based on the main issues raised in the literature review, the questions (item scales) were grouped into five key groups: 1) Effectiveness of Teachers; 2) English Language; 3) Relevance of Course; 4) Student Engagement; 5) Using Technology. Exploratory data analysis was used to explore these issues in more detail. Furthermore, given the clustering in the data (students within departments, within colleges, within universities), multilevel generalized linear models for dichotomous outcomes were used to account for the effects of clustering at those levels. Factor analysis was conducted, confirming the dimension reduction of the variables (item scales). The data from the teachers' interviews were analysed on an individual basis.
The responses were assigned to one of eight themes that emerged from the data: 1) the lack of students' motivation to learn statistics; 2) students' participation; 3) students' assessment; 4) the effective use of technology; 5) the level of previous mathematical and statistical skills of non-specialist students; 6) the English language ability of non-specialist students; 7) the need for extra time for teaching and learning statistics; and 8) the role of administrators. All the data from students and teachers indicated that the learning and teaching of statistics to non-specialist students in Saudi universities needs to be improved in order to meet the needs of those students. The findings suggested a weakness in the use of statistical software applications in these courses: there is a lack of application of technology, such as statistical software programs, that would allow non-specialist students to consolidate their knowledge. The results also indicated that the English language is considered one of the main challenges in learning and teaching statistics, particularly in institutions where English is not the main language of instruction. Moreover, the weakness of students' mathematical skills is considered another major challenge. Additionally, the results indicated a need to tailor statistics courses to the needs of non-specialist students based on their main subjects. The findings indicate that statistics teachers need to choose appropriate methods when teaching statistics courses.
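The chi-square test of association used on the questionnaire data can be illustrated on a hypothetical 2x2 contingency table; the counts below are invented for demonstration, not the study's data.

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table:
    sum over cells of (observed - expected)^2 / expected, where the
    expected counts come from the row and column margins."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / table.sum()
    return ((table - expected) ** 2 / expected).sum()

# Hypothetical table: perceived course relevance (rows: yes/no) versus
# engagement (columns: high/low).
obs = [[60, 40],
       [30, 70]]
stat = chi_square_stat(obs)
# With df = (2-1)*(2-1) = 1, compare against the chi-square critical
# value 3.84 at the 5% level to decide whether the association is significant.
```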

Relevance:

30.00%

Publisher:

Abstract:

Vegetables represent a main source of micro-nutrients which can improve the health status of the malnourished poor in the world. Spinach (Spinacia oleracea L.) is a popular leafy vegetable in many countries which is rich in several important micro-nutrients. Thus, consuming spinach helps to overcome micro-nutrient deficiencies. Pests and pathogens act as major yield constraints in food production. Root-knot nematodes, Meloidogyne species, constitute a large group of highly destructive plant pests. Spinach is found to be highly susceptible to these nematode attacks. Though agricultural production has largely benefited from modern technologies and innovations, some important dimensions which can minimize yield losses have been neglected by most growers. Pre-plant or initial nematode density in soil is a crucial biotic factor which is directly responsible for crop losses. Hence, information on pre-plant nematode densities and the corresponding damage is of vital importance to develop successful control procedures to enhance crop production. In the present study, the effect of seven initial densities of M. incognita, i.e., 156, 312, 625, 1,250, 2,500, 5,000 and 10,000 infective juveniles (IJs)/plant (equivalent to 1000 cm3 of soil), on the growth of and root infestation on potted spinach plants was determined in a screen house. In order to ensure high accuracy, root infestation was ascertained by the number of galls formed, the percentage galled length of feeder roots and galled feeder roots, and egg production, per plant. Fifty days post-inoculation, shoot length and weight, and root length, were suppressed even at the lowest IJs density. However, the pathogenic effect was most pronounced at the highest density, at which reductions of 43%, 46% and 45% in shoot length, shoot weight and root length, respectively, were recorded. The highest reduction in root weight (26%) was detected at the second highest density.
The number of galls and the percentage galled length of feeder roots per plant showed a significant progressive increase with increasing IJs density, with highest mean values of 432.3 and 54%, respectively. The two shoot growth parameters and root length showed a significant inverse relationship with increasing gall formation. Moreover, shoot and root length were shown to be mutually dependent on each other. Suppression of shoot growth of spinach greatly affects the grower's economy. Hence, control measures are essential to ensure better production of spinach by reducing the pre-plant density below the level of 0.156 IJs/cm3.

Relevance:

30.00%

Publisher:

Abstract:

This report explores how recurrent neural networks can be exploited for learning high-dimensional mappings. Since recurrent networks are as powerful as Turing machines, an interesting question is how recurrent networks can be used to simplify the problem of learning from examples. The main problem with learning high-dimensional functions is the curse of dimensionality, which roughly states that the number of examples needed to learn a function increases exponentially with the input dimension. This report proposes a way of avoiding this problem by using a recurrent network to decompose a high-dimensional function into many lower-dimensional functions connected in a feedback loop.
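The proposed decomposition can be illustrated abstractly: instead of learning one function over the full input dimension, a small step function is applied repeatedly while carrying a low-dimensional state through a feedback loop. The parity example below is a standard illustration of this idea, not taken from the report; `recurrent_apply` and `step` are hypothetical names.

```python
def recurrent_apply(f, state, inputs):
    """Feed inputs one at a time through a low-dimensional map `f`,
    carrying a small recurrent state, instead of learning one map over
    the full input dimension at once."""
    for x in inputs:
        state = f(state, x)
    return state

# n-bit parity: a function with 2^n input combinations, reduced to
# repeated application of a two-argument step function (XOR).
step = lambda s, b: s ^ b
bits = [1, 0, 1, 1, 0, 1]
parity = recurrent_apply(step, 0, bits)   # 0, since there are four 1s
```

Here a function whose sample complexity grows exponentially in n when learned monolithically is expressed through a fixed two-input function applied n times, which is the kind of dimensionality reduction through feedback that the report proposes.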

Relevance:

30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

30.00%

Publisher:

Abstract:

Includes bibliography

Relevance:

30.00%

Publisher:

Abstract:

Objective: Chronic rhinitis and adenoid hypertrophy are the main causes of nasal obstruction in children, and proper treatment of these factors seems essential for controlling nasal obstructive symptoms. This study aims to evaluate the effects of topical mometasone treatment on symptoms and on the size of adenoid tissue in children with complaints of nasal obstruction, and to compare this approach to continuous nasal saline douching plus environmental control alone. Methods: Fifty-one children with nasal obstructive complaints were submitted to a semi-structured clinical questionnaire on nasal symptoms, a prick test and nasoendoscopy. Nasoendoscopic images were digitalized, and both the adenoid and nasopharyngeal areas were measured in pixels. The adenoid/nasopharyngeal area ratio was calculated. Patients were subsequently re-evaluated at two different time points: following 40 days of treatment with nasal douching and environmental prophylaxis alone, and after a subsequent 40-day period in which topical mometasone furoate (total dose: 100 μg/day) was added. Results: Nasal symptoms and snoring significantly improved after nasal douching, and an additional gain was observed when mometasone furoate was added to the treatment. Saline douching did not influence the adenoid area, whereas a significant reduction of the adenoid tonsil was observed after 40 days of mometasone treatment (P < 0.0001). Conclusion: Nasal saline douching significantly improved nasal symptoms without interfering with adenoid dimension. In contrast, mometasone furoate significantly reduced adenoid tissue and led to a supplementary improvement of nasal symptoms. (C) 2012 Elsevier Ireland Ltd. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVE: To assess the effects of rapid maxillary expansion on facial morphology and on nasal cavity dimensions of mouth-breathing children, by acoustic rhinometry and computed rhinomanometry. METHODS: Cohort study; 29 mouth-breathing children with posterior crossbite were evaluated. Orthodontic and otorhinolaryngologic documentation was obtained at three different times, i.e., before expansion, immediately after expansion and 90 days following expansion. RESULTS: The expansion was accompanied by an increase of the maxillary and nasal bone transversal width. However, there were no significant differences in the mucosal area of the nose. Acoustic rhinometry showed no difference in the minimal cross-sectional area at the level of the valve and inferior turbinate between the periods analysed, whereas rhinomanometry showed a statistically significant reduction in nasal resistance right after expansion, although values returned to pre-treatment levels 90 days after expansion. CONCLUSION: Maxillary expansion increased the maxillary and nasal bony areas, but was inefficient in increasing the nasal mucosal area; it may lessen nasal resistance, although there was no difference in nasal geometry. Significance: nasal bony expansion is followed by mucosal compensation.