14 results for Multiple-scale processing
in Digital Commons at Florida International University
Abstract:
Background: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may itself be deemed too time-consuming, especially for unfamiliar data types and formats. This may lead to wasted analysis time and the discarding of potentially useful data. Results: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations that have both low overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. Conclusions: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. 
We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.
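The pre-rendered, multi-scale approach described above can be illustrated with a small tile-pyramid sketch. It assumes the 256-pixel tiles and power-of-two zoom levels commonly used by web map viewers; the function names and image sizes are hypothetical, and this is not the authors' implementation.

```python
import math

TILE_SIZE = 256  # assumption: standard web-map tile edge, in pixels


def max_zoom(width: int, height: int) -> int:
    """Smallest zoom level whose tile grid covers the full-resolution image."""
    longest = max(width, height)
    return max(0, math.ceil(math.log2(longest / TILE_SIZE)))


def tile_for_pixel(x: int, y: int, zoom: int, top_zoom: int) -> tuple[int, int]:
    """Map a full-resolution pixel to the (column, row) of its tile at `zoom`."""
    scale = 2 ** (top_zoom - zoom)  # each level down halves the resolution
    return (x // (scale * TILE_SIZE), y // (scale * TILE_SIZE))
```

Under these assumptions, a 4096 × 2048 pixel rendering needs zoom levels 0 through 4, and the viewer only has to fetch the few tiles that intersect the current viewport at the current zoom.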
Abstract:
As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, because of both the large size of the data sets and their high dimensionality. Like other MapReduce-based research, this dissertation aims to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
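To make the idea of matrix multiplication in the MapReduce style concrete, here is a minimal single-process sketch. It imitates a map phase and a reduce phase over sparse `(row, column, value)` triples; the function names are hypothetical, and this is a generic textbook pattern, not one of the dissertation's implementations.

```python
from collections import defaultdict


def map_phase(a_entries, b_entries):
    """Join A and B on the shared index j and emit ((i, k), partial product)."""
    a_by_j = defaultdict(list)
    for i, j, value in a_entries:
        a_by_j[j].append((i, value))
    for j, k, b_value in b_entries:
        for i, a_value in a_by_j.get(j, []):
            yield (i, k), a_value * b_value


def reduce_phase(pairs):
    """Sum the partial products that share the same output cell (i, k)."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def multiply(a_entries, b_entries):
    """Compute C = A x B from sparse triples via a map step and a reduce step."""
    return reduce_phase(map_phase(a_entries, b_entries))
```

In a real cluster the emitted key-value pairs would be shuffled across machines by key, so how the data are partitioned determines how much of the work stays local.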
Abstract:
The purpose of this study was to investigate the effects of direct instruction in story grammar on the reading and writing achievement of second graders. Three aspects of story grammar (character, setting, and plot) were taught with direct instruction using the concept development technique of deep processing. Deep processing, which included (a) visualization (the drawing of pictures), (b) verbalization (the writing of sentences), (c) the attachment of physical sensations, and (d) the attachment of emotions to concepts, was used to help students make the mental connections necessary for recall and application of character, setting, and plot when constructing meaning in reading and writing. Four existing classrooms consisting of seventy-seven second-grade students were randomly assigned to two treatments, experimental and comparison. Both groups were pretested and posttested for reading achievement using the Gates-MacGinitie Reading Tests. Pretest and posttest writing samples were collected and evaluated. Writing achievement was measured using (a) a primary trait scoring scale (an adapted version of the Glazer Narrative Composition Scale) and (b) a holistic scoring scale by R. J. Pritchard. ANCOVAs were performed on the posttests adjusted for the pretests to determine whether or not the methods differed. There was no significant improvement in reading after the eleven-day experimental period for either group, nor did the two groups differ. There was significant improvement in writing for the experimental group over the comparison group. Pretreatment and posttreatment interviews were selectively collected to evaluate qualitatively whether the students were able to identify and manipulate elements of story grammar and to determine patterns in metacognitive processing. 
Interviews provided evidence that most students in the experimental group gained, while most students in the comparison group did not gain, in their ability to manipulate, with understanding, the concepts of character, setting, and plot.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from only a single source. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. 
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
The Deccan Trap basalts are the remnants of a massive series of lava flows that erupted at the K/T boundary and covered 1-2 million km² of west-central India. This eruptive event is of global interest because of its possible link to the major mass extinction event, and there is much debate about the duration of this massive volcanic event. In contrast to isotopic or paleomagnetic dating methods, I explore an alternative approach to determine the lifecycle of the magma chambers that supplied the lavas, and extend the concept to obtain a tighter constraint on Deccan’s duration. My method relies on extracting time information from elemental and isotopic diffusion across zone boundaries in individual crystals. I determined elemental and Sr-isotopic variations across abnormally large (2-5 cm) plagioclase crystals from the Thalghat and Kashele “Giant Plagioclase Basalts” from the lowermost Jawhar and Igatpuri Formations respectively in the thickest Western Ghats section near Mumbai. I also obtained bulk rock major, trace and rare earth element chemistry of each lava flow from the two formations. Thalghat flows contain only 12% zoned crystals, with ⁸⁷Sr/⁸⁶Sr ratios of 0.7096 in the core and 0.7106 in the rim, separated by a sharp boundary. In contrast, all Kashele crystals have a wider range of ⁸⁷Sr/⁸⁶Sr values, with multiple zones. Geochemical modeling of the data suggests that the two types of crystals grew in distinct magmatic environments. Modeling intracrystalline diffusive equilibration between the core and rim of Thalghat crystals led me to obtain a crystal growth rate of 2.03 × 10⁻¹⁰ cm/s and a residence time of 780 years for the crystals in the magma chamber(s). Employing some assumptions based on field and geochronologic evidence, I extrapolated this residence time to the entire Western Ghats and obtained an estimate of 25,000–35,000 years for the duration of Western Ghats volcanism. 
This gave an eruptive rate of 30–40 km³/yr, which is much higher than that of any presently erupting volcano. This result will remain speculative until a similarly detailed analytical-modeling study is performed for the rest of the Western Ghats formations.
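As a rough consistency check on the figures quoted above (not a calculation taken from the study), multiplying the growth rate by the residence time gives a crystal dimension. The check assumes a constant growth rate and a year of about 3.156 × 10⁷ s:

```latex
\[
2.03\times10^{-10}\ \mathrm{cm\,s^{-1}}
\times 780\ \mathrm{yr}
\times 3.156\times10^{7}\ \mathrm{s\,yr^{-1}}
\approx 5.0\ \mathrm{cm}
\]
```

Under those assumptions the result matches the upper end of the 2-5 cm crystal sizes reported.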
Abstract:
Carbon nanotubes (CNTs) could serve as reinforcement for metal matrix composites to improve their mechanical properties. However, dispersion of CNTs in the matrix has been a longstanding problem, since they tend to form clusters to minimize their surface area. The aim of this study was to use plasma and cold spraying techniques to synthesize a CNT-reinforced aluminum composite with improved dispersion and to quantify the degree of CNT dispersion as it influences the mechanical properties. A novel spray drying method was used to disperse CNTs in Al-12 wt.% Si prealloyed powder, which was used as feedstock for plasma and cold spraying. A new method for quantification of CNT distribution was developed. Two parameters for CNT dispersion quantification, namely the Dispersion Parameter (DP) and the Clustering Parameter (CP), have been proposed based on image analysis and the distance between the centers of CNTs. Nanomechanical properties were correlated with the dispersion of CNTs in the microstructure. Coating microstructure evolution has been discussed in terms of splat formation, deformation and damage of CNTs, and the CNT/matrix interface. The effect of Si and CNT content on the reaction at the CNT/matrix interface was studied thermodynamically and kinetically. A pseudo phase diagram was computed which predicts the interfacial carbide for the reaction between CNT and Al-Si alloy at the processing temperature. Kinetic aspects showed that Al4C3 forms with Al-12 wt.% Si alloy while SiC forms with Al-23 wt.% Si alloy. Mechanical properties at the nano, micro and macro scales were evaluated using nanoindentation and nanoscratch, microindentation, and bulk tensile testing respectively. Nano- and micro-scale mechanical properties (elastic modulus, hardness and yield strength) displayed improvement, whereas macro-scale mechanical properties were poor. 
The inversion of the mechanical properties at different length scales was attributed to porosity, CNT clustering, CNT-splat adhesion and Al4C3 formation at the CNT/matrix interface. The Dispersion Parameter (DP) was more sensitive than the Clustering Parameter (CP) in measuring the degree of CNT distribution in the matrix.
Abstract:
Parallel processing is prevalent in many manufacturing and service systems. Many manufactured products are built and assembled from several components fabricated in parallel lines. An example of this manufacturing system configuration is observed at a manufacturing facility equipped to assemble and test web servers. Characteristics of a typical web server assembly line are multiple products, job circulation, and parallel processing. The primary objective of this research was to develop analytical approximations to predict performance measures of manufacturing systems with job failures and parallel processing. The analytical formulations extend previous queueing models used in assembly manufacturing systems in that they can handle serial and different configurations of parallel processing with multiple product classes, and job circulation due to random part failures. In addition, appropriate correction terms obtained via regression analysis were added to the approximations in order to reduce the error between the analytical approximations and the simulation models. Markovian and general type manufacturing systems, with multiple product classes, job circulation due to failures, and fork and join systems to model parallel processing were studied. In the Markovian and general cases, the approximations without correction terms performed quite well for one- and two-product problem instances. However, it was observed that the flow time error increased as the number of products and net traffic intensity increased. Therefore, correction terms for single and fork-join stations were developed via regression analysis to deal with more than two products. The numerical comparisons showed that the approximations perform remarkably well when the correction factors were used in the approximations. In general, the average flow time error was reduced from 38.19% to 5.59% in the Markovian case, and from 26.39% to 7.23% in the general case. 
All the equations stated in the analytical formulations were implemented as a set of Matlab scripts. By using this set, operations managers of web server assembly lines, manufacturing or other service systems with similar characteristics can estimate different system performance measures and make judicious decisions, especially in setting delivery due dates, capacity planning, and bottleneck mitigation.
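As an illustration of how job circulation due to failures enters a queueing model, the sketch below computes the mean flow time for a single M/M/1 station where failed jobs rejoin the queue. This is a standard textbook building block with hypothetical names, written in Python rather than Matlab; it is not one of the dissertation's approximations and does not cover fork-join stations or multiple product classes.

```python
def mm1_flow_time_with_rework(arrival_rate, service_rate, fail_prob):
    """Mean total time in system for an M/M/1 station where each job
    fails with probability fail_prob after service and rejoins the queue."""
    if not 0 <= fail_prob < 1:
        raise ValueError("fail_prob must be in [0, 1)")
    # Recirculated jobs inflate the traffic the server actually sees.
    effective_rate = arrival_rate / (1 - fail_prob)
    if effective_rate >= service_rate:
        raise ValueError("station is unstable: utilization >= 1")
    time_per_visit = 1 / (service_rate - effective_rate)  # M/M/1 mean sojourn
    expected_visits = 1 / (1 - fail_prob)  # geometric number of passes
    return expected_visits * time_per_visit
```

For example, with an arrival rate of 1 job per hour, a service rate of 4 jobs per hour, and a 50% failure probability, each job makes two passes on average and the mean flow time is 1 hour.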
Abstract:
Standard economic theory suggests that capital should flow from rich countries to poor countries. However, capital has predominantly flowed to rich countries. The three essays in this dissertation attempt to explain this phenomenon. The first two essays suggest theoretical explanations for why capital has not flowed to the poor countries. The third essay empirically tests the theoretical explanations. The first essay examines the effects of increasing returns to scale on international lending and borrowing with moral hazard. Introducing increasing returns in a two-country general equilibrium model yields possible multiple equilibria and helps explain the possibility of capital flows from a poor to a rich country. I find that a borrowing country may need to borrow sufficient amounts internationally to reach a minimum investment threshold in order to invest domestically. The second essay examines how a poor country may invest in sectors with low productivity because of sovereign risk, and how collateral differences across sectors may exacerbate the problem. I model sovereign borrowing with a two-sector economy: one sector with increasing returns to scale (IRS) and one sector with diminishing returns to scale (DRS). Countries with incomes below a threshold will only invest in the DRS sector, and countries with incomes above a threshold will invest mostly in the IRS sector. The results help explain the existence of a bimodal world income distribution. The third essay empirically tests the explanations for why capital has not flowed from the rich to the poor countries, with a focus on institutions and initial capital. I find that institutional variables are a very important factor, but in contrast to other studies, I show that institutions do not account for the Lucas Paradox. Evidence of increasing returns still exists, even when controlling for institutions and other variables. 
In addition, I find that the determinants of capital flows may depend on whether a country is rich or poor.
Abstract:
The freshwater Everglades is a complex system containing thousands of tree islands embedded within a marsh-grassland matrix. The tree island-marsh mosaic is shaped and maintained by hydrologic, edaphic and biological mechanisms that interact across multiple scales. Preserving tree islands requires a more integrated understanding of how scale-dependent phenomena interact in the larger freshwater system. The hierarchical patch dynamics paradigm provides a conceptual framework for exploring multi-scale interactions within complex systems. We used a three-tiered approach to examine the spatial variability and patterning of nutrients in relation to site parameters within and between two hydrologically defined Everglades landscapes: the freshwater Marl Prairie and the Ridge and Slough. Results were scale-dependent and complexly interrelated. Total carbon and nitrogen patterning were correlated with organic matter accumulation, driven by hydrologic conditions at the system scale. Total and bioavailable phosphorus were most strongly related to woody plant patterning within landscapes, and were found to be 3 to 11 times more concentrated in tree island soils compared to surrounding marshes. Below canopy resource islands in the slough were elongated in a downstream direction, indicating soil resource directional drift. Combined multi-scale results suggest that hydrology plays a significant role in landscape patterning and also the development and maintenance of tree islands. Once developed, tree islands appear to exert influence over the spatial distribution of nutrients, which can reciprocally affect other ecological processes.
Abstract:
Understanding habitat selection and movement remains a key question in behavioral ecology. Yet, obtaining a sufficiently high spatiotemporal resolution of the movement paths of organisms remains a major challenge, despite recent technological advances. Observing fine-scale movement and habitat choice decisions in the field can prove to be difficult and expensive, particularly in expansive habitats such as wetlands. We describe the application of passive integrated transponder (PIT) systems to field enclosures for tracking detailed fish behaviors in an experimental setting. PIT systems have been applied to habitats with clear passageways, at fixed locations or in controlled laboratory and mesocosm settings, but their use in unconfined habitats and field-based experimental setups remains limited. In an Everglades enclosure, we continuously tracked the movement and habitat use of PIT-tagged centrarchids across three habitats of varying depth and complexity using multiple flatbed antennas for 14 days. Fish used all three habitats, with marked species-specific diel movement patterns across habitats, and short-lived movements that would likely be missed by other tracking techniques. Findings suggest that the application of PIT systems to field enclosures can be an insightful approach for gaining continuous, undisturbed and detailed movement data in unconfined habitats, and for experimentally manipulating both internal and external drivers of these behaviors.
Abstract:
Simarouba glauca, a non-edible oilseed crop native to South Florida, is gaining popularity as a feedstock for the production of biodiesel. The University of Agricultural Sciences in Bangalore, India has developed a biodiesel production model based on the principles of decentralization, small scales, and multiple fuel sources. The success of such a program depends on conversion efficiencies at multiple stages. The conversion efficiency of the field-level, decentralized production model was compared with the in-laboratory conversion efficiency benchmark. The study indicated that the conversion efficiency of the field-level model was lower than that of the lab-scale setup. The fuel qualities and characteristics of the Simarouba glauca biodiesel were tested and found to meet the standards required for fuel designation. However, this research suggests that further investigation is still required for Simarouba glauca to be widely accepted as a biodiesel feedstock.