959 results for Scale-up
Abstract:
A new generation of surface plasmonic optical fibre sensors is fabricated using multiple coatings deposited on a lapped section of a single-mode fibre. Post-deposition UV laser irradiation through a phase mask produces a nano-scale surface relief grating structure resembling nano-wires. The individual corrugations are approximately 14 μm long, with an average full width at half maximum of 100 nm. Evidence is presented that these surface structures result from material compaction in the silicon dioxide and germanium layers of the multi-layered coating, and that the surface topology is capable of supporting localised surface plasmons. The coating compaction induces a strain gradient in the D-shaped optical fibre that generates an asymmetric periodic refractive index profile, which enhances the coupling of light from the fibre core to plasmons on the coating surface. Experimental data are presented that show changes in spectral characteristics after UV processing and improved sensor performance relative to the pre-irradiation state. The enhanced performance is illustrated with respect to changes in external refractive index, demonstrating high spectral sensitivities in gaseous and aqueous index regimes of up to 4000 nm/RIU in wavelength and 800 dB/RIU in intensity. The devices generate surface plasmons over a very large wavelength range (visible to 2 μm), depending on the polarization state of the illuminating light. © 2013 SPIE.
Abstract:
Novel surface plasmonic optical fiber sensors have been fabricated using multiple coatings deposited on a lapped section of a single-mode fiber. UV laser irradiation through a phase mask produces a nano-scale surface relief grating structure resembling nano-wires. The resulting individual corrugations, produced by material compaction, are approximately 20 μm long with an average full width at half maximum of 100 nm, and generate localized surface plasmons. Experimental data are presented that show changes in the spectral characteristics after UV processing, coupled with an overall increase in the sensitivity of the devices to the surrounding refractive index. Evidence is presented that there is an optimum UV dosage (48 J) beyond which no significant additional optical change is observed. The devices are characterized with respect to changes in refractive index, where significantly high spectral sensitivities in the aqueous index regime are found, ranging up to 4000 nm/RIU in wavelength and 800 dB/RIU in intensity. © 2013 Optical Society of America.
Abstract:
GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs efficiently on a single PC. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. Using the novel technique of parallel sliding windows (PSW) to load subgraphs from disk into memory for vertex and edge updates, it achieves data processing performance close to, and sometimes better than, that of mainstream distributed graph engines. The GraphChi authors note, however, that memory is not effectively utilized for large datasets, which leads to suboptimal computation performance. In this paper, motivated by the concepts of 'pin' from TurboGraph and 'ghost' from GraphLab, we propose a new memory utilization mode for GraphChi, called Part-in-memory mode, to improve the performance of GraphChi algorithms. The main idea is to pin a fixed part of the data in memory during the whole computing process. Part-in-memory mode was implemented with only about 40 additional lines of code in the original GraphChi engine. Extensive experiments were performed with large real datasets (including a Twitter graph with 1.4 billion edges). The preliminary results show that the Part-in-memory memory management approach reduces GraphChi running time by up to 60% for the PageRank algorithm. Interestingly, a larger portion of data pinned in memory does not always lead to better performance when the whole dataset cannot fit in memory: there exists an optimal portion of data to keep in memory for the best computational performance.
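The pinning idea can be sketched in a few lines. The snippet below is an illustrative toy, not GraphChi's actual implementation: a fixed fraction of edge shards is loaded once and kept resident, while the remaining shards are re-read from disk on every PageRank iteration. The function name, `load_shard` callback and shard layout are invented for the example.

```python
def pagerank_part_in_memory(num_vertices, edge_shards, pinned_fraction=0.5,
                            load_shard=None, iterations=20, d=0.85):
    """edge_shards: list of shard ids; load_shard(sid) -> list of (src, dst) edges."""
    n_pinned = int(len(edge_shards) * pinned_fraction)
    # Pin the first shards in memory once; the rest are re-read on every pass.
    pinned = {sid: load_shard(sid) for sid in edge_shards[:n_pinned]}

    out_deg = [0] * num_vertices
    for sid in edge_shards:
        for src, _ in (pinned[sid] if sid in pinned else load_shard(sid)):
            out_deg[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        nxt = [(1.0 - d) / num_vertices] * num_vertices
        for sid in edge_shards:
            # Disk I/O happens only for shards that were not pinned.
            edges = pinned[sid] if sid in pinned else load_shard(sid)
            for src, dst in edges:
                nxt[dst] += d * rank[src] / out_deg[src]
        rank = nxt
    return rank
```

The trade-off the paper reports appears here as the balance between `pinned_fraction` (less I/O per iteration) and the memory left over for the streamed shards' working set.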
Abstract:
In this paper, we investigate the design of few-mode fibers (FMFs) guiding 4 to 12 non-degenerate linearly polarized (LP) modes with low differential mode delay (DMD) over the C-band, suitable for long-haul transmission. The refractive index profile considered is composed of a graded core with a cladding trench (GCCT). The optimization of the profile parameters aims at the lowest possible DMD together with macro-bend losses (MBL) below the ITU-T standard recommendation. The optimization results show that the optimum DMD and the MBL scale with the number of modes. Additionally, it is shown that the refractive-index relative difference at the core center is one of the most influential parameters, allowing the DMD to be reduced at the expense of increased MBL. Finally, the optimum DMD obtained for 12 LP modes is lower than 3 ps/km. © 2014 IEEE.
Abstract:
When machining a large-scale aerospace part, the part is normally located and clamped firmly until a set of features has been machined. When the part is released, its size and shape may deform beyond the tolerance limits due to stress release. This paper presents the design of a new fixing method and flexible fixtures that automatically respond to workpiece deformation during machining. Deformation is inspected and monitored on-line, and part location and orientation can be adjusted in a timely manner to ensure that follow-up operations are carried out under low stress and with respect to the related datums defined in the design models.
Abstract:
Large-scale mechanical products, such as aircraft and rockets, consist of large numbers of small components, which complicates assembly accuracy analysis and error estimation. Planar surfaces, as key product characteristics, are usually utilised for positioning small components in the assembly process. This paper focuses on assembly accuracy analysis of small components with planar surfaces in large-volume products. To evaluate the accuracy of the assembly system, an error propagation model for measurement error and fixture error is proposed, based on the assumption that all errors are normally distributed. In this model, the general coordinate vector is adopted to represent the position of the components. The error transmission functions are simplified into a linear model, and the coordinates of the reference points are composed of a theoretical value and a random error. The installation of a Head-Up Display is taken as an example to analyse the assembly error of small components based on the propagation model. The result shows that the final coordination accuracy is mainly determined by the measurement error of the planar surface on small components. To reduce the uncertainty of the plane measurement, an evaluation index of measurement strategy is presented. This index reflects the distribution of the sampling point set and can be calculated from an inertia moment matrix. Finally, a practical application is introduced to validate the evaluation index.
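The linearised propagation step described above (small, zero-mean normal errors pushed through a linear transmission function) follows the standard covariance rule Cov(y) = J Σ Jᵀ. The sketch below illustrates that rule with invented numbers, not data or matrices from the paper.

```python
# Generic linearised error propagation: if assembly errors x ~ N(0, Sigma) and
# the (simplified linear) transmission function gives y = J @ x, then
# Cov(y) = J @ Sigma @ J.T. All values below are illustrative.
import numpy as np

def propagate_covariance(J, Sigma):
    """Covariance of y = J @ x when x has covariance Sigma."""
    return J @ Sigma @ J.T

# Two independent error sources (e.g. measurement and fixture) mapped into a
# 2-D position error of a reference point.
J = np.array([[1.0, 0.5],
              [0.0, 1.2]])
Sigma = np.diag([0.04, 0.01])        # source variances (independent errors)
Cov_y = propagate_covariance(J, Sigma)
std_y = np.sqrt(np.diag(Cov_y))      # 1-sigma position uncertainties
```

Comparing the diagonal of `Cov_y` term by term shows which source dominates the final accuracy, mirroring the paper's finding that planar-surface measurement error dominates.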
Abstract:
Climate warming is predicted to lengthen the growing season by as much as 30% for regions of the arctic tundra. This will have a significant effect on the physiological activity of the vascular plant species and the ecosystem as a whole. The need to understand possible physiological change within this ecosystem is confounded by the fact that research in this extreme environment has been limited to periods when conditions are most favorable, mid-June to mid-August. This study attempted to develop the most comprehensive understanding to date of the physiological activity of seven tundra plant species in the Alaskan Arctic under natural and lengthened growing season conditions. Four interrelated lines of research, scaling from cellular signals to ecosystem processes, set the foundation for this study.
I established an experiment examining the physiological response of arctic sedges to soil temperature stress, with emphasis on the role of the hormone abscisic acid (ABA). A manipulation was also developed in which the growing season was lengthened and soils were warmed, in an attempt to determine the maximum physiological capacity of these seven vascular species. Additionally, the physiological capacities of four evergreens were tested in the subnivean environment, along with the potential role anthocyanins play in their activity. The measurements were scaled up to determine the physiological role of these evergreens in maintaining ecosystem carbon fluxes.
These studies determined that soil temperature differentials significantly affect vascular plant physiology. ABA appears to be a physiological modifier that limits stomatal processes when root temperatures are low. Photosynthetic capacity was limited by internal plant physiological mechanisms in the face of a lengthened growing season. Therefore, shifts in ecosystem carbon dynamics are driven by changes in species composition and biomass production on a per-unit-area basis.
These studies also found that changes in soil temperature will have a greater effect on physiological processes than would the same magnitude of change in air temperature. The subnivean environment exhibits conditions that are favorable for photosynthetic activity in evergreen species. These measurements, when scaled to the ecosystem, indicate a significant role in limiting the system's carbon source capacity.
Abstract:
As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models such as MapReduce, due both to the large size of the data sets and to their high dimensionality. This dissertation, in the same direction as other MapReduce-based research, develops effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system; aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques for handling data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up in MapReduce, based on different matrix multiplication implementations, to factorize matrices with dimensions on the order of millions. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
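One building block mentioned above, MapReduce matrix multiplication used inside scaled-up NMF, can be illustrated in miniature: the map phase emits partial products keyed by the output cell, and the reduce phase sums values sharing a key. This is a single-process sketch of the pattern, not the dissertation's code; the triple format and function names are assumptions for the example.

```python
# Sketch of MapReduce-style matrix multiplication: computing V @ H.T from
# (row, col, value) triples of a sparse matrix V, as one step of an NMF update.
from collections import defaultdict

def map_phase(v_triples, H):
    """Emit partial products keyed by output cell (i, k); H is rank x n_cols."""
    for i, j, v in v_triples:          # one map call per nonzero of V
        for k in range(len(H)):
            yield (i, k), v * H[k][j]

def reduce_phase(pairs):
    """Sum partial products sharing a key -> entries of V @ H.T."""
    acc = defaultdict(float)
    for key, val in pairs:
        acc[key] += val
    return dict(acc)

# V = [[1, 2], [0, 3]] stored sparsely; H is a rank-1 factor (1 x 2).
triples = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
H = [[0.5, 0.25]]
result = reduce_phase(map_phase(triples, H))
```

In a real MapReduce job the shuffle performs the grouping between the two phases; data locality comes from partitioning `triples` by row so each mapper touches only its block of V.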
Abstract:
It may soon be the norm for airline passengers arriving at the check-in desk of any international airline with both stowed and carry-on luggage to be asked to step onto the weighing scale, as airlines attempt to compete and remain operationally viable in what has become, for most, a cut-throat and highly litigious operating environment. The author's commentary seeks to highlight a number of the issues surrounding the current impasse. It is also intended to catalyze a healthier and more informed debate aimed at finding an acceptable resolution to this crisis before one is imposed that fails to satisfy the needs of either camp.
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even given the huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" has little relevance outside certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
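For readers unfamiliar with the truncated-normal data augmentation sampler analysed in Chapter 7, a minimal Albert-and-Chib-style probit Gibbs sampler is sketched below. The prior, names and details are simplified assumptions for illustration, not the thesis code; with a large sample size and few observed successes, chains of this kind exhibit the slow mixing the chapter quantifies.

```python
# Minimal truncated-normal data-augmentation Gibbs sampler for probit
# regression with a N(0, I) prior on beta. Illustrative sketch only.
import numpy as np
from statistics import NormalDist

_nd = NormalDist()

def _trunc_norm(mean, lower, upper, rng):
    """Inverse-CDF sample from N(mean, 1) truncated to (lower, upper)."""
    a, b = _nd.cdf(lower - mean), _nd.cdf(upper - mean)
    u = rng.uniform(a, b)
    return mean + _nd.inv_cdf(min(max(u, 1e-12), 1 - 1e-12))

def probit_gibbs(X, y, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    V = np.linalg.inv(X.T @ X + np.eye(p))   # posterior covariance of beta | z
    draws = []
    for _ in range(n_iter):
        # Step 1: latent utilities z_i ~ N(x_i beta, 1), truncated by y_i.
        z = np.array([
            _trunc_norm(X[i] @ beta, 0.0, np.inf, rng) if y[i] == 1
            else _trunc_norm(X[i] @ beta, -np.inf, 0.0, rng)
            for i in range(n)])
        # Step 2: conjugate Gaussian update for beta given z.
        beta = rng.multivariate_normal(V @ X.T @ z, V)
        draws.append(beta.copy())
    return np.array(draws)
```

Monitoring the autocorrelation of `draws` as the success count shrinks relative to n reproduces, qualitatively, the mixing deterioration described above.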
Abstract:
Terrestrial ecosystems, occupying more than 25% of the Earth's surface, can serve as 'biological valves' in regulating the anthropogenic emissions of atmospheric aerosol particles and greenhouse gases (GHGs) as responses to their surrounding environments. While the significance of quantifying the exchange rates of GHGs and atmospheric aerosol particles between the terrestrial biosphere and the atmosphere is hardly questioned in many scientific fields, the progress in improving model predictability, data interpretation or the combination of the two remains impeded by the lack of a precise framework elucidating their dynamic transport processes over a wide range of spatiotemporal scales. The difficulty in developing prognostic modeling tools to quantify the source or sink strength of these atmospheric substances can be further magnified by the fact that the climate system is also sensitive to the feedback from terrestrial ecosystems, forming the so-called 'feedback cycle'. Hence, the emergent need is to reduce uncertainties when assessing this complex and dynamic feedback cycle, which is necessary to support the decisions of mitigation and adaptation policies associated with human activities (e.g., anthropogenic emission controls and land use managements) under current and future climate regimes.
With the goal of improving the predictions for the biosphere-atmosphere exchange of biologically active gases and atmospheric aerosol particles, the main focus of this dissertation is on revising and up-scaling the biotic and abiotic transport processes from leaf to canopy scales. The validity of previous modeling studies in determining the exchange rate of gases and particles is evaluated with detailed descriptions of their limitations. Mechanistic modeling approaches along with empirical studies across different scales are employed to refine the mathematical descriptions of surface conductance responsible for gas and particle exchanges as commonly adopted by all operational models. Specifically, how variation in horizontal leaf area density within the vegetated medium, leaf size and leaf microroughness impact the aerodynamic attributes and thereby the ultrafine particle collection efficiency at the leaf/branch scale is explored using wind tunnel experiments, with interpretations by a porous media model and a scaling analysis. A multi-layered and size-resolved second-order closure model, combined with particle fluxes and concentration measurements within and above a forest, is used to explore the particle transport processes within the canopy sub-layer and the partitioning of particle deposition onto the canopy medium and the forest floor. For gases, a modeling framework accounting for the leaf-level boundary layer effects on the stomatal pathway for gas exchange is proposed and combined with sap flux measurements in a wind tunnel to assess how leaf-level transpiration varies with increasing wind speed. How exogenous environmental conditions and endogenous soil-root-stem-leaf hydraulic and eco-physiological properties impact the above- and below-ground water dynamics in the soil-plant system and shape plant responses to droughts is assessed by a porous media model that accommodates the transient water flow within the plant vascular system and is coupled with the aforementioned leaf-level gas exchange model and soil-root interaction model. It should be noted that tackling all aspects of potential issues causing uncertainties in forecasting the feedback cycle between terrestrial ecosystems and the climate is unrealistic in a single dissertation, but further research questions and opportunities based on the foundation derived from this dissertation are also briefly discussed.
Abstract:
People go through life making all kinds of decisions, and some of these decisions affect their demand for transportation, for example, their choices of where to live and where to work, how and when to travel and which route to take. Transport-related choices are typically time dependent and characterized by a large number of alternatives that can be spatially correlated. This thesis deals with models that can be used to analyze and predict discrete choices in large-scale networks. The proposed models and methods are highly relevant for, but not limited to, transport applications. We model decisions as sequences of choices within the dynamic discrete choice framework, also known as parametric Markov decision processes. Such models are known to be difficult to estimate and to apply for prediction because dynamic programming problems need to be solved in order to compute choice probabilities. In this thesis we show that it is possible to exploit the network structure and the flexibility of dynamic programming so that the dynamic discrete choice modeling approach is not only useful for modeling time-dependent choices, but also makes it easier to model large-scale static choices. The thesis consists of seven articles containing a number of models and methods for estimating, applying and testing large-scale discrete choice models. In the following we group the contributions under three themes: route choice modeling, large-scale multivariate extreme value (MEV) model estimation and nonlinear optimization algorithms. Five articles are related to route choice modeling. We propose different dynamic discrete choice models that allow paths to be correlated, based on the MEV and mixed logit models. The resulting route choice models become expensive to estimate, and we deal with this challenge by proposing innovative methods that reduce the estimation cost.
For example, we propose a decomposition method that not only opens up the possibility of mixing, but also speeds up the estimation of simple logit models, which also has implications for traffic simulation. Moreover, we compare the utility maximization and regret minimization decision rules, and we propose a misspecification test for logit-based route choice models. The second theme relates to the estimation of static discrete choice models with large choice sets. We establish that a class of MEV models can be reformulated as dynamic discrete choice models on networks of correlation structures. These dynamic models can then be estimated quickly using dynamic programming techniques and an efficient nonlinear optimization algorithm. Finally, the third theme focuses on structured quasi-Newton techniques for estimating discrete choice models by maximum likelihood. We examine and adapt switching methods that can be easily integrated into standard optimization algorithms (line search and trust region) to accelerate the estimation process. The proposed dynamic discrete choice models and estimation methods can be used in various discrete choice applications. In the area of big data analytics, models that can deal with large choice sets and sequential choices are important. Our research can therefore be of interest in various demand analysis applications (predictive analytics) or can be integrated with optimization models (prescriptive analytics). Furthermore, our studies indicate the potential of dynamic programming techniques in this context, even for static models, which opens up a variety of future research directions.
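The dynamic-programming core of these route choice models can be illustrated on a toy acyclic network: the expected maximum utility ("value") at each node satisfies a logsumexp recursion, and link choice probabilities take a logit form. The network, utilities and function names below are invented for illustration and are not from the thesis.

```python
# Toy value-function recursion and logit link-choice probabilities for route
# choice on an acyclic network (illustrative sketch).
import math

def value_functions(arcs, dest):
    """arcs: {node: [(next_node, link_utility), ...]}; dest: absorbing node."""
    V = {dest: 0.0}
    def v(node):
        if node not in V:   # memoized backward recursion (network must be a DAG)
            V[node] = math.log(sum(math.exp(u + v(nxt))
                                   for nxt, u in arcs[node]))
        return V[node]
    for node in arcs:
        v(node)
    return V

def choice_probs(arcs, V, node):
    """Logit probability of each outgoing link at `node`."""
    w = [math.exp(u + V[nxt]) for nxt, u in arcs[node]]
    s = sum(w)
    return [x / s for x in w]

# Two routes from origin O to destination D; utilities are negative link costs.
arcs = {"O": [("A", -1.0), ("B", -2.0)], "A": [("D", -1.0)], "B": [("D", -0.5)]}
V = value_functions(arcs, "D")
p = choice_probs(arcs, "O" in arcs and V, "O") if False else choice_probs(arcs, V, "O")
```

The cheaper route through A (total cost 2.0 versus 2.5) receives the higher probability; estimation cost in real models comes from re-solving this recursion, over a network with cycles, at every likelihood evaluation.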
Abstract:
A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centred log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques, including compositional principal component analysis (PCA) and the minimum/maximum autocorrelation factor (MAF) analysis method, were used to determine the influence of the underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was captured by the first four principal components (PCs), implying "significant" structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To improve on PCA by exploiting the spatial relationships in the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors.
Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for the LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this on classification analysis.
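The clr transform and subsequent PCA described above can be sketched compactly; the soil compositions below are invented toy values, not Tellus data.

```python
# Centred log-ratio (clr) transform to open closed compositional data,
# followed by PCA via SVD of the column-centred clr matrix. Illustrative only.
import numpy as np

def clr(X):
    """clr: log of each part minus the row-wise mean of logs.
    Rows are samples (strictly positive compositions); removes closure."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

# Three toy "soil samples" of four parts, each closed to 100%.
X = np.array([[40.0, 30.0, 20.0, 10.0],
              [35.0, 35.0, 20.0, 10.0],
              [50.0, 25.0, 15.0, 10.0]])
Z = clr(X)                           # each row of Z sums to zero
Zc = Z - Z.mean(axis=0)              # centre columns before PCA
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
explained = s**2 / np.sum(s**2)      # fraction of variance per component
```

Because clr rows sum to zero, one singular value is structurally (near) zero; working in clr space is what lets the reported percentages of explained variation be interpreted free of closure artefacts.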
Abstract:
OBJECTIVES: To compare the ability of ophthalmologists versus optometrists to correctly classify retinal lesions due to neovascular age-related macular degeneration (nAMD).
DESIGN: Randomised balanced incomplete block trial. Optometrists in the community and ophthalmologists in the Hospital Eye Service classified lesions from vignettes comprising clinical information, colour fundus photographs and optical coherence tomographic images. Participants' classifications were validated against experts' classifications (reference standard).
SETTING: Internet-based application.
PARTICIPANTS: Ophthalmologists with experience in the age-related macular degeneration service; fully qualified optometrists not participating in nAMD shared care.
INTERVENTIONS: The trial emulated a conventional trial comparing optometrists' and ophthalmologists' decision-making, but vignettes, not patients, were assessed. Therefore, there were no interventions and the trial was virtual. Participants received training before assessing vignettes.
MAIN OUTCOME MEASURES: Primary outcome-correct classification of the activity status of a lesion based on a vignette, compared with a reference standard. Secondary outcomes-potentially sight-threatening errors, judgements about specific lesion components and participants' confidence in their decisions.
RESULTS: In total, 155 participants registered for the trial; 96 (48 in each group) completed all assessments and formed the analysis population. Optometrists and ophthalmologists achieved 1702/2016 (84.4%) and 1722/2016 (85.4%) correct classifications, respectively (OR 0.91, 95% CI 0.66 to 1.25; p=0.543). Optometrists' decision-making was non-inferior to ophthalmologists' with respect to the prespecified limit of 10% absolute difference (0.298 on the odds scale). Optometrists and ophthalmologists made similar numbers of sight-threatening errors (57/994 (5.7%) vs 62/994 (6.2%), OR 0.93, 95% CI 0.55 to 1.57; p=0.789). Ophthalmologists assessed lesion components as present less often than optometrists and were more confident about their classifications than optometrists.
CONCLUSIONS: Optometrists' ability to make nAMD retreatment decisions from vignettes is not inferior to ophthalmologists' ability. Shared care with optometrists monitoring quiescent nAMD lesions has the potential to reduce workload in hospitals.
TRIAL REGISTRATION NUMBER: ISRCTN07479761; pre-results registration.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08