914 results for High-dimensional index structure
Abstract:
Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.
Abstract:
2010 Mathematics Subject Classification: 62J99.
Abstract:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
Abstract:
In this chapter we provide a comprehensive overview of the emerging field of visualising and browsing image databases. We start with a brief introduction to content-based image retrieval and the traditional query-by-example search paradigm that many retrieval systems employ. We specify the problems associated with this type of interface, such as users not being able to formulate a query due to not having a target image or concept in mind. The idea of browsing systems is then introduced as a means to combat these issues, harnessing the cognitive power of the human mind in order to speed up image retrieval. We detail common methods in which the often high-dimensional feature data extracted from images can be used to visualise image databases in an intuitive way. Systems using dimensionality reduction techniques, such as multi-dimensional scaling, are reviewed along with those that cluster images using either divisive or agglomerative techniques as well as graph-based visualisations. While visualisation of an image collection is useful for providing an overview of the contained images, it forms only part of an image database navigation system. We therefore also present various methods provided by these systems to allow for interactive browsing of these datasets. A further area we explore is user studies of systems and visualisations, where we look at the different evaluations undertaken in order to test usability and compare systems, and highlight the key findings from these studies. We conclude the chapter with several recommendations for future work in this area. © 2011 Springer-Verlag Berlin Heidelberg.
Abstract:
For the treatment and monitoring of Parkinson's disease (PD) to be scientific, a key requirement is that measurement of disease stages and severity is quantitative, reliable, and repeatable. The last 50 years in PD research have been dominated by qualitative, subjective ratings obtained by human interpretation of the presentation of disease signs and symptoms at clinical visits. More recently, “wearable,” sensor-based, quantitative, objective, and easy-to-use systems for quantifying PD signs for large numbers of participants over extended durations have been developed. This technology has the potential to significantly improve both clinical diagnosis and management in PD and the conduct of clinical studies. However, the large-scale, high-dimensional character of the data captured by these wearable sensors requires sophisticated signal processing and machine-learning algorithms to transform it into scientifically and clinically meaningful information. Such algorithms that “learn” from data have shown remarkable success in making accurate predictions for complex problems in which human skill has been required to date, but they are challenging to evaluate and apply without a basic understanding of the underlying logic on which they are based. This article contains a nontechnical tutorial review of relevant machine-learning algorithms, also describing their limitations and how these can be overcome. It discusses implications of this technology and a practical road map for realizing the full potential of this technology in PD research and practice. © 2016 International Parkinson and Movement Disorder Society.
Abstract:
As part of a multi-university research program funded by NSF, a comprehensive experimental and analytical study of the seismic behavior of hybrid fiber reinforced polymer (FRP)-concrete columns is presented in this dissertation. The experimental investigation includes cyclic tests of six large-scale concrete-filled FRP tube (CFFT) and RC columns followed by monotonic flexural tests, a nondestructive evaluation of damage using ultrasonic pulse velocity in between the two test sets and tension tests of sixty-five FRP coupons. Two analytical models using ANSYS and OpenSees were developed and favorably verified against both cyclic and monotonic flexural tests. The results of the two methods were compared. A parametric study was also carried out to investigate the effect of three main parameters on primary seismic response measures. The responses of typical CFFT columns to three representative earthquake records were also investigated. The study shows that only specimens with carbon FRP cracked, whereas specimens with glass or hybrid FRP did not show any visible cracks throughout cyclic tests. Further monotonic flexural tests showed that carbon specimens experienced both flexural cracks in tension and crumpling in compression. Glass or hybrid specimens, on the other hand, all showed local buckling of FRP tubes. Compared with conventional RC columns, CFFT columns possess higher flexural strength and energy dissipation with an extended plastic hinge region. Among all CFFT columns, the hybrid lay-up demonstrated the highest flexural strength and initial stiffness, mainly because of its high reinforcement index and FRP/concrete stiffness ratio, respectively. Moreover, at the same drift ratio, the hybrid lay-up was also considered the best in terms of energy dissipation. Specimens with glass fiber tubes, on the other hand, exhibited the highest ductility due to the better flexibility of glass FRP composites.
Furthermore, ductility of CFFTs showed a strong correlation with the rupture strain of FRP. The parametric study further showed that different FRP architectures and rebar types may lead to different failure modes for CFFT columns. Transient analysis of strong ground motions showed that the column with off-axis nonlinear filament-wound glass FRP tube exhibited a superior seismic performance to all other CFFTs. Moreover, higher FRP reinforcement ratios may lead to a brittle system failure, while a well-engineered FRP reinforcement configuration may significantly enhance the seismic performance of CFFT columns.
Abstract:
We reconstruct the environmental evolution of the East China Sea in the past 14 kyr based on glycerol dialkyl glycerol tetraethers (GDGTs) in a sediment core from the subaqueous Yangtze River Delta. Two primary phases are recognized. Phase I (13.8-8 cal kyr BP) reflects a predominantly continental influence, showing distinctly higher concentrations of branched GDGTs (averaged 143 ng/g dry sediment weight, dsw) than isoprenoid GDGTs (averaged 36 ng/g dsw), high BIT index (branched vs. isoprenoid tetraethers) values (>0.78) and a fluctuating GDGT-0/crenarchaeol ratio (R0/5, varied from 0.52 to 3.81). Within this interval, temporal increases of terrestrial and marine influence are attributed to Younger Dryas (YD) (ca. 12.9-12.2 cal kyr BP) cold event and melt-water pulse (MWP) -1B (11.5-11.1 cal kyr BP), respectively. The prominent transition from 8 to 7.9 cal kyr BP shows a sharp decrease in BIT index value (<0.4) and increase in crenarchaeol, which marks the beginning of phase II. Afterwards, the proxies remain relatively constant, which indicates that phase II (7.9 cal kyr BP-present) is a shelf sedimentary environment with high stand of sea level. Overall, the BIT index in our record serves as a good marker for terrestrial influence at the site, and likely reflects the flooding history of the region. The TEX86 (TetraEther Index of tetraethers consisting of 86 carbons) proxy is not applicable in phase I because of an excess terrestrial influence; but it seems to be valid for revealing the annual SST in phase II (21.6±0.9°C, n=49). In contrast, the MBT'/CBT (Methylation of Branched Tetraethers and Cyclization of Branched Tetraethers) proxy appears to faithfully record the annual mean air temperature (MAT) (14.3±0.63°C, n=68) and presents an integrated signal over the middle and lower Yangtze River drainage basin.
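For readers unfamiliar with these proxies, the two ratios named in this abstract can be written out. The abstract itself does not give the formulas: the BIT expression below is the commonly used definition (branched GDGTs I to III relative to their sum with crenarchaeol) and is assumed here, while R0/5 follows directly from the abstract's wording. Square brackets denote concentrations.

```latex
\mathrm{BIT} =
  \frac{[\text{GDGT-I}] + [\text{GDGT-II}] + [\text{GDGT-III}]}
       {[\text{GDGT-I}] + [\text{GDGT-II}] + [\text{GDGT-III}] + [\text{crenarchaeol}]},
\qquad
R_{0/5} = \frac{[\text{GDGT-0}]}{[\text{crenarchaeol}]}
```

Under this definition BIT approaches 1 when branched (terrestrially derived) GDGTs dominate and approaches 0 when crenarchaeol dominates, which is consistent with the values reported for phase I (>0.78) and phase II (<0.4).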
Abstract:
Distributed Computing frameworks belong to a class of programming models that allow developers to
launch workloads on large clusters of machines. Due to the dramatic increase in the volume of
data gathered by ubiquitous computing devices, data analytic workloads have become a common
case among distributed computing applications, making Data Science an entire field of
Computer Science. We argue that data scientists' concerns lie in three main components: a dataset,
a sequence of operations they wish to apply to this dataset, and some constraints they may have
related to their work (performance, QoS, budget, etc.). However, it is actually extremely
difficult, without domain expertise, to perform data science. One needs to select the right amount
and type of resources, pick a framework, and configure it. Also, users often run their
application in shared environments, ruled by schedulers expecting them to specify precisely their resource
needs. Inherent to the distributed and concurrent nature of the cited frameworks, monitoring and
profiling are hard, high dimensional problems that block users from making the right
configuration choices and determining the right amount of resources they need. Paradoxically, the
system is gathering a large amount of monitoring data at runtime, which remains unused.
In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit
monitoring data to learn about workloads, and process user requests into a tailored execution
context. In this work, we study different techniques that have been used to make steps toward
such system awareness, and explore a new way to do so by implementing machine learning
techniques to recommend a specific subset of system configurations for Apache Spark applications.
Furthermore, we present an in-depth study of Apache Spark executor configuration, which highlights
the complexity of choosing the best one for a given workload.
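To give a feel for why executor configuration is a search problem, the sketch below enumerates candidate executor layouts for a homogeneous cluster. It is only an illustration, not the method studied in the work: the function name `executor_layouts`, the reserved-resource defaults, and the even split of memory between executors are all assumptions, and memory overhead is ignored. The three configuration keys are standard Spark properties.

```python
def executor_layouts(nodes, cores_per_node, mem_gb_per_node,
                     reserved_cores=1, reserved_mem_gb=1):
    """Enumerate candidate executor layouts (hypothetical helper).

    One core and 1 GB per node are set aside for the OS and daemons by
    default; both values are assumptions made for this sketch.
    """
    usable_cores = cores_per_node - reserved_cores
    usable_mem = mem_gb_per_node - reserved_mem_gb
    layouts = []
    for cores in range(1, usable_cores + 1):
        # How many executors of this width fit on a single node.
        per_node = usable_cores // cores
        # Split the usable memory evenly between those executors.
        mem = usable_mem // per_node
        if mem < 1:
            continue  # not enough memory for even 1 GB per executor
        layouts.append({
            "spark.executor.instances": nodes * per_node,
            "spark.executor.cores": cores,
            "spark.executor.memory": f"{mem}g",
        })
    return layouts


# Example: 4 worker nodes with 16 cores and 64 GB each (illustrative numbers).
layouts = executor_layouts(nodes=4, cores_per_node=16, mem_gb_per_node=64)
for layout in layouts:
    print(layout)
```

Even this small cluster yields fifteen distinct layouts before any other setting is varied, which hints at the size of the space a recommender has to prune.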
Abstract:
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
Abstract:
While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks and gene regulatory networks, there are few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications are restricted to deterministic functions and unfavorable properties such as slow processing, thermal annealing, aqueous solvents and difficult readout limit them to proof-of-concept purposes. To date, whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications remains unknown.
In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.
By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally or lifetime coded fluorescent taggants. Meanwhile, the taggant detection process becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.
Meanwhile, RET-based sampling units (RSU) can be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor / GPU as specialized functional units or organized as a discrete accelerator to bring substantial speedups and power savings.
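The claim that a RET network's sampling distribution is a phase-type distribution can be illustrated by simulating a continuous-time Markov chain until it reaches an absorbing state; the absorption time is then one sample. The sketch below is a generic CTMC simulation, not the dissertation's model: the state names and all rates are hypothetical, with "photon" standing in for fluorescence emission.

```python
import random


def sample_absorption_time(rates, absorbing, start, rng):
    """Simulate one CTMC path and return its time to absorption.

    rates[s] maps each successor of state s to its transition rate.
    """
    state, t = start, 0.0
    while state not in absorbing:
        out = rates[state]
        total = sum(out.values())
        # Holding time is exponential with the total exit rate of the state.
        t += rng.expovariate(total)
        # Choose the next state with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        for nxt, rate in out.items():
            acc += rate
            if r < acc:
                state = nxt
                break
        else:
            state = nxt  # guard against floating-point round-off
    return t


# Hypothetical two-chromophore network; rates (per ns) are illustrative only.
RATES = {
    "donor": {"acceptor": 2.0, "photon": 0.5},
    "acceptor": {"photon": 1.0},
}
rng = random.Random(0)
samples = [sample_absorption_time(RATES, {"photon"}, "donor", rng)
           for _ in range(10000)]
mean_time = sum(samples) / len(samples)
print(f"mean time to photon emission: {mean_time:.3f} ns")
```

For these toy rates the analytic mean is 1/2.5 + 0.8 × 1/1.0 = 1.2 ns, so the sample mean should land close to that value. Changing the rates or adding states reshapes the distribution, which is the sense in which the geometry of the network "programs" the sampler.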
Abstract:
Aberrant behavior of biological signaling pathways has been implicated in diseases such as cancers. Therapies have been developed to target proteins in these networks in the hope of curing the illness or bringing about remission. However, identifying targets for drug inhibition that exhibit good therapeutic index has proven to be challenging since signaling pathways have a large number of components and many interconnections such as feedback, crosstalk, and divergence. Unfortunately, some characteristics of these pathways such as redundancy, feedback, and drug resistance reduce the efficacy of single drug target therapy and necessitate the employment of more than one drug to target multiple nodes in the system. However, choosing multiple targets with high therapeutic index poses more challenges since the combinatorial search space could be huge. To cope with the complexity of these systems, computational tools such as ordinary differential equations have been used to successfully model some of these pathways. Regrettably, for building these models, experimentally-measured initial concentrations of the components and rates of reactions are needed which are difficult to obtain, and in very large networks, they may not be available at the moment. Fortunately, there exist other modeling tools, though not as powerful as ordinary differential equations, which do not need the rates and initial conditions to model signaling pathways. Petri net and graph theory are among these tools. In this thesis, we introduce a methodology based on Petri net siphon analysis and graph network centrality measures for identifying prospective targets for single and multiple drug therapies. In this methodology, first, potential targets are identified in the Petri net model of a signaling pathway using siphon analysis. Then, the graph-theoretic centrality measures are employed to prioritize the candidate targets. 
Also, an algorithm is developed to check whether the candidate targets are able to disable the intended outputs in the graph model of the system or not. We implement structural and dynamical models of ErbB1-Ras-MAPK pathways and use them to assess and evaluate this methodology. The identified drug-targets, single and multiple, correspond to clinically relevant drugs. Overall, the results suggest that this methodology, using siphons and centrality measures, shows promise in identifying and ranking drugs. Since this methodology only uses the structural information of the signaling pathways and does not need initial conditions and dynamical rates, it can be utilized in larger networks.
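The check described above, whether a set of candidate targets disables the intended outputs in the graph model, can be sketched as a reachability test on a directed graph with the targeted nodes removed. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the function name `outputs_reachable` is hypothetical, and the wiring below is a deliberately simplified toy graph rather than the ErbB1-Ras-MAPK model used in the thesis.

```python
from collections import deque


def outputs_reachable(edges, sources, outputs, blocked):
    """Return the outputs still reachable from any source once the nodes
    in `blocked` are removed from the directed graph."""
    blocked = set(blocked)
    seen = {s for s in sources if s not in blocked}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen & set(outputs)


# Toy signaling graph (illustrative wiring only).
EDGES = {
    "EGFR": ["Ras", "PI3K"],
    "Ras": ["Raf", "PI3K"],
    "Raf": ["MEK"],
    "MEK": ["ERK"],
    "PI3K": ["Akt"],
}
OUTPUTS = {"ERK", "Akt"}

single = outputs_reachable(EDGES, {"EGFR"}, OUTPUTS, blocked={"MEK"})
combo = outputs_reachable(EDGES, {"EGFR"}, OUTPUTS, blocked={"MEK", "PI3K"})
print(single)  # Akt is still reachable through PI3K
print(combo)   # empty set: both outputs are disabled
```

In the toy graph a single target leaves one output active because of the parallel branch, while the two-node combination disables both, which mirrors the abstract's argument for multi-target therapy.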
Abstract:
Motivated by environmental protection concerns, monitoring the flue gas of thermal power plants is now often mandatory due to the need to ensure that emission levels stay within safe limits. Optical-based gas sensing systems are increasingly employed for this purpose, with regression techniques used to relate gas optical absorption spectra to the concentrations of specific gas components of interest (NOx, SO2 etc.). Accurately predicting gas concentrations from absorption spectra remains a challenging problem due to the presence of nonlinearities in the relationships and the high-dimensional and correlated nature of the spectral data. This article proposes a generalized fuzzy linguistic model (GFLM) to address this challenge. The GFLM is made up of a series of “If-Then” fuzzy rules. The absorption spectra are input variables in the rule antecedent. The rule consequent is a general nonlinear polynomial function of the absorption spectra. Model parameters are estimated using least squares and gradient descent optimization algorithms. The performance of GFLM is compared with other traditional prediction models, such as partial least squares, support vector machines, multilayer perceptron neural networks and radial basis function networks, for two real flue gas spectral datasets: one from a coal-fired power plant and one from a gas-fired power plant. The experimental results show that the generalized fuzzy linguistic model has good predictive ability, and is competitive with alternative approaches, while having the added advantage of providing an interpretable model.
Abstract:
Tellurite glasses are photonic materials of special interest to optoelectronics and communications, due to their important optical properties such as high refractive index, broad IR transmittance, and low phonon energy. Tellurite glasses are potential candidates for nonlinear optical devices. Their low phonon energy makes them efficient hosts for dopant ions like rare earths, allowing a better environment for radiative transitions. The dopant ions maintain the majority of their individual properties in the glass matrix. Tellurites are less toxic than chalcogenides and more chemically and thermally stable, which makes them a highly suitable fiber material for nonlinear applications in the mid-infrared, and they are of increased research interest in applications like lasers, amplifiers, and sensors. A low melting point and glass transition temperature make tellurite glasses easier to prepare than other glass families. In order to probe the versatility of tellurite glasses in the optoelectronic industry, we have synthesized tellurite glasses and undertaken various optical studies on them. We have shown that the highly nonlinear tellurite glasses are suitable candidates for optical limiting, with a comparatively lower optical limiting threshold. Tuning the optical properties of glasses is an important factor in optoelectronic research. We have found that thermal poling is an efficient mechanism for tuning the optical properties of these materials. Another important nonlinear phenomenon found in zinc tellurite glasses is their ability to switch from reverse saturable absorption to saturable absorption in the presence of lanthanide ions. The proposed thesis to be submitted will have seven chapters.