Digital Library

991 results for dimension reduction

A comparison of methods for classifying clinical samples based on proteomics data : a case study for statistical and machine learning approaches

Relevance: 60.00%

Abstract:

The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so-called “omics” disciplines of the biological sciences. Such variability is uncovered by implementing multivariable data mining techniques, which fall into two primary categories: machine learning strategies and statistically based approaches. Typically, proteomic studies can produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint and, as such, require pre-treatment to reduce the dimensionality prior to classification. Recently, machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This problem might be solved using a statistical model-based approach in which not only is the importance of each individual protein explicit, but the proteins are also combined into a readily interpretable classification rule without relying on a black box approach. Here we incorporate the statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA), followed by both statistical and machine learning classification methods, and compare them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.
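
The reduce-then-classify pipeline described in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it keeps a single principal component (found by power iteration) and uses a nearest-centroid rule as a stand-in classifier, whereas the abstract itself names PLS, PCA and SVM. The toy data, the labels and all function names are invented for the example.

```python
import math

def center(X):
    """Subtract the column means; return the centered rows and the means."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X], means

def leading_axis(Xc, iters=100):
    """First principal axis by power iteration on X^T X (never formed explicitly)."""
    # Start from the longest centered row, a vector that lies in the span of the data.
    v = max(Xc, key=lambda row: sum(a * a for a in row))
    norm = math.sqrt(sum(a * a for a in v))
    v = [a / norm for a in v]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]           # X v
        w = [sum(s * row[j] for s, row in zip(scores, Xc)) for j in range(len(v))]  # X^T (X v)
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v

def fit(X, y):
    """Reduce p variables to one score, then store the mean score of each class."""
    Xc, means = center(X)
    axis = leading_axis(Xc)
    scores = [sum(a * b for a, b in zip(row, axis)) for row in Xc]
    centroids = {}
    for label in set(y):
        values = [s for s, t in zip(scores, y) if t == label]
        centroids[label] = sum(values) / len(values)
    return means, axis, centroids

def predict(x, model):
    """Project a new sample onto the axis and pick the nearest class centroid."""
    means, axis, centroids = model
    score = sum((xi - m) * a for xi, m, a in zip(x, means, axis))
    return min(centroids, key=lambda label: abs(score - centroids[label]))

# Invented toy data with more variables (p = 6) than observations (n = 4).
X = [[5, 5, 5, 1, 1, 1], [6, 5, 6, 1, 0, 1], [1, 1, 1, 5, 5, 5], [0, 1, 1, 6, 5, 5]]
y = ["case", "case", "control", "control"]
model = fit(X, y)
```

Because the classifier only ever sees the one-dimensional score, the n≪p setting no longer constrains it; the price is that any class signal outside the retained component is discarded.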

Setting water quality objectives for the health of freshwater fish

Relevance: 60.00%

Abstract:

An important responsibility of the Environment Protection Authority, Victoria, is to set objectives for levels of environmental contaminants. To support the development of environmental objectives for water quality, a need has been identified to understand the dual impacts of concentration and duration of a contaminant on biota in freshwater streams. For suspended solids contamination, the study draws on the information on freshwater fish reported by Newcombe and Jensen [North American Journal of Fisheries Management, 16(4):693--727, 1996] and on daily suspended solids data from the United States Geological Survey stream monitoring network. The study group was asked to examine the utility of both the Newcombe and Jensen data and the USA data, and to formulate a procedure for use by the Environment Protection Authority Victoria that takes the concentration and duration of harmful episodes into account when assessing water quality. The extent to which the impact of a toxic event on fish health could be modelled deterministically was also considered. It was found that concentration and exposure duration were the main compounding factors on the severity of effects of suspended solids on freshwater fish. A protocol for assessing the cumulative effect on fish health and a simple deterministic model, based on the biology of gill harm and recovery, were proposed.
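
As a rough illustration of how concentration and duration might be combined when screening a daily suspended solids record, the sketch below splits a series into episodes above a threshold and scores each one with a log-linear severity function. This is not the protocol or the gill harm and recovery model proposed by the study group, which the abstract does not specify. The coefficients `a`, `b`, `c`, the threshold and the sample series are placeholders invented for the example.

```python
import math

def episodes(daily_mg_l, threshold):
    """Split a daily series into runs of consecutive days above the threshold.

    Returns one (mean concentration in mg/L, duration in hours) pair per run.
    """
    runs, current = [], []
    for value in daily_mg_l:
        if value > threshold:
            current.append(value)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [(sum(run) / len(run), 24 * len(run)) for run in runs]

def severity(concentration_mg_l, duration_h, a=1.0, b=0.6, c=0.7):
    """Hypothetical severity score: a + b*ln(duration) + c*ln(concentration).

    The coefficients are placeholders, not fitted values from any study.
    """
    if concentration_mg_l <= 0 or duration_h <= 0:
        raise ValueError("concentration and duration must be positive")
    return a + b * math.log(duration_h) + c * math.log(concentration_mg_l)

# Invented daily record (mg/L) and screening threshold.
record = [10, 80, 120, 20, 90]
scores = [severity(conc, hours) for conc, hours in episodes(record, threshold=50)]
```

A score of this form rises with either a higher concentration or a longer exposure, which is the qualitative behaviour the abstract reports; calibrating it for a given species would require the underlying dose-response data.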

A principled experimental design approach to Big Data analysis

Relevance: 60.00%

Abstract:

Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, heterogeneity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers near equivalent answers compared with analyses of the full dataset under a controlled error rate. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally, it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient sub-samples.

Full and Model-Reduced Structure-Preserving Simulation of Incompressible Fluids

Relevance: 60.00%

Abstract:

This thesis outlines the construction of several types of structured integrators for incompressible fluids. We first present a vorticity integrator, which is the Hamiltonian counterpart of the existing Lagrangian-based fluid integrator. We next present a model-reduced variational Eulerian integrator for incompressible fluids, which combines the efficiency gains of dimension reduction, the qualitative robustness to coarse spatial and temporal resolutions of geometric integrators, and the simplicity of homogenized boundary conditions on regular grids to deal with arbitrarily-shaped domains with sub-grid accuracy.

Both of these numerical methods involve approximating the Lie group of volume-preserving diffeomorphisms by a finite-dimensional Lie group and then restricting the resulting variational principle by means of a non-holonomic constraint. Advantages and limitations of this discretization method will be outlined. It will be seen that these derivation techniques are unable to yield symplectic integrators, but that energy conservation is easily obtained, as is a discretized version of Kelvin's circulation theorem.

Finally, we outline the basis of a spectral discrete exterior calculus, which may be a useful element in producing structured numerical methods for fluids in the future.

An improved set of statistically uncorrelated optimal discriminant vectors (改进的统计不相关最优鉴别矢量集)

Relevance: 60.00%

Abstract:

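This paper studies the algorithm for the statistically uncorrelated optimal discriminant vector set and, based on an analysis of that algorithm, proposes an improved method. The method solves for the statistically uncorrelated optimal discriminant vector set in the eigenspace of the within-class scatter matrix. To speed up feature extraction, the image data are compressed using a dimension reduction method based on image discriminant analysis. Numerical experiments on the ORL and Yale face databases verify the effectiveness of the proposed method.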

Mapping the structure of perceptual and visual-motor abilities in healthy young adults.

Relevance: 60.00%

Abstract:

The ability to quickly detect and respond to visual stimuli in the environment is critical to many human activities. While such perceptual and visual-motor skills are important in a myriad of contexts, considerable variability exists between individuals in these abilities. To better understand the sources of this variability, we assessed perceptual and visual-motor skills in a large sample of 230 healthy individuals via the Nike SPARQ Sensory Station, and compared variability in their behavioral performance to demographic, state, sleep and consumption characteristics. Dimension reduction and regression analyses indicated three underlying factors: Visual-Motor Control, Visual Sensitivity, and Eye Quickness, which accounted for roughly half of the overall population variance in performance on this battery. Inter-individual variability in Visual-Motor Control was correlated with gender and circadian patterns such that performance on this factor was better for males and for those who had been awake for a longer period of time before assessment. The current findings indicate that abilities involving coordinated hand movements in response to stimuli are subject to greater individual variability, while visual sensitivity and oculomotor control are largely stable across individuals.

Latent Variable Models for Stochastic Discount Factors.

Relevance: 60.00%

Abstract:

Latent variable models in finance originate both from asset pricing theory and time series analysis. These two strands of literature appeal to two different concepts of latent structures, which are both useful to reduce the dimension of a statistical model specified for a multivariate time series of asset prices. In the CAPM or APT beta pricing models, the dimension reduction is cross-sectional in nature, while in time-series state-space models, dimension is reduced longitudinally by assuming conditional independence between consecutive returns, given a small number of state variables. In this paper, we use the concept of Stochastic Discount Factor (SDF) or pricing kernel as a unifying principle to integrate these two concepts of latent variables. Beta pricing relations amount to characterizing the factors as a basis of a vector space for the SDF. The coefficients of the SDF with respect to the factors are specified as deterministic functions of some state variables which summarize their dynamics. In beta pricing models, it is often said that only the factorial risk is compensated since the remaining idiosyncratic risk is diversifiable. Implicitly, this argument can be interpreted as a conditional cross-sectional factor structure, that is, a conditional independence between contemporaneous returns of a large number of assets, given a small number of factors, as in standard Factor Analysis. We provide this unifying analysis in the context of conditional equilibrium beta pricing as well as asset pricing with stochastic volatility, stochastic interest rates and other state variables. We address the general issue of econometric specifications of dynamic asset pricing models, which cover the modern literature on conditionally heteroskedastic factor models as well as equilibrium-based asset pricing models with an intertemporal specification of preferences and market fundamentals.
We interpret various instantaneous causality relationships between state variables and market fundamentals as leverage effects and discuss their central role relative to the validity of standard CAPM-like stock pricing and preference-free option pricing.

A hybrid method for sag source location in power network

Relevance: 60.00%

Abstract:

The work presented in this paper belongs to the power quality knowledge area and deals with voltage sags in power transmission and distribution systems. Propagating throughout the power network, voltage sags can cause many problems for domestic and industrial loads, with substantial financial cost. To impose penalties on the responsible party and to improve monitoring and mitigation strategies, sags must be located in the power network. With this objective, the paper proposes a new method for associating a sag waveform with its origin in transmission and distribution networks. It solves this problem by developing hybrid methods that employ multiway principal component analysis (MPCA) as a dimension reduction tool. MPCA re-expresses sag waveforms in a new subspace using just a few scores. We train some well-known classifiers with these scores and use them to classify future sags. The capabilities of the proposed method for dimension reduction and classification are examined using real data gathered from three substations in Catalonia, Spain. The obtained classification rates support the effectiveness of the developed hybrid methods as new tools for sag classification.

abctools: an R package for tuning approximate Bayesian computation analyses

Relevance: 60.00%

Abstract:

Approximate Bayesian computation (ABC) is a popular family of algorithms which perform approximate parameter inference when numerical evaluation of the likelihood function is not possible but data can be simulated from the model. They return a sample of parameter values which produce simulations close to the observed dataset. A standard approach is to reduce the simulated and observed datasets to vectors of summary statistics and accept when the difference between these is below a specified threshold. ABC can also be adapted to perform model choice. In this article, we present a new software package for R, abctools, which provides methods for tuning ABC algorithms. This includes recent dimension reduction algorithms to tune the choice of summary statistics, and coverage methods to tune the choice of threshold. We provide several illustrations of these routines on applications taken from the ABC literature.
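
The standard accept/reject scheme described in this abstract can be sketched as follows. This is a generic Python illustration, not the abctools API (abctools is an R package and none of its functions are used here). The model (a normal distribution with unknown mean and unit standard deviation), the uniform prior, the single summary statistic, the threshold and the observed values are all assumptions chosen for the example.

```python
import random
import statistics

def abc_rejection(observed, draw_prior, simulate, summary, threshold, n_draws, rng):
    """Keep the prior draws whose simulated summary lies within `threshold` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = draw_prior(rng)                    # candidate parameter from the prior
        s_sim = summary(simulate(theta, rng))      # reduce simulated data to a summary statistic
        if abs(s_sim - s_obs) <= threshold:        # accept when the summaries are close enough
            accepted.append(theta)
    return accepted

# Invented observed data; the unknown parameter is the mean of a unit-variance normal.
observed = [1.2, 2.9, 2.1, 1.8, 2.0]
rng = random.Random(0)
posterior = abc_rejection(
    observed,
    draw_prior=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(len(observed))],
    summary=statistics.mean,
    threshold=0.3,
    n_draws=5000,
    rng=rng,
)
```

The two tuning choices the abstract mentions are both visible here: `summary` decides what information about the data is retained, and `threshold` trades acceptance rate against how closely the accepted sample approximates the posterior.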

Detection of vehicles with monolithic classifier vis-à-vis a boosted cascaded classifier

Relevance: 60.00%

Abstract:

This paper compares the accuracy and performance of two machine learning approaches for visual object detection and tracking of vehicles from an on-road image sequence. The first is a neural network based approach, in which a multi-resolution technique based on Haar basis functions was used to obtain an image at different scales. A classification was then carried out with a multilayer feed-forward neural network. Principal Component Analysis (PCA) was used as a dimension reduction technique to make the classification process much more efficient. The second approach is based on boosting, which also yields very good detection rates. In general, boosting is one of the most important developments in classification methodology. It works by sequentially applying a classification algorithm to reweighted versions of the training data, followed by taking a weighted majority vote of the sequence of classifiers thus produced. For this work, a strong classifier was trained by the AdaBoost algorithm. The results of comparing the two methodologies vis-à-vis each other show the effectiveness of the methods that have been used.
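
The reweighting and weighted majority vote described above can be made concrete with a small AdaBoost sketch. It assumes one-dimensional inputs, labels in {+1, -1} and threshold "stumps" as the weak classifier; the paper's own weak learners and image features are not given in the abstract, and the six-point dataset is invented for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: `polarity` on or above the threshold, its negation below."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump under the current weights."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Fit stumps sequentially, increasing the weight of misclassified samples each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)     # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    vote = sum(alpha * stump_predict(x, threshold, polarity)
               for alpha, threshold, polarity in ensemble)
    return 1 if vote >= 0 else -1

# Invented data that no single stump can separate: +1 at both ends, -1 in the middle.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
```

On this toy set a single stump misclassifies at least two points, while three boosted stumps classify all six correctly, which is the sense in which a "strong" classifier is built from weak ones.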

Driver verification based on handgrip recognition on steering wheel

Relevance: 60.00%

Abstract:

This paper presents a novel driver verification algorithm based on the recognition of handgrip patterns on a steering wheel. A pressure-sensitive mat mounted on the steering wheel is employed to collect a series of pressure images exerted by the hands of drivers who intend to start the vehicle. Feature extraction from those images is then carried out in two major steps: Quad-Tree-based multi-resolution decomposition of the images and Principal Component Analysis (PCA)-based dimension reduction. A likelihood-ratio classifier is then used to classify drivers as known or unknown. The experimental results obtained in this study show that mean acceptance rates of 78.15% and 78.22% for the trained subjects and mean rejection rates of 93.92% and 90.93% for the untrained ones were achieved in two trials, respectively. It can be concluded that the driver verification approach based on handgrip recognition on a steering wheel is promising and will be further explored in the near future.

Blind Spectral Unmixing Based on Sparse Nonnegative Matrix Factorization

Relevance: 60.00%

Abstract:

Nonnegative matrix factorization (NMF) is a widely used method for blind spectral unmixing (SU), which aims at obtaining the endmembers and corresponding fractional abundances, knowing only the collected mixing spectral data. It is noted that the abundance may be sparse (i.e., the endmembers may have sparse distributions) and that sparse NMF tends to lead to a unique result, so it is intuitive and meaningful to constrain NMF with sparseness for solving SU. However, due to the abundance sum-to-one constraint in SU, the traditional sparseness measured by the L0/L1-norm is no longer an effective constraint. A novel measure of sparseness (termed the S-measure), which uses higher order norms of the signal vector and has a physical interpretation, is proposed in this paper. By using the S-measure constraint (SMC), a gradient-based sparse NMF algorithm (termed NMF-SMC) is proposed for solving the SU problem, where the learning rate is adaptively selected, and the endmembers and abundances are simultaneously estimated. In the proposed NMF-SMC, there is no pure index assumption and no need to know the exact sparseness degree of the abundance a priori. Moreover, it does not require a dimension reduction preprocessing step, in which some useful information may be lost. Experiments based on synthetic mixtures and real-world images collected by AVIRIS and HYDICE sensors are performed to evaluate the validity of the proposed method.

Application of frequency selective surfaces to improve the response of planar antenna arrays (Aplicação de superfícies seletivas em frequência para melhoria de resposta de arranjos de antenas planares)

Relevance: 60.00%

Abstract:

This work aims to show how the application of frequency selective surfaces (FSS) in planar antenna arrays becomes an alternative for obtaining desired radiation characteristics through changes in radiation parameters of the arrays, such as bandwidth, gain and directivity. In addition to the analysis of these parameters, a study of the mutual coupling between the elements of the array is also made. To accomplish this study, a microstrip antenna array with two patch elements, fed by a feed network, was designed. Another change made to the array was the use of a truncated ground plane, with the objective of increasing the bandwidth and miniaturizing the elements of the array. In order to study the behavior of frequency selective surfaces applied to antenna arrays, three different layouts were proposed. The first layout uses the FSS as a superstrate (above the array). The second layout uses the FSS as a reflector element (below the array). In the third layout, the array is placed between two FSSs. Numerical and experimental results for each of the proposed configurations are presented in order to validate the research.

Mexico: Combining monthly inflation predictions from surveys

Relevance: 60.00%

Abstract:

We examine the problem of combining Mexican inflation predictions or projections provided by a biweekly survey of professional forecasters. Consumer price inflation in Mexico is measured twice a month. We consider several combining methods and advocate the use of dimension reduction techniques, whose performance is compared with different benchmark methods, including the simple average prediction. Missing values in the database are imputed by two different data-based methods. The results obtained are largely robust to the choice of the imputation method. A preliminary analysis of the data was based on its panel data structure and showed the potential usefulness of dimension reduction techniques for combining the experts' predictions. The main findings are: the first monthly predictions are best combined by way of the first principal component of the predictions available; the best second monthly prediction is obtained by calculating the median prediction and is more accurate than the first one.
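
Two of the benchmark combinations mentioned above, the simple average and the median, can be sketched as below. The sketch does not implement the principal-component combination the authors advocate, and it simply skips missing responses rather than imputing them as the paper does. The panel of forecasts and the realized values are invented numbers used only to show the mechanics.

```python
import statistics

def combine(panel, method):
    """Combine each period's forecasts with `method`, ignoring missing (None) responses."""
    combined = []
    for forecasts in panel:
        available = [f for f in forecasts if f is not None]
        combined.append(method(available))
    return combined

def rmse(predicted, realized):
    """Root mean squared error between combined forecasts and realized values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, realized)]
    return (sum(squared) / len(squared)) ** 0.5

# Invented panel: one row per survey period, one column per forecaster.
panel = [
    [0.30, 0.35, 0.90],   # one forecaster is far from the others
    [0.40, None, 0.45],   # one forecaster did not respond
    [0.25, 0.30, 0.20],
]
realized = [0.33, 0.41, 0.27]

mean_combo = combine(panel, statistics.mean)
median_combo = combine(panel, statistics.median)
```

In this toy panel the median is less affected by the outlying forecast in the first period, so its error is smaller than that of the mean; with real survey data the ranking has to be checked empirically, as the paper does.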

Modeling gene expression regulatory networks with the sparse vector autoregressive model

Relevance: 60.00%

Abstract:

Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, several statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems.

Results: We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.

Conclusion: The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
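
For orientation, a vector autoregressive model for the expression levels of p genes can be written as below. The abstract does not give the model order or the estimator, so the lag of one and the L1 penalty shown here are assumptions chosen for illustration, and λ is a hypothetical tuning parameter.

```latex
x_t = A\,x_{t-1} + \varepsilon_t, \qquad x_t \in \mathbb{R}^{p},\quad A \in \mathbb{R}^{p \times p}

\hat{A} = \arg\min_{A} \sum_{t=2}^{T} \lVert x_t - A\,x_{t-1} \rVert_2^2 \;+\; \lambda \sum_{i,j} \lvert A_{ij} \rvert
```

Under this reading, a nonzero entry A_{ij} means that the past expression of gene j helps predict gene i (a Granger-type edge from j to i), so the estimated network is defined directly between genes rather than between clusters of genes. A sparsity-inducing penalty is what makes estimation feasible when the number of time points T is smaller than p.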
