984 results for regression algorithm


Relevance:

60.00%

Abstract:

Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm, to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than the other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions on marker, demographic, and historical features.
The method presented here is the only robust, model-based method available so far, which allows inferring complex population dynamics over a short time scale. It also provides the basis for investigating the interplay between population dynamics, drift, and selection in invasive species.
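
The rejection-regression idea mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single parameter (the mean of a normal distribution with unit variance), a single summary statistic (the sample mean), a hypothetical Uniform(-5, 5) prior, Epanechnikov weights, and a local linear regression adjustment of the accepted draws. All function names are illustrative.

```python
import random
import statistics


def simulate_summary(theta, n, rng):
    """Simulate n draws from Normal(theta, 1) and return their sample mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def abc_rejection_regression(s_obs, n_obs, n_sims=5000, accept_frac=0.1, seed=0):
    """Toy ABC: rejection step followed by a local linear regression adjustment."""
    rng = random.Random(seed)

    # 1. Draw parameters from the (assumed) uniform prior and simulate summaries.
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)
        s = simulate_summary(theta, n_obs, rng)
        sims.append((abs(s - s_obs), theta, s))

    # 2. Rejection: keep the simulations closest to the observed summary.
    sims.sort(key=lambda item: item[0])
    kept = sims[: max(2, int(accept_frac * n_sims))]
    delta = kept[-1][0]  # tolerance = largest accepted distance

    # 3. Regression: weighted least squares of theta on (s - s_obs),
    #    with Epanechnikov weights that decrease with distance.
    w = [1.0 - (d / delta) ** 2 for d, _, _ in kept]
    x = [s - s_obs for _, _, s in kept]
    y = [theta for _, theta, _ in kept]
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx

    # 4. Adjustment: shift each accepted draw to where it would lie at s = s_obs.
    adjusted = [yi - beta * xi for xi, yi in zip(x, y)]
    return adjusted, w


def weighted_mean(values, weights):
    """Weighted average, used here as a posterior-mean estimate."""
    return sum(wi * vi for wi, vi in zip(weights, values)) / sum(weights)
```

Calling `abc_rejection_regression(s_obs=1.5, n_obs=20)` and passing the result to `weighted_mean` gives an approximate posterior mean for this toy model. A real demographic analysis would replace the simulator, prior, and summary statistics with problem-specific ones.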

Relevance:

60.00%

Abstract:

A modified radial basis function (RBF) neural network and its identification algorithm based on observational data with heterogeneous noise are introduced. The Box-Cox transformed system output is represented by the RBF neural network. To identify the model from observational data, the singular value decomposition of the full regression matrix, consisting of basis functions formed from system input data, is first carried out, and a new fast identification method is then developed using the Gauss-Newton algorithm to derive the required Box-Cox transformation, based on a maximum likelihood estimator (MLE) for a model base spanned by the largest eigenvectors. Finally, the Box-Cox transformation-based RBF neural network is identified from the derived optimal Box-Cox transformation using an orthogonal forward regression algorithm, which employs a pseudo-PRESS statistic to select a sparse RBF model with good generalisation. The proposed algorithm and its efficacy are demonstrated with numerical examples.
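
To make the Box-Cox step concrete, here is a minimal sketch of the transformation and a maximum likelihood estimate of its parameter. It is a simplification, not the method of the abstract: it assumes the transformed data are i.i.d. normal (no RBF regressors) and replaces the Gauss-Newton iteration with a plain grid search over lambda. The function names and the grid range are illustrative choices.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_log_likelihood(y, lam):
    """Profile log-likelihood of lambda, assuming transformed data are i.i.d. normal."""
    z = box_cox(y, lam)
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)


def estimate_lambda(y, grid=None):
    """Pick the lambda on a grid that maximises the profile log-likelihood."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]  # assumed search range [-2, 2]
    return max(grid, key=lambda lam: profile_log_likelihood(y, lam))
```

For log-normally distributed data the estimate should fall near lambda = 0, which corresponds to the log transform.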

Relevance:

60.00%

Abstract:

The modelling of nonlinear stochastic dynamical processes from data involves solving the problems of data gathering, preprocessing, model architecture selection, learning or adaptation, parametric evaluation and model validation. For a given model architecture such as associative memory networks, a common problem in nonlinear modelling is "the curse of dimensionality". A series of complementary data-based constructive identification schemes, mainly based on but not limited to operating point dependent fuzzy models, are introduced in this paper with the aim of overcoming the curse of dimensionality. These include (i) a mixture of experts algorithm based on a forward constrained regression algorithm; (ii) an inherently parsimonious piecewise local linear modelling concept based on Delaunay input space partition; (iii) a neurofuzzy model constructive approach based on forward orthogonal least squares and optimal experimental design; and finally (iv) a neurofuzzy model construction algorithm based on basis functions that are Bézier-Bernstein polynomial functions and the additive decomposition. Illustrative examples demonstrate their applicability, showing that the final major hurdle in data-based modelling has almost been removed.

Relevance:

60.00%

Abstract:

Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null intercept errors-in-variables regression model, where the explanatory and the response variables are subject to measurement errors, and a possible structure of dependency between the measurements taken within the same individual is incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (2003b) and analyzed under the Bayesian approach. In this article, considering the classical approach, we analyze asymptotic test statistics and present a simulation study to compare the behavior of the three test statistics for different sample sizes, parameter values and nominal levels of the test. Also, closed-form expressions for the score function and the Fisher information matrix are presented. We consider two real numerical illustrations: the odontological data set from Hadgu and Koch (1999) and a quality control data set.

Relevance:

60.00%

Abstract:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

METHODS: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smokers (p < 0.001).

CONCLUSION: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

Relevance:

60.00%

Abstract:

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such paradigm, which aims to improve performance by jointly modeling multiple related tasks. Although there exist numerous classification and regression models in the machine learning literature, most MTL models are built around ridge or logistic regression. A limited number of works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts.
We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.

Relevance:

60.00%

Abstract:

The identification and description of the lithological characteristics of a formation are indispensable to the evaluation of complex formations. To this end, combinations of nuclear tools have been used systematically in uncased boreholes. The resulting logs can be regarded as the interaction between two distinct phases:
• Transport phase, in which the radiation travels from the source through the formation to one or more detectors.
• Detection phase, which consists of the collection of the radiation, its conversion into current pulses and, finally, the spectral distribution of these pulses.
Because the presence of the detector does not strongly affect the result of the radiation transport, each phase can be simulated independently of the other, which allows a new type of modelling that decouples the two phases. In this work, the final response is simulated by combining numerical solutions of the transport with a library of detector response functions, for different incident energies and for each specific arrangement of sources and detectors. The radiation transport is computed with the finite element method (FEM), as a 2½-D scalar flux obtained from the numerical solution of the multigroup diffusion approximation to the Boltzmann transport equation in phase space, the so-called P1 approximation, in which the direction variable is expanded in terms of orthogonal Legendre polynomials. This reduces the dimensionality of the problem and makes it more compatible with the FEM algorithm, in which the flux depends exclusively on the spatial variable and on the physical properties of the formation. The response function of the NaI(Tl) detector is obtained independently by the Monte Carlo (MC) method, in which the life history of a particle inside the scintillator crystal is reconstructed by simulating, interaction by interaction, the position, direction and energy of the different particles, using random numbers associated with appropriate probability laws.
The possible types of interaction (Rayleigh scattering, photoelectric effect, Compton scattering and pair production) are determined similarly. The simulation is completed when the detector response functions are convolved with the scalar flux, yielding as the final response the pulse-height spectrum of the modelled system. From this spectrum, sets of channels called detection windows are selected. The count rates in each window depend in different ways on the electron density and the lithology. This allows combinations of these windows to be used to determine the density and the photoelectric absorption factor of the formations. With the methodology developed, logs could be simulated for both thick-bed and thin-bed models. The performance of the method was tested in complex formations, mainly those in which clay minerals, feldspar and mica produced considerable effects capable of perturbing the final tool response. The results showed that formations with densities between 1.8 and 4.0 g/cm³ and photoelectric absorption factors in the range of 1.5 to 5 barns/e⁻ had their physical and lithological characteristics perfectly identified. The concentrations of potassium, uranium and thorium could be obtained by introducing a new calibration system capable of correcting the effects of high variances and negative correlations, observed mainly in the calculation of the mass concentrations of uranium and potassium. In simulating the response of the CNL tool with Tittle's polynomial regression algorithm, it was found that, because of the tool's limited vertical resolution, beds thinner than the spacing between the source and the far detector had their apparent porosity values measured incorrectly. This is because Tittle's algorithm applies exclusively to thick beds.
Because of this error, a method was developed that takes into account a contribution factor determined by the relative area of each bed within the zone of maximum information. The porosity at each subsurface point could thus be determined by convolving these factors with the local porosity indices, while assuming each bed to be thick enough to satisfy Tittle's algorithm. Finally, the additional limitations imposed by the presence of perturbing minerals were resolved by treating the formation as composed of a base mineral fully saturated with water, with the remaining components regarded as perturbations of this base case. These results make it possible to compute synthetic well logs, which can be used in inversion schemes to obtain a more detailed quantitative evaluation of complex formations.

Relevance:

60.00%

Abstract:

This study estimates the thermal conductivity of the main rock-forming minerals, as well as the mean conductivity of the solid phase of five basic lithologies (sandstones, limestones, dolomites, anhydrites and argillaceous lithologies). Several thermal models were compared with one another to determine which is most appropriate for representing the aggregate of minerals and fluids that make up rocks. The results can be applied to a wide variety of thermal modelling problems. The methodology is based on a nonlinear regression algorithm called Controlled Random Search. The behaviour of the algorithm is evaluated on synthetic data before it is applied to real data. The model used in the regression to obtain the thermal conductivity of the minerals is the geometric mean model. The regression method, applied to each lithological subset, gave the following values for the mean thermal conductivity of the solid phase: sandstones 5.9 ± 1.33 W/mK, limestones 3.1 ± 0.12 W/mK, dolomites 4.7 ± 0.56 W/mK, anhydrites 6.3 ± 0.27 W/mK and argillaceous lithologies 3.4 ± 0.48 W/mK. Next, the basis is laid out for the study of heat diffusion in cylindrical coordinates, taking into account the invasion of mud filtrate into the formation, through an adaptation of the well-injection simulation from reservoir engineering theory. With this, the relative errors in apparent resistivity are estimated, taking the original formation temperature as the reference. In this stage of the work, the finite difference method is used to evaluate the borehole-formation temperature distribution. The invasion is simulated in cylindrical coordinates by adapting the Buckley-Leverett equation from Cartesian coordinates. Effects such as the build-up of mud cake on the borehole wall, gravity and capillary pressure are not taken into account.
From the saturation and temperature distributions, the radial resistivity distribution is obtained, which is convolved with the radial response of the induction tool (transmitter-receiver) to give the apparent resistivity of the formation. Taking the original formation temperature as the reference, the relative errors in apparent resistivity are obtained. By varying some parameters, it is found that the porosity and the original saturation of the formation can be responsible for very large errors in the resistivity, especially if such "readings" are taken soon after drilling (MWD). The temperature difference between borehole and formation is the main cause of these errors, indicating that where this difference is large, induction logging should be carried out one to two days after the well is drilled.
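
The geometric mean model named in this abstract combines component conductivities as a product of each conductivity raised to its volume fraction. A minimal sketch follows, assuming the fractions sum to one. The function name is illustrative, and the water conductivity of 0.6 W/mK and the 20% porosity in the usage line are assumed values, not figures from the study; only the 5.9 W/mK sandstone solid-phase value comes from the abstract.

```python
import math


def geometric_mean_conductivity(components):
    """Bulk thermal conductivity from (volume_fraction, conductivity) pairs.

    Implements k_bulk = product of k_i ** phi_i, with the volume fractions
    phi_i required to sum to 1 and the conductivities k_i given in W/mK.
    """
    total = sum(fraction for fraction, _ in components)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    # Work in log space: the log of a product is the sum of the logs.
    return math.exp(sum(fraction * math.log(k) for fraction, k in components))


# Usage: a sandstone with 80% solid phase (5.9 W/mK, from the abstract)
# and 20% pore water (0.6 W/mK, an assumed value).
k_bulk = geometric_mean_conductivity([(0.8, 5.9), (0.2, 0.6)])
```

Because the model is multiplicative, a regression such as the one described above can fit the logarithms of the mineral conductivities to the logarithm of the measured bulk conductivity.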

Relevance:

60.00%

Abstract:

Arctic permafrost landscapes are among the most vulnerable and dynamic landscapes globally, but due to their extent and remoteness most of the landscape changes remain unnoticed. In order to detect disturbances in these areas, we developed an automated processing chain for the calculation and analysis of robust trends of key land surface indicators based on the full record of available Landsat TM, ETM+, and OLI data. The methodology was applied to the ~29,000 km² Lena Delta in Northeast Siberia, where robust trend parameters (slope, confidence intervals of the slope, and intercept) were calculated for Tasseled Cap Greenness, Wetness, and Brightness, as well as NDVI, NDWI, and NDMI, based on 204 Landsat scenes for the observation period between 1999 and 2014. The resulting datasets revealed regional greening trends within the Lena Delta with several localized hotspots of change, particularly in the vicinity of the main river channels. At 30-m spatial resolution, various permafrost-thaw related processes and disturbances, such as thermokarst lake expansion and drainage, fluvial erosion, and coastal changes, were detected within the Lena Delta region, many of which have not been noticed or described before. Such hotspots of permafrost change exhibit significantly different trend parameters compared to non-disturbed areas. The processed dataset, which is made freely available through the data archive PANGAEA, will be a useful resource for further process-specific analysis by researchers and land managers. With the high level of automation and the use of the freely available Landsat archive data, the workflow is scalable and transferable to other regions, which should enable the comparison of land surface changes in different permafrost-affected regions and help to understand and quantify permafrost landscape dynamics.
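
A robust per-pixel trend of the kind described here can be sketched with the Theil-Sen estimator, which takes the median of all pairwise slopes. This is an assumption for illustration: the abstract does not name the estimator it uses, and the sketch omits the confidence intervals of the slope. The function name and the synthetic NDVI-like series are hypothetical.

```python
import statistics
from itertools import combinations


def theil_sen(t, y):
    """Robust linear trend: median pairwise slope and median-based intercept."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[j] != t[i]
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(yi - slope * ti for ti, yi in zip(t, y))
    return slope, intercept


# Synthetic index values for 16 yearly observations (years since 1999),
# with one contaminated observation standing in for a cloudy scene.
years = list(range(16))
index_values = [0.30 + 0.01 * year for year in years]
index_values[5] = 0.95  # outlier

slope, intercept = theil_sen(years, index_values)
```

Unlike ordinary least squares, the median-based slope is barely affected by the single outlier, which is why estimators of this family suit irregular, noisy satellite time series.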

Relevance:

60.00%

Abstract:

In this thesis, a machine learning approach was used to develop a predictive model for residual methanol concentration in industrial formalin produced at the Akzo Nobel factory in Kristinehamn, Sweden. The MATLAB™ computational environment, supplemented with the Statistics and Machine Learning Toolbox™ from MathWorks, was used to test various machine learning algorithms on the formalin production data from Akzo Nobel. The Gaussian Process Regression algorithm was found to provide the best results and was used to create the predictive model. The model was compiled into a stand-alone application with a graphical user interface using MATLAB Compiler™.

Relevance:

40.00%

Abstract:

This paper gives a new iterative algorithm for kernel logistic regression. It is based on the solution of a dual problem using ideas similar to those of the Sequential Minimal Optimization algorithm for Support Vector Machines. Asymptotic convergence of the algorithm is proved. Computational experiments show that the algorithm is robust and fast. The algorithmic ideas can also be used to give a fast dual algorithm for solving the optimization problem arising in the inner loop of Gaussian Process classifiers.

Relevance:

40.00%

Abstract:

Using the classical Parzen window (PW) estimate as the target function, the sparse kernel density estimator is constructed in a forward constrained regression manner. The leave-one-out (LOO) test score is used for kernel selection. The jackknife parameter estimator, subject to a positivity constraint check, is used to estimate a single parameter at each forward step. As such, the proposed approach is simple to implement and the associated computational cost is very low. An illustrative example is employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with comparable accuracy to that of the classical Parzen window estimate.
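
For reference, the classical Parzen window estimate that serves as the regression target places one kernel on every sample and averages them. The sketch below assumes a one-dimensional Gaussian kernel with a fixed bandwidth `h`; the function name and the sample values are illustrative, and the sparse forward-selection procedure of the abstract is not reproduced.

```python
import math


def parzen_window_density(x, samples, h):
    """Parzen window density estimate at x with a Gaussian kernel of bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


# Usage: density at x = 1.0 from three hypothetical samples.
density = parzen_window_density(1.0, [0.0, 1.0, 3.0], h=0.5)
```

Each evaluation touches every sample, so the cost grows linearly with the data set size. A sparse estimator reduces this by keeping only a few kernels with non-negative weights.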