991 results for vector auto regression
Abstract:
Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out to evaluate the proposed methods using various artificially generated data sets, as well as real-world data sets. The results obtained were compared with those from a number of well-known methods in the literature.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Hundreds of Terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently being done manually and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for storage space. CMS data is stored using HDFS (Hadoop Distributed File System). HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression which is used in this Thesis to develop a classifier. Time elapsed in data set classification by this method is dependent on the size of the input HDFS log file since the algorithmic complexities of Hadoop MapReduce algorithms here are O(n). The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost which was calculated using size of the data set and the time since its last access to help decide how useful a data set is. Accuracies of both were compared by calculating the percentage of data sets predicted for deletion which were accessed at a later instance of time. Our methodology using SVMs proved to be more accurate than using the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
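The accuracy comparison described in this abstract can be illustrated with a minimal sketch. Only the metric itself (the share of data sets predicted for deletion that were accessed at a later time) comes from the abstract; the function name, the data set names and the access records below are hypothetical.

```python
def later_access_rate(predicted_for_deletion, accessed_later):
    """Percentage of data sets flagged for deletion that were accessed afterwards.

    A lower value means fewer useful data sets would have been deleted.
    """
    predicted = set(predicted_for_deletion)
    if not predicted:
        return 0.0
    wrongly_flagged = predicted & set(accessed_later)
    return 100.0 * len(wrongly_flagged) / len(predicted)


# Hypothetical example: four data sets flagged, one of them accessed again later.
flagged = ["ds_a", "ds_b", "ds_c", "ds_d"]
accessed = ["ds_c", "ds_x"]
rate = later_access_rate(flagged, accessed)
print(rate)
```

Computing this rate for two deletion lists over the same later access log would allow a classifier and a heuristic to be compared on equal terms.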
Abstract:
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on the adequate choice of the values of a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which will further explore the parameter space. In this proposal, we envisioned that the initial solutions provided by meta-learning are located in good regions of the search space (i.e. they are closer to optimum solutions). Hence, the search algorithm would need to evaluate a lower number of candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates when compared to their components applied in isolation.
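The hybrid idea in this abstract (meta-learning recommends starting points, a search technique refines them) can be sketched as follows. This is an illustration under loud assumptions, not the authors' implementation: the meta-base, the meta-features and the error function are invented placeholders, nearest-neighbour lookup stands in for the meta-learner, and plain hill climbing is swapped in for Particle Swarm Optimization or Tabu Search.

```python
import math

# Hypothetical meta-base: meta-features of past problems and the
# (log2 C, log2 gamma) configuration that worked well on each of them.
META_BASE = [
    ((0.2, 0.9), (3.0, -5.0)),
    ((0.8, 0.1), (-1.0, 1.0)),
    ((0.5, 0.5), (1.0, -2.0)),
]


def recommend(meta_features, k=2):
    """Return the configurations of the k most similar past problems."""
    ranked = sorted(META_BASE, key=lambda item: math.dist(item[0], meta_features))
    return [config for _, config in ranked[:k]]


def hill_climb(error, start, step=0.5, max_iter=50):
    """Greedy local search on a grid; stands in for PSO or Tabu Search."""
    best, best_err = start, error(start)
    for _ in range(max_iter):
        neighbours = [(best[0] + dx, best[1] + dy)
                      for dx in (-step, 0.0, step)
                      for dy in (-step, 0.0, step)
                      if (dx, dy) != (0.0, 0.0)]
        candidate = min(neighbours, key=error)
        if error(candidate) >= best_err:
            break  # no neighbour improves on the current point
        best, best_err = candidate, error(candidate)
    return best, best_err


def toy_error(config):
    """Toy stand-in for the validation error of an SVM regressor."""
    return (config[0] - 2.0) ** 2 + (config[1] + 4.0) ** 2


# Meta-learning supplies the initial search points; the search refines them.
starts = recommend((0.25, 0.8))
results = [hill_climb(toy_error, s) for s in starts]
best_config, best_error = min(results, key=lambda r: r[1])
print(best_config, best_error)
```

In a real setting `toy_error` would be replaced by a cross-validated error of an SVM trained with the candidate parameters, which is what makes a reduced number of evaluations worthwhile.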
Abstract:
A Bayesian approach to estimation of the regression coefficients of a multinomial logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over Bayesian approaches.
Abstract:
The Fas–Fas ligand (FasL) system plays an important role in the induction of lymphoid apoptosis and has been implicated in the suppression of immune responses. Herein, we report that gene transfer of FasL inhibits tumor cell growth in vivo. Although such inhibition is expected in Fas+ tumor cell lines, marked regression was unexpectedly observed after FasL gene transfer into the CT26 colon carcinoma that does not express Fas. Infection by an adenoviral vector encoding FasL rapidly eliminated tumor masses in the Fas+ Renca tumor by inducing cell death, whereas the elimination of Fas− CT26 cells was mediated by inflammatory cells. Analysis of human malignancies revealed Fas, but not FasL, expression in a majority of tumors and susceptibility to FasL in most Fas+ cell lines. These findings suggest that gene transfer of FasL generates apoptotic responses and induces potent inflammatory reactions that can be used to induce the regression of malignancies.
Abstract:
Rodent tumor cells engineered to secrete cytokines such as interleukin 2 (IL-2) or IL-4 are rejected by syngeneic recipients due to an enhanced antitumor host immune response. An adenovirus vector (AdCAIL-2) containing the human IL-2 gene has been constructed and shown to direct secretion of high levels of human IL-2 in infected tumor cells. AdCAIL-2 induces regression of tumors in a transgenic mouse model of mammary adenocarcinoma following intratumoral injection. Elimination of existing tumors in this way results in immunity against a second challenge with tumor cells. These findings suggest that adenovirus vectors expressing cytokines may form the basis for highly effective immunotherapies of human cancers.
Abstract:
The analysis of academic self-attributions is an essential aspect of the affective and emotional component of school motivation in students in compulsory secondary education (ESO). The aim of this study was to analyse, using a cross-sectional design, gender and grade differences and the predictive role of these variables in students' academic causal attributions, measured with the general scales of the Sydney Attribution Scale (SAS). The questionnaire was administered to 2,022 students (51.08% boys) in grades 1 to 4 of ESO. Ages ranged from 12 to 16 years (M = 13.81; SD = 1.35). The results of the analyses of variance and effect sizes (d index) revealed that boys attributed their successes significantly more to ability, whereas girls attributed them significantly more to effort. Regarding attributions of academic failure, the results indicated that boys attributed failures significantly more to lack of effort than girls did. Grade differences were also found in most of the causal attributions analysed. Logistic regression analyses indicated that gender and grade were significant predictors of academic causal attributions, although the results varied across the SAS scales. The results are discussed in relation to the need to design intervention programmes that take gender and grade into account.
Abstract:
Purpose: To define a range of normality for the vectorial parameters ocular residual astigmatism (ORA) and topography disparity (TD) and to evaluate their relationship with visual, refractive, anterior and posterior corneal curvature, pachymetric and corneal volume data in normal healthy eyes. Methods: This study comprised a total of 101 consecutive normal healthy eyes of 101 patients ranging in age from 15 to 64 years. In all cases, a complete corneal analysis was performed using a Scheimpflug photography-based topography system (Pentacam system, Oculus Optikgeräte GmbH). Anterior corneal topographic data were imported from the Pentacam system to the iASSORT software (ASSORT Pty. Ltd.), which allowed the calculation of ORA and TD. Linear regression analysis was used to obtain a linear expression relating ORA and posterior corneal astigmatism (PCA). Results: Mean magnitude of ORA was 0.79 D (SD: 0.43), with a normality range from 0 to 1.63 D. 90 eyes (89.1%) showed against-the-rule ORA. A weak although statistically significant correlation was found between the magnitudes of PCA and ORA (r = 0.34, p < 0.01). Regression analysis showed the presence of a linear relationship between these two variables, although with very limited predictability (R²: 0.08). Mean magnitude of TD was 0.89 D (SD: 0.50), with a normality range from 0 to 1.87 D. Conclusion: The magnitude of the vector parameters ORA and TD is lower than 1.9 D in the healthy human eye.
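The abstract does not state how the normality ranges were derived. The reported figures appear consistent with the common convention of mean ± 1.96 SD, with the lower bound clipped at zero because an astigmatism magnitude cannot be negative. The sketch below checks that assumption against the reported means and standard deviations; the function name and the clipping rule are illustrative choices, not taken from the study.

```python
def normal_range(mean, sd, z=1.96):
    """Assumed convention: mean +/- z * SD, lower bound clipped at 0."""
    lower = max(0.0, mean - z * sd)
    upper = mean + z * sd
    return round(lower, 2), round(upper, 2)


# Means and SDs (in dioptres) as reported in the abstract.
ora_range = normal_range(0.79, 0.43)
td_range = normal_range(0.89, 0.50)
print(ora_range, td_range)
```

Under this assumption the upper bounds come out at 1.63 D for ORA and 1.87 D for TD, matching the ranges given in the Results.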
Abstract:
This study presents the first analysis of the impact of NASCAR sponsorship announcements on the stock prices of sponsoring firms. The primary finding of the study, that NASCAR sponsorship announcements were accompanied by the largest increases in shareholder wealth ever recorded in the marketing literature in response to a voluntary marketing program, represents a striking and unambiguous stock market endorsement of the sponsorships. Indeed, the 24 sponsors analyzed in this study experienced mean increases in shareholder wealth of over $300 million, net of all of the costs associated with the sponsorships. A multiple regression analysis of firm-specific stock price changes and select corporate and sponsorship attributes indicates that NASCAR sponsorships with more successful racing teams, corporate (as opposed to product or divisional) sponsorships, and sponsorships with direct ties to the consumer automotive industry are all positively correlated with perceived sponsorship success, while corporate cash flow per share (a well-known proxy for agency conflicts within the firm) is negatively related to shareholder approval.
Abstract:
Resilience is the dynamic process of positive adaptation in the context of significant adversity. Studies on the concept have increased with the advent of Positive Psychology, because of its potential effects on workers' health and performance. Other health-relevant concepts within the scope of Positive Psychology in the work context are self-efficacy, defined as people's beliefs about their capabilities and/or their exercise of control over the events that affect their lives, and social support at work, which comprises the perception of how much the work context offers support to workers. Little literature exists on resilience in the work context, and no study involving the three constructs was found. This investigation therefore analysed the impact of self-efficacy and perceived social support at work on workers' resilience. Participants were 243 working university students from the São Paulo metropolitan area, with a mean age of 23 years (SD = 6.2 years), mostly female (69.5%), Christian (Catholic = 51.5%; Protestant = 18.1%), working in administrative and technical support positions (49.1%), and from organizations in various sectors. A questionnaire was applied to collect participants' sociodemographic data, along with three validated Brazilian scales measuring perceived social support at work (Escala de Percepção de Suporte Social no Trabalho, EPSST), self-efficacy beliefs (Escala de Auto-eficácia Geral Percebida) and level of resilience (Connor-Davidson Resilience Scale, CD-RISC-10). Exploratory and descriptive statistical analyses, stepwise regression analyses, analyses of variance (ANOVA) and t tests were carried out to describe participants and variables and to test the model. The data revealed that the working university students show above-average levels of resilience and self-efficacy and average levels of social support at work. Self-efficacy was confirmed as a significant predictor of resilience, unlike the three types of perceived social support at work (informational, emotional and instrumental). The findings indicate a need to explore the topic further, and further studies are needed to help in understanding the results of this investigation.
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted into uniform vectors by auto- and cross-covariance (ACC) transformation. Each protein was presented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison to other servers for allergen prediction, AllerTOP outperforms them with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
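To show how an ACC transformation can turn sequences of different lengths into vectors of equal length, here is a minimal sketch. It assumes a commonly used form of the transform (the mean product of descriptor j at position i and descriptor k at position i + lag) and a maximum lag of 5, chosen because 3 × 3 × 5 = 45 matches the vector length reported in the abstract. The function name and the descriptor values are hypothetical placeholders, not real z-scale values.

```python
def acc_transform(z, max_lag=5):
    """Auto- and cross-covariance transform of a descriptor sequence.

    z is a list of per-residue descriptor tuples, e.g. (z1, z2, z3).
    Returns d * d * max_lag values, independent of the sequence length.
    """
    n = len(z)
    d = len(z[0])
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                # j == k gives auto-covariance terms, j != k cross-covariance terms.
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Placeholder descriptor values for an 8-residue sequence (not real z-scales).
protein = [(0.1 * i, -0.2 * i, 0.05) for i in range(8)]
vector = acc_transform(protein)
print(len(vector))
```

Because the output length depends only on the number of descriptors and the maximum lag, vectors from proteins of any length can be passed to the same classifier, for example a k nearest neighbours model.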
Abstract:
This study investigates the optimization of the in-plane shear strength of co-cured lap joints of a unidirectional self-reinforced thermoplastic composite of recycled low-density polyethylene reinforced with ultra-high-molecular-weight polyethylene fibres, by relating this strength to the hot-pressing process parameters used to form the joint (pressure, temperature, time and length). The chemical structure of the matrix was analysed to check for potential degradation due to its recycled origin. Matrix and reinforcement were thermally characterized to define the joint processing temperature window to be studied. The curing conditions of the specimens were designed according to the Response Surface Design of Experiments methodology, and the relationship between the shear strength of the joints and the respective curing parameters was obtained through a regression equation generated by the Ordinary Least Squares method. The tensile mechanical characterization of the material was analysed micro- and macromechanically. The chemical analysis of the matrix did not show the presence of carboxylic groups that would indicate degradation by chain branching and cross-linking arising from the recycling of the material. The proposed test methodologies proved effective and may serve as a basis for technical standards. It was shown that joints with an optimum shear strength of 6.88 MPa can be obtained when processed at 1 bar, 115°C, 5 min and with a length of 12 mm. Fracture analysis revealed that the shear failure of the joints was preceded by multiple longitudinal cracks induced by successive debondings, both inside and outside the joint, due to the transverse stress accumulated in it, which is proportional to its length. Temperature proved to be the most relevant processing parameter for joint performance, which is little affected by variations in pressure and curing time.
Abstract:
In this thesis, new classes of models for multivariate linear regression defined by finite mixtures of seemingly unrelated contaminated normal regression models and seemingly unrelated contaminated normal cluster-weighted models are illustrated. The main difference between such families is that the covariates are treated as fixed in the former class of models and as random in the latter. Thus, in cluster-weighted models the assignment of the data points to the unknown groups of observations also depends on the covariates. These classes provide an extension to mixture-based regression analysis for modelling multivariate and correlated responses in the presence of mild outliers, and they allow a different vector of regressors to be specified for the prediction of each response. Expectation-conditional maximisation algorithms for the calculation of the maximum likelihood estimate of the model parameters have been derived. As the number of free parameters increases quadratically with the number of responses and covariates, analyses based on the proposed models can become infeasible in practical applications. These problems have been overcome by introducing constraints on the elements of the covariance matrices according to an approach based on the eigen-decomposition of the covariance matrices. The performance of the new models has been studied by simulations and using real datasets in comparison with other models. In order to gain additional flexibility, mixtures of seemingly unrelated contaminated normal regression models have also been specified so as to allow mixing proportions to be expressed as functions of concomitant covariates. An illustration of the new models with concomitant variables and a study on housing tension in the municipalities of the Emilia-Romagna region based on different types of multivariate linear regression models have been performed.