957 results for Multivariate analysis of variance
Abstract:
In this work we study the classification of forest types using mathematics-based image analysis of satellite data. We are interested in improving the classification of forest segments when information from two or more different satellites is combined. The experimental part is based on real satellite data originating from Canada. This thesis summarizes the mathematical basics of image analysis and supervised learning, the methods used in the classification algorithm. Three data sets and four feature sets were investigated. The feature sets considered were 1) histograms (quantiles), 2) variance, 3) skewness and 4) kurtosis. Good overall performance was achieved when a combination of the ASTERBAND and RADARSAT2 data sets was used.
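The four feature sets named in this abstract can be illustrated with a minimal sketch. It assumes each forest segment is available as a flat list of pixel values from one band; the function name `segment_features`, the choice of quartiles, and the population (divide-by-n) moment definitions are illustrative assumptions, not details taken from the thesis.

```python
from statistics import fmean, quantiles


def segment_features(pixels, n_quantiles=4):
    """Quantile cut points, variance, skewness and excess kurtosis of one segment.

    Hypothetical helper: assumes a non-constant list of pixel values.
    """
    n = len(pixels)
    mean = fmean(pixels)
    # Central moments in population form (dividing by n).
    m2 = sum((x - mean) ** 2 for x in pixels) / n
    m3 = sum((x - mean) ** 3 for x in pixels) / n
    m4 = sum((x - mean) ** 4 for x in pixels) / n
    return {
        # Cut points summarizing the histogram of the segment.
        "quantiles": quantiles(pixels, n=n_quantiles, method="inclusive"),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a Gaussian)
    }


# Made-up pixel values, for illustration only.
features = segment_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Feature vectors of this kind, computed per segment and per satellite, could then be concatenated before being passed to a supervised classifier.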
Abstract:
Raw measurement data does not always immediately convey useful information, but applying statistical analysis tools to measurement data can improve the situation. Data analysis can offer benefits like acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study, tools such as Qlucore Omics Explorer (QOE) and sparse Bayesian regression (SB) are used. Later on, linear regression is used to build a model based on the subset of variables that seem to have the most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with the other variables. For SB and linear regression, the results show that both models built on 1-day averaged data seriously underestimate the variance of the true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of the variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model fits the whole available dataset well; therefore, it is proposed for future work either to build piecewise nonlinear regression models if the same dataset is used, or for the plant to provide another dataset, collected in a more systematic fashion than the present data, for further analysis.
Abstract:
One of the largest genera of Orchidaceae in the Neotropics with about 450 species, Maxillaria presents several taxonomic uncertainties about its generic circumscription and the delimitation of species groups, mainly due to the large variability of some species. The present study aims at verifying the morphological variation and species delimitation in the Brasiliorchis picta complex, a recent new genus derived from Maxillaria, using morphometric multivariate analysis. A total of 340 specimens belonging to six species (B. chrysantha (Barb. Rodr.) R.B. Singer, S. Koehler & Carnevali, B. gracilis (Lodd.) R.B. Singer, S. Koehler & Carnevali, B. marginata (Lindl.) R.B. Singer, S. Koehler & Carnevali, B. picta (Hook.) R. Singer, S. Koehler & Carnevali, B. porphyrostele (Rchb. f.) R.B. Singer, S. Koehler & Carnevali and B. ubatubana (Hoehne) R.B. Singer, S. Koehler & Carnevali) were analyzed using multivariate methods (PCA, CVA, DA, and Cluster Analysis with UPGMA). B. gracilis shows the largest morphological discontinuity, mainly due to its smaller size. The other species tend to form distinct groups, but intermediate characteristics between pairs of species induce overlaps among the individuals of different species and thus confuse the distinction of each one. Hybridization and geographic distribution can be involved in the differentiation of the species and lineages in this complex. Because the species classified a priori in this work cannot be recognized by the quantitative characters measured here, such other tools as geometric morphometry and molecular data should be employed in future works to clarify species relationships in this complex.
Abstract:
Transitional cell carcinoma (TCC) of the urothelium is often multifocal and subsequent tumors may occur anywhere in the urinary tract after the treatment of a primary carcinoma. Patients initially presenting a bladder cancer are at significant risk of developing metachronous tumors in the upper urinary tract (UUT). We evaluated the prognostic factors of primary invasive bladder cancer that may predict a metachronous UUT TCC after radical cystectomy. The records of 476 patients who underwent radical cystectomy for primary invasive bladder TCC from 1989 to 2001 were reviewed retrospectively. The prognostic factors of UUT TCC were determined by multivariate analysis using the Cox proportional hazards regression model. Kaplan-Meier analysis was also used to assess the variable incidence of UUT TCC according to different risk factors. Twenty-two patients (4.6%) developed metachronous UUT TCC. Multiplicity, prostatic urethral involvement by the bladder cancer and the associated carcinoma in situ (CIS) were significant and independent factors affecting the occurrence of metachronous UUT TCC (P = 0.0425, 0.0082, and 0.0006, respectively). These results were supported, to some extent, by analysis of the UUT TCC disease-free rate by the Kaplan-Meier method, whereby patients with prostatic urethral involvement or with associated CIS demonstrated a significantly lower metachronous UUT TCC disease-free rate than patients without prostatic urethral involvement or without associated CIS (log-rank test, P = 0.0116 and 0.0075, respectively). Multiple tumors, prostatic urethral involvement and associated CIS were risk factors for metachronous UUT TCC, a conclusion that may be useful for designing follow-up strategies for primary invasive bladder cancer after radical cystectomy.
Abstract:
The autonomic nervous system plays an important role in physiological and pathological conditions, and has been extensively evaluated by parametric and non-parametric spectral analysis. To compare the results obtained with fast Fourier transform (FFT) and the autoregressive (AR) method, we performed a comprehensive comparative study using data from humans and rats during pharmacological blockade (in rats), a postural test (in humans), and in the hypertensive state (in both humans and rats). Although postural hypotension in humans induced an increase in normalized low-frequency (LFnu) of systolic blood pressure, the increase in the ratio was detected only by AR. In rats, AR and FFT analysis did not agree for LFnu and high frequency (HFnu) under basal conditions and after vagal blockade. The increase in the LF/HF ratio of the pulse interval, induced by methylatropine, was detected only by FFT. In hypertensive patients, changes in LF and HF for systolic blood pressure were observed only by AR; FFT was able to detect the reduction in both blood pressure variance and total power. In hypertensive rats, AR presented different values of variance and total power for systolic blood pressure. Moreover, AR and FFT presented discordant results for LF, LFnu, HF, LF/HF ratio, and total power for pulse interval. We provide evidence for disagreement in 23% of the indices of blood pressure and heart rate variability in humans and 67% discordance in rats when these variables are evaluated by AR and FFT under physiological and pathological conditions. The overall disagreement between AR and FFT in this study was 43%.
Abstract:
Osteoporosis has become a serious global public health issue. Hence, osteoporotic fracture healing has been investigated in several previous studies because there is still controversy over the effect osteoporosis has on the healing process. The current study aimed to analyze two different periods of bone healing in normal and osteopenic rats. Sixty, 7-week-old female Wistar rats were randomly divided into four groups: unrestricted and immobilized for 2 weeks after osteotomy (OU2), suspended and immobilized for 2 weeks after osteotomy (OS2), unrestricted and immobilized for 6 weeks after osteotomy (OU6), and suspended and immobilized for 6 weeks after osteotomy (OS6). Osteotomy was performed in the middle third of the right tibia 21 days after tail suspension, when the osteopenic condition was already set. The fractured limb was then immobilized by orthosis. Tibias were collected 2 and 6 weeks after osteotomy, and were analyzed by bone densitometry, mechanical testing, and histomorphometry. Bone mineral density values from bony calluses were significantly lower in the 2-week post-osteotomy groups compared with the 6-week post-osteotomy groups (multivariate general linear model analysis, P<0.000). Similarly, the mechanical properties showed that animals had stronger bones 6 weeks after osteotomy compared with 2 weeks after osteotomy (multivariate general linear model analysis, P<0.000). Histomorphometry indicated gradual bone healing. Results showed that osteopenia did not influence the bone healing process, and that time was an independent determinant factor regardless of whether the fracture was osteopenic. This suggests that the body is able to compensate for the negative effects of suspension.
Abstract:
This study aimed to evaluate the effect of the distillation time and the sample mass on the total SO2 content in integral passion fruit juice (Passiflora sp). For the SO2 analysis, a modified version of the Monier-Williams method was used. In this experiment, the distillation time and the sample mass were reduced to half of the values proposed in the original method. The analyses were performed in triplicate for each distillation time x sample mass combination, making a total of 12 tests, which were performed on the same day. The significance of the effects of the different distillation times and sample masses was evaluated by applying one-factor analysis of variance (ANOVA). At the 95% confidence level, it was found that the proposed amendments to the distillation time and sample mass, and the interaction between distillation time x sample mass, were not significant (p > 0.05) in determining the SO2 content in passion fruit juice. In view of the results obtained, it was concluded that for integral passion fruit juice it was possible to reduce the distillation time and the sample mass in determining the SO2 content by the Monier-Williams method without affecting the result.
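As a rough illustration of the one-factor ANOVA mentioned in this abstract, the sketch below computes the F statistic for four groups of triplicate readings, mirroring the 12-test design. The function name `one_way_anova_f` and the numbers in `readings` are hypothetical placeholders, not data from the study.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic of a one-factor ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical SO2 readings: one triplicate per time x mass combination.
readings = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0],
]
f_stat = one_way_anova_f(readings)
```

The resulting statistic would be compared against the F distribution with k - 1 = 3 and n - k = 8 degrees of freedom to obtain the p-value reported in such a study.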
Multivariate study of Nile tilapia byproducts enriched with omega-3 and dried with different methods
Abstract:
The present work aimed at studying the effect of different drying methods applied to byproducts (heads, viscera and carcasses) of tilapia fed with flaxseed, verifying the contents of omega-3 fatty acids. Two diets were given to the tilapia: a control and a flaxseed formulation, over the course of 60 days. After this period, they were slaughtered and their byproducts (heads, viscera and carcasses) were collected. These fish parts were analyzed in natura, lyophilized and oven dried. Byproducts from tilapia fed with flaxseed presented docosapentaenoic, eicosapentaenoic and docosahexaenoic fatty acids as a result of the enzymatic metabolism of the fish. The byproducts from the oven drying process had lower levels of polyunsaturated fatty acids. In the multivariate analysis, the byproducts from fish fed with flaxseed had a greater composition of fatty acids. The addition of flaxseed in fish diets, as well as the utilization of their byproducts, may become a good business strategy. Additionally, the byproducts may be dried to facilitate transport and storage.
Abstract:
Introduction: Pre-implantation kidney biopsy is a decision-making tool when considering the use of grafts from deceased donors with expanded criteria, implanting one or two kidneys and comparing this to post-transplantation biopsies. The role of histopathological alterations in kidney compartments as a prognostic factor in graft survival and function has had conflicting results. Objective: This study evaluated the prevalence of chronic alterations in pre-implant biopsies of kidney grafts and the association of findings with graft function and survival in one year post-transplant. Methods: 110 biopsies were analyzed between 2006 and 2009 at Santa Casa de Porto Alegre, including live donors, ideal deceased donors and those with expanded criteria. The score was computed according to criteria suggested by Remuzzi. The glomerular filtration rate (GFR) was calculated using the abbreviated MDRD formula. Results: No statistical difference was found in the survival of donors stratified according to Remuzzi criteria. The GFR was significantly associated with the total scores in the groups with mild and moderate alterations, and in the kidney compartments alone, by univariate analysis. The multivariate model found an association with the presence of arteriosclerosis, glomerulosclerosis, acute rejection and delayed graft function. Conclusion: Pre-transplant chronic kidney alterations did not influence the post-transplantation one-year graft survival, but arteriosclerosis and glomerulosclerosis are predictive of a worse GFR. Delayed graft function and acute rejection are independent prognostic factors.
Abstract:
Avidins (Avds) are homotetrameric or homodimeric glycoproteins with typically less than 130 amino acid residues per monomer. They form a highly stable, non-covalent complex with biotin (vitamin H) with Kd = 10^-15 M (for chicken Avd). The best-studied Avds are the chicken Avd from Gallus gallus and streptavidin from Streptomyces avidinii, although other Avd studies have also included Avds from various origins, e.g., from frogs, fishes, mushrooms and from many different bacteria. Several engineered Avds have been reported as well, e.g., dual-chain Avds (dcAvds) and single-chain Avds (scAvds), circular permutants with up to four simultaneously modifiable ligand-binding sites. These engineered Avds along with the many native Avds have potential to be used in various nanobiotechnological applications. In this study, we made a structure-based alignment representing all currently available sequences of Avds and studied the evolutionary relationship of Avds using phylogenetic analysis. First, we created an initial multiple sequence alignment of Avds using 42 closely related sequences, guided by the known Avd crystal structures. Next, we searched for non-redundant Avd sequences from various online databases, including the National Center for Biotechnology Information and the Universal Protein Resource; the identified sequences were added to the initial alignment to expand it to a final alignment of 242 Avd sequences. The MEGA software package was used to create distance matrices and a phylogenetic tree.
Bootstrap reproducibility of the tree was poor at multiple nodes and may reflect several possible issues with the data: the sequence length compared is relatively short and, whereas some positions are highly conserved and functional, others can vary without impinging on the structure or the function, so there are few informative sites; it may be that periods of rapid duplication have led to paralogs and that the differences among them are within the error limit of the data; and there may be other yet unknown reasons. Principal component analysis applied to alternative distance data did segregate the major groups, and its success is likely due to the multivariate consideration of all the information. Furthermore, based on our extensive alignment and phylogenetic analysis, we expressed two novel Avds, lacavidin from Latrodectus hesperus, a western black widow spider, and hoefavidin from Hoeflea phototrophica, an aerobic marine bacterium, the ultimate aim being to determine their X-ray structures. These Avds were selected because of their unique sequences: lacavidin has an N-terminal Avd-like domain but a long C-terminal overhang, whereas hoefavidin was thought to be a dimeric Avd. Both these Avds could be used as novel scaffolds in biotechnological applications.
Abstract:
This work investigates theoretical properties of symmetric and anti-symmetric kernels. The first chapters give an overview of the theory of kernels used in supervised machine learning. The central focus is on the regularized least squares algorithm, which is motivated as a problem of function reconstruction through an abstract inverse problem. A brief review of reproducing kernel Hilbert spaces shows how kernels define an implicit hypothesis space with multiple equivalent characterizations and how this space may be modified by incorporating prior knowledge. Mathematical results on the abstract inverse problem, in particular spectral properties, the pseudoinverse and regularization, are recollected and then specialized to kernels. Symmetric and anti-symmetric kernels are applied in relation learning problems which incorporate prior knowledge that the relation is symmetric or anti-symmetric, respectively. Theoretical properties of these kernels are proved in a draft this thesis is based on and comprehensively referenced here. These proofs show that these kernels can be guaranteed to learn only symmetric or anti-symmetric relations, and they can learn any relations relative to the original kernel modified to learn only symmetric or anti-symmetric parts. Further results prove spectral properties of these kernels, the central result being a simple inequality for the trace of the estimator, also called the effective dimension. This quantity is used in learning bounds to guarantee smaller variance.
Abstract:
Over time the demand for quantitative portfolio management has increased among financial institutions but there is still a lack of practical tools. In 2008 EDHEC Risk and Asset Management Research Centre conducted a survey of European investment practices. It revealed that the majority of asset or fund management companies, pension funds and institutional investors do not use more sophisticated models to compensate for the flaws of the Markowitz mean-variance portfolio optimization. Furthermore, tactical asset allocation managers employ a variety of methods to estimate return and risk of assets, but also need sophisticated portfolio management models to outperform their benchmarks. Recent development in portfolio management suggests that new innovations are slowly gaining ground, but still need to be studied carefully. This thesis tries to provide a practical tactical asset allocation (TAA) application of the Black–Litterman (B–L) approach and an unbiased evaluation of B–L models’ qualities. The mean-variance framework, issues related to asset allocation decisions and return forecasting are examined carefully to uncover issues affecting active portfolio management. European fixed income data is employed in an empirical study that tries to reveal whether a B–L model based TAA portfolio is able to outperform its strategic benchmark. The tactical asset allocation utilizes a Vector Autoregressive (VAR) model to create return forecasts from lagged values of asset classes as well as economic variables. Sample data (31.12.1999–31.12.2012) is divided into two. In-sample data is used for calibrating a strategic portfolio and the out-of-sample period is for testing the tactical portfolio against the strategic benchmark. Results show that B–L model based tactical asset allocation outperforms the benchmark portfolio in terms of risk-adjusted return and mean excess return.
The VAR model is able to pick up the change in investor sentiment and the B–L model adjusts portfolio weights in a controlled manner. The TAA portfolio shows promise especially in moderately shifting allocation to more risky assets while the market is turning bullish, but without overweighting investments with high beta. Based on the findings of the thesis, the Black–Litterman model offers a good platform for active asset managers to quantify their views on investments and implement their strategies. The B–L model shows potential and offers interesting research avenues. However, the success of tactical asset allocation is still highly dependent on the quality of input estimates.
Abstract:
In this paper, we propose exact inference procedures for asset pricing models that can be formulated in the framework of a multivariate linear regression (CAPM), allowing for stable error distributions. The normality assumption on the distribution of stock returns is usually rejected in empirical studies, due to excess kurtosis and asymmetry. To model such data, we propose a comprehensive statistical approach which allows for alternative - possibly asymmetric - heavy tailed distributions without the use of large-sample approximations. The methods suggested are based on Monte Carlo test techniques. Goodness-of-fit tests are formally incorporated to ensure that the error distributions considered are empirically sustainable, from which exact confidence sets for the unknown tail area and asymmetry parameters of the stable error distribution are derived. Tests for the efficiency of the market portfolio (zero intercepts) which explicitly allow for the presence of (unknown) nuisance parameter in the stable error distribution are derived. The methods proposed are applied to monthly returns on 12 portfolios of the New York Stock Exchange over the period 1926-1995 (5 year subperiods). We find that stable possibly skewed distributions provide statistically significant improvement in goodness-of-fit and lead to fewer rejections of the efficiency hypothesis.
Abstract:
The thesis covers various aspects of modeling and analysis of finite mean time series with symmetric stable distributed innovations. Time series analyses based on Box and Jenkins methods are the most popular approaches, where the models are linear and the errors are Gaussian. We highlight the limitations of classical time series analysis tools, explore some generalized tools and organize the approach parallel to the classical setup. In the present thesis we mainly study the estimation and prediction of the signal plus noise model. Here we assume that the signal and noise follow some models with symmetric stable innovations. We start the thesis with some motivating examples and application areas of alpha stable time series models. Classical time series analysis and the corresponding theories based on finite variance models are extensively discussed in the second chapter. We also survey the existing theories and methods corresponding to infinite variance models in the same chapter. In the third chapter we present a linear filtering method for computing the filter weights assigned to the observations for estimating an unobserved signal under a general noisy environment. Here we consider both the signal and the noise as stationary processes with infinite variance innovations. We derive semi-infinite, doubly infinite and asymmetric signal extraction filters based on the minimum dispersion criterion. Finite length filters based on Kalman-Levy filters are developed, and the pattern of the filter weights is identified. Simulation studies show that the proposed methods are competent enough in signal extraction for processes with infinite variance. Parameter estimation of autoregressive signals observed in a symmetric stable noise environment is discussed in the fourth chapter. Here we use higher order Yule-Walker type estimation based on the auto-covariation function and exemplify the methods by simulation and application to sea surface temperature data.
We increase the number of Yule-Walker equations and propose an ordinary least squares estimate of the autoregressive parameters. The singularity problem of the auto-covariation matrix is addressed, and a modified version of the generalized Yule-Walker method is derived using singular value decomposition. In the fifth chapter of the thesis we introduce the partial covariation function as a tool for stable time series analysis where covariance or partial covariance is ill defined. Asymptotic results for the partial auto-covariation are studied and its application in model identification of stable autoregressive models is discussed. We generalize the Durbin-Levinson algorithm to include infinite variance models in terms of the partial auto-covariation function and introduce a new information criterion for consistent order estimation of stable autoregressive models. In chapter six we explore the application of the techniques discussed in the previous chapter in signal processing. Frequency estimation of a sinusoidal signal observed in a symmetric stable noisy environment is discussed in this context. Here we introduce a parametric spectrum analysis and a frequency estimate using the power transfer function. An estimate of the power transfer function is obtained using the modified generalized Yule-Walker approach. Another important problem in statistical signal processing is to identify the number of sinusoidal components in an observed signal. We use a modified version of the proposed information criterion for this purpose.
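For orientation only, the classical finite-variance Yule-Walker system that this thesis generalizes can be sketched for an AR(2) model. This is the textbook autocovariance version, not the auto-covariation based method of the thesis; the function name `yule_walker_ar2` is a hypothetical helper, and the inputs are assumed to be autocovariances at lags 0, 1 and 2.

```python
def yule_walker_ar2(r0, r1, r2):
    """Solve [[r0, r1], [r1, r0]] @ [phi1, phi2] = [r1, r2] for the AR(2) coefficients."""
    det = r0 * r0 - r1 * r1
    if det == 0:
        # A singular matrix has no unique solution; the thesis handles the
        # analogous problem for the auto-covariation matrix through SVD.
        raise ValueError("singular autocovariance matrix")
    phi1 = r1 * (r0 - r2) / det
    phi2 = (r0 * r2 - r1 * r1) / det
    return phi1, phi2


# Autocovariances of an AR(1) process with coefficient 0.5 and unit variance:
# the solution should recover phi1 = 0.5 and phi2 = 0.
phi1, phi2 = yule_walker_ar2(1.0, 0.5, 0.25)
```

In the infinite variance setting the autocovariances above are undefined, which is why the thesis replaces them with auto-covariation estimates and uses more equations than unknowns.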
Abstract:
Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. The data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which in turn are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix. Rows in the matrix represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes, such as the similarity of their behavior, the nature of their interaction, their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by genes participating in the same biological process. These patterns have immense relevance and application in bioinformatics and clinical research. They are used in the medical domain to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. To identify various patterns from gene expression data, data mining techniques are essential. Clustering is an important data mining technique for the analysis of gene expression data. To overcome the problems associated with clustering, biclustering was introduced. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix.
Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns in the submatrix need not be contiguous in the gene expression data matrix. Biclusters are not disjoint. Computation of biclusters is costly because one has to consider all combinations of columns and rows in order to find all the biclusters. The search space for the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms use a measure called the mean squared residue to search for biclusters. The objective is to identify biclusters of maximum size with a mean squared residue lower than a given threshold.
All these algorithms begin the search from tightly coregulated submatrices called seeds. These seeds are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. The constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. Biologically relevant and statistically significant biclusters are identified by all these algorithms and validated using the Gene Ontology database. All these algorithms are compared with some other biclustering algorithms. The algorithms developed in this work overcome some of the problems associated with existing algorithms. With some of the algorithms developed in this work, biclusters with very high row variance, higher than that obtained by any other algorithm using mean squared residue, are identified from both the Yeast and Lymphoma data sets. Such biclusters, which show significant changes in expression level, are highly relevant biologically.
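The mean squared residue (MSR) that these algorithms minimize can be sketched as follows, using the standard Cheng and Church definition: the average squared residue a_ij - a_iJ - a_Ij + a_IJ over the selected rows I and columns J. The abstract does not spell out the formula, so treating it as this standard definition is an assumption, and the function name `mean_squared_residue` is illustrative.

```python
def mean_squared_residue(matrix, rows, cols):
    """MSR of the bicluster given by row indices `rows` and column indices `cols`."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    # Residue of each entry after removing row, column and overall effects.
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Rows that differ only by an additive shift form a perfectly coherent bicluster.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
msr_coherent = mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2])

# Rows and columns need not be contiguous: pick rows 0 and 2, columns 0 and 2.
msr_subset = mean_squared_residue(coherent, [0, 2], [0, 2])

# An anti-correlated 2x2 block has a clearly non-zero residue.
msr_incoherent = mean_squared_residue([[1.0, 0.0], [0.0, 1.0]], [0, 1], [0, 1])
```

A search procedure of the kind described would grow a seed by adding rows and columns while this value stays below the chosen MSR threshold.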