922 results for principal component regression
Abstract:
This work analyses signal variation in Cross-Direction and Machine-Direction measurements of a paper web. The data come from a real paper machine. The goal of the work is to reconstruct the basis weight structure of the paper and to predict its future behaviour. The resulting synthetic data are needed for simulating the paper web. The basis weight variation in the Cross-Direction is described with the Empirical Orthogonal Functions (EOF) algorithm, which is closely related to Principal Component Analysis (PCA). Signal forecasting in time is based on time-series analysis. The two principal mathematical procedures used in the work are Autoregressive Moving Average (ARMA) modelling and the Ornstein–Uhlenbeck (OU) process.
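As an illustration of the forecasting side of this abstract, the following is a minimal sketch of simulating an Ornstein–Uhlenbeck process with its exact discretisation, which is an AR(1) recursion and therefore a special case of ARMA. The function name `simulate_ou` and all parameter values are assumptions chosen for illustration; they are not taken from the thesis.

```python
import math
import random


def simulate_ou(x0, mu, theta, sigma, dt, n_steps, seed=0):
    """Simulate dX = theta * (mu - X) dt + sigma dW on a regular time grid.

    Uses the exact transition density, so each step is an AR(1) update:
    X[t+1] = mu + (X[t] - mu) * decay + noise_sd * N(0, 1).
    """
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the Gaussian increment over one step of length dt.
    noise_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        path.append(mu + (path[-1] - mu) * decay + noise_sd * rng.gauss(0.0, 1.0))
    return path


# Hypothetical parameters: a signal that starts away from its mean and reverts.
path = simulate_ou(x0=2.0, mu=0.0, theta=0.5, sigma=0.3, dt=0.1, n_steps=200)
```

With `sigma=0` the path decays deterministically towards `mu`, which is a convenient sanity check on the mean-reversion rate `theta`.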
Abstract:
The study of the spatial variability of soil and plant attributes, or precision agriculture, a technique that aims at the rational use of natural resources, is expanding commercially in Brazil. Nevertheless, there is a lack of mathematical analysis that supports the correlation of these independent variables and their interactions with productivity and that identifies technologically applicable scientific patterns. The aim of this study was to identify patterns of soil variability according to eleven physical and seven chemical indicators in an agricultural area. Two multivariate techniques were used: hierarchical cluster analysis (HCA) and principal component analysis (PCA). According to the HCA, the area was divided into five management zones: zone 1 with 2.87 ha, zone 2 with 0.8 ha, zone 3 with 1.84 ha, zone 4 with 1.33 ha and zone 5 with 2.76 ha. The PCA identified the most important variables within each zone: V% in zone 1, CTC in zone 2, levels of H+Al in zone 4, and sand content and altitude in zone 5. Zone 3 was classified as an intermediate zone with characteristics of all the others. The results show that multivariate statistical techniques make it possible to separate samples with the same patterns of variability into groups (management zones).
Abstract:
The aim of this study was to investigate the effect of pre-slaughter handling on the occurrence of PSE (Pale, Soft, and Exudative) meat in swine slaughtered at a commercial slaughterhouse located in the metropolitan region of Dourados, Mato Grosso do Sul, Brazil. Based on the database (n=1,832 carcasses), it was possible to apply the integrated multivariate analysis for the purpose of identifying, among the selected variables, those of greatest relevance to this study. Results of the Principal Component Analysis showed that the first five components explained 89.28% of total variance. In the Factor Analysis, the first factor represented the thermal stress and fatiguing conditions for swine during pre-slaughter handling. In general, this study indicated the importance of the pre-slaughter handling stages, evidencing those of greatest stress and threat to animal welfare and pork quality, which are transport time, resting period, lairage time before unloading, unloading time, and ambience.
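The statement that the first five components explained 89.28% of total variance rests on cumulative explained variance. A minimal sketch of that calculation follows; the function name `components_needed` and the eigenvalues are hypothetical and are not the study's data.

```python
def components_needed(eigenvalues, threshold):
    """Return how many leading components reach the given share of total variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Components are ranked by eigenvalue (variance explained), largest first.
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues of a correlation matrix with five variables.
hypothetical_eigenvalues = [4.0, 3.0, 2.0, 0.6, 0.4]
print(components_needed(hypothetical_eigenvalues, 0.85))  # prints 3
```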
Abstract:
This study aimed to develop a methodology based on multivariate statistical analysis (principal component analysis and cluster analysis) to identify the most representative variables in studies of minimum streamflow regionalization and to optimize the identification of hydrologically homogeneous regions in the Doce river basin. Ten variables were used, relating to the climatic and morphometric characteristics of the river basin. These variables were determined individually for each of the 61 gauging stations. Three were dependent variables indicative of minimum streamflow (Q7,10, Q90 and Q95), and seven were independent variables describing the climatic and morphometric characteristics of the basin: total annual rainfall (Pa); total semiannual rainfall of the dry and of the rainy season (Pss and Psc); watershed drainage area (Ad); length of the main river (Lp); total length of the rivers (Lt); and average watershed slope (SL). The principal component analysis indicated that SL was the least representative variable for the study, so it was discarded. The most representative independent variables were Ad and Psc. The best divisions into hydrologically homogeneous regions for the three flow characteristics studied were obtained using the Mahalanobis similarity matrix and the complete linkage clustering method. The cluster analysis enabled the identification of four hydrologically homogeneous regions in the Doce river basin.
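The complete linkage step named in this abstract can be sketched as follows. This is a minimal illustration that assumes a precomputed square dissimilarity matrix (for example, Mahalanobis distances between gauging stations); the function name `complete_linkage` and the toy matrix are hypothetical, and the study's own software is not stated.

```python
def complete_linkage(dist, n_clusters):
    """Agglomerative clustering with complete linkage.

    dist is a symmetric matrix of pairwise dissimilarities; the result is a
    list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between clusters is the largest
                # pairwise distance between their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy dissimilarities for four stations located at 0, 1, 10 and 11 on a line.
positions = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
print(complete_linkage(toy_dist, 2))  # prints [[0, 1], [2, 3]]
```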
Abstract:
In this thesis, a classification problem in predicting the creditworthiness of a customer is tackled. This is done by proposing a reliable classification procedure on a given data set. The aim of this thesis is to design a model that gives the best classification accuracy to effectively predict bankruptcy. FRPCA techniques proposed by Yang and Wang have been preferred since they are tolerant to certain types of noise in the data. These include FRPCA1, FRPCA2 and FRPCA3, from which the best method is chosen. Two different approaches are used at the classification stage: the similarity classifier and the FKNN classifier. Algorithms are tested with the Australian credit card screening data set. Results obtained indicate a mean classification accuracy of 83.22% using FRPCA1 with the similarity classifier. The FKNN approach yields a mean classification accuracy of 85.93% when used with FRPCA2, making it a better method for suitable choices of the number of nearest neighbors and fuzziness parameters. Details on the calibration of the fuzziness parameter and other parameters associated with the similarity classifier are discussed.
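To show how the number of nearest neighbors and the fuzziness parameter enter a fuzzy k-nearest-neighbour (FKNN) classifier, here is a minimal sketch of the usual inverse-distance membership rule with crisp training labels. It is an assumption that the thesis uses this variant; the function name `fknn_memberships`, the labels and the data points are hypothetical.

```python
import math
from collections import defaultdict


def fknn_memberships(train, query, k=3, m=2.0, eps=1e-12):
    """Return class memberships of query from its k nearest training points.

    train is a list of (features, label) pairs; m > 1 is the fuzziness
    parameter. Each neighbour is weighted by 1 / distance ** (2 / (m - 1)).
    """
    neighbours = sorted((math.dist(x, query), label) for x, label in train)[:k]
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    total = 0.0
    for d, label in neighbours:
        # eps guards against division by zero when query equals a training point.
        w = 1.0 / (max(d, eps) ** exponent)
        weights[label] += w
        total += w
    return {label: w / total for label, w in weights.items()}


# Hypothetical two-feature applicants with crisp labels.
train = [((0.0, 0.0), "good"), ((0.0, 1.0), "good"), ((5.0, 5.0), "bad")]
memberships = fknn_memberships(train, (0.0, 0.5), k=3, m=2.0)
```

Larger values of `m` flatten the distance weighting, so distant neighbours count almost as much as close ones; values of `m` near 1 make the nearest neighbour dominate.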
Abstract:
In this study, cantilever-enhanced photoacoustic spectroscopy (CEPAS) was applied in different drug detection schemes. The study was divided into two different applications: trace detection of vaporized drugs and drug precursors in the gas phase, and detection of cocaine abuse in hair. The main focus, however, was the study of hair samples. In the gas phase, methyl benzoate, a hydrolysis product of cocaine hydrochloride, and benzyl methyl ketone (BMK), a precursor of amphetamine and methamphetamine, were investigated. In the solid phase, hair samples from cocaine overdose patients were measured and compared to a drug-free reference group. As hair consists mostly of long fibrous proteins generally called keratin, proteins from fingernails and saliva were also studied for comparison. Different measurement setups were applied in this study. Gas measurements were carried out using quantum cascade lasers (QCL) as a source in the photoacoustic detection. Also, an external cavity (EC) design was used for a broader tuning range. Detection limits of 3.4 parts per billion (ppb) for methyl benzoate and 26 ppb for BMK in 0.9 s were achieved with the EC-QCL PAS setup. The achieved detection limits are sufficient for realistic drug detection applications. The measurements from drug overdose patients were carried out using Fourier transform infrared (FTIR) PAS. The drug-containing hair samples and drug-free samples were both measured with the FTIR-PAS setup, and the measured spectra were analyzed statistically with principal component analysis (PCA). The two groups were separated by their spectra with PCA and proper spectral pre-processing. To improve the method, EC-QCL measurements of the hair samples, and studies using photoacoustic microsampling techniques, were performed. High-quality, high-resolution spectra with a broad tuning range were recorded from a single hair fiber.
This broad tuning range of an EC-QCL has not previously been used in the photoacoustic spectroscopy of solids. However, no drug detection studies were performed with the EC-QCL solid-phase setup.
Abstract:
This study evaluated the photosynthetic responses of seven tropical trees of different successional groups under contrasting irradiance conditions, taking into account changes in gas exchange and chlorophyll a fluorescence. Although early successional species showed higher values of CO2 assimilation (A) and transpiration (E), there was no defined pattern in the daily gas exchange responses to high irradiance (FSL) among the evaluated species. Cariniana legalis (Mart.) Kuntze (late secondary) and Astronium graveolens Jacq. (early secondary) exhibited larger reductions in daily-integrated CO2 assimilation (DIA) when transferred from medium light (ML) to FSL. On the other hand, the pioneer species Guazuma ulmifolia Lam. had a significant DIA increase when exposed to FSL. The pioneers Croton spp. tended to show a DIA decrease of around 19%, while Cytharexyllum myrianthum Cham. (pioneer) and Rhamnidium elaeocarpum Reiss. (early secondary) tended to increase DIA when transferred to FSL. Under this condition, all species showed dynamic photoinhibition, except for C. legalis, which presented chronic photoinhibition of photosynthesis. Considering daily photosynthetic processes, our results supported the hypothesis of more flexible responses of early successional species (pioneer and early secondary species). The principal component analysis indicated that the photochemical parameters effective quantum efficiency of photosystem II and apparent electron transport rate were more suitable for separating the successional groups under the ML condition, whereas A and E played the major role in this task under the FSL condition.
Abstract:
This study concerns the Yrittäjyyskasvatuksen Mittaristo (Measurement Tool for Entrepreneurship Education) project and examines the views and experiences of class teachers and subject teachers in basic education regarding network cooperation in entrepreneurship education. The purpose of the study was to determine how well teachers know network cooperation, what they know about entrepreneurship education, and how this shows in their work and teaching. The sample comprised 450 teachers. The results were analysed with the SPSS statistical software. The statistical methods used were frequency analysis of distributions, principal component analysis within factor analysis, and two-way analysis of variance (ANOVA). The study concludes that teachers' knowledge of the services offered by cooperation networks is very fragmented. The problem is likely to persist until more entrepreneurship education and entrepreneurship studies are brought into teacher education programmes. This should also be taken into account in future curricula. The aim of this study was to present the current state of entrepreneurship education through the results of the Measurement Tool for Entrepreneurship Education, to offer solutions for teaching in the form of proposals, and to stimulate discussion on improving entrepreneurship education.
Abstract:
Ferruginous "campos rupestres" are a particular type of vegetation growing on iron-rich primary soils. We investigated the influence of soil properties on plant species abundance at two sites of ferruginous "campos rupestres" and one site of quartzitic "campo rupestre", all of them in "Quadrilátero Ferrífero", in Minas Gerais State, southeastern Brazil. In each site, 30 quadrats were sampled to assess plant species composition and abundance, and soil samples were taken to perform chemical and physical analyses. The analyzed soils are strongly acidic and presented low fertility and high levels of metallic cations; a principal component analysis of soil data showed a clear segregation among sites due mainly to fertility and heavy metals content, especially Cu, Zn, and Pb. The canonical correspondence analysis indicated a strong correlation between plant species abundance and soil properties, also segregating the sites.
Abstract:
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is growing rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values for these entries. Missing value imputation is a method commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as the Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques.
Such networks are typically very large and highly connected, thus there is a need for fast algorithms for producing visually pleasant layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force directed graph layout algorithm.
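The k-NN imputation that this abstract calls simple and fast can be sketched as follows. This is a minimal illustration under assumed conventions (missing entries are `None`, distance is the root mean squared difference over jointly observed columns, and the imputed value is an unweighted neighbour mean); the function name `knn_impute` and the toy matrix are hypothetical and do not reproduce the thesis's algorithms.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows."""

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Root mean squared difference over columns observed in both rows.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only rows that observe column j can donate a value.
            candidates = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy expression matrix: rows are genes, columns are arrays.
toy = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
completed = knn_impute(toy, k=2)  # completed[0][2] is the mean of 3.0 and 5.0
```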
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve the solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates the development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and to reconstruction of periodic patterns from noisy time series data is also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space.
The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
Abstract:
This thesis describes the occurrence and sources of selected persistent organic pollutants (POPs) such as polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs) and hexachlorocyclohexanes (HCHs) in the northern watershed of Lake Victoria. Sediments and fish were collected from three highly polluted embayments (i.e. Murchison Bay, Napoleon Gulf and Thurston Bay) of the lake. The analysis for PCDD/Fs, PCBs and PBDEs was done using a high-resolution mass spectrometer coupled to a gas chromatograph (GC), and a GC equipped with an electron capture detector was used for HCHs. Total (Σ) PCDD/Fs, PCBs and PBDEs in sediments ranged from 3.19 to 478, 313 to 4325 and 60.8 to 179 pg g-1 dry weight (dw), respectively. The highest concentrations of pollutants were found at sites close to industrial areas and wastewater discharge points. The maximum concentrations of PCDD/Fs, PCBs, PBDEs and HCHs in fish muscle homogenates were 49, 779, 495 and 45,900 pg g-1 wet weight (ww), respectively. The concentrations of the pollutants in Nile perch (Lates niloticus) were significantly greater than those in Nile tilapia (Oreochromis niloticus), possibly due to differences in trophic level and dietary feeding habits among fish species. World Health Organization toxic equivalency quotient (WHO2005-TEQ) values in the sediments were up to 4.24 pg g-1 dw for PCDD/Fs and 0.55 pg TEQ g-1 dw for the 12 dioxin-like PCBs (dl-PCBs). Of the samples from the Napoleon Gulf, 23.1% were above the interim sediment quality guideline value of 0.85 pg WHO-TEQ g-1 dw set by the Canadian Council of Ministers of the Environment. The WHO2005-TEQs in fish were 0.001-0.16 pg g-1 for PCDD/Fs and 0.001-0.31 pg g-1 ww for dl-PCBs. The TEQ values were within the permissible level of 3.5 pg g-1 ww recommended by the European Commission.
Based on the Commission-set TEQs and the minimum risk level criteria formulated by the Agency for Toxic Substances and Disease Registry, the consumption of fish from Lake Victoria gives no indication of health risks associated with PCDD/Fs and PCBs. Principal component analysis (PCA) indicated that anthropogenic activities such as agricultural straw open burning, medical waste incinerators and municipal solid waste combustors were the major sources of PCDD/Fs in the watershed of Lake Victoria. The ratios of α-/γ-HCH varied from 0.89 to 1.68, suggesting that the highest HCH residues mainly came from earlier usage and fresh γ-HCH (lindane). In the present study, the concentrations of POPs in fish were not significantly related to those in sediments, and the biota sediment accumulation factor (BSAF) concept was found to be a poor predictor of the bioavailability and bioaccumulation of environmental pollutants.
Abstract:
The objective of the present study was to investigate the psychometric properties and cross-cultural validity of the Beck Depression Inventory (BDI) among ethnic Chinese living in the city of São Paulo, Brazil. The study was conducted on 208 community individuals. Reliability and discriminant analysis were used to test the psychometric properties and validity of the BDI. Principal component analysis was performed to assess the BDI's factor structure for the total sample and by gender. The mean BDI score was lower (6.74, SD = 5.98) than observed in Western counterparts and showed no gender difference, good internal consistency (Cronbach's alpha 0.82), and high discrimination of depressive symptoms (75-100%). Factor analysis extracted two factors for the total sample and each gender: cognitive-affective dimension and somatic dimension. We conclude that depressive symptoms can be reliably assessed by the BDI in the Brazilian Chinese population, with a validity comparable to that for international studies. Indeed, cultural and measurement biases might have influenced the response of Chinese subjects.
Abstract:
Premenstrual syndrome and premenstrual dysphoric disorder (PMDD) seem to form a severity continuum with no clear-cut boundary. However, since the American Psychiatric Association proposed the research criteria for PMDD in 1994, there has been no agreement about the symptomatic constellation that constitutes this syndrome. The objective of the present study was to establish the core latent structure of PMDD symptoms in a non-clinical sample. Data concerning PMDD symptoms were obtained from 632 regularly menstruating college students (mean age 24.4 years, SD 5.9, range 17 to 49). For the first random half (N = 316), we performed principal component analysis (PCA) and for the remaining half (N = 316), we tested three theory-derived competing models of PMDD by confirmatory factor analysis. PCA allowed us to extract two correlated factors, i.e., dysphoric-somatic and behavioral-impairment factors. The two-dimensional latent model derived from PCA showed the best overall fit among three models tested by confirmatory factor analysis (χ²(53) = 64.39, P = 0.13; goodness-of-fit index = 0.96; adjusted goodness-of-fit index = 0.95; root mean square residual = 0.05; root mean square error of approximation = 0.03, 90%CI = 0.00 to 0.05; Akaike's information criterion = -41.61). The items "out of control" and "physical symptoms" loaded conspicuously on the first factor and "interpersonal impairment" loaded higher on the second factor. The construct validity for PMDD was accounted for by two highly correlated dimensions. These results support the argument for focusing on the core psychopathological dimension of PMDD in future studies.
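The fit statistics reported in this abstract can be cross-checked from the chi-square value alone. The sketch below assumes two common conventions that the abstract does not state explicitly: AIC computed as chi-square minus twice the degrees of freedom, and RMSEA computed as sqrt(max(chi-square - df, 0) / (df * (N - 1))). Under those assumptions, a chi-square of 64.39 with 53 degrees of freedom and N = 316 reproduces the reported values.

```python
import math


def model_aic(chi_square, df):
    """AIC in the chi-square minus 2 * df form used by some SEM software."""
    return chi_square - 2 * df


def rmsea(chi_square, df, n):
    """Root mean square error of approximation for a sample of size n."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values taken from the abstract: chi-square 64.39, 53 df, N = 316.
print(f"{model_aic(64.39, 53):.2f}")   # prints -41.61
print(f"{rmsea(64.39, 53, 316):.3f}")  # prints 0.026
```

The RMSEA of about 0.026 rounds to the reported 0.03, and the AIC matches the reported -41.61.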
Abstract:
A modified version of the intruder-resident paradigm was used to investigate whether social recognition memory lasts at least 24 h. One hundred and forty-six adult male Wistar rats were used. Independent groups of rats were exposed to an intruder for 0.083, 0.5, 2, 24, or 168 h and tested 24 h after the first encounter with the familiar or a different conspecific. Factor analysis was employed to identify associations between behaviors and treatments. Resident rats exhibited a 24-h social recognition memory, as indicated by a 3- to 5-fold decrease in social behaviors in the second encounter with the same conspecific compared to those observed for a different conspecific, when the duration of the first encounter was 2 h or longer. It was possible to distinguish between two different categories of social behaviors, and their expression depended on the duration of the first encounter. Sniffing the anogenital area (49.9% of the social behaviors), sniffing the body (17.9%), sniffing the head (3%), and following the conspecific (3.1%), exhibited mostly by resident rats, characterized social investigation and revealed long-term social recognition memory. However, dominance (23.8%) and mild aggression (2.3%), exhibited by both residents and intruders, characterized social agonistic behaviors and were not affected by memory. In contrast, sniffing the environment (76.8% of the non-social behaviors) and rearing (14.3%), both exhibited mostly by adult intruder rats, characterized non-social behaviors. Together, these results show that social recognition memory in rats may last at least 24 h after a 2-h or longer exposure to the conspecific.