922 results for "principal component regression"


Relevance: 90.00%

Publisher:

Abstract:

Medium density fiberboard (MDF) is an engineered wood product formed by breaking down selected lignin-cellulosic residual materials into fibers, combining them with wax and a resin binder, and then forming panels under high temperature and pressure. Because the raw material in the industrial process is ever-changing, the panel industry requires methods for monitoring the composition of its products. The aim of this study was to estimate the ratio of sugarcane (SC) bagasse to Eucalyptus wood in MDF panels using near infrared (NIR) spectroscopy. Principal component analysis (PCA) and partial least squares (PLS) regression were performed. MDF panels with different bagasse contents were easily distinguished from each other by PCA of their NIR spectra, with clearly different response patterns. The PLS-R models for SC content of these MDF samples showed a strong coefficient of determination (0.96) between the NIR-predicted and lab-determined values and a low standard error of prediction (~1.5%) in cross-validation. Resins (adhesives), cellulose, and lignin were shown to play a key role in these PLS-R calibrations. The PLS-DA model correctly classified 94% of MDF samples in cross-validation and 98% of the panels in an independent test set. These NIR-based models can be used to quickly estimate the sugarcane bagasse to Eucalyptus wood content ratio in unknown MDF samples and to verify the quality of these engineered wood products in an online process.

Natural products have widespread biological activities, including inhibition of mitochondrial enzyme systems. Some of these activities, for example cytotoxicity, may result from alteration of cellular bioenergetics. Based on previous computer-aided drug design (CADD) studies and on reported structure-activity relationship (SAR) data, a proposed mechanism of action of natural products against parasitic infections involves NADH-oxidase inhibition. In this study, chemometric tools such as Principal Component Analysis (PCA), Consensus PCA (CPCA), and partial least squares (PLS) regression were applied to a set of forty natural compounds acting as NADH-oxidase inhibitors. The calculations were performed with the VolSurf+ program. The formalisms employed produced good exploratory and predictive results. The independent variables (descriptors) with a hydrophobic profile were strongly correlated with the biological data.

Objectives: The aim of this work was to verify the differentiation between normal and pathological human carotid artery tissues by using fluorescence and reflectance spectroscopy in the 400- to 700-nm range, and to perform spectral characterization by means of principal components analysis. Background Data: Atherosclerosis is the most common and serious pathology of the cardiovascular system. Principal components represent the main spectral characteristics that occur within the spectral data and could be used for tissue classification. Materials and Methods: Sixty postmortem carotid artery fragments (26 non-atherosclerotic and 34 atherosclerotic with non-calcified plaques) were studied. Excitation was provided by a 488-nm argon laser. Two 600-μm core optical fibers were used, one for excitation and one to collect the fluorescence radiation from the samples. The reflectance system was composed of a halogen lamp coupled to an excitation fiber positioned in one of the ports of an integrating sphere, delivering 5 mW to the sample. The reflectance signal was coupled to a 1/4-m spectrograph via an optical fiber. Euclidean distance was then used to classify each principal component score into one of two classes, normal and atherosclerotic tissue, for both fluorescence and reflectance. Results: Principal components analysis allowed classification of the samples with 81% sensitivity and 88% specificity for fluorescence, and 81% sensitivity and 91% specificity for reflectance. Conclusions: Our results showed that principal components analysis can be applied to differentiate between normal and atherosclerotic tissue with high sensitivity and specificity.
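The classification step described here, Euclidean distance between principal component scores and class centroids, can be sketched on synthetic two-class "spectra". The peak positions, class sizes, and noise level below are hypothetical, not the study's data, and accuracy is computed on the training samples for simplicity.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
wl = np.linspace(400, 700, 150)                     # nm, as in the study range
# Hypothetical spectra: two tissue classes with a shifted emission peak
normal = np.exp(-((wl - 520) ** 2) / 800) + rng.normal(0, 0.05, (26, 150))
plaque = np.exp(-((wl - 560) ** 2) / 800) + rng.normal(0, 0.05, (34, 150))
X = np.vstack([normal, plaque])
y = np.array([0] * 26 + [1] * 34)

scores = PCA(n_components=3).fit_transform(X)       # PC scores per sample
centroids = np.array([scores[y == c].mean(axis=0) for c in (0, 1)])
# Euclidean distance of each score vector to the two class centroids
d = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
pred = d.argmin(axis=1)
accuracy = (pred == y).mean()
```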

Dietary patterns have been related to health outcomes and to morbidity and mortality. Mediterranean diet indexes are correlated with adequate nutrient intake. The objective of the present study was to analyse the adequacy of nutrient intake of a posteriori defined Mediterranean (MDP) and Western (WDP) dietary patterns in the Seguimiento Universidad de Navarra (SUN) cohort. A sample of 17,197 subjects participated in the study. Participants completed a 136-item validated semi-quantitative FFQ. Principal component analysis was used to define dietary patterns. Individuals were classified according to quintiles of adherence based on dietary pattern scores. Non-dietary variables, such as smoking and physical activity habits, were also taken into account. The probability approach was used to assess nutrient intake adequacy for certain vitamins (B12, B6, B3, B2, B1, A, C, D and E) and minerals (Na, Zn, iodine, Se, folic acid, P, Mg, K, Fe and Ca). Logistic regression analysis was used to assess the adequacy of nutrient intake according to adherence to the dietary patterns. WDP and MDP were defined. A higher quintile of adherence to the MDP was associated with a lower prevalence of inadequate intake of Zn, iodine, vitamin E, Mg, Fe, vitamin B1, vitamin A, Se, vitamin C and folic acid. The adjusted OR for not reaching at least six (or at least ten) nutrient recommendations was 0.09 (95% CI: 0.07, 0.11) (and 0.02 (95% CI: 0.00, 0.16)) for the upper quintile of the MDP, and 4.4 (95% CI: 3.6, 5.5) and 2.5 (95% CI: 1.1, 5.4) for the WDP. The MDP was associated with a better profile of nutrient intake.
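The analysis pipeline summarized above (a PCA-derived dietary pattern score, quintiles of adherence, and logistic regression on an adequacy outcome) can be sketched on simulated data. The intake matrix, sample size, and outcome model below are entirely hypothetical stand-ins for the FFQ data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Hypothetical FFQ food-group intakes: 1000 subjects x 20 food groups
intakes = rng.normal(0, 1, (1000, 20))
score = PCA(n_components=1).fit_transform(intakes).ravel()   # pattern score
quintile = np.digitize(score, np.quantile(score, [0.2, 0.4, 0.6, 0.8]))

# Hypothetical "inadequate intake" outcome whose probability falls
# as pattern adherence rises (logistic link on the score)
p = 1 / (1 + np.exp(score))
inadequate = rng.binomial(1, p)

model = LogisticRegression().fit(quintile.reshape(-1, 1), inadequate)
odds_ratio = np.exp(model.coef_[0, 0])   # OR per one-quintile increase
```

With this construction the odds ratio per quintile comes out below 1, mirroring the direction of the MDP result reported above.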

Lisbon is the largest urban area on the Western European coast. Owing to this geographical position, the Atlantic Ocean serves as an important source of particles and plays an important role in many atmospheric processes. The main objectives of this study were to (1) chemically characterize particulate matter (PM2.5) sampled in Lisbon, (2) identify the main sources of particles, (3) determine their mass contributions in this urban area, and (4) assess the impact of maritime air mass trajectories on the concentration and composition of respirable PM sampled in Lisbon. During 2007, PM2.5 was collected daily in the center of Lisbon with a Partisol sampler. The exposed Teflon filters were measured by gravimetry and cut into two parts: one for analysis by instrumental neutron activation analysis (INAA) and the other by ion chromatography (IC). Principal component analysis (PCA) and multilinear regression analysis (MLRA) were used to identify possible sources of PM2.5 and determine their mass contributions. Five main groups of sources were identified: secondary aerosols, traffic, calcium, soil, and sea. Four-day back-trajectories ending in Lisbon at the sampling start time were calculated using the HYSPLIT model. Results showed that maritime transport scenarios were frequent. These episodes were characterized by a significant decrease in anthropogenic aerosol concentrations and played a significant role in the air quality of this urban area.
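The PCA-plus-regression source apportionment described here can be sketched on simulated elemental data: extract a few principal components from the species concentrations, then regress total PM2.5 mass on the component scores. The source profiles, species counts, and day counts below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
# Hypothetical daily data: 365 days x 12 chemical species, generated from
# 3 latent "sources" (think traffic, sea salt, soil) plus measurement noise
profiles = rng.uniform(0, 1, (3, 12))            # source chemical profiles
activity = rng.lognormal(0, 0.5, (365, 3))       # daily source strengths
species = activity @ profiles + rng.normal(0, 0.05, (365, 12))
mass = species.sum(axis=1)                       # measured PM2.5 mass proxy

scores = PCA(n_components=3).fit_transform(species)   # PCA of species data
mlra = LinearRegression().fit(scores, mass)           # mass on PC scores
r2 = mlra.score(scores, mass)
```

Because the simulated species matrix really is three-dimensional plus noise, the regression of mass on three PC scores explains nearly all the variance; apportioning that explained mass among interpretable sources is the harder step in practice.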

Beyond classical statistical approaches (basic statistics, regression analysis, ANOVA, etc.), a new set of applications of different statistical techniques has gained increasing relevance in the analysis, processing and interpretation of data on the characteristics of forest soils. This can be seen in recent publications in the context of multivariate statistics. These new methods require additional care that is not always taken or referred to in some approaches. In the particular case of geostatistical applications it is necessary, besides geo-referencing all data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. Most multivariate statistical techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.), despite not requiring in most cases the assumption of a normal distribution, nevertheless need a proper and rigorous strategy for their use. In this work we present some reflections on these methodologies and, in particular, on the main constraints that often occur during the data collection process, as well as on the various ways of linking these different techniques. Finally, illustrations of some particular applications of these statistical methods are also presented.
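The geostatistical point made above, that samples must be geo-referenced and dense enough for variograms to be representative, can be illustrated with an empirical semivariogram. The regular grid and the smooth soil "property" below are simulated, not field data.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical soil-property samples on a regular 20 x 20 grid,
# with smooth spatial structure plus a small "nugget" of white noise
x, y = np.meshgrid(np.arange(20.0), np.arange(20.0))
coords = np.column_stack([x.ravel(), y.ravel()])
field = np.sin(x / 5.0) + np.cos(y / 7.0)
values = field.ravel() + rng.normal(0, 0.1, coords.shape[0])

def semivariogram(coords, values, bins):
    """Empirical semivariance gamma(h) = mean of (z_i - z_j)^2 / 2 per lag bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = (values[:, None] - values[None, :]) ** 2 / 2.0
    iu = np.triu_indices_from(d, k=1)           # each pair counted once
    idx = np.digitize(d[iu], bins)
    return np.array([sq[iu][idx == b].mean() for b in range(1, len(bins))])

gamma = semivariogram(coords, values, bins=np.arange(0.0, 11.0, 2.0))
# For a spatially structured field, semivariance grows with lag distance
```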

In the current context of serious climate change, in which the increased frequency of some extreme events can raise the number of periods prone to high-intensity forest fires, the National Forest Authority often implements, in several Portuguese forest areas, a regular set of measures to control the amount of available fuel mass (PNDFCI, 2008). In the present work we present a preliminary analysis of the consequences of prescribed fire measures used to control fuel mass for soil recovery, in particular in terms of water retention capacity, organic matter content, pH and iron content. This work is part of a larger study (Meira-Castro, 2009(a); Meira-Castro, 2009(b)). Given the established practice for data collection, embodied in multidimensional matrices of n columns (variables under analysis) by p rows (areas sampled at different depths), and considering the quantitative nature of the data in this study, we chose a methodological approach based on multivariate statistical analysis, in particular Principal Component Analysis (PCA) (Góis, 2004). The experiments were carried out on a soil cover over a natural site of andalusitic schist in Gramelas, Caminha, NW Portugal, which had remained free of prescribed burning for four years and was subjected to prescribed fire in March 2008. Soil samples were collected from five different plots at six different time periods. The adopted methodology allowed us to identify the most relevant relational structures within the n variables, within the p samples, and in the two sets simultaneously (Garcia-Pereira, 1990). Consequently, in addition to the traditional PCA outputs, we analyzed the influence of both sampling depth and geomorphological environment on the behavior of all variables involved.

The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixture of components originating from the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]; the nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17], whereas the nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model, and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], spectral signature matching [21], the spectral angle mapper [22], and subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures.
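Under the linear mixing model with known endmember signatures, the constrained least-squares unmixing mentioned above can be sketched with a classic trick: append a heavily weighted sum-to-one row to the signature matrix and solve with nonnegative least squares, so both abundance constraints are (approximately) enforced. The endmember matrix, abundances, and noise level below are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
# Hypothetical endmember signatures: 50 spectral bands x 3 materials
E = rng.uniform(0.1, 1.0, (50, 3))
true_a = np.array([0.6, 0.3, 0.1])               # abundances: >= 0, sum to 1
pixel = E @ true_a + rng.normal(0, 0.001, 50)    # observed mixed pixel

# Fully constrained least squares: a weighted row of ones pushes
# sum(a) toward 1, while NNLS enforces nonnegativity
delta = 1e3
E_aug = np.vstack([E, delta * np.ones((1, 3))])
y_aug = np.append(pixel, delta)
abundances, _ = nnls(E_aug, y_aug)
```

With low noise the recovered abundances land very close to the true fractions; in real scenes, signature variability and noise make the problem considerably harder, which is the point of the chapter.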
As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, and target signature space orthogonal projection). Other works using the maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, feature extraction, and unsupervised recognition [28, 29]. ICA consists of finding a linear decomposition of observed data into statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed of the abundance fractions, and in references 9, 25, and 31–38, where the sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) the number of samples is limited to the number of channels, and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case for hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises the applicability of ICA to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades ICA performance.
IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, IFA performance. Considering the linear mixing model, hyperspectral observations lie in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. The minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are computationally complex: usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at lower computational complexity, some algorithms such as vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of purest pixels in the data. Hyperspectral sensors collect spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, including unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR).
Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performance. The study considers simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model that takes into account the degradation mechanisms normally found in hyperspectral applications, namely signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to the data. The MOG parameters (number of components, means, covariances, and weights) are inferred using a minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end the chapter by sketching a new methodology to blindly unmix hyperspectral data, in which abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces the positivity and constant-sum (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm.
This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with a mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need for pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief summary of the ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the limitations of ICA and IFA in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.

ABSTRACT - Objectives: Every year about 1.3 million people die worldwide as a result of road traffic accidents. More than 20 million people also suffer minor or serious injuries from road accidents, resulting in temporary or permanent disability. Road traffic accidents are therefore considered a serious public health problem, with high costs to societies, affecting the health of populations and the economy of each country. This study aimed to describe and characterize drivers of light vehicles residing in mainland Portugal, covering sociodemographic characteristics, driving experience, and questions on attitudes, opinions and behaviors. It also sought to analyse the association between self-reported opinions, attitudes and behaviors and the occurrence of a road accident in the previous three years, in order to build a final predictive model of the risk of being involved in a road accident. Methods: A cross-sectional analytical observational study was conducted, based on a questionnaire translated into Portuguese and originating from the European SARTRE 4 project. The target population comprised all drivers of light vehicles holding a driving licence and residing in mainland Portugal, based on a sample of the same size as that defined in the European SARTRE 4 study (600 drivers of light vehicles). From the 52 questions available, principal component analysis (PCA) was used to select potentially independent and complementary variables for the opinions, attitudes and behaviors components. In addition to the usual descriptive measures, binary logistic regression was used to analyse associations and to obtain a model for estimating the probability of being involved in a road accident as a function of the selected variables concerning self-reported opinions, attitudes and behaviors.
Results: Of the 612 drivers surveyed, 62.7% (383) reported not having had any road accident in the previous three years, while 37.3% (228) reported having been involved in at least one road accident with material damage or injuries in the same period. In general, the typical driver who reported an accident in the previous three years is a man over 65 years of age, with primary education, widowed and without children, not employed, and living in an urban area. Drivers living in a suburban area had a 5.368 times higher risk of being involved in a road accident than drivers living in a rural area (95% CI: 2.344-12.297; p<0.001). Drivers who had been breath-tested for alcohol only once in the previous three years while driving had a 3.009 times higher risk of a road accident than drivers who had never been stopped by the police (95% CI: 1.949-4.647; p<0.001). Drivers who reported very frequently stopping to sleep when feeling tired while driving had an 81% lower probability of a road accident than drivers who never do so (95% CI: 0.058-0.620; p=0.006). Drivers who, when tired, rarely drink a coffee or energy drink had a 4.829 times higher risk of a road accident than drivers who reported always doing so (95% CI: 1.807-12.903; p=0.002). Conclusions: The results obtained for behavioral factors are in line with most of the risk factors associated with road accidents reported in the literature. Nevertheless, new associations were identified between accident risk and self-reported opinions and attitudes, which could be further explored in larger population studies.
This work reinforces the urgent need for new intervention strategies, especially on the behavioral component, targeted at risk groups, while maintaining existing measures.

With the present study we aimed to analyze the relationship between infants' behavior and their visual evoked potential (VEP) response. Specifically, we wanted to verify differences in the VEP response between sleeping and awake infants, and whether an association could be identified between VEP components and neurobehavioral outcome in both groups. To this end, thirty-two full-term, healthy infants of approximately 1 month of age were assessed with an unpatterned flash VEP stimulus paradigm, presented at two different intensities, and with a neurobehavioral scale. However, only 18 infants had both assessments, and these are therefore the total included in both analyses. Infants displayed a mature neurobehavioral outcome, as expected for their age. We observed that the P2 and N3 components were present in both sleeping and awake infants. Differences between intensities were found for P2 amplitude, but only in awake infants. Regression analysis showed that N3 amplitude predicted adequate social interactive and internal regulatory behavior in infants who were awake during stimulus presentation. Given that social orientation and regulatory behaviors are fundamental to social-like behavior in 1-month-old infants, this study provides an important approach for assessing physiological biomarkers (VEPs) and their relation to social behavior very early in postnatal development. Moreover, we highlight the importance of the infant's state when studying differences in visual threshold processing and its association with behavioral outcome.

In human population genetics, routine applications of principal component techniques are often required. Population biologists make widespread use of certain discrete classifications of human samples into haplotypes, the monophyletic units of phylogenetic trees constructed from several hierarchically ordered single nucleotide bimorphisms. Compositional frequencies of the haplotypes are recorded within the different samples. Principal component techniques are then required as a dimension-reducing strategy to bring the dimension of the problem down to a manageable level, say two, to allow for graphical analysis. Population biologists at large are not aware of the special features of compositional data and normally use the crude covariance of the compositional relative frequencies to construct principal components. In this short note we present our experience with using traditional linear principal components versus compositional principal components based on logratios, with reference to a specific dataset.
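The contrast drawn in this note, crude-covariance PCA of relative frequencies versus compositional PCA on logratios, can be sketched as follows. The haplotype frequency table is simulated, and the centered logratio (clr) transform stands in for the logratio approach.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
# Hypothetical haplotype frequencies: 40 population samples x 6 haplotypes,
# each row a composition (nonnegative, summing to 1)
counts = rng.gamma(shape=2.0, scale=1.0, size=(40, 6))
freqs = counts / counts.sum(axis=1, keepdims=True)

# Centered logratio (clr) transform: log frequency minus the row mean of
# the logs, mapping compositions to a space where Euclidean PCA is sound
clr = np.log(freqs) - np.log(freqs).mean(axis=1, keepdims=True)

crude_scores = PCA(n_components=2).fit_transform(freqs)   # "crude" covariance
clr_scores = PCA(n_components=2).fit_transform(clr)       # compositional PCA
```

Each clr row sums to zero by construction, which is precisely the constraint-aware geometry that crude PCA of the raw frequencies ignores.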

Diagnosis of several neurological disorders is based on the detection of typical pathological patterns in the electroencephalogram (EEG). This is a time-consuming task requiring significant training and experience. Automatic detection of these EEG patterns would greatly assist quantitative analysis and interpretation. We present a method that allows automatic detection of epileptiform events and their discrimination from eye blinks, based on features derived using a novel application of independent component analysis. The algorithm was trained and cross-validated using seven EEGs with epileptiform activity. For epileptiform events with compensation for eye blinks, the sensitivity was 65 ± 22% at a specificity of 86 ± 7% (mean ± SD). With feature extraction by PCA or classification of raw data, specificity dropped to 76% and 74%, respectively, at the same sensitivity. On exactly the same data, the commercially available software Reveal had a maximum sensitivity of 30% with a concurrent specificity of 77%. Our algorithm performed well at detecting epileptiform events in this preliminary test and offers a flexible tool that is intended to be generalized to the simultaneous classification of many waveforms in the EEG.
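The core idea of applying ICA to mixed EEG channels, so that artifact-like and spike-like sources fall into separate components, can be sketched on synthetic signals. The sources, mixing matrix, and channel count below are hypothetical, and FastICA stands in for whichever ICA variant the study used.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
t = np.linspace(0, 10, 2000)
# Hypothetical sources: an oscillatory "EEG" rhythm, a sparse spiky
# "epileptiform" train, and a slow periodic "eye blink" artifact
eeg = np.sin(2 * np.pi * 10 * t)
spikes = (np.sin(2 * np.pi * 1 * t) > 0.99).astype(float)
blinks = np.exp(-((t % 3) - 1.5) ** 2 / 0.02)
S = np.column_stack([eeg, spikes, blinks])

A = rng.uniform(0.5, 1.5, (4, 3))          # mixing into 4 "scalp channels"
X = S @ A.T

ica = FastICA(n_components=3, random_state=0, max_iter=1000)
recovered = ica.fit_transform(X)           # estimated independent sources
```

Each recovered component should correlate strongly with one original source (up to order and sign), which is what makes blink compensation and spike feature extraction possible downstream.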

Functional connectivity (FC), as measured by correlation between fMRI BOLD time courses of distinct brain regions, has revealed meaningful organization of spontaneous fluctuations in the resting brain. However, an increasing amount of evidence points to non-stationarity of FC; i.e., FC changes dynamically over time, reflecting additional and rich information about brain organization but posing new challenges for analysis and interpretation. Here, we propose a data-driven approach based on principal component analysis (PCA) to reveal hidden patterns of coherent FC dynamics across multiple subjects. We demonstrate the feasibility and relevance of this new approach by examining the differences in dynamic FC between 13 healthy control subjects and 15 minimally disabled relapsing-remitting multiple sclerosis patients. We estimated whole-brain dynamic FC of regionally averaged BOLD activity using sliding time windows. We then used PCA to identify FC patterns, termed "eigenconnectivities", that reflect meaningful patterns in FC fluctuations. We then assessed the contributions of these patterns to the dynamic FC at any given time point and identified a network of connections centered on the default-mode network with altered contributions in patients. Our results complement traditional stationary analyses and reveal novel insights into brain connectivity dynamics and their modulation in a neurodegenerative disease.
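The sliding-window FC plus PCA ("eigenconnectivities") procedure described above can be sketched as follows. The BOLD time courses, region count, window length, and step size below are hypothetical placeholders, not the study's parameters.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
# Hypothetical BOLD time courses: 300 time points x 10 brain regions
bold = rng.normal(0, 1, (300, 10))

win, step = 60, 10
iu = np.triu_indices(10, k=1)                   # 45 unique region pairs
fc = []
for start in range(0, 300 - win + 1, step):
    c = np.corrcoef(bold[start:start + win].T)  # windowed FC matrix
    fc.append(c[iu])                            # vectorize upper triangle
fc = np.array(fc)                               # windows x connections

pca = PCA(n_components=3).fit(fc)
eigenconnectivities = pca.components_           # 3 FC patterns (3 x 45)
weights = pca.transform(fc)                     # per-window contributions
```

Each row of `eigenconnectivities` can be folded back into a 10 x 10 matrix for interpretation, and the per-window `weights` are what one would compare between groups.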

The information provided by the alignment-independent GRid-INdependent Descriptors (GRIND) can be condensed by applying principal component analysis, yielding a small number of principal properties (GRIND-PP) that are more suitable for describing molecular similarity. The objective of the present study is to optimize the diverse parameters involved in deriving the GRIND-PP and to validate their suitability for applications requiring a biologically relevant description of molecular similarity. With this aim, GRIND-PP computed with a collection of diverse settings were used to carry out ligand-based virtual screening (LBVS) under standard conditions. The quality of the results obtained was remarkable and comparable with other LBVS methods, and their detailed statistical analysis allowed us to identify the method settings most determinant of result quality, together with their optimum values. Remarkably, some of these optimum settings differ significantly from those used in previously published applications, revealing their unexplored potential. Applicability to large compound databases was also explored by comparing the equivalence of the results obtained using either computed or projected principal properties. In general, the results of the study confirm the suitability of GRIND-PP for practical applications and provide useful hints about how they should be computed to obtain optimum results.

Principal curves were defined by Hastie and Stuetzle (JASA, 1989) as smooth curves passing through the middle of a multidimensional data set. They are nonlinear generalizations of the first principal component, a characterization of which is the basis for the principal curve definition. In this paper we propose an alternative approach based on a different property of principal components. Consider a point in the space where a multivariate normal is defined and, for each hyperplane containing that point, compute the total variance of the normal distribution conditioned to belong to that hyperplane. Choose the hyperplane minimizing this conditional total variance and look for the corresponding conditional mean. The first principal component of the original distribution passes through this conditional mean and is orthogonal to that hyperplane. This property is easily generalized to data sets with nonlinear structure. Repeating the search from different starting points, many points analogous to conditional means are found. We call them principal oriented points. When a one-dimensional curve runs through the set of these special points it is called a principal curve of oriented points. Successive principal curves are recursively defined from a generalization of the total variance.