930 results for PRINCIPAL COMPONENT ANALYSIS
Abstract:
To classify mosquito species based on common features of their habitats, samples were collected fortnightly between June 2001 and October 2003 in the subtropical province of Chaco, Argentina. Data were recorded on the type of larval habitat, its nature (artificial or natural), size, depth, position relative to sunlight, distance to neighbouring houses, type of substrate, organic material, and the presence and type of vegetation and algae. Data on the permanence, temperature, pH, turbidity, colour, odour and movement of the water in each larval habitat were also collected. Cluster analysis yielded three groups of species associated by their degree of habitat similarity. Group 1 consisted of Aedes aegypti. Group 2 consisted of Culex imitator, Culex davisi, Wyeomyia muehlensi and Toxorhynchites haemorrhoidalis separatus. Within group 3, two subgroups were distinguished: A (Psorophora ferox, Psorophora cyanescens, Psorophora varinervis, Psorophora confinnis, Psorophora cingulata, Ochlerotatus hastatus-oligopistus, Ochlerotatus serratus, Ochlerotatus scapularis, Culex intrincatus, Culex quinquefasciatus, Culex pilosus, Ochlerotatus albifasciatus, Culex bidens) and B (Culex maxi, Culex eduardoi, Culex chidesteri, Uranotaenia lowii, Uranotaenia pulcherrima, Anopheles neomaculipalpus, Anopheles triannulatus, Anopheles albitarsis, Uranotaenia apicalis, Mansonia humeralis and Aedeomyia squamipennis). Principal component analysis indicated that the size of the larval habitats and the presence of aquatic vegetation are the main characteristics explaining the variation among species. Water permanence was second in importance, whereas water temperature, pH and the type of larval habitat were less important in explaining the clustering of species.
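As a rough illustration of how PCA can rank habitat variables by importance, the sketch below standardizes a samples × variables matrix, diagonalizes its correlation matrix and ranks variables by the magnitude of their loadings on the first component. The variable names and the synthetic data (in which vegetation is deliberately built to co-vary with habitat size) are hypothetical; this is not the authors' analysis.

```python
import numpy as np

def pca(X):
    """PCA of a samples x variables matrix via its correlation matrix.

    Returns (explained_variance_ratio, loadings, scores), with components
    ordered from largest to smallest eigenvalue.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so variables on different scales (cm, pH units, ...) weigh
    # equally; the covariance of Z is then the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    scores = Z @ eigvecs
    return ratio, eigvecs, scores

# Hypothetical habitat data: vegetation is constructed to track habitat size.
rng = np.random.default_rng(0)
n = 50
size = rng.normal(size=n)
vegetation = size + 0.1 * rng.normal(size=n)
permanence, temperature, ph = rng.normal(size=(3, n))
X = np.column_stack([size, vegetation, permanence, temperature, ph])
names = ["size", "vegetation", "permanence", "temperature", "pH"]

ratio, loadings, scores = pca(X)
# Rank variables by the absolute value of their loading on PC1.
ranking = [names[i] for i in np.argsort(np.abs(loadings[:, 0]))[::-1]]
print(ratio.round(3))
print(ranking)
```

Because size and vegetation are nearly collinear by construction, they dominate the first component here; with real field data the ranking would of course come from the data themselves.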
Abstract:
In many socially monogamous birds, both partners perform extrapair copulations (EPC). As this behaviour potentially inflicts direct costs on females, they are currently hypothesized to seek genetic benefits for their offspring, either as 'good' or 'complementary' genes. Although these hypotheses have found some support, several studies have failed to find any beneficial consequence of EPC, and whether this behaviour is adaptive for females remains debated. Here, we test these two hypotheses in a natural population of blue tits, accounting for the effects of most parameters known to potentially affect extrapair fertilization. Results suggest that female body mass affected the type of extrapair genetic benefits obtained. Heavy females obtained extrapair fertilizations when their social mate was of low quality (as reflected by sexual display) and produced larger extrapair than within-pair chicks. Lean females obtained extrapair fertilizations when their social mate was genetically similar, thereby producing more heterozygous extrapair chicks. Our results suggest that mating patterns may be condition-dependent.
Abstract:
Although the reported aetiological agent of cutaneous leishmaniasis (CL) in Sri Lanka is Leishmania donovani, the sandfly vector remains unknown. Ninety-five sandflies (60 females and 35 males) collected in six localities in the district of Matale, central Sri Lanka, close to currently active transmission foci of CL, were examined for taxonomically relevant characteristics. Eleven diagnostic morphological characters of the female sandflies were compared with measurements described for Indian and Sri Lankan sandflies, including the now recognised Phlebotomus argentipes sensu lato species complex. The mean morphometric measurements of the collected female sandflies differed significantly from published values for P. argentipes morphospecies B (now re-identified as Phlebotomus annandalei) from Delft Island and northern Sri Lanka, from the recently re-identified P. argentipes s.s. sibling species and from Phlebotomus glaucus. Furthermore, principal component analysis of the underlying variation in the morphometric data also illustrated differences between the population described herein and previously recognised members of the P. argentipes species complex. Collectively, these results suggest that a morphologically distinct population, perhaps most closely related to P. glaucus of the P. argentipes sensu lato species complex, exists in areas of active CL transmission. Further research is therefore required to determine the ability of this population of flies to transmit cutaneous leishmaniasis.
Abstract:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis is a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L² approximation. However, the eigenfunctions do not always point in an interesting and interpretable direction in the context of functional data and can thus obscure the underlying structure. To overcome this difficulty, an alternative to functional principal component analysis is proposed that produces directed components, which may be more informative and easier to interpret. These structural components are similar to principal components but are adapted to situations in which the domain of the function can be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline and is increasingly used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make handling data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach that aims to identify discriminant lipid signatures by simultaneously mining IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed by MALDI-TOF/TOF MS in both negative and positive ionization modes after deposition of 1,5-diaminonaphthalene matrix by sublimation. Results from the positive and negative acquisitions were combined during data mining to simplify the process and to interrogate a larger portion of the lipidome in a single analysis. To reduce the complexity of the IMS data sets, a sub-data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses revealed a selective signature of the CRCLM, including lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets through combinatory data mining, opening novel routes of investigation to address the demands of the clinical setting.
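The data-reduction step described above (randomly keeping a fixed number of spectra from a histologically defined region of interest) can be sketched as follows. The array layout (one row per pixel, one column per m/z bin), the function name `subsample_roi` and the sizes used are hypothetical choices for illustration, not the authors' workflow.

```python
import numpy as np

def subsample_roi(spectra, roi_mask, n_keep, seed=None):
    """Randomly keep n_keep spectra (pixels) from inside a region of interest.

    spectra  : (n_pixels, n_mz) intensity matrix, one spectrum per pixel
    roi_mask : (n_pixels,) boolean array, True for pixels inside the ROI
    Returns the selected spectra and their pixel indices.
    """
    roi_idx = np.flatnonzero(roi_mask)
    if n_keep > roi_idx.size:
        raise ValueError("ROI contains fewer spectra than requested")
    rng = np.random.default_rng(seed)
    # Sample without replacement so that no spectrum is selected twice.
    chosen = rng.choice(roi_idx, size=n_keep, replace=False)
    return spectra[chosen], chosen

# Hypothetical acquisition: 2000 pixels x 300 m/z bins, 1000 pixels in the ROI.
n_pixels, n_mz = 2000, 300
spectra = np.random.default_rng(1).random((n_pixels, n_mz))
roi_mask = np.zeros(n_pixels, dtype=bool)
roi_mask[:1000] = True

subset, idx = subsample_roi(spectra, roi_mask, n_keep=100, seed=42)
print(subset.shape)  # (100, 300): a 10-fold reduction of the ROI
```

Keeping the same number of spectra per region also balances the regions before a subsequent PCA, so that a large region does not dominate the components merely by contributing more pixels.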
Abstract:
BACKGROUND Differences in the distribution of genotypes between individuals of the same ethnicity are an important confounding factor that is commonly underestimated in typical association studies conducted in radiogenomics. OBJECTIVE To evaluate the genotypic distribution of SNPs in a large set of Spanish prostate cancer patients in order to determine the homogeneity of the population and to disclose potential bias. DESIGN, SETTING AND PARTICIPANTS A total of 601 prostate cancer patients from Andalusia, the Basque Country, the Canary Islands and Catalonia were genotyped for 10 SNPs located in 6 different genes associated with DNA repair: XRCC1 (rs25487, rs25489, rs1799782), ERCC2 (rs13181), ERCC1 (rs11615), LIG4 (rs1805388, rs1805386), ATM (rs17503908, rs1800057) and P53 (rs1042522). SNP genotyping was performed on a Biotrove OpenArray® NT Cycler. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Genotypic and allelic frequencies were compared among populations, and haplotype analyses were performed, using the web-based environment SNPator. Principal component analysis was performed using the SnpMatrix and XSnpMatrix classes and methods implemented as an R package. Unsupervised hierarchical clustering of SNPs was performed using MultiExperiment Viewer. RESULTS AND LIMITATIONS The genotype distribution of 4 out of 10 SNPs differed significantly among the studied populations, with the greatest differences between Andalusia and Catalonia. These observations were confirmed by cluster analysis, principal component analysis and the differential distribution of haplotypes among the populations. Because tumor characteristics were not taken into account, some polymorphisms may influence tumor characteristics in the same way that they may constitute risk factors for other disease characteristics.
CONCLUSION Differences in the distribution of genotypes among populations of the same ethnicity could be an important confounding factor responsible for the lack of validation of SNPs associated with radiation-induced toxicity, especially when extensive meta-analyses including subjects from different countries are carried out.
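To make the comparison of allelic and genotypic frequencies between populations concrete, the sketch below computes an allele frequency from genotype counts and a Pearson chi-square test on a 2 × 3 table (two populations × three genotypes). This is a generic textbook test written for illustration, not the SNPator procedure used in the study, and the genotype counts are hypothetical.

```python
import math

def allele_frequency(aa, ab, bb):
    """Frequency of allele A from genotype counts (AA, AB, BB)."""
    n = aa + ab + bb
    # Each AA carries two copies of A, each AB one copy; 2n alleles in total.
    return (2 * aa + ab) / (2 * n)

def chi_square_2x3(table):
    """Pearson chi-square test for a 2 x 3 table of genotype counts.

    Rows are populations, columns are genotypes (AA, AB, BB); every column
    total is assumed to be non-zero. With (2-1)*(3-1) = 2 degrees of freedom
    the chi-square survival function is exactly exp(-x/2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.exp(-stat / 2)
    return stat, p_value

# Hypothetical genotype counts for one SNP in two regions.
region_1 = [10, 10, 0]
region_2 = [0, 10, 10]
print(allele_frequency(*region_1), allele_frequency(*region_2))  # 0.75 0.25
stat, p = chi_square_2x3([region_1, region_2])
print(stat, p)  # 20.0 and exp(-10)
```

With small expected counts an exact test would be preferable; the closed-form p-value also only holds for the 2-degree-of-freedom case shown here.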
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in the diagnosis of Alzheimer's Disease (AD). However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. First, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to lie within a predefined brain activation mask. To address the small sample-size problem, the dimension of the feature space was further reduced by Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). As classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) a linear transformation of the PLS- or PCA-reduced data, ii) a feature reduction technique, and iii) a classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when an NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be valid solutions for the presented problem.
One advance is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also (in combination with NMSE and PLS) reduces the variation of this rate. Another advance is the generalization ability of the methods, since experiments were performed on two image modalities (SPECT and PET).
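The evaluation figures quoted above (accuracy, sensitivity and specificity under k-fold cross-validation) follow the standard definitions sketched below. The fold construction (contiguous, unshuffled folds) and the toy labels are assumptions for illustration; the classifier itself is left out.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary diagnosis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate: patients detected
    specificity = tn / (tn + fp)  # true negative rate: controls correctly cleared
    return accuracy, sensitivity, specificity

# Hypothetical pooled test-fold labels (1 = AD, 0 = control) and predictions.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(diagnostic_metrics(y_true, y_pred))  # accuracy 0.6, sensitivity 2/3, specificity 0.5

for train, test in k_fold_splits(10, 3):
    print(test)
```

In practice the predictions from each held-out fold would be pooled (or the per-fold metrics averaged) before reporting; the source does not say which convention was used.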
Abstract:
Natural fluctuations in soil microbial communities are poorly documented because of the inherent difficulty of simultaneously analysing the relative abundances of multiple populations over a long time period. Yet it is important to understand the magnitude of community composition variability as a function of natural influences (e.g., temperature, plant growth or rainfall), because this forms the reference or baseline against which external disturbances (e.g., anthropogenic emissions) can be judged. Moreover, defining baseline fluctuations in complex microbial communities may help to understand at which point the systems become unbalanced and cannot return to their original composition. In this paper, we examined the seasonal fluctuations in the bacterial community of an agricultural soil used for regular crop production by terminal restriction fragment length polymorphism (T-RFLP) profiling of the amplified 16S ribosomal RNA (rRNA) gene diversity. Cluster and statistical analyses of the T-RFLP data showed that soil bacterial communities fluctuated very little over the seasons (similarity indices between 0.835 and 0.997), with insignificant variations in 16S rRNA gene richness and diversity indices. Despite these overall insignificant fluctuations, between 8 and 30% of all terminal restriction fragments changed their relative intensity significantly between consecutive time samples. To determine the magnitude of community variations induced by external factors, soil samples were subjected to either inoculation with a pure bacterial culture, addition of the herbicide mecoprop, or addition of nutrients. All treatments resulted in statistically measurable changes in the T-RFLP profiles of the communities. Addition of nutrients, or of bacteria plus mecoprop, resulted in bacterial community compositions that did not return to the original profile within 14 days.
We propose that at less than 70% similarity in T-RFLP profiles, bacterial communities risk drifting apart into inherently different states.
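The abstract does not name the similarity index it uses, so the sketch below assumes the Bray-Curtis similarity on relative peak intensities, one common choice for T-RFLP profiles, and flags pairs of profiles that fall below the proposed 70% threshold. The fragment lengths and peak heights are hypothetical.

```python
def bray_curtis_similarity(profile_a, profile_b):
    """Similarity (0..1) between two T-RFLP profiles (assumed index: Bray-Curtis).

    Profiles map terminal-fragment length (bp) to peak height. Heights are
    converted to relative intensities first, so that differences in total
    signal between runs do not dominate the comparison.
    """
    total_a = sum(profile_a.values())
    total_b = sum(profile_b.values())
    rel_a = {k: v / total_a for k, v in profile_a.items()}
    rel_b = {k: v / total_b for k, v in profile_b.items()}
    fragments = set(rel_a) | set(rel_b)
    # A fragment missing from one profile counts as zero intensity there.
    overlap = sum(min(rel_a.get(k, 0.0), rel_b.get(k, 0.0)) for k in fragments)
    # Bray-Curtis similarity = 2 * sum(min) / (sum_a + sum_b).
    return 2 * overlap / (sum(rel_a.values()) + sum(rel_b.values()))

THRESHOLD = 0.70  # similarity below which communities may drift apart

# Hypothetical profiles before and after a treatment.
baseline = {97: 120, 143: 300, 210: 80}
after_treatment = {97: 40, 143: 150, 388: 210}
s = bray_curtis_similarity(baseline, after_treatment)
print(round(s, 3), "at risk" if s < THRESHOLD else "within baseline range")
```

Other indices (for example Jaccard on presence/absence of fragments) would give different absolute values, so the 70% threshold is only meaningful for the index under which it was derived.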
Abstract:
BACKGROUND Compared to food patterns, nutrient patterns have rarely been used, particularly at the international level. In the context of a multi-center study with heterogeneous data, we studied the methodological challenges of pattern analyses. METHODOLOGY/PRINCIPAL FINDINGS We identified nutrient patterns from food frequency questionnaires (FFQ) in the European Prospective Investigation into Cancer and Nutrition (EPIC) Study and used 24-hour dietary recall (24-HDR) data to validate and describe the nutrient patterns and their related food sources. Associations between lifestyle factors and the nutrient patterns were also examined. Principal component analysis (PCA) was applied to 23 nutrients derived from country-specific FFQs, combining data from all EPIC centers (N = 477,312). Harmonized 24-HDRs available for a representative sample of the EPIC populations (N = 34,436) provided accurate mean group estimates of nutrients and foods by quintiles of pattern scores, presented graphically. An overall PCA combining all data captured a good proportion of the variance explained in each EPIC center. Four nutrient patterns were identified, explaining 67% of the total variance: principal component (PC) 1 was characterized by a high contribution of nutrients from plant food sources and a low contribution of nutrients from animal food sources; PC2 by a high contribution of micronutrients and proteins; PC3 by polyunsaturated fatty acids and vitamin D; and PC4 by calcium, proteins, riboflavin and phosphorus. The nutrients with high loadings on a particular pattern as derived from country-specific FFQs also showed large deviations in their mean EPIC intakes across quintiles of pattern scores when estimated from 24-HDR. Center and energy intake explained most of the variability in pattern scores.
CONCLUSION/SIGNIFICANCE The use of 24-HDR enabled internal validation and facilitated the interpretation of the nutrient patterns derived from FFQs in terms of food sources. These outcomes open research opportunities and perspectives for using nutrient patterns in future studies, particularly at the international level.
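Summarizing intakes "by quintiles of pattern scores" amounts to cutting the score distribution at its 20th, 40th, 60th and 80th percentiles and averaging intake within each group. The sketch below shows this unweighted computation with Python's standard library; the scores and intakes are hypothetical, and survey weighting or the FFQ/24-HDR linkage used in EPIC is not reproduced.

```python
import bisect
import statistics

def means_by_quintile(scores, intakes):
    """Mean intake within each quintile of a pattern score (Q1 = lowest).

    Raises statistics.StatisticsError if a quintile ends up empty
    (possible with very small samples or many tied scores).
    """
    cuts = statistics.quantiles(scores, n=5)  # four cut points
    groups = [[] for _ in range(5)]
    for score, intake in zip(scores, intakes):
        # bisect_right gives the index of the quintile the score falls into.
        groups[bisect.bisect_right(cuts, score)].append(intake)
    return [statistics.fmean(g) for g in groups]

# Hypothetical pattern scores and nutrient intakes for ten participants.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
intakes = [2 * s for s in scores]
print(means_by_quintile(scores, intakes))  # [3.0, 7.0, 11.0, 15.0, 19.0]
```

A nutrient that loads highly on a pattern would be expected to show a steep trend across these quintile means, which is the kind of check the abstract describes.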
Abstract:
Three multivariate statistical tools (principal component analysis, factor analysis and discriminant analysis) have been tested to characterize and model the sags recorded in distribution substations. These models use several features, obtained from voltage and current waveforms, to represent the magnitude, duration and degree of unbalance of sags. The techniques are tested and compared using 69 sag records. The advantages and drawbacks of each technique are listed.
Abstract:
A statistical method is proposed in this work for classifying sags according to their origin, downstream or upstream of the recording point. The goal is to obtain a statistical model from the sag waveforms that characterises one type of sag and discriminates it from the other type. This model is built on the basis of multi-way principal component analysis and is later used to project the available records into a new, lower-dimensional space. A case base of diagnosed sags is thus built in the projection space. Classification is then done by comparing new sags against those in the case base: similarity is defined in the projection space using a combination of distances to retrieve the nearest neighbours of the new sag, and the method assigns the origin of the new sag according to the origin of its neighbours.
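A minimal sketch of the pipeline described above, under simplifying assumptions: each sag is an array of shape (channels, samples), multi-way PCA is implemented as batch-wise unfolding (one row per sag) followed by ordinary PCA via SVD, and the "combination of distances" is replaced by plain Euclidean distance with a k-nearest-neighbour majority vote. The function names, labels and synthetic waveforms are hypothetical.

```python
from collections import Counter

import numpy as np

def fit_mpca(X, n_components):
    """Multi-way PCA by batch-wise unfolding.

    X : (n_sags, n_channels, n_samples). Each sag is unfolded into one row,
    columns are mean-centred, and the top right singular vectors are kept.
    """
    unfolded = X.reshape(X.shape[0], -1)
    mean = unfolded.mean(axis=0)
    _, _, vt = np.linalg.svd(unfolded - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Scores of sags X in the reduced (projection) space."""
    flat = np.asarray(X).reshape(len(X), -1)
    return (flat - mean) @ components.T

def knn_origin(new_scores, case_scores, case_labels, k=3):
    """Majority origin among the k nearest cases (Euclidean distance)."""
    dists = np.linalg.norm(case_scores - new_scores, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(case_labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical case base: 10 sags of each origin, 3 channels x 50 samples.
rng = np.random.default_rng(1)
upstream = 1.0 + 0.1 * rng.normal(size=(10, 3, 50))
downstream = -1.0 + 0.1 * rng.normal(size=(10, 3, 50))
X = np.concatenate([upstream, downstream])
labels = ["upstream"] * 10 + ["downstream"] * 10

mean, comps = fit_mpca(X, n_components=2)
case_scores = project(X, mean, comps)
new_sag = 1.0 + 0.1 * rng.normal(size=(1, 3, 50))
predicted = knn_origin(project(new_sag, mean, comps)[0], case_scores, labels, k=3)
print(predicted)  # upstream
```

Real MPCA implementations often also scale each unfolded column to unit variance before the decomposition; that step is omitted here for brevity.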
Abstract:
The work presented in this paper belongs to the power quality knowledge area and deals with voltage sags in power transmission and distribution systems. Propagating throughout the power network, voltage sags can cause many problems for domestic and industrial loads, which can be financially costly. To impose penalties on the responsible party and to improve monitoring and mitigation strategies, sags must be located in the power network. With this objective, this paper proposes a new method for associating a sag waveform with its origin in transmission and distribution networks. It solves this problem by developing hybrid methods that employ multiway principal component analysis (MPCA) as a dimension reduction tool. MPCA re-expresses sag waveforms in a new subspace using only a few scores. Several well-known classifiers are trained with these scores and used to classify future sags. The capabilities of the proposed method for dimension reduction and classification are examined using real data gathered from three substations in Catalonia, Spain. The obtained classification rates demonstrate the effectiveness of the developed hybrid methods as new tools for sag classification.
Abstract:
Evaluation of segmentation methods is a crucial aspect of image processing, especially in medical imaging, where small differences between segmented regions in the anatomy can be of paramount importance. Usually, segmentation evaluation is based on a measure that depends on the number of segmented voxels inside and outside some reference regions called gold standards. Although other measures have also been used, in this work we propose a set of new similarity measures based on different features, such as the location and intensity values of the misclassified voxels and the connectivity and boundaries of the segmented data. Using the multidimensional information provided by these measures, we propose a new evaluation method whose results are visualized by applying Principal Component Analysis to the data, yielding a simplified graphical method to compare different segmentation results. We carried out an extensive study using several classic segmentation methods applied to a set of simulated brain MRI data with several noise and RF inhomogeneity levels, and also to real data, showing that the proposed measures and the multidimensional evaluation improve the robustness of the evaluation and provide a better understanding of the differences between segmentation methods.
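The abstract does not name the conventional count-based measure it refers to; the Dice and Jaccard coefficients are two widely used examples that depend only on how many segmented voxels fall inside and outside the gold standard, and are sketched below as an assumption. The new location-, intensity-, connectivity- and boundary-based measures proposed in the work are not reproduced here.

```python
def overlap_scores(segmented, gold):
    """Dice and Jaccard overlap between a segmentation and a gold standard.

    Both arguments are collections of voxel coordinates, e.g. (x, y, z)
    tuples. At least one of them must be non-empty.
    """
    segmented, gold = set(segmented), set(gold)
    inside = len(segmented & gold)   # segmented voxels inside the reference
    outside = len(segmented - gold)  # segmented voxels outside it (false positives)
    missed = len(gold - segmented)   # reference voxels not segmented (false negatives)
    dice = 2 * inside / (2 * inside + outside + missed)
    jaccard = inside / (inside + outside + missed)
    return dice, jaccard

# Hypothetical 3D voxel sets.
seg = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
ref = {(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)}
print(overlap_scores(seg, ref))  # Dice 0.5, Jaccard 1/3
```

Such scores treat every misclassified voxel equally regardless of where it lies, which is precisely the limitation that motivates measures based on location, intensity and boundaries.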
Abstract:
Although Leontopodium alpinum is considered to be threatened in many countries, only limited scientific information about its autecology is available. In this study, we aim to define the most important ecological factors that influence the distribution of L. alpinum in the Swiss Alps. These were assessed at the national scale using species distribution models based on topoclimatic predictors and at the community scale using exhaustive plant inventories. The latter were analysed using hierarchical clustering and principal component analysis, and the results were interpreted using ecological indicator values. L. alpinum was found almost exclusively on base-rich bedrocks (limestone and ultramafic rocks). The species distribution models showed that available moisture (dry regions, mostly in the Inner Alps), elevation (mostly above 2000 m a.s.l.) and slope (mostly >30°) were the most important predictors. The relevés showed that L. alpinum is present in a wide range of plant communities, all of them subalpine-alpine open grasslands with low grass cover. As a light-demanding and short species, L. alpinum requires light at ground level; hence, it can only grow in open, nutrient-poor grasslands. These conditions are met in dry conditions (dry, summer-warm climate, rocky and well-drained soil, south-facing aspect and/or steep slope), at high elevations, on oligotrophic soils and/or on windy ridges. Base-rich soils also appear to be essential, although it is still unclear whether this corresponds to physiological or ecological (lower competition) requirements.
Abstract:
Rho GTPases are conformational switches that control a wide variety of signaling pathways critical for eukaryotic cell development and proliferation. They represent attractive targets for drug design, as their aberrant function and deregulated activity are associated with many human diseases, including cancer. An extensive set of high-resolution structures (>100) and recent mutagenesis studies have laid the foundation for the design of new structure-based chemotherapeutic strategies. Although the inhibition of Rho signaling with drug-like compounds is an active area of current research, very little attention has been devoted to directly inhibiting Rho by targeting potential allosteric non-nucleotide binding sites. By avoiding the nucleotide binding site, compounds may minimize the potential for undesirable off-target interactions with other ubiquitous GTP- and ATP-binding proteins. Here we describe the application of molecular dynamics simulations, principal component analysis, sequence conservation analysis and ensemble small-molecule fragment mapping to provide an extensive map of potential small-molecule binding pockets on Rho family members. Characterized sites include novel pockets in the vicinity of the conformationally responsive switch regions as well as distal sites that appear to be related to the conformations of the nucleotide binding region. Furthermore, accelerated molecular dynamics simulation, an advanced sampling method that extends the accessible time-scale of conventional simulations, is found to enhance the characterization of novel binding sites when conformational changes are important for the protein mechanism.