925 results for probabilistic principal component analysis (probabilistic PCA)
Abstract:
In order to classify mosquito immature stage habitats, samples were taken in 42 localities of Córdoba Province, Argentina, representing the phytogeographic regions of Chaco, Espinal and Pampa. Immature stage habitats were described and classified according to the following criteria: natural or artificial; size; location related to light and neighboring houses; vegetation; water: permanence, movement, turbidity and pH. Four groups of species were associated based on the habitat similarity by means of cluster analysis: Aedes albifasciatus, Culex saltanensis, Cx. mollis, Cx. brethesi, Psorophora ciliata, Anopheles albitarsis, and Uranotaenia lowii (Group A); Cx. acharistus, Cx. quinquefasciatus, Cx. bidens, Cx. dolosus, Cx. maxi and Cx. apicinus (Group B); Cx. coronator, Cx. chidesteri, Mansonia titillans and Ps. ferox (Group C); Ae. fluviatilis and Ae. milleri (Group D). The principal component analysis (ordination method) pointed out that the different types of habitats, their nature (natural or artificial), plant species, water movement and depth are the main characters explaining the observed variation among the mosquito species. The distribution of mosquito species by phytogeographic region did not affect the species groups, since species belonging to different groups were collected in the same region.
Abstract:
Working memory, the ability to store and simultaneously manipulate information, is affected in several neuropsychiatric disorders which lead to severe cognitive and functional deficits. An electrophysiological marker for this process could help identify early cerebral function abnormalities. In subjects performing working memory-specific n-back tasks, event-related potential analysis revealed a positive-negative waveform (PNwm) component modulated in amplitude by working memory load. It occurs in the expected time range for this process, 140-280 ms after stimulus onset, superimposed on the classical P200 and N200 components. Independent Component Analysis extracted two functional components with latencies and topographical scalp distributions similar to the PNwm. Our results imply that the PNwm represents a new electrophysiological index for working memory load in humans.
Abstract:
To determine whether habitat similarity is correlated with similarity of sensilla pattern, we analyzed six species of Triatominae present in two biogeographic regions of Brazil: the "caatinga" and the "cerrado". In broad terms, Triatoma infestans (cerrado) and T. brasiliensis (caatinga) are found in human domiciles, T. sordida (cerrado) and T. pseudomaculata (caatinga) colonize peridomestic habitats, and Rhodnius neglectus (cerrado) and R. nasutus (caatinga) inhabit palm tree crowns. The number and distribution of four sensilla types (bristles, thin- and thick-walled trichoidea, and basiconica) were compared in these species. Sexual dimorphism of sensilla patterns was noted in T. sordida, T. brasiliensis and T. pseudomaculata. A principal component analysis showed three main groups: (i) species that live in the palms, (ii) domiciliated species and (iii) those living in the peridomestic habitat. T. infestans, almost exclusively domestic, was placed at the centre of the canonical map, and some individuals of other species overlapped there. These results support the idea that the patterns of antennal sensilla are sensitive indicators of adaptive processes in Triatominae. We propose that those species that inhabit less stable habitats possess more types of sensilla on the pedicel and a higher number of antennal sensilla.
Abstract:
DREAM is an initiative that allows researchers to assess how well their methods or approaches can describe and predict networks of interacting molecules [1]. Each year, recently acquired datasets are released to predictors ahead of publication. Researchers typically have about three months to predict the masked data or network of interactions, using any predictive method. Predictions are assessed prior to an annual conference where the best predictions are unveiled and discussed. Here we present the strategy we used to make a winning prediction for the DREAM3 phosphoproteomics challenge. We used Amelia II, a multiple imputation software package developed by Gary King, James Honaker and Matthew Blackwell [2] in the context of the social sciences, to predict the 476 out of 4624 measurements that had been masked for the challenge. To choose the best possible multiple imputation parameters for the challenge, we evaluated how transforming the data and varying the imputation parameters affected the ability to predict additionally masked data. We discuss the accuracy of our findings and show that multiple imputation applied to this dataset is a powerful method for accurately estimating the missing data. We postulate that multiple imputation methods might become an integral part of experimental design as a means to achieve cost savings or to increase the quantity of samples that can be handled for a given cost.
Abstract:
Objectives: Exposure assessment to a single pesticide does not capture the complexity of the occupational exposure. Recently, pesticide use patterns analysis has emerged as an alternative to study these exposures. The aim of this study is to identify the pesticide use pattern among flower growers in Mexico participating in the study on the endocrine and reproductive effects associated with pesticide exposure. Methods: A cross-sectional study was carried out to gather retrospective information on pesticide use applying a questionnaire to the person in charge of the participating flower growing farms. Information about seasonal frequency of pesticide use (rainy and dry) for the years 2004 and 2005 was obtained. Principal components analysis was performed. Results: Complete information was obtained for 88 farms and 23 pesticides were included in the analysis. Six principal components were selected, which explained more than 70% of the data variability. The identified pesticide use patterns during both years were: 1. fungicides benomyl, carbendazim, thiophanate and metalaxyl (both seasons), including triadimephon during the rainy season, chlorotalonyl and insecticide permethrin during the dry season; 2. insecticides oxamyl, biphenthrin and fungicide iprodione (both seasons), including insecticide methomyl during the dry season; 3. fungicide mancozeb and herbicide glyphosate (only during the rainy season); 4. insecticides metamidophos and parathion (both seasons); 5. insecticides omethoate and methomyl (only rainy season); and 6. insecticides abamectin and carbofuran (only dry season). Some pesticides do not show a clear pattern of seasonal use during the studied years. Conclusions: The principal component analysis is useful to summarise a large set of exposure variables into smaller groups of exposure patterns, identifying the mixtures of pesticides in the occupational environment that may have an interactive effect on a particular health effect.
Abstract:
Ecological niche modelling was used to predict the potential geographical distribution of Rhodnius nasutus Stål and Rhodnius neglectus Lent in Brazil and to investigate the niche divergence between these morphologically similar triatomine species. The distribution of R. neglectus covered mainly the cerrado of Central Brazil, but the prediction maps also revealed its occurrence in transitional areas within the caatinga, Pantanal and Amazon biomes. The potential distribution of R. nasutus covered the Northeastern Region of Brazil in the semi-arid caatinga and the Maranhão babaçu forests. Clear ecological niche differences between these species were observed. R. nasutus occurred more in warmer and drier areas than R. neglectus. In the principal component analysis, PC1 was correlated with altitude and temperature (mainly temperature in the coldest and driest months) and PC2 with vegetation index and precipitation. The prediction maps support potential areas of co-occurrence for these species in the Maranhão babaçu forests and in caatinga/cerrado transitional areas, mainly in the state of Piauí. Entomologists engaged in Chagas disease vector surveillance should be aware that R. neglectus and R. nasutus can occur in the same localities of Northeastern Brazil. Thus, the identification of bugs in these areas should be improved by applying morphometrical and/or molecular methods.
Abstract:
To classify mosquito species based on common features of their habitats, samples were obtained fortnightly between June 2001 and October 2003 in the subtropical province of Chaco, Argentina. Data on the type of larval habitat, nature of the habitat (artificial or natural), size, depth, location related to sunlight, distance to the neighbouring houses, type of substrate, organic material, and the type and presence of vegetation and algae were collected. Data on the permanence, temperature, pH, turbidity, colour, odour and movement of the larval habitat's water were also collected. From the cluster analysis, three groups of species associated by their degree of habitat similarity were obtained and are listed below. Group 1 consisted of Aedes aegypti. Group 2 consisted of Culex imitator, Culex davisi, Wyeomyia muehlensi and Toxorhynchites haemorrhoidalis separatus. Within group 3, two subgroups are distinguished: A (Psorophora ferox, Psorophora cyanescens, Psorophora varinervis, Psorophora confinnis, Psorophora cingulata, Ochlerotatus hastatus-oligopistus, Ochlerotatus serratus, Ochlerotatus scapularis, Culex intrincatus, Culex quinquefasciatus, Culex pilosus, Ochlerotatus albifasciatus, Culex bidens) and B (Culex maxi, Culex eduardoi, Culex chidesteri, Uranotaenia lowii, Uranotaenia pulcherrima, Anopheles neomaculipalpus, Anopheles triannulatus, Anopheles albitarsis, Uranotaenia apicalis, Mansonia humeralis and Aedeomyia squamipennis). Principal component analysis indicates that the size of the larval habitats and the presence of aquatic vegetation are the main characteristics that explain the variation among different species. Water permanence is second in importance. Water temperature, pH and the type of larval habitat are less important in explaining the clustering of species.
Abstract:
In many socially monogamous birds, both partners perform extrapair copulations (EPC). As this behaviour potentially inflicts direct costs on females, they are currently hypothesized to search for genetic benefits for descendants, either as 'good' or 'complementary' genes. Although these hypotheses have found some support, several studies failed to find any beneficial consequence of EPC, and whether this behaviour is adaptive to females is subject to discussion. Here, we test these two hypotheses in a natural population of blue tits by accounting for the effect of most parameters known to potentially affect extrapair fertilization. Results suggest that female body mass affected the type of extrapair genetic benefits obtained. Heavy females obtained extrapair fertilizations when their social male was of low quality (as reflected by sexual display) and produced larger extrapair than within-pair chicks. Lean females obtained extrapair fertilizations when their social mate was genetically similar, thereby producing more heterozygous extrapair chicks. Our results suggest that mating patterns may be condition-dependent.
Abstract:
Although the reported aetiological agent of cutaneous leishmaniasis (CL) in Sri Lanka is Leishmania donovani, the sandfly vector remains unknown. Ninety-five sandflies, 60 females and 35 males, collected in six localities in the district of Matale, central Sri Lanka, close to current active transmission foci of CL, were examined for taxonomically relevant characteristics. Eleven diagnostic morphological characters for female sandflies were compared with measurements described for Indian and Sri Lankan sandflies, including the now recognised Phlebotomus argentipes sensu lato species complex. The mean morphometric measurements of collected female sandflies differed significantly from published values for P. argentipes morphospecies B, now re-identified as Phlebotomus annandalei from Delft Island and northern Sri Lanka, from recently re-identified P. argentipes s.s. sibling species and from Phlebotomus glaucus. Furthermore, analysis of underlying variation in the morphometric data through principal component analysis also illustrated differences between the population described herein and previously recognised members of the P. argentipes species complex. Collectively, these results suggest that a morphologically distinct population, perhaps most closely related to P. glaucus of the P. argentipes s.l. species complex, exists in areas of active CL transmission. Thus, research is required to determine the ability of this population of flies to transmit cutaneous leishmaniasis.
Abstract:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L² approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome this difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
Abstract:
BACKGROUND: Differences in the distribution of genotypes between individuals of the same ethnicity are an important confounding factor commonly undervalued in typical association studies conducted in radiogenomics. OBJECTIVE: To evaluate the genotypic distribution of SNPs in a wide set of Spanish prostate cancer patients in order to determine the homogeneity of the population and to disclose potential bias. DESIGN, SETTING AND PARTICIPANTS: A total of 601 prostate cancer patients from Andalusia, Basque Country, Canary Islands and Catalonia were genotyped for 10 SNPs located in 6 different genes associated with DNA repair: XRCC1 (rs25487, rs25489, rs1799782), ERCC2 (rs13181), ERCC1 (rs11615), LIG4 (rs1805388, rs1805386), ATM (rs17503908, rs1800057) and P53 (rs1042522). SNP genotyping was performed on a Biotrove OpenArray® NT Cycler. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Comparisons of genotypic and allelic frequencies among populations, as well as haplotype analyses, were performed using the web-based environment SNPator. Principal component analysis was performed using the SnpMatrix and XSnpMatrix classes and methods implemented as an R package. Unsupervised hierarchical clustering of SNPs was performed using MultiExperiment Viewer. RESULTS AND LIMITATIONS: We observed that the genotype distribution of 4 out of 10 SNPs was statistically different among the studied populations, showing the greatest differences between Andalusia and Catalonia. These observations were confirmed in cluster analysis, principal component analysis and in the differential distribution of haplotypes among the populations. Because tumor characteristics have not been taken into account, it is possible that some polymorphisms may influence tumor characteristics in the same way that they may pose a risk factor for other disease characteristics. CONCLUSION: Differences in the distribution of genotypes within different populations of the same ethnicity could be an important confounding factor responsible for the lack of validation of SNPs associated with radiation-induced toxicity, especially when extensive meta-analyses with subjects from different countries are carried out.
Abstract:
Natural fluctuations in soil microbial communities are poorly documented because of the inherent difficulty of performing a simultaneous analysis of the relative abundances of multiple populations over a long time period. Yet, it is important to understand the magnitudes of community composition variability as a function of natural influences (e.g., temperature, plant growth, or rainfall) because this forms the reference or baseline against which external disturbances (e.g., anthropogenic emissions) can be judged. Second, definition of baseline fluctuations in complex microbial communities may help to understand at which point the systems become unbalanced and cannot return to their original composition. In this paper, we examined the seasonal fluctuations in the bacterial community of an agricultural soil used for regular plant crop production by using terminal restriction fragment length polymorphism profiling (T-RFLP) of the amplified 16S ribosomal ribonucleic acid (rRNA) gene diversity. Cluster and statistical analysis of T-RFLP data showed that soil bacterial communities fluctuated very little during the seasons (similarity indices between 0.835 and 0.997) with insignificant variations in 16S rRNA gene richness and diversity indices. Despite overall insignificant fluctuations, between 8 and 30% of all terminal restriction fragments changed their relative intensity in a significant manner among consecutive time samples. To determine the magnitude of community variations induced by external factors, soil samples were subjected to either inoculation with a pure bacterial culture, addition of the herbicide mecoprop, or addition of nutrients. All treatments resulted in statistically measurable changes of T-RFLP profiles of the communities. Addition of nutrients or bacteria plus mecoprop resulted in a bacterial composition that did not return to the original profile within 14 days. We propose that at less than 70% similarity in T-RFLP, the bacterial communities risk drifting apart into inherently different states.
Abstract:
Three multivariate statistical tools (principal component analysis, factor analysis and discriminant analysis) have been tested to characterize and model the sags registered in distribution substations. These models use several features to represent the magnitude, duration and degree of unbalance of sags. The features have been obtained from voltage and current waveforms. The techniques are tested and compared using 69 sag records. The advantages and drawbacks of each technique are listed.
Abstract:
A statistical method for classifying sags according to their origin, downstream or upstream from the recording point, is proposed in this work. The goal is to obtain a statistical model from the sag waveforms that characterises one type of sag and discriminates it from the other type. This model is built on the basis of multi-way principal component analysis and later used to project the available records into a new space of lower dimension. Thus, a case base of diagnosed sags is built in the projection space. Classification is then done by comparing new sags against those already in the case base. Similarity is defined in the projection space using a combination of distances to retrieve the nearest neighbours of the new sag. Finally, the method assigns the origin of the new sag according to the origin of its neighbours.
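The scheme summarised in the last abstract (project diagnosed sags into a lower-dimensional space, then label a new sag from its nearest neighbours there) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each sag is assumed to be already unfolded into a flat feature vector, ordinary PCA (computed by power iteration) stands in for multi-way PCA, and plain Euclidean distance stands in for the "combination of distances" mentioned in the abstract. All function names and the toy records are hypothetical.

```python
import math
from collections import Counter


def fit_pca(rows, n_components, iters=200):
    """Estimate column means and leading principal directions.

    Uses power iteration with deflation on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _comp in range(n_components):
        # Deterministic, normalised start vector (an arbitrary choice).
        v = [float(i + 1) for i in range(d)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j]
                     for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the variance explained by this direction.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return means, components


def project(row, means, components):
    """Coordinates of one record in the PCA projection space."""
    return [sum((x - m) * c for x, m, c in zip(row, means, comp))
            for comp in components]


def classify(new_row, case_base, means, components, k=3):
    """Majority vote among the k nearest diagnosed cases (Euclidean)."""
    target = project(new_row, means, components)
    ranked = sorted(case_base, key=lambda case: math.dist(case[0], target))
    votes = Counter(label for _scores, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-made feature vectors standing in for diagnosed sags.
records = [
    ([0.0, 0.1, 0.2], "upstream"),
    ([0.2, 0.0, 0.1], "upstream"),
    ([0.1, 0.3, 0.0], "upstream"),
    ([5.0, 5.1, 0.2], "downstream"),
    ([5.2, 4.9, 0.1], "downstream"),
    ([4.9, 5.2, 0.0], "downstream"),
]
means, components = fit_pca([f for f, _ in records], n_components=2)
case_base = [(project(f, means, components), label) for f, label in records]
print(classify([4.8, 5.0, 0.1], case_base, means, components))
```

Keeping the case base in projected coordinates means a new sag only needs one projection and a handful of distance computations; a real system would presumably also weigh how far the sag lies from the PCA model itself, which this sketch ignores.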