932 results for data reduction by factor analysis
Abstract:
This document presents a tool able to automatically gather data provided by real energy markets and to generate scenarios, and to capture and improve market players' profiles and strategies by using knowledge discovery processes in databases supported by artificial intelligence techniques, data mining algorithms and machine learning methods. It provides the means for generating scenarios with different dimensions and characteristics, ensuring the representation of real and adapted markets and their participating entities. The scenarios generator module enhances the MASCEM (Multi-Agent Simulator of Competitive Electricity Markets) simulator, making it a more effective tool for decision support. The implementation of the proposed module enables researchers and electricity markets' participating entities to analyze data, create real scenarios and experiment with them. On the other hand, applying knowledge discovery techniques to real data also allows the improvement of MASCEM agents' profiles and strategies, resulting in a better representation of real market players' behavior. This work aims to improve the comprehension of electricity markets and the interactions among the involved entities through adequate multi-agent simulation.
Abstract:
Listeria monocytogenes, the etiological agent of severe human foodborne infection, uses sophisticated mechanisms of entry into the host cytoplasm and manipulation of the cellular cytoskeleton, resulting in cell death. The interaction between host cells and bacteria may result in the production of cytokines such as Tumor Necrosis Factor (TNF) alpha. Hepatocytes have the potential to produce pro-inflammatory cytokines such as TNF-alpha when invaded by bacteria. In the present work we characterized the behavior of hepatocytes invaded by L. monocytogenes through microscopic analysis, determination of TNF-alpha production by bioassay, and analysis of apoptosis by the TUNEL technique. The presence of the bacterium, at ratios ranging from 5 to 50,000 bacteria per cell, induced the rupture of cellular monolayers. We observed internalized bacteria within the first hour of incubation by electron microscopy. TNF-alpha levels increased from the first to the sixth hour of incubation, ranging from 0 to 3749 pg/mL. After seven and eight hours of incubation a non-significant decrease in TNF-alpha levels occurred, indicating possible saturation of cellular receptors. Thus, the quantity of TNF-alpha produced by hepatocytes depended on the incubation time as well as on the proportion between bacteria and cells. The apoptosis rate increased directly with incubation time (1 h to 8 + 24 h), ranging from 0 to 43%, as well as with the bacteria-to-cell ratio. These results show that non-hemolytic L. monocytogenes can invade hepatocytes, and the main consequences of this phenomenon were the release of TNF-alpha by hepatocytes and the induction of apoptosis. We speculate that hepatocytes use TNF-alpha-induced apoptosis to release bacteria into the extracellular medium. This phenomenon may facilitate destruction of the bacteria by the immune system.
Abstract:
High-content analysis has revolutionized cancer drug discovery by identifying substances that alter the phenotype of a cell in ways that prevent tumor growth and metastasis. The high-resolution biofluorescence images from these assays allow precise quantitative measures, enabling the distinction of the effects of small molecules on a host cell from those on a tumor. In this work, we are particularly interested in the application of deep neural networks (DNNs), a cutting-edge machine learning method, to the classification of compounds into chemical mechanisms of action (MOAs). Compound classification has previously been performed using image-based profiling methods, sometimes combined with feature-reduction methods such as principal component analysis or factor analysis. In this article, we map the input features of each cell to a particular MOA class without using any treatment-level profiles or feature-reduction methods. To the best of our knowledge, this is the first application of DNNs in this domain that leverages single-cell information. Furthermore, we use deep transfer learning (DTL) to alleviate the computationally demanding effort of searching the huge parameter space of a DNN. Results show that with this approach we obtain a 30% speedup and a 2% accuracy improvement.
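As an illustration of the approach described (not the authors' implementation), the sketch below trains a small feed-forward DNN on per-cell feature vectors and then, in the spirit of deep transfer learning, reuses its hidden layers to warm-start a classifier for a second MOA labelling task; all dimensions, names and data are invented placeholders.

```python
# Hypothetical sketch of DNN classification of single-cell features with
# deep transfer learning (weight reuse); data and dimensions are made up.
import torch
import torch.nn as nn

n_features, n_moa_source, n_moa_target = 453, 12, 6

def make_net(n_out):
    # Small feed-forward network mapping per-cell features to MOA logits.
    return nn.Sequential(
        nn.Linear(n_features, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, n_out),
    )

def train(net, X, y, epochs=20, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(X), y)
        loss.backward()
        opt.step()
    return net

# Source task: train on one compound screen (random stand-in data).
Xs = torch.randn(2000, n_features)
ys = torch.randint(0, n_moa_source, (2000,))
source_net = train(make_net(n_moa_source), Xs, ys)

# Transfer: copy the feature-extracting layers, replace the output head,
# and fine-tune on the (smaller) target screen.
target_net = make_net(n_moa_target)
target_net[0].load_state_dict(source_net[0].state_dict())
target_net[2].load_state_dict(source_net[2].state_dict())
Xt = torch.randn(400, n_features)
yt = torch.randint(0, n_moa_target, (400,))
target_net = train(target_net, Xt, yt, epochs=10)
```

Reusing the pretrained hidden layers narrows the search over the network's parameters, which is the kind of saving the reported 30% speedup refers to.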
Abstract:
Dissertation presented as a partial requirement for obtaining the Master's degree in Statistics and Information Management
Abstract:
Here we focus on factor analysis from a best-practices point of view, by investigating the factor structure of neuropsychological tests and using the results obtained to illustrate how to choose a reasonable solution. The sample (n=1051 individuals) was randomly divided into two groups: one for exploratory factor analysis (EFA) and principal component analysis (PCA), to investigate the number of factors underlying the neurocognitive variables; the other to test the "best fit" model via confirmatory factor analysis (CFA). For the exploratory step, three extraction methods (maximum likelihood, principal axis factoring and principal components) and two rotation methods (orthogonal and oblique) were used. The analysis methodology allowed us to explore how different cognitive/psychological tests correlated with and discriminated between dimensions, indicating that, to capture latent structures in similar sample sizes and measures with approximately normal data distributions, reflective models with oblimin rotation might prove the most adequate.
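A minimal sketch of the exploratory step described above, assuming the third-party factor_analyzer package and a hypothetical DataFrame of test scores: split the sample, inspect eigenvalues to suggest the number of factors, and extract factors by maximum likelihood with an oblique (oblimin) rotation. This is not the study's own code.

```python
# Illustrative EFA workflow: random split, eigenvalue inspection, and
# maximum-likelihood extraction with oblimin rotation on one half.
# Assumes the third-party `factor_analyzer` package and made-up test scores.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
scores = pd.DataFrame(rng.normal(size=(1051, 10)),
                      columns=[f"test_{i}" for i in range(10)])

# Random split: one half for exploration, the other reserved for CFA elsewhere.
idx = rng.permutation(len(scores))
explore = scores.iloc[idx[: len(scores) // 2]]

# Eigenvalues of the correlation matrix (Kaiser criterion / scree inspection).
eigvals = np.linalg.eigvalsh(np.corrcoef(explore.T))[::-1]
n_factors = int((eigvals > 1.0).sum())

# Maximum-likelihood extraction with oblique (oblimin) rotation.
fa = FactorAnalyzer(n_factors=n_factors, method="ml", rotation="oblimin")
fa.fit(explore)
loadings = pd.DataFrame(fa.loadings_, index=explore.columns)
print(loadings.round(2))
```

The reserved half of the sample would then be used to fit and evaluate the CFA model in dedicated structural-equation software.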
Abstract:
The occurrence of anaerobic oxidation of methane (AOM) and trace methane oxidation (TMO) was investigated in a freshwater natural gas source. Sediment samples were taken and analyzed for potential electron acceptors coupled to AOM. Long-term incubations with 13C-labeled CH4 (13CH4) and different electron acceptors showed that both AOM and TMO occurred. In most conditions, 13C-labeled CO2 (13CO2) simultaneously increased with methane formation, which is typical for TMO. In the presence of nitrate, neither methane formation nor methane oxidation occurred. Net AOM was measured only with sulfate as electron acceptor. Here, sulfide production occurred simultaneously with 13CO2 production and no methanogenesis occurred, excluding TMO as a possible source for 13CO2 production from 13CH4. Archaeal 16S rRNA gene analysis showed the highest presence of ANME-2a/b (ANaerobic MEthane oxidizing archaea) and AAA (AOM Associated Archaea) sequences in the incubations with methane and sulfate as compared with only methane addition. Higher abundance of ANME-2a/b in incubations with methane and sulfate as compared with only sulfate addition was shown by qPCR analysis. Bacterial 16S rRNA gene analysis showed the presence of sulfate-reducing bacteria belonging to SEEP-SRB1. This is the first report that explicitly shows that AOM is associated with sulfate reduction in an enrichment culture of ANME-2a/b and AAA methanotrophs and SEEP-SRB1 sulfate reducers from a low-saline environment.
Abstract:
Ground-based measurements of atmospheric parameters in Tbilisi during the same period, provided by the Mikheil Nodia Institute of Geophysics, were used as calibration data. Monthly averaging, preprocessing, analysis and visualization of the satellite data were performed using the Giovanni web-based application. Maps of trends and periodic components of the atmospheric aerosol optical thickness and ozone concentration over the study area were calculated.
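As an illustration of separating a trend from a periodic component in a monthly series, the sketch below applies a classical seasonal decomposition (statsmodels) to a synthetic aerosol-optical-thickness series; it is not the Giovanni workflow itself, and all values are invented.

```python
# Illustrative decomposition of a monthly aerosol-optical-thickness series
# into trend and periodic (seasonal) components; synthetic data only.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

months = pd.date_range("2000-01-01", periods=240, freq="MS")
rng = np.random.default_rng(1)
aot = (0.20 + 0.0005 * np.arange(240)                    # slow upward trend
       + 0.05 * np.sin(2 * np.pi * np.arange(240) / 12)  # annual cycle
       + rng.normal(scale=0.02, size=240))               # noise
series = pd.Series(aot, index=months)

result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().iloc[[0, -1]])   # start/end of the trend component
print(result.seasonal.iloc[:12].round(3))    # one year of the periodic component
```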
Abstract:
This paper provides empirical evidence that continuous-time models with one volatility factor are, under some conditions, able to fit the main characteristics of financial data. It also reports the importance of the feedback factor in capturing the strong volatility clustering of the data, caused by a possible change in the pattern of volatility in the last part of the sample. We use the Efficient Method of Moments (EMM) of Gallant and Tauchen (1996) to estimate logarithmic models with one and two stochastic volatility factors (with and without feedback) and to select among them.
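To make the model structure concrete, here is a minimal simulation of a one-factor logarithmic stochastic volatility model with an optional feedback term from lagged returns into the volatility factor; parameter values are illustrative, and the EMM estimation step itself is not shown.

```python
# Minimal simulation of a one-factor logarithmic stochastic volatility model,
# optionally with a feedback term from past returns into the volatility factor.
# Parameters are illustrative; this sketches the model, not the EMM estimator.
import numpy as np

def simulate_log_sv(T=2000, mu=0.0, a=-0.1, b=0.95, sigma_v=0.2,
                    feedback=0.0, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(T)      # log-volatility factor
    r = np.zeros(T)      # returns
    for t in range(1, T):
        # AR(1) log-volatility; `feedback` lets lagged returns shift volatility.
        v[t] = a + b * v[t - 1] + feedback * r[t - 1] + sigma_v * rng.normal()
        r[t] = mu + np.exp(v[t] / 2.0) * rng.normal()
    return r, v

returns, logvol = simulate_log_sv(feedback=-0.3)
print(returns.std(),
      np.corrcoef(np.abs(returns[1:]), np.abs(returns[:-1]))[0, 1])
```

The autocorrelation of absolute returns printed at the end is one simple diagnostic of the volatility clustering the paper discusses.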
Abstract:
Background: There is currently no identified marker predicting benefit from Bev in patients with breast cancer (pts). We prospectively monitored 6 angiogenesis-related factors in the blood of advanced-stage pts treated with a combination of Bev and PLD in a phase II trial of the Swiss Group for Clinical Cancer Research (SAKK). Methods: Pts received PLD (20 mg/m2) and Bev (10 mg/kg) every 2 weeks for a maximum of 12 administrations, followed by Bev monotherapy until progression or severe toxicity. Blood samples were collected at baseline, during treatment and at treatment discontinuation. Enzyme-linked immunosorbent assays (Quantikine, R&D Systems and Reliatech) were used to measure vascular endothelial growth factor (VEGF), placental growth factor (PlGF), matrix metalloproteinase 9 (MMP-9) and soluble VEGF receptors -1, -2 and -3. The natural log-transformed (ln) data for each factor were analyzed with an analysis of variance (ANOVA) model to investigate differences between the mean values of the subgroups of interest (α = 0.05), based on the best tumor response by RECIST. Results: 132 samples were collected from 41 pts. Mean baseline ln MMP-9 levels were significantly lower in pts with tumor progression than in those with tumor response (p=0.0202, log fold change=0.8786) or disease control (p=0.0035, log fold change=0.8427). A higher MMP-9 level was a significant predictor of superior progression-free survival (PFS): p=0.0417, hazard ratio=0.574, 95% CI=0.336-0.979. In a multivariate Cox proportional hazards model containing performance status, disease-free interval, number of tumor sites, visceral involvement and prior adjuvant chemotherapy, and using stepwise regression, baseline MMP-9 was still a statistically significant factor (p=0.0266). The results of the other measured factors were presented elsewhere. Conclusions: Higher levels of MMP-9 could predict tumor response and superior PFS in pts treated with a combination of Bev and PLD. These exploratory results justify further investigation of MMP-9 in pts treated with Bev combinations in order to assess its role as a prognostic and predictive factor. Disclosure: K. Zaman: Participation in advisory board of Roche; partial sponsoring of the study by Roche (the main sponsor was the Swiss Federation against Cancer (Oncosuisse)). B. Thürlimann: stock of Roche; research grants from Roche. R. von Moos: Participant of advisory board and speaker honoraria.
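By way of illustration only (this is not the trial's analysis code), the sketch below reproduces the type of analysis described: a one-way ANOVA on ln-transformed marker levels across best-response groups, and a Cox proportional hazards model for PFS, on simulated placeholder data using the statsmodels and lifelines packages.

```python
# Illustrative analysis in the spirit of the abstract: ANOVA on ln-transformed
# marker levels across response groups, and a Cox proportional hazards model
# for PFS. All data below are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 41
df = pd.DataFrame({
    "ln_mmp9": np.log(rng.lognormal(mean=5.0, sigma=0.5, size=n)),
    "response": rng.choice(["progression", "response", "disease_control"], size=n),
    "pfs_months": rng.exponential(scale=6.0, size=n),
    "progressed": rng.integers(0, 2, size=n),
})

# One-way ANOVA: does mean ln(MMP-9) differ between best-response groups?
print(anova_lm(smf.ols("ln_mmp9 ~ C(response)", data=df).fit()))

# Cox PH model: ln(MMP-9) as a predictor of progression-free survival.
cph = CoxPHFitter()
cph.fit(df[["pfs_months", "progressed", "ln_mmp9"]],
        duration_col="pfs_months", event_col="progressed")
cph.print_summary()
```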
Abstract:
Next-generation sequencing offers an unprecedented opportunity to jointly analyze cellular and viral transcriptional activity without prerequisite knowledge of the nature of the transcripts. SupT1 cells were infected with a vesicular stomatitis virus G envelope protein (VSV-G)-pseudotyped HIV vector. At 24 h postinfection, both cellular and viral transcriptomes were analyzed by serial analysis of gene expression followed by high-throughput sequencing (SAGE-Seq). Read mapping resulted in 33 to 44 million tags aligning with the human transcriptome and 0.23 to 0.25 million tags aligning with the genome of the HIV-1 vector. Thus, at peak infection, 1 transcript in 143 is of viral origin (0.7%), including a small component of antisense viral transcription. Of the detected cellular transcripts, 826 (2.3%) were differentially expressed between mock- and HIV-infected samples. The approach also assessed whether HIV-1 infection modulates the expression of repetitive elements or endogenous retroviruses. We observed very active transcription of these elements, with 1 transcript in 237 being of such origin, corresponding on average to 123,123 reads in mock-infected samples (0.40%) and 129,149 reads in HIV-1-infected samples (0.45%) mapping to the genomic Repbase repository. This analysis highlights key details in the generation and interpretation of high-throughput data in the setting of HIV-1 cellular infection.
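As a small back-of-the-envelope check of how proportions such as "1 transcript in 143 (0.7%)" follow from the reported tag counts, the snippet below uses the midpoints of the ranges quoted in the abstract; this is an assumption made for illustration, and the published figure reflects the per-sample counts.

```python
# Back-of-the-envelope check of the reported proportions, using midpoints of
# the tag-count ranges given in the abstract (an assumption for illustration).
human_tags = (33e6 + 44e6) / 2      # tags mapping to the human transcriptome
viral_tags = (0.23e6 + 0.25e6) / 2  # tags mapping to the HIV-1 vector genome

viral_fraction = viral_tags / (viral_tags + human_tags)
print(f"viral fraction ~ {viral_fraction:.3%}")               # roughly 0.6-0.7%
print(f"about 1 transcript in {round(1 / viral_fraction)}")   # same order as 1 in 143
```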
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods for dimensionality reduction for this particular type of data: functional principal component analysis (PCA), with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
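A minimal sketch of the two families of methods compared above, assuming densities evaluated on a common grid: PCA after a centred log-ratio (clr) transform, which acknowledges the compositional nature of densities, and classical MDS on a matrix of pairwise inter-density distances. The synthetic Gaussian densities merely stand in for household income distributions, and the L2 distance is one illustrative choice.

```python
# Illustrative dimensionality reduction for samples of density functions:
# (1) PCA after a centred log-ratio (clr) transform of densities on a grid,
# (2) classical MDS on pairwise L2 distances between densities.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

rng = np.random.default_rng(3)
grid = np.linspace(-4, 4, 200)
dx = grid[1] - grid[0]

# Each row: one discretized, normalized density (synthetic Gaussians).
densities = []
for _ in range(50):
    mu, sd = rng.normal(0, 1), rng.uniform(0.5, 1.5)
    f = np.exp(-0.5 * ((grid - mu) / sd) ** 2)
    densities.append(f / (f.sum() * dx))
D = np.array(densities)

# clr transform: log density minus its mean log (compositional viewpoint).
clr = np.log(D) - np.log(D).mean(axis=1, keepdims=True)
pca_scores = PCA(n_components=2).fit_transform(clr)

# MDS on pairwise L2 distances between the raw densities.
dist = np.sqrt(((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=2))
mds_scores = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dist)
print(pca_scores.shape, mds_scores.shape)
```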
Abstract:
Background: Multiple logistic regression is precluded from many practical applications in ecology that aim to predict the geographic distributions of species because it requires absence data, which are rarely available or are unreliable. In order to use multiple logistic regression, many studies have simulated "pseudo-absences" through a number of strategies, but it is unknown how the choice of strategy influences models and their geographic predictions of species. In this paper we evaluate the effect of several prevailing pseudo-absence strategies on the predictions of the geographic distribution of a virtual species whose "true" distribution and relationship to three environmental predictors were predefined. We evaluated the effect of using a) real absences, b) pseudo-absences selected randomly from the background, and c) two-step approaches: pseudo-absences selected from low-suitability areas predicted by either Ecological Niche Factor Analysis (ENFA) or BIOCLIM. We compared how the choice of pseudo-absence strategy affected model fit, predictive power, and information-theoretic model selection results. Results: Models built with true absences had the best predictive power, best discriminatory power, and the "true" model (the one that contained the correct predictors) was supported by the data according to AIC, as expected. Models based on random pseudo-absences had among the lowest fit, but yielded the second highest AUC value (0.97), and the "true" model was also supported by the data. Models based on two-step approaches had intermediate fit, the lowest predictive power, and the "true" model was not supported by the data. Conclusion: If ecologists wish to build parsimonious GLM models that will allow them to make robust predictions, a reasonable approach is to use a large number of randomly selected pseudo-absences and perform model selection based on an information-theoretic approach. However, the resulting models can be expected to have limited fit.
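A minimal sketch, on synthetic data rather than the paper's virtual species, of the recommended strategy: pool presences with randomly sampled background pseudo-absences, fit candidate logistic GLMs, and rank them by AIC (statsmodels is assumed; predictor names are invented).

```python
# Illustrative pseudo-absence workflow: random background points serve as
# pseudo-absences, candidate logistic GLMs are compared by AIC. Synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Background environment: three predictors over 10,000 candidate locations.
env = pd.DataFrame(rng.normal(size=(10_000, 3)), columns=["temp", "precip", "elev"])
true_prob = 1 / (1 + np.exp(-(1.5 * env["temp"] - 1.0 * env["precip"])))
presence_idx = np.where(rng.random(len(env)) < true_prob * 0.1)[0]

# Presences plus a large sample of random pseudo-absences from the background.
pseudo_idx = rng.choice(len(env), size=5_000, replace=False)
data = pd.concat([env.iloc[presence_idx].assign(y=1),
                  env.iloc[pseudo_idx].assign(y=0)], ignore_index=True)

# Candidate models; the "true" model uses temp and precip only.
candidates = {
    "temp+precip": ["temp", "precip"],
    "temp+precip+elev": ["temp", "precip", "elev"],
    "elev only": ["elev"],
}
for name, cols in candidates.items():
    X = sm.add_constant(data[cols])
    fit = sm.GLM(data["y"], X, family=sm.families.Binomial()).fit()
    print(f"{name}: AIC = {fit.aic:.1f}")
```

The model with the lowest AIC would be retained, mirroring the information-theoretic selection step discussed in the abstract.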
Abstract:
In Switzerland, the annual cost of damage caused by natural hazards has been increasing for several years despite the introduction of protective measures. Mainly induced by material destruction, this cost is largely borne by building insurance companies. In many European countries, governments and insurance companies are designing their prevention strategies in terms of reducing vulnerability. In Switzerland, since 2004, the cost of damage due to natural hazards has surpassed the cost of damage due to fire, a traditional activity of the cantonal insurance establishments (ECA). This results in particular from a fire-prevention policy pursued efficiently for several years, notably through the reduction of the vulnerability of buildings. The thesis, by making use of actuarial data and by developing analysis tools, seeks to illustrate the relevance of such an approach when applied to the damage caused by natural hazards. It examines the place of insurance and its involvement in targeted prevention of natural disasters. Integrated risk management requires a sound understanding of all risk parameters. The first part of the thesis is therefore devoted to the theoretical development of the key concepts that influence risk management, such as hazard, vulnerability, exposure and damage. The literature on this subject, very prolific in recent years, was reviewed and put into perspective in the context of this study, namely building insurance. Among the risk parameters, the thesis shows that vulnerability is a factor that can be influenced efficiently in order to limit the cost of damage to buildings. This is confirmed through the development of an analysis method, which led to a tool for assessing flood damage to buildings. The tool, designed for property insurers and, where applicable, owners, comprises several steps, namely: assessment of vulnerability and damage potential; proposals for remedial and risk-reduction measures derived from an analysis of the costs of a potential flood; and adaptation of a global strategy in high-risk areas based on the elements at risk. The final part of the thesis is devoted to the study of a hail event in order to provide a better understanding of damage to buildings and their structure. For this, two samples were selected and analysed from the claims data available to the study. The results, both at the level of the insured portfolio and of the individual analysis, allow the identification of new trends. A second objective of the study was to develop a hail model based on the available data. The model simulates a random distribution of intensities and, coupled with a risk model, provides a simulation of damage costs for a given study area. The perspectives of this work allow a better focus on the role of insurance and its needs in terms of prevention.
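A hedged Monte Carlo sketch of the kind of coupling described above: random hail intensities are drawn per event, passed through an assumed vulnerability curve, and applied to an invented insured portfolio to simulate damage costs. None of the parameter values, distributions or curve shapes come from the thesis.

```python
# Hedged Monte Carlo sketch: couple a hail-intensity model with a simple
# vulnerability (risk) model to simulate portfolio damage costs.
# Intensity distribution, vulnerability curve and portfolio are placeholders.
import numpy as np

rng = np.random.default_rng(5)
n_buildings, n_events = 5_000, 1_000
insured_value = rng.lognormal(mean=13.0, sigma=0.6, size=n_buildings)  # CHF

def vulnerability(hail_mm):
    """Assumed damage ratio as a function of hailstone size (mm)."""
    return np.clip((hail_mm - 20.0) / 60.0, 0.0, 1.0) ** 2

event_costs = np.empty(n_events)
for k in range(n_events):
    # Random intensity field: each building sees a gamma-distributed size.
    hail_mm = rng.gamma(shape=2.0, scale=12.0, size=n_buildings)
    hit = rng.random(n_buildings) < 0.3          # fraction of portfolio hit
    event_costs[k] = np.sum(hit * vulnerability(hail_mm) * insured_value)

print(f"mean event cost: {event_costs.mean():,.0f} CHF")
print(f"95th percentile: {np.quantile(event_costs, 0.95):,.0f} CHF")
```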
Abstract:
When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix - the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix. In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
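A minimal sketch, assuming triangular membership functions hinged at the minimum, median and maximum, of how one continuous variable would be coded crisply (0/1 indicators) versus fuzzily (degrees of membership summing to 1) into three categories; the correspondence analysis step itself is not shown, and the hinge choice is one common convention rather than the paper's.

```python
# Illustration of crisp vs. fuzzy coding of one continuous variable into three
# categories ("low", "medium", "high") using triangular membership functions
# hinged at the min, median and max. Shows only the coding step.
import numpy as np

def fuzzy_code(x):
    lo, mid, hi = x.min(), np.median(x), x.max()
    low = np.clip((mid - x) / (mid - lo), 0, 1)
    high = np.clip((x - mid) / (hi - mid), 0, 1)
    medium = 1.0 - low - high          # memberships sum to 1 for each case
    return np.column_stack([low, medium, high])

def crisp_code(x):
    cuts = np.quantile(x, [1 / 3, 2 / 3])
    cat = np.digitize(x, cuts)         # category index 0, 1 or 2
    return np.eye(3)[cat]              # indicator (dummy) coding

rng = np.random.default_rng(6)
temperature = rng.normal(15, 5, size=10)
print(np.round(fuzzy_code(temperature), 2))
print(crisp_code(temperature).astype(int))
```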
Abstract:
The emergence of host-races within aphids may constitute an obstacle to pest management by means of plant resistance. There are examples of host-races within cereal aphids, but their occurrence in the Rose Grain Aphid, Metopolophium dirhodum (Walker, 1849), has not yet been reported. In this work, RAPD markers were used to assess the effects of host plant and geographic distance on the genetic diversity of M. dirhodum lineages. Twenty-three clones were collected on oats and wheat in twelve localities of southern Brazil. Of the twenty-seven primers tested, only four showed polymorphisms. Fourteen different genotypes were revealed by cluster analysis. Five genotypes were collected only on wheat, seven only on oats, and two were collected on both hosts. Genetic and geographical distances among the clonal lineages were not correlated. Analysis of molecular variance showed that some molecular markers are not randomly distributed among clonal lineages collected on oats and on wheat. These results suggest the existence of host-races within M. dirhodum, which should be further investigated using a combination of ecological and genetic data.
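A minimal sketch of the kind of cluster analysis described, assuming a binary band presence/absence matrix; the Jaccard distance and UPGMA (average-linkage) clustering used here are illustrative choices rather than necessarily those of the study, and the marker data are random stand-ins.

```python
# Illustrative cluster analysis of a binary RAPD marker matrix (rows = clonal
# lineages, columns = band presence/absence) using Jaccard distances and
# average-linkage (UPGMA) hierarchical clustering. Data are random stand-ins.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
markers = rng.integers(0, 2, size=(23, 40))     # 23 clones x 40 RAPD bands

dist = pdist(markers, metric="jaccard")         # pairwise band-sharing distance
tree = linkage(dist, method="average")          # UPGMA dendrogram
genotypes = fcluster(tree, t=0.5, criterion="distance")
print(f"{len(set(genotypes))} genotype clusters at distance threshold 0.5")
```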