996 results for sample plot database
Abstract:
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples, or across samples subjected to different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments involving the molecular classification of tumors, it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, there can be genes that are differentially expressed in sample subgroups but are missed by the usual statistical approaches. In this paper we propose a new graphical tool that identifies not only up- and down-regulated genes but also genes that are differentially expressed in different subclasses, which current statistical methods usually miss. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The proposed methodology was implemented in the open-source R software. Results: This method was applied to a publicly available dataset as well as to a simulated dataset. We compared our results with those obtained using some of the standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC).
On both datasets, none of the standard selection procedures selected all of the differentially expressed genes with bimodal or multimodal distributions. We also compared our results with (i) the area between the ROC curve and the rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows for different variances in the two samples. Another advantage of our method is that it lets us analyze graphically the behavior of different kinds of differentially expressed genes. Conclusion: Our results indicate that the arrow plot is a new, flexible and useful tool for the analysis of gene expression profiles from microarrays.
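The two distance measures behind the arrow plot can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation: AUC is estimated empirically as the Mann-Whitney proportion of (class 1, class 2) pairs in which the class 2 value is larger, and OVL is estimated by integrating the pointwise minimum of two Gaussian kernel density estimates. The Silverman rule-of-thumb bandwidth, the integration grid and all function names are assumptions made here. The toy data show the case the abstract emphasizes: a bimodal class compared with a unimodal one gives AUC = 0.5, which looks like "no difference", while a low OVL still flags the gene.

```python
import math
import statistics


def auc(class1, class2):
    """Empirical AUC: P(X2 > X1) + 0.5 * P(X2 == X1) over all pairs
    (the Mann-Whitney U statistic divided by n1 * n2)."""
    wins = 0.0
    for a in class1:
        for b in class2:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(class1) * len(class2))


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    sd = statistics.stdev(sample)
    if sd == 0:
        raise ValueError("sample has zero variance; KDE bandwidth undefined")
    return 1.06 * sd * len(sample) ** (-1 / 5)


def kde(sample, h):
    """Return a Gaussian kernel density estimate f(t) for the sample."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)

    def f(t):
        return sum(math.exp(-0.5 * ((t - s) / h) ** 2) for s in sample) / norm

    return f


def ovl(class1, class2, steps=2000):
    """Overlapping coefficient: integral of min(f1, f2) by the trapezoidal rule."""
    h1, h2 = silverman_bandwidth(class1), silverman_bandwidth(class2)
    f1, f2 = kde(class1, h1), kde(class2, h2)
    pad = 5 * max(h1, h2)  # extend the grid so the kernel tails are covered
    lo = min(min(class1), min(class2)) - pad
    hi = max(max(class1), max(class2)) + pad
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * min(f1(t), f2(t))
    return total * dx


if __name__ == "__main__":
    unimodal = [-0.2, -0.1, 0.0, 0.1, 0.2]
    bimodal = [-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]  # hypothetical "subclass" gene
    print(auc(unimodal, bimodal), round(ovl(unimodal, bimodal), 3))
```

Because AUC only reflects a shift in location, it cannot separate "no difference" from "two subgroups moving in opposite directions"; pairing it with OVL, which measures how much the two densities actually overlap, is what lets such genes stand out.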
Abstract:
Microarrays allow thousands of genes to be monitored simultaneously, so that the abundance of transcripts under the same experimental condition can be quantified at the same time. Among the various available array technologies, two-channel cDNA microarray experiments, which are the focus of this work, have been used in numerous technical protocols associated with genomic studies. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are preprocessing techniques used to clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. This work proposes using exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray datasets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the profile graph resulting from correspondence analysis was used to visualize the results. Six background correction methods and six normalization methods were combined, giving 36 preprocessing methods, which were applied to a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, whose microarrays had already been classified by cancer type. All statistical analyses were performed using the R statistical software.
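For readers unfamiliar with two-channel preprocessing, the Python sketch below shows what one background-correction method and one normalization method do to a handful of spots, and how crossing 6 background methods with 6 normalization methods yields 36 pipelines. It is an illustration under stated assumptions, not the R code used in the study: simple background subtraction (with an assumed floor to keep intensities positive) and global median normalization of the log-ratios M = log2(R/G) are standard textbook choices, while the method labels and intensity values are hypothetical.

```python
import math
import statistics
from itertools import product


def subtract_background(foreground, background, floor=0.5):
    """Background subtraction; the floor (an assumption) keeps values > 0 for log2."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def ma_values(red, green):
    """M = log2(R/G) (log-ratio) and A = 0.5 * log2(R*G) (mean log-intensity)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_normalize(m):
    """Global median normalization: shift the log-ratios so their median is 0."""
    med = statistics.median(m)
    return [x - med for x in m]


# Hypothetical labels: the abstract does not name the 6 + 6 methods it used.
BACKGROUND_METHODS = [f"bg{i}" for i in range(1, 7)]
NORMALIZATION_METHODS = [f"norm{i}" for i in range(1, 7)]
PIPELINES = list(product(BACKGROUND_METHODS, NORMALIZATION_METHODS))

if __name__ == "__main__":
    # Hypothetical foreground/background intensities for five spots.
    red = subtract_background([1200, 800, 3000, 450, 950], [200, 180, 250, 150, 210])
    green = subtract_background([900, 850, 1500, 500, 700], [190, 170, 240, 160, 200])
    m, a = ma_values(red, green)
    print(len(PIPELINES), [round(x, 3) for x in median_normalize(m)])
```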
Abstract:
Dissertation for the degree of Master in Computer Engineering
Abstract:
Plot Suite is a web application that makes it possible to locate plots in a database by means of forms. The results are tables listing the plots with their characteristics, and copies of the requested plots can be obtained. Thanks to its design, new plots can be added to the database, and even the database structure can be modified, in a very intuitive way.
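The form-driven lookup described for Plot Suite can be sketched with Python's built-in sqlite3 module. The table name `plots`, its columns, the demo rows and the helper names are all hypothetical, since the abstract does not describe the schema; the idea shown is that each filled-in form field becomes a parameterized WHERE clause, so user input is never spliced into the SQL string.

```python
import sqlite3

# Hypothetical whitelist of searchable form fields (column names).
ALLOWED_FIELDS = {"name", "owner", "year", "category"}


def build_query(filters):
    """Turn {column: value} form fields into a parameterized SELECT.
    Empty fields are ignored; unknown field names are rejected."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_FIELDS:
            raise ValueError(f"unknown form field: {column}")
        if value not in (None, ""):
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT id, name, owner, year, category FROM plots"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


def search(conn, filters):
    """Run the form query and return the matching rows as tuples."""
    sql, params = build_query(filters)
    return conn.execute(sql, params).fetchall()


def make_demo_db():
    """In-memory database with a hypothetical 'plots' table and three rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plots (id INTEGER PRIMARY KEY, name TEXT, "
        "owner TEXT, year INTEGER, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO plots (name, owner, year, category) VALUES (?, ?, ?, ?)",
        [("Plot A", "anna", 2010, "cat1"),
         ("Plot B", "joan", 2011, "cat2"),
         ("Plot C", "anna", 2011, "cat1")],
    )
    return conn


if __name__ == "__main__":
    print(search(make_demo_db(), {"owner": "anna", "category": "cat1"}))
```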
Abstract:
Introduction: The European Foundation for the Improvement of Living and Working Conditions has conducted a survey every 5 years since 1990. The foundation also offers non-EU countries the possibility of being included in the survey: in 2005, Switzerland took part for the first time, in the fourth edition of the survey. The Institute for Work and Health (IST) has been associated with the Swiss project conducted under the leadership of SECO and the Fachhochschule Nordwestschweiz. The survey covers different aspects of work, such as job characteristics and employment conditions, health and safety, work organization, learning and development opportunities, and the balance between working and non-working life (Parent-Thirion, Fernandez Macias, Hurley, & Vermeylen, 2007). In particular, one question assesses the worker's self-perception of the effects of work on health. We identified (for the Swiss sample) several factors affecting the risk of reporting health problems caused by work. The Swiss sample includes 1040 respondents. Participants were selected by random multi-stage sampling, carried out by M.I.S Trend S.A. (Lausanne). The participation rate was 59%. The database was weighted by household size, gender, age, region of domicile, occupational group, and economic sector. Specially trained interviewers carried out the interviews at the respondents' homes. The survey was carried out between 19 September 2005 and 30 November 2005. As detailed by Graf et al. (2007), 31% of the Swiss respondents identify work as the cause of health problems they experience. The most frequently reported health problems include back pain (18%), stress (17%), muscle pain (13%), and overall fatigue (11%).
Ergonomic aspects associated with a higher risk of reporting health problems caused by work include frequent awkward postures (odds ratio [OR] 4.7, 95% confidence interval [CI] 3.1 to 5.4), tasks involving lifting heavy loads (OR 2.7, 95% CI 2.0 to 3.6) or lifting people (OR 2.2, 95% CI 1.4 to 3.5), standing or walking (OR 1.4, 95% CI 1.1 to 1.9), and repetitive movements (OR 1.7, 95% CI 1.3 to 2.3). These results highlight the need to continue and intensify the prevention of work-related health problems in occupations characterized by ergonomic risk factors.
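Each odds ratio above is reported with a 95% confidence interval. For a single unadjusted 2×2 table, one common way to obtain both is Woolf's log method, sketched below in Python. The cell counts in the usage line are hypothetical, and the survey's estimates (based on weighted data, and possibly on a regression model) need not coincide with this simple calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for the 2x2 table

                    health problem   no problem
        exposed           a              b
        unexposed         c              d

    All cells must be > 0 (no continuity correction is applied here)."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(60, 90, 80, 600))
```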
Abstract:
This paper analyses and discusses arguments that emerge from a recent discussion about the proper assessment of the evidential value of correspondences observed between the characteristics of a crime stain and those of a sample from a suspect when (i) this suspect is found as a result of a database search and (ii) the remaining database members are excluded as potential sources (because of different analytical characteristics). Using a graphical probability approach (i.e., Bayesian networks), this paper aims to clarify that there is no need (i) to introduce a correction factor equal to the size of the searched database (i.e., to reduce a likelihood ratio), or (ii) to adopt a propositional level not directly related to the suspect matching the crime stain (i.e., a proposition of the kind 'some person in (outside) the database is the source of the crime stain' rather than 'the suspect (some other person) is the source of the crime stain'). The present research thus confirms existing literature on the topic, which has repeatedly demonstrated that these two requirements, (i) and (ii), should not be a cause of concern.
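The point that a database search does not call for dividing the likelihood ratio by the database size can be checked with a plain Bayes-rule calculation, used here as a simpler stand-in for the Bayesian networks of the paper. The assumptions are introduced only for illustration: a closed population of N potential sources with a uniform prior, a database of n people that includes the suspect, a random-match probability γ independent across people, and no typing errors. Excluded database members cannot be the source, the suspect explains the match with probability 1 and each of the N − n untested people explains it with probability γ, so the posterior probability that the suspect is the source is 1 / (1 + (N − n)γ). The likelihood ratio of the suspect against any single untested person stays 1/γ, and more exclusions strengthen the case rather than weaken it.

```python
from fractions import Fraction


def posterior_suspect_is_source(population, database_size, match_prob):
    """P(suspect is the source | suspect matches, other database members excluded).

    Uniform prior over `population` potential sources (an assumption).
    Excluded database members get zero weight: if one of them were the source,
    the observed non-match would be impossible. The suspect explains the
    evidence with probability 1, each untested outsider with `match_prob`.
    The common factor for the excluded members' non-matches cancels out.
    """
    if not 1 <= database_size <= population:
        raise ValueError("need 1 <= database_size <= population")
    outsiders = population - database_size
    return 1 / (1 + outsiders * Fraction(match_prob))


def likelihood_ratio(match_prob):
    """LR of 'suspect is source' vs 'a given untested person is source'."""
    return 1 / Fraction(match_prob)


if __name__ == "__main__":
    gamma = Fraction(1, 1000)
    for n in (1, 1001, 5001):
        print(n, posterior_suspect_is_source(10001, n, gamma))
```

With 10,001 people and γ = 1/1000, a database of one (no exclusions) gives a posterior of 1/11, while a database of 5,001 in which only the suspect matches gives 1/6: the search raises, not lowers, the posterior.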
Abstract:
The objective of this study was to determine the minimum number of plants per plot that must be sampled in experiments with sugarcane (Saccharum officinarum) full-sib families in order to estimate the genetic and phenotypic parameters of yield-related traits effectively. The data were collected in a randomized complete block design with 18 sugarcane full-sib families and 6 replicates, with 20 plants per plot. The sample size was determined using resampling with replacement, followed by estimation of genetic and phenotypic parameters. Sample-size estimates varied according to the parameter and trait evaluated. The resampling method permits an efficient comparison of the effects of sample size on the estimation of genetic and phenotypic parameters. A sample of 16 plants per plot, or 96 individuals per family, was sufficient to obtain good estimates for all the traits evaluated. However, for Brix, if sample separation by trait were possible, ten plants per plot would give an efficient estimate for most of the characters evaluated.
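The resampling idea can be sketched as follows: for each candidate number of plants k, draw k plants with replacement from a plot many times and watch how the spread of the resulting estimate shrinks as k grows. This simplified Python sketch tracks only the plot mean of one trait in a single plot with hypothetical values; the study itself estimated genetic and phenotypic parameters across 18 families and 6 replicates, and the number of bootstrap replicates and the seed here are arbitrary choices.

```python
import random
import statistics


def bootstrap_se(plot_values, k, reps=1000, seed=0):
    """Bootstrap standard error of the mean of k plants drawn with replacement."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(plot_values, k=k)) for _ in range(reps)]
    return statistics.stdev(means)


def se_curve(plot_values, sizes, reps=1000, seed=0):
    """Map each candidate sample size k to its bootstrap standard error."""
    return {k: bootstrap_se(plot_values, k, reps, seed) for k in sizes}


if __name__ == "__main__":
    # Hypothetical trait values for the 20 plants of one plot.
    plot = [float(v) for v in range(1, 21)]
    for k, se in se_curve(plot, [2, 5, 10, 16, 20]).items():
        print(k, round(se, 3))
```

Choosing the smallest k at which the curve levels off, or drops below a tolerance set by the breeder, mirrors the study's goal; the tolerance is a user choice and is not specified in the abstract.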
Abstract:
The objectives of this study were to evaluate baby corn yield, green corn yield, and grain yield in corn cultivar BM 3061, with weed control achieved via a combination of hoeing and intercropping with gliricidia, and to determine how sample size influences the accuracy of weed growth evaluation. A randomized block design with ten replicates was used. The cultivar was subjected to the following treatments: A = hoeing at 20 and 40 days after corn sowing (DACS); B = hoeing at 20 DACS + gliricidia sowing after hoeing; C = gliricidia sowing together with corn sowing + hoeing at 40 DACS; D = gliricidia sowing together with corn sowing; and E = no hoeing. Gliricidia was sown at a density of 30 viable seeds m⁻². After the mature ears were harvested, the area of each plot was divided into eight sampling units measuring 1.2 m² each to evaluate weed growth (above-ground dry biomass). Treatment A provided the highest baby corn, green corn, and grain yields. Treatment B did not differ from treatment A with respect to the yields of the three products and was equivalent to treatment C for green corn yield, but was superior to C with regard to baby corn weight and grain yield. Treatments D and E provided similar yields and were inferior to the other treatments. Therefore, treatment B is promising. The relation between the coefficient of experimental variation (CV) and sample size (S) for evaluating growth of the above-ground part of the weeds was given by the equation $\mathrm{CV} = 37.57\,S^{-0.15}$; that is, CV decreased as S increased. The optimal sample size indicated by this equation was 4.3 m².
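The abstract does not say how the optimum was read off the fitted power curve. One common choice for curves of the form CV = a·S^b is the maximum curvature method, whose optimum is S₀ = [a²b²(2b − 1)/(b − 2)]^(1/(2 − 2b)). The Python sketch below applies it to a = 37.57 and b = −0.15; treating S as a count of the 1.2 m² sampling units is an assumption made here, and under it the result is consistent with the reported 4.3 m².

```python
def max_curvature_point(a, b):
    """Abscissa of maximum curvature of y = a * x**b (for a > 0, b < 0):
    x0 = (a^2 b^2 (2b - 1) / (b - 2)) ** (1 / (2 - 2b))."""
    return (a ** 2 * b ** 2 * (2 * b - 1) / (b - 2)) ** (1 / (2 - 2 * b))


UNIT_AREA_M2 = 1.2  # assumption: S counts the 1.2 m² sampling units

if __name__ == "__main__":
    s0 = max_curvature_point(37.57, -0.15)
    print(round(s0, 2), "units ->", round(s0 * UNIT_AREA_M2, 1), "m²")
```

If S were instead measured directly in m², the same formula would point to about 3.6 m², so the unit assumption matters when comparing with the reported figure.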
Abstract:
Companies have adopted health and wellness programs as a way to improve their employees' health, and many studies report positive economic returns on the investments involved. However, more recent, methodologically stronger studies have shown smaller returns. The aim of this study was to investigate whether characteristics of health and wellness programs act as predictors of hospital admission costs (in current Brazilian reais) and of the proportion of employees taking sick leave between April 2014 and May 2015, in a non-random sample of companies in Brazil, through a partnership with a company that manages health 'big data'. A questionnaire on the characteristics of workplace health programs was answered by six large Brazilian companies. Data from these six questionnaires (presence and age of the health program; its characteristics, namely the inclusion of screening activities, health education, links with other company programs, integration of the program into the company structure, and health-oriented work environments; and the adoption of financial incentives for employee adherence to the program), together with individual data on each employee's age, gender and health-plan category, were used to build a database of more than 76,000 individuals. In a multiple regression model with stepwise variable selection, employee age was positively associated, and program age and the employee's 'premium' health-plan category were negatively associated, with hospital admission costs (as expected). Unexpectedly, the inclusion of screening programs and health education initiatives in the companies' health and wellness programs was identified as a significant positive predictor of hospital admission costs.
To avoid erroneously including maternity leave, only sick-leave data for male patients were analyzed (these data were available for only two of the companies included, with a total of 18,957 male patients). Analyzing these data with a Z test for the comparison of proportions, the company whose health program includes activities aimed at quitting unhealthy habits (such as smoking and alcohol abuse) and controlling diabetes and hypertension, and which offers financial incentives for employee adherence to the program, had a lower proportion of employees on sick leave during the period analyzed than the other company, which lacks these characteristics (also as expected). However, the company with the lower proportion of employees on sick leave was also the one that includes screening among its health program activities. Potential threats to the internal and external validity of these results are discussed, as are possible explanations for the association between screening and health education programs and worse health indicators in this sample of companies. New, better-designed studies with larger, random samples are needed to validate these results and possibly to improve their internal and external validity.
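The sick-leave comparison between the two companies relies on a Z test for two proportions. A minimal Python sketch of the usual pooled-variance version follows; the counts in the usage line are hypothetical, since the abstract reports only the total of 18,957 male patients, not the per-company figures.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z test.

    x1, x2: people with at least one sick leave; n1, n2: people analyzed
    in each company. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    if not 0 < pooled < 1:
        raise ValueError("pooled proportion must be strictly between 0 and 1")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical split of male employees between the two companies.
    print(two_proportion_z(450, 9000, 600, 9957))
```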
Abstract:
The main goal of our research was to search for SSRs in the Eucalyptus EST FORESTs database (using software for mining SSR motifs). To this end, we created a database for cataloging Eucalyptus EST-derived SSRs and developed a bioinformatics tool, named Satellyptus, for finding and analyzing microsatellites in the Eucalyptus EST database. The search for microsatellites in the FORESTs database, which contains 71,115 Eucalyptus EST sequences (52.09 Mb), revealed 20,530 SSRs in 15,621 ESTs. The SSR abundance detected in the Eucalyptus EST database (29%, or one microsatellite every four sequences) is considered very high for plants. Among the categories of SSR motifs, dimeric (37%) and trimeric (33%) motifs predominated. The AG/CT motif was the most frequent (35.15%), followed by the trimeric CCG/CGG (12.81%). In a random sample of 1,217 sequences, 343 microsatellites were identified in 265 SSR-containing sequences. Approximately 48% of these microsatellite-containing ESTs were homologous to proteins with known biological function. Most of the microsatellites detected in Eucalyptus ESTs were positioned at either the 5′ or the 3′ end. Our next priority is the design of flanking primers for codominant SSR loci, which could lead to the development of a set of microsatellite-based markers suitable for marker-assisted Eucalyptus breeding programs.
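The abstract does not give Satellyptus's search criteria, so the Python sketch below uses hypothetical minimum copy numbers per motif length to show the basic idea behind SSR mining: a regular expression with a back-reference finds a motif followed by further copies of itself. Motifs that are themselves repeats of a shorter unit (for example AGAG, or AA) are skipped so that each SSR is reported at its primitive motif length; mononucleotide repeats are not searched here.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_COPIES = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit ('AG' yes, 'AGAG' no)."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-mer captured in group 1, followed by at least (min_copies - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("TTCAGAGAGAGAGAGAGTTC"))
    print(find_ssrs("AT" + "CCG" * 5 + "TA"))
```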
Abstract:
This article updates the Brazilian database on food carotenoids. The emphasis is on carotenoids that have been shown to be important to human health: alpha-carotene, beta-carotene, beta-cryptoxanthin, lycopene, lutein and zeaxanthin. The sampling and sample preparation strategies and the analytical methodology are presented. Possible sources of analytical error, as well as the measures taken to avoid them, are discussed. Compositional variation due to factors such as variety/cultivar, stage of maturity, part of the plant used, climate or season, and production technique is demonstrated. The effects of post-harvest handling, preparation, processing and storage of food on carotenoid composition are also discussed. The importance of biodiversity is shown by the variety of carotenoid sources and by the higher carotenoid levels in native, uncultivated or semi-cultivated fruits and vegetables compared with commercially produced crops. © 2008 Elsevier B.V. All rights reserved.
Abstract:
The Poincaré plot for heart rate variability analysis is a geometric, nonlinear technique that can be used to assess the dynamics of heart rate variability by representing each pair of consecutive R-R intervals in a simplified phase space that describes the system's evolution. The aim of the present study was to verify whether SD1, SD2 and the SD1/SD2 ratio correlate with nonlinear heart rate variability indexes in disease or healthy conditions. A total of 114 patients with coronary artery disease and 65 healthy subjects underwent 30-minute heart rate recordings in the supine position, and the following indexes were analyzed: SD1, SD2, SD1/SD2, Sample Entropy, Lyapunov Exponent, Hurst Exponent (HE), Correlation Dimension, Detrended Fluctuation Analysis (DFA), SDNN, RMSSD, LF, HF and LF/HF ratio. Correlations between the SD1, SD2 and SD1/SD2 indexes and the other variables were tested with the Spearman rank correlation test and a regression analysis. We found a high correlation between the SD1/SD2 index and HE and DFA (α1) in both groups, suggesting that this ratio can be used as a surrogate variable. © 2013 Elsevier B.V.
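SD1 and SD2 have a compact geometric definition: rotate each point (RRₙ, RRₙ₊₁) of the Poincaré plot by 45 degrees and take the standard deviation across and along the line of identity (equivalently, SD1² is half the variance of successive differences). The Python sketch below follows that definition using the population standard deviation; software packages differ on this detail and on artifact filtering, so it illustrates the indexes rather than reproducing the study's exact pipeline.

```python
import math
import statistics


def poincare_sd(rr_ms):
    """Return (SD1, SD2, SD1/SD2) for a series of R-R intervals in milliseconds.

    Each Poincare point is (RR[n], RR[n+1]).
    SD1: spread perpendicular to the identity line (short-term variability).
    SD2: spread along the identity line (long-term variability).
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least 3 R-R intervals")
    x, y = rr_ms[:-1], rr_ms[1:]
    across = [(a - b) / math.sqrt(2) for a, b in zip(x, y)]
    along = [(a + b) / math.sqrt(2) for a, b in zip(x, y)]
    sd1 = statistics.pstdev(across)
    sd2 = statistics.pstdev(along)
    ratio = sd1 / sd2 if sd2 > 0 else math.inf
    return sd1, sd2, ratio


if __name__ == "__main__":
    # Hypothetical R-R series (ms).
    print(poincare_sd([812, 845, 790, 860, 835, 801, 828, 850]))
```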
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of WN.Pr synsets. Another important initiative, relying on a slightly different method of building multilingual wordnets, is the MultiWordNet project, whose key strategy is to build language-specific wordnets that preserve as many as possible of the semantic relations available in WN.Pr. This paper, in particular, stresses that an additional advantage of using the WN.Pr lexical database as a resource for building wordnets for other languages is the possibility of implementing an automatic procedure to map WN.Pr conceptual relations such as hyponymy, co-hyponymy, troponymy, meronymy, cause, and entailment onto the lexical database of the wordnet under construction. This is viable because these are language-independent relations that hold between lexicalized concepts, not between lexical units. Accordingly, combining methods from both initiatives, this paper presents the ongoing implementation of the WN.Br lexical database and the aforementioned automatic procedure, illustrated with a sample of the automatic encoding of hyponymy and co-hyponymy relations.
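The automatic mapping step can be pictured as follows: because hyponymy holds between concepts, a WN.Pr hypernym link can be carried over to WN.Br whenever both synsets at its ends already have a Brazilian Portuguese equivalent, and co-hyponyms are then the synsets that share a hypernym. The Python sketch below uses toy, hypothetical synset identifiers and an assumed one-to-one alignment; the actual WN.Br procedure and data formats are not described in the abstract.

```python
from collections import defaultdict

# Hypothetical WN.Pr hypernym links: child synset -> parent synset.
PR_HYPERNYM = {
    "dog.n.01": "canine.n.02",
    "wolf.n.01": "canine.n.02",
    "fox.n.01": "canine.n.02",
    "canine.n.02": "carnivore.n.01",
}

# Hypothetical alignment of WN.Pr synsets to WN.Br synsets (no entry = not yet built).
PR_TO_BR = {
    "dog.n.01": "br:cao",
    "wolf.n.01": "br:lobo",
    "canine.n.02": "br:canideo",
    "carnivore.n.01": "br:carnivoro",
}


def project_hyponymy(pr_hypernym, alignment):
    """Carry each WN.Pr (hyponym, hypernym) pair over when both ends are aligned."""
    return sorted(
        (alignment[child], alignment[parent])
        for child, parent in pr_hypernym.items()
        if child in alignment and parent in alignment
    )


def co_hyponyms(hyponymy_pairs):
    """Group hyponyms by shared hypernym; groups of two or more are co-hyponym sets."""
    by_parent = defaultdict(list)
    for child, parent in hyponymy_pairs:
        by_parent[parent].append(child)
    return {p: sorted(cs) for p, cs in by_parent.items() if len(cs) > 1}


if __name__ == "__main__":
    pairs = project_hyponymy(PR_HYPERNYM, PR_TO_BR)
    print(pairs)
    print(co_hyponyms(pairs))
```

Unaligned synsets (here fox.n.01) simply produce no WN.Br relation until an equivalent is added, which matches the incremental, "ongoing" character of the construction.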