952 results for Dynamic data set visualization


Relevance:

100.00%

Publisher:

Abstract:

This paper addresses the application of PCA to categorical data prior to diagnosing a patient data set using a Case-Based Reasoning (CBR) system. The particularity is that standard PCA techniques are designed to deal with numerical attributes, but our medical data set contains many categorical attributes, so alternative methods such as RS-PCA are required. Thus, we propose to hybridize RS-PCA (Regular Simplex PCA) and a simple CBR. Results show that, when diagnosing a medical data set, the hybrid system produces results similar to those obtained using the original attributes. These results are quite promising, since they allow diagnosis with less computational effort and memory storage.

Relevance:

100.00%

Publisher:

Abstract:

In most geochemical analyses, log-ratio techniques are required to analyse compositional data sets. When a chemical element is present at a low concentration, it is usually identified as a value below the detection limit and added to the data set either as zero or simply by attaching a less-than label. In either case, the occurrence of such concentrations prevents us from applying the log-ratio approach. We review here the theoretical bases of the most recent proposals for dealing with these types of observation, give some advice on their practical application, and illustrate their performance through some examples using geochemical data.
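The obstacle described in this abstract can be illustrated with a minimal sketch. It assumes a centred log-ratio (clr) transform and a simple substitution rule in which each zero is replaced by a fixed fraction of the detection limit before the composition is re-closed. The function names, the 0.65 fraction, and the sample values are illustrative assumptions, not the specific proposals reviewed in the paper.

```python
import math


def closure(parts):
    """Rescale a composition so that its parts sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio transform; fails if any part is zero."""
    logs = [math.log(p) for p in parts]  # math.log(0.0) raises ValueError
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def replace_below_detection(parts, detection_limit, fraction=0.65):
    """Hypothetical simple substitution: zero -> fraction * detection limit."""
    filled = [p if p > 0 else fraction * detection_limit for p in parts]
    return closure(filled)


# Hypothetical three-part composition with one value below the detection limit.
sample = [0.70, 0.30, 0.0]

try:
    clr(sample)
    failed = False
except ValueError:
    failed = True  # the zero makes the log-ratio undefined

repaired = replace_below_detection(sample, detection_limit=0.001)
clr_values = clr(repaired)
```

Re-closing after substitution leaves the ratios between the observed (non-zero) parts unchanged, which is the property a log-ratio analysis relies on.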

Relevance:

100.00%

Publisher:

Abstract:

Many questions in evolutionary biology require an estimate of divergence times but, for groups with a sparse fossil record, such estimates rely heavily on molecular dating methods. The accuracy of these methods depends on both an adequate underlying model and the appropriate implementation of fossil evidence as calibration points. We explore the effect of these in Poaceae (grasses), a diverse plant lineage with a very limited fossil record, focusing particularly on dating the early divergences in the group. We show that molecular dating based on a data set of plastid markers is strongly dependent on the model assumptions. In particular, an acceleration of evolutionary rates at the base of Poaceae followed by a deceleration in the descendants strongly biases methods that assume an autocorrelation of rates. This problem can be circumvented by using markers that have lower rate variation, and we show that phylogenetic markers extracted from complete nuclear genomes can be a useful complement to the more commonly used plastid markers. However, estimates of divergence times remain strongly affected by different implementations of fossil calibration points. Analyses calibrated with only macrofossils lead to estimates for the age of core Poaceae ∼51-55 Ma, but the inclusion of microfossil evidence pushes this age to 74-82 Ma and leads to lower estimated evolutionary rates in grasses. These results emphasize the importance of considering markers from multiple genomes and alternative fossil placements when addressing evolutionary issues that depend on ages estimated for important groups.

Relevance:

100.00%

Publisher:

Abstract:

The final year project came to us as an opportunity to get involved in a topic that proved attractive while majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics. Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the boom in information technologies in recent decades and the consequent exponential increase in the amount of data collected and stored every day. Data analysts able to deal with Big Data and to extract useful results from it are in high demand and, in our understanding, the work they do, although sometimes ethically controversial, is a clear source of added value for both private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument suited to the analysis of large data sets and directly related to computer science: Partial Correlation Networks. The structure of the project follows our objectives. First, the characteristics of the instrument are explained, from the basic ideas up to the features of the underlying model, with the final goal of presenting the SPACE model as a tool for estimating interconnections between elements in large data sets. Afterwards, an illustrated simulation is performed to show the power and efficiency of the model. Finally, the model is put into practice by analyzing a relatively large set of real-world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series.
In short, our main goals are to present the model and to evaluate whether Partial Correlation Network Analysis is an effective, useful instrument that yields valuable results from Big Data. The findings of this project suggest that the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) works well under the assumption of sparsity of data. Moreover, partial correlation networks are shown to be a valid tool for representing cross-sectional interconnections between elements in large data sets. The scope of this project is, however, limited, as there are some sections in which deeper analysis would have been appropriate. Considering intertemporal connections between elements, the choice of the tuning parameter lambda, or a deeper analysis of the results in the real-data application are examples of aspects in which this project could be extended. To sum up, the analyzed statistical tool has proved to be a very useful instrument for finding relationships that connect the elements of a large data set. Partial correlation networks allow the owner of such a data set to observe and analyze linkages that might otherwise have been missed.
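The quantity behind a partial correlation network can be illustrated with a minimal sketch for three variables. It uses the textbook first-order partial correlation formula, not the SPACE estimator of Peng et al. (2009), which relies on joint sparse regression. The function name and the correlation values are illustrative assumptions.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y after removing the linear effect of Z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical case 1: X and Y are correlated only through Z,
# so the partial correlation vanishes and no edge X-Y would be drawn.
explained = partial_correlation(r_xy=0.25, r_xz=0.5, r_yz=0.5)

# Hypothetical case 2: most of the X-Y correlation survives conditioning on Z,
# so an edge X-Y would remain in the network.
direct = partial_correlation(r_xy=0.6, r_xz=0.2, r_yz=0.2)
```

In a network with many variables, each pair is conditioned on all remaining variables rather than on a single one, which is where a sparse estimation approach becomes necessary.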

Relevance:

100.00%

Publisher:

Abstract:

Aldosterone and vasopressin are responsible for the final adjustment of sodium and water reabsorption in the kidney. In principal cells of the kidney cortical collecting duct (CCD), the integral response to aldosterone and the long-term functional effects of vasopressin depend on transcription. In this study, we analyzed the transcriptome of a highly differentiated mouse clonal CCD principal cell line (mpkCCD(cl4)) and the changes in the transcriptome induced by aldosterone and vasopressin. Serial analysis of gene expression (SAGE) was performed on untreated cells and on cells treated with either aldosterone or vasopressin for 4 h. The transcriptomes in these three experimental conditions were determined by sequencing 169,721 transcript tags from the corresponding SAGE libraries. Limiting the analysis to tags that occurred twice or more in the data set, 14,654 different transcripts were identified, 3,642 of which do not match known mouse sequences. Statistical comparison (at P < 0.05 level) of the three SAGE libraries revealed 34 AITs (aldosterone-induced transcripts), 29 ARTs (aldosterone-repressed transcripts), 48 VITs (vasopressin-induced transcripts) and 11 VRTs (vasopressin-repressed transcripts). A selection of the differentially-expressed, hormone-specific transcripts (5 VITs, 2 AITs and 1 ART) has been validated in the mpkCCD(cl4) cell line either by Northern blot hybridization or reverse transcription-PCR. The hepatocyte nuclear transcription factor HNF-3-alpha (VIT39), the receptor activity modifying protein RAMP3 (VIT48), and the glucocorticoid-induced leucine zipper protein (GILZ) (AIT28) are candidate proteins playing a role in physiological responses of this cell line to vasopressin and aldosterone.

Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: The use of valproic acid in the first trimester of pregnancy is associated with an increased risk of spina bifida, but data on the risks of other congenital malformations are limited. METHODS: We first combined data from eight published cohort studies (1565 pregnancies in which the women were exposed to valproic acid, among which 118 major malformations were observed) and identified 14 malformations that were significantly more common among the offspring of women who had received valproic acid during the first trimester. We then assessed the associations between use of valproic acid during the first trimester and these 14 malformations by performing a case-control study with the use of the European Surveillance of Congenital Anomalies (EUROCAT) antiepileptic-study database, which is derived from population-based congenital-anomaly registries. Registrations (i.e., pregnancy outcomes with malformations included in EUROCAT) with any of these 14 malformations were compared with two control groups, one consisting of infants with malformations not previously linked to valproic acid use (control group 1), and one consisting of infants with chromosomal abnormalities (control group 2). The data set included 98,075 live births, stillbirths, or terminations with malformations among 3.8 million births in 14 European countries from 1995 through 2005. RESULTS: Exposure to valproic acid monotherapy was recorded for a total of 180 registrations, with 122 registrations in the case group, 45 in control group 1, and 13 in control group 2. 
As compared with no use of an antiepileptic drug during the first trimester (control group 1), use of valproic acid monotherapy was associated with significantly increased risks for 6 of the 14 malformations under consideration; the adjusted odds ratios were as follows: spina bifida, 12.7 (95% confidence interval [CI], 7.7 to 20.7); atrial septal defect, 2.5 (95% CI, 1.4 to 4.4); cleft palate, 5.2 (95% CI, 2.8 to 9.9); hypospadias, 4.8 (95% CI, 2.9 to 8.1); polydactyly, 2.2 (95% CI, 1.0 to 4.5); and craniosynostosis, 6.8 (95% CI, 1.8 to 18.8). Results for exposure to valproic acid were similar to results for exposure to other antiepileptic drugs. CONCLUSIONS: The use of valproic acid monotherapy in the first trimester was associated with significantly increased risks of several congenital malformations, as compared with no use of antiepileptic drugs or with use of other antiepileptic drugs.
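The odds ratios and confidence intervals reported above follow a pattern that can be sketched for the unadjusted case. The sketch assumes a plain 2×2 table and a Wald interval on the log scale. The study itself reports adjusted odds ratios, so this is not its method, and the counts below are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 -> 95%)."""
    estimate = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts.
    standard_error = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
estimate, lower, upper = odds_ratio_with_ci(
    exposed_cases=10, unexposed_cases=20,
    exposed_controls=5, unexposed_controls=40)
```

An interval whose lower bound stays above 1 corresponds to what the abstract calls a significantly increased risk.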

Relevance:

100.00%

Publisher:

Abstract:

Down syndrome (DS) is characterized by extensive phenotypic variability, with most traits occurring in only a fraction of affected individuals. Substantial gene-expression variation is present among unaffected individuals, and this variation has a strong genetic component. Since DS is caused by genomic-dosage imbalance, we hypothesize that gene-expression variation of human chromosome 21 (HSA21) genes in individuals with DS has an impact on the phenotypic variability among affected individuals. We studied gene-expression variation in 14 lymphoblastoid and 17 fibroblast cell lines from individuals with DS and an equal number of controls. Gene expression was assayed using quantitative real-time polymerase chain reaction on 100 and 106 HSA21 genes and 23 and 26 non-HSA21 genes in lymphoblastoid and fibroblast cell lines, respectively. Surprisingly, only 39% and 62% of HSA21 genes in lymphoblastoid and fibroblast cells, respectively, showed a statistically significant difference between DS and normal samples, although the average up-regulation of HSA21 genes was close to the expected 1.5-fold in both cell types. Gene-expression variation in DS and normal samples was evaluated using the Kolmogorov-Smirnov test. According to the degree of overlap in expression levels, we classified all genes into 3 groups: (A) nonoverlapping, (B) partially overlapping, and (C) extensively overlapping expression distributions between normal and DS samples. We hypothesize that, in each cell type, group A genes are the most dosage sensitive and are most likely involved in the constant DS traits, group B genes might be involved in variable DS traits, and group C genes are not dosage sensitive and are least likely to participate in DS pathological phenotypes. This study provides the first extensive data set on HSA21 gene-expression variation in DS and underscores its role in modulating the outcome of gene-dosage imbalance.
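The Kolmogorov-Smirnov comparison mentioned above can be sketched as the largest gap between two empirical distribution functions. This is a minimal illustration of the two-sample statistic only (no p-value), and the expression values are hypothetical, not data from the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The supremum is attained at one of the observed values.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical normalized expression levels for one gene.
normal = [1.0, 1.1, 0.9, 1.2]
separated = [1.5, 1.6, 1.7, 1.8]   # no overlap with the normal samples
shifted = [1.1, 1.2, 1.3, 1.4]     # partial overlap with the normal samples

d_separated = ks_statistic(normal, separated)
d_shifted = ks_statistic(normal, shifted)
```

A statistic of 1 means the two samples do not overlap at all, while smaller values indicate increasing overlap between the two distributions.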

Relevance:

100.00%

Publisher:

Abstract:

The most adequate approach for benchmarking web accessibility is manual expert evaluation supplemented by automatic analysis tools. But manual evaluation has a high cost and is impractical to apply to large web sites. In reality, there is no choice but to rely on automated tools when reviewing large web sites for accessibility. The question is: to what extent can the results from automatic evaluation of a web site and individual web pages be used as an approximation of manual results? This paper presents the initial results of an investigation aimed at answering this question. We have performed both manual and automatic evaluations of the accessibility of web pages of two sites and compared the results. In our data set, automatically retrieved results could most definitely be used as an approximation of manual evaluation results.

Relevance:

100.00%

Publisher:

Abstract:

The objective of this work was to evaluate the growth of the mangrove oyster Crassostrea gasar cultured in marine and estuarine environments. Oysters were cultured for 11 months in a longline system at two study sites, São Francisco do Sul and Florianópolis, in the state of Santa Catarina, Southern Brazil. Water chlorophyll-α concentration, temperature, and salinity were measured weekly. The oysters were measured monthly (shell size and weight gain) to assess growth. At the end of the culture period, the average wet flesh weight, dry flesh weight, and shell weight were determined, as well as the distribution of oysters per size class. Six nonlinear models (logistic, exponential, Gompertz, Brody, Richards, and Von Bertalanffy) were fitted to the oyster growth data set. Final mean shell sizes were higher in São Francisco do Sul than in Florianópolis. In addition, oysters cultured in São Francisco do Sul were more uniformly distributed in the four size classes than those cultured in Florianópolis. The highest average values of wet flesh weight and shell weight were observed in São Francisco do Sul, whereas dry flesh weight did not differ between the sites. The estuarine environment is more promising for the cultivation of oysters.
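Three of the six growth curves named above can be written out as a minimal sketch. The parameterizations shown are common textbook forms and may differ from those used in the study. The parameter names (`l_inf`, `k`, `t0`, `t_i`) and the numeric values are illustrative assumptions.

```python
import math


def von_bertalanffy(t, l_inf, k, t0):
    """Size at age t; growth slows steadily as size approaches l_inf."""
    return l_inf * (1 - math.exp(-k * (t - t0)))


def gompertz(t, l_inf, k, t_i):
    """Asymmetric sigmoid with inflection at t_i, where size is l_inf / e."""
    return l_inf * math.exp(-math.exp(-k * (t - t_i)))


def logistic(t, l_inf, k, t_i):
    """Symmetric sigmoid with inflection at t_i, where size is l_inf / 2."""
    return l_inf / (1 + math.exp(-k * (t - t_i)))


# Hypothetical parameters: asymptotic shell size 80 mm, rate 0.3 per month.
months = range(0, 12)
vb_curve = [von_bertalanffy(t, l_inf=80.0, k=0.3, t0=0.0) for t in months]
gompertz_curve = [gompertz(t, l_inf=80.0, k=0.3, t_i=5.0) for t in months]
logistic_curve = [logistic(t, l_inf=80.0, k=0.3, t_i=5.0) for t in months]
```

Fitting such curves to monthly measurements would normally use a nonlinear least-squares routine, and the fitted models would then be compared on a goodness-of-fit criterion.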

Relevance:

100.00%

Publisher:

Abstract:

To newly identify loci for age at natural menopause, we carried out a meta-analysis of 22 genome-wide association studies (GWAS) in 38,968 women of European descent, with replication in up to 14,435 women. In addition to four known loci, we identified 13 loci newly associated with age at natural menopause (at P < 5 × 10^-8). Candidate genes located at these newly associated loci include genes implicated in DNA repair (EXO1, HELQ, UIMC1, FAM175A, FANCI, TLK1, POLG and PRIM1) and immune function (IL11, NLRP11 and PRRC2A (also known as BAT2)). Gene-set enrichment pathway analyses using the full GWAS data set identified exoDNase, NF-κB signaling and mitochondrial dysfunction as biological processes related to timing of menopause.

Relevance:

100.00%

Publisher:

Abstract:

One of the most important issues in molecular biology is to understand the regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins called transcription factors, which bind to short (5 to 20 base pairs), degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors are laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcription factor binding site analysis in addressing quantitative modelling, where probabilistic models are built to represent the binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database (HTPSELEX) was created, holding high-quality binding sequences for two eukaryotic families of transcription factors, namely CTF/NF1 and LEF1/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, which allows the generation of a large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiment we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation (EM) algorithm is one of the most frequently used methods to estimate probabilistic models that represent the sequence specificity of transcription factors.
We present computer simulations to estimate the precision of EM-estimated models as a function of data set parameters (such as the length of the initial sequences, the number of initial sequences, and the percentage of nonbinding sequences). We observed a remarkable robustness of the EM algorithm with regard to the length of the training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, in which a statistical framework called a hidden Markov model was developed to represent the sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experiment data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities were in good correlation.
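The idea of a probabilistic model of binding specificity can be sketched in its simplest form, a position weight matrix built from already aligned sites. This is not the EM procedure or the hidden Markov model described in the thesis, and it assumes independent positions, which the covariance analysis above suggests is only an approximation. The sites, the pseudocount, and the uniform background are hypothetical.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for position in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[position]] += 1
        total = sum(counts.values())
        # Log2 ratio of the observed base frequency to the background frequency.
        pwm.append({base: math.log2((count / total) / background)
                    for base, count in counts.items()})
    return pwm


def score(pwm, sequence):
    """Sum of per-position log-odds scores for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, sequence))


# Hypothetical aligned binding sites, not taken from the HTPSELEX database.
sites = ["TTGGC", "TTGGC", "TTGGA", "CTGGC"]
pwm = build_pwm(sites)

consensus_score = score(pwm, "TTGGC")
unrelated_score = score(pwm, "AAAAA")
```

An EM-based approach addresses the harder problem in which the sites are not aligned in advance and some input sequences contain no site at all.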

Relevance:

100.00%

Publisher:

Abstract:

Molecular monitoring of BCR/ABL transcripts by real-time quantitative reverse transcription PCR (qRT-PCR) is an essential technique for clinical management of patients with BCR/ABL-positive CML and ALL. Though quantitative BCR/ABL assays are performed in hundreds of laboratories worldwide, results among these laboratories cannot be reliably compared due to heterogeneity in test methods, data analysis, and reporting, and the lack of quantitative standards. Recent efforts towards standardization have been limited in scope. Aliquots of RNA were sent to clinical test centers worldwide in order to evaluate methods and reporting for e1a2, b2a2, and b3a2 transcript levels using their own qRT-PCR assays. Total RNA was isolated from tissue culture cells that expressed each of the different BCR/ABL transcripts. Serial log dilutions were prepared, ranging from 10^0 to 10^-5, in RNA isolated from HL60 cells. Laboratories performed 5 independent qRT-PCR reactions for each sample type at each dilution. In addition, 15 qRT-PCR reactions of the 10^-3 b3a2 RNA dilution were run to assess reproducibility within and between laboratories. Participants were asked to run the samples following their standard protocols and to report cycle threshold (Ct), quantitative values for BCR/ABL and housekeeping genes, and ratios of BCR/ABL to housekeeping genes for each sample RNA. Thirty-seven (n=37) participants have submitted qRT-PCR results for analysis (36, 37, and 34 labs generated data for b2a2, b3a2, and e1a2, respectively). The limit of detection for this study was defined as the lowest dilution at which a Ct value could be detected for all 5 replicates. For b2a2, 15, 16, 4, and 1 lab(s) showed a limit of detection at the 10^-5, 10^-4, 10^-3, and 10^-2 dilutions, respectively. For b3a2, 20, 13, and 4 labs showed a limit of detection at the 10^-5, 10^-4, and 10^-3 dilutions, respectively. For e1a2, 10, 21, 2, and 1 lab(s) showed a limit of detection at the 10^-5, 10^-4, 10^-3, and 10^-2 dilutions, respectively.
Log %BCR/ABL ratio values provided a method for comparing results between the different laboratories for each BCR/ABL dilution series. Linear regression analysis revealed concordance among the majority of participant data over the 10^-1 to 10^-4 dilutions. The overall slope values showed comparable results among the majority of b2a2 (mean=0.939; median=0.9627; range (0.399 - 1.1872)), b3a2 (mean=0.925; median=0.922; range (0.625 - 1.140)), and e1a2 (mean=0.897; median=0.909; range (0.5174 - 1.138)) laboratory results (Fig. 1-3). Thirty-four (n=34) out of the 37 laboratories reported Ct values for all 15 replicates, and only those with a complete data set were included in the inter-lab calculations. Eleven laboratories either did not report their copy number data or used other reporting units such as nanograms or cell numbers; therefore, only 26 laboratories were included in the overall analysis of copy numbers. The median copy number was 348.4, with a range from 15.6 to 547,000 copies (approximately a 4.5 log difference); the median intra-lab %CV was 19.2%, with a range from 4.2% to 82.6%. While our international performance evaluation using serially diluted RNA samples has reinforced the fact that heterogeneity exists among clinical laboratories, it has also demonstrated that performance within a laboratory is overall very consistent. Accordingly, the availability of defined BCR/ABL RNAs may facilitate the validation of all phases of quantitative BCR/ABL analysis and may be extremely useful as a tool for monitoring assay performance. Ongoing analyses of these materials, along with the development of additional control materials, may solidify consensus around their application in routine laboratory testing and possible integration in worldwide efforts to standardize quantitative BCR/ABL testing.
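Two of the summary figures above can be reproduced with a minimal sketch: the log10 span of the reported copy-number range, and a percent coefficient of variation for a set of replicates. The sketch assumes the sample standard deviation for %CV, which the abstract does not specify. The replicate values are hypothetical, and only the range endpoints (15.6 and 547,000) come from the text.

```python
import math
import statistics


def log10_span(lowest, highest):
    """Width of a range expressed in orders of magnitude."""
    return math.log10(highest / lowest)


def percent_cv(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Endpoints of the copy-number range reported in the abstract.
copy_number_span = log10_span(15.6, 547000)

# Hypothetical copy numbers from replicate reactions in a single laboratory.
replicates = [310.0, 355.0, 342.0, 368.0, 329.0]
intra_lab_cv = percent_cv(replicates)
```

The span comes out at roughly 4.5 orders of magnitude, matching the "approximately a 4.5 log difference" stated above.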

Relevance:

100.00%

Publisher:

Abstract:

Recent evidence suggests the human auditory system is organized, like the visual system, into a ventral 'what' pathway, devoted to identifying objects, and a dorsal 'where' pathway, devoted to the localization of objects in space [1]. Several brain regions have been identified in these two different pathways, but until now little is known about the temporal dynamics of these regions. We investigated this issue using 128-channel auditory evoked potentials (AEPs). Stimuli were stationary sounds created by varying interaural time differences and environmental real recorded sounds. Stimuli of each condition (localization, recognition) were presented through earphones in a blocked design, while subjects determined their position or meaning, respectively. AEPs were analyzed in terms of their topographical scalp potential distributions (segmentation maps) and underlying neuronal generators (source estimation) [2]. Fourteen scalp potential distributions (maps) best explained the entire data set. Ten maps were nonspecific (associated with auditory stimulation in general), two were specific for sound localization and two were specific for sound recognition (P-values ranging from 0.02 to 0.045). Condition-specific maps appeared at two distinct time periods: ~200 ms and ~375-550 ms post-stimulus. The brain sources associated with the maps specific for sound localization were mainly situated in the inferior frontal cortices, confirming previous findings [3]. The sources associated with sound recognition were predominantly located in the temporal cortices, with a weaker activation in the frontal cortex. The data show that sound localization and sound recognition engage different brain networks that are apparent at two distinct time periods. References: 1. Maeder et al. Neuroimage 2001. 2. Michel et al. Brain Research Review 2001. 3. Ducommun et al. Neuroimage 2002.

Relevance:

100.00%

Publisher:

Abstract:

The objective of this work was to select semivariogram models to estimate the population density of fig fly (Zaprionus indianus; Diptera: Drosophilidae) throughout the year, using ordinary kriging. Nineteen monitoring sites were demarcated in an area of 8,200 m², cropped with six fruit tree species: persimmon, citrus, fig, guava, apple, and peach. During a 24-month period, 106 weekly evaluations were carried out at these sites. The average number of adult fig flies captured weekly per trap, during each month, was subjected to the circular, spherical, pentaspherical, exponential, Gaussian, rational quadratic, hole effect, K-Bessel, J-Bessel, and stable semivariogram models, using ordinary kriging interpolation. The models with the best fit were selected by cross-validation. Each data set (month) has a particular spatial dependence structure, which makes it necessary to define specific semivariogram models in order to improve the fit to the experimental semivariogram. Therefore, it was not possible to determine a standard semivariogram model; instead, six theoretical models were selected: circular, Gaussian, hole effect, K-Bessel, J-Bessel, and stable.
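Three of the theoretical semivariogram models listed above can be written out as a minimal sketch. It assumes the common "practical range" convention (factor 3) for the exponential and Gaussian models, which may differ from the software used in the study. The parameter names and values are illustrative assumptions.

```python
import math


def spherical(h, nugget, partial_sill, range_):
    """Spherical model: reaches the sill exactly at the range."""
    if h == 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    ratio = h / range_
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


def exponential(h, nugget, partial_sill, range_):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1 - math.exp(-3 * h / range_))


def gaussian(h, nugget, partial_sill, range_):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1 - math.exp(-3 * (h / range_) ** 2))


# Hypothetical parameters: nugget 0.1, partial sill 0.9, range 50 m.
params = dict(nugget=0.1, partial_sill=0.9, range_=50.0)
lags = [0, 10, 25, 50, 75]
spherical_values = [spherical(h, **params) for h in lags]
exponential_values = [exponential(h, **params) for h in lags]
gaussian_values = [gaussian(h, **params) for h in lags]
```

In cross-validation, each candidate model is used to krige every site from the remaining sites, and the model with the smallest prediction error is retained for that month.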

Relevance:

100.00%

Publisher:

Abstract:

Due to the large number of characteristics, there is a need to extract the most relevant characteristics from the input data, so that the amount of information lost is minimal and the classification performed on the projected data set remains relevant with respect to the original data. To achieve this feature extraction, different statistical techniques, such as principal component analysis (PCA), may be used. This thesis describes an extension of PCA that allows the extraction of a finite number of relevant features from high-dimensional fuzzy and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The two proposed methods were compared using postoperative patient data. Experimental results demonstrate that the two proposed methods can be applied to complex data. Fuzzy PCA was used in the classification problem. Classification was performed with the similarity classifier algorithm, in which the weights of the total similarity measure are optimized with a differential evolution algorithm. This thesis presents a comparison of the classification results obtained from the fuzzy PCA data.