982 results for Dataset


Relevance: 10.00%

Publisher:

Abstract:

How can we correlate neural activity in the human brain, as it responds to typed words, with properties of those words (such as ‘edible’ or ‘fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem.

Can we accelerate any CMTF solver so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm by up to 200x, along with an up to 65-fold increase in sparsity, with accuracy comparable to the baseline.

We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy.
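As a rough illustration of the coupling described above, the sketch below evaluates a squared-error CMTF objective in which a (nouns, voxels, subjects) tensor and a (nouns, properties) matrix share the noun factor matrix. It is not Turbo-SMT itself and not any particular solver: the function name `cmtf_loss`, the factor names `A`, `B`, `C`, `D`, the rank, and the synthetic sizes are all assumptions made for the example.

```python
import random

def cmtf_loss(X, Y, A, B, C, D):
    """Squared-error CMTF objective (illustrative, not the paper's code).

    X is a nested list indexed [noun][voxel][subject]; Y is [noun][property].
    A, B, C, D are factor matrices stored as lists of rank-length rows.
    """
    rank = len(A[0])
    loss = 0.0
    for i, a in enumerate(A):
        # Tensor term: X[i][j][k] is approximated by sum_r A[i][r] * B[j][r] * C[k][r].
        for j, b in enumerate(B):
            for k, c in enumerate(C):
                x_hat = sum(a[r] * b[r] * c[r] for r in range(rank))
                loss += (X[i][j][k] - x_hat) ** 2
        # Matrix term reuses the same noun factors: Y[i][p] ~ sum_r A[i][r] * D[p][r].
        for p, d in enumerate(D):
            y_hat = sum(a[r] * d[r] for r in range(rank))
            loss += (Y[i][p] - y_hat) ** 2
    return loss

def random_factor(rows, rank, rng):
    """Hypothetical helper: a rows-by-rank matrix of Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(rows)]

rng = random.Random(0)
rank = 2
# Assumed toy sizes: 6 nouns, 5 voxels, 4 subjects, 3 properties.
A, B, C, D = (random_factor(n, rank, rng) for n in (6, 5, 4, 3))

# Build synthetic data that the factors reproduce exactly.
X = [[[sum(a[r] * b[r] * c[r] for r in range(rank)) for c in C] for b in B] for a in A]
Y = [[sum(a[r] * d[r] for r in range(rank)) for d in D] for a in A]

exact_loss = cmtf_loss(X, Y, A, B, C, D)
A_shifted = [[value + 0.1 for value in row] for row in A]
shifted_loss = cmtf_loss(X, Y, A_shifted, B, C, D)
print(exact_loss, shifted_loss)
```

Because `A` appears in both terms, perturbing the noun factors worsens the fit to the tensor and to the matrix at once, which is what "coupling along the nouns dimension" means in practice.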




Relevance: 10.00%

Publisher:

Abstract:

Retrospective clinical datasets are often characterized by a relatively small sample size and a large amount of missing data. In this case, a common way of handling the missingness consists in discarding patients with missing covariates from the analysis, further reducing the sample size. Alternatively, if the mechanism that generated the missing data allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing downstream methods to work with complete data. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied on the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
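To make the idea of model-based imputation concrete, here is a deliberately simplified sketch. It assumes a fixed two-node network (`stage` -> `marker`), estimates the conditional table from complete rows only, and fills each missing value with the most probable value given its parent. This is not the structural EM procedure described in the abstract; the variable names and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

def fit_conditional_table(rows, parent, child):
    """Count child values for each parent value, using complete rows only."""
    table = defaultdict(Counter)
    for row in rows:
        if row[parent] is not None and row[child] is not None:
            table[row[parent]][row[child]] += 1
    return table

def impute_child(rows, parent, child, table):
    """Fill missing child values with the most frequent value given the parent."""
    completed = []
    for row in rows:
        filled = dict(row)
        if filled[child] is None and filled[parent] in table:
            filled[child] = table[filled[parent]].most_common(1)[0][0]
        completed.append(filled)
    return completed

# Hypothetical toy records; None marks a missing covariate.
patients = [
    {"stage": "early", "marker": "low"},
    {"stage": "early", "marker": "low"},
    {"stage": "early", "marker": "high"},
    {"stage": "late", "marker": "high"},
    {"stage": "late", "marker": "high"},
    {"stage": "late", "marker": None},
    {"stage": "early", "marker": None},
]

table = fit_conditional_table(patients, "stage", "marker")
completed = impute_child(patients, "stage", "marker", table)
for row in completed:
    print(row)
```

All seven records survive, whereas complete-case deletion would keep only five. A completed table of this kind is what a survival tree could then be fitted on.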

Relevance: 10.00%

Publisher:

Abstract:

The melting of high-latitude permafrost peatlands is a major concern due to a potential positive feedback on global climate change. We examine the ecology of testate amoebae in permafrost peatlands, based on sites in Sweden (~200 km north of the Arctic Circle). Multivariate statistical analysis confirms that water-table depth and moisture content are the dominant controls on the distribution of testate amoebae, corroborating the results from studies in mid-latitude peatlands. We present a new testate amoeba-based water table transfer function and thoroughly test it for the effects of spatial autocorrelation, clustered sampling design and uneven sampling gradients. We find that the transfer function has good predictive power; the best-performing model is based on tolerance-downweighted weighted averaging with inverse deshrinking (performance statistics with leave-one-out cross validation: R² = 0.87, RMSEP = 5.25 cm). The new transfer function was applied to a short core from Stordalen mire, and reveals a major shift in peatland ecohydrology coincident with the onset of the Little Ice Age (c. AD 1400). We also applied the model to an independent contemporary dataset from Stordalen and find that it outperforms predictions based on other published transfer functions. The new transfer function will enable palaeohydrological reconstruction from permafrost peatlands in Northern Europe, thereby permitting greatly improved understanding of the long-term ecohydrological dynamics of these important carbon stores as well as their responses to recent climate change.
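The sketch below shows how a weighted-averaging transfer function with inverse deshrinking can be scored by leave-one-out RMSEP. It implements plain weighted averaging, not the tolerance-downweighted variant reported above, and the abundance table and water-table depths are invented toy values rather than data from the study.

```python
import math

def wa_optima(abundances, water_table):
    """Abundance-weighted mean water-table depth (optimum) for each taxon."""
    optima = []
    for k in range(len(abundances[0])):
        total = sum(sample[k] for sample in abundances)
        weighted = sum(sample[k] * x for sample, x in zip(abundances, water_table))
        optima.append(weighted / total if total > 0 else None)
    return optima

def wa_estimate(sample, optima):
    """Abundance-weighted mean of the taxon optima for one sample."""
    pairs = [(y, u) for y, u in zip(sample, optima) if u is not None and y > 0]
    return sum(y * u for y, u in pairs) / sum(y for y, _ in pairs)

def fit_wa_inverse(abundances, water_table):
    """Fit optima, then regress observed depths on the initial WA estimates."""
    optima = wa_optima(abundances, water_table)
    initial = [wa_estimate(sample, optima) for sample in abundances]
    mean_init = sum(initial) / len(initial)
    mean_obs = sum(water_table) / len(water_table)
    sxx = sum((v - mean_init) ** 2 for v in initial)
    sxy = sum((v - mean_init) * (x - mean_obs) for v, x in zip(initial, water_table))
    slope = sxy / sxx
    intercept = mean_obs - slope * mean_init
    return optima, intercept, slope

def predict(sample, model):
    """Deshrunk prediction: intercept + slope * initial WA estimate."""
    optima, intercept, slope = model
    return intercept + slope * wa_estimate(sample, optima)

def loo_rmsep(abundances, water_table):
    """Root mean squared error of prediction under leave-one-out cross-validation."""
    squared_errors = []
    for i in range(len(abundances)):
        train_y = abundances[:i] + abundances[i + 1:]
        train_x = water_table[:i] + water_table[i + 1:]
        model = fit_wa_inverse(train_y, train_x)
        squared_errors.append((predict(abundances[i], model) - water_table[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical toy data: 6 samples x 3 taxa (relative abundances), depths in cm.
water_table = [2.0, 8.0, 15.0, 22.0, 30.0, 38.0]
abundances = [
    [0.8, 0.2, 0.0],
    [0.6, 0.4, 0.0],
    [0.3, 0.6, 0.1],
    [0.1, 0.6, 0.3],
    [0.0, 0.4, 0.6],
    [0.0, 0.2, 0.8],
]

full_model = fit_wa_inverse(abundances, water_table)
rmsep = loo_rmsep(abundances, water_table)
print(round(rmsep, 2), "cm")
```

Each held-out sample is predicted by a model that never saw it, which is why RMSEP is a more honest error estimate than the apparent fit on the training set.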

Relevance: 10.00%

Publisher:

Abstract:

This randomised controlled trial evaluated the impact of the Lifestart parenting initiative, a five-year home visiting programme, on parent and child outcomes. 424 parents and children aged less than 12 months were recruited from across Ireland and randomly assigned to either the intervention or control group. The intervention group received the programme for five years; the control group did not, but continued as normal. Both groups were tested at three time points: pre-test, mid-point (child aged 3 years) and post-test (child aged 5 years). Post-test data collection is still ongoing and will be completed by November 2014. Indicative findings (using available data) are presented here; however, the analysis of the full dataset will be presented at the April 2015 meeting.

Relevance: 10.00%

Publisher:

Abstract:

Mollusks are the most morphologically disparate living animal phylum; they have diversified into all habitats and have a deep fossil record. Monophyly and identity of their eight living classes are undisputed, but relationships between these groups and patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts but often without regard to proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a dataset of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared to a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 different unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Within the available molecular topologies, only four studies to date have included the deep-sea class Monoplacophora; but 36.4% of all trees are not significantly different. We also present supertrees derived from 2 data partitions and 3 methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora.
Other taxon pairs are rarely or never found in close proximity, such as the vermiform Caudofoveata and Bivalvia. Our results have specific utility for guiding constructive research planning in order to better test relationships in Mollusca as well as other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, while unequivocally distant taxa will make the most constructive choices for exemplar selection in higher-level phylogenomic analyses.
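For a sense of scale behind a null model on eight terminals, the sketch below counts the distinct fully bifurcating topologies on n labelled tips using the standard double-factorial formulas. The abstract does not say whether the random trees were rooted or strictly bifurcating, so both counts are shown and the function names are illustrative only.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def count_rooted_binary_trees(n_tips):
    """Rooted, fully bifurcating topologies on n labelled tips: (2n - 3)!!."""
    return double_factorial(2 * n_tips - 3) if n_tips >= 2 else 1

def count_unrooted_binary_trees(n_tips):
    """Unrooted, fully bifurcating topologies on n labelled tips: (2n - 5)!!."""
    return double_factorial(2 * n_tips - 5) if n_tips >= 3 else 1

rooted = count_rooted_binary_trees(8)
unrooted = count_unrooted_binary_trees(8)
print(rooted, unrooted)  # 135135 10395
```

Against a space of this size, a set of 42 published trees samples only a small fraction of the possible arrangements, which is why a comparison with random trees is informative.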

Relevance: 10.00%

Publisher:

Abstract:

Ellerman Bombs (EBs) are thought to arise as a result of photospheric magnetic reconnection. We use data from the Swedish 1-m Solar Telescope (SST) to study EB events on the solar disk and at the limb. Both datasets show that EBs are connected to the foot-points of forming chromospheric jets. The limb observations show that a bright structure in the H$\alpha$ blue wing connects to the EB, initially fuelling it and leading to the ejection of material upwards. The material moves along a loop structure where a newly formed jet is subsequently observed in the red wing of H$\alpha$. In the disk dataset, an EB initiates a jet which propagates away from the apparent reconnection site within the EB flame. The EB then splits into two, with associated brightenings in the inter-granular lanes (IGLs). Micro-jets are then observed, extending to 500 km with a lifetime of a few minutes. Observed velocities of the micro-jets are approximately 5-10 km s$^{-1}$, while their chromospheric counterparts range from 50-80 km s$^{-1}$. MURaM simulations of quiet Sun reconnection show that micro-jets with properties similar to those observed follow the line of reconnection in the photosphere, with associated H$\alpha$ brightening at the location of increased temperature.

Relevance: 10.00%

Publisher:

Abstract:

Clade V nematodes comprise several parasitic species that include the cyathostomins, primary helminth pathogens of horses. Next generation transcriptome datasets are available for eight parasitic clade V nematodes, although no equine parasites are included in this group. Here, we report next generation transcriptome sequencing analysis for the common cyathostomin species, Cylicostephanus goldi. A cDNA library was generated from RNA extracted from 17 C. goldi male and female adult parasites. Following sequencing using a 454 GS FLX pyrosequencer, a total of 475,215 sequencing reads were generated, which were assembled into 26,910 contigs. Using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases, 27% of the transcriptome was annotated. Further in-depth analysis was carried out by comparing the C. goldi dataset with the next generation transcriptomes and genomes of other clade V nematodes, with the Oesophagostomum dentatum transcriptome and the Haemonchus contortus genome showing the highest levels of sequence identity with the cyathostomin dataset (45%). The C. goldi transcriptome was mined for genes associated with anthelmintic mode of action and/or resistance. Sequences encoding proteins previously associated with the three major anthelmintic classes used in horses were identified, with the exception of the P-glycoprotein group. Targeted resequencing of the glutamate gated chloride channel α4 subunit (glc-3), one of the primary targets of the macrocyclic lactone anthelmintics, was performed for several cyathostomin species. We believe this study reports the first transcriptome dataset for an equine helminth parasite, providing the opportunity for in-depth analysis of these important parasites at the molecular level. Sequences encoding enzymes involved in key processes and genes associated with levamisole/pyrantel and macrocyclic lactone resistance, in particular the glutamate gated chloride channels, were identified. 
These novel data will inform cyathostomin biology and anthelmintic resistance studies in the future.