861 results for multiple data sources
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. Positive- and negative-mode acquisition results were combined during data mining to simplify the process and interrogate a larger lipidome in a single analysis. To reduce the complexity of the IMS data sets, a reduced data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
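The subsampling-plus-PCA check described above can be sketched as follows. Everything here (array sizes, random data standing in for spectra, the two-component criterion) is an illustrative assumption, not the study's actual data or pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for IMS pixel spectra from one histologically
# defined region of interest (rows: spectra, columns: m/z bins).
n_spectra, n_mz = 5000, 300
roi_spectra = rng.normal(size=(n_spectra, n_mz))

# 10-fold data reduction by random selection of spectra, as in the workflow.
subset_idx = rng.choice(n_spectra, size=n_spectra // 10, replace=False)
subset = roi_spectra[subset_idx]

def pca_explained_variance(X, k=2):
    """Fraction of variance captured by the first k principal components
    (PCA via SVD of the mean-centred matrix)."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    var = s ** 2
    return var[:k].sum() / var.sum()

full_ev = pca_explained_variance(roi_spectra)
sub_ev = pca_explained_variance(subset)   # compare against full_ev
```

Comparing the explained-variance profiles of the full and reduced sets is one simple way to confirm that random subsampling preserves the region's molecular selectivity.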
Abstract:
Given the very large amount of data obtained every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, relevant data are often dispersed across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining logistic regression with an Expectation-Maximization algorithm. Results show that despite the scarcity of data, this procedure can perform better than standard matching procedures.
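A minimal sketch of this kind of procedure, assuming a binary variable z observed only in the small donor file and hand-rolled Newton updates for the weighted logistic fits. All names, data and model choices below are hypothetical, not the article's actual method:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, z, w, iters=25):
    """Weighted logistic regression (no intercept) via Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (w * (z - p))
        H = (X * (w * p * (1 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        b += np.linalg.solve(H, grad)
    return b

# Hypothetical fusion setting: z is observed only in the small donor file.
beta_true = np.array([1.5, -1.0])
X_large = rng.normal(size=(2000, 2))   # large file, z missing
X_small = rng.normal(size=(150, 2))    # small file, z observed
z_small = (rng.random(150) < 1.0 / (1.0 + np.exp(-X_small @ beta_true))).astype(float)

b = fit_logistic(X_small, z_small, np.ones(150))   # initial fit on the small file
for _ in range(10):                                # EM iterations
    p = 1.0 / (1.0 + np.exp(-X_large @ b))         # E-step: expected z in large file
    # M-step: refit on observed rows plus fractionally weighted pseudo-rows.
    X_all = np.vstack([X_small, X_large, X_large])
    z_all = np.concatenate([z_small, np.ones(2000), np.zeros(2000)])
    w_all = np.concatenate([np.ones(150), p, 1.0 - p])
    b = fit_logistic(X_all, z_all, w_all)

z_imputed = (1.0 / (1.0 + np.exp(-X_large @ b)) > 0.5).astype(int)
```

The fractional-weight trick (each incomplete row contributes once with label 1 and weight p, once with label 0 and weight 1-p) is a standard way to run the M-step of an EM scheme with a logistic model.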
Abstract:
BACKGROUND Multiple sclerosis (MS) is a neurodegenerative, autoimmune disease of the central nervous system. Genome-wide association studies (GWAS) have identified over a hundred polymorphisms with modest individual effects on MS susceptibility, and they have confirmed the main individual effect of the Major Histocompatibility Complex. Additional risk loci containing immunologically relevant genes were found to be significantly overrepresented. Nonetheless, it is accepted that most of the genetic architecture underlying susceptibility to the disease remains to be defined. Candidate association studies of the leukocyte immunoglobulin-like receptor LILRA3 gene in MS have been repeatedly reported, with inconsistent results. OBJECTIVES In an attempt to shed some light on these controversial findings, a combined analysis was performed including the previously published datasets and three newly genotyped cohorts. Both wild-type and deleted LILRA3 alleles were discriminated in a single-tube PCR amplification and the resulting products were visualized by their different electrophoretic mobilities. RESULTS AND CONCLUSION Overall, this meta-analysis involved 3200 MS patients and 3069 matched healthy controls, and it did not show a significant association of the LILRA3 deletion [carriers of the LILRA3 deletion: p = 0.25, OR (95% CI) = 1.07 (0.95-1.19)], even after stratification by gender and by the HLA-DRB1*15:01 risk allele.
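For readers unfamiliar with the reported statistics, an odds ratio and its Wald confidence interval can be computed from a 2x2 carrier table as below. The counts here are made up for illustration; they are not the meta-analysis data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a/b = carriers/non-carriers among cases, c/d among controls.
    Counts passed below are hypothetical."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(120, 80, 110, 90)
```

When the interval spans 1.0, as in the abstract's result, the deletion shows no significant association with case status.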
Abstract:
The sparsely spaced highly permeable fractures of the granitic rock aquifer at Stang-er-Brune (Brittany, France) form a well-connected fracture network of high permeability but unknown geometry. Previous work based on optical and acoustic logging together with single-hole and cross-hole flowmeter data acquired in 3 neighbouring boreholes (70-100 m deep) has identified the most important permeable fractures crossing the boreholes and their hydraulic connections. To constrain possible flow paths by estimating the geometries of known and previously unknown fractures, we have acquired, processed and interpreted multifold, single- and cross-hole GPR data using 100 and 250 MHz antennas. The GPR data processing scheme, consisting of time-zero corrections, scaling, bandpass filtering, F-X deconvolution, eigenvector filtering, muting, pre-stack Kirchhoff depth migration and stacking, was used to differentiate fluid-filled fracture reflections from source-generated noise. The final stacked and pre-stack depth-migrated GPR sections provide high-resolution images of individual fractures (dipping 30-90°) in the surroundings (2-20 m for the 100 MHz antennas; 2-12 m for the 250 MHz antennas) of each borehole in a 2D plane projection that are of superior quality to those obtained from single-offset sections. Most fractures previously identified from hydraulic testing can be correlated to reflections in the single-hole data. Several previously unknown major near-vertical fractures have also been identified away from the boreholes.
Abstract:
When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix; the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix.
In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
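As a small sketch of the two codings contrasted above, the snippet below fuzzy-codes a continuous variable into three categories using triangular membership functions hinged at the minimum, median and maximum. This is one common choice assumed here for illustration; the paper does not necessarily use these exact functions:

```python
import numpy as np

def fuzzy_code(x):
    """Fuzzy-code a continuous variable into three categories (low, medium,
    high) using triangular membership functions hinged at the minimum,
    median and maximum. Assumes min(x) < median(x) < max(x)."""
    lo, mid, hi = np.min(x), np.median(x), np.max(x)
    low = np.clip((mid - x) / (mid - lo), 0.0, 1.0)
    high = np.clip((x - mid) / (hi - mid), 0.0, 1.0)
    return np.column_stack([low, 1.0 - low - high, high])

def crisp_code(x):
    """Crisp (indicator) coding into the same three categories."""
    F = fuzzy_code(x)
    Z = np.zeros_like(F)
    Z[np.arange(len(x)), F.argmax(axis=1)] = 1.0
    return Z

x = np.array([2.0, 5.0, 6.0, 9.0, 7.0])
F = fuzzy_code(x)   # degrees of membership; each row sums to 1
Z = crisp_code(x)   # 0/1 dummies, as in multiple correspondence analysis
```

Defuzzification then amounts to inverting the membership functions: a row of memberships maps back to a single estimated value of the original variable, which is what makes the percentage-of-variance measure of fit possible in the fuzzy case.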
Abstract:
Yosemite Valley poses a significant rockfall hazard and related risk due to its glacially steepened walls and the approximately 4 million visitors it receives annually. To assess rockfall hazard, it is necessary to evaluate the geologic structure that contributes to the destabilization of rockfall sources and locate the most probable future source areas. Coupling new remote sensing techniques (Terrestrial Laser Scanning, Aerial Laser Scanning) and traditional field surveys, we investigated the regional geologic and structural setting, the orientation of the primary discontinuity sets for large areas of Yosemite Valley, and the specific discontinuity sets present at active rockfall sources. This information, combined with a better understanding of the geologic processes that contribute to the progressive destabilization and triggering of granitic rock slabs, contributes to a more accurate rockfall susceptibility assessment for Yosemite Valley and elsewhere.
Abstract:
Interspecific competition, life history traits, environmental heterogeneity and spatial structure, as well as disturbance, are known to impact the successful dispersal strategies in metacommunities. However, studies on the direction of the impact of those factors on dispersal have yielded contradictory results and have often considered only a few competing dispersal strategies at a time. We used a unifying modeling approach to contrast the combined effects of species traits (adult survival, specialization), environmental heterogeneity and structure (spatial autocorrelation, habitat availability) and disturbance on the selected, maintained and coexisting dispersal strategies in heterogeneous metacommunities. Using a negative exponential dispersal kernel, we allowed for variation of both species dispersal distance and dispersal rate. We showed that strong disturbance promotes species with high dispersal abilities, while low local adult survival and habitat availability select against them. Spatial autocorrelation favors species with higher dispersal ability when adult survival and disturbance rate are low, and selects against them in the opposite situation. Interestingly, several dispersal strategies coexist when disturbance and adult survival act in opposition, as for example when a strong disturbance regime favors species with high dispersal abilities while low adult survival selects for species with low dispersal. Our results unify apparently contradictory previous results and demonstrate that spatial structure, disturbance and adult survival determine the success and diversity of coexisting dispersal strategies in competing metacommunities.
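The negative exponential dispersal kernel mentioned above can be sampled directly, since it is the exponential distribution over distance. The mean distances below are hypothetical trait values, not parameters from the study:

```python
import numpy as np

rng = np.random.default_rng(5)

def dispersal_distances(mean_distance, n):
    """Draw dispersal distances from a negative exponential kernel
    k(d) proportional to exp(-d / mean_distance); mean_distance is the
    species' dispersal-distance trait (values here are hypothetical)."""
    return rng.exponential(scale=mean_distance, size=n)

# Two competing strategies: short- vs long-distance dispersers.
short_d = dispersal_distances(mean_distance=1.0, n=10000)
long_d = dispersal_distances(mean_distance=5.0, n=10000)
```

Varying `mean_distance` (and, separately, the fraction of offspring that disperse at all) is how a model of this kind lets both dispersal distance and dispersal rate evolve.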
Abstract:
OBJECT: To study a scan protocol for coronary magnetic resonance angiography based on multiple breath-holds featuring 1D motion compensation and to compare the resulting image quality to a navigator-gated free-breathing acquisition. Image reconstruction was performed using L1 regularized iterative SENSE. MATERIALS AND METHODS: The effects of respiratory motion on the Cartesian sampling scheme were minimized by performing data acquisition in multiple breath-holds. During the scan, repetitive readouts through a k-space center were used to detect and correct the respiratory displacement of the heart by exploiting the self-navigation principle in image reconstruction. In vivo experiments were performed in nine healthy volunteers and the resulting image quality was compared to a navigator-gated reference in terms of vessel length and sharpness. RESULTS: Acquisition in breath-hold is an effective method to reduce the scan time by more than 30 % compared to the navigator-gated reference. Although an equivalent mean image quality with respect to the reference was achieved with the proposed method, the 1D motion compensation did not work equally well in all cases. CONCLUSION: In general, the image quality scaled with the robustness of the motion compensation. Nevertheless, the featured setup provides a positive basis for future extension with more advanced motion compensation methods.
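L1-regularised iterative reconstruction of the general kind mentioned above can be sketched with ISTA (iterative soft-thresholding) on a toy undersampled problem. This is a generic stand-in under a plain least-squares data term, not the SENSE model or the authors' reconstruction:

```python
import numpy as np

rng = np.random.default_rng(6)

def ista(A, y, lam, n_iter=200):
    """Minimise ||Ax - y||^2 / 2 + lam * ||x||_1 by iterative
    soft-thresholding (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return x

# Toy undersampled problem: recover a sparse signal from few measurements.
A = rng.normal(size=(50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[[3, 17, 42, 71, 90]] = [2.0, -1.5, 1.0, 2.5, -2.0]
y = A @ x_true
x_hat = ista(A, y, lam=0.01)
```

In an MRI setting, `A` would be the undersampled Fourier/coil-sensitivity operator and the L1 penalty would act in a sparsifying transform domain; the iteration structure is the same.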
Abstract:
It is common in econometric applications that several hypothesis tests are carried out at the same time. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. In this paper, we suggest a stepwise multiple testing procedure which asymptotically controls the familywise error rate at a desired level. Compared to related single-step methods, our procedure is more powerful in the sense that it often will reject more false hypotheses. In addition, we advocate the use of studentization when it is feasible. Unlike some stepwise methods, our method implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect alternative hypotheses. We prove our method asymptotically controls the familywise error rate under minimal assumptions. We present our methodology in the context of comparing several strategies to a common benchmark and deciding which strategies actually beat the benchmark. However, our ideas can easily be extended and/or modified to other contexts, such as making inference for the individual regression coefficients in a multiple regression framework. Some simulation studies show the improvements of our methods over previous proposals. We also provide an application to a set of real data.
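A simplified sketch in the spirit of such a step-down procedure: at each stage, the critical value comes from the resampled distribution of the maximum statistic over the hypotheses not yet rejected, which is what captures their joint dependence. The statistics below are synthetic, not an econometric application or the paper's exact method:

```python
import numpy as np

rng = np.random.default_rng(2)

def stepdown_max(T, T_null, alpha=0.05):
    """Step-down multiple testing using the resampled distribution of the
    maximum statistic over the not-yet-rejected hypotheses.
    T: observed statistics, shape (S,); T_null: resampled statistics, (B, S)."""
    S = len(T)
    reject = np.zeros(S, dtype=bool)
    active = list(range(S))
    while active:
        # critical value from the max over hypotheses still in play
        c = np.quantile(T_null[:, active].max(axis=1), 1 - alpha)
        newly = [j for j in active if T[j] > c]
        if not newly:
            break
        reject[newly] = True
        active = [j for j in active if not reject[j]]
    return reject

# Toy example: five test statistics, two with genuine signal.
T = np.array([4.0, 3.5, 0.5, 0.2, 0.1])
T_null = rng.normal(size=(2000, 5))     # stand-in for bootstrap statistics
rejected = stepdown_max(T, T_null)
```

After each round of rejections the critical value is recomputed over a smaller set, which is why a step-down procedure can reject hypotheses a single-step method with the same initial critical value would miss.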
Abstract:
Diverse sources of GABAergic inhibition are a major feature of cortical networks, but distinct inhibitory input systems have not been systematically characterized in the thalamus. Here, we contrasted the properties of two independent GABAergic pathways in the posterior thalamic nucleus of rat, one input from the reticular thalamic nucleus (nRT), and one "extrareticular" input from the anterior pretectal nucleus (APT). The vast majority of nRT-thalamic terminals formed single synapses per postsynaptic target and innervated thin distal dendrites of relay cells. In contrast, single APT-thalamic terminals formed synaptic contacts exclusively via multiple, closely spaced synapses on thick relay cell dendrites. Quantal analysis demonstrated that the two inputs displayed comparable quantal amplitudes, release probabilities, and multiple release sites. The morphological and physiological data together indicated multiple, single-site contacts for nRT and multisite contacts for APT axons. The contrasting synaptic arrangements of the two pathways were paralleled by different short-term plasticities. The multisite APT-thalamic pathway showed larger charge transfer during 50-100 Hz stimulation compared with the nRT pathway and a greater persistent inhibition accruing during stimulation trains. Our results demonstrate that the two inhibitory systems are morpho-functionally distinct and suggest that multisite GABAergic terminals are tailored for maintained synaptic inhibition even at high presynaptic firing rates. These data explain the efficacy of extrareticular inhibition in timing relay cell activity in sensory and motor thalamic nuclei. Finally, based on the classic nomenclature and the difference between reticular and extrareticular terminals, we define a novel, multisite GABAergic terminal type (F3) in the thalamus.
Abstract:
ABSTRACT In recent years, geotechnologies such as remote and proximal sensing, together with attributes derived from digital terrain elevation models, have proved very useful for describing soil variability. However, these information sources are rarely used together. Therefore, a methodology for assessing and spatializing soil classes using information obtained from remote/proximal sensing, GIS and expert knowledge was applied and evaluated. Two study areas in the State of São Paulo, Brazil, totaling approximately 28,000 ha, were used for this work. First, in one area (area 1), conventional pedological mapping was carried out, and from the soil classes found, patterns were obtained from the following information: a) spectral information (shapes of features and absorption intensity of spectral curves over the 350-2,500 nm wavelength range) of soil samples collected at specific points in the area (according to each soil type); b) equations for estimating chemical and physical soil properties, derived from the relationship between laboratory results obtained by conventional methods and the spectral data; c) supervised classification of Landsat TM 5 images, in order to detect changes in soil particle size (soil texture); d) the relationship between soil classes and terrain attributes. Subsequently, the patterns obtained were applied in area 2 to derive a pedological classification of its soils, this time within a GIS (ArcGIS). Finally, a conventional pedological map of area 2 was produced and compared with the digital map, i.e., the one obtained using only the predetermined patterns. The proposed methodology achieved 79 % accuracy at the first categorical level of the Soil Classification System, 60 % accuracy at the second categorical level, and became less useful at categorical level 3 (37 % accuracy).
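The per-level accuracies reported above amount to the agreement between the digital and conventional maps at each categorical level. A minimal sketch of that comparison; the class labels are hypothetical, not the study's legend:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Map-vs-reference agreement (overall accuracy): the fraction of
    locations where the digital map matches the conventional map."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return (y_true == y_pred).mean()

# Hypothetical soil-class labels at five validation locations.
conventional = ["LV", "PV", "LV", "NV", "LV"]
digital      = ["LV", "PV", "NV", "NV", "LV"]
acc = overall_accuracy(conventional, digital)
```

Repeating this with labels truncated to the first, second and third categorical levels reproduces the kind of level-by-level comparison the abstract reports.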
Abstract:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and exploratory data analysis tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and of optimal parameter determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
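One simple way to combine per-scale kernels, standing in for the full MKL optimisation, is to weight them by centred kernel-target alignment and fit a kernel model with the combined kernel. Everything below (random stand-in features, the alignment heuristic, kernel ridge regression instead of SVR) is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf_kernel(X, gamma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def centred(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

# Hypothetical stand-ins for terrain features at three spatial scales.
X = rng.normal(size=(120, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=120)

# One RBF kernel per scale; weights from centred kernel-target alignment.
kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]
yc = y - y.mean()
yy = np.outer(yc, yc)
align = np.array([(centred(K) * yy).sum()
                  / (np.linalg.norm(centred(K)) * np.linalg.norm(yy))
                  for K in kernels])
w = np.clip(align, 0.0, None)
w /= w.sum()
K = sum(wi * Ki for wi, Ki in zip(w, kernels))

# Kernel ridge regression with the combined kernel (stand-in for SVR).
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(y)), y)
y_hat = K @ alpha
```

The learned weights `w` play the interpretive role the abstract describes: a large weight on one scale's kernel indicates that features at that scale carry most of the predictive information.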
Abstract:
Background: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies will require memory proportional to the square of the number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease. Results: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) 2352 processors at 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data.
Conclusions: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn’s disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
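The memory-saving idea (keeping only a per-permutation running maximum while streaming over tests, instead of the full permutations-by-tests matrix) can be sketched for single-step maxT as follows. The statistic generator is a toy stand-in, not MB-MDR, and this sketch omits the step-down refinement:

```python
import numpy as np

rng = np.random.default_rng(4)

def maxT_adjusted_pvalues(stat_fn, n_tests, B):
    """Single-step maxT with memory independent of the number of tests:
    only the running per-permutation maximum is kept while streaming
    over tests, so memory is O(B) rather than O(B * n_tests)."""
    max_null = np.full(B, -np.inf)
    observed = np.empty(n_tests)
    for j in range(n_tests):                  # one test at a time
        t_obs, t_perm = stat_fn(j, B)         # observed + B permuted statistics
        observed[j] = t_obs
        np.maximum(max_null, t_perm, out=max_null)
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (B + 1)   # adjusted p-values

# Toy statistic generator standing in for a SNP-SNP interaction test;
# only test 0 carries a genuine signal.
def toy_stat(j, B):
    effect = 4.0 if j == 0 else 0.0
    return effect + abs(rng.normal()), np.abs(rng.normal(size=B))

obs, p_adj = maxT_adjusted_pvalues(toy_stat, n_tests=50, B=999)
```

Because each test's permuted statistics are consumed immediately and only the maxima survive, the memory footprint stays fixed no matter how many SNP pairs are screened, which is the property the abstract emphasises.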