8 results for Large datasets
in BORIS: Bern Open Repository and Information System - Bern - Switzerland
Abstract:
Well-known data mining algorithms rely on inputs in the form of pairwise similarities between objects. For large datasets it is computationally impossible to perform all pairwise comparisons. We therefore propose a novel approach that uses approximate Principal Component Analysis to efficiently identify groups of similar objects. The effectiveness of the approach is demonstrated in the context of binary classification using the supervised normalized cut as a classifier. For large datasets from the UCI repository, the approach significantly improves run times with minimal loss in accuracy.
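The idea of grouping similar objects without comparing every pair can be illustrated with a minimal sketch. It is not the authors' method: it assumes that power iteration stands in for the approximate Principal Component Analysis and that simple fixed-width cells along the leading component stand in for the grouping step. The function names and the `cell_width` parameter are hypothetical.

```python
import random
from collections import defaultdict


def leading_component_scores(rows, iters=100, seed=0):
    """Project rows onto an approximate first principal direction.

    The direction is estimated by power iteration, so the covariance
    matrix is never formed explicitly.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        # w = X^T (X v): one multiplication by the (unnormalised) covariance
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def group_by_projection(rows, cell_width):
    """Put objects whose projections fall in the same cell into one group."""
    groups = defaultdict(list)
    for idx, score in enumerate(leading_component_scores(rows)):
        groups[int(score // cell_width)].append(idx)
    return list(groups.values())


rows = [[0.0, 0.0], [0.1, 0.1], [10.0, 10.0], [10.1, 9.9]]
groups = group_by_projection(rows, cell_width=5.0)
```

Pairwise similarities would then only need to be computed within each group (or between neighbouring cells), rather than across the whole dataset.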
Abstract:
This paper analyses local geographical contexts targeted by transnational large-scale land acquisitions (>200 ha per deal) in order to understand how emerging patterns of socio-ecological characteristics can be related to processes of large-scale foreign investment in land. Using a sample of 139 land deals georeferenced with high spatial accuracy, we first analyse their target contexts in terms of land cover, population density, accessibility, and indicators for agricultural potential. Three distinct patterns emerge from the analysis: densely populated and easily accessible croplands (35% of land deals); remote forestlands with lower population densities (34% of land deals); and moderately populated and moderately accessible shrub- or grasslands (26% of land deals). These patterns are consistent with processes described in the relevant case study literature, and they each involve distinct types of stakeholders and associated competition over land. We then repeat the often-cited analysis that postulates a link between land investments and target countries with abundant so-called “idle” or “marginal” lands as measured by yield gap and available suitable but uncultivated land; our methods differ from the earlier approach, however, in that we examine local context (10-km radius) rather than countries as a whole. The results show that earlier findings are disputable in terms of concepts, methods, and contents. Further, we reflect on methodologies for exploring linkages between socioecological patterns and land investment processes. Improving and enhancing large datasets of georeferenced land deals is an important next step; at the same time, careful choice of the spatial scale of analysis is crucial for ensuring compatibility between the spatial accuracy of land deal locations and the resolution of available geospatial data layers. 
Finally, we argue that new approaches and methods must be developed to empirically link socio-ecological patterns in target contexts to key determinants of land investment processes. This would help to improve the validity and the reach of our findings as an input for evidence-informed policy debates.
Abstract:
HIV virulence, i.e. the time of progression to AIDS, varies greatly among patients. As for other rapidly evolving pathogens of humans, it is difficult to know if this variance is controlled by the genotype of the host or that of the virus because the transmission chain is usually unknown. We apply the phylogenetic comparative approach (PCA) to estimate the heritability of a trait from one infection to the next, which indicates the control of the virus genotype over this trait. The idea is to use viral RNA sequences obtained from patients infected by HIV-1 subtype B to build a phylogeny, which approximately reflects the transmission chain. Heritability is measured statistically as the propensity for patients close in the phylogeny to exhibit similar infection trait values. The approach reveals that up to half of the variance in set-point viral load, a trait associated with virulence, can be heritable. Our estimate is significant and robust to noise in the phylogeny. We also check for the consistency of our approach by showing that a trait related to drug resistance is almost entirely heritable. Finally, we show the importance of taking into account the transmission chain when estimating correlations between infection traits. The fact that HIV virulence is, at least partially, heritable from one infection to the next has clinical and epidemiological implications. The difference between earlier studies and ours comes from the quality of our dataset and from the power of the PCA, which can be applied to large datasets and accounts for within-host evolution. The PCA opens new perspectives for approaches linking clinical data and evolutionary biology because it can be extended to study other traits or other infectious diseases.
Abstract:
Organisms provide some of the most sensitive indicators of climate change and evolutionary responses are becoming apparent in species with short generation times. Large datasets on genetic polymorphism that can provide an historical benchmark against which to test for recent evolutionary responses are very rare, but an exception is found in the brown-lipped banded snail (Cepaea nemoralis). This species is sensitive to its thermal environment and exhibits several polymorphisms of shell colour and banding pattern affecting shell albedo in the majority of populations within its native range in Europe. We tested for evolutionary changes in shell albedo that might have been driven by the warming of the climate in Europe over the last half century by compiling an historical dataset for 6,515 native populations of C. nemoralis and comparing this with new data on nearly 3,000 populations. The new data were sampled mainly in 2009 through the Evolution MegaLab, a citizen science project that engaged thousands of volunteers in 15 countries throughout Europe in the biggest such exercise ever undertaken. A known geographic cline in the frequency of the colour phenotype with the highest albedo (yellow) was shown to have persisted and a difference in colour frequency between woodland and more open habitats was confirmed, but there was no general increase in the frequency of yellow shells. This may have been because snails adapted to a warming climate through behavioural thermoregulation. By contrast, we detected an unexpected decrease in the frequency of Unbanded shells and an increase in the Mid-banded morph. Neither of these evolutionary changes appears to be a direct response to climate change, indicating the influence of other selective agents, possibly related to changing predation pressure and habitat change with effects on micro-climate.
Abstract:
Near-infrared spectroscopy (NIRS) enables the non-invasive measurement of changes in hemodynamics and oxygenation in tissue. Changes in light-coupling due to movement of the subject can cause movement artifacts (MAs) in the recorded signals. Several methods have been developed so far that facilitate the detection and reduction of MAs in the data. However, because they rely on fixed parameter values (e.g., a global threshold), none of these methods is well suited to long-term (i.e., hours-long) recordings, or they were not time-effective when applied to large datasets. We aimed to overcome these limitations by automation, i.e., data adaptive thresholding specifically designed for long-term measurements, and by introducing a stable long-term signal reconstruction. Our new technique (“acceleration-based movement artifact reduction algorithm”, AMARA) is based on combining two methods: the “movement artifact reduction algorithm” (MARA, Scholkmann et al. Phys. Meas. 2010, 31, 649–662), and the “accelerometer-based motion artifact removal” (ABAMAR, Virtanen et al. J. Biomed. Opt. 2011, 16, 087005). We describe AMARA in detail and report the successful validation of the algorithm using empirical NIRS data, measured over the prefrontal cortex in adolescents during sleep. In addition, we compared the performance of AMARA to that of MARA and ABAMAR based on validation data.
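To illustrate what a data-adaptive threshold means in contrast to a fixed global one, here is a minimal sketch. It is not AMARA, MARA, or ABAMAR: it assumes a moving standard deviation as the artifact indicator and derives the threshold from the recording itself as median plus `k` times the median absolute deviation. The function names, the window length, and `k` are all hypothetical choices.

```python
import math
from statistics import median, pstdev


def moving_std(signal, window):
    """Standard deviation in a centred window (shorter at the edges)."""
    half = window // 2
    return [
        pstdev(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


def flag_artifacts(signal, window=5, k=5.0):
    """Flag samples whose local variability is unusually high.

    The threshold adapts to the data: median + k * MAD of the moving
    standard deviation, instead of a constant chosen in advance.
    """
    msd = moving_std(signal, window)
    centre = median(msd)
    mad = median(abs(x - centre) for x in msd)
    threshold = centre + k * mad
    return [x > threshold for x in msd]


# Synthetic example: a small oscillation with one abrupt jump at sample 20.
signal = [0.1 * math.sin(0.7 * i) for i in range(41)]
signal[20] += 5.0
flags = flag_artifacts(signal)
```

Because the threshold is computed from the signal, the same call can be reused on recordings with different baseline variability. Note that a perfectly flat signal gives a MAD of zero, so a practical version would need a lower bound on the threshold.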
Abstract:
This paper examines how the geospatial accuracy of samples and sample size influence conclusions from geospatial analyses. It does so using the example of a study investigating the global phenomenon of large-scale land acquisitions and the socio-ecological characteristics of the areas they target. First, we analysed land deal datasets of varying geospatial accuracy and varying sizes and compared the results in terms of land cover, population density, and two indicators for agricultural potential: yield gap and availability of uncultivated land that is suitable for rainfed agriculture. We found that an increase in geospatial accuracy led to a substantial and greater change in conclusions about the land cover types targeted than an increase in sample size, suggesting that using a sample of higher geospatial accuracy does more to improve results than using a larger sample. The same finding emerged for population density, yield gap, and the availability of uncultivated land suitable for rainfed agriculture. Furthermore, the statistical median proved to be more consistent than the mean when comparing the descriptive statistics for datasets of different geospatial accuracy. Second, we analysed effects of geospatial accuracy on estimations regarding the potential for advancing agricultural development in target contexts. Our results show that the target contexts of the majority of land deals in our sample whose geolocation is known with a high level of accuracy contain smaller amounts of suitable, but uncultivated land than regional- and national-scale averages suggest. Consequently, the more target contexts vary within a country, the more detailed the spatial scale of analysis has to be in order to draw meaningful conclusions about the phenomena under investigation. We therefore advise against using national-scale statistics to approximate or characterize phenomena that have a local-scale impact, particularly if key indicators vary widely within a country.
Abstract:
Deep tissue imaging has become state of the art in biology, but now the problem is to quantify spatial information in a global, organ-wide context. Although access to the raw data is no longer a limitation, the computational tools to extract biologically useful information out of these large data sets are still catching up. In many cases, to understand the mechanism behind a biological process, where molecules or cells interact with each other, it is mandatory to know their mutual positions. We illustrate this principle here with the immune system. Although the general functions of lymph nodes as immune sentinels are well described, many cellular and molecular details governing the interactions of lymphocytes and dendritic cells remain unclear to date and prevent an in-depth mechanistic understanding of the immune system. We imaged ex vivo lymph nodes isolated from both wild-type and transgenic mice lacking key factors for dendritic cell positioning and used software written in MATLAB to determine the spatial distances between the dendritic cells and the internal high endothelial vascular network. This allowed us to quantify the spatial localization of the dendritic cells in the lymph node, which is a critical parameter determining the effectiveness of an adaptive immune response.
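The distance measurement described above can be sketched in a few lines. This is not the authors' MATLAB software: it assumes that cells are given as 3D centroid coordinates and that the vascular network has been reduced to a set of sampled 3D points, and it uses a brute-force search. The function name and the coordinates are hypothetical.

```python
import math


def nearest_vessel_distances(cells, vessel_points):
    """For each cell centroid, return the distance to the closest vessel point."""
    return [
        min(math.dist(cell, point) for point in vessel_points)
        for cell in cells
    ]


# Hypothetical coordinates in arbitrary units.
cells = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
vessel_points = [(0.0, 0.0, 1.0), (3.0, 0.0, 0.0)]
distances = nearest_vessel_distances(cells, vessel_points)
```

The brute-force search costs one distance evaluation per cell and vessel point; for organ-wide datasets a spatial index such as a k-d tree would be the usual replacement.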
Abstract:
Transcriptomics could contribute significantly to the early and specific diagnosis of rejection episodes by defining 'molecular Banff' signatures. Recently, the description of pathogenesis-based transcript sets offered a new opportunity for objective and quantitative diagnosis. Generating high-quality transcript panels is thus critical to define a high-performance diagnostic classifier. In this study, a comparative analysis was performed across four different microarray datasets of heterogeneous sample collections from two published clinical datasets and two of our own datasets including biopsies for clinical indication, and samples from nonhuman primates. We characterized a common transcriptional profile of 70 genes, defined as acute rejection transcript set (ARTS). ARTS expression is significantly up-regulated in all AR samples as compared with stable allografts or healthy kidneys, and strongly correlates with the severity of Banff AR types. Similarly, ARTS were tested as a classifier in a large collection of 143 independent biopsies recently published by the University of Alberta. Results demonstrate that the 'in silico' approach applied in this study is able to identify a robust and reliable molecular signature for AR, supporting a specific and sensitive molecular diagnostic approach for renal transplant monitoring.