Digital Library

24 results for "data availability"

at Université de Lausanne, Switzerland

Modular analysis of gene expression data with R.

Relevance:

70.00%

Publisher:

Abstract:

SUMMARY: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm designed to decompose a large set of data into so-called 'modules'. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules, and the level of required coherence can be varied, resulting in different 'resolutions' of the modular mapping. In this short note, we introduce two BioConductor software packages written in GNU R: the isa2 package includes an optimized implementation of the ISA, and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and BioConductor users dealing with tabular (e.g. gene expression) data. AVAILABILITY: http://www.unil.ch/cbg/ISA CONTACT: sven.bergmann@unil.ch
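The iterative idea behind a signature algorithm can be sketched in a few lines of NumPy. This is a simplified illustration, not the isa2 implementation, whose details may differ. It assumes two normalized copies of the data (each gene standardized across conditions, each condition standardized across genes) and a z-score cut-off applied to each score vector; the function names, threshold defaults and the planted toy data are hypothetical. Starting from a seed set of genes, condition scores and gene scores are recomputed alternately until the gene vector stops changing, and the surviving genes and conditions form a module.

```python
import numpy as np


def zscore(x: np.ndarray, axis: int) -> np.ndarray:
    """Standardize x to mean 0 and unit variance along `axis`."""
    mu = x.mean(axis=axis, keepdims=True)
    sd = x.std(axis=axis, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant rows/columns at 0 instead of dividing by zero
    return (x - mu) / sd


def threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Keep entries more than t standard deviations away from the mean of v; zero the rest."""
    sd = v.std()
    if sd == 0:
        return np.zeros_like(v)
    z = (v - v.mean()) / sd
    return np.where(np.abs(z) > t, v, 0.0)


def isa(expr, seed, t_g=1.5, t_c=1.2, max_iter=100, tol=1e-9):
    """Refine a seed gene vector into a module (co-regulated genes plus conditions).

    expr: genes x conditions matrix; seed: initial gene weights (e.g. 0/1).
    Returns (gene_indices, condition_indices) of the fixed point reached.
    """
    e_g = zscore(expr, axis=1)  # each gene standardized across conditions
    e_c = zscore(expr, axis=0)  # each condition standardized across genes
    g = seed / np.linalg.norm(seed)
    c = np.zeros(expr.shape[1])
    for _ in range(max_iter):
        c = threshold(e_g.T @ g, t_c)    # condition scores from the current genes
        g_new = threshold(e_c @ c, t_g)  # gene scores from the current conditions
        norm = np.linalg.norm(g_new)
        if norm == 0:  # the seed collapsed: no module found
            return np.array([], dtype=int), np.array([], dtype=int)
        g_new = g_new / norm
        converged = np.linalg.norm(g_new - g) < tol
        g = g_new
        if converged:
            break
    return np.flatnonzero(g), np.flatnonzero(c)


# Toy usage: plant a module (genes 0-19 raised in conditions 0-7) in Gaussian noise
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 40))
expr[:20, :8] += 5.0
seed = np.zeros(200)
seed[:10] = 1.0  # start from half of the planted genes
genes, conds = isa(expr, seed)
print("genes:", genes, "conditions:", conds)
```

Lowering the thresholds admits genes and conditions with weaker coherence, which is one way to picture the different 'resolutions' of the modular mapping mentioned above.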

Regional-scale debris-flow risk assessment for an alpine valley

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we perform a societal and economic risk assessment for debris flows at the regional scale, for lower Valtellina, Northern Italy. We apply a simple empirical debris-flow model, FLOW-R, which couples a probabilistic flow routing algorithm with an energy line approach, providing the relative probability of transit and the maximum kinetic energy for each cell. By assessing the vulnerability of people and of other exposed elements (buildings, public facilities, crops, woods, communication lines), and their economic value, we calculated the expected annual losses both in terms of lives (societal risk) and goods (direct economic risk). For the societal risk assessment, we distinguish between day and night scenarios. The distribution of people at different moments of the day was considered, accounting for occupational and recreational activities, to provide a more realistic assessment of risk. Market studies were performed in order to assign a realistic economic value to goods, structures, and lifelines. A 20 m x 20 m cell was used as the terrain unit, in accordance with data availability and the spatial resolution required for a risk assessment at this scale. Societal risk for the whole area amounts to 1.98 and 4.22 deaths/year for the day and the night scenarios, respectively, with a maximum of 0.013 deaths/year/cell. Economic risk for goods amounts to 1,760,291 €/year, with a maximum of 13,814 €/year/cell.
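The expected annual loss can be illustrated with a generic risk product (annual event frequency × relative probability of transit × vulnerability × exposed value), summed over cells. This is a minimal sketch under that assumption, not the FLOW-R code or the paper's exact formulation; the `Cell` fields, the single annual frequency and the toy values are hypothetical. The same functions give a societal risk when `exposure` counts the people present in a cell (for example a day and a night occupancy) instead of money.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    p_transit: float      # relative probability that a debris flow crosses the cell (0-1)
    vulnerability: float  # expected loss fraction given impact (0-1)
    exposure: float       # exposed value in the cell: euros, or number of people present


def cell_risk(c: Cell, annual_frequency: float) -> float:
    """Expected annual loss in one cell: frequency x transit probability x vulnerability x exposure."""
    return annual_frequency * c.p_transit * c.vulnerability * c.exposure


def expected_annual_loss(cells: list[Cell], annual_frequency: float) -> float:
    """Total expected annual loss over the study area."""
    return sum(cell_risk(c, annual_frequency) for c in cells)


def max_cell_risk(cells: list[Cell], annual_frequency: float) -> float:
    """Largest single-cell contribution (the per-cell maximum)."""
    return max(cell_risk(c, annual_frequency) for c in cells)


# Hypothetical two-cell example with economic exposure in euros
cells = [Cell(0.5, 0.2, 1000.0), Cell(0.1, 1.0, 500.0)]
print(expected_annual_loss(cells, annual_frequency=0.1))
print(max_cell_risk(cells, annual_frequency=0.1))
```

For day and night societal scenarios, one would build two lists of cells that differ only in `exposure` and compare the two totals.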

microRNAs in colon cancer: a roadmap for discovery.

Relevance:

60.00%

Publisher:

Abstract:

Cancer omics data are being generated at an exponential rate and associated with clinical variables, and important findings can be extracted with bioinformatics approaches and then validated experimentally. Many of these findings concern a specific class of non-coding RNA molecules called microRNAs (miRNAs), which are post-transcriptional regulators of mRNA expression. The related research field is quite heterogeneous: bioinformaticians, clinicians, statisticians and biologists, as well as data miners and engineers, collaborate to curate stored data and to exploit new input coming from the latest Next Generation Sequencing technologies. Here we review the main findings from the first 10 years of miRNA research in colon cancer, with an emphasis on possible uses in clinical practice. This review intends to provide a road map through the jungle of publications on miRNA in colorectal cancer, focusing on data availability and on new ways to generate biologically relevant information from these huge amounts of data.

Essays on the measurement of school efficiency

Relevance:

60.00%

Publisher:

Abstract:

Measuring school efficiency is a challenging task. First, a performance measurement technique has to be selected. Within Data Envelopment Analysis (DEA), one such technique, alternative models have been developed to deal with environmental variables, and most of these models lead to diverging results. Second, the choice of input and output variables to be included in the efficiency analysis is often dictated by data availability, and it remains an issue even when data are available. As a result, the choice of technique, model and variables is probably, and ultimately, a political judgement. Multi-criteria decision analysis methods can help decision makers select the most suitable model. The number of selection criteria should remain parsimonious and should not be oriented towards the results of the models, in order to avoid opportunistic behaviour. The selection criteria should also be backed by the literature or by an expert group. Once the most suitable model is identified, the principle of permanence of methods should be applied to avoid changes of practice over time. Within DEA, the two-stage model developed by Ray (1991) is the most convincing of the models that allow for an environmental adjustment. In this model, an efficiency analysis is conducted with DEA, followed by an econometric analysis to explain the efficiency scores. An environmental variable of particular interest tested in this thesis is whether a school operates on multiple sites. Results show that being located on more than one site has a negative influence on efficiency. One likely way to offset this negative influence would be to improve the use of ICT in school management and teaching. Planning for new schools should also consider the advantages of a single site, which allows a critical size to be reached in terms of pupils and teachers.
That underprivileged pupils perform worse than privileged pupils has been public knowledge since Coleman et al. (1966). As a result, underprivileged pupils have a negative influence on school efficiency, which this thesis confirms for Switzerland for the first time. Several countries have developed priority education policies in order to compensate for the negative impact of disadvantaged socioeconomic status on school performance. These policies have failed, so other actions need to be taken. To define these actions, one has to identify the social-class differences that explain why disadvantaged children underperform. Childrearing and literacy practices, health characteristics, housing stability and economic security influence pupil achievement. Rather than allocating more resources to schools, policymakers should therefore focus on related social policies; for instance, they could define pre-school, family, health, housing and benefits policies to improve the conditions for disadvantaged children.
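A two-stage analysis of the kind attributed to Ray (1991) can be sketched as (1) an input-oriented, constant-returns DEA score for each school, solved as one linear program per school, and (2) a regression of those scores on an environmental variable such as a multi-site dummy. This is a hedged illustration, not the thesis's code: plain OLS is an assumption for the second stage, the function names and toy data are hypothetical, and the sketch needs SciPy's `linprog` with the HiGHS solver.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_efficiency(inputs: np.ndarray, outputs: np.ndarray) -> np.ndarray:
    """Input-oriented, constant-returns (CCR) DEA efficiency for every unit.

    inputs: units x input variables; outputs: units x output variables.
    For unit o, solve: min theta s.t. sum_j lam_j x_j <= theta x_o,
    sum_j lam_j y_j >= y_o, lam >= 0.
    """
    n = inputs.shape[0]
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                                 # minimize theta
        a_in = np.c_[-inputs[o][:, None], inputs.T]                 # sum lam x - theta x_o <= 0
        a_out = np.c_[np.zeros((outputs.shape[1], 1)), -outputs.T]  # -sum lam y <= -y_o
        a_ub = np.vstack([a_in, a_out])
        b_ub = np.r_[np.zeros(inputs.shape[1]), -outputs[o]]
        bounds = [(None, None)] + [(0, None)] * n                   # theta free, lam >= 0
        res = linprog(c, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        scores[o] = res.fun
    return scores


def second_stage_ols(scores: np.ndarray, env: np.ndarray) -> np.ndarray:
    """Regress efficiency scores on environmental variables; returns [intercept, slopes...]."""
    design = np.c_[np.ones(len(scores)), env]
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef


# Hypothetical schools: one input (e.g. teachers), one output (e.g. test results)
x = np.array([[2.0], [4.0], [4.0]])
y = np.array([[2.0], [2.0], [4.0]])
multi_site = np.array([0.0, 1.0, 0.0])  # hypothetical dummy: 1 = school on several sites
scores = dea_input_efficiency(x, y)
print(scores, second_stage_ols(scores, multi_site))
```

A negative coefficient on the dummy in the second stage is what "a negative influence on efficiency" would look like in this simplified setting.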

Density-based hierarchical clustering of pyro-sequences on a large scale--the case of fungal ITS1.

Relevance:

60.00%

Publisher:

Abstract:

MOTIVATION: Analysis of millions of pyro-sequences currently plays a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought-after qualities, but they have so far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first extracted from 454 reads using generalized profiles. Then otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.
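The abstract does not describe DBC454's algorithm, so the following is only a hedged illustration of the density-based idea: plain DBSCAN (flat, not hierarchical) over sequences with a pluggable distance. Hamming distance on equal-length toy reads is a stand-in assumption; real ITS1 sequences vary in length and would need an alignment- or k-mer-based distance. The function names, `eps` and `min_pts` are hypothetical.

```python
from collections import deque
from typing import Callable, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def dbscan(items: Sequence[str], dist: Callable[[str, str], float],
           eps: float, min_pts: int) -> list[int]:
    """Label each item with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(items)
    labels: list = [None] * n  # None = not yet visited

    def neighbours(i: int) -> list[int]:
        # all items (including i itself) within eps of item i
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:       # not dense enough to start a cluster
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = deque(nb)
        while queue:                # grow the cluster through dense points
            j = queue.popleft()
            if labels[j] == -1:     # border point previously marked as noise
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # j is a core point: expand from it
                queue.extend(nb_j)
        cluster += 1
    return labels


reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTCC",
         "TTTTGGGGCC", "TTTTGGGGCA", "TTTTGGGGAA", "GAGAGAGAGA"]
print(dbscan(reads, hamming, eps=2, min_pts=2))
```

Each cluster plays the role of a candidate Operational Taxonomic Unit; a hierarchical variant would repeat this over a range of density levels instead of a single `eps`.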

Assessment of debris-flow susceptibility at medium-scale in the Barcelonnette Basin, France

Relevance:

60.00%

Publisher:

Abstract:

Debris flows are among the most dangerous processes in mountainous areas due to their rapid movement and long runout zones. Their sudden and rather unexpected impacts not only damage buildings and infrastructure but also threaten human lives. Medium- to regional-scale susceptibility analyses allow the most endangered areas to be identified and indicate where further detailed studies should be carried out. Since data availability for larger regions is usually the key limiting factor, empirical models with low data requirements are suitable for first overviews. In this study, a susceptibility analysis was carried out for the Barcelonnette Basin, situated in the southern French Alps. Using a methodology based on empirical rules for source identification and the empirical angle of reach concept for the 2-D runout computation, a worst-case scenario was first modelled. In a second step, scenarios for high-, medium- and low-frequency events were developed. A comparison with the footprints of a few mapped events indicates reasonable results but suggests a strong dependence on the quality of the digital elevation model. This emphasises the need for careful interpretation of the results, keeping in mind the inherent assumptions of the model used and the quality of the input data.
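The angle of reach concept can be illustrated on a single 1-D profile: a straight line is drawn downhill from the top of the source at the angle of reach, and the flow is assumed to stop where this line falls below the terrain. This is a simplified sketch, not the study's 2-D implementation; the function name, the profile and the angles are hypothetical.

```python
import math


def runout_index(distances, elevations, reach_angle_deg):
    """Index of the farthest profile point reached by a flow starting at index 0.

    distances: horizontal distance (m) of each point from the source, increasing.
    elevations: terrain elevation (m) at each point; elevations[0] is the source top.
    The flow is assumed to travel while a straight line dropping from the source
    at the angle of reach stays at or above the terrain.
    """
    z0 = elevations[0]
    tan_phi = math.tan(math.radians(reach_angle_deg))
    for i, (d, z) in enumerate(zip(distances, elevations)):
        line = z0 - tan_phi * d
        if line < z:
            return max(i - 1, 0)  # last point still reached
    return len(distances) - 1     # reached the end of the profile


# Hypothetical profile: a 50 m drop per 100 m, then a flat valley floor
distances = [0, 100, 200, 300, 400, 500]
elevations = [300, 250, 200, 150, 150, 150]
for angle in (20.0, 25.0):
    i = runout_index(distances, elevations, angle)
    print(f"angle {angle} deg -> runout {distances[i]} m")
```

A lower angle of reach gives a longer runout, so a worst-case scenario corresponds to the smallest plausible angle. Because the result depends directly on the elevations along the path, errors in the digital elevation model propagate straight into the runout estimate.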

Big data and other challenges in the quest for orthologs.

Relevance:

30.00%

Publisher:

Abstract:

Given the rapid increase in the number of species with a sequenced genome, identifying orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION: All such materials are available at http://questfororthologs.org. CONTACT: erik.sonnhammer@scilifelab.se or c.dessimoz@ucl.ac.uk.
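Quadratic scaling is easy to see with an all-against-all comparison, where every pair of proteomes is compared once. The short sketch below (hypothetical function name) counts those pairs: ten times more proteomes means roughly a hundred times more comparisons.

```python
def pairwise_comparisons(n_proteomes: int) -> int:
    """Number of unordered proteome pairs in an all-against-all comparison: n(n-1)/2."""
    return n_proteomes * (n_proteomes - 1) // 2


for n in (100, 1_000, 10_000):
    print(n, pairwise_comparisons(n))
```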

Paper 3: EUROCAT data quality indicators for population-based registries of congenital anomalies.

Relevance:

30.00%

Publisher:

Abstract:

The European Surveillance of Congenital Anomalies (EUROCAT) network of population-based congenital anomaly registries is an important source of epidemiologic information on congenital anomalies in Europe, covering live births, fetal deaths from 20 weeks' gestation, and terminations of pregnancy for fetal anomaly. EUROCAT's policy is to strive for high-quality data while ensuring consistency and transparency across all member registries. A set of 30 data quality indicators (DQIs) was developed to assess five key elements of data quality: completeness of case ascertainment, accuracy of diagnosis, completeness of information on EUROCAT variables, timeliness of data transmission, and availability of population denominator information. This article describes each of the individual DQIs and presents the output for each registry, as well as the EUROCAT (unweighted) average, for 29 full member registries for 2004-2008. This information is also available on the EUROCAT website for previous years. The EUROCAT DQIs allow registries to evaluate their performance in relation to other registries and allow appropriate interpretations to be made of the data collected. The DQIs provide direction for improving data collection and ascertainment, and they allow annual assessment for monitoring continuous improvement. The DQIs are constantly reviewed and refined to best document registry procedures and processes regarding data collection, to ensure the appropriateness of the DQIs, and to ensure transparency so that the data collected can make a substantial and useful contribution to epidemiologic research on congenital anomalies.
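The abstract does not give the DQI formulas. As a hedged illustration of one indicator type (completeness of information on a variable) and of what an unweighted average across registries means, the sketch below computes, per registry, the percentage of case records in which a variable is recorded, then averages those percentages so that each registry counts once regardless of size. The record layout and the variable name `birth_weight` are hypothetical.

```python
from statistics import mean


def completeness(records, variable):
    """Percentage of case records in which `variable` is recorded (present and not None)."""
    if not records:
        raise ValueError("registry has no records")
    filled = sum(1 for r in records if r.get(variable) is not None)
    return 100.0 * filled / len(records)


def unweighted_average(registries, variable):
    """Mean of registry-level percentages: each registry counts once, whatever its size."""
    return mean(completeness(recs, variable) for recs in registries.values())


def pooled_percentage(registries, variable):
    """Percentage over all records pooled together: large registries dominate."""
    all_records = [r for recs in registries.values() for r in recs]
    return completeness(all_records, variable)


registries = {  # hypothetical registries and case records
    "A": [{"birth_weight": 3200}, {"birth_weight": None},
          {"birth_weight": 2900}, {"birth_weight": 3100}],
    "B": [{"birth_weight": 3000}],
}
print(unweighted_average(registries, "birth_weight"),
      pooled_percentage(registries, "birth_weight"))
```

The two figures differ whenever registry sizes differ, which is why it matters that the reported average is unweighted.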

Food availability dictates the timing of parturition in insectivorous mouse-eared bats

Relevance:

30.00%

Publisher:

Abstract:

Which of two confounding factors that largely correlate and interact, weather or food availability, controls the timing of parturition in insectivorous bats? To answer this question, we took advantage of a predator-prey system that offers a unique opportunity to perform natural experiments. The phenology of reproduction was investigated in two sibling bat species that inhabit the same colonial roosts but exploit different feeding niches. Myotis myotis feeds mainly on carabid beetles, a food source available from the end of hibernation onwards, whereas bush crickets, the main prey of M. blythii, are not available early in the season owing to their successive instars; cockchafers are actually the only possible alternative prey for M. blythii at that time of year, but they occur only every third year, independently of local weather conditions. By comparing the species' responses to the presence or absence of cockchafers, we could test the hypothesis that food availability, rather than climate, influences the timing of bat parturition. Our data show that M. blythii gave birth, on average, 10 d later than M. myotis in years without cockchafers, whereas parturition (1) was synchronous in cockchafer years and (2) showed little among-year variation in M. myotis. This suggests that food availability is the chief factor regulating the timing of parturition in mouse-eared bats.

What spatial data do we need to develop global mammal conservation strategies?

Relevance:

30.00%

Publisher:

Abstract:

Spatial data on species distributions are available in two main forms: point locations and distribution maps (polygon ranges and grids). The former are often temporally and spatially biased, and too discontinuous, to be useful (untransformed) in spatial analyses. A variety of modelling approaches are used to transform point locations into maps. We discuss the attributes that point location data and distribution maps must satisfy in order to be useful in conservation planning. We recommend that, before point location data are used to produce and/or evaluate distribution models, the dataset be assessed against a set of criteria, including sample size, age of data, environmental/geographical coverage, independence, accuracy, time relevance and (often forgotten) representation of areas of permanent and natural presence of the species. Distribution maps must satisfy additional attributes if used for conservation analyses and strategies, including minimizing commission and omission errors, credibility of the source/assessors and availability for public screening. We review currently available databases for mammals globally and show that they vary widely in how well they comply with these attributes. The heterogeneity and weakness of spatial data seriously constrain their utility for both global and sub-global scale conservation analyses.

Ecological correlations between food availability and mortality: more than you can eat?

Relevance:

30.00%

Publisher:

Abstract:

Introduction: There is little information on the impact of diet on disease incidence and mortality in Switzerland. Objectives: We aimed to assess the associations between food availability and disease using ecological correlations. Methods: Time-trend ecological study for the period 1970-2009. Food availability was measured through the FAO food balance sheets. Standardized mortality rates (SMRs) were obtained from the Swiss Federal Office of Statistics. Cancer incidence data were obtained from the WHO Health for All database and the Vaud cancer registry. The association between food availability and mortality/incidence was assessed at lags of 0, 5, 10 and 15 years by Spearman correlation. Results: Alcoholic beverage and fruit availability were positively associated with SMRs from all types of cardiovascular disease, while fish availability was negatively associated. Animal products, meat and animal fats were positively associated with SMR from ischemic heart disease only. For cancers, opposite results were found depending on whether the association used SMRs or incidence rates: for all cancers, alcoholic beverages and fruits were positively associated with SMRs but negatively associated with incidence rates. Similar findings were obtained for all other foods, with the exception of vegetables, which were weakly and negatively associated with both SMRs and incidence rates. Finally, a 15-year lag reversed the association for animal and vegetal products, weakened the association for alcohol and fruits and strengthened the association for fish. Conclusion: Ecological associations between food availability and disease vary considerably depending on whether mortality or incidence rates are used. Great care should be taken when interpreting the results.
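A lagged Spearman correlation pairs food availability in year t with mortality or incidence in year t + lag and correlates the ranks of the two series. The dependency-free sketch below assumes yearly series that start in the same year; the function names and toy numbers are hypothetical and are not data from the study. In practice a library routine such as SciPy's `spearmanr` would normally be used for the correlation itself.

```python
import math
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def lagged_spearman(exposure, outcome, lag):
    """Correlate exposure in year t with outcome in year t + lag (same start year)."""
    if lag == 0:
        return spearman(exposure, outcome)
    return spearman(exposure[:-lag], outcome[lag:])


# Hypothetical yearly series (not data from the study)
food = [1, 2, 3, 4, 5, 6]
smr = [9, 9, 1, 2, 3, 4]
for lag in (0, 2):
    print(lag, lagged_spearman(food, smr, lag))
```

Each additional year of lag drops one pair of observations, so long lags over a 40-year window leave noticeably fewer points, which is one more reason for caution with ecological correlations.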

Estimating national-level syringe availability to injecting drug users and injection coverage: Switzerland, 1996-2006.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Measuring syringe availability and coverage is essential in the assessment of HIV/AIDS risk reduction policies. Estimates of syringe availability and coverage were produced for the years 1996 and 2006, based on all relevant available national-level aggregated data from published sources. METHODS: We defined availability as the total monthly number of syringes provided by the harm reduction system divided by the estimated number of injecting drug users (IDU), and coverage as the proportion of injections performed with a new syringe, at national level (total supply over total demand). Estimates of syringe supply were derived from the national monitoring system, including needle and syringe programmes (NSP), pharmacies, and medically prescribed heroin programmes. Estimates of syringe demand were based on the number of injections performed by IDU, derived from surveys of low-threshold facilities for drug users (LTF) with NSP, combined with the number of IDU. This number was estimated by two methods combining estimates of heroin users (multiple estimation method) with either (a) the number of IDU in methadone treatment (MT) (non-injectors) or (b) the proportion of injectors amongst LTF attendees. Central estimates and ranges were obtained for availability and coverage. RESULTS: The estimated number of IDU decreased markedly according to both methods. The MT-based method (from 14,818 to 4,809) showed a much greater decrease and a smaller IDU population than the LTF-based method (from 24,510 to 12,320). Availability and coverage estimates are higher with the MT-based method. For 1996, central estimates of syringe availability were 30.5 and 18.4 per IDU per month; for 2006, they were 76.5 and 29.9. There were 4 central estimates of coverage: for 1996 they ranged from 24.3% to 43.3%, and for 2006 from 50.5% to 134.3%.
CONCLUSION: Although the 2006 estimates overlap the 1996 estimates, the results suggest a shift towards improved syringe availability and coverage over time.
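The two definitions in the methods translate directly into ratios. In the minimal sketch below (hypothetical function names and numbers), availability is monthly syringe supply per IDU, and coverage is supply divided by demand, where demand is the number of IDU times injections per IDU per month. Coverage is not capped at 100%, because estimated supply can exceed estimated demand. A smaller IDU estimate, as produced by the MT-based method, raises both figures for the same supply.

```python
def availability(syringes_per_month: float, n_idu: float) -> float:
    """Syringes provided per injecting drug user per month."""
    return syringes_per_month / n_idu


def coverage(syringes_per_month: float, n_idu: float,
             injections_per_idu_per_month: float) -> float:
    """Share of injections that could use a new syringe (supply / demand); not capped at 1."""
    demand = n_idu * injections_per_idu_per_month
    return syringes_per_month / demand


# Hypothetical figures: the same supply under two different IDU population estimates
supply = 1000.0
for n_idu in (50.0, 25.0):
    print(n_idu, availability(supply, n_idu), coverage(supply, n_idu, 40.0))
```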

Tree-based message diffusion for managing replicated data in unreliable and resource-constrained peer-to-peer environment

Relevance:

30.00%

Publisher:

Abstract:

This thesis proposes a set of adaptive broadcast solutions and an adaptive data replication solution to support the deployment of P2P applications. P2P applications are an emerging type of distributed application running on top of P2P networks; typical examples are video streaming and file sharing. While interesting because they are fully distributed, P2P applications suffer from several deployment problems due to the nature of the environment in which they run. Indeed, defining an application on top of a P2P network often means defining an application in which peers contribute resources in exchange for their ability to use the P2P application. For example, in a P2P file sharing application, while the user is downloading a file, the application is at the same time serving that file to other users. Such peers may have limited hardware resources (e.g., CPU, bandwidth and memory), or the end user may decide a priori to limit the resources dedicated to the P2P application. In addition, a P2P network is typically deployed in an unreliable environment, where communication links are subject to message losses and processes to crashes. To support P2P applications, this thesis proposes a set of services that address some underlying constraints related to the nature of P2P networks. The proposed services include a set of adaptive broadcast solutions and an adaptive data replication solution that can be used as the basis of several P2P applications. Our data replication solution increases availability and reduces communication overhead. The broadcast solutions aim at providing a communication substrate that encapsulates one of the key communication paradigms used by P2P applications: broadcast. They aim in particular at offering reliability and scalability to an upper layer, be it an end-to-end P2P application or another system-level layer, such as a data replication layer.
Our contributions are organized in a protocol stack made of three layers. In each layer, we propose a set of adaptive protocols that address specific constraints imposed by the environment, and each protocol is evaluated through a set of simulations. Our solutions are adaptive in that they take the constraints of the underlying system into account proactively. To model these constraints, we define an environment approximation algorithm that yields an approximate view of the system or part of it. This approximate view includes the topology and the reliability of components, expressed in probabilistic terms. To adapt to the underlying system constraints, the proposed broadcast solutions route messages through tree overlays that maximize broadcast reliability. Here, broadcast reliability is expressed as a function of the reliability of the selected paths and of the use of available resources. These resources are modeled as message quotas reflecting the receiving and sending capacities of each node. To allow deployment in a large-scale system, we take the memory available at processes into account by limiting the view of the system they have to maintain. Using this partial view, we propose three scalable broadcast algorithms based on a propagation overlay that approaches the global tree overlay and adapts to some constraints of the underlying system. At a higher level, this thesis also proposes a data replication solution that is adaptive both in replica placement and in request routing. At the routing level, this solution takes the unreliability of the environment into account in order to maximize reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed in order to define a set of replicas that minimizes communication cost.
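The abstract does not give the thesis's tree-construction algorithm. One standard way to obtain, from a probabilistic view of link reliabilities, a tree in which each node is reached by its most reliable path is a Dijkstra-style search that maximizes the product of link delivery probabilities; this works because multiplying by a probability in (0, 1] never increases reliability. The sketch assumes independent link failures and a known (approximate) topology, and it ignores message quotas; the graph and the names are hypothetical.

```python
import heapq


def max_reliability_tree(links, root):
    """Most reliable path from `root` to every reachable node.

    links: dict node -> list of (neighbour, p), p = delivery probability of the link in (0, 1].
    Returns (parent, reliability): the broadcast tree as parent pointers, and each
    node's end-to-end path reliability (product of link probabilities).
    """
    best = {root: 1.0}
    parent = {root: None}
    heap = [(-1.0, root)]  # max-heap on reliability via negated keys
    done = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry: u was already settled with a better reliability
        done.add(u)
        for v, p in links.get(u, []):
            r = best[u] * p
            if r > best.get(v, 0.0):
                best[v] = r
                parent[v] = u
                heapq.heappush(heap, (-r, v))
    return parent, best


links = {  # hypothetical undirected overlay, listed in both directions
    "A": [("B", 0.9), ("C", 0.5)],
    "B": [("A", 0.9), ("C", 0.9)],
    "C": [("A", 0.5), ("B", 0.9)],
}
parent, reliability = max_reliability_tree(links, "A")
print(parent, reliability)
```

Here C is attached through B (0.9 × 0.9 = 0.81) rather than directly to A (0.5), even though the direct link uses fewer hops.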

AssociationViewer: a scalable and integrated software tool for visualization of large-scale variation data in genomic context.

Relevance:

30.00%

Publisher:

Abstract:

SUMMARY: We present a tool designed for the visualization of large-scale genetic and genomic data, exemplified by results from genome-wide association studies. This software provides an integrated framework to facilitate the interpretation of SNP association studies in genomic context. Gene annotations can be retrieved from Ensembl, linkage disequilibrium data downloaded from HapMap, and custom data imported in BED or WIG format. AssociationViewer integrates functionalities that enable the aggregation or intersection of data tracks. It implements an efficient cache system and allows the display of several very large-scale genomic datasets. AVAILABILITY: The Java code for AssociationViewer is distributed under the GNU General Public Licence and has been tested on Microsoft Windows XP, MacOSX and GNU/Linux operating systems. It is available from the SourceForge repository, which also includes a Java Web Start version, documentation and example data files.
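Intersecting two data tracks amounts to intersecting their interval lists. The sketch below is a generic two-pointer intersection, not AssociationViewer's code. It assumes each track is a sorted list of non-overlapping intervals on the same chromosome, in BED-style 0-based, half-open coordinates; the function name and the intervals are hypothetical.

```python
def intersect_tracks(a, b):
    """Intersect two sorted, non-overlapping interval lists (0-based, half-open, same chromosome)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two current intervals overlap
            out.append((start, end))
        # advance whichever interval ends first; it cannot overlap anything further
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


genes = [(0, 10), (20, 30)]  # hypothetical annotation track
peaks = [(5, 25)]            # hypothetical association-signal track
print(intersect_tracks(genes, peaks))
```

Because each step advances one pointer, the intersection runs in time linear in the total number of intervals, which matters for very large tracks.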

ExpressionView--an interactive viewer for modules identified in gene expression data.

Relevance:

30.00%

Publisher:

Abstract:

SUMMARY: ExpressionView is an R package that provides an interactive graphical environment to explore transcription modules identified in gene expression data. A sophisticated ordering algorithm arranges the modules and their expression data in a visually appealing layout that provides an intuitive summary of the results. From this overview, the user can select individual modules and access biologically relevant metadata associated with them. AVAILABILITY: http://www.unil.ch/cbg/ExpressionView. Screenshots, tutorials and sample data sets can be found on the ExpressionView web site.
