861 results for multiple data sources


Relevance: 80.00%

Abstract:

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
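
The nested cross-validation scheme mentioned in METHODS can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `k_fold_indices` and `nested_cv_splits`, the fold counts, and the seeding are all assumptions. It only produces index splits; tuning and feature selection would run on the inner splits, and the outer test fold would be used once for the performance estimate.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, inner_splits, outer_test) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs drawn
    only from outer_train, so the outer test fold is never seen during
    parameter tuning or feature selection.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + 1 + i)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, inner_splits, outer_test
```

Note that these splits ignore batch membership. A study of batch effects would additionally need to track which batch each index belongs to, which this sketch leaves out.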

Relevance: 80.00%

Abstract:

Voting Advice Applications (VAAs) have become a central component of election campaigns worldwide. By matching the political preferences of voters to parties and candidates, the web application grants voters a look into their political mirror and reveals the most suitable political choices to them in terms of policy congruence. Both the dense and concise information on the electoral offer and the comparative nature of the application make VAAs an unprecedented information source for electoral decision making. At a time when electoral choices are found to be highly individualized and driven by political issue positions, an ever increasing number of voters turn to VAAs before casting their ballots. With VAAs in high demand, the question of their effects on voters has become a pressing research topic. In various countries, survey research has been used to proclaim an impact of VAAs on electoral behavior, yet practically all studies fail to provide the scientific evidence that would allow for making such claims. In this thesis, I set out to systematically establish the causal link between VAA use and electoral behavior, using various data sources and appropriate statistical techniques in doing so. The focus lies on the Swiss VAA smartvote, introduced in the run-up to the 2003 Swiss federal elections and now an integral part of the national election campaign. smartvote produced over a million voting recommendations in the last Swiss federal elections for an active electorate of two million, potentially guiding a vast number of voters in their choices on the ballot. In order to determine the effect of the VAA on electoral behavior, I analyze both voting preferences and choice among Swiss voters during two consecutive election periods. 
First, I introduce statistical techniques to adequately examine VAA effects in observational studies and use them to demonstrate that voters who used smartvote prior to the 2007 Swiss federal elections were significantly more likely to swing vote in the elections than non-users. Second, I analyze preference voting during the same election and show that the smartvote voting recommendation inclines politically knowledgeable voters to modify their ballots and cast candidate-specific preference votes. Third, to further tackle the indication that smartvote use affects the preference structure of voters, I employ an experimental research design to demonstrate that voters who use the application tend to strengthen their vote propensities for their most preferred party and adapt their overall party preferences so that they consider more than one party an eligible vote option after engaging with the application. Finally, vote choice is examined for the 2011 Swiss federal election, showing once more that the VAA initiated a change of party choice among voters. In sum, this thesis presents empirical evidence for the transformative effect of the Swiss VAA smartvote on electoral behavior.

Relevance: 80.00%

Abstract:

Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this makes it possible to distinguish objects at the surface of our planet automatically yet accurately. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive, but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, thereby hindering the transfer of knowledge from one image to another. The goal of this thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches drawn from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest so that labels are collected only for the most useful pixels. This iterative routine is based on a continual evaluation of how pertinent the initial training data, which actually belong to a different image, are to the new image. 
Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels in a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are greatly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. Also, we gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics, ultimately facilitating the land-cover mapping across images.

Relevance: 80.00%

Abstract:

OBJECTIVE: The purpose of this study was to evaluate the effect of structured physical exercise programs during pregnancy on the course of labor and delivery. STUDY DESIGN: We conducted a systematic review and metaanalysis using the following data sources: Medline and The Cochrane Library. In our study, we used randomized controlled trials (RCT) that evaluated the effects of exercise programs during pregnancy on labor and delivery. The results are summarized as relative risks. RESULTS: In the 16 RCTs that were included there were 3359 women. Women in exercise groups had a significantly lower risk of cesarean delivery (relative risk, 0.85; 95% confidence interval [CI], 0.73-0.99). Birthweight was not significantly reduced in exercise groups. The risk of instrumental delivery was similar among groups (relative risk, 1.00; 95% CI, 0.82-1.22). Data on Apgar score, episiotomy, epidural anesthesia, perineal tear, length of labor, and induction of labor were insufficient to draw conclusions. With the use of data from 11 studies (1668 women), our analysis showed that women in the exercise groups gained significantly less weight than women in control groups (mean difference, -1.13 kg; 95% CI, -1.49 to -0.78). CONCLUSION: Structured physical exercise during pregnancy reduces the risk of cesarean delivery. This is an important finding to convince women to be active during their pregnancy and should lead the physician to recommend physical exercise to pregnant women, when this is not contraindicated.
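
The abstract summarizes results as relative risks with 95% confidence intervals. As a minimal sketch of how such a figure is obtained for a single trial, the function below uses the standard log-scale (Wald) interval. The function name `relative_risk` and the example counts are hypothetical and are not taken from the review, whose pooled estimates additionally combine several trials.

```python
import math


def relative_risk(events_exposed, n_exposed, events_control, n_control, z=1.96):
    """Return (rr, ci_low, ci_high) for a two-arm trial.

    The interval is computed on the log scale, where the standard error of
    log(RR) is sqrt(1/a - 1/n1 + 1/c - 1/n2).
    """
    risk_exposed = events_exposed / n_exposed
    risk_control = events_control / n_control
    rr = risk_exposed / risk_control
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_control - 1 / n_control
    )
    ci_low = math.exp(math.log(rr) - z * se_log_rr)
    ci_high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, ci_low, ci_high


# Hypothetical counts: 30 cesarean deliveries among 200 exercising women
# versus 40 among 200 controls.
rr, ci_low, ci_high = relative_risk(30, 200, 40, 200)
```

An interval that lies entirely below 1, such as the reported 0.73-0.99 for cesarean delivery, is what makes a risk reduction statistically significant at the 5% level. An interval that spans 1, such as 0.82-1.22 for instrumental delivery, does not.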

Relevance: 80.00%

Abstract:

Soil information is needed for managing the agricultural environment. The aim of this study was to apply artificial neural networks (ANNs) for the prediction of soil classes using orbital remote sensing products, terrain attributes derived from a digital elevation model and local geology information as data sources. This approach to digital soil mapping was evaluated in an area with a high degree of lithologic diversity in the Serra do Mar. The neural network simulator used in this study was JavaNNS, with the backpropagation learning algorithm. For soil class prediction, different combinations of the selected discriminant variables were tested: elevation, slope, aspect, curvature, plan curvature, profile curvature, topographic index, solar radiation, LS topographic factor, local geology information, and the clay mineral and iron oxide indices and the normalized difference vegetation index (NDVI) derived from an image of a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) sensor. With the tested sets, best results were obtained when all discriminant variables were associated with geological information (overall accuracy 93.2 - 95.6 %, Kappa index 0.924 - 0.951, for set 13). Excluding the variable profile curvature (set 12), overall accuracy ranged from 93.9 to 95.4 % and the Kappa index from 0.932 to 0.948. The maps based on the neural network classifier were consistent and similar to conventional soil maps drawn for the study area, although with more spatial details. The results show the potential of ANNs for soil class prediction in mountainous areas with lithological diversity.
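
The two figures of merit reported above, overall accuracy and the Kappa index, are both derived from a confusion matrix. The sketch below shows the standard computation (Cohen's kappa). The function name `accuracy_and_kappa` and the example matrix are hypothetical and are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall_accuracy, kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i that were
    predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example.
accuracy, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa discounts the agreement expected by chance, which is why it is always lower than the overall accuracy unless the classification is perfect.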

Relevance: 80.00%

Abstract:

We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of seminaive Bayesian classifiers. Altogether, 16 model selection and weighting schemes, 58 benchmark data sets, and various statistical tests are employed. This paper's main contributions are threefold. First, it formally presents each scheme's definition, rationale, and time complexity and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme's classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms which have an immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.

Relevance: 80.00%

Abstract:

Recent years have seen a significant increase in understanding of the host genetic and genomic determinants of susceptibility to HIV-1 infection and disease progression, driven in large part by candidate gene studies, genome-wide association studies, genome-wide transcriptome analyses, and large-scale in vitro genome screens. These studies have identified common variants in some host loci that clearly influence disease progression, characterized the scale and dynamics of gene and protein expression changes in response to infection, and provided the first comprehensive catalogs of genes and pathways involved in viral replication. Experimental models of AIDS and studies in natural hosts of primate lentiviruses have complemented and in some cases extended these findings. As the relevant technology continues to progress, the expectation is that such studies will increase in depth (e.g., to include host whole exome and whole genome sequencing) and in breadth (in particular, by integrating multiple data types).

Relevance: 80.00%

Abstract:

This paper investigates the prevalence of incapacity in performing daily activities and the associations between household composition and availability of family members and receipt of care among older adults with functioning problems in Spain, England and the United States of America (USA). We examine how living arrangements, marital status, child availability, limitations in functioning ability, age and gender affect the probability of receiving formal care and informal care from household members and from others in three countries with different family structures, living arrangements and policies supporting care of the incapacitated. Data sources include the 2006 Survey of Health, Ageing and Retirement in Europe for Spain, the third wave of the English Longitudinal Study of Ageing (2006), and the eighth wave of the USA Health and Retirement Study (2006). Logistic and multinomial logistic regressions are used to estimate the probability of receiving care and the sources of care among persons age 50 and older. The percentage of people with functional limitations receiving care is higher in Spain. More care comes from outside the household in the USA and England than in Spain. The use of formal care among the incapacitated is lowest in the USA and highest in Spain.

Relevance: 80.00%

Abstract:

This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, bi-phones, and bi-syllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users to either upload a set of words to receive their properties, or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal
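
To make the subword properties concrete, the sketch below counts letter bigrams and trigrams over a word list. It is an illustration only: the function name `ngram_counts` is hypothetical, each word is counted once rather than weighted by its corpus frequency, and EsPal's own definitions of these properties may differ.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every run of n consecutive letters across the given words."""
    counts = Counter()
    for word in words:
        for start in range(len(word) - n + 1):
            counts[word[start:start + n]] += 1
    return counts


# Example with two Spanish words.
bigrams = ngram_counts(["casa", "cosa"], 2)
trigrams = ngram_counts(["casa", "cosa"], 3)
```

Here `bigrams["sa"]` is 2 because both words end in "sa", while each trigram occurs once.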

Relevance: 80.00%

Abstract:

In the context of recent attempts to redefine the 'skin notation' concept, a position paper summarizing an international workshop on the topic stated that the skin notation should be a hazard indicator related to the degree of toxicity and the potential for transdermal exposure of a chemical. Within the framework of developing a web-based tool integrating this concept, we constructed a database of 7101 agents for which a percutaneous permeation constant can be estimated (using molecular weight and octanol-water partition constant), and for which at least one of the following toxicity indices could be retrieved: inhalation occupational exposure limit (n=644), oral lethal dose 50 (LD50, n=6708), cutaneous LD50 (n=1801), oral no observed adverse effect level (NOAEL, n=1600), and cutaneous NOAEL (n=187). Data sources included the Registry of Toxic Effects of Chemical Substances (RTECS, MDL Information Systems, Inc.), PHYSPROP (Syracuse Research Corp.) and safety cards from the International Programme on Chemical Safety (IPCS). A hazard index, which corresponds to the product of exposure duration and skin surface exposed that would yield an internal dose equal to a toxic reference dose, was calculated. This presentation provides a descriptive summary of the database, correlations between toxicity indices, and an example of how the web tool will help industrial hygienists decide on the possibility of a dermal risk using the hazard index.

Relevance: 80.00%

Abstract:

BACKGROUND: Data on the association between subclinical thyroid dysfunction and fractures conflict. PURPOSE: To assess the risk for hip and nonspine fractures associated with subclinical thyroid dysfunction among prospective cohorts. DATA SOURCES: Search of MEDLINE and EMBASE (1946 to 16 March 2014) and reference lists of retrieved articles without language restriction. STUDY SELECTION: Two physicians screened and identified prospective cohorts that measured thyroid function and followed participants to assess fracture outcomes. DATA EXTRACTION: One reviewer extracted data using a standardized protocol, and another verified data. Both reviewers independently assessed methodological quality of the studies. DATA SYNTHESIS: The 7 population-based cohorts of heterogeneous quality included 50,245 participants with 1966 hip and 3281 nonspine fractures. In random-effects models that included the 5 higher-quality studies, the pooled adjusted hazard ratios (HRs) of participants with subclinical hyperthyroidism versus euthyroidism were 1.38 (95% CI, 0.92 to 2.07) for hip fractures and 1.20 (CI, 0.83 to 1.72) for nonspine fractures without statistical heterogeneity (P = 0.82 and 0.52, respectively; I² = 0%). Pooled estimates for the 7 cohorts were 1.26 (CI, 0.96 to 1.65) for hip fractures and 1.16 (CI, 0.95 to 1.42) for nonspine fractures. When thyroxine recipients were excluded, the HRs for participants with subclinical hyperthyroidism were 2.16 (CI, 0.87 to 5.37) for hip fractures and 1.43 (CI, 0.73 to 2.78) for nonspine fractures. For participants with subclinical hypothyroidism, HRs from higher-quality studies were 1.12 (CI, 0.83 to 1.51) for hip fractures and 1.04 (CI, 0.76 to 1.42) for nonspine fractures (P for heterogeneity = 0.69 and 0.88, respectively; I² = 0%). LIMITATIONS: Selective reporting cannot be excluded. Adjustment for potential common confounders varied and was not adequately done across all studies. 
CONCLUSION: Subclinical hyperthyroidism might be associated with an increased risk for hip and nonspine fractures, but additional large, high-quality studies are needed. PRIMARY FUNDING SOURCE: Swiss National Science Foundation.
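
The abstract reports pooled hazard ratios from random-effects models together with the I² heterogeneity statistic, but does not name the estimator. The sketch below assumes the widely used DerSimonian-Laird method and assumes that each study contributes a hazard ratio with a 95% confidence interval. The function name `pool_random_effects` and the example inputs are hypothetical and are not data from the cohorts.

```python
import math


def pool_random_effects(estimates, z=1.96):
    """Pool (hr, ci_low, ci_high) tuples with DerSimonian-Laird weights.

    Returns a dict with the pooled hazard ratio, its confidence interval,
    the between-study variance tau2, and I2 in percent.
    """
    # Work on the log scale; recover each standard error from the CI width.
    logs = [math.log(hr) for hr, _, _ in estimates]
    ses = [(math.log(high) - math.log(low)) / (2 * z)
           for _, low, high in estimates]
    weights = [1 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, logs)) / sum_w
    # Cochran's Q measures the spread of study estimates around the fixed mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, logs))
    df = len(logs) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau2 to each study's variance.
    re_weights = [1 / (se ** 2 + tau2) for se in ses]
    sum_re = sum(re_weights)
    pooled_log = sum(w * y for w, y in zip(re_weights, logs)) / sum_re
    pooled_se = math.sqrt(1 / sum_re)
    return {
        "hr": math.exp(pooled_log),
        "ci_low": math.exp(pooled_log - z * pooled_se),
        "ci_high": math.exp(pooled_log + z * pooled_se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs: three studies reporting similar hazard ratios.
result = pool_random_effects([(1.3, 0.8, 2.1), (1.2, 0.7, 2.0), (1.4, 0.9, 2.2)])
```

When the study estimates agree closely, Q falls below its degrees of freedom, so tau2 and I² are both truncated to zero and the random-effects result coincides with the fixed-effect one. This is the situation an I² of 0% describes.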

Relevance: 80.00%

Abstract:

INTRODUCTION: Infants hospitalised in neonatology are inevitably exposed to pain repeatedly. Premature infants are particularly vulnerable, because they are hypersensitive to pain and demonstrate diminished behavioural responses to pain. They are therefore at risk of developing short- and long-term complications if pain remains untreated. CONTEXT: Compared to acute pain, there is limited evidence in the literature on prolonged pain in infants. However, the prevalence is reported to be between 20 and 40 %. OBJECTIVE: This single case study aimed to identify the bio-contextual characteristics of neonates who experienced prolonged pain. METHODS: This study was carried out in the neonatal unit of a tertiary referral centre in Western Switzerland. A retrospective analysis of the profiles of seven infants who experienced prolonged pain was performed using five different data sources. RESULTS: The mean gestational age of the seven infants was 32 weeks. The main diagnoses included prematurity and respiratory distress syndrome. The total observations (N=55) showed that the participants had on average 21.8 (SD 6.9) painful procedures each day that were estimated to be of moderate to severe intensity. Out of the 164 recorded pain scores (2.9 pain assessments/day/infant), 14.6 % confirmed acute pain. Of those experiencing acute pain, 16.6 % were given analgesia and 79.1 % received none. CONCLUSION: This study highlighted the difficulty in managing pain in neonates who are exposed to numerous painful procedures. Pain in this population remains underevaluated and, as a result, undertreated. Results of this study showed that nursing documentation related to pain assessment is not systematic. Regular assessment and documentation of acute and prolonged pain are recommended. This could be achieved with clear guidelines on the Assessment Intervention Reassessment (AIR) cycle with validated measures adapted to neonates. 
The adequacy of pain assessment is a pre-requisite for appropriate pain relief in neonates.

Relevance: 80.00%

Abstract:

In the region of Alt Empordà (Girona), olive groves historically shaped a landscape of high economic, cultural and environmental value, although it has been in sharp decline since the mid-twentieth century. Navata, located on the edge of the Alt Empordà plain, is one of the municipalities that best exemplifies this process of abandonment and near disappearance of the crop. This article analyzes the changes in the olive groves of this municipality over the period 1957-2004, both through detailed land-use and land-cover mapping and through the analysis of data sources and oral sources, an exercise that highlights the loss of 90% of the olive groves that existed in the middle of the last century. The frost of February 1956 was one of the main causes, but not the only one. Nevertheless, the importance of the olive groves for local identity and biodiversity conservation, together with the distinctive character of the oil obtained from varieties native to the area (especially argudell), justifies the need to promote policies that support their recovery.

Relevance: 80.00%

Abstract:

The Iowa State Profile Tool is a comprehensive, high-level assessment of Iowa’s progress toward a balanced long-term care system – a system that relies less on institutional services and provides greater opportunities for the in-home and community-based services that most people prefer. This report includes long-term support for people of all ages and disability types and is based on a variety of state and federal data sources and interviews with public and private leaders in Iowa’s long-term care system.

Relevance: 80.00%

Abstract:

We construct a rich dataset covering 47 developing countries over the years 1990-2007, combining several micro and macro level data sources to explore the link between political factors and body mass index (BMI). We implement a heteroskedastic generalized ordered logit model allowing for different covariate effects across the BMI distribution and accounting for the unequal BMI dispersion by geographical area. We find that systems with democratic qualities are more likely to reduce under-weight, but increase overweight/obesity, whereas effective political competition does entail double-benefits in the form of reducing both under-weight and obesity. Our results are robust to the introduction of country fixed effects.