974 results for Data Interpretation, Statistical
Abstract:
1. Aim - Concerns over how global change will influence species distributions, in conjunction with increased emphasis on understanding niche dynamics in evolutionary and community contexts, highlight the growing need for robust methods to quantify niche differences between or within taxa. We propose a statistical framework to describe and compare environmental niches from occurrence and spatial environmental data.
2. Location - Europe, North America, South America.
3. Methods - The framework applies kernel smoothers to densities of species occurrence in gridded environmental space to calculate metrics of niche overlap and test hypotheses regarding niche conservatism. We use this framework and simulated species with predefined distributions and amounts of niche overlap to evaluate several ordination and species distribution modeling techniques for quantifying niche overlap. We illustrate the approach with data on two well-studied invasive species.
4. Results - We show that niche overlap can be accurately detected with the framework when variables driving the distributions are known. The method is robust to known and previously undocumented biases related to the dependence of species occurrences on the frequency of environmental conditions that occur across geographic space. The use of a kernel smoother makes the process of moving from geographical space to multivariate environmental space independent of both sampling effort and arbitrary choice of resolution in environmental space. However, the use of ordination and species distribution model techniques for selecting, combining and weighting variables on which niche overlap is calculated provides contrasting results.
5. Main conclusions - The framework meets the increasing need for robust methods to quantify niche differences. It is appropriate to study niche differences between species, subspecies or intraspecific lineages that differ in their geographical distributions. Alternatively, it can be used to measure the degree to which the environmental niche of a species or intraspecific lineage has changed over time.
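To make the kernel-smoothing idea in this abstract concrete, here is a minimal sketch that smooths occurrences along a single environmental axis onto a fixed grid and compares two species. It rests on several assumptions that the abstract does not state: a Gaussian kernel, a bandwidth of 1.0, Schoener's D as the overlap metric, and made-up occurrence values. It also leaves out any correction for how frequently each environmental condition occurs in geographic space.

```python
import math

def kernel_density_on_grid(values, grid, bandwidth):
    """Gaussian-kernel density of `values` at each grid cell, normalised to sum to 1."""
    dens = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    total = sum(dens)
    return [d / total for d in dens]

def schoener_d(p1, p2):
    """Schoener's D: 1 for identical densities, 0 for no overlap (assumed metric)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))

# Hypothetical environmental axis (e.g. mean temperature) gridded from 0 to 30.
grid = [i * 0.5 for i in range(61)]

# Hypothetical environmental values at occurrence points of two species.
species_a = [10.2, 11.5, 12.1, 12.8, 13.0, 14.4]
species_b = [13.5, 14.0, 15.2, 15.9, 16.3, 17.1]

dens_a = kernel_density_on_grid(species_a, grid, bandwidth=1.0)
dens_b = kernel_density_on_grid(species_b, grid, bandwidth=1.0)
overlap = schoener_d(dens_a, dens_b)
print(round(overlap, 3))
```

Because the densities are evaluated on a fixed grid and normalised, the overlap value does not depend on how many occurrence records each species has, which is the property the abstract attributes to the kernel smoother.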
Abstract:
With extremely diverse morphological and edaphoclimatic characteristics, the island of Santo Antão in Cape Verde has a recognized environmental vulnerability, together with a marked lack of scientific studies that address this reality and could serve as a basis for an integrated understanding of the phenomena. Digital cartography and geographic information technologies have brought technological advances in the collection, storage and processing of spatial data. Several tools currently available make it possible to model a multiplicity of factors, to locate and quantify phenomena, and to define the contribution of different factors to the final result. The present study, carried out within the postgraduate and master's programme in geographic information systems run by the Universidade de Trás-os-Montes e Alto Douro, aims to help reduce the deficit of information on the biophysical characteristics of the island by applying geographic information technologies and remote sensing, combined with multivariate statistical analysis. To this end, thematic maps were produced and analysed, and an integrated data analysis model was developed. The multiplicity of spatial variables produced, among them 29 continuously varying variables likely to influence the biophysical characteristics of the region, and the possible occurrence of mutually antagonistic or synergistic effects, make interpretation from the original data relatively complex. To get around this problem, a systematic sampling grid totalling 921 points (replicates) was used to extract the data for the 29 variables at the sampling points, followed by multivariate statistical analysis, namely principal component analysis.
Applying these techniques made it possible to simplify and interpret the original variables, normalizing them and summarizing the information contained in the diverse, mutually correlated original variables in a set of orthogonal (uncorrelated) variables of decreasing importance, the principal components (PCs). The target was for the first 3 principal components to explain 75% of the variance of the original data, and a multi-stage interactive process was developed in which the least representative variables were successively eliminated. In the last stage of the process, the first 3 PCs explained 74.54% of the variance of the original data but, as became apparent in the subsequent phase, proved insufficient to portray reality. It was therefore decided to include the 4th PC (PC4), with which 84% of that variance was explained, representing eight biophysical variables: altitude, drainage density, geological fracture density, precipitation, vegetation index, temperature, water resources and distance to the drainage network. The subsequent interpolation of the first principal component (PC1), with the main variables associated with components PC2, PC3 and PC4 as auxiliary variables, using geostatistical techniques in ArcGIS, yielded a map representing 84% of the variation in biophysical characteristics across the territory. Cluster analysis, validated by Student's t-test, made it possible to reclassify the territory into 6 homogeneous biophysical units.
It is concluded that currently available geographic information technologies facilitate interactive and flexible analyses, allowing themes and criteria to be varied, new information to be integrated and improvements to be made to models built on the information available in a given context. Combined with multivariate statistical analysis techniques, they make it possible, on the basis of scientific criteria, to carry out an integrated analysis of multiple biophysical variables whose mutual correlation makes an integrated understanding of the phenomena complex.
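The step of checking how many principal components are needed to reach a variance target can be sketched as follows. This is only an illustration: the eigenvalues below are hypothetical and are not the study's values, and the function name `components_needed` is introduced here for the example.

```python
def components_needed(eigenvalues, target):
    """Return (k, share): the smallest number k of leading components whose
    cumulative share of total variance reaches `target`, and that share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev / total
        if cumulative >= target:
            return k, cumulative
    # Target not reached (e.g. rounding with target = 1.0): return all components.
    return len(ordered), cumulative

# Hypothetical eigenvalues of a correlation matrix for eight standardised
# variables (illustrative only, NOT taken from the study).
eigenvalues = [3.2, 1.6, 1.1, 0.8, 0.5, 0.4, 0.3, 0.1]

k, share = components_needed(eigenvalues, target=0.75)
print(k, round(share, 4))
```

With these illustrative numbers the first three components fall just short of the 75% target, so a fourth is required, which mirrors the kind of decision described in the abstract.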
Abstract:
The primary purpose of this brief is to provide various statistical and institutional details on the development and current status of the public agricultural research system in Cape Verde. This information has been collected and presented in a systematic way in order to inform and thereby improve research policy formulation with regard to the Cape Verdean NARS. Most importantly, these data are assembled and reported in a way that makes them directly comparable with the data presented in the other country briefs in this series. And because institutions take time to develop and there are often considerable lags in the agricultural research process, it is necessary for many analytical and policy purposes to have access to longer-run series of data. NARSs vary markedly in their institutional structure and these institutional aspects can have a substantial and direct effect on their research performance. To provide a basis for analysis and cross-country, over-time comparisons, the various research agencies in a country have been grouped into five general categories: government, semi-public, private, academic, and supranational. A description of these categories is provided in table 1.
Abstract:
To be diagnostically useful, structural MRI must reliably distinguish Alzheimer's disease (AD) from normal aging in individual scans. Recent advances in statistical learning theory have led to the application of support vector machines to MRI for detection of a variety of disease states. The aims of this study were to assess how successfully support vector machines assigned individual diagnoses and to determine whether data-sets combined from multiple scanners and different centres could be used to obtain effective classification of scans. We used linear support vector machines to classify the grey matter segment of T1-weighted MR scans from pathologically proven AD patients and cognitively normal elderly individuals obtained from two centres with different scanning equipment. Because the clinical diagnosis of mild AD is difficult, we also tested the ability of support vector machines to differentiate control scans from those of patients without post-mortem confirmation. Finally, we sought to use these methods to differentiate scans of patients suffering from AD from those of patients with frontotemporal lobar degeneration. Up to 96% of pathologically verified AD patients were correctly classified using whole brain images. Data from different centres were successfully combined, achieving results comparable to those of the separate analyses. Importantly, data from one centre could be used to train a support vector machine to accurately differentiate AD and normal ageing scans obtained from another centre with different subjects and different scanner equipment. Patients with mild, clinically probable AD and age/sex matched controls were correctly separated in 89% of cases, which is compatible with published diagnosis rates in the best clinical centres. This method correctly assigned 89% of patients with post-mortem confirmed diagnosis of either AD or frontotemporal lobar degeneration to their respective group.
Our study leads to three conclusions: Firstly, support vector machines successfully separate patients with AD from healthy aging subjects. Secondly, they perform well in the differential diagnosis of two different forms of dementia. Thirdly, the method is robust and can be generalized across different centres. This suggests an important role for computer based diagnostic image analysis for clinical practice.
Abstract:
We study the statistical properties of three estimation methods for a model of learning that is often fitted to experimental data: quadratic deviation measures without unobserved heterogeneity, and maximum likelihood with and without unobserved heterogeneity. After discussing identification issues, we show that the estimators are consistent and provide their asymptotic distribution. Using Monte Carlo simulations, we show that ignoring unobserved heterogeneity can lead to seriously biased estimates in samples which have the typical length of actual experiments. Better small sample properties are obtained if unobserved heterogeneity is introduced. That is, rather than estimating the parameters for each individual, the individual parameters are considered random variables, and the distribution of those random variables is estimated.
Abstract:
The singular value decomposition and its interpretation as a linear biplot has proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
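Since the abstract is about ratios within unit-sum compositions, a small sketch of a log-ratio transformation may help. The centred log-ratio (clr) used below is one common preprocessing step before a singular value decomposition of compositional data; that it matches the transformation used in the paper is an assumption, and the six-part composition is invented for illustration.

```python
import math

def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean log of all parts.
    The input is first closed to unit sum, so only ratios between parts matter."""
    total = sum(composition)
    logs = [math.log(x / total) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical six-part colour composition of one painting (proportions sum to 1).
painting = [0.125, 0.243, 0.153, 0.031, 0.181, 0.267]

transformed = clr(painting)
print([round(v, 3) for v in transformed])
```

Differences between two clr values equal the log of the ratio of the corresponding parts, and rescaling the whole composition leaves the result unchanged, which is why such transformed data suit the study of ratios and subcompositions.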
Abstract:
The classical binary classification problem is investigated when it is known in advance that the posterior probability function (or regression function) belongs to some class of functions. We introduce and analyze a method which effectively exploits this knowledge. The method is based on minimizing the empirical risk over a carefully selected "skeleton" of the class of regression functions. The skeleton is a covering of the class based on a data-dependent metric, especially fitted for classification. A new scale-sensitive dimension is introduced which is more useful for the studied classification problem than other, previously defined, dimension measures. This fact is demonstrated by performance bounds for the skeleton estimate in terms of the new dimension.
Abstract:
BACKGROUND: Prognosis prediction for resected primary colon cancer is based on the Tumor Node Metastasis (TNM) staging system. We investigated if four well-documented gene expression risk scores can improve patient stratification. METHODS: Microarray-based versions of risk-scores were applied to a large independent cohort of 688 stage II/III tumors from the PETACC-3 trial. Prognostic value for relapse-free survival (RFS), survival after relapse (SAR), and overall survival (OS) was assessed by regression analysis. Improvement over a reference prognostic model was assessed with the area under the curve (AUC) of receiver operating characteristic (ROC) curves. All statistical tests were two-sided, except the AUC increase. RESULTS: All four risk scores (RSs) showed a statistically significant association (single-test, P < .0167) with OS or RFS in univariate models, but with HRs below 1.38 per interquartile range. Three scores were predictors of shorter RFS, one of shorter SAR. Each RS could only marginally improve an RFS or OS model with the known factors T-stage, N-stage, and microsatellite instability (MSI) status (AUC gains < 0.025 units). The pairwise interscore discordance was never high (maximal Spearman correlation = 0.563). A combined score showed a trend to higher prognostic value and higher AUC increase for OS (HR = 1.74, 95% confidence interval [CI] = 1.44 to 2.10, P < .001, AUC from 0.6918 to 0.7321) and RFS (HR = 1.56, 95% CI = 1.33 to 1.84, P < .001, AUC from 0.6723 to 0.6945) than any single score. CONCLUSIONS: The four tested gene expression-based risk scores provide prognostic information but contribute only marginally to improving models based on established risk factors. A combination of the risk scores might provide more robust information. Predictors of RFS and SAR might need to be different.
Abstract:
Detecting local differences between groups of connectomes is a great challenge in neuroimaging, because of the large number of tests that have to be performed and their impact on multiplicity correction. Any available information should be exploited to increase the power of detecting true between-group effects. We present an adaptive strategy that exploits the data structure and the prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and we apply a screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data-driven algorithm. Given the results of the screening step, a filtering is performed to seek real differences at the node/connection level. The proposed strategy could be used to strongly control either the family-wise error rate or the false discovery rate. We show by means of different simulations the benefit of the proposed strategy, and we present a real application of comparing connectomes of preschool children and adolescents.
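For readers unfamiliar with false discovery rate control, the sketch below shows the standard Benjamini-Hochberg step-up procedure applied to a flat list of p-values. It is not the authors' adaptive screening-and-filtering strategy, which the abstract does not describe in enough detail to reproduce; the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.
    Returns a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value lies under the line q * rank / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values, e.g. one per connection tested between two groups.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(p_values, q=0.05)
print(decisions)
```

With many thousands of connections the per-test threshold becomes very small, which is the loss of power that motivates screening at the subnetwork level first.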
Abstract:
Accurate determination of subpopulation sizes in bimodal populations remains problematic, yet it represents a powerful way by which cellular heterogeneity under different environmental conditions can be compared. So far, most studies have relied on qualitative descriptions of population distribution patterns, on population-independent descriptors, or on arbitrary placement of thresholds distinguishing biological ON from OFF states. We found that all these methods fall short of accurately describing small subpopulation sizes in bimodal populations. Here we propose a simple, statistics-based method for the analysis of small subpopulation sizes for use in the free software environment R and test this method on real as well as simulated data. Four so-called population splitting methods were designed with different algorithms that can estimate subpopulation sizes from bimodal populations. All four methods proved more precise than previously used methods when analyzing subpopulation sizes of transfer competent cells arising in populations of the bacterium Pseudomonas knackmussii B13. The methods' resolving powers were further explored by bootstrapping and simulations. Two of the methods were not severely limited by the proportions of subpopulations they could estimate correctly, but the two others only allowed accurate subpopulation quantification when this amounted to less than 25% of the total population. In contrast, only one method was still sufficiently accurate with subpopulations smaller than 1% of the total population. This study proposes a number of rational approximations to quantifying small subpopulations and offers an easy-to-use protocol for their implementation in the open source statistical software environment R.
Abstract:
OBJECTIVE: To set up an international cohort of patients with suspected Behçet's disease (BD). The cohort is aimed at defining an algorithm for definition of the disease in children. METHODS: International experts have defined the inclusion criteria as follows: recurrent oral aphthosis (ROA) plus one of the following: genital ulceration, erythema nodosum, folliculitis, pustulous/acneiform lesions, positive pathergy test, uveitis, venous/arterial thrombosis and family history of BD. Onset of disease is <16 years, disease duration is ≤3 years, future follow-up duration is ≥4 years and informed consent is obtained. The expert committee has classified the included patients into: definite paediatric BD (PED-BD), probable PED-BD and no PED-BD. Statistical analysis is performed to compare the three groups of patients. Centres document their patients into a single database. RESULTS: As of January 2010, 110 patients (56 males/54 females) have been included. Mean age at first symptom: 8.1 years (median 8.2 years). At inclusion, 38% had only one symptom associated with ROA, 31% had two and 31% had three or more symptoms. A total of 106 first evaluations have been done. Seventeen patients underwent the first-year evaluation, and 36 had no new symptoms, 12 had one and 9 had two. Experts have examined 48 files and classified 30 as definite and 18 as probable. Twenty-six patients classified as definite fulfilled the International Study Group criteria. Seventeen patients classified as probable did not meet the international criteria. CONCLUSION: The expert committee has classified the majority of patients in the BD group although they presented with few symptoms independently of BD classification criteria.
Abstract:
This paper exploits an unusual transportation setting to estimate the value of a statistical life (VSL). We estimate the trade-offs individuals are willing to make between mortality risk and cost as they travel to and from the international airport in Sierra Leone (which is separated from the capital Freetown by a body of water). Travelers choose from among multiple transport options, namely ferry, helicopter, hovercraft, and water taxi. The setting and original dataset allow us to address some typical omitted variable concerns in order to generate some of the first revealed preference VSL estimates from Africa. The data also allows us to compare VSL estimates for travelers from 56 countries, including 20 African and 36 non-African countries, all facing the same choice situation. The average VSL estimate for African travelers in the sample is US$577,000 compared to US$924,000 for non-Africans. Individual characteristics, particularly job earnings, can largely account for the difference between Africans and non-Africans; Africans in the sample typically earn somewhat less. There is little evidence that individual VSL estimates are driven by a lack of information, predicted life expectancy, or cultural norms around risk-taking or fatalism. The data implies an income elasticity of the VSL of 1.77. These revealed preference VSL estimates from a developing country fill an important gap in the existing literature, and can be used for a variety of public policy purposes, including in current debates within Sierra Leone regarding the desirability of constructing new transportation infrastructure.
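The arithmetic behind a revealed preference VSL can be sketched in a few lines: the extra cost someone accepts divided by the reduction in fatality risk it buys, and a constant-elasticity rule for how the VSL scales with income. The fare difference, the risk difference and the constant-elasticity functional form are assumptions made for illustration; only the elasticity of 1.77 comes from the abstract.

```python
def implied_vsl(extra_cost, risk_reduction):
    """VSL implied by paying `extra_cost` to lower the probability of death by `risk_reduction`."""
    return extra_cost / risk_reduction

def scale_vsl(base_vsl, base_income, new_income, elasticity):
    """Scale a VSL to another income level, assuming a constant income elasticity."""
    return base_vsl * (new_income / base_income) ** elasticity

# Hypothetical choice: a traveler pays US$30 more for a transport option
# whose fatality risk per trip is lower by 5 in 100,000.
vsl = implied_vsl(extra_cost=30.0, risk_reduction=5e-5)
print(round(vsl))

# With the elasticity of 1.77 reported in the abstract, doubling income
# raises the VSL by a factor of 2 ** 1.77 under the assumed functional form.
vsl_doubled_income = scale_vsl(vsl, base_income=1.0, new_income=2.0, elasticity=1.77)
print(round(vsl_doubled_income / vsl, 2))
```

An elasticity above 1 means the VSL rises more than proportionally with income, which is consistent with earnings accounting for much of the gap between the two traveler groups.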
Abstract:
BACKGROUND: As part of EUROCAT's surveillance of congenital anomalies in Europe, a statistical monitoring system has been developed to detect recent clusters or long-term (10-year) time trends. The purpose of this article is to describe the system for the identification and investigation of 10-year time trends, conceived as a "screening" tool ultimately leading to the identification of trends which may be due to changing teratogenic factors. METHODS: The EUROCAT database consists of all cases of congenital anomalies including livebirths, fetal deaths from 20 weeks gestational age, and terminations of pregnancy for fetal anomaly. Monitoring of 10-year trends is performed for each registry for each of 96 non-independent EUROCAT congenital anomaly subgroups, while Pan-Europe analysis combines data from all registries. The monitoring results are reviewed, prioritized according to a prioritization strategy, and communicated to registries for investigation. Twenty-one registries covering over 4 million births, from 1999 to 2008, were included in monitoring in 2010. CONCLUSIONS: Significant increasing trends were detected for abdominal wall anomalies, gastroschisis, hypospadias, Trisomy 18 and renal dysplasia in the Pan-Europe analysis while 68 increasing trends were identified in individual registries. A decreasing trend was detected in over one-third of anomaly subgroups in the Pan-Europe analysis, and 16.9% of individual registry tests. Registry preliminary investigations indicated that many trends are due to changes in data quality, ascertainment, screening, or diagnostic methods. Some trends are inevitably chance phenomena related to multiple testing, while others seem to represent real and continuing change needing further investigation and response by regional/national public health authorities.
Abstract:
SUMMARY: We present a tool designed for visualization of large-scale genetic and genomic data exemplified by results from genome-wide association studies. This software provides an integrated framework to facilitate the interpretation of SNP association studies in genomic context. Gene annotations can be retrieved from Ensembl, linkage disequilibrium data downloaded from HapMap and custom data imported in BED or WIG format. AssociationViewer integrates functionalities that enable the aggregation or intersection of data tracks. It implements an efficient cache system and allows the display of several, very large-scale genomic datasets. AVAILABILITY: The Java code for AssociationViewer is distributed under the GNU General Public Licence and has been tested on Microsoft Windows XP, MacOSX and GNU/Linux operating systems. It is available from the SourceForge repository. This also includes Java webstart, documentation and example datafiles.