981 results for Proportion Data
Abstract:
1. Pearson's correlation coefficient only tests whether the data fit a linear model. With large numbers of observations, quite small values of r become significant and the X variable may only account for a minute proportion of the variance in Y. Hence, the value of r squared should always be calculated and included in a discussion of the significance of r. 2. The use of r assumes that a bivariate normal distribution is present and this assumption should be examined prior to the study. If Pearson's r is not appropriate, then a non-parametric correlation coefficient such as Spearman's rs may be used. 3. A significant correlation should not be interpreted as indicating causation, especially in observational studies in which there is a high probability that the two variables are correlated because of their mutual correlations with other variables. 4. In studies of measurement error, there are problems in using r as a test of reliability and the ‘intra-class correlation coefficient’ should be used as an alternative. A correlation test provides only limited information as to the relationship between two variables. Fitting a regression line to the data using the method known as ‘least squares’ provides much more information and the methods of regression and their application in optometry will be discussed in the next article.
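Point 1 of the abstract above can be illustrated with a short sketch. It is not taken from the article: the function names and the value r = 0.1 are assumptions chosen for illustration. It computes Pearson's r from raw data and the usual t statistic for testing r = 0 (with n - 2 degrees of freedom), then shows that the same small r is non-significant at n = 20 but significant at n = 2000, while r squared stays at about 1% of the variance in both cases.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))

if __name__ == "__main__":
    r = 0.1  # illustrative value, not from the article
    for n in (20, 2000):
        # The two-sided 5% critical value is about 2.10 for 18 df
        # and about 1.96 for very large df.
        print(f"n={n}: t={t_statistic(r, n):.2f}, r^2={r * r:.2f}")
```

Reporting r squared alongside the test makes the gap between statistical significance and explained variance visible, which is the article's recommendation.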
Abstract:
Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
Abstract:
Purpose - Measurements obtained from the right and left eye of a subject are often correlated whereas many statistical tests assume observations in a sample are independent. Hence, data collected from both eyes cannot be combined without taking this correlation into account. Current practice is reviewed with reference to articles published in three optometry journals, viz., Ophthalmic and Physiological Optics (OPO), Optometry and Vision Science (OVS), Clinical and Experimental Optometry (CEO) during the period 2009–2012. Recent findings - Of the 230 articles reviewed, 148/230 (64%) obtained data from one eye and 82/230 (36%) from both eyes. Of the 148 one-eye articles, the right eye, left eye, a randomly selected eye, the better eye, the worse or diseased eye, or the dominant eye were all used as selection criteria. Of the 82 two-eye articles, the analysis utilized data from: (1) one eye only rejecting data from the adjacent eye, (2) both eyes separately, (3) both eyes taking into account the correlation between eyes, or (4) both eyes using one eye as a treated or diseased eye, the other acting as a control. In a proportion of studies, data were combined from both eyes without correction. Summary - It is suggested that: (1) investigators should consider whether it is advantageous to collect data from both eyes, (2) if one eye is studied and both are eligible, then it should be chosen at random, and (3) two-eye data can be analysed incorporating eyes as a ‘within subjects’ factor.
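Recommendation (2) in the summary above, choosing the study eye at random when both are eligible, can be sketched as follows. The function name, subject identifiers and seed are hypothetical; the article does not prescribe a particular procedure.

```python
import random

def pick_study_eye(subject_ids, seed=None):
    """Assign one randomly chosen eye ("right" or "left") to each subject.

    A fixed seed makes the allocation reproducible so it can be documented.
    """
    rng = random.Random(seed)
    return {sid: rng.choice(("right", "left")) for sid in subject_ids}

if __name__ == "__main__":
    allocation = pick_study_eye(["S01", "S02", "S03", "S04"], seed=2012)
    for sid, eye in allocation.items():
        print(sid, eye)
```

Recording the seed with the study data lets the allocation be audited later, which a habitual choice of the right eye does not allow.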
Abstract:
It is generally believed that the structural reforms that were introduced in India following the macro-economic crisis of 1991 ushered in competition and forced companies to become more efficient. However, whether the post-1991 growth is an outcome of more efficient use of resources or greater use of factor inputs remains an open empirical question. In this paper, we use plant-level data from 1989–1990 and 2000–2001 to address this question. Our results indicate that while there was an increase in the productivity of factor inputs during the 1990s, most of the growth in value added is explained by growth in the use of factor inputs. We also find that median technical efficiency declined in all but one of the industries between 1989–1990 and 2000–2001, and that change in technical efficiency explains a very small proportion of the change in gross value added.
Abstract:
The article focuses on the labour market situation and opportunities of Hungarian vocational students. After briefly placing the topic in an international context, the study introduces the findings of Hungarian empirical research. Because national education systems differ, international comparisons are not easy; I therefore chose former socialist countries with characteristics similar to those of Hungary. Comparing the relevant data made it clear that obtaining a diploma provides more advantages in Hungary. Hungarian research suggests that vocational schools mostly attract students with poor competence test scores at the end of primary school, and a significant proportion of these students are disadvantaged. Vocational students are the most likely to drop out of the system, and their later return to school is sporadic at best. Although completing VET improves their employment conditions and prospects, many graduates leave their profession or take unskilled work. Their labour income varies greatly depending on their trade and the experience they have gained.
Abstract:
The Buchans ore bodies of central Newfoundland represent some of the highest-grade VMS deposits ever mined. These Kuroko-type deposits are also known for the well-developed and preserved nature of the mechanically transported deposits. The deposits are hosted in Cambro-Ordovician, dominantly calc-alkaline, bimodal volcanic and epiclastic sequences of the Notre Dame Subzone, Newfoundland Appalachians. Stratigraphic relationships in this zone are complicated by extensively developed, brittle-dominated Silurian thrust faulting. Hydrothermal alteration of host rocks is a common feature of nearly all VMS deposits, and the recognition of these zones has been a key exploration tool. Alteration of host rocks has long been described as spatially associated with the Buchans ore bodies, most notably with the larger in-situ deposits. This report represents a base-line study in which a complete documentation of the geochemical variance, in terms of both primary (igneous) and alteration effects, is presented from altered volcanic rocks in the vicinity of the Lucky Strike deposit (LSZ), the largest in-situ deposit in the Buchans camp. Packages of altered rocks also occur away from the immediate mining areas and constitute new targets for exploration. These zones, identified mostly by recent and previous drilling, represent untested targets and include the Powerhouse (PHZ), Woodmans Brook (WBZ) and Airport (APZ) alteration zones, as well as the Middle Branch alteration zone (MBZ), which represents a more distal alteration facies related to Buchans ore-formation. Data from each of these zones were compared to those from the LSZ in order to evaluate their relative prospectivity. Derived lithogeochemical data served two functions: (i) to define primary (igneous) trends and (ii) to define secondary alteration trends. Primary trends were established using immobile, or conservative, elements (i.e., HFSE, REE, Th, TiO₂, Al₂O₃, P₂O₅).
From these, altered volcanic rocks were interpreted in terms of composition (e.g., basalt - rhyodacite) and magmatic affinity (e.g., calc-alkaline vs. tholeiitic). The information suggests that bimodality is a common feature of all zones, with most rocks plotting as either basalt/andesite or dacite (or rhyodacite); andesitic sensu stricto compositions are rare. Magmatic affinities are more varied and complex, but indicate that all units are arc volcanic sequences. Rocks from the LSZ/MBZ represent a transitional to calc-alkalic sequence; however, a slight shift in key geochemical discriminants occurs from the foot-wall to the hanging-wall. Specifically, mafic and felsic lavas of the foot-wall are of transitional (or mildly calc-alkaline) affinity whereas the hanging-wall rocks are relatively more strongly calc-alkaline, as indicated by enriched LREE/HREE and higher ZrN, NbN and other ratios in the latter. The geochemical variations also serve as a means to separate the units (at least the felsic rocks) into hanging-wall and foot-wall sequences, therefore providing a valuable exploration tool. Volcanic rocks from the WBZ/PHZ (and probably the APZ) are more typical of tholeiitic to transitional suites, yielding flatter mantle-normalized REE patterns and lower ZrN ratios. Thus, the relationships between the immediate mining area (represented by LSZ/MBZ) and the Buchans East (PHZ/WBZ) and the APZ are uncertain. Host rocks for all zones consist of mafic to felsic volcanic rocks, though the proportion of pyroclastic and epiclastic rocks is greatest at the LSZ. Phenocryst assemblages and textures are common in all zones, with minor exceptions, and are not useful for discrimination purposes. Felsic rocks from all zones are dominated by sericite-clay ± silica alteration, whereas mafic rocks are dominated by chlorite-quartz-sericite alteration. Pyrite is ubiquitous in all moderately altered rocks and minor associated base metal sulphides occur locally.
The exception is at Lucky Strike, where stockwork quartz veining contains abundant base-metal mineralization and barite. Rocks composed entirely of chlorite (chloritite) also occur in the LSZ foot-wall. In addition, K-feldspar alteration occurs in felsic volcanic rocks at the MBZ associated with Zn-Pb-Ba and, notably, without chlorite. This zone represents a peripheral, but proximal, zone of alteration induced by lower temperature hydrothermal fluids, presumably with little influence from seawater. Alteration geochemistry was interpreted from raw data as well as from mass balanced (recalculated) data derived from immobile element pairs. The data from the LSZ/MBZ indicate a range in the degree of alteration from only minor to severe modification of precursor compositions. Ba tends to show a strong positive correlation with K₂O, although most Ba occurs as barite. With respect to mass changes, Al₂O₃, TiO₂ and P₂O₅ were shown to be immobile. Nearly all rocks display mass loss of Na₂O, CaO, and Sr reflecting feldspar destruction. These trends are usually mirrored by K₂O-Rb and MgO addition, indicating sericitic and chloritic alteration, respectively. More substantial gains of K₂O often occur in rocks with K-feldspar alteration, whereas a few samples also displayed excessive MgO enrichment and represent chloritites. Fe₂O₃ indicates both chlorite and sulphide formation. SiO₂ addition is almost always the case for the altered mafic rocks as silica often infills amygdules and replaces the finer tuffaceous material. The felsic rocks display more variability in SiO₂. Silicic, sericitic and chloritic alteration trends were observed from the other zones, but not K-feldspar, chloritite, or barite.
Microprobe analysis of chlorites, sericites and carbonates indicates: (i) sericites from all zones are defined as muscovite and are not phengitic; (ii) at the LSZ, chlorites ranged from Fe-Mg chlorites (pycnochlorite) to Mg-rich chlorite (penninite), with the latter occurring in the stockwork zone and more proximal alteration facies; (iii) chlorites from the WBZ were typical of those from the more distal alteration facies of the LSZ, plotting as ripidolite to pycnochlorite; (iv) conversely, chlorite from the PHZ plots with Mg-Al-rich compositions (clinochlore to penninite); and (v) carbonate species from each zone are also varied, with calcite occurring in each zone, in addition to dolomite and ankerite in the PHZ and WBZ, respectively. Lead isotope ratios for galena separates from the various zones, when combined with data from older studies, tend to cluster into four distinctive fields. Overall, the data plot on a broad mixing line and indicate evolution in a relatively low-μ environment. Data from sulphide stringers in altered MBZ rocks, as well as from clastic sulphides (Sandfill prospect), plot in the Buchans ore field, as do the data for galena from altered rocks in the APZ. Samples from the Buchans East area are even more primitive than the Buchans ores, with lead from the PHZ plotting with the Connel Option prospect and data from the WBZ matching that of the Skidder prospect. A sample from a newly discovered debris flow-type sulphide occurrence (Middle Branch East) yields lead isotope ratios that are slightly more radiogenic than Buchans and plot with the Mary March alteration zone. Data within each cluster are interpreted to represent derivation from individual hydrothermal systems in which metals were derived from a common source.
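The abstract above mentions mass-balanced (recalculated) data derived from immobile elements but does not state the procedure. The sketch below shows one common single-precursor formulation, given here as an assumption and not as the report's method: each altered concentration is rescaled by the ratio of an immobile element in the precursor to the same element in the altered rock, and the mass change is the rescaled value minus the precursor value. All concentrations are invented for illustration.

```python
def mass_changes(precursor, altered, immobile="Al2O3"):
    """Mass change per component, assuming `immobile` was neither gained nor lost.

    precursor, altered: dicts of concentrations (e.g. wt%) keyed by component.
    Returns a dict of (reconstructed - precursor) in the same units.
    """
    factor = precursor[immobile] / altered[immobile]
    reconstructed = {name: value * factor for name, value in altered.items()}
    return {name: reconstructed[name] - precursor[name] for name in precursor}

if __name__ == "__main__":
    # Hypothetical compositions, not data from the report.
    fresh = {"Al2O3": 15.0, "Na2O": 4.0, "K2O": 1.0}
    altered = {"Al2O3": 18.0, "Na2O": 0.6, "K2O": 3.6}
    for name, delta in mass_changes(fresh, altered).items():
        print(f"{name}: {delta:+.2f}")
```

With these invented numbers the sketch returns a Na₂O loss and a K₂O gain, the same qualitative pattern the abstract attributes to feldspar destruction and sericitic alteration.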
Abstract:
The Yari-Hotaka Mountain Range is one of the most famous formerly glaciated areas of Japan. Many glacial landforms remain in three neighbouring U-shaped valleys, named Yarisawa, Yokoo and Migimata. Moraines and outwash terraces can be classified into four groups according to their location and to the amount of glacial quartz grains contained in the deposits. Glaciation before 100 000 years B.P. has been demonstrated for other parts of the Northern Japanese Alps, but not for the Yari-Hotaka Mountain Range, because the corresponding glacial landforms cannot be found there. The oldest known stage, the Ichinomata stage (around 60 000 years B.P.), corresponds to the Yokoo glacial, which is documented throughout the Japanese Alps. The three younger stages, Babadaira stage (before 30 000 years B.P.), Yarisawa stage I (about 30 000 years B.P.) and Yarisawa stage II (about 15 000 years B.P.), belong to the Karasawa glacial. About 10 000 years B.P. the glaciers melted away. At all times the influence of relief was especially important for the mass balances of Japanese glaciers. Snow drifted by wind from the west-facing windward slopes onto the east-facing (lee) slopes, together with voluminous snow accumulation by avalanches from the high rocky walls onto the glacier surfaces beneath, produced glaciers at very low altitudes as well as low equilibrium-lines. In most cases the snow-lines were situated 100 m or more above the equilibrium-lines. During the Ichinomata stage the snow-line reached an altitude of 2400-2450 m. It rose about 100 m to the Babadaira stage, 300 m to Yarisawa stage I and about 450 m to Yarisawa stage II. At present the snow-line is situated above the Northern Japanese Alps at over 4000 m. Therefore only perennial snow-patches exist. If the snow-line were to drop by a few hundred meters, this region would be highly interesting for studies on the beginning of mountain glaciation.
Abstract:
Robust joint modelling is an emerging field of research. Through the advancements in electronic patient healthcare records, the popularity of joint modelling approaches has grown rapidly in recent years, providing simultaneous analysis of longitudinal and survival data. This research advances previous work through the development of a novel robust joint modelling methodology for one of the most common types of standard joint models, namely the type that links a linear mixed model with a Cox proportional hazards model. Through t-distributional assumptions, longitudinal outliers are accommodated with their detrimental impact being down-weighted, thus providing more efficient and reliable estimates. The robust joint modelling technique and its major benefits are showcased through the analysis of Northern Irish end stage renal disease patients. With an ageing population and growing prevalence of chronic kidney disease within the United Kingdom, there is a pressing demand to investigate the detrimental relationship between the changing haemoglobin levels of haemodialysis patients and their survival. As outliers within the NI renal data were found to have significantly worse survival, identification of outlying individuals through robust joint modelling may help nephrologists to improve patients' survival. A simulation study was also undertaken to explore the difference between robust and standard joint models in the presence of increasing proportions and extremity of longitudinal outliers. More efficient and reliable estimates were obtained by robust joint models, with increasing contrast between the robust and standard joint models when a greater proportion of more extreme outliers is present. Through illustration of the gains in efficiency and reliability of parameters when outliers exist, the potential of robust joint modelling is evident.
The research presented in this thesis highlights the benefits and stresses the need to utilise a more robust approach to joint modelling in the presence of longitudinal outliers.
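The down-weighting of outliers under t-distributional assumptions, described in the abstract above, can be illustrated with the standard scale-mixture view of the Student-t distribution. The thesis's own estimation scheme is not described here, so this is a generic sketch: the function name and the choice of 4 degrees of freedom are assumptions. For a standardised residual r and ν degrees of freedom, the implied weight on a univariate observation is (ν + 1) / (ν + r²).

```python
def t_weight(std_resid, nu=4.0, dim=1):
    """Weight implied by a Student-t error model for a standardised residual.

    nu:  degrees of freedom (smaller values mean heavier tails).
    dim: dimension of the observation (1 for a single measurement).
    """
    return (nu + dim) / (nu + std_resid ** 2)

if __name__ == "__main__":
    for resid in (0.0, 1.0, 2.0, 5.0):
        print(f"residual {resid:>4}: weight {t_weight(resid):.3f}")
```

As ν grows the weights approach 1 for every residual, which recovers the normal-errors model; small ν shrinks the influence of large residuals.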
Abstract:
We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi-Poisson hierarchical generalized linear model (HGLM), zero-inflated Poisson (ZIP), and hurdle models. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi-Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count responses with an extremely high proportion of zeros and underdispersion can be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
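The zero-inflated Poisson and hurdle models named in the abstract above differ in how they generate zeros. The sketch below gives the textbook probability mass functions without spatial random effects; it is not the authors' R code, and the parameter values are invented. It also computes the ZIP mean and variance, which shows why a ZIP model cannot represent underdispersion: its variance is never below its mean.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing count k with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise an ordinary Poisson count (which may also be zero)."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, p_zero):
    """Hurdle model: zero with probability p_zero, otherwise a
    zero-truncated Poisson count."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))

def zip_mean_var(lam, pi):
    """Mean and variance of the zero-inflated Poisson distribution."""
    mean = (1 - pi) * lam
    var = mean * (1 + pi * lam)
    return mean, var

if __name__ == "__main__":
    lam, pi = 1.5, 0.65  # hypothetical values giving more than 70% zeros
    print("P(Y = 0) under ZIP:", round(zip_pmf(0, lam, pi), 3))
    print("ZIP mean, variance:", zip_mean_var(lam, pi))
```

Because the ZIP variance equals the mean multiplied by (1 + πλ), it exceeds the mean whenever π and λ are positive, consistent with the abstract's statement that ZIP accommodates only overdispersion.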
Abstract:
Cancer and cardiovascular diseases are the leading causes of death worldwide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of profound disturbance of normal cellular homeostasis. People suffering from or at high risk of these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, development of effective therapies is hindered by the challenges in identifying genetic and molecular determinants of the onset of diseases; and in cases where therapies already exist, the main challenge is to identify molecular determinants that drive resistance to the therapies. Due to the progress in sequencing technologies, access to large genome-wide biological data sets now extends far beyond a few experimental labs to the global research community. The unprecedented availability of the data has revolutionized the capabilities of computational researchers, enabling them to collaboratively address long-standing problems from many different perspectives. Likewise, this thesis tackles the two main public health related challenges using data-driven approaches. Numerous association studies have been proposed to identify genomic variants that determine disease. However, their clinical utility remains limited due to their inability to distinguish causal variants from associated variants. In this thesis, we first propose a simple scheme that improves association studies in a supervised fashion and has shown its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory genomic variants that explain the gene expression variance.
On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples, but also predicts gene expression more accurately than other methods. We demonstrate by simulation that eQTeL accurately detects causal regulatory SNPs, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal. The challenge of identifying molecular determinants of cancer resistance has so far been addressed only through labor-intensive and costly experimental studies, and for experimental drugs such studies are infeasible. Here we take a fundamentally different data-driven approach to understand the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions termed synthetic rescues (SR) in cancer, which denotes a functional interaction between two genes where a change in the activity of one vulnerable gene (which may be a target of a cancer drug) is lethal, but subsequently altered activity of its partner rescuer gene restores cell viability. Next we describe a comprehensive computational framework, termed INCISOR, for identifying SR underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes.
We show that SRs can be utilized to successfully predict patients' survival and response to the majority of current cancer drugs and, importantly, to predict the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever-increasing high-throughput genomic data to address challenges in determining the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We discover novel molecular mechanistic insights that will advance the progress in early disease prevention and personalized therapeutics. Our analyses shed light on the fundamental biological understanding of gene regulation and interaction, and open up exciting avenues of translational applications in risk prediction and therapeutics.
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by integrating diverse databases in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Abstract:
Since insect species are poikilothermic organisms, they generally exhibit different growth patterns depending on the temperature at which they develop. This factor is important in forensic entomology, especially for estimating postmortem interval (PMI) when it is based on the developmental time of the insects reared in decomposing bodies. This study aimed to estimate the rates of development, viability, and survival of immatures of Sarcophaga (Liopygia) ruficornis (Fabricius 1794) and Microcerella halli (Engel 1931) (Diptera: Sarcophagidae) reared at different temperatures: 10, 15, 20, 25, 30, and 35 ± 1 °C. Bovine raw ground meat was offered as food for all experimental groups, each consisting of four replicates, in the proportion of 2 g/larva. To measure the evolution of growth, ten specimens of each group were randomly chosen and weighed every 12 h, from initial feeding larva to pupae, and then discarded. Considering the records of weight gain, survival rates, and stability of growth rates, the range of optimum temperature for the development of S. (L.) ruficornis is between 20 and 35 °C, and that of M. halli is between 20 and 25 °C. For both species, the longest times of development were at the lowest temperatures. The survival rate at extreme temperatures (10 and 35 °C) was lower in both species. Biological data such as those obtained in this study are of great importance for achieving a more accurate estimate of the PMI.