103 results for data mules


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Background: Systematic approaches for identifying proteins involved in different types of cancer are needed. Experimental techniques such as microarrays are being used to characterize cancer, but validating their results can be a laborious task. Computational approaches are used to prioritize among genes putatively involved in cancer, usually based on further analysis of experimental data. Results: We implemented a systematic method using the PIANA software that predicts cancer involvement of genes by integrating heterogeneous datasets. Specifically, we produced lists of genes likely to be involved in cancer by relying on: (i) protein-protein interactions; (ii) differential expression data; and (iii) structural and functional properties of cancer genes. The integrative approach that combines multiple sources of data obtained positive predictive values ranging from 23% (on a list of 811 genes) to 73% (on a list of 22 genes), outperforming the use of any of the data sources alone. We analyze a list of 20 cancer gene predictions, finding that most of them have been recently linked to cancer in the literature. Conclusion: Our approach to identifying and prioritizing candidate cancer genes can be used to produce lists of genes likely to be involved in cancer. Our results suggest that differential expression studies yielding high numbers of candidate cancer genes can be filtered using protein interaction networks.
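
The positive predictive value quoted above is the fraction of predicted genes that turn out to be true cancer genes, TP / (TP + FP). A minimal sketch of that calculation follows; the gene identifiers and the reference set are hypothetical toy data, and the function name is an assumption, not part of PIANA.

```python
def positive_predictive_value(predicted, known_positives):
    """Return TP / (TP + FP): the share of predictions that are known positives."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("the prediction list is empty")
    true_positives = len(predicted & set(known_positives))
    return true_positives / len(predicted)


# Hypothetical toy data: four predicted genes, two of them in the reference set.
predicted_genes = ["G1", "G2", "G3", "G4"]
known_cancer_genes = {"G1", "G3", "G9"}

ppv = positive_predictive_value(predicted_genes, known_cancer_genes)
print(f"PPV = {ppv:.2f}")  # prints "PPV = 0.50"
```

A shorter, more selective list tends to trade coverage for a higher value of this ratio, which is the pattern the abstract reports for its 811-gene and 22-gene lists.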

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data present a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system’s architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm-related data due to the system’s ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way, in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The spectral efficiency achievable with joint processing of pilot and data symbol observations is compared with that achievable through the conventional (separate) approach of first estimating the channel on the basis of the pilot symbols alone, and subsequently detecting the data symbols. Studied on the basis of a mutual information lower bound, joint processing is found to provide a non-negligible advantage relative to separate processing, particularly for fast fading. It is shown that, regardless of the fading rate, only a very small number of pilot symbols (at most one per transmit antenna and per channel coherence interval) should be transmitted if joint processing is allowed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix - the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories that have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix.
In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
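
To make the two codings concrete, here is a minimal sketch of coding one continuous value into three categories. It assumes triangular membership functions with three hinge points, and it assumes the crisp code marks the category with the largest membership; the abstract does not specify either choice, and all function names and numbers are illustrative.

```python
def fuzzy_code(x, low, mid, high):
    """Triangular memberships in (low, middle, high) categories; they sum to 1."""
    if not low < mid < high:
        raise ValueError("hinge points must satisfy low < mid < high")
    x = min(max(x, low), high)  # clamp values outside the hinge range
    if x <= mid:
        m = (x - low) / (mid - low)
        return (1.0 - m, m, 0.0)
    m = (x - mid) / (high - mid)
    return (0.0, 1.0 - m, m)


def crisp_code(x, low, mid, high):
    """Indicator (dummy) coding: 1 for the category with the largest membership."""
    memberships = fuzzy_code(x, low, mid, high)
    top = memberships.index(max(memberships))
    return tuple(1 if i == top else 0 for i in range(3))


def defuzzify(memberships, low, mid, high):
    """Recover a value as the membership-weighted average of the hinge points."""
    m_low, m_mid, m_high = memberships
    return m_low * low + m_mid * mid + m_high * high


# Hypothetical temperature of 15 with hinge points 0, 10 and 30.
fuzzy = fuzzy_code(15, 0, 10, 30)      # (0.0, 0.75, 0.25)
crisp = crisp_code(15, 0, 10, 30)      # (0, 1, 0)
restored = defuzzify(fuzzy, 0, 10, 30)  # 15.0
print(fuzzy, crisp, restored)
```

With triangular functions the defuzzified value reproduces the original observation inside the hinge range, whereas the crisp code keeps only the category, which illustrates why the fuzzy coding retains more of the original information.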

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This article reviews the methodology of studies on drug utilization, with particular emphasis on primary care. Population-based studies of drug inappropriateness can be done with microdata from Health Electronic Records and e-prescriptions. Multilevel models estimate the influence of factors affecting the appropriateness of drug prescription at different hierarchical levels: patient, doctor, health care organization and regulatory environment. Work by the GIUMAP suggests that patient characteristics are the most important factor in the appropriateness of prescriptions, with significant effects at the general practitioner level.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The aim of this paper is to analyse the impact of university knowledge and technology transfer activities on academic research output. Specifically, we study whether researchers with collaborative links with the private sector publish less than their peers without such links, after controlling for other sources of heterogeneity. We report findings from a longitudinal dataset on researchers from two engineering departments in the UK between 1985 and 2006. Our results indicate that researchers with industrial links publish significantly more than their peers. Academic productivity, though, is higher for low levels of industry involvement as compared to high levels.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Modern methods of compositional data analysis are not well known in biomedical research. Moreover, there appear to be few mathematical and statistical researchers working on compositional biomedical problems. Like the earth and environmental sciences, biomedicine has many problems in which the relevant scientific information is encoded in the relative abundance of key species or categories. I introduce three problems in cancer research in which analysis of compositions plays an important role. The problems involve 1) the classification of serum proteomic profiles for early detection of lung cancer, 2) inference of the relative amounts of different tissue types in a diagnostic tumor biopsy, and 3) the subcellular localization of the BRCA1 protein, and its role in breast cancer patient prognosis. For each of these problems I outline a partial solution. However, none of these problems is "solved". I attempt to identify areas in which additional statistical development is needed, with the hope of encouraging more compositional data analysts to become involved in biomedical research.
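
As a rough illustration of what "information encoded in relative abundance" means in practice, the sketch below applies closure and the centered log-ratio (clr) transform, a standard tool in compositional data analysis. The abstract does not say which transform its author uses, so this choice and the toy abundances are assumptions.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical abundances of three tissue types, measured on two different scales.
sample_a = clr([1.0, 2.0, 7.0])
sample_b = clr([10.0, 20.0, 70.0])
print(sample_a)
print(sample_b)
```

Both calls give the same coordinates up to rounding, because only the ratios between parts matter, and the clr coordinates of any composition sum to zero.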

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The aim of this talk is to convince the reader that there are a lot of interesting statistical problems in present-day life science data analysis which seem ultimately connected with compositional statistics. Key words: SAGE, cDNA microarrays, (1D-)NMR, virus quasispecies

Relevance:

20.00% 20.00%

Publisher:

Abstract:

It is common in econometric applications that several hypothesis tests are carried out at the same time. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. In this paper, we suggest a stepwise multiple testing procedure which asymptotically controls the familywise error rate at a desired level. Compared to related single-step methods, our procedure is more powerful in the sense that it often will reject more false hypotheses. In addition, we advocate the use of studentization when it is feasible. Unlike some stepwise methods, our method implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect alternative hypotheses. We prove our method asymptotically controls the familywise error rate under minimal assumptions. We present our methodology in the context of comparing several strategies to a common benchmark and deciding which strategies actually beat the benchmark. However, our ideas can easily be extended and/or modified to other contexts, such as making inference for the individual regression coefficients in a multiple regression framework. Some simulation studies show the improvements of our methods over previous proposals. We also provide an application to a set of real data.
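
The difference between single-step and stepwise control of the familywise error rate can be sketched with two textbook procedures. This is not the procedure proposed in the abstract: Holm's step-down method is swapped in as a familiar stand-in, it ignores the dependence structure that the paper's method exploits, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Single-step rule: reject every hypothesis with p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_stepdown(p_values, alpha=0.05):
    """Holm's step-down rule: test sorted p-values against alpha / (m - step)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, index in enumerate(order):
        if p_values[index] <= alpha / (m - step):
            rejected[index] = True
        else:
            break  # stop at the first hypothesis that cannot be rejected
    return rejected


# Hypothetical p-values for four strategies compared with a benchmark.
p_values = [0.001, 0.015, 0.04, 0.2]
single_step = bonferroni(p_values)
stepwise = holm_stepdown(p_values)
print(single_step)  # [True, False, False, False]
print(stepwise)     # [True, True, False, False]
```

The threshold relaxes after each rejection, so the stepwise rule rejects everything the single-step rule does and sometimes more, which is the sense of "more powerful" used in the abstract.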

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We consider the application of normal theory methods to the estimation and testing of a general type of multivariate regression models with errors-in-variables, in the case where various data sets are merged into a single analysis and the observable variables possibly deviate from normality. The various samples to be merged can differ on the set of observable variables available. We show that there is a convenient way to parameterize the model so that, despite the possible non-normality of the data, normal-theory methods yield correct inferences for the parameters of interest and for the goodness-of-fit test. The theory described encompasses both the functional and structural model cases, and can be implemented using standard software for structural equations models, such as LISREL, EQS, LISCOMP, among others. An illustration with Monte Carlo data is presented.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we study the disability transition probabilities (as well as the mortality probabilities) due to factors concurrent with age, such as income, gender and education. Although it is well known that ageing and socioeconomic status influence the probability of developing functional disorders, surprisingly little attention has been paid to the combined effect of those factors over the individual's life and how this affects the transition from one degree of disability to another. The assumption that tomorrow's disability state is only a function of today's state is very strong, since disability is a complex variable that depends on several elements other than time. This paper contributes to the field in two ways: (1) by attending to the distinction between the initial disability level and the process that leads to its course, and (2) by addressing whether and how education, age and income differentially affect the disability transitions. Using a discrete Markov chain model and a survival analysis, we estimate, by year and individual characteristics, the probability of a change in disability state and the time its progression takes in each case. We find that people with an initial state of disability have a higher propensity to change and take less time to move between stages. Men do so more frequently than women. Education and income have negative effects on transition. Moreover, we consider the disability benefits associated with those changes across the different stages of disability, and therefore we offer some clues on the potential savings of preventive actions that may delay or avoid those transitions. On pure cost considerations, preventive programs for improvement show higher benefits than those for preventing deterioration, and in general terms, those focusing on individuals below 65 should go first. Finally, the trend of disability in Spain does not seem to change across years, and regional differences are not found.
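
The basic quantity behind a discrete Markov chain model of this kind is a matrix of transition probabilities between disability states. The sketch below estimates one by counting observed year-to-year transitions; the state labels and histories are hypothetical, and covariates such as income, gender and education, which the paper does model, are left out.

```python
from collections import Counter


def estimate_transition_matrix(sequences, states):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = Counter()
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for current in states:
        row_total = sum(counts[(current, following)] for following in states)
        matrix[current] = {
            following: (counts[(current, following)] / row_total if row_total else 0.0)
            for following in states
        }
    return matrix


# Hypothetical yearly disability states for three individuals.
states = ["none", "moderate", "severe"]
histories = [
    ["none", "none", "moderate", "severe"],
    ["none", "moderate", "moderate", "severe"],
    ["moderate", "severe", "severe"],
]

matrix = estimate_transition_matrix(histories, states)
for current in states:
    print(current, matrix[current])
```

In this toy data three of the four transitions out of "moderate" lead to "severe", so the estimated probability for that move is 0.75. Estimating a separate matrix per subgroup would be one simple way to compare transitions by individual characteristics.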

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The growth of pharmaceutical expenditure and its prediction are a major concern for policy makers and health care managers. This paper explores different predictive models to estimate future drug expenses, using individual demographic and morbidity information from an integrated healthcare delivery organization in Catalonia for the years 2002 and 2003. The morbidity information consists of codified health encounters grouped through the Clinical Risk Groups (CRGs). We estimate pharmaceutical costs using several model specifications, and CRGs as risk adjusters, providing an alternative way of obtaining high predictive power comparable to other estimations of drug expenditures in the literature. These results have clear implications for the use of risk adjustment and CRGs in setting the premiums for pharmaceutical benefits.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A method to estimate DSGE models using the raw data is proposed. The approach links the observables to the model counterparts via a flexible specification which does not require the model-based component to be solely located at business cycle frequencies, allows the non model-based component to take various time series patterns, and permits model misspecification. Applying standard data transformations induces biases in structural estimates and distortions in the policy conclusions. The proposed approach recovers important model-based features in selected experimental designs. Two widely discussed issues are used to illustrate its practical use.