159 results for monotone missing data
Abstract:
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Objective: To devise more effective physical activity interventions, the mediating mechanisms yielding behavioral change need to be identified. The Baron-Kenny method is most commonly used, but has low statistical power and may not identify mechanisms of behavioral change in small-to-medium size studies. More powerful statistical tests are available. Study Design and Setting: Inactive adults (N = 52) were randomized to either a print or a print-plus-telephone intervention. Walking and exercise-related social support were assessed at baseline, after the intervention, and 4 weeks later. The Baron-Kenny and three alternative methods of mediational analysis (Freedman-Schatzkin; MacKinnon et al.; bootstrap method) were used to examine the effects of social support on initial behavior change and maintenance. Results: A significant mediational effect of social support on initial behavior change was indicated by the MacKinnon et al., bootstrap, and, marginally, Freedman-Schatzkin methods, but not by the Baron-Kenny method. No significant mediational effect of social support on maintenance of walking was found. Conclusions: Methodologically rigorous intervention studies to identify mediators of change in physical activity are costly and labor intensive, and may not be feasible with large samples. The use of statistically powerful tests of mediational effects in small-scale studies can inform the development of more effective interventions. (C) 2006 Elsevier Inc. All rights reserved.
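As an illustration of the bootstrap approach to mediation named in this abstract, the following is a minimal sketch, not the study's actual procedure. It assumes a product-of-coefficients indirect effect with a percentile confidence interval; the function names (`indirect_effect`, `bootstrap_ci`) and the variable layout (`x` = intervention arm, `m` = mediator such as social support, `y` = outcome such as walking) are hypothetical.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b of the mediated effect.

    a: slope of the mediator m regressed on x.
    b: slope of y on m in the regression of y on both x and m.
    """
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect (assumed variant)."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        xs = [x[i] for i in idx]
        ms = [m[i] for i in idx]
        ys = [y[i] for i in idx]
        try:
            draws.append(indirect_effect(xs, ms, ys))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g. only one arm drawn); redraw
    draws.sort()
    lo = draws[int((alpha / 2) * n_boot)]
    hi = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Under this sketch, mediation would be judged significant when the interval returned by `bootstrap_ci` excludes zero. The interval is built directly from the resampled estimates, without relying on a normal approximation for the product a*b.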
Abstract:
Background: Oral itraconazole (ITRA) is used for the treatment of allergic bronchopulmonary aspergillosis in patients with cystic fibrosis (CF) because of its antifungal activity against Aspergillus species. ITRA has an active hydroxy-metabolite (OH-ITRA) which has similar antifungal activity. ITRA is a highly lipophilic drug which is available in two different oral formulations, a capsule and an oral solution. It is reported that the oral solution has a 60% higher relative bioavailability. The influence of altered gastric physiology associated with CF on the pharmacokinetics (PK) of ITRA and its metabolite has not been previously evaluated. Objectives: 1) To estimate the population (pop) PK parameters for ITRA and its active metabolite OH-ITRA including relative bioavailability of the parent after administration of the parent by both capsule and solution and 2) to assess the performance of the optimal design. Methods: The study was a cross-over design in which 30 patients received the capsule on the first occasion and 3 days later the solution formulation. The design was constrained to have a maximum of 4 blood samples per occasion for estimation of the popPK of both ITRA and OH-ITRA. The sampling times for the population model were optimized previously using POPT v.2.0.[1] POPT is a series of applications that run under MATLAB and provide an evaluation of the information matrix for a nonlinear mixed effects model given a particular design. In addition it can be used to optimize the design based on evaluation of the determinant of the information matrix. The model details for the design were based on prior information obtained from the literature, which suggested that ITRA may have either linear or non-linear elimination. The optimal sampling times were evaluated to provide information for both competing models for the parent and metabolite and for both capsule and solution simultaneously. 
Blood samples were assayed by validated HPLC.[2] PopPK modelling was performed using FOCE with interaction under NONMEM, version 5 (level 1.1; GloboMax LLC, Hanover, MD, USA). The PK of ITRA and OH-ITRA was modelled simultaneously using ADVAN 5. Subsequently three methods were assessed for modelling concentrations less than the LOD (limit of detection). These methods (corresponding to methods 5, 6 & 4 from Beal[3], respectively) were (a) where all values less than LOD were assigned to half of LOD, (b) where the closest missing value that is less than LOD was assigned to half the LOD and all previous (if during absorption) or subsequent (if during elimination) missing samples were deleted, and (c) where the contribution of the expectation of each missing concentration to the likelihood is estimated. The LOD was 0.04 mg/L. The final model evaluation was performed via bootstrap with re-sampling and a visual predictive check. The optimal design and the sampling windows of the study were evaluated for execution errors and for agreement between the observed and predicted standard errors. Dosing regimens were simulated for the capsules and the oral solution to assess their ability to achieve ITRA target trough concentration (Cmin,ss of 0.5-2 mg/L) or a combined Cmin,ss for ITRA and OH-ITRA above 1.5 mg/L. Results and Discussion: A total of 241 blood samples were collected and analysed, 94% of them were taken within the defined optimal sampling windows, of which 31% were taken within 5 min of the exact optimal times. Forty-six per cent of the ITRA values and 28% of the OH-ITRA values were below LOD. The entire profile after administration of the capsule for five patients was below LOD and therefore the data from this occasion was omitted from estimation. A 2-compartment model with 1st order absorption and elimination best described ITRA PK, with 1st order metabolism of the parent to OH-ITRA.
For ITRA the clearance (ClItra/F) was 31.5 L/h; apparent volumes of central and peripheral compartments were 56.7 L and 2090 L, respectively. Absorption rate constants for capsule (kacap) and solution (kasol) were 0.0315 h⁻¹ and 0.125 h⁻¹, respectively. Comparative bioavailability of the capsule was 0.82. There was no evidence of nonlinearity in the popPK of ITRA. No screened covariate significantly improved the fit to the data. Parameter estimates from the final model were comparable across the different methods of accounting for missing data (M4, M5, M6).[3] The prospective application of an optimal design was found to be successful. Due to the sampling windows, most of the samples could be collected within the daily hospital routine, but still at times that were near optimal for estimating the popPK parameters. The final model was one of the potential competing models considered in the original design. The asymptotic standard errors provided by NONMEM for the final model and empirical values from bootstrap were similar in magnitude to those predicted from the Fisher Information matrix associated with the D-optimal design. Simulations from the final model showed that the current dosing regimen of 200 mg twice daily (bd) would provide a target Cmin,ss (0.5-2 mg/L) for only 35% of patients when administered as the solution and 31% when administered as capsules. The optimal dosing schedule was 500 mg bd for both formulations. The target success for this dosing regimen was 87% for the solution with an NNT=4 compared to capsules. This means that for every 4 patients treated with the solution, one additional patient will achieve target success compared with the capsule, but at an additional cost of AUD $220 per day. The therapeutic target, however, is still doubtful, and potential risks of these dosing schedules need to be assessed on an individual basis.
Conclusion: A model was developed which described the popPK of ITRA and its main active metabolite OH-ITRA in adult CF after administration of both capsule and solution. The relative bioavailability of ITRA from the capsule was 82% that of the solution, but considerably more variable. To incorporate missing data, using the simple Beal method 5 (using half LOD for all samples below LOD) provided comparable results to the more complex but theoretically better Beal method 4 (integration method). The optimal sparse design performed well for estimation of model parameters and provided a good fit to the data.
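The two substitution rules described in this abstract (Beal methods 5 and 6) can be sketched as a data-preparation step. This is a minimal illustration under stated assumptions, not the NONMEM workflow used in the study: a below-LOD sample is represented as `None`, the names `method5` and `method6` are hypothetical, and for method 6 a below-LOD sample is kept only when it is adjacent in time to a quantifiable sample. Method 4, the likelihood-based approach, cannot be reduced to a substitution and is not shown.

```python
from typing import List, Optional, Tuple

LOD = 0.04  # mg/L, limit of detection as reported in the abstract

# (time in hours, concentration in mg/L, or None when below the LOD)
Sample = Tuple[float, Optional[float]]


def method5(samples: List[Sample], lod: float = LOD) -> List[Tuple[float, float]]:
    """Assign half the LOD to every below-LOD sample."""
    return [(t, lod / 2 if c is None else c) for t, c in samples]


def method6(samples: List[Sample], lod: float = LOD) -> List[Tuple[float, float]]:
    """Keep only the below-LOD sample nearest the quantifiable data, set to LOD/2.

    Earlier below-LOD samples (absorption phase) and later ones
    (elimination phase) in the same run are dropped.
    """
    out = []
    last = len(samples) - 1
    for i, (t, c) in enumerate(samples):
        if c is not None:
            out.append((t, c))
            continue
        prev_quantified = i > 0 and samples[i - 1][1] is not None
        next_quantified = i < last and samples[i + 1][1] is not None
        if prev_quantified or next_quantified:
            out.append((t, lod / 2))
    return out
```

For a profile with two below-LOD samples before the first quantifiable concentration and two after the last, `method5` retains all samples, while `method6` retains only the below-LOD sample on each side that borders the quantifiable data.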
Abstract:
When the data consist of certain attributes measured on the same set of items in different situations, they would be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e., the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.
Abstract:
Columnar cell lesions (CCLs) of the breast are a spectrum of lesions that have posed difficulties to pathologists for many years, prompting discussion concerning their biologic and clinical significance. We present a study of CCL in context with hyperplasia of usual type (HUT) and the more advanced lesions ductal carcinoma in situ (DCIS) and invasive ductal carcinoma. A total of 81 lesions from 18 patients were subjected to a comprehensive morphologic review based upon a modified version of Schnitt's classification system for CCL, immunophenotypic analysis (estrogen receptor [ER], progesterone receptor [PgR], Her2/neu, cytokeratin 5/6 [CK5/6], cytokeratin 14 [CK14], E-cadherin, p53) and, for the first time, a whole genome molecular analysis by comparative genomic hybridization. Multiple CCLs from 3 patients were studied in particular detail, with topographic information and/or showing a morphologic spectrum of CCL within individual terminal duct lobular units. CCLs were ER and PgR positive and CK5/6 and CK14 negative, and exhibited low numbers of genetic alterations and recurrent 16q loss, features that are similar to those of low grade in situ and invasive carcinoma. The molecular genetic profiles closely reflect the degree of proliferation and atypia in CCL, indicating some of these lesions represent both a morphologic and molecular continuum. In addition, overlapping chromosomal alterations between CCL and more advanced lesions within individual terminal duct lobular units suggest a commonality in molecular evolution. These data further support the hypothesis that CCLs are a nonobligate, intermediary step in the development of some forms of low grade in situ and invasive carcinoma. Copyright: © 2005 Lippincott Williams & Wilkins, Inc.
Abstract:
This document records the process of migrating eprints.org data to a Fez repository. Fez is a Web-based digital repository and workflow management system based on Fedora (http://www.fedora.info/). At the time of migration, the University of Queensland Library was using EPrints 2.2.1 [pepper] for its ePrintsUQ repository. Once we began to develop Fez, we did not upgrade to later versions of eprints.org software since we knew we would be migrating data from ePrintsUQ to the Fez-based UQ eSpace. Since this document records our experiences of migration from an earlier version of eprints.org, anyone seeking to migrate eprints.org data into a Fez repository might encounter some small differences. Moving UQ publication data from an eprints.org repository into a Fez repository (hereafter called UQ eSpace, http://espace.uq.edu.au/) was part of a plan to integrate metadata (and, in some cases, full texts) about all UQ research outputs, including theses, images, multimedia and datasets, in a single repository. This tied in with the plan to identify and capture the research output of a single institution, the main task of the eScholarshipUQ testbed for the Australian Partnership for Sustainable Repositories project (http://www.apsr.edu.au/). The migration could not occur at UQ until the functionality in Fez was at least equal to that of the existing ePrintsUQ repository. Accordingly, as Fez development occurred throughout 2006, a list of eprints.org functionality not currently supported in Fez was created so that programming of such development could be planned for and implemented.
Abstract:
There is substantial disagreement among published epidemiological studies regarding environmental risk factors for Parkinson’s disease (PD). Differences in the quality of measurement of environmental exposures may contribute to this variation. The current study examined the test–retest repeatability of self-report data on risk factors for PD obtained from a series of 32 PD cases recruited from neurology clinics and 29 healthy sex-, age- and residential suburb-matched controls. Exposure data were collected in face-to-face interviews using a structured questionnaire derived from previous epidemiological studies. High repeatability was demonstrated for ‘lifestyle’ exposures, such as smoking and coffee/tea consumption (kappas 0.70–1.00). Environmental exposures that involved some action by the person, such as pesticide application and use of solvents and metals, also showed high repeatability (kappas > 0.78). Lower repeatability was seen for rural residency and bore water consumption (kappa 0.39–0.74). In general, we found that case and control participants provided similar rates of incongruent and missing responses for categorical and continuous occupational, domestic, lifestyle and medical exposures.
Abstract:
The final-year project for Mechanical & Space Engineering students at UQ often involves the design and flight testing of an experiment. This report describes the design and use of a simple data logger that should be suitable for collecting data from the students' flight experiments. The exercise here was taken as far as the construction of a prototype device that is suitable for ground-based testing, say, the static firing of a hybrid rocket motor.
Abstract:
A combination of deductive reasoning, clustering, and inductive learning is given as an example of a hybrid system for exploratory data analysis. Visualization is replaced by a dialogue with the data.
Abstract:
This paper reports a comparative study of Australian and New Zealand leadership attributes, based on the GLOBE (Global Leadership and Organizational Behavior Effectiveness) program. Responses from 344 Australian managers and 184 New Zealand managers in three industries were analyzed using exploratory and confirmatory factor analysis. Results supported some of the etic leadership dimensions identified in the GLOBE study, but also found some emic dimensions of leadership for each country. An interesting finding of the study was that the New Zealand data fitted the Australian model, but not vice versa, suggesting asymmetric perceptions of leadership in the two countries.
Abstract:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
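The distinction this abstract draws between cross-validation that is external to gene selection and cross-validation that is not can be sketched as follows. This is a hedged illustration only: the gene-ranking score (absolute difference in class means), the nearest-centroid classifier, and all function names are assumptions chosen for brevity, not the rules used in the paper. Labels are assumed to be 0/1 and every training fold is assumed to contain both classes.

```python
import random


def select_genes(X, y, k):
    """Rank genes by absolute difference in class means; return the top-k columns."""
    rows0 = [row for row, lab in zip(X, y) if lab == 0]
    rows1 = [row for row, lab in zip(X, y) if lab == 1]
    scores = []
    for g in range(len(X[0])):
        m0 = sum(r[g] for r in rows0) / len(rows0)
        m1 = sum(r[g] for r in rows1) / len(rows1)
        scores.append((abs(m1 - m0), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def fit_centroids(X, y, genes):
    """Class centroids restricted to the selected genes."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return cents


def predict(cents, row, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def d2(lab):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, cents[lab]))
    return 0 if d2(0) <= d2(1) else 1


def cv_error(X, y, k, n_folds=10, external=True, seed=0):
    """Cross-validated error rate.

    external=True : genes are re-selected inside every training fold.
    external=False: genes are selected once on all samples (selection bias).
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    fixed = None if external else select_genes(X, y, k)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, k) if external else fixed
        cents = fit_centroids(X_tr, y_tr, genes)
        errors += sum(predict(cents, X[i], genes) != y[i] for i in fold)
    return errors / len(y)
```

On data with no real class signal, the `external=False` estimate tends to look optimistic because the held-out samples already influenced which genes were chosen, whereas the `external=True` estimate stays near chance level.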
Abstract:
This paper examines the article system in interlanguage grammar focusing on Japanese learners of English, whose native language lacks articles. It will be demonstrated that for the acquisition of the English article system, count/mass distinctions and definiteness are the crucial factors. Although Japanese does not employ the article system to encode these aspects, it will be argued that they are nevertheless syntactically encoded through its classifier system. Hence, the problem for these learners must be to map these features onto the appropriate surface forms as the Missing Surface Inflection Hypothesis predicts (Prévost & White 2000). This suggestion will further be supported empirically by a fill-in-the-article task. It will be concluded that these Japanese learners understand the English article system fairly well, possibly due to their native language, yet have problems with realizing the relevant features (i.e. count/mass distinctions and definiteness) in the target language.
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can take the form of text, categorical values, or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been used in practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); and example-based methods such as k-nearest neighbors (Duda & Hart, 1973) and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).