924 results for "Multivariate statistics"
Abstract:
Many active pharmaceutical ingredients (APIs) have both anhydrate and hydrate forms. Because the solid-state forms differ in their physicochemical properties, changes in solid state may result in therapeutic, pharmaceutical, legal and commercial problems. In order to obtain good solid dosage form quality and performance, there is a constant need to understand and control these phase transitions during manufacturing and storage. Thus, it is important to detect and also quantify possible transitions between the different forms. In recent years, vibrational spectroscopy has become an increasingly popular tool for characterising solid-state forms and their phase transitions. It offers several advantages over other characterisation techniques, including the ability to obtain molecular-level information, minimal sample preparation, and the possibility of monitoring changes non-destructively in-line. Dehydration is a phase transition of hydrates that is frequently encountered during dosage form production and storage. The aim of the present thesis was to investigate the dehydration behaviour of diverse pharmaceutical hydrates by near infrared (NIR), Raman and terahertz pulsed spectroscopy (TPS) monitoring together with multivariate data analysis. The goal was to reveal new perspectives for investigating dehydration at the molecular level. Solid-state transformations were monitored during dehydration of diverse hydrates on a hot stage. The results obtained from qualitative experiments were used to develop a method and perform quantification of the solid-state forms during process-induced dehydration in a fluidised bed dryer. Both in situ and in-line process monitoring and quantification were performed. This thesis demonstrated the utility of vibrational spectroscopy techniques and multivariate modelling to monitor and investigate dehydration behaviour in situ and during fluidised bed drying.
All three spectroscopic methods proved complementary in the study of dehydration. NIR spectroscopy models could quantify the solid-state forms in the binary system, but were unable to quantify all the forms in the quaternary system. Raman spectroscopy models, on the other hand, could quantify all four solid-state forms that appeared upon isothermal dehydration. The speed of the spectroscopic methods makes them applicable for monitoring dehydration, and quantification of multiple forms was performed during the phase transition. Thus, solid-state structure information at the molecular level was obtained directly. TPS detected the intermolecular phonon modes, whereas Raman spectroscopy detected mostly changes in intramolecular vibrations. Both techniques revealed information about crystal structure changes. NIR spectroscopy, in turn, was more sensitive to water content and the hydrogen-bonding environment of water molecules. This study provides a basis for real-time process monitoring using vibrational spectroscopy during pharmaceutical manufacturing.
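The quantification described above rests on the idea that a mixture spectrum is approximately a linear combination of pure-form spectra. A minimal classical-least-squares sketch of that idea, using synthetic Gaussian bands; the thesis itself used calibrated multivariate models (e.g. PLS-type) on measured NIR/Raman data, so everything below is illustrative:

```python
import numpy as np

# Synthetic pure-component "spectra" for a hydrate and its anhydrate
# (Gaussian bands standing in for real NIR/Raman measurements).
wavenumbers = np.linspace(0, 1, 200)
hydrate = np.exp(-((wavenumbers - 0.3) ** 2) / 0.005)
anhydrate = np.exp(-((wavenumbers - 0.7) ** 2) / 0.005)
pure = np.column_stack([hydrate, anhydrate])        # shape (200, 2)

# A mixture spectrum measured mid-dehydration: 40 % hydrate remaining.
true_fractions = np.array([0.4, 0.6])
mixture = pure @ true_fractions

# Classical least squares: solve pure @ f ~= mixture for the fractions.
fractions, *_ = np.linalg.lstsq(pure, mixture, rcond=None)
print(fractions)  # close to [0.4, 0.6]
```

In practice the pure spectra overlap and baselines drift, which is why calibrated latent-variable models are preferred over direct least squares, but the linear-mixture assumption is the same.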
Abstract:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As a practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation of previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, those concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages of increasing complexity, proceeding from univariate via bivariate to multivariate techniques. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation for the studied phenomenon, thus extending the work of Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected THINK lexemes; however, a substantial part of these features is not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position.
In terms of the overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context using the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
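Polytomous logistic regression of the kind argued for above is multinomial (softmax) regression. A minimal numpy sketch on entirely synthetic data; the four outcome classes merely stand in for the four THINK verbs, and the features and fitting method are invented for illustration, not taken from the dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 4 outcome classes predicted from 5 "contextual"
# features.  The class centres and noise level are made up.
n_per, n_feat, n_class = 100, 5, 4
centers = rng.normal(0, 2, size=(n_class, n_feat))
X = np.vstack([rng.normal(centers[k], 1.0, size=(n_per, n_feat))
               for k in range(n_class)])
y = np.repeat(np.arange(n_class), n_per)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Multinomial logistic regression fitted by batch gradient descent.
Xb = np.hstack([X, np.ones((len(X), 1))])  # add intercept column
Y = np.eye(n_class)[y]                     # one-hot outcomes
W = np.zeros((n_feat + 1, n_class))
for _ in range(500):
    P = softmax(Xb @ W)
    W -= 0.01 * Xb.T @ (P - Y) / len(X)    # negative log-likelihood gradient

pred = (Xb @ W).argmax(axis=1)
accuracy = (pred == y).mean()
print(f"in-sample accuracy: {accuracy:.2f}")
```

With well-separated classes the fitted model predicts the outcome class reliably; on real corpus data, as the abstract notes, a recall ceiling around two-thirds is what remains once categorical contexts are exhausted.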
Abstract:
Introduction: Decompressive hemicraniectomy, clot evacuation, and aneurysmal interventions are considered aggressive surgical therapeutic options for the treatment of massive middle cerebral artery (MCA) infarction, intracerebral hemorrhage (ICH), and severe subarachnoid hemorrhage (SAH), respectively. Although these procedures save lives, little is actually known about their impact on outcomes other than short-term survival and functional status. The purpose of this study was to gain a better understanding of the personal and social consequences of surviving these aggressive surgical interventions in order to aid acute care clinicians in helping family members make difficult decisions about undertaking such interventions. Methods: An exploratory mixed-methods study using a convergent parallel design was conducted to examine functional recovery (NIHSS, mRS & BI), cognitive status (Montreal Cognitive Assessment, MoCA), quality of life (EuroQol 5-D), and caregiver outcomes (Bakas Caregiving Outcomes Scale, BCOS) in a cohort of patients and families who had undergone aggressive surgical intervention for severe stroke between the years 2000–2007. Data were analyzed using descriptive statistics, univariate and multivariate analysis of variance, and multivariate logistic regression. Content analysis was used to analyze the qualitative interviews conducted with stroke survivors and family members. Results: Twenty-seven patients and 13 spouses participated in this study. Based on patient MoCA scores, mean overall cognitive status was 25.18 (range 23.4-26.9); current functional outcome scores: NIHSS 2.22, mRS 1.74, and BI 88.5.
EQ-5D scores revealed no significant differences between patients and caregivers (p=0.585), and caregiver outcomes revealed no significant differences between male and female caregivers or across patient diagnostic groups (MCA, SAH, ICH; p=0.103). Discussion: Overall, patients and families were satisfied with quality of life and with the decisions made at the time of the initial stroke. There was consensus among study participants that formal community-based support (e.g., handibus, caregiving relief, rehabilitation assessments) should be continued for extended periods (e.g., years) post-stroke. Ongoing contact with health care professionals is valuable in helping survivors and families navigate the community as needs change over time.
Abstract:
Many statistical forecast systems are available to interested users. In order to be useful for decision-making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and their statistical manifestation have been firmly established, the forecasts must also provide some quantitative evidence of ‘quality’. However, the quality of statistical climate forecast systems (forecast quality) is an ill-defined and frequently misunderstood property. Often, providers and users of such forecast systems are unclear about what ‘quality’ entails and how to measure it, leading to confusion and misinformation. Here we present a generic framework to quantify aspects of forecast quality using an inferential approach to calculate nominal significance levels (p-values) that can be obtained either by directly applying non-parametric statistical tests such as Kruskal-Wallis (KW) or Kolmogorov-Smirnov (KS) or by using Monte-Carlo methods (in the case of forecast skill scores). Once converted to p-values, these forecast quality measures provide a means to objectively evaluate and compare temporal and spatial patterns of forecast quality across datasets and forecast systems. Our analysis demonstrates the importance of providing p-values rather than adopting some arbitrarily chosen significance levels such as p < 0.05 or p < 0.01, which is still common practice. This is illustrated by applying non-parametric tests (such as KW and KS) and skill scoring methods (LEPS and RPSS) to the 5-phase Southern Oscillation Index classification system using historical rainfall data from Australia, The Republic of South Africa and India. The selection of quality measures is solely based on their common use and does not constitute endorsement. We found that non-parametric statistical tests can be adequate proxies for skill measures such as LEPS or RPSS.
The framework can be implemented anywhere, regardless of dataset, forecast system or quality measure. Eventually such inferential evidence should be complemented by descriptive statistical methods in order to fully assist in operational risk management.
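The inferential framework above can be sketched in a few lines: non-parametric tests reported as p-values, plus a Monte-Carlo p-value for a simple skill-type statistic. The "rainfall" data below are synthetic (five phases, one deliberately wetter), not the SOI-classified historical records used in the study, and the median-difference statistic is a stand-in for scores such as LEPS or RPSS:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic "rainfall" stratified into 5 phases; the last phase is
# much wetter, standing in for a real SOI-phase effect.
phases = [rng.gamma(2.0, 30.0, size=60) for _ in range(4)]
phases.append(rng.gamma(2.0, 60.0, size=60))

# Non-parametric tests reported as p-values, not pass/fail thresholds.
kw_p = stats.kruskal(*phases).pvalue
ks_p = stats.ks_2samp(phases[0], phases[4]).pvalue

# Monte-Carlo p-value for a simple score: difference of medians between
# the extreme phases, compared against a permutation null.
obs = np.median(phases[4]) - np.median(phases[0])
pooled = np.concatenate([phases[0], phases[4]])
null = []
for _ in range(2000):
    perm = rng.permutation(pooled)
    null.append(np.median(perm[60:]) - np.median(perm[:60]))
mc_p = (np.sum(np.abs(null) >= abs(obs)) + 1) / (2000 + 1)

print(f"KW p={kw_p:.4f}  KS p={ks_p:.4f}  Monte-Carlo p={mc_p:.4f}")
```

Reporting the p-values themselves, rather than a pass/fail verdict at an arbitrary threshold, is exactly the practice the abstract argues for.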
Abstract:
Climate variability and change are risk factors for climate-sensitive activities such as agriculture. Managing these risks requires "climate knowledge", i.e. a sound understanding of the causes and consequences of climate variability and knowledge of potential management options that are suitable in light of the climatic risks posed. Often such information about prognostic variables (e.g. yield, rainfall, run-off) is provided in probabilistic terms (e.g. via cumulative distribution functions, CDFs), whereby the quantitative assessment of these alternative management options is based on such CDFs. Sound statistical approaches are needed in order to assess whether differences between such CDFs are intrinsic features of system dynamics or chance events (i.e. quantifying the evidence against an appropriate null hypothesis). Statistical procedures that rely on such a hypothesis-testing framework are referred to as "inferential statistics", in contrast to descriptive statistics (e.g. mean, median, variance of population samples, skill scores). Here we report on the extension of some existing inferential techniques that provide more relevant and adequate information for decision making under uncertainty.
Abstract:
The National Health Interview Survey - Disability supplement (NHIS-D) provides information that can be used to understand myriad topics related to health and disability. The survey provides comprehensive information on multiple disability conceptualizations that can be identified using information about health conditions (both physical and mental), activity limitations, and service receipt (e.g. SSI, SSDI, Vocational Rehabilitation). This provides flexibility for researchers in defining populations of interest. This paper provides a description of the data available in the NHIS-D and information on how the data can be used to better understand the lives of people with disabilities.
Abstract:
Cereal grain is one of the main export commodities of Australian agriculture. Over the past decade, crop yield forecasts for wheat and sorghum have shown appreciable utility for industry planning at shire, state, and national scales. There is now an increasing drive from industry for more accurate and cost-effective crop production forecasts. In order to generate production estimates, accurate crop area estimates are needed by the end of the cropping season. Multivariate methods for analysing remotely sensed Enhanced Vegetation Index (EVI) from 16-day Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery within the cropping period (i.e. April-November) were investigated to estimate crop area for wheat, barley, chickpea, and total winter cropped area for a case study region in NE Australia. Each pixel classification method was trained on ground truth data collected from the study region. Three approaches to pixel classification were examined: (i) cluster analysis of trajectories of EVI values from consecutive multi-date imagery during the crop growth period; (ii) harmonic analysis of the time series (HANTS) of the EVI values; and (iii) principal component analysis (PCA) of the time series of EVI values. Images classified using these three approaches were compared with each other, and with a classification based on the single MODIS image taken at peak EVI. Imagery for the 2003 and 2004 seasons was used to assess the ability of the methods to determine wheat, barley, chickpea, and total cropped area estimates. The accuracy at pixel scale was determined by the percent correct classification metric by contrasting all pixel scale samples with independent pixel observations. At a shire level, aggregated total crop area estimates were compared with surveyed estimates. All multi-temporal methods showed significant overall capability to estimate total winter crop area. 
There was high accuracy at pixel scale (>98% correct classification) for identifying overall winter cropping. However, discrimination among crops was less accurate. Although the use of single-date EVI data produced high accuracy for estimates of wheat area at shire scale, the result contradicted the poor pixel-scale accuracy associated with this approach, due to fortuitous compensating errors. Further studies are needed to extrapolate the multi-temporal approaches to other geographical areas and to improve the lead time for deriving cropped-area estimates before harvest.
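Approach (i) above, clustering whole-season EVI trajectories, can be sketched with a tiny hand-rolled k-means on synthetic trajectories. The pulse shape, noise level and crop/non-crop setup are invented; the study used real 16-day MODIS EVI with ground-truth training data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 16-day EVI trajectories over a season (14 dates): cropped
# pixels show a green-up/senescence pulse, non-crop pixels stay flat.
t = np.linspace(0, 1, 14)
crop = 0.2 + 0.5 * np.exp(-((t - 0.55) ** 2) / 0.02)
bare = np.full_like(t, 0.2)
X = np.vstack([crop + rng.normal(0, 0.03, (50, 14)),
               bare + rng.normal(0, 0.03, (50, 14))])

# Tiny k-means (k = 2) on the whole trajectories.
centroids = X[[0, 99]].copy()              # one seed from each regime
for _ in range(20):
    d = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    for k in range(2):
        centroids[k] = X[labels == k].mean(axis=0)

crop_label = labels[0]
correct = (labels[:50] == crop_label).mean() * 0.5 \
        + (labels[50:] != crop_label).mean() * 0.5
print(f"fraction correctly separated: {correct:.2f}")
```

Clustering full trajectories exploits the temporal shape of the signal, which is why the multi-temporal approaches in the study outperformed single-date classification at pixel scale.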
Abstract:
The use of near infrared (NIR) hyperspectral imaging and hyperspectral image analysis for distinguishing between hard, intermediate and soft maize kernels from inbred lines was evaluated. NIR hyperspectral images of two sets (12 and 24 kernels) of whole maize kernels were acquired using a Spectral Dimensions MatrixNIR camera with a spectral range of 960-1662 nm and a sisuChema SWIR (short wave infrared) hyperspectral pushbroom imaging system with a spectral range of 1000-2498 nm. Exploratory principal component analysis (PCA) was used on absorbance images to remove background, bad pixels and shading. On the cleaned images, PCA could be used effectively to find histological classes, including glassy (hard) and floury (soft) endosperm. PCA illustrated a distinct difference between glassy and floury endosperm along principal component (PC) three on the MatrixNIR and PC two on the sisuChema, with two distinguishable clusters. Subsequently, partial least squares discriminant analysis (PLS-DA) was applied to build a classification model. The PLS-DA model from the MatrixNIR image (12 kernels) resulted in a root mean square error of prediction (RMSEP) of 0.18. This was repeated on the MatrixNIR image of the 24 kernels, which also resulted in an RMSEP of 0.18. The sisuChema image yielded an RMSEP of 0.29. The reproducible results obtained with the different data sets indicate that the method proposed in this paper has real potential for future classification uses.
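PLS-DA, as used above, is PLS regression on a dummy-coded class variable, with RMSEP measuring prediction error on held-out samples. A compact NIPALS PLS1 sketch on synthetic spectra; the band positions, intensities and noise are invented, and this is a generic illustration of the technique, not the paper's calibrated model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic NIR-like spectra (100 wavelengths): "hard" kernels (y = 1)
# carry an extra absorption band that "soft" kernels (y = 0) lack.
wl = np.arange(100)
base = np.exp(-((wl - 60) ** 2) / 400.0)
band = 0.3 * np.exp(-((wl - 30) ** 2) / 100.0)

def make(n, hard):
    return np.array([base + (band if hard else 0)
                     + rng.normal(0, 0.05, 100) for _ in range(n)])

X_train = np.vstack([make(30, True), make(30, False)])
y_train = np.array([1.0] * 30 + [0.0] * 30)
X_test = np.vstack([make(10, True), make(10, False)])
y_test = np.array([1.0] * 10 + [0.0] * 10)

def pls1(X, y, n_comp):
    """NIPALS PLS1 regression coefficients for centred X, y."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        qa = (y @ t) / tt
        X -= np.outer(t, p)       # deflate
        y -= qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

xm, ym = X_train.mean(0), y_train.mean()
B = pls1(X_train - xm, y_train - ym, n_comp=2)
y_pred = (X_test - xm) @ B + ym
rmsep = np.sqrt(np.mean((y_pred - y_test) ** 2))
print(f"RMSEP = {rmsep:.2f}")
```

Thresholding the continuous prediction at 0.5 turns the regression into a discriminant rule, which is the "DA" step of PLS-DA.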
Abstract:
Management of the commercial harvest of kangaroos relies on quotas set annually as a proportion of regular estimates of population size. Surveys to generate these estimates are expensive and, in the larger states, logistically difficult; a cheaper alternative is desirable. Rainfall is a disappointingly poor predictor of kangaroo rate of increase in many areas, but harvest statistics (sex ratio, carcass weight, skin size and animals shot per unit time) potentially offer cost-effective indirect monitoring of population abundance (and therefore trend) and status (i.e. under- or overharvest). Furthermore, because harvest data are collected continuously and throughout the harvested areas, they offer the promise of more intensive and more representative coverage of harvest areas than aerial surveys do. To be useful, harvest statistics would need to have a close and known relationship with either population size or harvest rate. We assessed this using long-term (11-22 years) data for three kangaroo species (Macropus rufus, M. giganteus and M. fuliginosus) and common wallaroos (M. robustus) across South Australia, New South Wales and Queensland. Regional variation in kangaroo body size, population composition, shooter efficiency and selectivity required separate analyses in different regions. Two approaches were taken. First, monthly harvest statistics were modelled as a function of a number of explanatory variables, including kangaroo density, harvest rate and rainfall. Second, density and harvest rate were modelled as a function of harvest statistics. Both approaches incorporated a correlated error structure. Many but not all regions had relationships with sufficient precision to be useful for indirect monitoring. However, there was no single relationship that could be applied across an entire state or across species.
Combined with rainfall-driven population models and applied at a regional level, these relationships could be used to reduce the frequency of aerial surveys without compromising decisions about harvest management.
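Modelling a monthly harvest statistic as a function of density with a correlated error structure, as described above, can be sketched with ordinary least squares followed by a single Cochrane-Orcutt step for first-order autocorrelation. The series, coefficients and AR(1) parameter below are all invented; the study's actual models and variables differ:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic monthly series: a harvest statistic (say, mean carcass
# weight) declining with kangaroo density, with AR(1) noise standing
# in for the correlated error structure.
n = 120
density = 10 + 3 * np.sin(np.linspace(0, 6 * np.pi, n)) + rng.normal(0, 0.5, n)
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.6 * e[i - 1] + rng.normal(0, 0.4)
weight = 25.0 - 0.8 * density + e

# Ordinary least squares, then estimate the residual autocorrelation.
X = np.column_stack([np.ones(n), density])
beta = np.linalg.lstsq(X, weight, rcond=None)[0]
resid = weight - X @ beta
rho = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])

# One Cochrane-Orcutt step: quasi-difference and refit.
Xs = X[1:] - rho * X[:-1]
ys = weight[1:] - rho * weight[:-1]
beta_gls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
print(f"rho = {rho:.2f}, slope = {beta_gls[1]:.2f}")
```

Ignoring the autocorrelation would leave the slope estimate roughly unbiased but would understate its uncertainty, which matters when judging whether a relationship is precise enough for indirect monitoring.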
Abstract:
We have derived a versatile gene-based test for genome-wide association studies (GWAS). Our approach, called VEGAS (versatile gene-based association study), is applicable to all GWAS designs, including family-based GWAS, meta-analyses of GWAS on the basis of summary data, and DNA-pooling-based GWAS, where existing approaches based on permutation are not possible, as well as singleton data, where they are. The test incorporates information from a full set of markers (or a defined subset) within a gene and accounts for linkage disequilibrium between markers by using simulations from the multivariate normal distribution. We show that for an association study using singletons, our approach produces results equivalent to those obtained via permutation in a fraction of the computation time. We demonstrate proof-of-principle by using the gene-based test to replicate several genes known to be associated on the basis of results from a family-based GWAS for height in 11,536 individuals and a DNA-pooling-based GWAS for melanoma in approximately 1300 cases and controls. Our method has the potential to identify novel associated genes; provide a basis for selecting SNPs for replication; and be directly used in network (pathway) approaches that require per-gene association test statistics. We have implemented the approach in both an easy-to-use web interface, which only requires the uploading of markers with their association p-values, and a separate downloadable application.
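The core of a VEGAS-style gene-based test is to sum per-SNP association statistics and calibrate that sum against simulations from a multivariate normal whose covariance is the local LD matrix. A toy sketch with an invented 6-SNP LD block and made-up z-scores (one "causal" SNP); the real method works from genome-wide marker p-values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy LD (correlation) matrix for 6 SNPs in one gene: AR(1)-like decay.
r = 0.7
ld = r ** np.abs(np.subtract.outer(np.arange(6), np.arange(6)))

# Observed per-SNP association z-scores (made up; SNP 3 is "causal").
z_obs = np.array([0.5, 1.1, 4.5, 2.8, 0.9, 0.2])
gene_stat = np.sum(z_obs ** 2)          # sum of 1-df chi-squares

# Null distribution: simulate correlated z-scores from MVN(0, LD),
# square and sum; the empirical tail gives the gene-based p-value.
sims = rng.multivariate_normal(np.zeros(6), ld, size=100_000)
null = (sims ** 2).sum(axis=1)
p_gene = (np.sum(null >= gene_stat) + 1) / (len(null) + 1)
print(f"gene-based p = {p_gene:.4f}")
```

Because the null accounts for LD analytically, no genotype permutation is needed, which is what makes the approach applicable to summary-statistic and DNA-pooling designs where permutation is impossible.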
Abstract:
Context: Identifying susceptibility genes for schizophrenia may be complicated by phenotypic heterogeneity, with some evidence suggesting that phenotypic heterogeneity reflects genetic heterogeneity. Objective: To evaluate the heritability and conduct genetic linkage analyses of empirically derived, clinically homogeneous schizophrenia subtypes. Design: Latent class and linkage analysis. Setting: Taiwanese field research centers. Participants: The latent class analysis included 1236 Han Chinese individuals with DSM-IV schizophrenia. These individuals were members of a large affected-sibling-pair sample of schizophrenia (606 ascertained families), original linkage analyses of which detected a maximum logarithm of odds (LOD) of 1.8 (z = 2.88) on chromosome 10q22.3. Main Outcome Measures: Multipoint exponential LOD scores by latent class assignment and parametric heterogeneity LOD scores. Results: Latent class analyses identified 4 classes, with 2 demonstrating familial aggregation. The first (LC2) described a group with severe negative symptoms, disorganization, and pronounced functional impairment, resembling “deficit schizophrenia.” The second (LC3) described a group with minimal functional impairment, mild or absent negative symptoms, and low disorganization. Using the negative/deficit subtype, we detected genome-wide significant linkage to 1q23-25 (LOD = 3.78, empiric genome-wide P = .01). This region was not detected using the DSM-IV schizophrenia diagnosis, but has been strongly implicated in schizophrenia pathogenesis by previous linkage and association studies. Variants in the 1q region may specifically increase risk for a negative/deficit schizophrenia subtype. Alternatively, these results may reflect increased familiality/heritability of the negative class, the presence of multiple 1q schizophrenia risk genes, or a pleiotropic 1q risk locus or loci, with stronger genotype-phenotype correlation with negative/deficit symptoms.
Using the second familial latent class, we identified nominally significant linkage to the original 10q peak region. Conclusion: Genetic analyses of heritable, homogeneous phenotypes may improve the power of linkage and association studies of schizophrenia and thus have relevance to the design and analysis of genome-wide association studies.
Abstract:
Quality of fresh-cut carambola (Averrhoa carambola L.) is related to many chemical and biochemical variables, especially those involved with softening and browning, both influenced by storage temperature. To study these effects, multivariate analysis was used to evaluate slices packaged in vacuum-sealed polyolefin bags and stored at 2.5 °C, 5 °C and 10 °C for up to 16 d. The quality of slices at each temperature was correlated with the duration of storage, O2 and CO2 concentration in the package, physical-chemical constituents, and the activity of enzymes involved in softening (PG) and browning (PPO) metabolism. Three quality groups were identified by hierarchical cluster analysis, and the classification of the components within each of these groups was obtained from a principal component analysis (PCA). The characterization of samples by PCA clearly distinguished acceptable and non-acceptable slices. According to the PCA, acceptable slices presented higher ascorbic acid content, greater hue angles (°h) and higher final lightness in the first principal component (PC1), whereas non-acceptable slices presented higher total pectin content and PPO activity in PC1. Non-acceptable slices also presented higher soluble pectin content, increased pectin solubilisation and higher CO2 concentration in the second principal component (PC2), whereas acceptable slices showed lower total sugar content. The hierarchical cluster and PCA analyses were useful for discriminating the quality of slices stored at different temperatures.
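The two-step analysis above, hierarchical clustering to find quality groups, then PCA to see which variables drive the separation, can be sketched on synthetic data. The variables, group means and noise levels below are invented stand-ins for the measured quality attributes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)

# Synthetic quality measurements for slices stored at three temperatures
# (columns: ascorbic acid, hue angle, soluble pectin, CO2); means invented.
means = np.array([[50, 60, 2, 5],     # 2.5 °C, acceptable
                  [40, 55, 4, 10],    # 5 °C
                  [25, 45, 8, 20]])   # 10 °C, non-acceptable
X = np.vstack([rng.normal(m, [2, 2, 0.5, 1], size=(15, 4)) for m in means])

# Standardize, then hierarchical (Ward) clustering into three groups.
Z = (X - X.mean(0)) / X.std(0)
groups = fcluster(linkage(Z, method="ward"), t=3, criterion="maxclust")

# PCA via SVD of the standardized data; PC1 carries the separation.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Z @ Vt[0]
explained = s[0] ** 2 / (s ** 2).sum()
print(f"groups found: {len(np.unique(groups))}, PC1 explains {explained:.0%}")
```

The signs of the PC1 loadings (`Vt[0]`) then indicate which variables load with acceptable versus non-acceptable slices, mirroring the interpretation given in the abstract.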
Abstract:
The simultaneous state and parameter estimation problem for a linear discrete-time system with unknown noise statistics is treated as a large-scale optimization problem. The a posteriori probability density function is maximized directly with respect to the states and parameters subject to the constraint of the system dynamics. The resulting optimization problem is too large for any of the standard non-linear programming techniques, and hence a hierarchical optimization approach is proposed. It turns out that the states can be computed at the first level for given noise and system parameters. These, in turn, are to be modified at the second level. The states are to be computed from a large system of linear equations, and two solution methods are considered for solving these equations, limiting the horizon to a suitable length. The resulting algorithm is a filter-smoother, suitable for off-line as well as on-line state estimation for given noise and system parameters. The second-level problem is split into two: one for modifying the noise statistics and the other for modifying the system parameters. An adaptive relaxation technique is proposed for modifying the noise statistics, and a modified Gauss-Newton technique is used to adjust the system parameters.
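The first-level problem, state estimation for a linear discrete-time system with given noise and system parameters, is the setting of the standard Kalman filter. A minimal sketch on a synthetic position/velocity model with known Q and R; the paper's hierarchical approach goes further, iterating a second level to adapt exactly these noise statistics and system parameters:

```python
import numpy as np

rng = np.random.default_rng(11)

# Linear discrete-time system: x_{k+1} = A x_k + w_k,  y_k = H x_k + v_k.
A = np.array([[1.0, 1.0], [0.0, 1.0]])    # position/velocity model
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)                       # process noise covariance
R = np.array([[1.0]])                      # measurement noise covariance

# Simulate the system.
n, x = 200, np.zeros(2)
xs, ys = [], []
for _ in range(n):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    xs.append(x)
    ys.append(H @ x + rng.normal(0, 1.0, 1))
xs, ys = np.array(xs), np.array(ys)

# Standard Kalman filter with known noise statistics.
xhat, P, est = np.zeros(2), np.eye(2), []
for y in ys:
    xhat = A @ xhat                        # predict
    P = A @ P @ A.T + Q
    S = H @ P @ H.T + R                    # update
    K = P @ H.T @ np.linalg.inv(S)
    xhat = xhat + K @ (y - H @ xhat)
    P = (np.eye(2) - K @ H) @ P
    est.append(xhat)
est = np.array(est)

raw_err = np.mean((ys[:, 0] - xs[:, 0]) ** 2)
kf_err = np.mean((est[:, 0] - xs[:, 0]) ** 2)
print(f"measurement MSE {raw_err:.3f} vs filtered MSE {kf_err:.3f}")
```

When Q and R are unknown, as in the paper, this first-level filter must be wrapped in an outer loop that re-estimates the noise statistics and system parameters, which is the role of the second level described above.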
Abstract:
A very general and numerically quite robust algorithm has been proposed by Sastry and Gauvrit (1980) for system identification. The present paper takes it up and examines its performance on a real test example. The example considered is the lateral dynamics of an aircraft. This is used as a vehicle for demonstrating the performance of various aspects of the algorithm in several possible modes.