172 results for Multivariate data

in University of Queensland eSpace - Australia


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
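
As a rough illustration of the quantity being maximised, the sketch below evaluates a binned and truncated log-likelihood for a univariate normal mixture. It is a simplification, not the paper's multivariate procedure: in one dimension each bin integral reduces to a difference of CDF values, whereas the multivariate case described above needs numerical integration over each bin. The function names and the example bin edges are assumptions made for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a univariate normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weights, mus, sigmas):
    """CDF of a univariate normal mixture."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def binned_truncated_loglik(edges, counts, weights, mus, sigmas):
    """Log-likelihood of bin counts when only data inside [edges[0], edges[-1]] are observed.

    Each bin probability is renormalised by the total mass of the observed
    bins, which is how truncation enters the likelihood.
    """
    bin_probs = [
        mixture_cdf(b, weights, mus, sigmas) - mixture_cdf(a, weights, mus, sigmas)
        for a, b in zip(edges[:-1], edges[1:])
    ]
    observed_mass = sum(bin_probs)
    return sum(
        n * math.log(p / observed_mass)
        for n, p in zip(counts, bin_probs)
        if n > 0
    )


# Hypothetical example: four bins on [-2, 2], one-component "mixture".
edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
counts = [14, 36, 36, 14]
print(binned_truncated_loglik(edges, counts, [1.0], [0.0], [1.0]))
```

An EM or direct optimiser would repeatedly evaluate bin probabilities like these, which is why the cost of each integral matters in higher dimensions.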

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Biological wastewater treatment is a complex, multivariate process, in which a number of physical and biological processes occur simultaneously. In this study, principal component analysis (PCA) and parallel factor analysis (PARAFAC) were used to profile and characterise Lagoon 115E, a multistage biological lagoon treatment system at Melbourne Water's Western Treatment Plant (WTP) in Melbourne, Australia. The objective was to increase our understanding of the multivariate processes taking place in the lagoon. The data used in the study span a 7-year period during which samples were collected as often as weekly from the ponds of Lagoon 115E and subjected to analysis. The resulting database, involving 19 chemical and physical variables, was studied using the multivariate data analysis methods PCA and PARAFAC. With these methods, alterations in the state of the wastewater due to intrinsic and extrinsic factors could be discerned. The methods were effective in illustrating and visually representing the complex purification stages and cyclic changes occurring along the lagoon system. The two methods proved complementary, with each having its own beneficial features. (C) 2003 Elsevier B.V. All rights reserved.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
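
The robustness of the t model comes from weights that shrink for observations far from a component's centre. The sketch below shows this for a single univariate t component with fixed scale and degrees of freedom, a deliberate simplification of the mixture ECM fit described above. The weight formula (nu + p) / (nu + squared Mahalanobis distance), with p = 1 here, is the standard one for t models; the function names, the fixed `sigma` and `nu`, and the data are assumptions for illustration.

```python
def t_weight(x, mu, sigma, nu):
    """E-step weight for a univariate t component: (nu + 1) / (nu + squared distance)."""
    delta = ((x - mu) / sigma) ** 2
    return (nu + 1.0) / (nu + delta)


def robust_location(data, sigma=1.0, nu=4.0, n_iter=50):
    """Iteratively reweighted mean; atypical points receive small weights."""
    mu = sum(data) / len(data)  # start from the ordinary (non-robust) mean
    for _ in range(n_iter):
        weights = [t_weight(x, mu, sigma, nu) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


# Hypothetical data: five points near zero plus one gross outlier.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]
print(sum(data) / len(data))   # ordinary mean, pulled towards the outlier
print(robust_location(data))   # stays close to the bulk of the data
```

In a full t mixture these weights are combined with the posterior component probabilities, and the scale and degrees of freedom are updated as well.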

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In this paper we consider the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm. Two methods of estimation of the standard errors are considered: the standard information-based method and the computationally intensive bootstrap method. They are compared empirically by their application to three real data sets and by a small-scale Monte Carlo experiment.
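
The bootstrap idea can be sketched as follows for a two-component univariate normal mixture. This is a minimal illustration under stated assumptions, not the paper's procedure: it uses nonparametric resampling (the abstract does not say which bootstrap variant was used), a very plain EM with a sorted-halves start, and sorting of the fitted means to sidestep label switching. All function names and settings are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_em(data, n_iter=50):
    """Fit a two-component univariate normal mixture by EM; return the sorted means."""
    ordered = sorted(data)
    half = len(ordered) // 2
    mu = [statistics.fmean(ordered[:half]), statistics.fmean(ordered[half:])]
    sigma = [statistics.pstdev(data) or 1.0] * 2
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            w = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(w) or 1e-300
            resp.append([wk / total for wk in w])
        # M-step: weighted updates of proportions, means and scales.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:
                continue  # leave an empty component unchanged
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor guards against collapse
    return sorted(mu)


def bootstrap_se_of_means(data, n_boot=30, seed=0):
    """Standard errors of the two component means from nonparametric bootstrap refits."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        estimates.append(fit_two_component_em(sample))
    return [statistics.stdev([e[k] for e in estimates]) for k in range(2)]
```

Every bootstrap replicate requires a full EM refit, which is what makes this approach computationally intensive compared with the information-based standard errors.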

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Univariate linkage analysis is used routinely to localise genes for human complex traits. Often, many traits are analysed but the significance of linkage for each trait is not corrected for multiple trait testing, which increases the experiment-wise type-I error rate. In addition, univariate analyses do not realise the full power provided by multivariate data sets. Multivariate linkage is the ideal solution but it is computationally intensive, so genome-wide analysis and evaluation of empirical significance are often prohibitive. We describe two simple methods that efficiently address these limitations by combining P-values from multiple univariate linkage analyses. The first method estimates empirical pointwise and genome-wide significance between one trait and one marker when multiple traits have been tested. It is as robust as an appropriate Bonferroni adjustment, with the advantage that no assumptions are required about the number of independent tests performed. The second method estimates the significance of linkage between multiple traits and one marker and, therefore, it can be used to localise regions that harbour pleiotropic quantitative trait loci (QTL). We show that this method has greater power than individual univariate analyses to detect a pleiotropic QTL across different situations. In addition, when traits are moderately correlated and the QTL influences all traits, it can outperform formal multivariate VC analysis. This approach is computationally feasible for any number of traits and is not affected by the residual correlation between traits. We illustrate the utility of our approach with a genome scan of three asthma traits measured in families with a twin proband.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Wilbur Zelinsky formulated a Hypothesis of Mobility Transition in 1971, in which he tried to relate all aspects of mobility to the Demographic Transition and modernisation. This dissertation applies the theoretical framework, proposed by Zelinsky and extended to encompass a family of transitions, to understand migration patterns of city regions. The two city regions, Brisbane and Stockholm, are selected as case studies, representing important city regions of similar size, but drawn from contrasting historical settings. A comparison of the case studies with the theoretical framework aims to determine how the relative contributions of net migration, the source areas of migrants, and the migration intensity change with modernisation. In addition, the research also aims to identify aspects of modernisation affecting migration. These aspects of migration are analysed with a "historical approach" and a "multivariate approach". An extensive investigation into the city regions' historical background provides the source from which evidence for a relationship between migration and modernisation is extracted. With this historical approach, similarities and differences in migration patterns are identified. The other research approach analyses multivariate data, from the last two decades, on migration flows and modernisation. Correlations between migration and key aspects of modernisation are tested with multivariate regression, based on an alternative version of a spatial interaction model. The project demonstrates that the changing functions of cities and structural modernisation influence migration. Similar patterns are found regarding the relative contributions of net migration and natural increase to population growth. The research finds links between these changes in the relative contribution of net migration and demographic modernisation.
The findings on variations in urban and rural source areas of migrants to city regions do not contradict the expected pattern, but data limitations prevent definite conclusions from being drawn. The assessment of variations in migration intensity did not support the expected pattern. Based on Swedish data, the hypothesised increase in migration intensity is rejected. International migration data also show patterns different from those derived from the theoretical framework. The findings from both research approaches suggest that structural modernisation affected migration flows more than demographic modernisation. The findings lead to a formulation of hypothesised patterns for migration to city regions. The study provides an important research contribution by applying the two research approaches to city regions. It also combines the study of internal and international migration to address the research objectives within a framework of transitional change.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The CASMIN Project is arguably the most influential contemporary study of class mobility in the world. However, CASMIN results with respect to weak vertical status effects on class mobility have been extensively criticized. Drawing on arguments about how to model vertical mobility, Hout and Hauser (1992) show that class mobility is strongly determined by vertical socioeconomic differences. This paper extends these arguments by estimating the CASMIN model while explicitly controlling for individual determinants of socioeconomic attainment. Using the 1972 Oxford Mobility Data and the 1979 and 1983 British Election Studies, the paper employs mixed logit models to show how individual socioeconomic factors and categorical differences between classes shape intergenerational mobility. The findings highlight the multidimensionality of class mobility and its irreducibility to vertical movement up and down a stratification hierarchy.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Research in conditioning (all the processes of preparation for competition) has used group research designs, where multiple athletes are observed at one or more points in time. However, empirical reports of large inter-individual differences in response to conditioning regimens suggest that applied conditioning research would greatly benefit from single-subject research designs. Single-subject research designs allow us to find out the extent to which a specific conditioning regimen works for a specific athlete, as opposed to the average athlete, who is the focal point of group research designs. The aim of the following review is to outline the strategies and procedures of single-subject research as they pertain to the assessment of conditioning for individual athletes. The four main experimental designs in single-subject research are: the AB design, reversal (withdrawal) designs and their extensions, multiple baseline designs and alternating treatment designs. Visual and statistical analyses commonly used to analyse single-subject data, and their advantages and limitations, are discussed. Modelling of multivariate single-subject data using techniques such as dynamic factor analysis and structural equation modelling may identify individualised models of conditioning leading to better prediction of performance. Despite problems associated with data analyses in single-subject research (e.g. serial dependency), sports scientists should use single-subject research designs in applied conditioning research to understand how well an intervention (e.g. a training method) works and to predict performance for a particular athlete.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

With mixed feature data, problems are induced in modeling the gating network of normalized Gaussian (NG) networks as the assumption of multivariate Gaussian becomes invalid. In this paper, we propose an independence model to handle mixed feature data within the framework of NG networks. The method is illustrated using a real example of breast cancer data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This study examined the genetic and environmental relationships among 5 academic achievement skills of a standardized test of academic achievement, the Queensland Core Skills Test (QCST; Queensland Studies Authority, 2003a). QCST participants included 182 monozygotic pairs and 208 dizygotic pairs (mean ± standard deviation: 17 ± 0.4 years). IQ data were included in the analysis to correct for ascertainment bias. A genetic general factor explained virtually all genetic variance in the component academic skills scores, and accounted for 32% to 73% of their phenotypic variances. It also explained 56% and 42% of variation in Verbal IQ and Performance IQ respectively, suggesting that this factor is genetic g. Modest specific genetic effects were evident for achievement in mathematical problem solving and written expression. A single common factor adequately explained common environmental effects, which were also modest, and possibly due to assortative mating. The results suggest that general academic ability, derived from genetic influences and to a lesser extent common environmental influences, is the primary source of variation in component skills of the QCST.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The Australian Pregnancy Registry, affiliated with the European Register of Antiepileptic Drugs in Pregnancy (EURAP), recruits informed, consenting women with epilepsy on treatment with antiepileptic drugs (AEDs), those untreated, and women on AEDs for other indications. Enrolment is considered prospective if it occurred before the presence or absence of major foetal malformations (FMs) was known, or retrospective if it occurred after the birth of the infant or the detection of a major FM. Telephone interviews are conducted to ascertain pregnancy outcome and collect data about seizures. To date 630 women have been enrolled, with 565 known pregnancy outcomes. Valproate (VPA) above 1100 mg/day was associated with a significantly higher incidence of FMs than other AEDs (P < 0.05). This was independent of other AED use or potentially confounding factors on multivariate analysis (OR = 7.3, P < 0.0001). Lamotrigine (LTG) monotherapy (n = 65) has so far been free of malformations. Although seizure control was not a primary outcome, we noted that more patients on LTG than on VPA required dose adjustments to control seizures. Data indicate an increased risk of FM in women taking VPA in doses > 1100 mg/day compared with other AEDs. The choice of AED for pregnant women with epilepsy requires assessment of the balance of risks between teratogenicity and seizure control.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The main purpose of this article is to gain an insight into the relationships between variables describing the environmental conditions of the Far Northern section of the Great Barrier Reef, Australia. Several of the variables describing these conditions had different measurement levels and often they had non-linear relationships. Using non-linear principal component analysis, it was possible to acquire an insight into these relationships. Furthermore, three geographical areas with unique environmental characteristics could be identified. Copyright (c) 2005 John Wiley & Sons, Ltd.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and assessment of the scale of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of: vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.