970 results for Multivariate data


Relevance: 30.00%

Abstract:

When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships on the aggregate data level, the methodology allows homogeneous groups of observations to be created that exhibit distinctive path model estimates. The approach thus provides differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application to a large data set, using the American customer satisfaction index (ACSI) as an example, substantiates the methodology's effectiveness in evaluating PLS path modeling results.

Relevance: 30.00%

Abstract:

Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial 'mashups' to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and 'correlation' of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control over, or knowledge about, background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and we develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.

Relevance: 30.00%

Abstract:

Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
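The idea of an entropy built from co-occurrences can be illustrated with a minimal sketch. It assumes the simplest case only: co-occurrences of order 2 (unordered pairs of marks) among points closer than a chosen radius, with the Shannon entropy taken over the observed pair frequencies. The function name `cooccurrence_entropy`, the `(x, y, mark)` tuple layout and the fixed-radius neighbourhood are illustrative assumptions, not the definitions used in the paper, which works with higher orders.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, radius):
    """Shannon entropy (in nats) of mark-pair co-occurrences.

    `points` is a list of (x, y, mark) tuples (a hypothetical layout).
    Two points co-occur when their Euclidean distance is <= radius.
    """
    counts = Counter()
    for (x1, y1, m1), (x2, y2, m2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            # Unordered pair of marks: ('a', 'b') and ('b', 'a') are the same category.
            counts[tuple(sorted((m1, m2)))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no co-occurrences at this scale
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Evaluating the function over a range of radii gives a rough picture of how mixed the marks are at each spatial scale: values near zero indicate that one type of pair dominates.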

Relevance: 30.00%

Abstract:

Strategic planning, and more specifically the impact of strategic planning on organisational performance, has been the subject of significant academic interest since the early 1970s. However, despite the significant amount of previous work examining the relationship between strategic planning and organisational performance, a comprehensive literature review identified a number of areas where contributions to the domain of study could be made. In overview, the main areas for further study identified from the literature review were a) a further examination of both the dimensionality and conceptualisation of strategic planning and organisational performance and b) a further, multivariate, examination of the relationship between strategic planning and performance, to capture the newly identified dimensionality. In addition to the previously identified strategic planning and organisational performance constructs, a comprehensive literature-based assessment was undertaken and five main areas were identified for further examination: a) organisational b) comprehensive strategic choice, c) the quality of strategic options generated, d) political behavior and e) implementation success. From this, a conceptual model incorporating a set of hypotheses to be tested was formulated. Data were then gathered to test the conceptual model and the stated hypotheses. The quantitative phase of the research involved a mail survey of senior managers in medium to large UK-based organisations, from which a total of 366 fully useable responses were received. Following rigorous individual construct validity and reliability testing, the complete conceptual model was tested using latent variable path analysis. The results for the individual hypotheses and also the complete conceptual model were most encouraging. The findings, theoretical and managerial implications, limitations and directions for future research are discussed.

Relevance: 30.00%

Abstract:

This paper presents the results of a multivariate spatial analysis of 38 vowel formant variables in the language of 402 informants from 236 cities from across the contiguous United States, based on the acoustic data from the Atlas of North American English (Labov, Ash & Boberg, 2006). The results of the analysis both confirm and challenge the results of the Atlas. Most notably, while the analysis identifies patterns similar to those of the Atlas in the West and the Southeast, it finds that the Midwest and the Northeast are distinct dialect regions that are considerably stronger than the traditional Midland and Northern dialect regions identified in the Atlas. The analysis also finds evidence that a western vowel shift is actively shaping the language of the Western United States.

Relevance: 30.00%

Abstract:

This paper describes a method (the BIDIMS algorithm) for mapping multivariate objects onto a two-dimensional structure in which the sum of the differences between the properties of each object and those of its nearest neighbors is minimal. Under this ordering, the basic regularities in the set of objects become evident. Such structures (tables) also have strong inductive potential: many latent properties of objects can be predicted from their coordinates in the table. The capabilities of the method are illustrated by a two-dimensional ordering of the chemical elements. The resulting table practically coincides with Mendeleev's periodic table.
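The objective stated in this abstract, a table in which neighboring cells hold similar objects, can be illustrated with a small local-search sketch. This is not the BIDIMS algorithm, whose details are not given here; it is a generic swap-based heuristic under assumed definitions: neighbors are the horizontally and vertically adjacent cells, and `dist` is any user-supplied dissimilarity. The names `layout_cost` and `improve_by_swaps` are hypothetical.

```python
import itertools


def layout_cost(grid, dist):
    """Sum of dist() over all horizontally and vertically adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += dist(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += dist(grid[r][c], grid[r + 1][c])
    return total


def improve_by_swaps(grid, dist):
    """Keep any swap of two cells that lowers the cost; stop when none does.

    Returns a new grid and its cost. The result is a local optimum only.
    """
    grid = [row[:] for row in grid]  # work on a copy
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    best = layout_cost(grid, dist)
    improved = True
    while improved:
        improved = False
        for (r1, c1), (r2, c2) in itertools.combinations(cells, 2):
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
            cost = layout_cost(grid, dist)
            if cost < best - 1e-12:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    return grid, best
```

Recomputing the full cost after every swap keeps the sketch short; a practical version would update only the edges touching the two swapped cells.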

Relevance: 30.00%

Abstract:

A rich material of Heteroptera, extracted with Berlese funnels by Dr. I. Loksa between 1953 and 1974 in Hungary, has been examined. Altogether 157 true bug species have been identified. The ground-living heteropteran assemblages collected in different plant communities, substrata, phytogeographical provinces and seasons have been compared with multivariate methods. Because of the unequal number of samples, the objects have been standardized with stochastic simulation. Several true bug species have been collected in almost all of the plant communities. However, characteristic ground-living heteropteran assemblages have been found in numerous Hungarian plant community types. Leaf litter and debris seem to have characteristic bug assemblages. Some differences have also been recognised between the bug fauna of mosses growing on different surfaces. Most of the species have been found in all of the great phytogeographical provinces of Hungary. Most of the high-dominance species collected can be found at ground level almost throughout the year. Specimens of many other species have been collected with Berlese funnels in spring, autumn and/or winter. The diversities of the ground-living heteropteran assemblages of the examined objects have also been compared.

Relevance: 30.00%

Abstract:

As the third part of a series of papers on the ground-living true bugs of Hungary, the species belonging to the lace bug genus Acalypta Westwood, 1840 (Insecta: Heteroptera: Tingidae) were studied. Extensive materials collected with Berlese funnels over about 20 years all over Hungary were identified. Based on these sporadic data of many years, faunistic notes are given on some Hungarian species. The seasonal occurrence of the species is discussed. The numbers of specimens of different Acalypta species collected in diverse plant communities are compared with multivariate methods. Materials collected with pitfall traps between 1979 and 1982 at Bugac, Kiskunság National Park, were also processed. In this area, only A. marginata and A. gracilis occurred, both in great numbers. The temporal changes of the populations are discussed. Significant differences could be observed between the microhabitat distributions of the two species: both species occurred in very low numbers in traps placed in patches colonized by dune-slack purple moorgrass meadow; Acalypta gracilis distinctly preferred the Pannonic dune open grassland patches; A. marginata occurred in almost equal numbers in Pannonic dune open grassland and in Pannonic sand puszta patches.

Relevance: 30.00%

Abstract:

Dissolved organic matter (DOM) in groundwater and surface water samples from the Florida coastal Everglades was studied using excitation–emission matrix fluorescence modeled through parallel factor analysis (EEM-PARAFAC). DOM in both surface water and groundwater from the eastern Everglades S332 basin reflected a terrestrial-derived fingerprint through dominantly higher abundances of humic-like PARAFAC components. In contrast, surface water DOM from northeastern Florida Bay featured a microbial-derived DOM signature based on the higher abundance of microbial humic-like and protein-like components, consistent with its marine source. Surprisingly, groundwater DOM from northeastern Florida Bay reflected a terrestrial-derived source, except for samples from the central Florida Bay well, which mirrored a combination of terrestrial and marine end-member origins. Furthermore, surface water and groundwater displayed effects of different degradation pathways, such as photodegradation and biodegradation, as exemplified by two PARAFAC components seemingly indicative of such degradation processes. Finally, Principal Component Analysis of the EEM-PARAFAC data was able to distinguish and classify most of the samples according to DOM origins and degradation processes experienced, except for a small overlap of S332 surface water and groundwater, implying rather active surface-to-groundwater interaction at some sites, particularly during the rainy season. This study highlights that EEM-PARAFAC could be used successfully to trace and differentiate DOM from diverse sources across both horizontal and vertical flow profiles, and as such could be a convenient and useful tool for the better understanding of hydrological interactions and carbon biogeochemical cycling.

Relevance: 30.00%

Abstract:

This study investigated the feasibility of using qualitative methods to provide empirical documentation of the long-term qualitative change in the life course trajectories of “at risk” youth in a school-based positive youth development program (the Changing Lives Program, CLP). This work draws from life course theory for a developmental framework and from recent advances in the use of qualitative methods in general and a grounded theory approach in particular. Grounded theory provided a methodological framework for conceptualizing the use of qualitative methods for assessing qualitative life change. The study investigated the feasibility of using the Possible Selves Questionnaire-Qualitative Extension (PSQ-QE) for evaluating the impact of the program on qualitative change in participants' life trajectory relative to a non-intervention control group. Integrated Qualitative/Quantitative Data Analytic Strategies (IQ-DAS), which we have been developing as part of our program of research, provided the data analytic framework for the study.
Change was evaluated in 85 at-risk high school students in CLP high school counseling groups over three assessment periods (pre, post, and follow-up), and in a non-intervention control group of 23 students over two assessment periods (pre and post). Intervention gains and maintenance, and the extent to which these patterns of change were moderated by gender and ethnicity, were evaluated using a mixed design Repeated Measures Multivariate Analysis of Variance (RMANOVA) in which Time (pre, post) was the within (repeated) factor and Condition, Gender, and Ethnicity the between group factors. The trends for the direction of qualitative change were positive from pre to post and maintained at the year-end follow-up. More important, the 3-way interaction for Time x Gender x Ethnicity was significant, Roy's Θ = .205, F(2, 37) = 3.80, p < .032, indicating that the overall pattern of positive change was significantly moderated by gender and ethnicity.
Thus, the findings also provided preliminary evidence for a positive impact of the youth development program on long-term change in life course trajectory, and were suggestive with respect to the issue of amenability to treatment, i.e., the identification of subgroups of individuals in a target population who are likely to be the most amenable or responsive to a treatment.

Relevance: 30.00%

Abstract:

This study subdivides the Weddell Sea, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis uses 28 environmental variables for the sea surface, 25 variables for the seabed and 9 variables for the analysis between surface and bottom variables. The data were taken during the years 1983-2013. Some data were interpolated. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared to identify the most reasonable method. The multivariate mathematical procedures used are regionalized classification via k-means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for identifying the optimum number of clusters have been tested. For the seabed, 8 and 12 clusters were identified as reasonable numbers for clustering the Weddell Sea. For the sea surface, the numbers 8 and 13 were identified, and for the top/bottom analysis, 8 and 3, respectively. Additionally, the results of 20 clusters are presented for the three alternatives, offering the first small-scale environmental regionalization of the Weddell Sea. In particular, the results of 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.
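The k-means step named in this abstract can be sketched in a few lines. The following is a plain implementation of Lloyd's algorithm under simplifying assumptions: points are tuples of already standardized variables, the initial centers are supplied by the caller, and the number of clusters is fixed in advance. The study's actual variables, preprocessing and cluster-number selection are not reproduced, and the function name `kmeans` is illustrative.

```python
import math


def kmeans(points, centers, iterations=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update.

    `points` and `centers` are sequences of equal-length numeric tuples.
    Returns (labels, centers) once the centers stop changing.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Because the result depends on the starting centers, it is common to run the procedure from several initializations and compare the within-cluster distances before settling on a number of clusters.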

Relevance: 30.00%

Abstract:

Multivariate statistical analysis on the kaolinite/chlorite ratios from 20 South Atlantic sediment cores allowed for the extraction of two processes controlling the fluctuations of the kaolinite/chlorite ratio during the last 130,000 yrs, (1) the relative strength of North Atlantic Deep Water (NADW) inflow into the South Atlantic Ocean and (2) the influx of aeolian sediments from the south African continent. The NADW fluctuation can be traced in the entire deep South Atlantic while the dust signal is restricted to the vicinity of South Africa. Our data indicate that NADW formation underwent significant changes in response to glacial/interglacial climate changes with enhanced export to the Southern Hemisphere during interglacials. The most pronounced phases with Enhanced South African Dust Export (ESADE) occurred during cold Marine Isotope Stage (MIS) 5d and across the Late Glacial/Holocene transition from 16 ka to 4 ka (MIS 2 to 1). This particular pattern is attributed to the interaction of Antarctic Sea Ice extent, the position of the westerlies and the South African monsoon system.

Relevance: 30.00%

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for the analysis of biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedented. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, a factor analysis model assumes a latent multivariate vector with a Gaussian distribution. With this assumption, a factor model produces a low rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics are represented by a Dirichlet distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data and estimation of population structure from genotype data.
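The link between the latent Gaussian vector and the low rank covariance can be written out in the standard factor analysis form. The notation below (loading matrix \Lambda, noise covariance \Psi, dimensions p and k) is conventional and assumed here; it is not taken from the dissertation.

```latex
% Standard factor analysis model (assumed notation):
% x is the observed p-vector, z the latent k-vector with k < p.
\begin{aligned}
  x &= \mu + \Lambda z + \varepsilon, \\
  z &\sim \mathcal{N}(0, I_k), \qquad
  \varepsilon \sim \mathcal{N}(0, \Psi), \quad \Psi \text{ diagonal}, \\
  \operatorname{Cov}(x) &= \Lambda \Lambda^{\top} + \Psi,
  \qquad \operatorname{rank}\!\left(\Lambda \Lambda^{\top}\right) \le k .
\end{aligned}
```

The covariance therefore splits into a part of rank at most k, shared across variables through the loadings, plus a diagonal term for variable-specific noise.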

Relevance: 30.00%

Abstract:

Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, so that the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum the values within cells defined by different features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.

The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
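The fixed-total property of the first method can be illustrated with a standard fact: independent Poisson counts, conditioned on their sum, follow a multinomial distribution whose probabilities are the rates divided by their total. The sketch below uses a single set of rates rather than the mixture of Poissons described in the thesis, and the function name `synthesize_fixed_total` and its arguments are hypothetical.

```python
import random


def synthesize_fixed_total(total, rates, rng):
    """Draw non-negative integer counts that sum exactly to `total`.

    Equivalent to sampling independent Poisson(rates[i]) counts conditioned
    on their sum being `total`, i.e. a multinomial draw with probabilities
    rates[i] / sum(rates).
    """
    # Allocate each of the `total` units to a cell with probability
    # proportional to that cell's rate.
    cells = rng.choices(range(len(rates)), weights=rates, k=total)
    counts = [0] * len(rates)
    for cell in cells:
        counts[cell] += 1
    return counts
```

For instance, `synthesize_fixed_total(100, [1.0, 2.0, 7.0], random.Random(0))` returns three synthetic cell values whose sum is always 100, whatever the individual values turn out to be.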

The second method is for releasing synthetic continuous microdata by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.

The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of the focused ones can help to improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.