940 results for visitor information, network services, data collecting, data analysis, statistics, locating
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal components analysis (PCA), with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
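One of the options this abstract compares (PCA after a log-ratio pre-transformation of the densities) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the grid, the toy densities, and the choice of the clr transform as the pre-transformation are our own assumptions.

```python
import numpy as np

def clr(dens, eps=1e-12):
    """Centered log-ratio transform of discretized density values.

    Each row of `dens` is one density evaluated on a common grid.
    (Illustrative name; `eps` guards the log against exact zeros.)
    """
    logd = np.log(dens + eps)
    return logd - logd.mean(axis=1, keepdims=True)

def density_pca(dens, n_components=2):
    """PCA of clr-transformed densities via SVD of the centered data."""
    z = clr(dens)
    z = z - z.mean(axis=0)              # center across the sample
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T    # low-dimensional representation
    return scores, s

# Toy sample: three discretized densities on a common grid
grid = np.linspace(0.01, 0.99, 50)
dens = np.vstack([grid, 1 - grid, np.full(50, 0.5)])
dens = dens / dens.sum(axis=1, keepdims=True)   # renormalize each row
scores, s = density_pca(dens)
```

The clr step is what makes the PCA respect the relative (compositional) information in the densities rather than their raw values.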
Abstract:
In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case in projective geometry, when the projective coordinates are in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study on the architecture of Early Christian churches dated back to the 5th-7th centuries AD.
Abstract:
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities, etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household-level variables, so we need to be able to model this dependence. The other tricky question is that, with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance whether a particular household spends money on durables within the sample period.
This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually occur in situations where any non-zero expenditure is not small. While this analysis is based around economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol) or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
Abstract:
The statistical analysis of compositional data should be based on logratios of parts, which are difficult to use correctly in standard statistical packages. For this reason a freeware package, named CoDaPack, was created. This software implements most of the basic statistical methods suitable for compositional data. In this paper we describe the new version of the package, now called CoDaPack3D. It is developed in Visual Basic for Applications (associated with Excel©), Visual Basic and OpenGL, and it is oriented towards users with a minimum knowledge of computers, with the aim of being simple and easy to use. This new version includes new graphical output in 2D and 3D. These outputs can be zoomed and, in 3D, rotated. A customization menu is also included, and outputs can be saved in jpeg format. This version also includes interactive help, and all dialog windows have been improved to facilitate their use. To use CoDaPack one has to open Excel© and introduce the data in a standard spreadsheet. These should be organized as a matrix where Excel© rows correspond to the observations and columns to the parts. The user executes macros that return numerical or graphical results. There are two kinds of numerical results: new variables and descriptive statistics, and both appear on the same sheet. Graphical output appears in independent windows. In the present version there are 8 menus, with a total of 38 submenus which, after some dialogue, directly call the corresponding macro. The dialogues ask the user to input variables and further parameters needed, as well as where to put the results. The web site http://ima.udg.es/CoDaPack contains this freeware package; only Microsoft Excel© under Microsoft Windows© is required to run the software.
Key words: Compositional Data Analysis, Software
Abstract:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L-2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zero is usually understood as "a trace too small to measure", it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts, and thus the metric properties, should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is "natural" in the sense that it recovers the "true" composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved.
As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values in compositional data sets is introduced.
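The multiplicative replacement discussed above has a very compact form: zeros are set to a small value and only the non-zero parts are rescaled, which is what preserves their ratios. A minimal sketch (our own function name and toy values; in practice the replacement value should reflect the detection limit):

```python
def multiplicative_replacement(comp, delta):
    """Multiplicative rounded-zero replacement (Martín-Fernández et al., 2003).

    Zeros become `delta`; the non-zero parts are scaled down so the
    total of the composition is preserved.
    """
    total = sum(comp)
    n_zeros = sum(1 for v in comp if v == 0)
    scale = (total - n_zeros * delta) / total  # shrink only the non-zero parts
    return [delta if v == 0 else v * scale for v in comp]

x = [0.0, 0.3, 0.7]                      # a composition with one rounded zero
y = multiplicative_replacement(x, 0.01)  # [0.01, 0.297, 0.693]
```

Note that the ratio between the non-zero parts is unchanged (0.297/0.693 = 0.3/0.7), which is exactly the property that keeps the covariance structure of zero-free subcompositions intact.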
Abstract:
In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. Indeed, they gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex S^D jointly with the normal distribution on R^(D-1). However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory, combining the ilr transformation with the use of the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
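The ilr-plus-normal-kernel construction described above can be sketched in a few lines: map the compositions to ilr coordinates, then run an ordinary Gaussian kernel estimate with a full bandwidth matrix in that Euclidean space. This is an illustrative sketch, not the estimator from the cited paper; the pivot ilr basis, function names, toy data, and the fixed bandwidth matrix are our assumptions (bandwidth selection is the hard part in practice).

```python
import numpy as np

def ilr(x):
    """Isometric log-ratio coordinates of D-part compositions (rows),
    using a standard pivot basis."""
    x = np.asarray(x, dtype=float)
    D = x.shape[-1]
    cols = []
    for i in range(1, D):
        g = np.exp(np.mean(np.log(x[..., :i]), axis=-1))  # geometric mean of first i parts
        cols.append(np.sqrt(i / (i + 1)) * np.log(g / x[..., i]))
    return np.stack(cols, axis=-1)

def kde(points, data, H):
    """Gaussian kernel density estimate with full bandwidth matrix H,
    evaluated at `points` (rows)."""
    d = data.shape[1]
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (len(data) * np.sqrt((2 * np.pi) ** d * np.linalg.det(H)))
    diff = points[:, None, :] - data[None, :, :]
    quad = np.einsum('pnd,de,pne->pn', diff, Hinv, diff)  # Mahalanobis terms
    return norm * np.exp(-0.5 * quad).sum(axis=1)

comps = np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
z = ilr(comps)                       # D=3 compositions -> 2 ilr coordinates
dens = kde(z, z, 0.25 * np.eye(2))   # density at the sample points themselves
```

Because the ilr map is an isometry onto R^(D-1), the density estimated in coordinates can be carried back to the simplex, which is what avoids the deficiencies of the original alr-based kernel.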
Abstract:
The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as the distance measure. Modern coretop datasets are characterised by a large number of zeros. The zero replacement was carried out by adopting a Bayesian approach, based on posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined by a multiple approach, considering the proxies correlation matrix, the Standardized Residual Sum of Squares and the Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.
Key words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
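The Aitchison metric adopted above is simply the Euclidean distance between the clr (centered log-ratio) coordinates of two compositions. A minimal pure-Python illustration (function name is ours; inputs must be strictly positive, so zeros are handled upstream, e.g. by the Bayesian replacement the abstract mentions):

```python
import math

def aitchison_distance(x, y):
    """Aitchison distance between two compositions: Euclidean distance
    of their centered log-ratio (clr) coordinates."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)  # clr centers: mean of the logs
    my = sum(ly) / len(ly)
    return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(lx, ly)))

d = aitchison_distance([0.2, 0.3, 0.5], [0.1, 0.6, 0.3])
```

A useful property for assemblage data is scale invariance: multiplying a composition by any constant (e.g. raw counts vs. percentages) leaves the distance unchanged, so only the relative abundances matter when ranking modern analogues.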
Abstract:
In 1993, Iowa Workforce Development (then the Department of Employment Services) conducted a survey to determine if there was a gender gap in wages paid. The results of that survey indicated that women were paid 68 cents per dollar paid to males. We felt a need to determine if this relationship of wages paid to each gender has changed since the 1993 study. In 1999, the Commission on the Status of Women requested that Iowa Workforce Development conduct research to update the 1993 information. A survey, cosponsored by the Commission on the Status of Women and Iowa Workforce Development, was conducted in 1999. The results of the survey showed that women earned 73 percent of what men earned when both jobs were considered. (The survey asked respondents to provide information on a primary job and a secondary job.) The ratio for the primary job was 72 percent, while the ratio for the secondary job was 85 percent. Additional survey results detail the types of jobs respondents had, the types of companies for which they worked and the education and experience levels. All of these characteristics can contribute to these ratios. While the large influx of women into the labor force may be over, it is still important to look at such information to determine if future action is needed. We present these results with that goal in mind. We are indebted to those Iowans, female and male, who voluntarily completed the survey. This study was completed under the general direction of Judy Erickson. The report was written by Shazada Khan, Teresa Wageman, Ann Wagner, and Yvonne Younes with administrative and technical assistance from Michael Blank, Margaret Lee and Gary Wilson. The Iowa State University Statistical Lab provided sampling advice, data entry and coding and data analysis.
Abstract:
The Iowa livestock industry generates large quantities of manure and other organic residues, composed of feces, urine, bedding material, waste feed, dilution water, and mortalities. Because this material is often viewed as a waste, little has been done to characterize and determine the usefulness of this resource. The Iowa Department of Natural Resources initiated a process to assess in detail the manure resource and its potential utilization through anaerobic digestion coupled with energy recovery. Many of the pieces required to assess the manure resource already exist, albeit in disparate forms and locations. This study began by interpreting and integrating existing Federal, State, and ISU studies, and other sources of livestock numbers, housing, and management information. With these data, models were analyzed to determine the energy production and economic feasibility of energy recovery using anaerobic digestion facilities on livestock farms. Individual facilities and clusters that appear economically feasible can then be identified through the use of a GIS system for further investigation. Livestock facilities and clusters of facilities with high methane recovery potential can also be the focus of targeted educational programs through the Cooperative Extension network and other outreach networks, providing a more intensive counterpoint to broadly based educational efforts.
Abstract:
The purpose of the Iowa TOPSpro Data Dictionary is to provide a statewide-standardized set of instructions and definitions for coding Tracking Of Programs And Students (TOPSpro) forms and effectively utilizing the TOPSpro software. This document is designed to serve as a companion document to the TOPS Technical Manual produced by the Comprehensive Adult Student Assessment System (CASAS). The data dictionary integrates information from various data systems to provide uniform data sets and definitions that meet local, state and federal reporting mandates. The sources for the data dictionary are: (1) the National Reporting System (NRS) Guidelines, (2) standard practices utilized in Iowa’s adult literacy program, (3) selected definitions from the Workforce Investment Act of 1998, (4) input from the state level Management Information System (MIS) personnel, and (5) selected definitions from other Iowa state agencies.
Abstract:
Objective To analyse the provision of health care actions and services for people living with AIDS and receiving specialised care in Ribeirão Preto, SP. Method A descriptive, exploratory, survey-type study that consisted of interviews with structured questionnaires and data analysis using descriptive statistics. Results The provision of health care actions and services is perceived as fair. For the 301 subjects, routine care provided by the reference team, laboratory tests and the availability of antiretroviral drugs, vaccines and condoms obtained satisfactory evaluations. The provision of tests for the prevention and diagnosis of comorbidities was assessed as fair, whereas the provisions of specialised care by other professionals, psychosocial support groups and medicines for the prevention of antiretroviral side effects were assessed as unsatisfactory. Conclusion Shortcomings were observed in follow-up and care management along with a predominantly biological, doctor-centred focus in which clinical control and access to antiretroviral therapy comprise the essential focus of the care provided.
Abstract:
The aim of this talk is to convince the reader that there are a lot of interesting statistical problems in present-day life science data analysis which seem ultimately connected with compositional statistics.
Key words: SAGE, cDNA microarrays, (1D-)NMR, virus quasispecies
Abstract:
B-1 Medicaid Reports. The monthly Medicaid series of eight reports provides summaries of Medicaid eligibles, recipients served, and total payments by county, category of service, and aid category. These reports may also be known as the B-1 Reports. Each report is available as a PDF for printing or as a CSV file for data analysis.
Report name (report number):
Medically Needy by County - No Spenddown and With Spenddown (IAMM1800-R001)
Total Medically Needy, All Other Medicaid, and Grand Total by County (IAMM1800-R002)
Monthly Expenditures by Category of Service (IAMM2200-R002)
Fiscal YTD Expenditures by Category of Service (IAMM2200-R003)
ICF & ICF-MR Vendor Payments by County (IAMM3800-R001)
Monthly Expenditures by Eligibility Program (IAMM4400-R001)
Monthly Expenditures by Category of Service by Program (IAMM4400-R002)
Elderly Waiver Summary by County (IAMM4600-R002)