875 results for statistical data analysis
Abstract:
This document provides a simple introduction to research methods and analysis tools for biologists and environmental scientists, with particular emphasis on fish biology in developing countries.
Abstract:
ENGLISH: A two-stage sampling design is used to estimate the variances of the numbers of yellowfin in different age groups caught in the eastern Pacific Ocean. For purse seiners, the primary sampling unit (n) is a brine well containing fish from a month-area stratum; the number of fish lengths (m) measured from each well is the secondary unit. The fish cannot be selected at random from the wells because of practical limitations. The effects of different sampling methods and other factors on the reliability and precision of statistics derived from the length-frequency data were therefore examined. Modifications are recommended where necessary. Lengths of fish measured during the unloading of six test wells revealed two forms of inherent size stratification: (1) short-term disruptions of the existing pattern of sizes, and (2) transition zones between long-term trends in sizes. To some degree, all wells exhibited cyclic changes in mean size and variance during unloading. In half of the wells, size selection by the unloaders was observed to induce a change in mean size. As a result of stratification, the sequence of sizes removed from all wells was non-random, regardless of whether a well contained fish from a single set or from more than one set. The number of modal sizes in a well was not related to the number of sets. In an additional well composed of fish from several sets, an experiment on vertical mixing indicated that a representative sample of the contents may be restricted to the bottom half of the well. The contents of the test wells were used to generate 25 simulated wells and to compare the results of three sampling methods applied to them. The methods were: (1) random sampling (also used as a standard), (2) protracted sampling, in which the selection process was extended over a large portion of a well, and (3) measuring fish consecutively during removal from the well. 
Repeated sampling by each method and different combinations of n and m indicated that, because the principal source of size variation occurred among primary units, increasing n was the most effective way to reduce the variance estimates of both the age-group sizes and the total number of fish in the landings. Protracted sampling largely circumvented the effects of size stratification, and its performance was essentially comparable to that of random sampling. Sampling by this method is recommended. Consecutive-fish sampling produced more biased estimates with greater variances. Analysis of the 1988 length-frequency samples indicated that, for age groups that appear most frequently in the catch, a minimum sampling frequency of one primary unit in six for each month-area stratum would reduce the coefficients of variation (CV) of their size estimates to approximately 10 percent or less. Additional stratification of samples by set type, rather than month-area alone, further reduced the CVs of scarce age groups, such as the recruits, and potentially improved their accuracy. The CVs of recruitment estimates for completely fished cohorts during the 1981-84 period were approximately 3 to 8 percent. Recruitment estimates and their variances were also relatively insensitive to changes in the individual quarterly catches and variances, respectively, of which they were composed. SPANISH: Se usa un diseño de muestreo de dos etapas para estimar las varianzas de los números de aletas amarillas en distintos grupos de edad capturados en el Océano Pacífico oriental. Para barcos cerqueros, la unidad primaria de muestreo (n) es una bodega de salmuera que contenía peces de un estrato de mes-área; el número de tallas de peces (m) medidas de cada bodega es la unidad secundaria. Limitaciones de carácter práctico impiden la selección aleatoria de peces de las bodegas. 
Por lo tanto, fueron examinados los efectos de distintos métodos de muestreo y otros factores sobre la confiabilidad y precisión de las estadísticas derivadas de los datos de frecuencia de talla. Se recomiendan modificaciones donde sean necesarias. Las tallas de peces medidas durante la descarga de seis bodegas de prueba revelaron dos formas de estratificación inherente por talla: 1) perturbaciones a corto plazo en la pauta de tallas existente, y 2) zonas de transición entre las tendencias a largo plazo en las tallas. En cierto grado, todas las bodegas mostraron cambios cíclicos en talla media y varianza durante la descarga. En la mitad de las bodegas, se observó que selección por talla por los descargadores indujo un cambio en la talla media. Como resultado de la estratificación, la secuencia de tallas sacadas de todas las bodegas no fue aleatoria, sin considerar si una bodega contenía peces de un solo lance o de más de uno. El número de tallas modales en una bodega no estaba relacionado al número de lances. En una bodega adicional compuesta de peces de varios lances, un experimento de mezcla vertical indicó que una muestra representativa del contenido podría estar limitada a la mitad inferior de la bodega. Se usó el contenido de las bodegas de prueba para generar 25 bodegas simuladas y comparar los resultados de tres métodos de muestreo aplicados a estas. Los métodos fueron: (1) muestreo aleatorio (usado también como norma), (2) muestreo extendido, en el cual el proceso de selección fue extendido sobre una porción grande de una bodega, y (3) medición consecutiva de peces durante la descarga de la bodega. El muestreo repetido con cada método y distintas combinaciones de n y m indicó que, puesto que la fuente principal de variación de talla ocurría entre las unidades primarias, aumentar n fue la manera más eficaz de reducir las estimaciones de la varianza de las tallas de los grupos de edad y el número total de peces en los desembarcos. 
El muestreo extendido evitó mayormente los efectos de la estratificación por talla, y su desempeño fue esencialmente comparable a aquel del muestreo aleatorio. Se recomienda muestrear con este método. El muestreo de peces consecutivos produjo estimaciones más sesgadas con mayores varianzas. Un análisis de las muestras de frecuencia de talla de 1988 indicó que, para los grupos de edad que aparecen con mayor frecuencia en la captura, una frecuencia de muestreo mínima de una unidad primaria de cada seis para cada estrato de mes-área reduciría los coeficientes de variación (CV) de las estimaciones de talla correspondientes a aproximadamente 10% o menos. Una estratificación adicional de las muestras por tipo de lance, y no solamente mes-área, redujo aún más los CV de los grupos de edad escasos, tales como los reclutas, y mejoró potencialmente su precisión. Los CV de las estimaciones del reclutamiento para las cohortes completamente pescadas durante 1981-1984 fueron alrededor de 3-8%. Las estimaciones del reclutamiento y sus varianzas fueron también relativamente insensibles a cambios en las capturas de trimestres individuales y las varianzas, respectivamente, de las cuales fueron derivadas. (PDF contains 70 pages)
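The abstract's central finding, that adding primary units (n) reduces variance more than adding measurements per unit (m) when most variation lies between wells, can be illustrated with a small simulation. The following is only a sketch under invented assumptions: the function names, the normal distributions, and every numeric value (mean length, standard deviations, replicate count) are hypothetical and are not taken from the study, which worked with real length-frequency data and estimated age-group numbers rather than a simple mean.

```python
import random
import statistics


def simulate_estimate(n_wells, m_fish, rng,
                      grand_mean=80.0, between_sd=10.0, within_sd=3.0):
    """Return one two-stage estimate of mean fish length.

    Stage 1 draws n_wells primary units (wells); stage 2 measures
    m_fish lengths in each. All numeric defaults are made-up values
    chosen so that between-well variation dominates.
    """
    well_means = []
    for _ in range(n_wells):
        true_well_mean = rng.gauss(grand_mean, between_sd)   # stage 1
        lengths = [rng.gauss(true_well_mean, within_sd)      # stage 2
                   for _ in range(m_fish)]
        well_means.append(statistics.fmean(lengths))
    return statistics.fmean(well_means)


def coefficient_of_variation(n_wells, m_fish, reps=1000, seed=0):
    """Estimate the CV of the two-stage estimator by repeated sampling."""
    rng = random.Random(seed)
    estimates = [simulate_estimate(n_wells, m_fish, rng)
                 for _ in range(reps)]
    return statistics.stdev(estimates) / statistics.fmean(estimates)


if __name__ == "__main__":
    # Compare a baseline design with 4x more fish per well
    # and with 4x more wells.
    for n, m in [(5, 20), (5, 80), (20, 20)]:
        print(f"n={n:2d} m={m:2d} CV={coefficient_of_variation(n, m):.4f}")
```

Under these assumptions the variance of the estimate is roughly between_sd²/n + within_sd²/(n·m), so quadrupling m barely changes the CV while quadrupling n roughly halves it.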
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the "label class" method yields an improvement of orders of magnitude over the lower predicate calculus technique.