934 results for methods: data analysis


Relevance:

100.00% 100.00%

Abstract:

The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it requires experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up at the University of Helsinki research farm Suitia.
The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data were divided into two parts, and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validation dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
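
As an illustration of the classifier type named above, a probabilistic neural network can be sketched as a Gaussian-kernel (Parzen window) density estimate per class, with the highest-scoring class winning. This is a minimal sketch, not the thesis model: the function name `pnn_classify`, the smoothing parameter `sigma`, the equal class priors and the leg-load fractions below are all assumptions made for the example.

```python
import math

def pnn_classify(x, train, sigma=1.0):
    """Return the label whose Gaussian-kernel density estimate at x is highest.

    train maps each class label to a list of feature vectors (equal priors assumed).
    """
    scores = {}
    for label, patterns in train.items():
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: average kernel response over the class patterns.
        scores[label] = total / len(patterns)
    return max(scores, key=scores.get)

# Hypothetical leg-load fractions for four legs; not data from the thesis.
train = {
    "sound": [(0.27, 0.27, 0.23, 0.23), (0.28, 0.26, 0.23, 0.23)],
    "lame": [(0.30, 0.30, 0.28, 0.12), (0.29, 0.29, 0.30, 0.12)],
}
print(pnn_classify((0.29, 0.30, 0.29, 0.12), train, sigma=0.05))
```

The only tunable quantity in this sketch is `sigma`; a real system would choose it on held-out data and would likely use more features than leg loads alone, such as the kick counts mentioned in the abstract.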

Relevance:

100.00% 100.00%

Abstract:

This work belongs to the field of computational high-energy physics (HEP). The key methods used in this thesis work to meet the challenges raised by the Large Hadron Collider (LHC) era experiments are object-orientation with software engineering, Monte Carlo simulation, the computer technology of clusters, and artificial neural networks. The first aspect discussed is the development of hadronic cascade models, used for the accurate simulation of medium-energy hadron-nucleus reactions, up to 10 GeV. These models are typically needed in hadronic calorimeter studies and in the estimation of radiation backgrounds. Various applications outside HEP include the medical field (such as hadron treatment simulations), space science (satellite shielding), and nuclear physics (spallation studies). Validation results are presented for several significant improvements released in the Geant4 simulation tool, and the significance of the new models for computing in the Large Hadron Collider era is estimated. In particular, we estimate the ability of the Bertini cascade to simulate the Compact Muon Solenoid (CMS) hadron calorimeter (HCAL). LHC test beam activity has a tightly coupled cycle of simulation-to-data analysis. Typically, a Geant4 computer experiment is used to understand test beam measurements. Thus another aspect of this thesis is a description of studies related to developing new CMS H2 test beam data analysis tools and performing data analysis on the basis of CMS Monte Carlo events. These events have been simulated in detail using Geant4 physics models, full CMS detector description, and event reconstruction. Using the ROOT data analysis framework we have developed an offline ANN-based approach to tag b-jets associated with heavy neutral Higgs particles, and we show that this kind of NN methodology can be successfully used to separate the Higgs signal from the background in the CMS experiment.

Relevance:

100.00% 100.00%

Abstract:

Accelerator mass spectrometry (AMS) is an ultrasensitive technique for measuring the concentration of a single isotope. The electric and magnetic fields of an electrostatic accelerator system are used to filter out other isotopes from the ion beam. The high velocity means that molecules can be destroyed and removed from the measurement background. As a result, concentrations down to one atom in 10^16 atoms are measurable. This thesis describes the construction of the new AMS system in the Accelerator Laboratory of the University of Helsinki. The system is described in detail along with the relevant ion optics. System performance and some of the 14C measurements done with the system are described. In a second part of the thesis, a novel statistical model for the analysis of AMS data is presented. Bayesian methods are used in order to make the best use of the available information. In the new model, instrumental drift is modelled with a continuous first-order autoregressive process. This enables rigorous normalization to standards measured at different times. The Poisson statistical nature of a 14C measurement is also taken into account properly, so that uncertainty estimates are much more stable. It is shown that, overall, the new model improves both the accuracy and the precision of AMS measurements. In particular, the results can be improved for samples with very low 14C concentrations or measured only a few times.
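
To make the drift model mentioned above more concrete, the sketch below samples a zero-mean continuous first-order autoregressive process at irregular measurement times. It is an illustration only, not the thesis model: the function name `simulate_car1`, the parameters `sigma` (stationary standard deviation) and `tau` (correlation time), and the example times are assumptions.

```python
import math
import random

def simulate_car1(times, sigma, tau, rng):
    """Sample a zero-mean continuous AR(1) process at increasing times."""
    values = [rng.gauss(0.0, sigma)]  # start from the stationary distribution
    for t_prev, t_next in zip(times, times[1:]):
        # Correlation between the two time points decays with the gap length.
        phi = math.exp(-(t_next - t_prev) / tau)
        innovation_sd = sigma * math.sqrt(1.0 - phi ** 2)
        values.append(phi * values[-1] + rng.gauss(0.0, innovation_sd))
    return values

# Hypothetical, unevenly spaced measurement times (arbitrary units).
times = [0.0, 0.5, 0.7, 2.0, 2.1, 5.0]
drift = simulate_car1(times, sigma=0.01, tau=1.5, rng=random.Random(0))
print(drift)
```

Because the correlation depends only on the time gap, standards and samples measured at different times can share one drift process, which is presumably what makes normalization across measurement times tractable in such a model.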

Relevance:

100.00% 100.00%

Abstract:

Aims: Develop and validate tools to estimate residual noise covariance in Planck frequency maps. Quantify signal error effects and compare different techniques to produce low-resolution maps. Methods: We derive analytical estimates of the covariance of the residual noise contained in low-resolution maps produced using a number of map-making approaches. We test these analytical predictions using Monte Carlo simulations and assess their impact on angular power spectrum estimation. We use simulations to quantify the level of signal errors incurred in the different resolution downgrading schemes considered in this work. Results: We find excellent agreement between the optimal residual noise covariance matrices and Monte Carlo noise maps. For destriping map-makers, the extent of agreement is dictated by the knee frequency of the correlated noise component and the chosen baseline offset length. Signal striping is shown to be insignificant when properly dealt with. In map resolution downgrading, we find that a carefully selected window function is required to reduce aliasing to the sub-percent level at multipoles ell > 2Nside, where Nside is the HEALPix resolution parameter. We show that sufficient characterization of the residual noise is unavoidable if one is to draw reliable constraints on large scale anisotropy. Conclusions: We have described how to compute the low-resolution maps, with a controlled sky signal level, and a reliable estimate of the covariance of the residual noise. We have also presented a method to smooth the residual noise covariance matrices to describe the noise correlations in smoothed, bandwidth limited maps.

Relevance:

100.00% 100.00%

Abstract:

This document provides a simple introduction to research methods and analysis tools for biologists or environmental scientists, with particular emphasis on fish biology in developing countries.

Relevance:

100.00% 100.00%

Abstract:

DNA microarray, or DNA chip, is a technology that allows us to obtain the expression level of many genes in a single experiment. The fact that numerical expression values can be easily obtained gives us the possibility to use multiple statistical techniques of data analysis. In this project microarray data are obtained from Gene Expression Omnibus, the repository of the National Center for Biotechnology Information (NCBI). The noise is then removed and the data are normalized. We also use hypothesis tests to find the most relevant genes that may be involved in a disease, and machine learning methods such as KNN, Random Forest and K-means. For performing the analysis we use Bioconductor, a set of R packages for the analysis of biological data, and we conduct a case study in Alzheimer's disease. The complete code can be found at https://github.com/alberto-poncelas/bioc-alzheimer

Relevance:

100.00% 100.00%

Abstract:

ENGLISH: A two-stage sampling design is used to estimate the variances of the numbers of yellowfin in different age groups caught in the eastern Pacific Ocean. For purse seiners, the primary sampling unit (n) is a brine well containing fish from a month-area stratum; the number of fish lengths (m) measured from each well are the secondary units. The fish cannot be selected at random from the wells because of practical limitations. The effects of different sampling methods and other factors on the reliability and precision of statistics derived from the length-frequency data were therefore examined. Modifications are recommended where necessary. Lengths of fish measured during the unloading of six test wells revealed two forms of inherent size stratification: 1) short-term disruptions of existing pattern of sizes, and 2) transition zones between long-term trends in sizes. To some degree, all wells exhibited cyclic changes in mean size and variance during unloading. In half of the wells, it was observed that size selection by the unloaders induced a change in mean size. As a result of stratification, the sequence of sizes removed from all wells was non-random, regardless of whether a well contained fish from a single set or from more than one set. The number of modal sizes in a well was not related to the number of sets. In an additional well composed of fish from several sets, an experiment on vertical mixing indicated that a representative sample of the contents may be restricted to the bottom half of the well. The contents of the test wells were used to generate 25 simulated wells and to compare the results of three sampling methods applied to them. The methods were: (1) random sampling (also used as a standard), (2) protracted sampling, in which the selection process was extended over a large portion of a well, and (3) measuring fish consecutively during removal from the well. 
Repeated sampling by each method and different combinations of n and m indicated that, because the principal source of size variation occurred among primary units, increasing n was the most effective way to reduce the variance estimates of both the age-group sizes and the total number of fish in the landings. Protracted sampling largely circumvented the effects of size stratification, and its performance was essentially comparable to that of random sampling. Sampling by this method is recommended. Consecutive-fish sampling produced more biased estimates with greater variances. Analysis of the 1988 length-frequency samples indicated that, for age groups that appear most frequently in the catch, a minimum sampling frequency of one primary unit in six for each month-area stratum would reduce the coefficients of variation (CV) of their size estimates to approximately 10 percent or less. Additional stratification of samples by set type, rather than month-area alone, further reduced the CVs of scarce age groups, such as the recruits, and potentially improved their accuracy. The CVs of recruitment estimates for completely-fished cohorts during the 1981-84 period were in the vicinity of 3 to 8 percent. Recruitment estimates and their variances were also relatively insensitive to changes in the individual quarterly catches and variances, respectively, of which they were composed. (PDF contains 70 pages)
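
The reasoning about n and m can be illustrated with the textbook variance of a mean under two-stage sampling with equal-sized subsamples, ignoring finite-population corrections: Var = sigma_b^2 / n + sigma_w^2 / (n * m). This is a sketch under stated assumptions, not the estimator used in the report: the function name and the between-well and within-well variances below are hypothetical.

```python
def two_stage_variance(sigma_b2, sigma_w2, n, m):
    """Variance of the overall mean with n primary units and m fish per unit.

    sigma_b2: variance among primary units (wells)
    sigma_w2: variance among fish within a well
    Finite-population corrections are ignored in this sketch.
    """
    return sigma_b2 / n + sigma_w2 / (n * m)

# Hypothetical variance components with between-well variation dominating.
sigma_b2, sigma_w2 = 25.0, 9.0
baseline = two_stage_variance(sigma_b2, sigma_w2, n=5, m=50)
double_m = two_stage_variance(sigma_b2, sigma_w2, n=5, m=100)
double_n = two_stage_variance(sigma_b2, sigma_w2, n=10, m=50)
print(baseline, double_m, double_n)
```

With these made-up components, doubling m barely changes the variance, while doubling n roughly halves it, which matches the abstract's conclusion that sampling more wells is the more effective option when most size variation lies among primary units.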

Relevance:

100.00% 100.00%

Abstract:

Vibration methods are used to identify faults, such as spanning and loss of cover, in long offshore pipelines. A pipeline 'pig', propelled by fluid flow, generates transverse vibration in the pipeline and the measured vibration amplitude reflects the nature of the support condition. Large quantities of vibration data are collected and analyzed by Fourier and wavelet methods.
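
As a small illustration of the Fourier side of such an analysis, the sketch below finds the dominant frequency of a sampled vibration record with a direct discrete Fourier transform. It is not taken from the paper: the function name `dominant_frequency`, the sampling rate and the synthetic two-tone signal are assumptions, and a real analysis would use an FFT library on far longer records.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest non-DC DFT component."""
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Direct DFT coefficient for frequency bin k (O(n) per bin).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n

# Synthetic record: a strong 12 Hz component plus a weaker 35 Hz component.
sample_rate = 200.0
signal = [math.sin(2 * math.pi * 12.0 * i / sample_rate)
          + 0.3 * math.sin(2 * math.pi * 35.0 * i / sample_rate)
          for i in range(200)]
print(dominant_frequency(signal, sample_rate))
```

A Fourier spectrum like this summarizes the whole record at once; wavelet methods, also mentioned in the abstract, would additionally localize where along the pipeline run a change in vibration occurs.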

Relevance:

100.00% 100.00%

Abstract:

In this article, we offer a new way of exploring relationships between three different dimensions of a business operation, namely the stage of business development, the methods of creativity and the major cultural values. Although each of these has separately gained enormous attention from the management research community, as evidenced by a large volume of research studies, few studies have attempted to describe the logic that connects these three important aspects of a business, let alone provided empirical evidence supporting any significant relationships among these variables. The paper also provides a data set and an empirical investigation of that data set, using categorical data analysis, to conclude that examining these possible relationships is meaningful and possible even for seemingly unquantifiable information. The results also show that the most significant category among all creativity methods employed in Vietnamese enterprises is the “creative disciplines” rule in the “entrepreneurial phase,” while in general creative disciplines have played a critical role in explaining the structure of our data sample, for both stages of development under consideration.

Relevance:

100.00% 100.00%

Abstract:

BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.
CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevance:

100.00% 100.00%

Abstract:

This research aims to use the multivariate geochemical dataset, generated by the Tellus project, to investigate the appropriate use of transformation methods to maintain the integrity of geochemical data and inherent constrained behaviour in multivariate relationships. The widely used normal score transform is compared with the use of a stepwise conditional transform technique. The Tellus Project, managed by GSNI and funded by the Department of Enterprise Trade and Development and the EU’s Building Sustainable Prosperity Fund, involves the most comprehensive geological mapping project ever undertaken in Northern Ireland. Previous study has demonstrated spatial variability in the Tellus data but geostatistical analysis and interpretation of the datasets requires use of an appropriate methodology that reproduces the inherently complex multivariate relations. Previous investigation of the Tellus geochemical data has included use of Gaussian-based techniques. However, earth science variables are rarely Gaussian, hence transformation of data is integral to the approach. The multivariate geochemical dataset generated by the Tellus project provides an opportunity to investigate the appropriate use of transformation methods, as required for Gaussian-based geostatistical analysis. In particular, the stepwise conditional transform is investigated and developed for the geochemical datasets obtained as part of the Tellus project. The transform is applied to four variables in a bivariate nested fashion due to the limited availability of data. Simulation of these transformed variables is then carried out, along with a corresponding back transformation to original units. Results show that the stepwise transform is successful in reproducing both univariate statistics and the complex bivariate relations exhibited by the data. 
Greater fidelity to multivariate relationships will improve uncertainty models, which are required for consequent geological, environmental and economic inferences.
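
The two transforms compared above can be sketched as follows. The normal score transform maps each value to a standard-normal quantile through its rank; the stepwise conditional variant transforms a second variable separately within classes of the first. This is a rough illustration, not the study's implementation: the function names, the plotting position `(rank + 0.5) / n`, the tie handling and the equal-count classes are all assumptions.

```python
import math
from statistics import NormalDist

def normal_score_transform(values):
    """Map values to standard-normal scores via their ranks (ties not averaged)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores

def stepwise_conditional_transform(primary, secondary, n_classes=2):
    """Transform primary directly, then secondary within classes of primary."""
    n = len(primary)
    y1 = normal_score_transform(primary)
    order = sorted(range(n), key=lambda i: y1[i])
    size = math.ceil(n / n_classes)
    y2 = [0.0] * n
    for start in range(0, n, size):
        members = order[start:start + size]
        local = normal_score_transform([secondary[i] for i in members])
        for i, score in zip(members, local):
            y2[i] = score
    return y1, y2

# Hypothetical, strongly skewed concentrations for two elements.
primary = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
secondary = [p ** 2 for p in primary]
print(stepwise_conditional_transform(primary, secondary))
```

Conditioning the second transform on the first removes the dependence between the transformed variables within each class, which is the property that lets them be simulated independently before back-transformation; with few data per class, as the abstract notes, the number of classes has to stay small.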

Relevance:

100.00% 100.00%

Abstract:

Quantile normalization (QN) is a technique for microarray data processing and is the default normalization method in the Robust Multi-array Average (RMA) procedure, which was primarily designed for analysing gene expression data from Affymetrix arrays. Given the abundance of Affymetrix microarrays and the popularity of the RMA method, it is crucially important that the normalization procedure is applied appropriately. In this study we carried out simulation experiments and also analysed real microarray data to investigate the suitability of RMA when it is applied to datasets with different groups of biological samples. From our experiments, we showed that RMA with QN does not preserve the biological signal included in each group, but rather mixes the signals between the groups. We also showed that the Median Polish method in the summarization step of RMA has a similar mixing effect. RMA is one of the most widely used methods in microarray data processing and has been applied to a vast volume of data in biomedical research. The problematic behaviour of this method suggests that previous studies employing RMA could have been misadvised or adversely affected. Therefore we think it is crucially important that the research community recognizes the issue and starts to address it. The two core elements of the RMA method, quantile normalization and Median Polish, both have the undesirable effect of mixing biological signals between different sample groups, which can be detrimental to drawing valid biological conclusions and to any subsequent analyses. Based on the evidence presented here and that in the literature, we recommend exercising caution when using RMA as a method of processing microarray gene expression data, particularly in situations where there are likely to be unknown subgroups of samples.
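
For readers unfamiliar with the procedure, quantile normalization can be sketched in a few lines: every array is forced onto the same reference distribution, taken here as the mean of the sorted values across arrays. This is a minimal sketch and not the RMA implementation: the function name, the absence of tie handling and the toy intensities are assumptions.

```python
def quantile_normalize(columns):
    """Force every column (array) to share the same empirical distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest values across columns.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by its rank's reference
        result.append(out)
    return result

# Two hypothetical arrays from different groups, one with 10x higher intensities.
array_a = [4.0, 2.0, 6.0]
array_b = [60.0, 20.0, 40.0]
print(quantile_normalize([array_a, array_b]))
```

In this toy case both arrays end up with exactly the same set of values, so a genuine overall difference between the two groups would no longer be visible after normalization, which is one simple way to picture the signal mixing the abstract warns about.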

Relevance:

100.00% 100.00%

Abstract:

Statistics are regularly used to make some form of comparison between trace evidence or deploy the exclusionary principle (Morgan and Bull, 2007) in forensic investigations. Trace evidence is routinely the result of particle size, chemical or modal analyses and as such constitutes compositional data. The issue is that compositional data, including percentages, parts per million etc., only carry relative information. This may be problematic where a comparison of percentages and other constrained/closed data is deemed a statistically valid and appropriate way to present trace evidence in a court of law. Notwithstanding an awareness of the existence of the constant sum problem since the seminal works of Pearson (1896) and Chayes (1960) and the introduction of the application of log-ratio techniques (Aitchison, 1986; Pawlowsky-Glahn and Egozcue, 2001; Pawlowsky-Glahn and Buccianti, 2011; Tolosana-Delgado and van den Boogaart, 2013), the problem that a constant sum destroys the potential independence of variances and covariances required for correlation and regression analysis and empirical multivariate methods (principal component analysis, cluster analysis, discriminant analysis, canonical correlation) is all too often not acknowledged in the statistical treatment of trace evidence. Yet the need for a robust treatment of forensic trace evidence analyses is obvious. This research examines the issues and potential pitfalls for forensic investigators if the constant sum constraint is ignored in the analysis and presentation of forensic trace evidence. Forensic case studies involving particle size and mineral analyses as trace evidence are used to demonstrate the use of a compositional data approach using a centred log-ratio (clr) transformation and multivariate statistical analyses.
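
The centred log-ratio transformation named above is short enough to show directly: each part is log-transformed and the mean of the logs (the log of the geometric mean) is subtracted. The sketch below assumes strictly positive parts; the function name `clr` and the example mineral percentages are hypothetical and not taken from the case studies.

```python
import math

def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# The same hypothetical mineral composition as percentages and as proportions.
as_percent = [10.0, 30.0, 60.0]
as_fraction = [0.1, 0.3, 0.6]
print(clr(as_percent))
print(clr(as_fraction))
```

Both calls print the same coordinates up to rounding, because the transform depends only on ratios between parts; this is why it sidesteps the constant sum constraint. Zero parts need separate treatment before the logarithm can be taken.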