906 results for Exploratory statistical data analysis
Abstract:
In survival analysis, frailty is often used to model heterogeneity between individuals or correlation within clusters. Typically, frailty is taken to be a continuous random effect, yielding a continuous mixture distribution for survival times. A Bayesian analysis of a correlated frailty model is discussed in the context of inverse Gaussian frailty. An MCMC approach is adopted, and the deviance information criterion is used to compare models. As an illustration of the approach, a bivariate data set of corneal graft survival times is analysed. (C) 2006 Elsevier B.V. All rights reserved.
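The model-comparison step can be illustrated with a short sketch. The deviance information criterion (DIC) combines the posterior mean deviance with an effective-parameter penalty; the function below is a generic illustration rather than the paper's implementation, and the deviance draws are hypothetical.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = Dbar + pD, where Dbar is the posterior mean deviance and
    pD = Dbar - D(theta_bar) is the effective number of parameters."""
    d_bar = np.mean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d

# Hypothetical deviance draws from the MCMC output of one candidate model;
# between competing models, the one with the smaller DIC is preferred.
print(dic(np.array([210.0, 212.5, 208.0, 211.0]), 207.0))  # → 213.75
```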
Abstract:
Bayesian statistics allow scientists to easily incorporate prior knowledge into their data analysis. Nonetheless, the sheer amount of computational power that is required for Bayesian statistical analyses has previously limited their use in genetics. These computational constraints have now largely been overcome and the underlying advantages of Bayesian approaches are putting them at the forefront of genetic data analysis in an increasing number of areas.
Abstract:
A wireless sensor network (WSN) is a group of sensors linked by a wireless medium to perform distributed sensing tasks. WSNs have attracted wide interest from academia and industry alike due to their diversity of applications, including home automation, smart environments, and emergency services in various buildings. The primary goal of a WSN is to collect data sensed by its sensors. These data are characteristically heavily noisy and exhibit temporal and spatial correlation. In order to extract useful information from such data, as this paper demonstrates, various techniques must be utilised to analyse them. Data mining is a process in which a wide spectrum of data analysis methods is used. It is applied in this paper to analyse data collected from WSNs monitoring an indoor environment in a building. A case study demonstrates how data mining can be used to optimise the use of office space in a building.
Abstract:
Event-related functional magnetic resonance imaging (efMRI) has emerged as a powerful technique for detecting the brain's responses to presented stimuli. A primary goal in efMRI data analysis is to estimate the Hemodynamic Response Function (HRF) and to locate activated regions in human brains when specific tasks are performed. This paper develops new methodologies that are important improvements not only to parametric but also to nonparametric estimation and hypothesis testing of the HRF. First, an effective and computationally fast scheme for estimating the error covariance matrix for efMRI is proposed. Second, methodologies for estimation and hypothesis testing of the HRF are developed. Simulations support the effectiveness of our proposed methods. When applied to an efMRI dataset from an emotional control study, our method reveals more meaningful findings than the popular methods offered by AFNI and FSL. (C) 2008 Elsevier B.V. All rights reserved.
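The nonparametric side of HRF estimation can be sketched as finite-impulse-response (FIR) deconvolution, a standard textbook formulation rather than the specific methodology of this paper; the stimulus onsets and the "true" HRF below are simulated.

```python
import numpy as np

def fir_design(onsets, n_scans, n_lags):
    """Design matrix of lagged stimulus indicators: column k is the
    stimulus train shifted by k scans, so the least-squares coefficients
    trace out the HRF at lags 0..n_lags-1."""
    stim = np.zeros(n_scans)
    stim[onsets] = 1.0
    X = np.column_stack([np.roll(stim, k) for k in range(n_lags)])
    for k in range(n_lags):
        X[:k, k] = 0.0      # zero the wrap-around introduced by np.roll
    return X

rng = np.random.default_rng(0)
true_hrf = np.array([0.0, 0.4, 1.0, 0.7, 0.2, -0.1])   # invented shape
onsets = np.arange(10, 190, 20)                        # simulated events
X = fir_design(onsets, 200, len(true_hrf))
y = X @ true_hrf + 0.05 * rng.normal(size=200)         # noisy BOLD signal

hrf_hat, *_ = np.linalg.lstsq(X, y, rcond=None)        # recovered HRF
```

With well-spaced events the design columns are nearly orthogonal and the least-squares estimate tracks the true response closely.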
Abstract:
The ability to display and inspect powder diffraction data quickly and efficiently is a central part of the data analysis process. Whilst many computer programs are capable of displaying powder data, their focus is typically on advanced operations such as structure solution or Rietveld refinement. This article describes a lightweight software package, Jpowder, whose focus is fast and convenient visualization and comparison of powder data sets in a variety of formats from computers with network access. Jpowder is written in Java and uses its associated Web Start technology to allow ‘single-click deployment’ from a web page, http://www.jpowder.org. Jpowder is open source, free and available for use by anyone.
Abstract:
There is under-representation of senior female managers within small construction firms in the United Kingdom. The position is denying the sector a valuable pool of labour to address acute knowledge and skill shortages. Grounded theory on the career progression of senior female managers in these firms is developed from biographical interviews. First, a turning point model which distinguishes the interplay between human agency and work/home structure is given. Second, four career development phases are identified. The career journeys are characterized by ad hoc decisions and opportunities which were not influenced by external policies aimed at improving the representation of women in construction. Third, the 'hidden', but potentially significant, contribution of women-owned small construction firms is noted. The key challenge for policy and practice is to balance these external approaches with recognition of the 'inside out' reality of the 'lived experiences' of female managers. To progress this agenda there is a need for: appropriate longitudinal statistical data to quantify the number of senior female managers and owners of small construction firms over time; and social construction and gendered organizational analysis research to develop a general discourse on gender difference within these firms.
Abstract:
Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
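The first two of these approaches can be sketched on synthetic data (the third, template matching, follows the same pattern using np.correlate with a Gaussian kernel). The spectrum below is simulated, not the MALDI-TOF data set analysed in the paper, and the thresholds are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks, find_peaks_cwt

rng = np.random.default_rng(0)
mz = np.linspace(1000, 1100, 2000)
# Simulated spectrum: two Gaussian peaks over baseline noise
spectrum = (np.exp(-0.5 * ((mz - 1030) / 0.5) ** 2)
            + 0.6 * np.exp(-0.5 * ((mz - 1070) / 0.5) ** 2)
            + 0.05 * rng.normal(size=mz.size))

# Approach 1: signal-to-noise ratio with a robust (MAD-based) noise estimate
noise = np.median(np.abs(spectrum)) / 0.6745
snr_peaks, _ = find_peaks(spectrum, height=3 * noise)

# Approach 2: continuous wavelet transform over a range of candidate widths
cwt_peaks = find_peaks_cwt(spectrum, widths=np.arange(5, 31))

print(mz[snr_peaks])   # includes peaks near m/z 1030 and 1070
```

The SNR approach is fast but sensitive to the noise estimate; the CWT approach trades speed for robustness across peak widths, which is exactly the trade-off the perturbation analysis probes.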
Abstract:
The organization of non-crystalline polymeric materials at a local level, namely on a spatial scale between a few and 100 Å, is still unclear in many respects. The determination of the local structure in terms of the configuration and conformation of the polymer chain and of the packing characteristics of the chain in the bulk material represents a challenging problem. Data from wide-angle diffraction experiments are very difficult to interpret due to the very large amount of information that they carry, that is, the large number of correlations present in the diffraction patterns. We describe new approaches that permit a detailed analysis of the complex neutron diffraction patterns characterizing polymer melts and glasses. The coupling of different computer modelling strategies with neutron scattering data over a wide Q range allows the extraction of detailed quantitative information on the structural arrangements of the materials of interest. Proceeding from modelling routes as diverse as force field calculations, single-chain modelling and reverse Monte Carlo, we show the successes and pitfalls of each approach in describing model systems, which illustrate the need to attack the data analysis problem simultaneously from several fronts.
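Of the modelling routes mentioned, reverse Monte Carlo is the most compact to sketch: propose random moves of a configuration and accept them whenever they improve the fit to a reference curve, or only mildly worsen it. The toy 1-D system and target histogram below are invented for illustration; real RMC fits 3-D configurations against measured structure factors.

```python
import numpy as np

rng = np.random.default_rng(2)

# A reference histogram standing in for a measured pair-correlation curve
target = np.array([0.0, 0.2, 0.8, 1.0, 0.6, 0.3])

def pair_histogram(positions, bins=6, box=10.0):
    """Normalised pair-distance histogram of a 1-D toy configuration."""
    d = np.abs(positions[:, None] - positions[None, :])
    d = d[np.triu_indices(len(positions), k=1)]
    h, _ = np.histogram(d, bins=bins, range=(0.0, box))
    return h / max(h.max(), 1)

positions = rng.uniform(0.0, 10.0, 30)
chi2 = np.sum((pair_histogram(positions) - target) ** 2)
chi2_init = chi2
best = chi2

for _ in range(2000):
    trial = positions.copy()
    i = rng.integers(len(trial))
    trial[i] = np.clip(trial[i] + rng.normal(0.0, 0.5), 0.0, 10.0)
    chi2_new = np.sum((pair_histogram(trial) - target) ** 2)
    # RMC acceptance rule: keep improvements, and occasionally keep
    # worsening moves with probability exp(-(chi2_new - chi2) / 2)
    if chi2_new < chi2 or rng.random() < np.exp(-(chi2_new - chi2) / 2.0):
        positions, chi2 = trial, chi2_new
        best = min(best, chi2)
```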
Abstract:
Adaptive methods which “equidistribute” a given positive weight function are now used fairly widely for selecting discrete meshes. The disadvantage of such schemes is that the resulting mesh may not be smoothly varying. In this paper a technique is developed for equidistributing a function subject to constraints on the ratios of adjacent steps in the mesh. Given a weight function $f \geqq 0$ on an interval $[a,b]$ and constants $c$ and $K$, the method produces a mesh with points $x_0 = a$, $x_{j+1} = x_j + h_j$, $j = 0,1,\cdots,n-1$ and $x_n = b$ such that \[ \int_{x_j}^{x_{j+1}} f \leqq c \quad\text{and}\quad \frac{1}{K} \leqq \frac{h_{j+1}}{h_j} \leqq K \quad\text{for } j = 0,1,\cdots,n-1. \] A theoretical analysis of the procedure is presented, and numerical algorithms for implementing the method are given. Examples show that the procedure is effective in practice. Other types of constraints on equidistributing meshes are also discussed. The principal application of the procedure is to the solution of boundary value problems, where the weight function is generally some error indicator, and accuracy and convergence properties may depend on the smoothness of the mesh. Other practical applications include the regrading of statistical data.
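A greedy sketch of constrained equidistribution (hypothetical code, not the paper's algorithm): each step is taken as large as possible subject to the integral bound and the upper ratio bound h_j ≤ K·h_{j-1}. The paper's full procedure also guarantees the lower bound 1/K, which requires an additional smoothing of the weight function omitted here.

```python
import numpy as np

def integral(f, lo, hi, n=64):
    # Crude composite trapezoidal quadrature
    t = np.linspace(lo, hi, n + 1)
    y = f(t)
    return (hi - lo) / n * (y[:-1] + y[1:]).sum() / 2

def constrained_mesh(f, a, b, c, K, h0):
    """Greedy mesh: each step h_j is as large as possible subject to
    integral(f, x_j, x_j + h_j) <= c and h_j <= K * h_{j-1}."""
    xs = [a]
    h_prev = h0
    while xs[-1] < b:
        x = xs[-1]
        h = min(K * h_prev, b - x)
        while integral(f, x, x + h) > c:
            h *= 0.5            # shrink until the integral constraint holds
        xs.append(x + h)
        h_prev = h
    return np.array(xs)

# Weight function concentrated near x = 1, as an error indicator might be
mesh = constrained_mesh(lambda x: 1.0 + 50.0 * x**4, 0.0, 1.0,
                        c=0.5, K=2.0, h0=0.1)
```

The resulting mesh refines smoothly toward the right endpoint, where the weight is large, without any step more than doubling its predecessor.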
Abstract:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none designed specifically to understand the mosaic nature of bacterial genomes. Consequently a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described, and we have compared a number of methods available for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, showing that a simple dynamic approach using a kernel density estimator performed better than established methods as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
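The kernel-density step can be sketched as follows: fit a KDE to the distribution of log2 signal ratios and place the presence/absence cut-off at the density minimum between the two modes. The data below are simulated (a "present" component near 0 and an "absent/divergent" component near -2), not the E. coli arrays from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Simulated log2 ratios: 800 "present" genes near 0, 200 "absent" near -2
ratios = np.concatenate([rng.normal(0.0, 0.3, 800),
                         rng.normal(-2.0, 0.4, 200)])

kde = gaussian_kde(ratios)
grid = np.linspace(ratios.min(), ratios.max(), 512)
density = kde(grid)

# Locate the two modes, then the density minimum between them
lo_mode = grid[np.argmax(density * (grid < -1))]   # "absent" mode
hi_mode = grid[np.argmax(density * (grid > -1))]   # "present" mode
between = (grid > lo_mode) & (grid < hi_mode)
cutoff = grid[between][np.argmin(density[between])]
```

Because the cut-off adapts to the observed density rather than assuming a parametric form, it remains sensible when the absent component is small or skewed, which is the advantage claimed over mixture modelling.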
Abstract:
Nitrogen flows from European watersheds to coastal marine waters

Executive summary

Nature of the problem
• Most regional watersheds in Europe constitute managed human territories importing large amounts of new reactive nitrogen.
• As a consequence, groundwater, surface freshwater and coastal seawater are undergoing severe nitrogen contamination and/or eutrophication problems.

Approaches
• A comprehensive evaluation of net anthropogenic inputs of reactive nitrogen (NANI) through atmospheric deposition, crop N fixation, fertiliser use and import of food and feed has been carried out for all European watersheds. A database on N, P and Si fluxes delivered at the basin outlets has been assembled.
• A number of modelling approaches, based on either statistical regression analysis or mechanistic description of the processes involved in nitrogen transfer and transformations, have been developed for relating N inputs to watersheds to outputs into coastal marine ecosystems.

Key findings/state of knowledge
• Throughout Europe, NANI represents 3700 kgN/km2/yr (range 0–8400 depending on the watershed), i.e. five times the background rate of natural N2 fixation.
• A mean of approximately 78% of NANI does not reach the basin outlet, but instead is stored (in soils, sediments or groundwater) or eliminated to the atmosphere as reactive N forms or as N2.
• N delivery to the European marine coastal zone totals 810 kgN/km2/yr (range 200–4000 depending on the watershed), about four times the natural background. In areas of limited availability of silica, these inputs cause harmful algal blooms.

Major uncertainties/challenges
• The exact dimension of anthropogenic N inputs to watersheds is still imperfectly known and requires continued monitoring programmes and data integration at the international level.
• The exact nature of 'retention' processes, which potentially represent a major management lever for reducing N contamination of water resources, is still poorly understood.
• Coastal marine eutrophication depends to a large degree on local morphological and hydrographic conditions as well as on estuarine processes, which are also imperfectly known.

Recommendations
• Better control and management of the nitrogen cascade at the watershed scale is required to reduce N contamination of ground- and surface water, as well as coastal eutrophication.
• In spite of the potential of these management measures, there is no choice at the European scale but to reduce the primary inputs of reactive nitrogen to watersheds, through changes in agriculture, human diet and other N flows related to human activity.
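The headline figures above are mutually consistent, as a one-line check on the quoted means shows (plain arithmetic on the reported numbers, not a model):

```python
nani = 3700          # mean net anthropogenic N input, kgN/km2/yr
retained = 0.78      # fraction stored or lost before the basin outlet
delivered = nani * (1 - retained)
print(round(delivered))  # → 814, close to the reported mean delivery of 810
```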
Abstract:
In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest, as well as active research in the area that aims to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NNs, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
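The data-parallel idea behind these approaches can be sketched in a few lines: partition the records across workers, mine each partition independently, then merge the partial results. A simple item-frequency count stands in here for a real pattern-mining kernel, and the transaction data is invented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def mine_partition(records):
    """Local 'mining' kernel: count item occurrences in one partition."""
    counts = Counter()
    for record in records:
        counts.update(record)
    return counts

def parallel_mine(records, n_workers=4):
    # Partition the data, mine each chunk concurrently, merge the results
    chunks = [records[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(mine_partition, chunks)
    merged = Counter()
    for part in partials:
        merged.update(part)
    return merged

transactions = [["milk", "bread"], ["bread", "beer"],
                ["milk", "bread"], ["beer"]]
print(parallel_mine(transactions).most_common(1))  # → [('bread', 3)]
```

A thread pool keeps the sketch self-contained; the Grid and Cloud settings discussed in the chapter distribute the same partition-mine-merge pattern across processes or machines.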
Abstract:
The purpose of this lecture is to review recent developments in data analysis, initialization and data assimilation. The development of 3-dimensional multivariate schemes has been very timely because of their suitability for handling the many different types of observations during FGGE. Great progress has taken place in the initialization of global models with the aid of the non-linear normal mode technique. However, in spite of this progress, several fundamental problems remain unsatisfactorily solved. Of particular importance are the initialization of the divergent wind fields in the Tropics and the search for proper ways to initialize weather systems driven by non-adiabatic processes. The unsatisfactory ways in which such processes are being initialized lead to excessively long spin-up times.
Abstract:
This chapter introduces the latest practices and technologies in the interactive interpretation of environmental data. With environmental data becoming ever larger, more diverse and more complex, there is a need for a new generation of tools that provides new capabilities over and above those of the standard workhorses of science. These new tools aid the scientist in discovering interesting new features (and also problems) in large datasets by allowing the data to be explored interactively using simple, intuitive graphical tools. In this way, new discoveries are made that are commonly missed by automated batch data processing. This chapter discusses the characteristics of environmental science data, common current practice in data analysis and the supporting tools and infrastructure. New approaches are introduced and illustrated from the points of view of both the end user and the underlying technology. We conclude by speculating as to future developments in the field and what must be achieved to fulfil this vision.