901 results for Data dissemination and sharing
Abstract:
There has been much research on analyzing various forms of competing risks data. Nevertheless, there are several occasions in survival studies where the existing models and methodologies are inadequate for the analysis of competing risks data. Identifiability problems and various types of censoring induce more complications in the analysis of competing risks data than in classical survival analysis. Parametric models are not adequate for the analysis of competing risks data, since the assumptions about the underlying lifetime distributions may not hold well. Motivated by this, in the present study we develop some new inference procedures, which are completely distribution free, for the analysis of competing risks data.
Abstract:
In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin.
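To make the notion of a formal concept concrete, the sketch below computes the concepts of a tiny, invented object-attribute context in Python. It only illustrates the underlying mathematics of Formal Concept Analysis, not the TOSCANA system or the database-marketing application discussed in the paper.

```python
from itertools import combinations

# Toy object-attribute context (invented for illustration): customers vs. product interests.
context = {
    "alice": {"books", "music"},
    "bob":   {"books", "travel"},
    "carol": {"music", "travel"},
    "dave":  {"books", "music", "travel"},
}
attributes = {"books", "music", "travel"}

def extent(attrs_wanted):
    """All objects that have every attribute in attrs_wanted."""
    return {o for o, attrs in context.items() if attrs_wanted <= attrs}

def intent(objects):
    """All attributes shared by every object in the set."""
    if not objects:
        return set(attributes)
    return set.intersection(*(context[o] for o in objects))

# A formal concept is a pair (A, B) with A = extent(B) and B = intent(A).
concepts = set()
for r in range(len(attributes) + 1):
    for combo in combinations(sorted(attributes), r):
        a = extent(set(combo))
        concepts.add((frozenset(a), frozenset(intent(a))))

for a, b in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(a), "<->", sorted(b))
```

Each printed pair is one node of the concept lattice that a conceptual information system would let an analyst navigate.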
Abstract:
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop, through standard methodology such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
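As a concrete illustration of the staying-in-the-simplex machinery, the following sketch applies a centred log-ratio (clr) transformation to a few invented 4-part compositions and takes a singular value decomposition of the centred result. The data values are made up, and this is only the basic building block of a compositional SVD, not the full lattice-testing methodology of the paper.

```python
import numpy as np

# Invented 4-part compositions (rows sum to 1) standing in for major-oxide data.
X = np.array([
    [0.55, 0.25, 0.15, 0.05],
    [0.50, 0.30, 0.12, 0.08],
    [0.40, 0.35, 0.15, 0.10],
    [0.60, 0.20, 0.14, 0.06],
])

# Centred log-ratio (clr): log of each part relative to the geometric mean of its row.
logX = np.log(X)
clr = logX - logX.mean(axis=1, keepdims=True)

# Centre by the clr of the compositional mean, then take the SVD.
Z = clr - clr.mean(axis=0, keepdims=True)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Proportion of compositional (Aitchison) variability captured by each component.
explained = s**2 / np.sum(s**2)
print("singular values:", np.round(s, 4))
print("proportion of variability:", np.round(explained, 3))
```

Because clr rows sum to zero, the last singular value is essentially zero; the remaining components summarise the compositional variability.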
Abstract:
In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. Indeed, they gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex S^D jointly with the normal distribution on R^(D-1). However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory and combining the ilr transformation with the use of the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
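The sketch below illustrates the transform-then-smooth idea being compared: compositions are mapped to R^(D-1) with an ilr transformation and a Gaussian kernel with a full bandwidth matrix is applied there. The ilr basis, the bandwidth choice and the Dirichlet test data are assumptions made for illustration; this is not the authors' estimator, and the density is evaluated in ilr coordinates without the Jacobian term needed to express it back on the simplex.

```python
import numpy as np

def ilr(X):
    """Isometric log-ratio transform of compositions in S^D to R^(D-1),
    using one common choice of orthonormal basis of the clr hyperplane."""
    D = X.shape[1]
    logX = np.log(X)
    clr = logX - logX.mean(axis=1, keepdims=True)
    V = np.zeros((D, D - 1))
    for i in range(D - 1):
        V[: i + 1, i] = 1.0 / (i + 1)
        V[i + 1, i] = -1.0
        V[:, i] *= np.sqrt((i + 1) / (i + 2))
    return clr @ V

def kde_ilr(X, x_new, H):
    """Gaussian kernel density estimate in ilr coordinates with full bandwidth matrix H."""
    Y = ilr(X)
    y = ilr(x_new)
    d = Y.shape[1]
    Hinv = np.linalg.inv(H)
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(H)))
    diffs = y[:, None, :] - Y[None, :, :]                # (m, n, d)
    q = np.einsum("mnd,de,mne->mn", diffs, Hinv, diffs)  # squared Mahalanobis distances
    return norm * np.exp(-0.5 * q).mean(axis=1)

# Invented 3-part compositional sample and a single evaluation point.
rng = np.random.default_rng(0)
raw = rng.dirichlet([4, 2, 3], size=200)
H = 0.05 * np.eye(2)                                     # crude bandwidth choice
print(kde_ilr(raw, np.array([[0.4, 0.25, 0.35]]), H))
```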
Abstract:
This is a research discussion about the Hampshire Hub - see http://protohub.net/. The aim is to find out more about the project and to discuss future collaboration and sharing of ideas. Mark Braggins (Hampshire Hub Partnership) will introduce the Hampshire Hub programme, setting out its main objectives, work done to date, next steps including the Hampshire data store (which will use the PublishMyData linked data platform), and opportunities for the University of Southampton to engage with the programme, including the forthcoming Hampshire Hackathons. Bill Roberts (Swirrl) will give an overview of the PublishMyData platform and how it will help deliver the objectives of the Hampshire Hub. He will detail some of the new functionality being added to the platform. Steve Peters (DCLG Open Data Communities) will focus on developing a web of data that blends and combines local and national data sources around localities and common topics/themes. This will include observations on the potential of employing emerging new big data sources to help deliver more effective, better targeted public services. Steve will illustrate this with practical examples of DCLG's work to publish its own data in a SPARQL endpoint, so that it can be used over the web alongside related third-party sources. He will share examples of some of the practical challenges, particularly around querying and re-using geographic Linked Data in a federated world of SPARQL endpoints.
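For readers unfamiliar with SPARQL endpoints, the snippet below shows a generic query issued over HTTP with Python's requests library. The endpoint URL is a placeholder, not the address of the Hampshire data store or the DCLG service; the query simply lists the named graphs an endpoint exposes.

```python
import requests

# Placeholder endpoint: substitute the SPARQL endpoint you are actually targeting.
ENDPOINT = "https://example.org/sparql"

# A generic query listing the largest named graphs exposed by the endpoint.
QUERY = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()
for binding in response.json()["results"]["bindings"]:
    print(binding["g"]["value"], binding["triples"]["value"])
```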
Abstract:
This seminar is a research discussion around a very interesting problem, which may be a good basis for a WAISfest theme. A little over a year ago Professor Alan Dix came to tell us of his plans for a magnificent adventure: to walk all of the way round Wales - 1000 miles - 'Alan Walks Wales'. The walk was a personal journey, but also a technological and community one, exploring the needs of the walker and the people along the way. Whilst walking he recorded his thoughts in an audio diary, took lots of photos, wrote a blog and collected data from the tech instruments he was wearing. As a result Alan has extensive quantitative data (bio-sensing and location) and qualitative data (text, images and some audio). There are challenges in analysing individual kinds of data, including merging similar data streams, entity identification, time-series and textual data mining, dealing with provenance, and ontologies for paths and journeys. There are also challenges for author and third-party annotation, linking the data sets and visualising the merged narrative or facets of it.
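As one small example of the data-merging challenge, the sketch below aligns a GPS track with a heart-rate log by timestamp using pandas. The file names and column names are assumptions, not the actual structure of the 'Alan Walks Wales' data.

```python
import pandas as pd

# Assumed file layouts: a GPS track and a heart-rate log, both timestamped.
gps = pd.read_csv("gps_track.csv", parse_dates=["timestamp"])   # timestamp, lat, lon
hr = pd.read_csv("heart_rate.csv", parse_dates=["timestamp"])   # timestamp, bpm

gps = gps.sort_values("timestamp")
hr = hr.sort_values("timestamp")

# Attach the nearest heart-rate reading (within 30 s) to each GPS fix.
merged = pd.merge_asof(
    gps, hr, on="timestamp",
    direction="nearest", tolerance=pd.Timedelta("30s"),
)

# Resample to 5-minute bins for a first look at effort along the route.
summary = (
    merged.set_index("timestamp")
          .resample("5min")
          .agg({"lat": "last", "lon": "last", "bpm": "mean"})
)
print(summary.head())
```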
Abstract:
In this session we'll explore how Microsoft uses data science and machine learning across its entire business, from Windows and Office to Skype and Xbox. We'll look at how companies across the world use Microsoft technology to empower their businesses in many different industries. And we'll look at data science technologies you can use yourselves, such as Azure Machine Learning and Power BI. Finally, we'll discuss job opportunities for data scientists and tips on how you can be successful!
Abstract:
The amateur birding community has a long and proud tradition of contributing to bird surveys and bird atlases. Coordinated activities such as Breeding Bird Atlases and the Christmas Bird Count are examples of "citizen science" projects. With the advent of technology, Web 2.0 sites such as eBird have been developed to facilitate online sharing of data and thus increase the potential for real-time monitoring. However, as recently articulated in an editorial in this journal and elsewhere, monitoring is best served when based on a priori hypotheses. Harnessing citizen scientists to collect data following a hypothetico-deductive approach carries challenges. Moreover, the use of citizen science in scientific and monitoring studies has raised issues of data accuracy and quality. These issues are compounded when data collection moves into the Web 2.0 world. An examination of the literature from social geography on the concept of "citizen sensors" and volunteered geographic information (VGI) yields thoughtful reflections on the challenges of data quality/data accuracy when applying information from citizen sensors to research and management questions. VGI has been harnessed in a number of contexts, including for environmental and ecological monitoring activities. Here, I argue that conceptualizing a monitoring project as an experiment following the scientific method can further contribute to the use of VGI. I show how principles of experimental design can be applied to monitoring projects to better control for data quality of VGI. This includes suggestions for how citizen sensors can be harnessed to address issues of experimental controls and how to design monitoring projects to increase randomization and replication of sampled data, hence increasing scientific reliability and statistical power.
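A minimal sketch of what randomization and replication might look like when assigning citizen-science survey sites is given below. The habitat strata, site codes and observer pool are invented; the point is only that sites are drawn at random within strata and that each chosen site is visited by more than one independent observer.

```python
import random

random.seed(42)

# Invented sampling frame: candidate sites stratified by habitat type.
sites = {
    "forest":  [f"F{i:02d}" for i in range(1, 13)],
    "wetland": [f"W{i:02d}" for i in range(1, 13)],
    "urban":   [f"U{i:02d}" for i in range(1, 13)],
}
observers = [f"observer_{i}" for i in range(1, 10)]
REPLICATES_PER_STRATUM = 4   # sites drawn at random from each habitat stratum
VISITS_PER_SITE = 2          # independent observers per site (replication)

assignments = []
for habitat, candidates in sites.items():
    chosen = random.sample(candidates, REPLICATES_PER_STRATUM)   # randomization
    for site in chosen:
        team = random.sample(observers, VISITS_PER_SITE)         # replicated visits
        for obs in team:
            assignments.append((habitat, site, obs))

for row in assignments:
    print(row)
```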
Abstract:
Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.
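The sketch below shows only the unconstrained multivariate step of such a classification: the ancillary variables (yield, elevation, ECa) are standardized and clustered with k-means. The file layout and column order are assumptions, and the spatial constraint imposed by the multivariate variogram in the paper is not implemented here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumed ancillary-data table: one row per grid cell within the field.
# Columns: x, y, yield_y1, yield_y2, yield_y3, elevation, eca (names are assumptions).
data = np.loadtxt("ancillary.csv", delimiter=",", skiprows=1)
coords, features = data[:, :2], data[:, 2:]

# Standardize so yield, elevation and ECa contribute on comparable scales.
Z = StandardScaler().fit_transform(features)

# Plain k-means gives the multivariate classes; the published method additionally
# constrains class membership spatially using the multivariate variogram.
classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

for (x, y), c in zip(coords[:5], classes[:5]):
    print(f"cell ({x:.1f}, {y:.1f}) -> class {c}")
```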
Abstract:
A regional overview of the water quality and ecology of the River Lee catchment is presented. Specifically, data describing the chemical, microbiological and macrobiological water quality and fisheries communities have been analysed, based on a division into river, sewage treatment works, fish-farm, lake and industrial samples. Nutrient enrichment and the highest concentrations of metals and micro-organics were found in the urbanised, lower reaches of the Lee and in the Lee Navigation. Average annual concentrations of metals were generally within environmental quality standards although, on many occasions, concentrations of cadmium, copper, lead, mercury and zinc were in excess of the standards. Various organic substances (used as herbicides, fungicides, insecticides, chlorination by-products and industrial solvents) were widely detected in the Lee system. Concentrations of ten micro-organic substances were observed in excess of their environmental quality standards, though not in terms of annual averages. Sewage treatment works were the principal point-source input of nutrients, metals and micro-organic determinands to the catchment. Diffuse nitrogen sources contributed approximately 60% and 27% of the in-stream load in the upper and lower Lee respectively, whereas approximately 60% and 20% of the in-stream phosphorus load was derived from diffuse sources in the upper and lower Lee. For metals, the most significant source was urban runoff from North London. In reaches less affected by effluent discharges, diffuse runoff from urban and agricultural areas dominated trends. High microbiological content, observed in the River Lee particularly in urbanised reaches, was far in excess of the EC Bathing Water Directive standards. Water quality issues and degraded habitat in the lower reaches of the Lee have led to impoverished aquatic fauna but, within the mid-catchment reaches and upper agricultural tributaries, less nutrient enrichment and channel alteration has permitted more diverse aquatic fauna.
Abstract:
The ability to display and inspect powder diffraction data quickly and efficiently is a central part of the data analysis process. Whilst many computer programs are capable of displaying powder data, their focus is typically on advanced operations such as structure solution or Rietveld refinement. This article describes a lightweight software package, Jpowder, whose focus is fast and convenient visualization and comparison of powder data sets in a variety of formats from computers with network access. Jpowder is written in Java and uses its associated Web Start technology to allow ‘single-click deployment’ from a web page, http://www.jpowder.org. Jpowder is open source, free and available for use by anyone.
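Jpowder itself is a Java Web Start application; purely to illustrate the kind of overlay comparison it offers, the sketch below reads two assumed two-column (2-theta, intensity) files and plots them on a common, normalised scale with matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed format: plain two-column text files of 2-theta (degrees) and intensity.
patterns = {"sample_A.xy": "tab:blue", "sample_B.xy": "tab:orange"}

fig, ax = plt.subplots(figsize=(8, 4))
for path, colour in patterns.items():
    two_theta, intensity = np.loadtxt(path, unpack=True)
    # Normalise to the strongest peak so patterns can be compared directly.
    ax.plot(two_theta, intensity / intensity.max(), color=colour, label=path)

ax.set_xlabel(r"2$\theta$ (degrees)")
ax.set_ylabel("relative intensity")
ax.legend()
plt.tight_layout()
plt.show()
```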
Abstract:
Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. In this contribution, the framework for Skewness Balancing is built around a prediction model with which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrain. Accuracy assessment of the model is carried out using cross-validation, with an overall accuracy of 95%. An extension to the algorithm is developed to address the over-classification issue for hilly terrain. For moderate terrain, the results show that in the classified tiles detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards), which makes Skewness Balancing well suited to integration into geographic information system (GIS) software packages.
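A minimal sketch of the core skewness-balancing idea, as it is commonly described, is given below: elevations are removed from the top of the distribution until the skewness of the remaining points is no longer positive, and the removed points are treated as objects. The elevation data are invented, and the tiling, the hilly/moderate prediction model and the format handling discussed in the paper are omitted.

```python
import numpy as np
from scipy.stats import skew

def skewness_balancing(z):
    """Split LIDAR elevations into ground and object points.

    Points are removed from the top of the elevation distribution until its
    skewness drops to zero or below; the removed points are labelled objects.
    """
    order = np.argsort(z)            # indices sorted by ascending elevation
    ground_idx = list(order)
    while len(ground_idx) > 3 and skew(z[ground_idx]) > 0:
        ground_idx.pop()             # remove the current highest point
    is_ground = np.zeros(len(z), dtype=bool)
    is_ground[ground_idx] = True
    return is_ground

# Invented elevations: flat ground plus a few raised "building" returns.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(100.0, 0.3, 500), rng.normal(110.0, 1.0, 60)])
ground = skewness_balancing(z)
print(f"{ground.sum()} ground points, {(~ground).sum()} object points")
```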
Abstract:
In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting this data to transform it into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behavior from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed, and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modeling, empirical data modeling, knowledge discovery, data mining, and data fusion.
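In the same linear-in-the-weights spirit as data-based fuzzy rule models, the sketch below fits a one-dimensional rule base with triangular membership functions by least squares. It is an illustrative toy on an invented regression problem, not one of the book's algorithms.

```python
import numpy as np

def triangular_memberships(x, centres):
    """Evaluate triangular membership functions (a partition of unity) at x."""
    mu = np.zeros((len(x), len(centres)))
    width = centres[1] - centres[0]          # assumes evenly spaced centres
    for j, c in enumerate(centres):
        mu[:, j] = np.clip(1.0 - np.abs(x - c) / width, 0.0, None)
    return mu

# Invented one-dimensional example: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)

centres = np.linspace(0.0, 2 * np.pi, 9)     # rule centres across the input range
Phi = triangular_memberships(x, centres)

# Linear-in-the-weights structure: rule confidences found by least squares.
weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)

x_test = np.linspace(0.0, 2 * np.pi, 5)
y_hat = triangular_memberships(x_test, centres) @ weights
print(np.round(y_hat, 3), np.round(np.sin(x_test), 3))
```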
Abstract:
It is generally assumed that the variability of neuronal morphology has an important effect on both the connectivity and the activity of the nervous system, but this effect has not been thoroughly investigated. Neuroanatomical archives represent a crucial tool to explore structure–function relationships in the brain. We are developing computational tools to describe, generate, store and render large sets of three-dimensional neuronal structures in a format that is compact, quantitative, accurate and readily accessible to the neuroscientist. Single-cell neuroanatomy can be characterized quantitatively at several levels. In computer-aided neuronal tracing files, a dendritic tree is described as a series of cylinders, each represented by diameter, spatial coordinates and the connectivity to other cylinders in the tree. This ‘Cartesian’ description constitutes a completely accurate mapping of dendritic morphology but it bears little intuitive information for the neuroscientist. In contrast, a classical neuroanatomical analysis characterizes neuronal dendrites on the basis of the statistical distributions of morphological parameters, e.g. maximum branching order or bifurcation asymmetry. This description is intuitively more accessible, but it only yields information on the collective anatomy of a group of dendrites, i.e. it is not complete enough to provide a precise ‘blueprint’ of the original data. We are adopting a third, intermediate level of description, which consists of the algorithmic generation of neuronal structures within a certain morphological class based on a set of ‘fundamental’, measured parameters. This description is as intuitive as a classical neuroanatomical analysis (parameters have an intuitive interpretation), and as complete as a Cartesian file (the algorithms generate and display complete neurons). The advantages of the algorithmic description of neuronal structure are immense. If an algorithm can measure the values of a handful of parameters from an experimental database and generate virtual neurons whose anatomy is statistically indistinguishable from that of their real counterparts, a great deal of data compression and amplification can be achieved. Data compression results from the quantitative and complete description of thousands of neurons with a handful of statistical distributions of parameters. Data amplification is possible because, from a set of experimental neurons, many more virtual analogues can be generated. This approach could allow one, in principle, to create and store a neuroanatomical database containing data for an entire human brain in a personal computer. We are using two programs, L-NEURON and ARBORVITAE, to investigate systematically the potential of several different algorithms for the generation of virtual neurons. Using these programs, we have generated anatomically plausible virtual neurons for several morphological classes, including guinea pig cerebellar Purkinje cells and cat spinal cord motor neurons. These virtual neurons are stored in an online electronic archive of dendritic morphology. This process highlights the potential and the limitations of the ‘computational neuroanatomy’ strategy for neuroscience databases.
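To illustrate the algorithmic level of description, the sketch below grows a virtual dendritic tree as a list of connected cylinders by sampling a few parameter distributions (branch length, diameter taper, bifurcation probability). All parameter values are invented, and this is not L-NEURON or ARBORVITAE.

```python
import random
from dataclasses import dataclass

@dataclass
class Cylinder:
    ident: int
    parent: int       # -1 for the root segment
    length: float     # micrometres
    diameter: float   # micrometres

def grow_dendrite(max_order=5, seed=0):
    """Generate a virtual dendritic tree from simple sampled parameters."""
    rng = random.Random(seed)
    tree = [Cylinder(0, -1, rng.gauss(20.0, 4.0), 2.0)]
    frontier = [(0, 2.0, 0)]                           # (parent id, diameter, branch order)
    while frontier:
        parent, diam, order = frontier.pop()
        if order >= max_order or rng.random() > 0.7:   # bifurcation probability 0.7
            continue
        for _ in range(2):                             # binary branching
            child_diam = diam * rng.uniform(0.6, 0.9)  # sampled taper
            length = max(1.0, rng.gauss(15.0, 5.0))    # sampled branch length
            ident = len(tree)
            tree.append(Cylinder(ident, parent, length, child_diam))
            frontier.append((ident, child_diam, order + 1))
    return tree

for cyl in grow_dendrite()[:8]:
    print(cyl)
```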
Abstract:
The benefits and applications of virtual reality (VR) in the construction industry have been investigated for almost a decade. However, the practical implementation of VR in the construction industry has yet to reach maturity owing to technical constraints. The need for effective information management presents challenges: both the transfer of building data to, and the organisation of building information within, the virtual environment require consideration. This paper reviews the applications and benefits of VR in the built environment field and reports on a collaboration between Loughborough University and South Bank University to overcome constraints on the use of the overall VR model for whole-lifecycle visualisation. The work at each research centre is concerned with an aspect of information management within VR applications for the built environment, and both data transfer and internal data organisation have been investigated. In this paper, similarities and differences between computer-aided design (CAD) and VR packages are first discussed. Three different approaches to the creation of VR models during the design stage are identified and described, with a view to providing shared understanding across the interdisciplinary groups involved. The suitable organisation of building information within the virtual environment is then further investigated. This work focused on the visualisation of the degradation of a building through its lifespan, with a view to providing a visual aid for developing an effective and economic project maintenance programme. Finally, consideration is given to the potential of emerging standards to facilitate an integrated use of VR. The convergence towards similar data structures in VR and other construction packages may enable visualisation to be better utilised in the overall lifecycle model.