939 results for Scientific Data


Relevance:

30.00%

Publisher:

Abstract:

The report provides recommendations to policy makers in science and scholarly research regarding IPR policy to increase the impact of research and make the outcomes more available. The report argues that the impact of publicly-funded research outputs can be increased through a fairer balance between private and public interest in copyright legislation. This will allow for wider access to and easier re-use of published research reports. The common practice of authors being required to assign all rights to a publisher restricts the impact of research outputs and should be replaced by wider use of a non-exclusive licence. Full access and re-use rights to research data should be encouraged through use of a research-friendly licence.

Relevance:

30.00%

Publisher:

Abstract:

The possibilities of digital research have altered the production, publication and use of research results. Academic research practice and culture are changing or have already been transformed, but to a large degree the system of academic recognition has not yet adapted to the practices and possibilities of digital research. This applies especially to research data, which are increasingly produced, managed, published and archived, but as yet hardly play a role in practices of research assessment. The aim of the workshop was to bring together experts and stakeholders from research institutions, universities, scholarly societies and funding agencies in order to review, discuss and build on possibilities to implement the culture of sharing and to integrate publication of data into research assessment procedures. The report 'The Value of Research Data - Metrics for datasets from a cultural and technical point of view' was presented and discussed. Some of the key findings were that data sharing should be considered normal research practice; in fact, not sharing should be considered malpractice. Research funders and universities should support and encourage data sharing. There are a number of important aspects to consider when making data count in research and evaluation procedures. Metrics are a necessary tool in monitoring the sharing of data sets. However, data metrics are at present not very well developed and there is not yet enough experience in what these metrics actually mean. It is important to implement the culture of sharing through codes of conduct in the scientific communities. For further key findings, please read the report.

Relevance:

30.00%

Publisher:

Abstract:

The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.

It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new and extremely general optimization algorithm, called Relaxation Expectation Maximization (REM), is proposed that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques, the quality of fits may be further improved while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
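REM itself is the dissertation's contribution and is not reproduced here; as a point of reference, the sketch below shows the standard EM loop for a one-dimensional Gaussian mixture, the kind of latent variable model such algorithms fit. Function and variable names are illustrative only.

```python
import numpy as np

def em_gaussian_mixture(x, k, n_iter=100, seed=0):
    """Standard EM for a 1-D Gaussian mixture; illustrative only, not REM."""
    rng = np.random.default_rng(seed)
    n = x.size
    # Random initialization of mixing weights, means, and variances.
    pi = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_p = (-0.5 * ((x[:, None] - mu) ** 2) / var
                 - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```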

The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model leads to new principled algorithms for smoothing and clustering of spike data.
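For orientation, the baseline recipe that probabilistic spike sorting improves on can be sketched as follows: reduce each detected spike waveform to a few principal components and cluster the features with a Gaussian mixture. This is a generic illustration (scikit-learn assumed, file name and cluster count hypothetical), not the dissertation's model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# waveforms: (n_spikes, n_samples) array of detected spike snippets
# (hypothetical data assumed to have been extracted elsewhere).
waveforms = np.load("spike_waveforms.npy")

# Represent each spike by its first few principal components.
features = PCA(n_components=3).fit_transform(waveforms)

# Cluster the features; each mixture component is treated as one putative neuron.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
labels = gmm.fit_predict(features)
```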

Relevance:

30.00%

Publisher:

Abstract:

We develop and test a method to estimate relative abundance from catch and effort data using neural networks. Most stock assessment models use time series of relative abundance as their major source of information on abundance levels. These time series of relative abundance are frequently derived from catch-per-unit-of-effort (CPUE) data using generalized linear models (GLMs). GLMs are used to attempt to remove variation in CPUE that is not related to the abundance of the population. However, GLMs are restricted in the types of relationships they allow between CPUE and the explanatory variables. An alternative approach is to use structural models based on scientific understanding to develop complex non-linear relationships between CPUE and the explanatory variables. Unfortunately, the scientific understanding required to develop these models may not be available. In contrast to structural models, neural networks use the data to estimate the structure of the non-linear relationship between CPUE and the explanatory variables. Therefore, neural networks may provide a better alternative when the structure of the relationship is uncertain. We use simulated data based on a habitat-based method to test the neural network approach and to compare it with the GLM approach. Cross-validation and simulation tests show that the neural network performed better than nominal effort and the GLM approach. However, the improvement over GLMs is not substantial. We applied the neural network model to CPUE data for bigeye tuna (Thunnus obesus) in the Pacific Ocean.
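A rough illustration of the comparison described above, not the authors' implementation: standardize log-CPUE with a linear model (a GLM analogue) and with a small neural network on the same covariates, then compare held-out error. The file name, column names, dummy-coded factors, and network size are this sketch's assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one row per fishing set, with covariates and observed CPUE.
df = pd.read_csv("cpue_sets.csv")
X = pd.get_dummies(df[["year", "quarter", "area", "depth", "sst"]],
                   columns=["year", "quarter", "area"])
y = np.log(df["cpue"] + 1e-6)          # log-CPUE response

glm = LinearRegression()               # log-linear GLM analogue
nn = make_pipeline(StandardScaler(),
                   MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                                random_state=0))

# Compare out-of-sample error of the two standardization models.
for name, model in [("GLM", glm), ("NN", nn)]:
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(name, mse)
```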

Relevance:

30.00%

Publisher:

Abstract:

Quantifying scientific uncertainty when setting total allowable catch limits for fish stocks is a major challenge, but it has been a requirement in the United States since changes to national fisheries legislation. Multiple sources of error are readily identifiable, including estimation error, model specification error, forecast error, and errors associated with the definition and estimation of reference points. Our focus here, however, is to quantify the influence of estimation error and model specification error on assessment outcomes. These are fundamental sources of uncertainty in developing scientific advice concerning appropriate catch levels, and although a study of these two factors may not be comprehensive, it is feasible with available information. For data-rich stock assessments conducted on the U.S. west coast, we report approximate coefficients of variation in terminal biomass estimates based on inversion of each assessment model's Hessian matrix (i.e., the asymptotic standard error). To summarize variation "among" stock assessments, as a proxy for model specification error, we characterize variation among multiple historical assessments of the same stock. Results indicate that for 17 groundfish and coastal pelagic species, the mean coefficient of variation of terminal biomass is 18%. In contrast, the coefficient of variation ascribable to model specification error (i.e., pooled among-assessment variation) is 37%. We show that if a precautionary probability of overfishing equal to 0.40 is adopted by managers, and only model specification error is considered, a 9% reduction in the overfishing catch level is indicated.
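A back-of-the-envelope version of the final calculation, under the assumption (made for this sketch, not stated in the abstract) that the overfishing catch level has lognormal error: with a coefficient of variation of 0.37 and an acceptable probability of overfishing of 0.40, the implied catch multiplier is about 0.91, i.e. roughly a 9% reduction.

```python
import numpy as np
from scipy.stats import norm

cv = 0.37                      # among-assessment CV (model specification error)
p_star = 0.40                  # precautionary probability of overfishing

sigma = np.sqrt(np.log(1.0 + cv ** 2))      # lognormal sigma implied by the CV
buffer = np.exp(norm.ppf(p_star) * sigma)   # multiplier on the overfishing level

print(f"catch multiplier = {buffer:.3f}")   # ~0.91, i.e. about a 9% reduction
```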

Relevance:

30.00%

Publisher:

Abstract:

In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, which are considered non-targets. This situation arises in many biomedical problems, for example in diagnosis, image-based tumor recognition, or the analysis of electrocardiogram data. In this paper, an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description) using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class, and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is applicable to high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas applying the state-of-the-art approaches is not straightforward when nominal variables are present.
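The typicality test itself is not a stock library routine, but the evaluation protocol described (each class in turn treated as the target, all other objects as new units) can be sketched with any reference OCC method, for example scikit-learn's OneClassSVM. The iris data set below is only a stand-in for the biomedical sets used in the paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)    # placeholder for a multi-class data set

for target in np.unique(y):
    # Train on the target class only; all other classes are unseen at fit time.
    model = make_pipeline(StandardScaler(), OneClassSVM(gamma="scale", nu=0.1))
    model.fit(X[y == target])
    # Score every object; higher decision values mean "more like the target".
    scores = model.decision_function(X)
    auc = roc_auc_score((y == target).astype(int), scores)
    print(f"target class {target}: AUC = {auc:.3f}")
```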

Relevance:

30.00%

Publisher:

Abstract:

Compared with construction data sources that are usually stored and analyzed in spreadsheets and single data tables, data sources with more complicated structures, such as text documents, site images, web pages, and project schedules, have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our definition of and vision for advanced data analysis addressing such challenges are presented, together with related research results from previous work and our recent developments of data analysis on text-based, image-based, web-based, and network-based construction sources. It is shown in this paper that particular data preparation, representation, and analysis operations should be identified and integrated with careful problem investigations and scientific validation measures in order to provide general frameworks in support of information search and knowledge discovery from such information-abundant data sources.
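As one concrete, if simplified, instance of the preparation and representation operations mentioned above for text-based construction sources, the sketch below builds a TF-IDF representation that downstream search or knowledge discovery could use; the document snippets are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder construction documents (e.g. specifications, RFIs, daily reports).
docs = [
    "contractor shall submit shop drawings for structural steel",
    "concrete pour delayed due to rain, schedule impact expected",
    "change order issued for additional HVAC ductwork",
]

# Preparation + representation: tokenize, drop stop words, weight terms by TF-IDF.
vectorizer = TfidfVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(docs)

print(doc_term_matrix.shape)                     # (n_documents, n_terms)
print(vectorizer.get_feature_names_out()[:10])   # sample of the learned vocabulary
```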

Relevance:

30.00%

Publisher:

Abstract:

Combining numerical techniques with ideas from symbolic computation and with methods incorporating knowledge of science and mathematics leads to a new category of intelligent computational tools for scientists and engineers. These tools autonomously prepare simulation experiments from high-level specifications of physical models. For computationally intensive experiments, they automatically design special-purpose numerical engines optimized to perform the necessary computations. They actively monitor numerical and physical experiments. They interpret experimental data and formulate numerical results in qualitative terms. They enable their human users to control computational experiments in terms of high-level behavioral descriptions.

Relevance:

30.00%

Publisher:

Abstract:

King, R. D., Wise, P. H. and Clare, A. (2004). Confirmation of Data Mining Based Predictions of Protein Function. Bioinformatics 20(7), 1110-1118.

Relevance:

30.00%

Publisher:

Abstract:

Fusion ARTMAP is a self-organizing neural network architecture for multi-channel, or multi-sensor, data fusion. Single-channel Fusion ARTMAP is functionally equivalent to Fuzzy ART during unsupervised learning and to Fuzzy ARTMAP during supervised learning. The network has a symmetric organization such that each channel can be dynamically configured to serve as either a data input or a teaching input to the system. An ART module forms a compressed recognition code within each channel. These codes, in turn, become inputs to a single ART system that organizes the global recognition code. When a predictive error occurs, a process called parallel match tracking simultaneously raises vigilances in multiple ART modules until reset is triggered in one of them. Parallel match tracking thereby resets only that portion of the recognition code with the poorest match, or minimum predictive confidence. This internally controlled selective reset process is a type of credit assignment that creates a parsimoniously connected learned network. Fusion ARTMAP's multi-channel coding is illustrated by simulations of the Quadruped Mammal database.
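A full Fusion ARTMAP implementation is beyond a short example, but the core operations each ART channel relies on (complement coding, the choice function, the vigilance test, and fast learning) can be sketched as follows. Parameter values and the fast-learning setting are arbitrary choices for illustration.

```python
import numpy as np

def complement_code(a):
    """Complement coding: an input a in [0,1]^d becomes [a, 1 - a]."""
    return np.concatenate([a, 1.0 - a])

def fuzzy_art_step(i, weights, rho=0.75, alpha=0.001, beta=1.0):
    """One Fuzzy ART presentation: choose a category, test vigilance, learn."""
    match = np.minimum(i, weights)                 # fuzzy AND with each category
    choice = match.sum(axis=1) / (alpha + weights.sum(axis=1))
    for j in np.argsort(choice)[::-1]:             # try categories by choice value
        if match[j].sum() / i.sum() >= rho:        # vigilance test
            weights[j] = beta * match[j] + (1 - beta) * weights[j]
            return j, weights
    # No category passed vigilance: commit a new one initialized to the input.
    weights = np.vstack([weights, i])
    return weights.shape[0] - 1, weights

# Example: present one complement-coded input to an initially empty category set.
x = complement_code(np.array([0.2, 0.8]))
weights = np.ones((0, x.size))                     # no committed categories yet
j, weights = fuzzy_art_step(x, weights)
```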

Relevance:

30.00%

Publisher:

Abstract:

It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2^70 bytes), and this figure is expected to have grown by a factor of 10, to 44 zettabytes, by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of the digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner.

Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising such systems have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats those characteristics as the dominant component affecting the results being sought. This multiplicity of analysis techniques introduces another layer of heterogeneity, namely heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question is therefore asked: can a generic solution for the monitoring and analysis of data be identified that accommodates temporal constraints, bridges the gap between expert knowledge and raw data, and enables data to be effectively interpreted and exploited in a transparent manner?

The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques to the same raw data without the danger of incorporating hidden bias. To illustrate and realise this approach, a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third-party analysis can be applied to verify any derived conclusions.

To demonstrate these concepts, a complex real-world example involving the near real-time capture and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly, because it is complex and no comprehensive solution exists; secondly, because it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, because, given the dearth of neurophysiologists, there is a real-world need to provide a solution for this domain.
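The workflow of production, interpretation and consumption of data, tied to a maintainable record of provenance, can be illustrated with a minimal data structure; the field names and the EEG example below are this sketch's assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One step applied to a piece of data: what was done, by whom, from what."""
    operation: str                 # e.g. "acquire", "filter", "classify"
    agent: str                     # analysis technique or human expert
    timestamp: str
    inputs: list                   # identifiers of the records this step consumed

@dataclass
class DataItem:
    identifier: str
    payload: object                              # raw or derived data
    provenance: list = field(default_factory=list)

    def derive(self, identifier, payload, operation, agent):
        """Produce a new item whose provenance extends this item's history."""
        step = ProvenanceRecord(operation, agent,
                                datetime.now(timezone.utc).isoformat(),
                                [self.identifier])
        return DataItem(identifier, payload, self.provenance + [step])

# Hypothetical usage: a raw EEG segment is produced, then interpreted downstream.
raw = DataItem("eeg-0001", payload="<raw EEG segment>")
filtered = raw.derive("eeg-0001-filtered", "<filtered segment>",
                      operation="band-pass filter", agent="preprocessing service")
print([(s.operation, s.agent) for s in filtered.provenance])
```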

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: The ability to write clearly and effectively is of central importance to the scientific enterprise. Encouraged by the success of simulation environments in other biomedical sciences, we developed WriteSim TCExam, an open-source, Web-based, textual simulation environment for teaching effective writing techniques to novice researchers. We shortlisted and modified an existing open-source application, TCExam, to serve as a textual simulation environment. After testing usability internally in our team, we conducted formal field usability studies with novice researchers. These were followed by formal surveys with researchers fitting the roles of administrators and users (novice researchers). RESULTS: The development process was guided by feedback from usability tests within our research team. Online surveys and formal studies, involving members of the Research on Research group and selected novice researchers, show that the application is user-friendly. Additionally, it has been used to train 25 novice researchers in scientific writing to date and has generated encouraging results. CONCLUSION: WriteSim TCExam is the first Web-based, open-source textual simulation environment designed to complement traditional scientific writing instruction. While initial reviews by students and educators have been positive, a formal study is needed to measure its benefits in comparison to standard instructional methods.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Sharing of epidemiological and clinical data sets among researchers is poor at best, to the detriment of science and the community at large. The purpose of this paper is therefore to (1) describe a novel Web application designed to share information on study data sets, focusing on epidemiological and clinical research, in a collaborative environment and (2) create a policy model placing this collaborative environment into the current scientific social context. METHODOLOGY: The Database of Databases application was developed based on feedback from epidemiologists and clinical researchers requiring a Web-based platform that would allow for sharing of information about epidemiological and clinical study data sets in a collaborative environment. This platform should ensure that researchers can modify the information. Model-based predictions of the number of publications and the funding resulting from combinations of different policy implementation strategies (for metadata and data sharing) were generated using System Dynamics modeling. PRINCIPAL FINDINGS: The application allows researchers to easily upload information about clinical study data sets, which is searchable and modifiable by other users in a wiki environment. All modifications are filtered by the database principal investigator in order to maintain quality control. The application has been extensively tested and currently contains 130 clinical study data sets from the United States, Australia, China and Singapore. Model results indicated that any policy implementation would be better than the current strategy, that metadata sharing is better than data sharing, and that combined policies achieve the best results in terms of publications. CONCLUSIONS: Based on our empirical observations and the resulting model, the social network environment surrounding the application can help epidemiologists and clinical researchers contribute and search for metadata in a collaborative environment, thus potentially facilitating collaboration efforts among research communities distributed around the globe.
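To indicate what a System Dynamics comparison of policy strategies looks like mechanically, the sketch below integrates a two-stock model under different policy multipliers. All rates and multipliers are invented for illustration and are not the paper's calibration or results.

```python
import numpy as np

def simulate(policy_multiplier, years=10, dt=0.25):
    """Euler-integrated stock-and-flow sketch; parameters are purely illustrative."""
    shared_datasets = 10.0          # stock: data sets with shared metadata/data
    publications = 0.0              # stock: cumulative publications
    for _ in np.arange(0, years, dt):
        sharing_rate = 5.0 * policy_multiplier          # inflow to shared stock
        publication_rate = 0.3 * shared_datasets        # flow driven by sharing
        shared_datasets += sharing_rate * dt
        publications += publication_rate * dt
    return publications

for name, mult in [("no policy", 1.0), ("metadata sharing", 1.5),
                   ("metadata + data sharing", 2.0)]:
    print(name, round(simulate(mult), 1))
```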

Relevance:

30.00%

Publisher:

Abstract:

Technology-supported citizen science has created huge volumes of data with increasing potential to facilitate scientific progress; however, verifying data quality is still a substantial hurdle due to the limitations of existing data quality mechanisms. In this study, we adopted a mixed-methods approach to investigate community-based data validation practices and the characteristics of records of wildlife species observations that affected the outcomes of collaborative data quality management in an online community where people record what they see in nature. The findings describe processes that both relied upon and added to information provenance through information stewardship behaviors, which led to improved reliability and informativity. The likelihood of community-based validation interactions was predicted by several factors, including the types of organisms observed and whether the data were submitted from a mobile device. We conclude with implications for technology design, citizen science practices, and research.
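The abstract does not name the statistical model used; as one plausible way to estimate how such factors predict the likelihood of validation interactions, a logistic regression could be fit to an export of observation records. The file and column names below are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical export of observation records; column names are assumptions.
df = pd.read_csv("observations.csv")
X = pd.get_dummies(df[["organism_type", "from_mobile_device", "has_photo"]],
                   columns=["organism_type"])
y = df["received_community_validation"]   # 1 if other users validated the record

# Fit and check out-of-sample accuracy of the validation-likelihood model.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))
```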