918 resultados para Real data sets
Resumo:
Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential framework for inference in such projected processes is presented, where the observations are considered one at a time. We introduce a C++ library for carrying out such projected, sequential estimation which adds several novel features. In particular we have incorporated the ability to use a generic observation operator, or sensor model, to permit data fusion. We can also cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the variogram parameters is based on maximum likelihood estimation. We illustrate the projected sequential method in application to synthetic and real data sets. We discuss the software implementation and suggest possible future extensions.
A priori parameterisation of the CERES soil-crop models and tests against several European data sets
Resumo:
Mechanistic soil-crop models have become indispensable tools to investigate the effect of management practices on the productivity or environmental impacts of arable crops. Ideally these models may claim to be universally applicable because they simulate the major processes governing the fate of inputs such as fertiliser nitrogen or pesticides. However, because they deal with complex systems and uncertain phenomena, site-specific calibration is usually a prerequisite to ensure their predictions are realistic. This statement implies that some experimental knowledge on the system to be simulated should be available prior to any modelling attempt, and raises a tremendous limitation to practical applications of models. Because the demand for more general simulation results is high, modellers have nevertheless taken the bold step of extrapolating a model tested within a limited sample of real conditions to a much larger domain. While methodological questions are often disregarded in this extrapolation process, they are specifically addressed in this paper, and in particular the issue of models a priori parameterisation. We thus implemented and tested a standard procedure to parameterize the soil components of a modified version of the CERES models. The procedure converts routinely-available soil properties into functional characteristics by means of pedo-transfer functions. The resulting predictions of soil water and nitrogen dynamics, as well as crop biomass, nitrogen content and leaf area index were compared to observations from trials conducted in five locations across Europe (southern Italy, northern Spain, northern France and northern Germany). In three cases, the model’s performance was judged acceptable when compared to experimental errors on the measurements, based on a test of the model’s root mean squared error (RMSE). Significant deviations between observations and model outputs were however noted in all sites, and could be ascribed to various model routines. In decreasing importance, these were: water balance, the turnover of soil organic matter, and crop N uptake. A better match to field observations could therefore be achieved by visually adjusting related parameters, such as field-capacity water content or the size of soil microbial biomass. As a result, model predictions fell within the measurement errors in all sites for most variables, and the model’s RMSE was within the range of published values for similar tests. We conclude that the proposed a priori method yields acceptable simulations with only a 50% probability, a figure which may be greatly increased through a posteriori calibration. Modellers should thus exercise caution when extrapolating their models to a large sample of pedo-climatic conditions for which they have only limited information.
Resumo:
We present a participant study that compares biological data exploration tasks using volume renderings of laser confocal microscopy data across three environments that vary in level of immersion: a desktop, fishtank, and cave system. For the tasks, data, and visualization approach used in our study, we found that subjects qualitatively preferred and quantitatively performed better in the cave compared with the fishtank and desktop. Subjects performed real-world biological data analysis tasks that emphasized understanding spatial relationships including characterizing the general features in a volume, identifying colocated features, and reporting geometric relationships such as whether clusters of cells were coplanar. After analyzing data in each environment, subjects were asked to choose which environment they wanted to analyze additional data sets in - subjects uniformly selected the cave environment.
Resumo:
Funding The International Primary Care Respiratory Group (IPCRG) provided funding for this research project as an UNLOCK group study for which the funding was obtained through an unrestricted grant by Novartis AG, Basel, Switzerland. The latter funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Database access for the OPCRD was provided by the Respiratory Effectiveness Group (REG) and Research in Real Life; the OPCRD statistical analysis was funded by REG. The Bocholtz Study was funded by PICASSO for COPD, an initiative of Boehringer Ingelheim, Pfizer and the Caphri Research Institute, Maastricht University, The Netherlands.
Resumo:
During the SINOPS project, an optimal state of the art simulation of the marine silicon cycle is attempted employing a biogeochemical ocean general circulation model (BOGCM) through three particular time steps relevant for global (paleo-) climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that data structure becomes homogeneous throughout the project. Practical work routine comprises systematic progress from data acquisition, through preparation, processing, quality check and archiving, up to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize the analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, Graphical Information System (GIS), 2-D plot, cross-section plot, etc. and whose multidimensional data model promotes scientific data mining. Besides scientific and technical aspects, this alliance between scientific project team and data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.
Resumo:
For the first time, we introduce and study some mathematical properties of the Kumaraswamy Weibull distribution that is a quite flexible model in analyzing positive data. It contains as special sub-models the exponentiated Weibull, exponentiated Rayleigh, exponentiated exponential, Weibull and also the new Kumaraswamy exponential distribution. We provide explicit expressions for the moments and moment generating function. We examine the asymptotic distributions of the extreme values. Explicit expressions are derived for the mean deviations, Bonferroni and Lorenz curves, reliability and Renyi entropy. The moments of the order statistics are calculated. We also discuss the estimation of the parameters by maximum likelihood. We obtain the expected information matrix. We provide applications involving two real data sets on failure times. Finally, some multivariate generalizations of the Kumaraswamy Weibull distribution are discussed. (C) 2010 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
Resumo:
The rapidly increasing computing power, available storage and communication capabilities of mobile devices makes it possible to start processing and storing data locally, rather than offloading it to remote servers; allowing scenarios of mobile clouds without infrastructure dependency. We can now aim at connecting neighboring mobile devices, creating a local mobile cloud that provides storage and computing services on local generated data. In this paper, we describe an early overview of a distributed mobile system that allows accessing and processing of data distributed across mobile devices without an external communication infrastructure. Copyright © 2015 ICST.
Resumo:
The study of electricity markets operation has been gaining an increasing importance in the last years, as result of the new challenges that the restructuring process produced. Currently, lots of information concerning electricity markets is available, as market operators provide, after a period of confidentiality, data regarding market proposals and transactions. These data can be used as source of knowledge to define realistic scenarios, which are essential for understanding and forecast electricity markets behavior. The development of tools able to extract, transform, store and dynamically update data, is of great importance to go a step further into the comprehension of electricity markets and of the behaviour of the involved entities. In this paper an adaptable tool capable of downloading, parsing and storing data from market operators’ websites is presented, assuring constant updating and reliability of the stored data.
Resumo:
Currently, due to the widespread use of computers and the internet, students are trading libraries for the World Wide Web and laboratories with simulation programs. In most courses, simulators are made available to students and can be used to proof theoretical results or to test a developing hardware/product. Although this is an interesting solution: low cost, easy and fast way to perform some courses work, it has indeed major disadvantages. As everything is currently being done with/in a computer, the students are loosing the “feel” of the real values of the magnitudes. For instance in engineering studies, and mainly in the first years, students need to learn electronics, algorithmic, mathematics and physics. All of these areas can use numerical analysis software, simulation software or spreadsheets and in the majority of the cases data used is either simulated or random numbers, but real data could be used instead. For example, if a course uses numerical analysis software and needs a dataset, the students can learn to manipulate arrays. Also, when using the spreadsheets to build graphics, instead of using a random table, students could use a real dataset based, for instance, in the room temperature and its variation across the day. In this work we present a framework which uses a simple interface allowing it to be used by different courses where the computers are the teaching/learning process in order to give a more realistic feeling to students by using real data. A framework is proposed based on a set of low cost sensors for different physical magnitudes, e.g. temperature, light, wind speed, which are connected to a central server, that the students have access with an Ethernet protocol or are connected directly to the student computer/laptop. These sensors use the communication ports available such as: serial ports, parallel ports, Ethernet or Universal Serial Bus (USB). Since a central server is used, the students are encouraged to use sensor values results in their different courses and consequently in different types of software such as: numerical analysis tools, spreadsheets or simply inside any programming language when a dataset is needed. In order to do this, small pieces of hardware were developed containing at least one sensor using different types of computer communication. As long as the sensors are attached in a server connected to the internet, these tools can also be shared between different schools. This allows sensors that aren't available in a determined school to be used by getting the values from other places that are sharing them. Another remark is that students in the more advanced years and (theoretically) more know how, can use the courses that have some affinities with electronic development to build new sensor pieces and expand the framework further. The final solution provided is very interesting, low cost, simple to develop, allowing flexibility of resources by using the same materials in several courses bringing real world data into the students computer works.
Resumo:
Magdeburg, Univ., Fak. für Informatik, Diss., 2014
Resumo:
Magdeburg, Univ., Fak. für Informatik, Diss., 2014
Resumo:
Given the very large amount of data obtained everyday through population surveys, much of the new research again could use this information instead of collecting new samples. Unfortunately, relevant data are often disseminated into different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which being quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that despite the lack of data, this procedure can perform better than standard matching procedures.
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced
Resumo:
We consider the application of normal theory methods to the estimation and testing of a general type of multivariate regressionmodels with errors--in--variables, in the case where various data setsare merged into a single analysis and the observable variables deviatepossibly from normality. The various samples to be merged can differ on the set of observable variables available. We show that there is a convenient way to parameterize the model so that, despite the possiblenon--normality of the data, normal--theory methods yield correct inferencesfor the parameters of interest and for the goodness--of--fit test. Thetheory described encompasses both the functional and structural modelcases, and can be implemented using standard software for structuralequations models, such as LISREL, EQS, LISCOMP, among others. An illustration with Monte Carlo data is presented.