940 results for COMBINING DATA
Abstract:
In this study, we applied the integration methodology developed in the companion paper by Aires (2014) to real satellite observations over the Mississippi Basin. The methodology provides basin-scale estimates of the four water budget components (precipitation P, evapotranspiration E, water storage change ΔS, and runoff R) in a two-step process: Simple Weighting (SW) integration and a Postprocessing Filtering (PF) that imposes water budget closure. A comparison with in situ observations of P and E demonstrated that PF improved the estimation of both components. A Closure Correction Model (CCM) was derived from the integrated product (SW+PF); it makes it possible to correct each observational data set independently, unlike the SW+PF method, which requires simultaneous estimates of all four components. The CCM standardizes the various data sets for each component and greatly reduces the budget residual (P − E − ΔS − R). As a direct application, the CCM was combined with the water budget equation to reconstruct missing values in any component. Results of a Monte Carlo experiment with synthetic gaps demonstrated the good performance of the method, except for the runoff data, whose variability is of the same order of magnitude as the budget residual. Similarly, we proposed a reconstruction of ΔS between 1990 and 2002, a period for which no Gravity Recovery and Climate Experiment data are available. Unlike most studies dealing with water budget closure at the basin scale, only satellite observations and in situ runoff measurements are used. Consequently, the integrated data sets are model independent and can be used for model calibration or validation.
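As a minimal illustration of the closure constraint used above, the sketch below checks the budget residual P − E − ΔS − R on synthetic monthly values and reconstructs a missing component from the other three; the numbers and the direct algebraic reconstruction are illustrative assumptions, not the paper's SW+PF or CCM procedure.

```python
import numpy as np

# Synthetic monthly basin-scale estimates (mm/month); illustrative values only.
P  = np.array([80.0, 95.0, 70.0, 60.0])   # precipitation
E  = np.array([50.0, 55.0, 48.0, 40.0])   # evapotranspiration
dS = np.array([10.0, 15.0, -5.0, -2.0])   # water storage change
R  = np.array([18.0, 20.0, 25.0, 20.0])   # runoff

# Budget residual before any correction: ideally zero if the budget closes.
residual = P - E - dS - R
print("residual:", residual)

# Reconstructing a missing component from the closure constraint
# P - E - dS - R = 0, e.g. storage change for months with no GRACE data:
dS_reconstructed = P - E - R
print("reconstructed dS:", dS_reconstructed)
```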
Abstract:
In this position paper, we claim that the need for time-consuming data preparation and result interpretation tasks in knowledge discovery, as well as for the costly expert consultation and consensus-building activities required for ontology building, can be reduced by exploiting the interplay of data mining and ontology engineering. The aim is to obtain, in a semi-automatic way, new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on a novel knowledge discovery method relying on the combination, through an iterative 'feedback loop', of (a) data mining techniques to draw out implicit models from the data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.
Abstract:
Software development methodologies are becoming increasingly abstract, progressing from low-level assembly and implementation languages such as C and Ada, to component-based approaches that can be used to assemble applications using technologies such as JavaBeans and the .NET framework. Meanwhile, model-driven approaches emphasise the role of higher-level models and notations, and embody a process of automatically deriving lower-level representations and concrete software implementations. The relationship between data and software is also evolving. Modern data formats are becoming increasingly standardised, open and empowered in order to support a growing need to share data in both academia and industry. Many contemporary data formats, most notably those based on XML, are self-describing, able to specify valid data structure and content, and can also describe data manipulations and transformations. Furthermore, while applications of the past have made extensive use of data, the runtime behaviour of future applications may be driven by data, as demonstrated by the field of dynamic data driven application systems. The combination of empowered data formats and high-level software development methodologies forms the basis of modern game development technologies, which drive software capabilities and runtime behaviour using empowered data formats describing game content. While low-level libraries provide optimised runtime execution, content data is used to drive a wide variety of interactive and immersive experiences. This thesis describes the Fluid project, which combines component-based software development and game development technologies in order to define novel component technologies for the description of data-driven, component-based applications. The thesis makes explicit contributions to the fields of component-based software development and the visualisation of spatiotemporal scenes, and also describes potential implications for game development technologies. The thesis also proposes a number of developments in dynamic data driven application systems in order to further empower the role of data in this field.
Abstract:
Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a 'noisy' environment such as contemporary social media, is to collect the pertinent information, whether that is information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or an advantage for those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing over time, and both collection and analytic methodologies need to be continually adapted in response to this changing information. While many of the data sets collected and analyzed are preformed, that is, built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders.

The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting those which need to be immediately placed in front of responders for manual filtering and possible action. The paper suggests two solutions, content analysis and user profiling. In the former, aspects of each tweet are assigned a score to assess its likely relationship to the topic at hand and the urgency of the information, whilst the latter attempts to identify users who either serve as amplifiers of information or are known as authoritative sources; a sketch of this scoring idea follows this abstract. Through these techniques, the information contained in a large dataset can be filtered down to match the expected capacity of emergency responders, and knowledge of the core keywords or hashtags relating to the current event is constantly refined for future data collection.

The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportsmen and sportswomen create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form, and emotions that has the potential to affect future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, like American Football (NFL) and Baseball (MLB), has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services remain niche operations, much of the value of the information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, the information, and the prediction markets to which it may relate.

The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect the tweets and make them available for exploration and analysis. A strategy to respond to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically of tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement's presence on Twitter changes over time. We also discuss the opportunities and methods for extracting smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study.

The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid future data collection. The intention is that, through the papers presented and subsequent discussion, the panel will inform the wider research community not only about the objectives and limitations of data collection, live analytics, and filtering, but also about current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
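As a rough sketch of the kind of content-analysis scoring the first paper describes, the snippet below scores incoming tweets by topic relevance, urgency terms, and author authority; the keyword sets, weights, threshold, and user handle are hypothetical, not the panel's actual scheme.

```python
# Illustrative keyword-based scoring of incoming tweets; the keyword sets,
# weights, and threshold are assumptions, not the panel's actual scheme.
TOPIC_TERMS = {"flood": 2.0, "evacuation": 3.0, "#ukstorm": 2.5}
URGENCY_TERMS = {"trapped": 5.0, "help": 4.0, "injured": 5.0}
TRUSTED_USERS = {"@emergency_official"}  # hypothetical authoritative source

def score_tweet(text: str, author: str) -> float:
    """Combine topic relevance, urgency, and author authority into one score."""
    words = text.lower().split()
    score = sum(TOPIC_TERMS.get(w, 0.0) for w in words)
    score += sum(URGENCY_TERMS.get(w, 0.0) for w in words)
    if author in TRUSTED_USERS:
        score *= 2.0  # amplify known authoritative sources
    return score

# Only tweets scoring above the threshold are forwarded to responders.
THRESHOLD = 6.0
tweets = [("trapped by flood near bridge help", "@some_user"),
          ("lovely weather today", "@other_user")]
urgent = [(t, a) for t, a in tweets if score_tweet(t, a) >= THRESHOLD]
print(urgent)
```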
Abstract:
Meta-analyses estimate a statistical effect size for a test or an analysis by combining results from multiple studies without necessarily having access to each individual study's raw data. Multi-site meta-analysis is crucial for imaging genetics, as single sites rarely have a sample size large enough to detect effects of single genetic variants associated with brain measures. However, if raw data can be shared, combining data in a "mega-analysis" is thought to improve power and precision in estimating global effects. As part of an ENIGMA-DTI investigation, we use fractional anisotropy (FA) maps from 5 studies (total N=2,203 subjects, aged 9-85) to estimate heritability. We combine the studies through meta- and mega-analyses as well as a mixture of the two, combining some cohorts with mega-analysis and meta-analyzing the results with those of the remaining sites. A combination of mega- and meta-approaches may boost power compared to meta-analysis alone.
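A minimal sketch of how per-site results can be pooled in a meta-analysis is given below, using standard fixed-effect inverse-variance weighting; the site estimates are synthetic and the weighting rule is a generic textbook choice, not necessarily the ENIGMA-DTI pipeline.

```python
import numpy as np

# Per-site effect estimates (e.g., heritability of FA) and their standard
# errors; values are synthetic, for illustration only.
effects = np.array([0.62, 0.55, 0.70, 0.58, 0.65])
ses     = np.array([0.08, 0.10, 0.12, 0.07, 0.09])

# Fixed-effect inverse-variance weighting: sites measured more precisely
# (smaller SE) contribute more to the pooled estimate.
w = 1.0 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
print(f"pooled estimate = {pooled:.3f} +/- {pooled_se:.3f}")
```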
Abstract:
Data fusion can be defined as the process of combining data or information to estimate the state of an entity. It is a multidisciplinary field with several benefits, such as enhancing confidence, improving reliability, and reducing the ambiguity of measurements used to estimate the state of entities in engineering systems. It can also enhance the completeness of fused data that may be required for estimating the state of engineering systems. Data fusion has been applied to many fields, such as robotics, automation, and intelligent systems. This paper reviews some recent applications of data fusion in civil engineering and presents some of the potential benefits of using data fusion in this domain.
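One simple and widely used fusion rule is inverse-variance weighting of independent measurements of the same quantity; the sketch below applies it to two hypothetical sensor readings, with all values assumed for illustration.

```python
# Fusing two independent measurements of the same quantity (e.g., a bridge
# displacement read by two sensors) via inverse-variance weighting.
# Values are illustrative assumptions.
x1, var1 = 12.4, 0.25  # sensor 1: estimate and variance
x2, var2 = 12.9, 0.64  # sensor 2: estimate and variance

# The fused estimate weights each sensor by its precision (1/variance);
# the fused variance is smaller than either input, reflecting reduced ambiguity.
w1, w2 = 1.0 / var1, 1.0 / var2
x_fused = (w1 * x1 + w2 * x2) / (w1 + w2)
var_fused = 1.0 / (w1 + w2)
print(f"fused estimate = {x_fused:.2f}, fused variance = {var_fused:.3f}")
```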
Abstract:
Doctoral thesis, Biology (Marine Biology and Aquaculture), Universidade de Lisboa, Faculdade de Ciências, 2015
Abstract:
We present a new composite of geomagnetic activity which is designed to be as homogeneous in its construction as possible. This is done by combining only data that, by virtue of the locations of the source observatories used, have similar responses to solar wind and IMF (interplanetary magnetic field) variations. This will enable us (in Part 2, Lockwood et al., 2013a) to use the new index to reconstruct the interplanetary magnetic field, B, back to 1846 with a full analysis of errors. Allowance is made for the effects of secular change in the geomagnetic field. The composite uses interdiurnal variation data from Helsinki for 1845–1890 (inclusive) and 1893–1896, and from Eskdalemuir from 1911 to the present. The gaps are filled using data from the Potsdam (1891–1892 and 1897–1907) and nearby Seddin observatories (1908–1910), with intercalibration achieved using the Potsdam–Seddin sequence. The new index is termed IDV(1d) because it employs many of the principles of the IDV index derived by Svalgaard and Cliver (2010), inspired by the u index of Bartels (1932); however, we revert to using one-day (1d) means, as employed by Bartels, because the use of near-midnight values in IDV introduces contamination by the substorm current wedge auroral electrojet, giving noise and a dependence on solar wind speed that varies with latitude. The composite is compared with independent, early data from the European-sector stations Greenwich, St Petersburg, Parc St Maur, and Ekaterinburg, as well as with the composite u index compiled from 2–6 stations by Bartels and the IDV index of Svalgaard and Cliver. Agreement is found to be extremely good in all cases except two. Firstly, the Greenwich data are shown to have gradually degraded in quality until new instrumentation was installed in 1915. Secondly, we infer that the Bartels u index is increasingly unreliable before about 1886, overestimating the solar cycle amplitude between 1872 and 1883, an effect amplified in the proxy data used before 1872. This is therefore also true of the IDV index, which makes direct use of the u index values.
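The splicing-by-intercalibration idea can be sketched as follows: regress one station's series onto another over their overlap period, rescale, and join. The synthetic series and the linear calibration below are assumptions for illustration, not the actual IDV(1d) construction.

```python
import numpy as np

# Synthetic annual means from two observatories with an overlap period;
# values and the linear-calibration assumption are illustrative only.
years_a = np.arange(1880, 1901)
series_a = 10.0 + 0.1 * (years_a - 1880) \
    + np.random.default_rng(0).normal(0, 0.2, years_a.size)
years_b = np.arange(1895, 1921)
series_b = 0.8 * (10.0 + 0.1 * (years_b - 1880)) + 1.5  # different gain/offset

# Regress station A onto station B over the overlap (1895-1900) to find
# the calibration, then rescale B onto A's level and splice the two.
overlap = np.intersect1d(years_a, years_b)
a_ov = series_a[np.isin(years_a, overlap)]
b_ov = series_b[np.isin(years_b, overlap)]
slope, intercept = np.polyfit(b_ov, a_ov, 1)
series_b_cal = slope * series_b + intercept

composite_years = np.concatenate([years_a[years_a < 1895], years_b])
composite = np.concatenate([series_a[years_a < 1895], series_b_cal])
print(composite_years[:5], composite[:5])
```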
Abstract:
We investigated gas bubble emissions at the Don-Kuban paleo-fan in the northeastern Black Sea, regarding their geological setting, quantities, and spatial and temporal variability, during three ship expeditions between 2007 and 2011. About 600 bubble-induced hydroacoustic anomalies in the water column (flares), originating from the seafloor above the gas hydrate stability zone (GHSZ) at ~700 m water depth, were found. At about 890 m water depth, a hydrocarbon seep area named the "Kerch seep area" was newly discovered within the GHSZ. We propose that locally domed sediments ('mounds'), discovered during ultra-high-resolution bathymetric mapping with an autonomous underwater vehicle (AUV), result from gas hydrate accumulation at shallow depths. In situ measurements indicated spatially limited temperature elevations in the shallow sediment, likely induced by upward fluid flow, which may confine the local GHSZ to a few meters below the seafloor. As a result, gas bubbles are suspected to migrate into near-surface sediments and to escape the seafloor through small-scale faults. Hydroacoustic surveys revealed that several flares originated from a seafloor area of about 1 km² in size. The highest flare disappeared at about 350 m water depth, suggesting that the released methane remains in the water column. A methane flux estimate, combining visual quantifications during dives with a remotely operated vehicle (ROV), results from ship-based hydroacoustic surveys, and gas analysis, revealed that between 2 and 87 × 10⁶ mol CH₄ yr⁻¹ escaped into the water column above the Kerch seep area. Our results show that the Kerch seep area represents a hitherto underestimated type of hydrocarbon seep, which has to be considered in methane budget calculations.
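The upscaling logic of such a flux estimate can be illustrated with back-of-envelope arithmetic, multiplying a per-vent bubble flux by a molar content per bubble and the number of mapped vents; every number below is an assumed placeholder, not a value from the study.

```python
# Back-of-envelope upscaling of seep methane flux; every number here is an
# assumed placeholder, not a value from the study.
bubbles_per_second = 20.0   # visually counted at one vent (ROV)
mol_per_bubble = 2e-5       # mol CH4 per bubble at depth (assumed)
vents_in_area = 150         # hydroacoustically detected flares (assumed)

seconds_per_year = 365.25 * 24 * 3600
flux = bubbles_per_second * mol_per_bubble * vents_in_area * seconds_per_year
print(f"area flux ~ {flux:.2e} mol CH4 / yr")
```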
Abstract:
We discuss the aggregation of data from neuropsychological patients and the process of evaluating models using data from a series of patients. We argue that aggregation can be misleading, but that not aggregating can also result in information loss. The basis for combining data needs to be theoretically defined, and the particular method of aggregation depends on the theoretical question and the characteristics of the data. We present examples, often drawn from our own research, to illustrate these points. We also argue that statistical models and formal methods of model selection are a useful way to test theoretical accounts using data from several patients in multiple-case studies or case series. Statistical models can often measure fit in a way that explicitly captures what a theory allows; the parameter values that result from model fitting often measure theoretically important dimensions and can lead to more constrained theories or new predictions; and model selection allows the strength of evidence for models to be quantified without forcing it into the artificial binary choice that characterizes hypothesis-testing methods. Methods that aggregate and then formally model patient data, however, are not automatically preferred to other methods. Which method is preferred depends on the question to be addressed, the characteristics of the data, and practical issues such as the availability of suitable patients; but case series, multiple-case studies, single-case studies, statistical models, and process models should be complementary methods when guided by theory development.
Abstract:
Background: Depression and alcohol misuse are among the most prevalent diagnoses in suicide fatalities. The risk posed by these disorders is exacerbated when they co-occur. Limited research has evaluated the effectiveness of common depression and alcohol treatments for reducing suicide vulnerability in individuals experiencing this comorbidity.
Methods: Participants with depressive symptoms and hazardous alcohol use were selected from two randomised controlled trials. They had received either a brief (1-session) intervention, or depression-focused cognitive behaviour therapy (CBT), alcohol-focused CBT, therapist-delivered integrated CBT, computer-delivered integrated CBT or person-centred therapy (PCT) over a 10-week period. Suicidal ideation, hopelessness, depression severity and alcohol consumption were assessed at baseline and 12-month follow-up.
Results: Three hundred and three participants were assessed at baseline and 12 months. Both suicidal ideation and hopelessness were associated with higher severity of depressive symptoms, but not with alcohol consumption. Suicidal ideation did not improve significantly at follow-up, with no differences between treatment conditions. Improvements in hopelessness differed between treatment conditions; hopelessness improved more in the CBT conditions compared to PCT, and in single-focused CBT compared to integrated CBT.
Limitations: Low retention rates may have affected the reliability of our findings. Combining data from two studies may have resulted in heterogeneity of samples between conditions.
Conclusions: CBT appears to be associated with reductions in hopelessness in people with co-occurring depression and alcohol misuse, even when it is not the focus of treatment. Less consistent results were observed for suicidal ideation. Establishing specific procedures or therapeutic content for clinicians to monitor these outcomes may result in better management of individuals at higher vulnerability for suicide.
Abstract:
Earlier phylogenetic studies including species belonging to the Neckeraceae have indicated that this pleurocarpous moss family shares a strongly supported sister-group relationship with the Lembophyllaceae, but that the delimitation of the former family needs adjustment. To test the monophyly of the Neckeraceae, as well as to redefine the family circumscription and to pinpoint its phylogenetic position in a larger context, a phylogenetic study based on molecular data was carried out. Sequence data were compiled, combining data from all three genomes: nuclear ITS1 and 2, plastid trnS-rps4-trnT-trnL-trnF and rpl16, and the mitochondrial nad5 intron. The Neckeraceae have sometimes been divided into two families, Neckeraceae and Thamnobryaceae, a division rejected here. Both parsimony and Bayesian analyses of the molecular data revealed that the family concept of the Neckeraceae needs several further adjustments, such as the exclusion of some individual species and smaller genera as well as the inclusion of the Leptodontaceae. Within the family, three well-supported clades (A, B and C) can be distinguished. Members of clade A are mainly non-Asiatic and non-tropical. Most species have a weak costa and immersed capsules with reduced peristomes (mainly Neckera spp.), and the teeth at the leaf margins are usually unicellular. Clade B members are also mainly non-Asiatic. They are typically fairly robust and distinctly stipitate, having a single, at least relatively strong costa and long setae (capsules exserted), and their peristomes are well developed or only somewhat reduced. Members of clade C are essentially Asiatic and tropical. The species of this clade usually have a strong costa and a long seta, the seta often being mammillose in its upper part. The peristome types in this clade are mixed, since both reduced and unreduced types are found. Several neckeraceous genera that were recognised on a morphological basis are polyphyletic (e.g. Neckera, Homalia, Thamnobryum, Porotrichum). Ancestral state reconstructions revealed that currently used diagnostic traits, such as leaf asymmetry and costa strength, are highly homoplastic. Similarly, the reconstructions revealed that the 'reduced' sporophyte features have evolved independently in each of the three clades.