952 resultados para Core data set
Resumo:
significant amount of Expendable Bathythermograph (XBT) data has been collected in the Mediterranean Sea since 1999 in the framework of operational oceanography activities. The management and storage of such a volume of data poses significant challenges and opportunities. The SeaDataNet project, a pan-European infrastructure for marine data diffusion, provides a convenient way to avoid dispersion of these temperature vertical profiles and to facilitate access to a wider public. The XBT data flow, along with the recent improvements in the quality check procedures and the consistence of the available historical data set are described. The main features of SeaDataNet services and the advantage of using this system for long-term data archiving are presented. Finally, focus on the Ligurian Sea is included in order to provide an example of the kind of information and final products devoted to different users can be easily derived from the SeaDataNet web portal.
Resumo:
Mass flows on volcanic islands generated by volcanic lava dome collapse and by larger-volume flank collapse can be highly dangerous locally and may generate tsunamis that threaten a wider area. It is therefore important to understand their frequency, emplacement dynamics, and relationship to volcanic eruption cycles. The best record of mass flow on volcanic islands may be found offshore, where most material is deposited and where intervening hemipelagic sediment aids dating. Here we analyze what is arguably the most comprehensive sediment core data set collected offshore from a volcanic island. The cores are located southeast of Montserrat, on which the Soufriere Hills volcano has been erupting since 1995. The cores provide a record of mass flow events during the last 110 thousand years. Older mass flow deposits differ significantly from those generated by the repeated lava dome collapses observed since 1995. The oldest mass flow deposit originated through collapse of the basaltic South Soufriere Hills at 103-110 ka, some 20-30 ka after eruptions formed this volcanic center. A ∼1.8 km3 blocky debris avalanche deposit that extends from a chute in the island shelf records a particularly deep-seated failure. It likely formed from a collapse of almost equal amounts of volcanic edifice and coeval carbonate shelf, emplacing a mixed bioclastic-andesitic turbidite in a complex series of stages. This study illustrates how volcanic island growth and collapse involved extensive, large-volume submarine mass flows with highly variable composition. Runout turbidites indicate that mass flows are emplaced either in multiple stages or as single events.
Resumo:
Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information; be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or give an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the data sets collected and analyzed are preformed, that is they are built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders. The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be immediately placed in front of responders, for manual filtering and possible action. The paper suggests two solutions for this, content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand, and the urgency of the information, whilst the latter attempts to identify those users who are either serving as amplifiers of information or are known as an authoritative source. Through these techniques, the information contained in a large dataset could be filtered down to match the expected capacity of emergency responders, and knowledge as to the core keywords or hashtags relating to the current event is constantly refined for future data collection. The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to particular prediction market; tennis betting. As increasing numbers of professional sports men and women create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which have the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB) has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from twitter user lists and hash tag or keyword monitoring, assessing the value of the source, information, and the prediction markets to which it may relate. The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect and make the tweets available for exploration and analysis. A strategy to respond to changes in the social movement is also required or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list is created — in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence in Twitter changes over time. We also discuss the opportunities and methods to extract data smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid in future data collection. The intention is that through the papers presented, and subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
Resumo:
Core Vector Machine(CVM) is suitable for efficient large-scale pattern classification. In this paper, a method for improving the performance of CVM with Gaussian kernel function irrespective of the orderings of patterns belonging to different classes within the data set is proposed. This method employs a selective sampling based training of CVM using a novel kernel based scalable hierarchical clustering algorithm. Empirical studies made on synthetic and real world data sets show that the proposed strategy performs well on large data sets.
Resumo:
Even though several techniques have been proposed in the literature for achieving multiclass classification using Support Vector Machine(SVM), the scalability aspect of these approaches to handle large data sets still needs much of exploration. Core Vector Machine(CVM) is a technique for scaling up a two class SVM to handle large data sets. In this paper we propose a Multiclass Core Vector Machine(MCVM). Here we formulate the multiclass SVM problem as a Quadratic Programming(QP) problem defining an SVM with vector valued output. This QP problem is then solved using the CVM technique to achieve scalability to handle large data sets. Experiments done with several large synthetic and real world data sets show that the proposed MCVM technique gives good generalization performance as that of SVM at a much lesser computational expense. Further, it is observed that MCVM scales well with the size of the data set.
Resumo:
Support Vector Clustering has gained reasonable attention from the researchers in exploratory data analysis due to firm theoretical foundation in statistical learning theory. Hard Partitioning of the data set achieved by support vector clustering may not be acceptable in real world scenarios. Rough Support Vector Clustering is an extension of Support Vector Clustering to attain a soft partitioning of the data set. But the Quadratic Programming Problem involved in Rough Support Vector Clustering makes it computationally expensive to handle large datasets. In this paper, we propose Rough Core Vector Clustering algorithm which is a computationally efficient realization of Rough Support Vector Clustering. Here Rough Support Vector Clustering problem is formulated using an approximate Minimum Enclosing Ball problem and is solved using an approximate Minimum Enclosing Ball finding algorithm. Experiments done with several Large Multi class datasets such as Forest cover type, and other Multi class datasets taken from LIBSVM page shows that the proposed strategy is efficient, finds meaningful soft cluster abstractions which provide a superior generalization performance than the SVM classifier.
Resumo:
This study presents a comprehensive evaluation of five widely used multisatellite precipitation estimates (MPEs) against 1 degrees x 1 degrees gridded rain gauge data set as ground truth over India. One decade observations are used to assess the performance of various MPEs (Climate Prediction Center (CPC)-South Asia data set, CPC Morphing Technique (CMORPH), Precipitation Estimation From Remotely Sensed Information Using Artificial Neural Networks, Tropical Rainfall Measuring Mission's Multisatellite Precipitation Analysis (TMPA-3B42), and Global Precipitation Climatology Project). All MPEs have high detection skills of rain with larger probability of detection (POD) and smaller ``missing'' values. However, the detection sensitivity differs from one product (and also one region) to the other. While the CMORPH has the lowest sensitivity of detecting rain, CPC shows highest sensitivity and often overdetects rain, as evidenced by large POD and false alarm ratio and small missing values. All MPEs show higher rain sensitivity over eastern India than western India. These differential sensitivities are found to alter the biases in rain amount differently. All MPEs show similar spatial patterns of seasonal rain bias and root-mean-square error, but their spatial variability across India is complex and pronounced. The MPEs overestimate the rainfall over the dry regions (northwest and southeast India) and severely underestimate over mountainous regions (west coast and northeast India), whereas the bias is relatively small over the core monsoon zone. Higher occurrence of virga rain due to subcloud evaporation and possible missing of small-scale convective events by gauges over the dry regions are the main reasons for the observed overestimation of rain by MPEs. The decomposed components of total bias show that the major part of overestimation is due to false precipitation. The severe underestimation of rain along the west coast is attributed to the predominant occurrence of shallow rain and underestimation of moderate to heavy rain by MPEs. The decomposed components suggest that the missed precipitation and hit bias are the leading error sources for the total bias along the west coast. All evaluation metrics are found to be nearly equal in two contrasting monsoon seasons (southwest and northeast), indicating that the performance of MPEs does not change with the season, at least over southeast India. Among various MPEs, the performance of TMPA is found to be better than others, as it reproduced most of the spatial variability exhibited by the reference.
Resumo:
This is the first report from ALT’s new Annual Survey launched in December 2014. This survey was primarily for ALT members (individual or at an organisation which is an organisational member) it could however also be filled in by others, perhaps those interested in taking out membership. The report and data highlight emerging work areas that are important to the survey respondents. Analysis of the survey responses indicates a number of areas ALT should continue to support and develop. Priorities for the membership are ‘Intelligent use of learning technology’ and ‘Research and practice’, aligned to this is the value placed by respondent’s on by communication via the ALT Newsletter/News, social media and Research in Learning Technology. The survey also reveals ‘Data and Analytics’ and ‘Open Education’ are areas where the majority of respondents are finding are becoming increasingly important. As such our community may benefit from development opportunities ALT can provide. The survey is also a reminder that ALT has an essential role in enabling members to develop research and practice in areas which might be considered as minority interest. For example whilst the majority of respondents didn't indicate areas such as ‘Digital and Open Badges’, and ‘Game Based Learning’ as important there are still members who consider these areas are very significant and becoming increasingly valuable and as such ALT will continue to better support these groups within our community. Whilst ALT has conducted previous surveys of ALT membership this is the first iteration in this form. ALT has committed to surveying the sector on an annual basis, refining the core question set but trying to preserve an opportunity for longitudinal analysis.
Resumo:
Some reasons for registering trials might be considered as self-serving, such as satisfying the requirements of a journal in which the researchers wish to publish their eventual findings or publicising the trial to boost recruitment. Registry entries also help others, including systematic reviewers, to know about ongoing or unpublished studies and contribute to reducing research waste by making it clear what studies are ongoing. Other sources of research waste include inconsistency in outcome measurement across trials in the same area, missing data on important outcomes from some trials, and selective reporting of outcomes. One way to reduce this waste is through the use of core outcome sets: standardised sets of outcomes for research in specific areas of health and social care. These do not restrict the outcomes that will be measured, but provide the minimum to include if a trial is to be of the most use to potential users. We propose that trial registries, such as ISRCTN, encourage researchers to note their use of a core outcome set in their entry. This will help people searching for trials and those worried about selective reporting in closed trials. Trial registries can facilitate these efforts to make new trials as useful as possible and reduce waste. The outcomes section in the entry could prompt the researcher to consider using a core outcome set and facilitate the specification of that core outcome set and its component outcomes through linking to the original core outcome set. In doing this, registries will contribute to the global effort to ensure that trials answer important uncertainties, can be brought together in systematic reviews, and better serve their ultimate aim of improving health and well-being through improving health and social care.
Resumo:
The Greenland NEEM (North Greenland Eemian Ice Drilling) operation in 2010 provided the first opportunity to combine trace-gas measurements by laser spectroscopic instruments and continuous-flow analysis along a freshly drilled ice core in a field-based setting. We present the resulting atmospheric methane (CH4) record covering the time period from 107.7 to 9.5 ka b2k (thousand years before 2000 AD). Companion discrete CH4 measurements are required to transfer the laser spectroscopic data from a relative to an absolute scale. However, even on a relative scale, the high-resolution CH4 data set significantly improves our knowledge of past atmospheric methane concentration changes. New significant sub-millennial-scale features appear during interstadials and stadials, generally associated with similar changes in water isotopic ratios of the ice, a proxy for local temperature. In addition to the midpoint of Dansgaard–Oeschger (D/O) CH4 transitions usually used for cross-dating, sharp definition of the start and end of these events brings precise depth markers (with ±20 cm uncertainty) for further cross-dating with other palaeo- or ice core records, e.g. speleothems. The method also provides an estimate of CH4 rates of change. The onsets of D/O events in the methane signal show a more rapid rate of change than their endings. The rate of CH4 increase associated with the onsets of D/O events progressively declines from 1.7 to 0.6 ppbv yr−1 in the course of marine isotope stage 3. The largest observed rate of increase takes place at the onset of D/O event #21 and reaches 2.5 ppbv yr−1.
Resumo:
The radar reflectivity of an ice-sheet bed is a primary measurement for discriminating between thawed and frozen beds. Uncertainty in englacial radar attenuation and its spatial variation introduces corresponding uncertainty in estimates of basal reflectivity. Radar attenuation is proportional to ice conductivity, which depends on the concentrations of acid and sea-salt chloride and the temperature of the ice. We synthesize published conductivity measurements to specify an ice-conductivity model and find that some of the dielectric properties of ice at radar frequencies are not yet well constrained. Using depth profiles of ice-core chemistry and borehole temperature and an average of the experimental values for the dielectric properties, we calculate an attenuation rate profile for Siple Dome, West Antarctica. The depth-averaged modeled attenuation rate at Siple Dome (20.0 +/- 5.7 dB km(-1)) is somewhat lower than the value derived from radar profiles (25.3 +/- 1.1 dB km(-1)). Pending more experimental data on the dielectric properties of ice, we can match the modeled and radar-derived attenuation rates by an adjustment to the value for the pure ice conductivity that is within the range of reported values. Alternatively, using the pure ice dielectric properties derived from the most extensive single data set, the modeled depth-averaged attenuation rate is 24.0 +/- 2.2 dB km(-1). This work shows how to calculate englacial radar attenuation using ice chemistry and temperature data and establishes a basis for mapping spatial variations in radar attenuation across an ice sheet.