232 results for data sets


Relevance:

70.00%

Publisher:

Abstract:

The upstream oil & gas industry has been contending with massive data sets and monolithic files for many years, but “Big Data”—that is, the ability to apply more sophisticated types of analytical tools to information in a way that extracts new insights or creates new forms of value—is a relatively new concept that has the potential to significantly re-shape the industry. Despite the impressive amount of value that is being realized by Big Data technologies in other parts of the marketplace, however, much of the data collected within the oil & gas sector tends to be discarded, ignored, or analyzed in a very cursory way. This paper examines existing data management practices in the upstream oil & gas industry, and compares them to practices and philosophies that have emerged in organizations that are leading the Big Data revolution. The comparison shows that, in companies that are leading the Big Data revolution, data is regarded as a valuable asset. The presented evidence also shows, however, that this is usually not true within the oil & gas industry insofar as data is frequently regarded there as descriptive information about a physical asset rather than something that is valuable in and of itself. The paper then discusses how upstream oil & gas companies could potentially extract more value from data, and concludes with a series of specific technical and management-related recommendations to this end.

Relevance:

70.00%

Publisher:

Abstract:

When crystallization screening is conducted, many outcomes are observed, but typically the only trial recorded in the literature is the condition that yielded the crystal(s) used for subsequent diffraction studies. The initial hit that was optimized and the results of all the other trials are lost. These missing results contain information that would be useful for an improved general understanding of crystallization. This paper provides a report of a crystallization data exchange (XDX) workshop organized by several international large-scale crystallization screening laboratories to discuss how this information may be captured and utilized. A group that administers a significant fraction of the world's crystallization screening results was convened, together with chemical and structural data informaticians and computational scientists who specialize in creating and analysing large disparate data sets. The development of a crystallization ontology for the crystallization community was proposed. This paper (by the attendees of the workshop) provides the thoughts and rationale leading to this conclusion. This is brought to the attention of the wider audience of crystallographers so that they are aware of these early efforts and can contribute to the process going forward. © 2012 International Union of Crystallography. All rights reserved.

Relevance:

70.00%

Publisher:

Abstract:

Repeatable and accurate seagrass mapping is required for understanding seagrass ecology and supporting management decisions. For shallow (< 5 m) seagrass habitats, these maps can be created by integrating high spatial resolution imagery with field survey data. Field survey data for seagrass is often collected via snorkelling or diving. However, these methods are limited by environmental and safety considerations. Autonomous Underwater Vehicles (AUVs) are used increasingly to collect field data for habitat mapping, albeit mostly in deeper waters (>20 m). Here we demonstrate and evaluate the use and potential advantages of AUV field data collection for calibration and validation of seagrass habitat mapping of shallow waters (< 5 m), from multispectral satellite imagery. The study was conducted in the seagrass habitats of the Eastern Banks (142 km2), Moreton Bay, Australia. In the field, georeferenced photos of the seagrass were collected along transects via snorkelling or an AUV. Photos from both collection methods were analysed manually for seagrass species composition and then used as calibration and validation data to map seagrass using an established semi-automated object based mapping routine. A comparison of the relative advantages and disadvantages of AUV and snorkeller collected field data sets and their influence on the mapping routine was conducted. AUV data collection was more consistent, repeatable and safer in comparison to snorkeller transects. Inclusion of deeper water AUV data resulted in mapping of a larger extent of seagrass (~7 km2, 5 % of study area) in the deeper waters of the site. Although overall map accuracies did not differ considerably, inclusion of the AUV data from deeper water transects corrected errors in seagrass mapped at depths to 5 m, but where the bottom is visible on satellite imagery. 
Our results demonstrate that further development of AUV technology is justified for the monitoring of seagrass habitats in ongoing management programs.

Relevance:

70.00%

Publisher:

Abstract:

The upstream oil and gas industry has been contending with massive data sets and monolithic files for many years, but “Big Data” is a relatively new concept that has the potential to significantly re-shape the industry. Despite the impressive amount of value that is being realized by Big Data technologies in other parts of the marketplace, however, much of the data collected within the oil and gas sector tends to be discarded, ignored, or analyzed in a very cursory way. This viewpoint examines existing data management practices in the upstream oil and gas industry, and compares them to practices and philosophies that have emerged in organizations that are leading the way in Big Data. The comparison shows that, in companies that are widely considered to be leaders in Big Data analytics, data is regarded as a valuable asset—but this is usually not true within the oil and gas industry insofar as data is frequently regarded there as descriptive information about a physical asset rather than something that is valuable in and of itself. The paper then discusses how the industry could potentially extract more value from data, and concludes with a series of policy-related questions to this end.

Relevance:

70.00%

Publisher:

Abstract:

Long-term systematic population monitoring data sets are rare but are essential in identifying changes in species abundance. In contrast, community groups and natural history organizations have collected many species lists. These represent a large, untapped source of information on changes in abundance but are generally considered of little value. The major problem with using species lists to detect population changes is that the amount of effort used to obtain the list is often uncontrolled and usually unknown. It has been suggested that using the number of species on the list, the "list length," can be a measure of effort. This paper significantly extends the utility of Franklin's approach using Bayesian logistic regression. We demonstrate the value of List Length Analysis to model changes in species prevalence (i.e., the proportion of lists on which the species occurs) using bird lists collected by a local bird club over 40 years around Brisbane, southeast Queensland, Australia. We estimate the magnitude and certainty of change for 269 bird species and calculate the probabilities that there have been declines and increases of given magnitudes. List Length Analysis confirmed suspected species declines and increases. This method is an important complement to systematically designed intensive monitoring schemes and provides a means of utilizing data that may otherwise be deemed useless. The results of List Length Analysis can be used for targeting species of conservation concern for listing purposes or for more intensive monitoring. While Bayesian methods are not essential for List Length Analysis, they can offer more flexibility in interrogating the data and are able to provide a range of parameters that are easy to interpret and can facilitate conservation listing and prioritization. © 2010 by the Ecological Society of America.
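The two quantities named in the abstract above, prevalence and list length, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example data layout and the logistic form (log list length plus a linear year term) are assumptions made for illustration, and no Bayesian fitting is attempted here.

```python
import math

def prevalence_by_period(lists, species):
    """Proportion of lists in each period on which `species` occurs.

    `lists` is an iterable of (period, set_of_species) pairs; this data
    layout is hypothetical.
    """
    counts = {}
    for period, observed in lists:
        present, total = counts.get(period, (0, 0))
        counts[period] = (present + (species in observed), total + 1)
    return {period: present / total for period, (present, total) in counts.items()}

def occurrence_probability(list_len, year, b0, b1, b2):
    """Assumed logistic form: logit(p) = b0 + b1 * log(list length) + b2 * year.

    Longer lists stand in for greater survey effort, so b1 adjusts for effort
    while b2 captures the change in prevalence over time.
    """
    linear = b0 + b1 * math.log(list_len) + b2 * year
    return 1.0 / (1.0 + math.exp(-linear))
```

In a full analysis the coefficients would be estimated from the lists themselves; here they are plain arguments so that the role of list length as an effort proxy stays visible.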

Relevance:

70.00%

Publisher:

Abstract:

Despite being used since 1976, the Delusions-Symptoms-States-Inventory/states of Anxiety and Depression (DSSI/sAD) has not yet been validated for use among people with diabetes. The aim of this study was to examine the validity of the personal disturbance scale (DSSI/sAD) among women with diabetes using Mater-University of Queensland Study of Pregnancy (MUSP) cohort data. The DSSI subscales were compared against DSM-IV disorders, the Mental Component Score of the Short Form 36 (SF-36 MCS), and the Center for Epidemiologic Studies Depression Scale (CES-D). Factor analyses, odds ratios, receiver operating characteristic (ROC) analyses and diagnostic efficiency tests were used to report findings. Exploratory factor analysis and fit indices confirmed the hypothesized two-factor model of DSSI/sAD. We found significant variations in the DSSI/sAD domain scores that could be explained by CES-D (DSSI-Anxiety: 55%, DSSI-Depression: 46%) and SF-36 MCS (DSSI-Anxiety: 66%, DSSI-Depression: 56%). The DSSI subscales predicted DSM-IV diagnosed depression and anxiety disorders. The ROC analyses show that, although the DSSI symptoms and DSM-IV disorders were measured concurrently, the estimates of concordance remained only moderate. The findings demonstrate that the DSSI/sAD items have similar relationships to one another in both the diabetes and non-diabetes data sets, which suggests that they have similar interpretations.

Relevance:

70.00%

Publisher:

Abstract:

The concept of big data has already outperformed traditional data management efforts in almost all industries. In other instances it has succeeded in obtaining promising results that provide value from large-scale integration and analysis of heterogeneous data sources, for example genomic and proteomic information. Big data analytics has become increasingly important in describing data sets and analytical techniques in software applications that are large and complex, owing to its significant advantages, including better business decisions, cost reduction and delivery of new products and services [1]. In a similar context, the health community has experienced not only larger and more complex data content, but also information systems that contain a large number of data sources with interrelated and interconnected data attributes. These have resulted in challenging and highly dynamic environments, leading to the creation of big data with its innumerable complexities, for instance the sharing of information with the security requirements expected by stakeholders. Compared with other sectors, big data analysis in the health sector is still in its early stages. Key challenges include accommodating the volume, velocity and variety of healthcare data amid the current deluge of exponential growth. Given the complexity of big data, it is understood that while data storage and accessibility are technically manageable, applying Information Accountability measures to healthcare big data might be a practical solution in support of information security, privacy and traceability. Transparency is one important measure that can demonstrate integrity, which is a vital factor in healthcare services. Clarity about performance expectations is considered to be another Information Accountability measure, which is necessary to avoid data ambiguity, controversy about interpretation and, finally, liability [2].
According to current studies [3], Electronic Health Records (EHR) are key information resources for big data analysis and are also composed of varied co-created values [3]. Common healthcare information originates from and is used by different actors and groups, which facilitates understanding of its relationship to other data sources. Consequently, healthcare services often serve as an integrated service bundle. Although this is a critical requirement in healthcare services and analytics, it is difficult to find a comprehensive set of guidelines for adopting EHR to fulfil big data analysis requirements. As a remedy, this research work focuses on a systematic approach containing comprehensive guidelines on the accurate data that must be provided to apply and evaluate big data analysis until the necessary decision-making requirements are fulfilled to improve the quality of healthcare services. Hence, we believe that this approach would subsequently improve quality of life.

Relevance:

70.00%

Publisher:

Abstract:

A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of the Cortex moutan (CM) produced much better classification and prediction results than those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and linear discriminant analysis (LDA); essentially, the qualitative results suggested that discrimination of the CM samples from three different provinces was possible, with the combined matrix producing the best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM), were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced the best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations.

Relevance:

70.00%

Publisher:

Abstract:

We propose a new model for estimating the size of a population from successive catches taken during a removal experiment. The data from these experiments often have excessive variation, known as overdispersion, as compared with that predicted by the multinomial model. The new model allows catchability to vary randomly among samplings, which accounts for overdispersion. When the catchability is assumed to have a beta distribution, the likelihood function, which is referred to as beta-multinomial, is derived, and hence the maximum likelihood estimates can be evaluated. Simulations show that in the presence of extravariation in the data, the confidence intervals have been substantially underestimated in previous models (Leslie-DeLury, Moran) and that the new model provides more reliable confidence intervals. The performance of these methods was also demonstrated using two real data sets: one with overdispersion, from smallmouth bass (Micropterus dolomieu), and the other without overdispersion, from rat (Rattus rattus).
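As a point of reference for the abstract above, the following is a minimal sketch of the simpler constant-catchability removal model, that is, the multinomial baseline the abstract contrasts against, and not the beta-multinomial model the paper proposes. The function names, the grid search and the `max_population` bound are assumptions chosen for illustration.

```python
import math

def removal_log_likelihood(n_total, p, catches):
    """Log-likelihood of successive removal catches with constant catchability p.

    Assumes every animal still present is caught with probability p on each
    sampling occasion, so each catch is Binomial(remaining, p).
    """
    remaining = n_total
    loglik = 0.0
    for c in catches:
        if c > remaining:
            return float("-inf")
        # Log binomial coefficient computed with lgamma to avoid huge integers.
        loglik += (math.lgamma(remaining + 1) - math.lgamma(c + 1)
                   - math.lgamma(remaining - c + 1))
        loglik += c * math.log(p) + (remaining - c) * math.log(1.0 - p)
        remaining -= c
    return loglik

def fit_removal_model(catches, max_population=10_000):
    """Grid search over population size N, profiling out p for each N."""
    total_caught = sum(catches)
    best_loglik, best_n, best_p = float("-inf"), None, None
    for n_total in range(total_caught, max_population + 1):
        # For fixed N, the MLE of p is total catch / total animals at risk.
        at_risk, remaining = 0, n_total
        for c in catches:
            at_risk += remaining
            remaining -= c
        p_hat = total_caught / at_risk
        if not 0.0 < p_hat < 1.0:
            continue
        loglik = removal_log_likelihood(n_total, p_hat, catches)
        if loglik > best_loglik:
            best_loglik, best_n, best_p = loglik, n_total, p_hat
    return best_n, best_p
```

Because this baseline holds catchability fixed across samplings, it cannot absorb overdispersion, which is the limitation the abstract attributes to the earlier models.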

Relevance:

70.00%

Publisher:

Abstract:

This article develops a method for analysis of growth data with multiple recaptures when the initial ages for all individuals are unknown. The existing approaches either impute the initial ages or model them as random effects. Assumptions about the initial age are not verifiable because all the initial ages are unknown. We present an alternative approach that treats all the lengths including the length at first capture as correlated repeated measures for each individual. Optimal estimating equations are developed using the generalized estimating equations approach that only requires the first two moment assumptions. Explicit expressions for estimation of both mean growth parameters and variance components are given to minimize the computational complexity. Simulation studies indicate that the proposed method works well. Two real data sets are analyzed for illustration, one from whelks (Dicathais aegaota) and the other from southern rock lobster (Jasus edwardsii) in South Australia.

Relevance:

70.00%

Publisher:

Abstract:

Criminological theories of cross-national studies of homicide have underestimated the effects of quality governance of liberal democracy and region. Data sets from several sources are combined and a comprehensive model of homicide is proposed. Results of the spatial regression model, which controls for the effect of spatial autocorrelation, show that quality governance, human development, economic inequality, and ethnic heterogeneity are statistically significant in predicting homicide. In addition, regions of Latin America and non-Muslim Sub-Saharan Africa have significantly higher rates of homicides ceteris paribus while the effects of East Asian countries and Islamic societies are not statistically significant. These findings are consistent with the expectation of the new modernization and regional theories.

Relevance:

70.00%

Publisher:

Abstract:

This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.
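To make the all-to-all comparison pattern concrete, here is a minimal sketch that enumerates every pairwise comparison task and spreads the tasks across workers. The round-robin assignment is a deliberately naive placeholder, not the data distribution or scheduling strategy developed in the thesis, and all function names are hypothetical.

```python
from itertools import combinations

def all_pairs(n_items):
    """Every unordered pair (i, j) with i < j: the all-to-all comparison tasks."""
    return list(combinations(range(n_items), 2))

def assign_round_robin(pairs, n_workers):
    """Naive load balancing: deal the comparison tasks out to workers in turn."""
    buckets = [[] for _ in range(n_workers)]
    for k, pair in enumerate(pairs):
        buckets[k % n_workers].append(pair)
    return buckets

def data_needed(bucket):
    """Indices of the data items a worker must hold locally to run its tasks."""
    return sorted({index for pair in bucket for index in pair})
```

Round-robin balances the number of tasks per worker, but `data_needed` shows that each worker may still need most of the data items stored locally, which is the tension between load balancing, data locality and storage usage that the abstract describes.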

Relevance:

70.00%

Publisher:

Abstract:

Big Data and Learning Analytics’ promise to revolutionise educational institutions, endeavours, and actions through more and better data is now compelling. Multiple, and continually updating, data sets produce a new sense of ‘personalised learning’. A crucial attribute of the datafication, and subsequent profiling, of learner behaviour and engagement is the continual modification of the learning environment to induce greater levels of investment on the parts of each learner. The assumption is that more and better data, gathered faster and fed into ever-updating algorithms, provide more complete tools to understand, and therefore improve, learning experiences through adaptive personalisation. The argument in this paper is that Learning Personalisation names a new logistics of investment as the common ‘sense’ of the school, in which disciplinary education is ‘both disappearing and giving way to frightful continual training, to continual monitoring’.

Relevance:

60.00%

Publisher:

Abstract:

Objectives. Considerable evidence suggests that enforcement efforts cannot fully explain the high degree of tax compliance. To resolve this puzzle of tax compliance, several researchers have argued that citizens' attitudes toward paying taxes, defined as tax morale, help to explain the high degree of tax compliance. However, most studies have treated tax morale as a black box, without discussing which factors shape it. Additionally, the tax compliance literature provides little empirical research that investigates attitudes toward paying taxes in Europe. Methods. This article is unique in its examination of citizen tax morale within three multicultural European countries, Switzerland, Belgium, and Spain, a choice that allows far more detailed examination of the impact of culture and institutions, using data sets from the World Values Survey and the European Values Survey. Results. The results indicate that cultural and regional differences tend to affect tax morale. Conclusion. The findings suggest that higher legitimacy for political institutions leads to higher tax morale.

Relevance:

60.00%

Publisher:

Abstract:

Principal Topic. The Comprehensive Australian Study of Entrepreneurial Emergence (CAUSEE) represents the first Australian study to employ and extend the longitudinal and large scale systematic research developed for the Panel Study of Entrepreneurial Dynamics (PSED) in the US (Gartner, Shaver, Carter and Reynolds, 2004; Reynolds, 2007). This research approach addresses several shortcomings of other data sets, including under coverage, selection bias, memory decay and hindsight bias, and lack of time separation between the assessment of causes and their assumed effects (Johnson et al 2006; Davidsson 2006). However, a remaining problem is that any random sample of start-ups will be dominated by low potential, imitative ventures. In recognition of this issue, CAUSEE supplemented PSED-type random samples with theoretically representative samples of 'high potential' emerging ventures, employing a unique methodology using novel multiple screening criteria. We define new 'high potential' ventures as new entrepreneurial innovative ventures with high aspirations and potential for growth. This distinguishes them from 'lifestyle' imitative businesses that start small and remain intentionally small (Timmons, 1986). CAUSEE provides the opportunity to explore, for the first time, whether the process and outcomes of high potentials differ from those of traditional lifestyle firms. This will allow us to compare process and outcome attributes of the random sample with the high potential oversample of new firms and young firms. The attributes in which we will examine potential differences include source of funding and internationalisation. This is interesting both in terms of helping to explain why different outcomes occur and in terms of assistance to future policymaking, given that high growth potential firms are increasingly becoming the focus of government intervention in economic development policies around the world.
The first wave of data of a four year longitudinal study has been collected using these samples, allowing us to also provide some initial analysis on which to continue further research. The aim of this paper therefore is to present some selected preliminary results from the first wave of the data collection, with comparisons of high potential with lifestyle firms. Owing to greater resource requirements and higher risk profiles, we expect to see more use of venture capital and angel investment, and more internationalisation activity to assist in recouping investment and to overcome Australia's smaller economic markets.
Methodology/Key Propositions. In order to develop the samples of 'high potential' in the nascent firm (NF) and young firm (YF) categories, a set of qualification criteria was developed. Specifically, to qualify firms as nascent or young high potentials, we used multiple, partly compensating screening criteria related to the human capital and aspirations of the founders as well as the novelty of the venture idea and venture high technology. A variety of techniques were also employed to develop a multi level dataset of sources to develop leads and firm details. A dataset was generated from a variety of websites of major stakeholders, including the Federal and State Governments, Australian Chamber of Commerce, University Commercialisation Offices, Patent and Trademark Attorneys, Government Awards and Industry Awards in Entrepreneurship and Innovation, Industry lead associations, Venture Capital Association, Innovation directories including Australian Technology Showcase, and Business and Entrepreneurs Magazines including BRW and Anthill. In total, over 480 industry, association, government and award sources were generated in this process. Of these, 74 discrete sources generated high potentials that fulfilled the criteria. 1116 firms were contacted as high potential cases. 331 cases agreed to participate in the screener, with 279 firms (134 nascents, and 140 young firms) successfully passing the high potential criteria. 222 Firms (108 Nascents and 113 Young firms) completed the full interview. For the general sample, CAUSEE conducts screening phone interviews with a very large number of adult members of households randomly selected through random digit dialing, using screening questions which determine whether respondents qualify as 'nascent entrepreneurs'. CAUSEE additionally targets 'young firms', those that commenced trading in 2004 or later. This process yielded 977 Nascent Firms (3.4%) and 1,011 Young Firms (3.6%). These were directed to the full length interview (40-60 minutes) either directly following the screener or later by appointment. The full length interviews were completed by 594 NF and 514 YF cases. These are the cases we will use in the comparative analysis in this report.
Results and Implications. The results for this paper are based on Wave one of the survey, which has been completed and the data obtained. It is expected that the findings will assist in beginning to develop an understanding of high potential nascent and young firms in Australia, how they differ from the larger lifestyle entrepreneur group that makes up the vast majority of the new firms created each year, and the elements that may contribute to turning high potential growth status into high growth realities. The results have implications for Government in the design of better conditions for the creation of new business; for firms that assist high potentials, in developing better advice programs in line with a better understanding of their needs and requirements; for individuals who may be considering becoming entrepreneurs in high potential arenas; and for existing entrepreneurs, in making better decisions.