993 resultados para data summarization
Resumo:
Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N=all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why a “n « all” data collection strategy probably makes more sense for the mining sector. Finally, towards shaping the industry’s policies with regards to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a more appropriate fit for the mining sector.
Resumo:
The explosive growth in the development of Traditional Chinese Medicine (TCM) has resulted in the continued increase in clinical and research data. The lack of standardised terminology, flaws in data quality planning and management of TCM informatics are preventing clinical decision-making, drug discovery and education. This paper argues that the introduction of data warehousing technologies to enhance the effectiveness and durability in TCM is paramount. To showcase the role of data warehousing in the improvement of TCM, this paper presents a practical model for data warehousing with detailed explanation, which is based on the structured electronic records, for TCM clinical researches and medical knowledge discovery.
Resumo:
Cancer is the leading contributor to the disease burden in Australia. This thesis develops and applies Bayesian hierarchical models to facilitate an investigation of the spatial and temporal associations for cancer diagnosis and survival among Queenslanders. The key objectives are to document and quantify the importance of spatial inequalities, explore factors influencing these inequalities, and investigate how spatial inequalities change over time. Existing Bayesian hierarchical models are refined, new models and methods developed, and tangible benefits obtained for cancer patients in Queensland. The versatility of using Bayesian models in cancer control are clearly demonstrated through these detailed and comprehensive analyses.
Resumo:
A method for reconstruction of an object f(x) x=(x,y,z) from a limited set of cone-beam projection data has been developed. This method uses a modified form of convolution back-projection and projection onto convex sets (POCS) for handling the limited (or incomplete) data problem. In cone-beam tomography, one needs to have a complete geometry to completely reconstruct the original three-dimensional object. While complete geometries do exist, they are of little use in practical implementations. The most common trajectory used in practical scanners is circular, which is incomplete. It is, however, possible to recover some of the information of the original signal f(x) based on a priori knowledge of the nature of f(x). If this knowledge can be posed in a convex set framework, then POCS can be utilized. In this report, we utilize this a priori knowledge as convex set constraints to reconstruct f(x) using POCS. While we demonstrate the effectiveness of our algorithm for circular trajectories, it is essentially geometry independent and will be useful in any limited-view cone-beam reconstruction.
Resumo:
The Taylor coefficients c and d of the EM form factor of the pion are constrained using analyticity, knowledge of the phase of the form factor in the time-like region, 4m(pi)(2) <= t <= t(in) and its value at one space-like point, using as input the (g - 2) of the muon. This is achieved using the technique of Lagrange multipliers, which gives a transparent expression for the corresponding bounds. We present a detailed study of the sensitivity of the bounds to the choice of time-like phase and errors present in the space-like data, taken from recent experiments. We find that our results constrain c stringently. We compare our results with those in the literature and find agreement with the chiral perturbation-theory results for c. We obtain d similar to O(10) GeV-6 when c is set to the chiral perturbation-theory values.
Resumo:
There is an error in the JANAF (1985) data on the standard enthalpy, Gibbs energy and equilibrium constant for the formation of C2H2 (g) from elements. The error has arisen on account of an incorrect expression used for computing these parameters from the heat capacity, entropy and the relative heat content. Presented in this paper are the corrected values of the enthalpy, the Gibbs energy of formation and the corresponding equilibrium constant.
Resumo:
The correlation dimension D 2 and correlation entropy K 2 are both important quantifiers in nonlinear time series analysis. However, use of D 2 has been more common compared to K 2 as a discriminating measure. One reason for this is that D 2 is a static measure and can be easily evaluated from a time series. However, in many cases, especially those involving coloured noise, K 2 is regarded as a more useful measure. Here we present an efficient algorithmic scheme to compute K 2 directly from a time series data and show that K 2 can be used as a more effective measure compared to D 2 for analysing practical time series involving coloured noise.
Resumo:
Over recent years, the focus in road safety has shifted towards a greater understanding of road crash serious injuries in addition to fatalities. Police reported crash data are often the primary source of crash information; however, the definition of serious injury within these data is not consistent across jurisdictions and may not be accurately operationalised. This study examined the linkage of police-reported road crash data with hospital data to explore the potential for linked data to enhance the quantification of serious injury. Data from the Queensland Road Crash Database (QRCD), the Queensland Hospital Admitted Patients Data Collection (QHAPDC), Emergency Department Information System (EDIS), and the Queensland Injury Surveillance Unit (QISU) for the year 2009 were linked. Nine different estimates of serious road crash injury were produced. Results showed that there was a large amount of variation in the estimates of the number and profile of serious road crash injuries depending on the definition or measure used. The results also showed that as the definition of serious injury becomes more precise the vulnerable road users become more prominent. These results have major implications in terms of how serious injuries are identified for reporting purposes. Depending on the definitions used, the calculation of cost and understanding of the impact of serious injuries would vary greatly. This study has shown how data linkage can be used to investigate issues of data quality. It has also demonstrated the potential improvements to the understanding of the road safety problem, particularly serious injury, by conducting data linkage.
Resumo:
Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graphs representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem which is then posed as a LASSO (l1-constrained fitting) problem and solved finally by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs is assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law suggesting that its degree distributions is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational – experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
Resumo:
The standard free energies of formation of CaO derived from a variety of high-temperature equilibrium measurements made by seven groups of experimentalists are significantly different from those given in the standard compilations of thermodynamic data. Indirect support for the validity of the compiled data comes from new solid-state electrochemical measurements using single-crystal CaF2 and SrF2 as electrolytes. The change in free energy for the following reactions are obtained: CaO + MgF2 --> MgO + CaF2 Delta G degrees = -68,050 -2.47 T(+/-100) J mol(-1) SrO + CaF2 --> SrF2 + CaO Delta G degrees = -35,010 + 6.39 T (+/-80) J mol(-1) The standard free energy changes associated with cell reactions agree with data in standard compilations within +/- 4 kJ mol(-1). The results of this study do not support recent suggestions for a major revision in thermodynamic data for CaO.
Resumo:
Neural data are inevitably contaminated by noise. When such noisy data are subjected to statistical analysis, misleading conclusions can be reached. Here we attempt to address this problem by applying a state-space smoothing method, based on the combined use of the Kalman filter theory and the Expectation–Maximization algorithm, to denoise two datasets of local field potentials recorded from monkeys performing a visuomotor task. For the first dataset, it was found that the analysis of the high gamma band (60–90 Hz) neural activity in the prefrontal cortex is highly susceptible to the effect of noise, and denoising leads to markedly improved results that were physiologically interpretable. For the second dataset, Granger causality between primary motor and primary somatosensory cortices was not consistent across two monkeys and the effect of noise was suspected. After denoising, the discrepancy between the two subjects was significantly reduced.
Resumo:
Data generated via user activity on social media platforms is routinely used for research across a wide range of social sciences and humanities disciplines. The availability of data through the Twitter APIs in particular has afforded new modes of research, including in media and communication studies; however, there are practical and political issues with gaining access to such data, and with the consequences of how that access is controlled. In their paper ‘Easy Data, Hard Data’, Burgess and Bruns (2015) discuss both the practical and political aspects of Twitter data as they relate to academic research, describing how communication research has been enabled, shaped and constrained by Twitter’s “regimes of access” to data, the politics of data use, and emerging economies of data exchange. This conceptual model, including the ‘easy data, hard data’ formulation, can also be applied to Sina Weibo. In this paper, we build on this model to explore the practical and political challenges and opportunities associated with the ‘regimes of access’ to Weibo data, and their consequences for digital media and communication studies. We argue that in the Chinese context, the politics of data access can be even more complicated than in the case of Twitter, which makes scientific research relying on large social data from this platform more challenging in some ways, but potentially richer and more rewarding in others.
Resumo:
Introduction Two symposia on “cardiovascular diseases and vulnerable plaques” Cardiovascular disease (CVD) is the leading cause of death worldwide. Huge effort has been made in many disciplines including medical imaging, computational modeling, bio- mechanics, bioengineering, medical devices, animal and clinical studies, population studies as well as genomic, molecular, cellular and organ-level studies seeking improved methods for early detection, diagnosis, prevention and treatment of these diseases [1-14]. However, the mechanisms governing the initiation, progression and the occurrence of final acute clinical CVD events are still poorly understood. A large number of victims of these dis- eases who are apparently healthy die suddenly without prior symptoms. Available screening and diagnostic methods are insufficient to identify the victims before the event occurs [8,9]. Most cardiovascular diseases are associated with vulnerable plaques. A grand challenge here is to develop new imaging techniques, predictive methods and patient screening tools to identify vulnerable plaques and patients who are more vulnerable to plaque rupture and associated clinical events such as stroke and heart attack, and recommend proper treatment plans to prevent those clinical events from happening. Articles in this special issue came from two symposia held recently focusing on “Cardio-vascular Diseases and Vulnerable Plaques: Data, Modeling, Predictions and Clinical Applications.” One was held at Worcester Polytechnic Institute (WPI), Worcester, MA, USA, July 13-14, 2014, right after the 7th World Congress of Biomechanics. This symposium was endorsed by the World Council of Biomechanics, and partially supported by a grant from NIH-National Institute of Biomedical Image and Bioengineering. The other was held at Southeast University (SEU), Nanjing, China, April 18-20, 2014.
Resumo:
A series of bimetallic acetylacetonate (acac) complexes, AlxCr1-x(acac)(3), 0 <= x <= 1, have been synthesized for application as precursors for the CVD Of Substituted oxides, such as (AlxCr1-x)(2)O-3. Detailed thermal analysis has been carried out on these complexes, which are solids that begin subliming at low temperatures, followed by melting, and evaporation from the melt. By applying the Langmuir equation to differential thermogravimetry data, the vapour pressure of these complexes is estimated. From these vapour pressure data, the distinctly different enthalpies of sublimation and evaporation are calculated, using the Clausius-Clapeyron equation. Such a determination of both the enthalpies of sublimation and evaporation of complexes, which sublime and melt congruently, does not appear to have been reported in the literature to date.
Resumo:
The conformational flexibility inherent in the polynucleotide chain plays an important role in deciding its three-dimensonal structure and enables it to undergo structural transitions in order to fulfil all its functions. Following certain stereochemical guidelines, both right and left handed double-helical models have been built in our laboratory and they are in reasonably good agreement with the fibre patterns for various polymorphous forms of DNA. Recently, nuclear magnetic resonance spectroscopy has become an important technique for studying the solution conformation and polymorphism of nucleic acids. Several workers have used 1H nuclear magnetic resonance nuclear Overhauser enhancement measurements to estimate the interproton distances for the various DNA oligomers and compared them with the interproton distances for particular models of A and Β form DNA. In some cases the solution conformation does not seem to fit either of these models. We have been studying various models for DNA with a view to exploring the full conformational space allowed for nucleic acid polymers. In this paper, the interproton distances calculated for the different stereochemically feasible models of DNA are presented and they are compared and correlated against those obtained from 1Η nuclear magnetic resonance nuclear Overhauser enhancement measurements of various nucleic acid oligomers.