976 results for E-catalogs, Data Integrations, Peer-to-peer, Data Summarisation


Relevance: 100.00%

Abstract:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains, and sudden cardiac death continues to be a presenting feature for some patients subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia-related complications. Stress electrocardiography/exercise testing is predictive of 10-year risk of CVD events, and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to these data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers and calculation of moving averages, as well as to data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population; subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads).
Because of the unbalanced class distribution in the data, majority-class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency-derived subsets tended to slightly increase accuracy but markedly increase complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic, as well as by evaluation of subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection reduces MR to between 9.8 and 10.16, with time-segmented summary data (dataset F) having an MR of 9.8 and raw time-series summary data (dataset A) 9.92. However, for all datasets based on time-series data alone, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-only datasets, but models derived from these subsets consist of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method. For models based on Cfs-selected time-series-derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) at 8.85 and dataset RF_F (time-segmented time-series variables and RF) at 9.09.
The models based on counts of outliers and counts of data points outside the normal range (Dataset RF_E), and on derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G), perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, the nearest-neighbour method (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The predictive accuracy increase achieved by adding risk factor variables to time-series-variable-based models is significant. The addition of time-series-derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables compared with risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological values falling outside the accepted normal range, is associated with some improvement in model performance.
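The majority-class under-sampling and Kappa-based evaluation described above can be sketched as follows. This is a minimal Python illustration, not the thesis's actual pipeline, and the function names are hypothetical:

```python
import random

def undersample_majority(X, y, majority_label, seed=0):
    """Randomly drop majority-class rows until the classes are balanced.
    A minimal sketch of the under-sampling step only; the thesis's exact
    fold structure is not reproduced here."""
    rng = random.Random(seed)
    maj = [i for i, lbl in enumerate(y) if lbl == majority_label]
    mino = [i for i, lbl in enumerate(y) if lbl != majority_label]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]

def kappa(y_true, y_pred):
    """Cohen's Kappa: observed agreement corrected for chance agreement,
    which is why it is less sensitive to class distribution than raw
    misclassification rate."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pe = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)
```

A perfectly agreeing prediction yields Kappa 1.0 regardless of how skewed the class distribution is, which is the property the abstract relies on.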

Relevance: 100.00%

Abstract:

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices all around the network. The data are monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-paced decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements that decision making sets for knowledge discovery and data mining tools and methods, and I study the resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally, I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated into the existing network operations infrastructure.
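The frequent-set idea behind log compression can be shown with a deliberately simplified sketch: log lines that recur often are replaced by references into a dictionary of frequent patterns. The real CLC/QLC methods mine frequent sets of field values and support querying over the compressed form; the whole-line matching and the function name below are assumptions for illustration only:

```python
from collections import Counter

def compress_log(lines, min_support=2):
    """Toy frequent-pattern log compression: any line occurring at least
    `min_support` times goes into a pattern table, and its occurrences in
    the stream are replaced by a short reference. Rare lines (often the
    interesting ones for an operator) pass through unchanged."""
    freq = [l for l, c in Counter(lines).items() if c >= min_support]
    table = {l: i for i, l in enumerate(freq)}
    out = [("#%d" % table[l]) if l in table else l for l in lines]
    return table, out
```

Note how the compressed stream doubles as a summary: the table lists the dominant, repetitive behaviour, while the residual lines highlight anomalies.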

Relevance: 100.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance: 100.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 100.00%

Abstract:

The study of supermassive black hole (SMBH) accretion during the phase of activity (when they become active galactic nuclei, AGN), and of its relation to host-galaxy growth, requires large datasets of AGN, including a significant fraction of obscured sources. X-ray data are strategic in AGN selection because, at X-ray energies, contamination from non-active galaxies is far less significant than in optical/infrared surveys, and the selection of obscured AGN, including a fraction of heavily obscured AGN, is much more effective. In this thesis, I present the results of the Chandra COSMOS Legacy survey, a 4.6 Ms X-ray survey covering the equatorial COSMOS area. The COSMOS Legacy depth (flux limit f = 2×10^(-16) erg s^(-1) cm^(-2) in the 0.5-2 keV band) is significantly better than that of other X-ray surveys of similar area, and it represents the path for surveys with future facilities, like Athena and X-ray Surveyor. The final Chandra COSMOS Legacy catalog contains 4016 point-like sources, 97% of which have a redshift. 65% of the sources are optically obscured and potentially caught in the phase of main BH growth. We used the sample of 174 Chandra COSMOS Legacy sources at z>3 to place constraints on the BH formation scenario. We found a significant disagreement between our space density and the predictions of a physical model of AGN activation through major mergers. This suggests that, in our luminosity range, BH triggering through secular accretion is likely preferred to a major-merger triggering scenario. Thanks to its large statistics, the Chandra COSMOS Legacy dataset, combined with the other multiwavelength COSMOS catalogs, will be used to answer questions related to a large number of astrophysical topics, with particular focus on SMBH accretion in different luminosity and redshift regimes.

Relevance: 100.00%

Abstract:

This chapter presents an exploratory study involving a group of athletic shoe enthusiasts and their feelings towards customized footwear. These "sneakerheads" demonstrate their infatuation with sneakers via activities ranging from creating catalogs of custom shoes to buying and selling rare athletic footwear online. The key characteristic these individuals share is that, for them, athletic shoes are a fundamental fashion accessory steeped in symbolism and meaning. A series of in-depth interviews utilizing the Zaltman Metaphor Elicitation Technique (ZMET) provides a better understanding of how issues such as art, self-expression, exclusivity, peer recognition, and counterfeit goods interact with the mass customization of symbolic products by category experts.

Relevance: 100.00%

Abstract:

Many conventional statistical machine learning algorithms generalise poorly if distribution bias exists in the datasets. For example, distribution bias arises in the context of domain generalisation, where knowledge acquired from multiple source domains needs to be used in a previously unseen target domain. We propose Elliptical Summary Randomisation (ESRand), an efficient domain generalisation approach that comprises a randomised kernel and elliptical data summarisation. ESRand learns a domain-interdependent projection to a latent subspace that minimises the existing biases in the data while maintaining the functional relationship between domains. In the latent subspace, ellipsoidal summaries replace the samples to enhance generalisation by further removing bias and noise from the data. Moreover, the summarisation enables large-scale data processing by significantly reducing the size of the data. Through comprehensive analysis, we show that our subspace-based approach outperforms state-of-the-art results on several activity recognition benchmark datasets, while keeping the computational complexity significantly low.
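The ellipsoidal summarisation step can be sketched as replacing a group of raw samples by the parameters of an ellipsoid, a mean vector and a covariance matrix. This is only a sketch of the summarisation idea, under the assumption that the ellipsoid is the sample mean/covariance; ESRand's randomised kernel and latent-subspace projection are not reproduced, and the function name is hypothetical:

```python
def ellipsoidal_summary(X):
    """Summarise n d-dimensional samples by the ellipsoid that replaces
    them: the mean vector and the (unbiased) sample covariance matrix."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov
```

Storage then drops from n·d raw values per group to d + d² summary parameters, independent of n, which is how the summarisation enables the large-scale processing mentioned above.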

Relevance: 100.00%

Abstract:

A large ensemble of general circulation model (GCM) integrations coupled to a fully interactive sulfur cycle scheme was run on the climateprediction.net platform to investigate the uncertainty in the climate response to sulfate aerosol and carbon dioxide (CO2) forcing. The sulfate burden within the model (and the atmosphere) depends on the balance between formation processes and deposition (wet and dry). The wet removal processes for sulfate aerosol are much faster than dry removal, so any changes in atmospheric circulation, cloud cover, and precipitation will feed back on the sulfate burden. When CO2 is doubled in the Hadley Centre Slab Ocean Model (HadSM3), global mean precipitation increased by 5%; however, the global mean sulfate burden increased by 10%. Despite the global mean increase in precipitation, there were large areas of the model showing decreases in precipitation (and cloud cover) in the Northern Hemisphere during June-August, which reduced wet deposition and allowed the sulfate burden to increase. Further experiments were also undertaken, with and without doubled CO2, while including a future anthropogenic sulfur emissions scenario. Doubling CO2 further enhanced the increases in sulfate burden associated with increased anthropogenic sulfur emissions observed in the doubled-CO2-only experiment. The implications are that the climate response to doubling CO2 can influence the amount of sulfate within the atmosphere and, despite increases in global mean precipitation, may act to increase it.

Relevance: 100.00%

Abstract:

This paper discusses some economic integrations in Latin America, which have become an expression of governance in the neoliberal context -- These integrations are also the result of second-generation adjustments in terms of trade openness, sale of state assets, free short-term capital mobility, and the Asian and European integrations that preceded the regional ones -- In addition, this paper provides answers to the following questions: Do integrations aim to achieve development? Would integrations among Northern countries take the same endangering course as those in South America? Who should benefit from the integrations? Is there a link between development and demographics?

Relevance: 100.00%

Abstract:

We make three contributions to the theory of contracting under asymmetric information. First, we establish a competitive analog to the revelation principle which we call the implementation principle. This principle provides a complete characterization of all incentive-compatible, indirect contracting mechanisms in terms of contract catalogs (or menus), and allows us to conclude that in competitive contracting situations, firms in choosing their contracting strategies can restrict attention, without loss of generality, to contract catalogs. Second, we establish a competitive taxation principle. This principle, a refinement of the implementation principle, provides a complete characterization of all implementable nonlinear pricing schedules in terms of product-price catalogs and allows us to reduce any game played over nonlinear pricing schedules to a strategically equivalent game played over product-price catalogs. Third, using the competitive taxation principle and a recent result due to Reny (1999) on the existence of Nash equilibria in discontinuous games, we demonstrate the existence of a Nash equilibrium for the mixed extension of the nonlinear pricing game.

Relevance: 100.00%

Abstract:

We present a modified version of the cosmic crystallography method, especially useful for testing closed models of negative spatial curvature. The images of clusters of galaxies in simulated catalogs are 'pulled back' to the fundamental domain before the set of distances is calculated. © 1999 Published by Elsevier Science B.V. All rights reserved.
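The underlying cosmic crystallography idea, spikes in the histogram of pairwise separations produced by topological images of the same objects, can be sketched as follows. This toy Python version only builds the pair-separation histogram for a point catalog; the paper's modification, pulling the images back to the fundamental domain before computing distances, is not shown, and the function name is an assumption:

```python
import itertools
import math

def pair_separation_histogram(points, bin_width=1.0):
    """Histogram of all pairwise separations in a (simulated) catalog,
    keyed by bin index. In a multiply connected universe, repeated
    images of the same cluster pile up at characteristic separations,
    producing sharp spikes in this histogram."""
    hist = {}
    for p, q in itertools.combinations(points, 2):
        b = int(math.dist(p, q) // bin_width)
        hist[b] = hist.get(b, 0) + 1
    return hist
```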

Relevance: 100.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)