Biblioteca Digital

37 resultados para downloading of data

Introducing a pilot data collection model for real-time evaluation of data redundancy

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to reduce serious health incidents, individuals with high risks need to be identified as early as possible so that effective intervention and preventive care can be provided. This requires regular and efficient assessments of risk within communities that are the first point of contacts for individuals. Clinical Decision Support Systems CDSSs have been developed to help with the task of risk assessment, however such systems and their underpinning classification models are tailored towards those with clinical expertise. Communities where regular risk assessments are required lack such expertise. This paper presents the continuation of GRiST research team efforts to disseminate clinical expertise to communities. Based on our earlier published findings, this paper introduces the framework and skeleton for a data collection and risk classification model that evaluates data redundancy in real-time, detects the risk-informative data and guides the risk assessors towards collecting those data. By doing so, it enables non-experts within the communities to conduct reliable Mental Health risk triage.

A cautionary note on methods of comparing programmatic efficiency between two or more groups of DMUs in data envelopment analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In some applications of data envelopment analysis (DEA) there may be doubt as to whether all the DMUs form a single group with a common efficiency distribution. The Mann-Whitney rank statistic has been used to evaluate if two groups of DMUs come from a common efficiency distribution under the assumption of them sharing a common frontier and to test if the two groups have a common frontier. These procedures have subsequently been extended using the Kruskal-Wallis rank statistic to consider more than two groups. This technical note identifies problems with the second of these applications of both the Mann-Whitney and Kruskal-Wallis rank statistics. It also considers possible alternative methods of testing if groups have a common frontier, and the difficulties of disaggregating managerial and programmatic efficiency within a non-parametric framework. © 2007 Springer Science+Business Media, LLC.

Estimation of functional connectivity from electromagnetic signals and the amount of empirical data required

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An increasing number of neuroimaging studies are concerned with the identification of interactions or statistical dependencies between brain areas. Dependencies between the activities of different brain regions can be quantified with functional connectivity measures such as the cross-correlation coefficient. An important factor limiting the accuracy of such measures is the amount of empirical data available. For event-related protocols, the amount of data also affects the temporal resolution of the analysis. We use analytical expressions to calculate the amount of empirical data needed to establish whether a certain level of dependency is significant when the time series are autocorrelated, as is the case for biological signals. These analytical results are then contrasted with estimates from simulations based on real data recorded with magnetoencephalography during a resting-state paradigm and during the presentation of visual stimuli. Results indicate that, for broadband signals, 50-100 s of data is required to detect a true underlying cross-correlations coefficient of 0.05. This corresponds to a resolution of a few hundred milliseconds for typical event-related recordings. The required time window increases for narrow band signals as frequency decreases. For instance, approximately 3 times as much data is necessary for signals in the alpha band. Important implications can be derived for the design and interpretation of experiments to characterize weak interactions, which are potentially important for brain processing.

Applications of fractals to image data compression

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Digital image processing is exploited in many diverse applications but the size of digital images places excessive demands on current storage and transmission technology. Image data compression is required to permit further use of digital image processing. Conventional image compression techniques based on statistical analysis have reached a saturation level so it is necessary to explore more radical methods. This thesis is concerned with novel methods, based on the use of fractals, for achieving significant compression of image data within reasonable processing time without introducing excessive distortion. Images are modelled as fractal data and this model is exploited directly by compression schemes. The validity of this is demonstrated by showing that the fractal complexity measure of fractal dimension is an excellent predictor of image compressibility. A method of fractal waveform coding is developed which has low computational demands and performs better than conventional waveform coding methods such as PCM and DPCM. Fractal techniques based on the use of space-filling curves are developed as a mechanism for hierarchical application of conventional techniques. Two particular applications are highlighted: the re-ordering of data during image scanning and the mapping of multi-dimensional data to one dimension. It is shown that there are many possible space-filling curves which may be used to scan images and that selection of an optimum curve leads to significantly improved data compression. The multi-dimensional mapping property of space-filling curves is used to speed up substantially the lookup process in vector quantisation. Iterated function systems are compared with vector quantisers and the computational complexity or iterated function system encoding is also reduced by using the efficient matching algcnithms identified for vector quantisers.

Transmission of image data over digital networks involving mobile terminals

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is a growing demand for data transmission over digital networks involving mobile terminals. An important class of data required for transmission over mobile terminals is image information such as street maps, floor plans and identikit images. This sort of transmission is of particular interest to the service industries such as the Police force, Fire brigade, medical services and other services. These services cannot be applied directly to mobile terminals because of the limited capacity of the mobile channels and the transmission errors caused by the multipath (Rayleigh) fading. In this research, transmission of line diagram images such as floor plans and street maps, over digital networks involving mobile terminals at transmission rates of 2400 bits/s and 4800 bits/s have been studied. A low bit-rate source encoding technique using geometric codes is found to be suitable to represent line diagram images. In geometric encoding, the amount of data required to represent or store the line diagram images is proportional to the image detail. Thus a simple line diagram image would require a small amount of data. To study the effect of transmission errors due to mobile channels on the transmitted images, error sources (error files), which represent mobile channels under different conditions, have been produced using channel modelling techniques. Satisfactory models of the mobile channel have been obtained when compared to the field test measurements. Subjective performance tests have been carried out to evaluate the quality and usefulness of the received line diagram images under various mobile channel conditions. The effect of mobile transmission errors on the quality of the received images has been determined. To improve the quality of the received images under various mobile channel conditions, forward error correcting codes (FEC) with interleaving and automatic repeat request (ARQ) schemes have been proposed. The performance of the error control codes have been evaluated under various mobile channel conditions. It has been shown that a FEC code with interleaving can be used effectively to improve the quality of the received images under normal and severe mobile channel conditions. Under normal channel conditions, similar results have been obtained when using ARQ schemes. However, under severe mobile channel conditions, the FEC code with interleaving shows better performance.

Using secondary knowledge to support decision tree classification of retrospective clinical data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and subsequent manipulation of data often results in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate for the characteristics of the particular set. We present our methodology and outline a set of experiential results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.

Combination of altimetry data from different satellite missions

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Substantial altimetry datasets collected by different satellites have only become available during the past five years, but the future will bring a variety of new altimetry missions, both parallel and consecutive in time. The characteristics of each produced dataset vary with the different orbital heights and inclinations of the spacecraft, as well as with the technical properties of the radar instrument. An integral analysis of datasets with different properties offers advantages both in terms of data quantity and data quality. This thesis is concerned with the development of the means for such integral analysis, in particular for dynamic solutions in which precise orbits for the satellites are computed simultaneously. The first half of the thesis discusses the theory and numerical implementation of dynamic multi-satellite altimetry analysis. The most important aspect of this analysis is the application of dual satellite altimetry crossover points as a bi-directional tracking data type in simultaneous orbit solutions. The central problem is that the spatial and temporal distributions of the crossovers are in conflict with the time-organised nature of traditional solution methods. Their application to the adjustment of the orbits of both satellites involved in a dual crossover therefore requires several fundamental changes of the classical least-squares prediction/correction methods. The second part of the thesis applies the developed numerical techniques to the problems of precise orbit computation and gravity field adjustment, using the altimetry datasets of ERS-1 and TOPEX/Poseidon. Although the two datasets can be considered less compatible that those of planned future satellite missions, the obtained results adequately illustrate the merits of a simultaneous solution technique. In particular, the geographically correlated orbit error is partially observable from a dataset consisting of crossover differences between two sufficiently different altimetry datasets, while being unobservable from the analysis of altimetry data of both satellites individually. This error signal, which has a substantial gravity-induced component, can be employed advantageously in simultaneous solutions for the two satellites in which also the harmonic coefficients of the gravity field model are estimated.

Classification and contextual enhancement of remotely sensed data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aims of the project were twofold: 1) To investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) To investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance, maximum likelihood. In addition to these the following algorithms were developed during the course of the research: deviant distance, look up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. Contextual enhancements investigated were: mode filters, small area replacement and Wharton's CONAN algorithm. Additionally methods for noise and edge based declassification and contextual reclassification, non-probabilitic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were investigated. The conclusions presented suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints. These were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.

A new fuzzy additive model for determining the common set of weights in Data Envelopment Analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main advantage of Data Envelopment Analysis (DEA) is that it does not require any priori weights for inputs and outputs and allows individual DMUs to evaluate their efficiencies with the input and output weights that are only most favorable weights for calculating their efficiency. It can be argued that if DMUs are experiencing similar circumstances, then the pricing of inputs and outputs should apply uniformly across all DMUs. That is using of different weights for DMUs makes their efficiencies unable to be compared and not possible to rank them on the same basis. This is a significant drawback of DEA; however literature observed many solutions including the use of common set of weights (CSW). Besides, the conventional DEA methods require accurate measurement of both the inputs and outputs; however, crisp input and output data may not relevant be available in real world applications. This paper develops a new model for the calculation of CSW in fuzzy environments using fuzzy DEA. Further, a numerical example is used to show the validity and efficacy of the proposed model and to compare the results with previous models available in the literature.

Hierarchical classification of G-protein-coupled receptors with data-driven selection of attributes and classifiers

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We address the important bioinformatics problem of predicting protein function from a protein's primary sequence. We consider the functional classification of G-Protein-Coupled Receptors (GPCRs), whose functions are specified in a class hierarchy. We tackle this task using a novel top-down hierarchical classification system where, for each node in the class hierarchy, the predictor attributes to be used in that node and the classifier to be applied to the selected attributes are chosen in a data-driven manner. Compared with a previous hierarchical classification system selecting classifiers only, our new system significantly reduced processing time without significantly sacrificing predictive accuracy.

A comparison of sales response predictions from demand models applied to store-level versus panel data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to generate sales promotion response predictions, marketing analysts estimate demand models using either disaggregated (consumer-level) or aggregated (store-level) scanner data. Comparison of predictions from these demand models is complicated by the fact that models may accommodate different forms of consumer heterogeneity depending on the level of data aggregation. This study shows via simulation that demand models with various heterogeneity specifications do not produce more accurate sales response predictions than a homogeneous demand model applied to store-level data, with one major exception: a random coefficients model designed to capture within-store heterogeneity using store-level data produced significantly more accurate sales response predictions (as well as better fit) compared to other model specifications. An empirical application to the paper towel product category adds additional insights. This article has supplementary material online.

Integration of semantically annotated data by the KnoFuss architecture

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing the solutions produced in the database domain. However, data structured according to OWL ontologies has its specific features: e.g., the classes are organized into a hierarchy, the properties are inherited, data constraints differ from those defined by database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.

A tale of two analyses: the use of archived qualitative data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article provides a unique contribution to the debates about archived qualitative data by drawing on two uses of the same data - British Migrants in Spain: the Extent and Nature of Social Integration, 2003-2005 - by Jones (2009) and Oliver and O'Reilly (2010), both of which utilise Bourdieu's concepts analytically and produce broadly similar findings. We argue that whilst the insights and experiences of those researchers directly involved in data collection are important resources for developing contextual knowledge used in data analysis, other kinds of critical distance can also facilitate credible data use. We therefore challenge the assumption that the idiosyncratic relationship between context, reflexivity and interpretation limits the future use of data. Moreover, regardless of the complex genealogy of the data itself, given the number of contingencies shaping the qualitative research process and thus the potential for partial or inaccurate interpretation, contextual familiarity need not be privileged over other aspects of qualitative praxis such as sustained theoretical insight, sociological imagination and methodological rigour. Â© Sociological Research Online, 1996-2012.

Semisupervised learning of hierarchical latent trait models for data visualization

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets. © 2005 IEEE.

Understanding data collection behaviour of mental health practitioners

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Failure to detect patients at risk of attempting suicide can result in tragic consequences. Identifying risks earlier and more accurately helps prevent serious incidents occurring and is the objective of the GRiST clinical decision support system (CDSS). One of the problems it faces is high variability in the type and quantity of data submitted for patients, who are assessed in multiple contexts along the care pathway. Although GRiST identifies up to 138 patient cues to collect, only about half of them are relevant for any one patient and their roles may not be for risk evaluation but more for risk management. This paper explores the data collection behaviour of clinicians using GRiST to see whether it can elucidate which variables are important for risk evaluations and when. The GRiST CDSS is based on a cognitive model of human expertise manifested by a sophisticated hierarchical knowledge structure or tree. This structure is used by the GRiST interface to provide top-down controlled access to the patient data. Our research explores relationships between the answers given to these higher-level 'branch' questions to see whether they can help direct assessors to the most important data, depending on the patient profile and assessment context. The outcome is a model for dynamic data collection driven by the knowledge hierarchy. It has potential for improving other clinical decision support systems operating in domains with high dimensional data that are only partially collected and in a variety of combinations.

«
1
2
3
»