Biblioteca Digital

985 resultados para Data handling

Handling high level of censoring for endovascular aortic repair risk prediction

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Feature selection is important in medical field for many reasons. However, selecting important variables is a difficult task with the presence of censoring that is a unique feature in survival data analysis. This paper proposed an approach to deal with the censoring problem in endovascular aortic repair survival data through Bayesian networks. It was merged and embedded with a hybrid feature selection process that combines cox's univariate analysis with machine learning approaches such as ensemble artificial neural networks to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches such as; least absolute shrinkage and selection operator LASSO, and Akaike information criterion AIC methods. The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the roc curves of the two datasets collected from two centers located in United Kingdom separately. Furthermore, ensembles constructed with center 1 enhanced the concordance index of center 2 prediction compared to the model built with a single network. Although the size of the final reduced model using the neural networks and its ensembles is greater than other methods, the model outperformed the others in both concordance index and sensitivity for center 2 prediction. This indicates the reduced model is more powerful for cross center prediction.

Dynamic data retrieval on the World Wide Web

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help database queries can be distributed over both local and Web data sources within MSemODB framework. ^ Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold “custom wrapper” approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis and processing. ^ Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. ^ This study confirms feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling of complex cases. ^

Adding concurrent data transfer to transport layer

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, the internet has grown exponentially, and become more complex. This increased complexity potentially introduces more network-level instability. But for any end-to-end internet connection, maintaining the connection's throughput and reliability at a certain level is very important. This is because it can directly affect the connection's normal operation. Therefore, a challenging research task is to improve a network's connection performance by optimizing its throughput and reliability. This dissertation proposed an efficient and reliable transport layer protocol (called concurrent TCP (cTCP)), an extension of the current TCP protocol, to optimize end-to-end connection throughput and enhance end-to-end connection fault tolerance. The proposed cTCP protocol could aggregate multiple paths' bandwidth by supporting concurrent data transfer (CDT) on a single connection. Here concurrent data transfer was defined as the concurrent transfer of data from local hosts to foreign hosts via two or more end-to-end paths. An RTT-Based CDT mechanism, which was based on a path's RTT (Round Trip Time) to optimize CDT performance, was developed for the proposed cTCP protocol. This mechanism primarily included an RTT-Based load distribution and path management scheme, which was used to optimize connections' throughput and reliability. A congestion control and retransmission policy based on RTT was also provided. According to experiment results, under different network conditions, our RTT-Based CDT mechanism could acquire good CDT performance. Finally a CWND-Based CDT mechanism, which was based on a path's CWND (Congestion Window), to optimize CDT performance was introduced. This mechanism primarily included: a CWND-Based load allocation scheme, which assigned corresponding data to paths based on their CWND to achieve aggregate bandwidth; a CWND-Based path management, which was used to optimize connections' fault tolerance; and a congestion control and retransmission management policy, which was similar to regular TCP in its separate path handling. According to corresponding experiment results, this mechanism could acquire near-optimal CDT performance under different network conditions.

Hospitality Data Mining Myths

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Electronic database handling of buisness information has gradually gained its popularity in the hospitality industry. This article provides an overview on the fundamental concepts of a hotel database and investigates the feasibility of incorporating computer-assisted data mining techniques into hospitality database applications. The author also exposes some potential myths associated with data mining in hospitaltiy database applications.

Dynamic data retrieval on the world wide web

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help database queries can be distributed over both local and Web data sources within MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling of complex cases.

State-space modelling of geolocation data reveals sex differences in the use of management areas by breeding northern fulmars

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We thank Orkney Islands Council for access to Eynhallow and Talisman Energy (UK) Ltd and Marine Scotland for fieldwork and equipment support. Handling and tagging of fulmars was conducted under licences from the British Trust for Ornithology and the UK Home Office. EE was funded by a Marine Alliance for Science and Technology for Scotland/University of Aberdeen College of Life Sciences and Medicine studentship and LQ was supported by a NERC Studentship. Thanks also to the many colleagues who assisted with fieldwork during the project, and to Helen Bailey and Arliss Winship for advice on implementing the state-space model.

State-space modelling of geolocation data reveals sex differences in the use of management areas by breeding northern fulmars

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We thank Orkney Islands Council for access to Eynhallow and Talisman Energy (UK) Ltd and Marine Scotland for fieldwork and equipment support. Handling and tagging of fulmars was conducted under licences from the British Trust for Ornithology and the UK Home Office. EE was funded by a Marine Alliance for Science and Technology for Scotland/University of Aberdeen College of Life Sciences and Medicine studentship and LQ was supported by a NERC Studentship. Thanks also to the many colleagues who assisted with fieldwork during the project, and to Helen Bailey and Arliss Winship for advice on implementing the state-space model.

Statistical methods for polyhedral shape classification with incomplete data - application to cryo-electron tomographic images

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A reconstructed 3D object from single-axis limited angle tilt-series cryo-electron tomography contains missing regions and this problem is known as the missing wedge problem. Due to missing regions on the reconstructed images, analyzing their 3D structures is a challenging problem. The existing methods overcome this problem by aligning and averaging several similar shaped objects. These schemes work well if the objects are symmetric and several objects with almost similar shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties, such as volume, symmetry, aspect ratio, polyhedral structures etc., of these bacterial inclusions in presence of missing data. These methods work with deformed and non-symmetric varied shaped objects and do not necessitate multiple objects for handling the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named as polyhedron profile statistic, and (e) the Bayes classifier, linear discriminant analysis and support vector machine based classifiers for supervised incomplete polyhedral shape classification. Finally, the predicted 3D shapes for these bacterial microstructures belong to the Johnson solids family, and these shapes along with their other geometric properties are important for better understanding of their chemical and biological characteristics.

Comparing sensor-based and camera-based approaches to recognizing the occupancy status of the load handling device of forklift trucks

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Real-time data of key performance enablers in logistics warehouses are of growing importance as they permit decision-makers to instantaneously react to alerts, deviations and damages. Several technologies appear as adequate data sources to collect the information required in order to achieve the goal. In the present re-search paper, the load status of the fork of a forklift is to be recognized with the help of a sensor-based and a camera-based solution approach. The comparison of initial experimentation results yields a statement about which direction to pursue for promising further research.

Dynamically managing sensed context data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed contextaware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams in an existing context aware architecture. Current architectures require integration work for new streams and new contexts that are encountered. This means of operation is acceptable for current fixed architectures. However, as systems become more mobile the number of discoverable streams increases. Without the ability to discover and use these new streams the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls. This local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on the pre-pended metadata. This was the reference system. Developed for a mobile system, the data was processed based on the attached metadata. The code that performed the processing was static. The second system was developed to move away from the static processing and introduce a greater freedom of handling for the data stream, this resulted in a heavy weight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together. By providing a small management class that would load a separate handler based on the incoming data, Dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches, the first is a quantitative analysis of the code to understand the complexity of the relative three systems. This was done by evaluating what changes to the system were involved for the new context. The second approach takes a qualitative view of the work required by the software engineer to reconfigure the systems to provide support for a new data stream. The evaluation highlights the various scenarios in which the three systems are most suited. There is always a trade-o↵ in the development of a system. The three approaches highlight this fact. The creation of a statically bound system can be quick to develop but may need to be completely re-written if the requirements move too far. Alternatively a highly dynamic system may be able to cope with new requirements but the developer time to create such a system may be greater than the creation of several simpler systems.

Data structures for SIMD logic simulation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Due to the growth of design size and complexity, design verification is an important aspect of the Logic Circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call thus system ZSIM. The ZSIM software architecture simulator targets low cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and 2 other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to: • Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions ( section 5.3.1). • Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism ( section 5.3.2 ). • Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation ( section 5.3.3 ). • Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive super-computers ( section 5.3.5 ). To evaluate the ZSIM, two types of test circuits were used: 1. Circuits from the IWLS benchmark suit [1] which allow direct comparison with other published studies of parallel simulators.2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files. The experimental results show that with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on Intel Xeon Phi and 11 on Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelistion gain of 10 on Intel Xeon Phi and 4 on Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that ZSIM simulator running on a Xeon Phi machine gives comparable simulation performance to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting Xeon Phi architecture, the automatic cache management of the Xeon Phi, handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. However, targeting GPUs, explicit cache management in program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that efficiently performs on Xeon Phi, was ported into a 64 core NUMA AMD Opteron. To conclude, the two main achievements are restated as following: The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.

Comparison of five popular normalization methods using a spike-in proteomics data set

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mass spectrometry (MS)-based proteomics has seen significant technical advances during the past two decades and mass spectrometry has become a central tool in many biosciences. Despite the popularity of MS-based methods, the handling of the systematic non-biological variation in the data remains a common problem. This biasing variation can result from several sources ranging from sample handling to differences caused by the instrumentation. Normalization is the procedure which aims to account for this biasing variation and make samples comparable. Many normalization methods commonly used in proteomics have been adapted from the DNA-microarray world. Studies comparing normalization methods with proteomics data sets using some variability measures exist. However, a more thorough comparison looking at the quantitative and qualitative differences of the performance of the different normalization methods and at their ability in preserving the true differential expression signal of proteins, is lacking. In this thesis, several popular and widely used normalization methods (the Linear regression normalization, Local regression normalization, Variance stabilizing normalization, Quantile-normalization, Median central tendency normalization and also variants of some of the forementioned methods), representing different strategies in normalization are being compared and evaluated with a benchmark spike-in proteomics data set. The normalization methods are evaluated in several ways. The performance of the normalization methods is evaluated qualitatively and quantitatively on a global scale and in pairwise comparisons of sample groups. In addition, it is investigated, whether performing the normalization globally on the whole data or pairwise for the comparison pairs examined, affects the performance of the normalization method in normalizing the data and preserving the true differential expression signal. In this thesis, both major and minor differences in the performance of the different normalization methods were found. Also, the way in which the normalization was performed (global normalization of the whole data or pairwise normalization of the comparison pair) affected the performance of some of the methods in pairwise comparisons. Differences among variants of the same methods were also observed.

Factors affecting vaccine handling and storage practices among immunization service providers in Ibadan, Oyo State, Nigeria.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Improper handling has been identified as one of the major reasons for the decline in vaccine potency at the time of administration. Loss of potency becomes evident when immunised individuals contract the diseases the vaccines were meant to prevent. Objective: Assessing the factors associated with vaccine handling and storage practices. Methods: This was a cross-sectional study. Three-stage sampling was used to recruit 380 vaccine handlers from 273 health facilities from 11 Local Government areas in Ibadan. Data was analysed using SPSS version 16 Results: Seventy-three percent were aware of vaccine handling and storage guidelines with 68.4% having ever read such guidelines. Only 15.3% read a guideline less than 1 month prior to the study. About 65.0% had received training on vaccine management. Incorrect handling practices reported included storing injections with vaccines (13.7%) and maintaining vaccine temperature using ice blocks (7.6%). About 43.0% had good knowledge of vaccine management, while 66.1% had good vaccine management practices. Respondents who had good knowledge of vaccine handling and storage [OR=10.0, 95%CI (5.28 – 18.94), p < 0.001] and had received formal training on vaccine management [OR=5.3, 95%CI (2.50 – 11.14), p< 0.001] were more likely to have good vaccine handling and storage practices. Conclusion: Regular training is recommended to enhance vaccine handling and storage practices.

Using text-mining-assisted analysis to examine the applicability of unstructured data in the context of customer complaint management

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Double Degree

Sparse time-frequency data analysis : a multi-scale approach

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, we further extend the recently developed adaptive data analysis method, the Sparse Time-Frequency Representation (STFR) method. This method is based on the assumption that many physical signals inherently contain AM-FM representations. We propose a sparse optimization method to extract the AM-FM representations of such signals. We prove the convergence of the method for periodic signals under certain assumptions and provide practical algorithms specifically for the non-periodic STFR, which extends the method to tackle problems that former STFR methods could not handle, including stability to noise and non-periodic data analysis. This is a significant improvement since many adaptive and non-adaptive signal processing methods are not fully capable of handling non-periodic signals. Moreover, we propose a new STFR algorithm to study intrawave signals with strong frequency modulation and analyze the convergence of this new algorithm for periodic signals. Such signals have previously remained a bottleneck for all signal processing methods. Furthermore, we propose a modified version of STFR that facilitates the extraction of intrawaves that have overlaping frequency content. We show that the STFR methods can be applied to the realm of dynamical systems and cardiovascular signals. In particular, we present a simplified and modified version of the STFR algorithm that is potentially useful for the diagnosis of some cardiovascular diseases. We further explain some preliminary work on the nature of Intrinsic Mode Functions (IMFs) and how they can have different representations in different phase coordinates. This analysis shows that the uncertainty principle is fundamental to all oscillating signals.

«
1
2
...
6
7
8
9
10
11
12
...
65
66
»