874 results for electronic healthcare data
Abstract:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline, and consider it to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each step of the web search evaluation pipeline, and that these improvements increase the overall efficiency of the entire pipeline. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine's query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step only to be rejected after the online evaluation step, which in turn increases the efficiency of the entire evaluation pipeline. Secondly, we formulate the problem of the optimised scheduling of online experiments. We tackle this problem with a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler at the second step of the evaluation pipeline; consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, e.g. in domains with a grid-based representation, such as image search. Our study, using datasets of interleaving experiments performed in both the document and image search domains, demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply sequential testing methods to reduce the mean deployment time of interleaving experiments. We adapt two sequential tests to interleaving experimentation and demonstrate that one can achieve a significant decrease in experiment duration by using such methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, with the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: offline evaluation frameworks for query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for efficient online interleaving evaluation, and a sequential testing approach for online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine, and demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
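Generalised Team Draft builds on the classic Team Draft interleaving scheme. As a rough reference point only, the sketch below shows classic Team Draft with uniform click credit in Python; all identifiers are illustrative, and this is not the thesis's implementation, which additionally learns the interleaving policy and per-click weights from historical data.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Classic Team Draft: build a combined ranking while recording which
    input ranker ("team") contributed each result."""
    all_docs = set(ranking_a) | set(ranking_b)
    interleaved, teams, used = [], [], set()
    while len(used) < len(all_docs):
        # Randomise which team picks first in this round.
        order = [("A", ranking_a), ("B", ranking_b)]
        random.shuffle(order)
        for team, ranking in order:
            pick = next((d for d in ranking if d not in used), None)
            if pick is not None:
                interleaved.append(pick)
                teams.append(team)
                used.add(pick)
    return interleaved, teams

def credit_clicks(teams, clicked_positions):
    """Uniform click credit: every clicked position counts once for its team.
    Generalised Team Draft would instead learn position-dependent click weights."""
    score = {"A": 0, "B": 0}
    for pos in clicked_positions:
        score[teams[pos]] += 1
    return score

# Interleave two illustrative rankings and credit a click on the top position.
combined, teams = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(combined, teams, credit_clicks(teams, [0]))
```

In the generalised framework described in the abstract, the pick order and the per-click credit above would be treated as tunable parameters optimised for sensitivity rather than fixed.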
Abstract:
For a long time, electronic data analysis has been associated with quantitative methods. However, Computer Assisted Qualitative Data Analysis Software (CAQDAS) is increasingly being developed. Although CAQDAS has been available for decades, very few qualitative health researchers report using it. This may be due to the difficulties involved in mastering the software and the misconceptions associated with using CAQDAS. While the issue of mastering CAQDAS has received ample attention, little has been done to address the misconceptions associated with it. In this paper, the author reflects on his experience of interacting with one of the popular CAQDAS packages (NVivo) in order to provide evidence-based implications of using the software. The key message is that, unlike statistical software, the main function of CAQDAS is not to analyse data but rather to aid the analysis process, which the researcher must always remain in control of. In other words, researchers must equally recognise that no software can analyse qualitative data. CAQDAS packages are essentially data management tools that support the researcher during analysis.
Abstract:
Part 20: Health and Care Networks
Abstract:
Due to the sensitive nature of patient data, the secondary use of electronic health records (EHR) is restricted in scientific research and product development. Such restrictions aim to preserve the privacy of the respective patients by limiting the availability and variety of sensitive patient data. However, the current limitations do not correspond to the actual needs of potential secondary users. In this thesis, the secondary use of Finnish and Swedish EHR data is explored for the purpose of enhancing the availability of such data for clinical research and product development. The EHR-related procedures and technologies involved are analysed to identify the issues limiting the secondary use of patient data; successful secondary use of patient data increases the value of the data. To explore the identified circumstances, a case study of potential secondary users and use intentions regarding EHR data was carried out in Finland and Sweden. Data collection for the case study was performed using semi-structured interviews. In total, 14 Finnish and Swedish experts representing scientific research, health management, and business were interviewed. The motivation for these interviews was to evaluate the protection of EHR data used for secondary purposes. The efficiency of the implemented procedures and technologies was analysed in terms of data availability and privacy preservation. The results of the case study show that the factors affecting EHR availability fall into three categories: management of patient data, preservation of patients' privacy, and potential secondary users. Identified issues regarding data management included laborious and inconsistent data request procedures and the role and effect of external service providers. Based on the study findings, two approaches enabling the secondary use of EHR data are identified: data alteration and a protected processing environment. Data alteration increases the availability of relevant EHR data while decreasing the value of such data, whereas the protected processing environment approach restricts the number of potential users and use intentions while providing more valuable data content.
Abstract:
Purpose: To evaluate the prevalence of patients suffering from registered chronic disease list (CDL) conditions in a section of the South African private health sector from 2008 to 2012. Methods: This study was a retrospective analysis of the medicine claims database of a nationally representative South African Pharmacy Benefit Management (PBM) company for the period 2008 to 2012. Descriptive analysis was performed to calculate the prevalence of CDL conditions for the entire population and stratified by age and gender, while mixed linear modelling was used to determine changes in the average number of CDL conditions per patient, adjusted for age and gender, from 2008 to 2012. Results: An increase of 0.20 in the average number of chronic diseases per patient was observed from 2008 to 2012 among patients with any CDL condition, with an average of 1.57 (95% CI: 1.57 - 1.58) co-morbid CDL conditions in 2008 and 1.77 (95% CI: 1.77 - 1.78) in 2012. This increase between 2008 and 2012 was statistically significant (p < 0.05), but of limited practical significance (d < 0.8). Conclusion: The prevalence of patients with CDL conditions, along with the risk of co-morbidity, has been increasing over time in the South African private health sector. An increased risk of co-morbidity with age and between genders was evident.
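As an illustration of the kind of model the abstract describes, the hedged sketch below fits a linear mixed model with a random intercept per patient using statsmodels; the tiny inline dataset and all column names are invented stand-ins for the PBM claims data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Tiny invented stand-in for the claims data: one row per patient per year,
# with the count of CDL conditions, age, and gender (column names assumed).
claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "year":       [2008, 2012] * 6,
    "age":        [54, 58, 61, 65, 47, 51, 70, 74, 38, 42, 66, 70],
    "gender":     ["F", "F", "M", "M", "F", "F", "M", "M", "F", "F", "M", "M"],
    "n_cdl":      [1, 2, 2, 2, 1, 1, 3, 4, 1, 2, 2, 3],
})

# Linear mixed model: fixed effects for year, age, and gender, with a random
# intercept per patient to account for repeated measurements over time.
model = smf.mixedlm("n_cdl ~ year + age + C(gender)", data=claims,
                    groups=claims["patient_id"])
print(model.fit().summary())
```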
Abstract:
This study aimed to document nurses' perceptions and beliefs about hand hygiene in two hospitals in the Democratic Republic of the Congo (DRC). The PRECEDE-PROCEED model guided the work and allowed the analysis to focus on predisposing and enabling factors, elements that favour the adoption of health behaviours. A descriptive correlational design was used. A convenience sample of 74 nurses recruited from the two hospitals was assembled. Data were collected using a self-administered questionnaire of 34 questions drawn from instruments identified in the literature review. The questions covered knowledge, perceptions of hand hygiene, and access to the infrastructure that facilitates adopting this behaviour. Data collection took place in Kinshasa, the capital of the DRC. The results reveal substantial gaps in knowledge. Perceptions related to social norms emerged as more favourable. The results also reveal shortcomings in the enabling factors, particularly in the use of alcohol-based hand rub. Moreover, the most educated and most experienced nurses were more likely to perceive the importance of hand hygiene practice. The discussion outlines possible actions to improve hygiene behaviours among nurses in developing countries such as the DRC.
Abstract:
Access to health care for a population living in a remote region of Quebec represents a major challenge for the Ministère de la santé et des services sociaux. Solutions such as telehealth have been put forward to address this problem. The RUIS McGill thus developed a tele-obstetrics program to serve a population of Inuit women with high-risk pregnancies living in Nunavik. The objective of this thesis was to understand the impact of the RUIS McGill tele-obstetrics service on the health of the women and their newborns, as well as on health costs and service use, following its implementation at the Centre de santé et de services sociaux Inuulitsivik on the Hudson Bay coast. Inuit women with high-risk pregnancies and their children in the Hudson Bay region of Nunavik, far from specialised obstetric services, are the target population. The tele-obstetrics service provides access to RUIS McGill obstetricians located in Montreal. A quasi-experimental design was used to examine three hypotheses concerning the health status of mothers and children, health service use, and costs. The tele-obstetrics service became operational in 2006, making a before-and-after study with two groups of women possible: those who gave birth before 2006 (pretest) and those who gave birth after 2012 (post-test). Data collection was carried out entirely from the participants' paper medical records, allowing the analysis of 47 records for the pretest and 81 for the post-test. Analysis of covariance, logistic regression, and the non-parametric Mann-Whitney test led to the conclusion that the pretest and post-test groups differed on only two variables: birth weight, which was lower in the post-test, and the mother's blood pressure at delivery, which was higher in the post-test. For all other variables relating to the three hypotheses under study, the results of this thesis show no significant difference between the two groups, indicating that the same quality of care was maintained after the implementation of the tele-obstetrics program. Based on the results, this thesis recommends revising and modifying the program objectives; sharing the telehealth communication stations with other specialties; undertaking a cost-focused evaluation of the program; rigorously monitoring program use to maximise its effectiveness and potential; establishing a dashboard; and undertaking a comparative evaluative study in a comparable tele-obstetrics service.
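The group comparison described above can be illustrated with a hedged sketch of the non-parametric Mann-Whitney test in Python; the birth-weight values below are invented and merely stand in for the chart-review data.

```python
from scipy.stats import mannwhitneyu

# Invented birth weights (grams) standing in for the chart-review data.
pretest_weights = [3150, 2980, 3400, 3620, 3050, 3210]   # births before 2006
posttest_weights = [2900, 3100, 3300, 2850, 3010, 2960]  # births after 2012

# Two-sided Mann-Whitney U test comparing the two cohorts.
stat, p_value = mannwhitneyu(pretest_weights, posttest_weights,
                             alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```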
Abstract:
Modern data centers host hundreds of thousands of servers to achieve economies of scale. Such a huge number of servers creates challenges for the data center network (DCN) to provide proportionally large bandwidth. In addition, the deployment of virtual machines (VMs) in data centers raises the requirements for efficient resource allocation and fine-grained resource sharing. Further, the large number of servers and switches in the data center consumes significant amounts of energy. Even though servers are becoming more energy efficient through various energy-saving techniques, the DCN still accounts for 20% to 50% of the energy consumed by the entire data center. The objective of this dissertation is to enhance DCN performance as well as its energy efficiency by conducting optimizations on both the host and network sides. First, as the DCN demands huge bisection bandwidth to interconnect all the servers, we propose a parallel packet switch (PPS) architecture that directly processes variable-length packets without segmentation-and-reassembly (SAR). The proposed PPS achieves large bandwidth by combining the switching capacities of multiple fabrics, and it further improves switch throughput by avoiding the padding bits required by SAR. Second, since certain resource demands of VMs are bursty and stochastic in nature, we propose the Max-Min Multidimensional Stochastic Bin Packing (M3SBP) algorithm to satisfy both deterministic and stochastic demands in VM placement. M3SBP calculates an equivalent deterministic value for the stochastic demands and maximizes the minimum resource utilization ratio of each server. Third, to provide the necessary traffic isolation for VMs that share the same physical network adapter, we propose the Flow-level Bandwidth Provisioning (FBP) algorithm. By reducing the flow scheduling problem to multiple stages of packet queuing problems, FBP guarantees the provisioned bandwidth and delay performance for each flow. Finally, although DCNs are typically provisioned with full bisection bandwidth, DCN traffic exhibits fluctuating patterns, so we propose a joint host-network optimization scheme to enhance the energy efficiency of DCNs during off-peak traffic hours. The proposed scheme uses a unified representation method that converts the VM placement problem into a routing problem, and employs depth-first and best-fit search to find efficient paths for flows.
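To make the M3SBP idea concrete, the sketch below loosely illustrates its two ingredients as the abstract describes them: converting a stochastic demand into an equivalent deterministic value and greedily placing a VM so as to maximise the minimum utilisation ratio. The formula, data structures, and names are assumptions for illustration, not the dissertation's actual algorithm.

```python
import math

def equivalent_demand(mean, std, z=1.65):
    """Equivalent deterministic value for a stochastic demand: the level the
    demand stays below with roughly 95% probability under a normal
    approximation (the dissertation's exact formula may differ)."""
    return mean + z * std

def place_vm(vm_demand, servers, capacity):
    """Greedy max-min placement: choose the server whose minimum per-dimension
    utilisation ratio after placement is largest, subject to capacity."""
    best, best_score = None, -math.inf
    dims = range(len(capacity))
    for server in servers:
        new_load = [server["load"][d] + vm_demand[d] for d in dims]
        if any(new_load[d] > capacity[d] for d in dims):
            continue  # placement would violate capacity in some dimension
        score = min(new_load[d] / capacity[d] for d in dims)
        if score > best_score:
            best, best_score = server, score
    if best is not None:
        best["load"] = [best["load"][d] + vm_demand[d] for d in dims]
    return best

# Two servers, two normalised resource dimensions (CPU, memory); the VM's CPU
# demand is stochastic, its memory demand deterministic.
servers = [{"id": 0, "load": [0.2, 0.3]}, {"id": 1, "load": [0.6, 0.1]}]
vm = [equivalent_demand(0.25, 0.05), 0.2]
chosen = place_vm(vm, servers, capacity=[1.0, 1.0])
print("placed on server", chosen["id"] if chosen else None)
```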
Abstract:
The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications render administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would substantially reduce data center management complexity. We specifically addressed two crucial data center operations: first, precisely estimating the capacity requirements of client virtual machines (VMs) when renting server space in a cloud environment; second, systematically and efficiently allocating physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are manifold: cloud users can size their VMs appropriately and pay only for the resources they need, and service providers can offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients pay exactly for the performance they actually experience, while administrators can maximize their total revenue by utilizing application performance models and SLAs. This thesis made the following contributions. First, we identified the resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Networks and Support Vector Machines, for accurately modeling the performance of virtualized applications; we also suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these tools. Third, we presented an approach to optimal VM sizing that employs the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm that maximizes the SLA-generated revenue for a data center.
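As a hedged illustration of the performance-modelling step, the sketch below trains the two model families the thesis found suitable (a Support Vector Machine and a small neural network) on a made-up mapping from resource allocations to application throughput; the feature set, data, and hyperparameters are invented stand-ins.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Invented training data: each row is a (CPU cap, memory GB, I/O share)
# allocation for a VM, and y is the measured application throughput.
X = np.array([[0.5, 2.0, 0.3], [1.0, 4.0, 0.5], [2.0, 4.0, 0.7],
              [2.0, 8.0, 0.9], [4.0, 8.0, 1.0]])
y = np.array([120.0, 260.0, 410.0, 540.0, 700.0])

# The two model families the thesis found suitable; input scaling is the kind
# of modelling optimisation that typically improves their accuracy.
svm_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
ann_model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                       random_state=0))

for name, model in [("SVM", svm_model), ("ANN", ann_model)]:
    model.fit(X, y)
    prediction = model.predict([[1.5, 4.0, 0.6]])[0]
    print(f"{name} predicted throughput: {prediction:.1f}")
```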
Abstract:
The development cost of any civil infrastructure is very high, and during its life span a civil structure is subjected to numerous physical loads and environmental effects that damage it. Failing to identify this damage at an early stage may result in severe property loss and may become a potential threat to people and the environment. Thus, there is a need to develop effective damage detection techniques to ensure the safety and integrity of the structure. One Structural Health Monitoring method for evaluating a structure is statistical analysis. In this study, a civil structure measuring 8 feet in length and 3 feet in diameter, embedded with thermocouple sensors at 4 different levels, is analyzed under controlled and variable conditions. With the help of statistical analysis, possible damage to the structure was assessed, and the analysis could detect structural defects at various levels of the structure.
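The abstract does not specify the statistical procedure, so the sketch below only illustrates one simple possibility: screening thermocouple readings against a baseline collected under controlled conditions and flagging large deviations. All numbers and names are invented.

```python
import numpy as np

def flag_anomalies(baseline, readings, threshold=3.0):
    """Flag readings that deviate from the baseline (controlled-condition)
    mean by more than `threshold` standard deviations."""
    mu, sigma = np.mean(baseline), np.std(baseline)
    z_scores = (np.asarray(readings) - mu) / sigma
    return np.abs(z_scores) > threshold

# Invented thermocouple temperatures (deg C) at one embedment level.
baseline = [21.0, 21.3, 20.8, 21.1, 21.2, 20.9]
new_readings = [21.0, 21.4, 24.9, 21.1]  # the third value is suspicious
print(flag_anomalies(baseline, new_readings))
```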
Abstract:
Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications that let people easily access and manage multimedia data in various areas. In response to such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval, with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and to incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results, and the TMCA algorithm is then proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
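As a hedged illustration of the final step, the sketch below shows a generic sampling-based ensemble for imbalanced data: each member is trained on all rare-event examples plus a random undersample of the majority class, and member scores are averaged. This is not the dissertation's exact mechanism; the data and names are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def undersample_ensemble(X, y, n_models=5, seed=0):
    """Each ensemble member is trained on all rare-event (positive) examples
    plus an equally sized random undersample of the majority class."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(n_models):
        sampled_neg = rng.choice(neg, size=len(pos), replace=False)
        idx = np.concatenate([pos, sampled_neg])
        models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))
    return models

def rare_event_score(models, X):
    """Average the members' positive-class probabilities."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)

# Synthetic imbalanced data: 200 majority examples, 10 rare events, 4 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(210, 4))
X[200:] += 2.0  # shift the rare class so it is learnable
y = np.r_[np.zeros(200, dtype=int), np.ones(10, dtype=int)]
models = undersample_ensemble(X, y)
print(rare_event_score(models, X[-3:]))
```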
Abstract:
Purpose: Most individuals who meet diagnostic criteria for substance use disorders do not perceive a need for substance use treatment and are the least likely to pursue treatment voluntarily; there are also those who perceive a need for treatment and yet do not pursue it. This study aimed to understand which factors increase the likelihood of perceiving a need for treatment among individuals who meet diagnostic criteria for substance use disorders, in the hope of better supporting targeted efforts for gender-specific treatment recruitment and retention. Using Andersen and Newman's (1973/2005) model of individual determinants of healthcare utilization, the central hypothesis of the study was that gender moderates the relationship between substance use problem severity and perceived treatment need, such that women with increasing problems due to their substance use are more likely than men to perceive a need for treatment. Additional predisposing and enabling factors from Andersen and Newman's (1973/2005) model were included in the study to understand their impact on perceived need. Method: The study was a secondary data analysis of the 2010 National Survey on Drug Use and Health (NSDUH) using logistic regression. The weighted sample consisted of a total of 20,077,235 American household residents (the unweighted sample was 5,484 participants). Results of the logistic regression were verified using the Relogit software for rare-events logistic regression, given that perceived treatment need is a rare event (King & Zeng, 2001a; 2001b). Results: The moderating effect of female gender was not found. Conversely, men were significantly more likely than women to perceive a need for treatment as substance use problem severity increased. The study also found that a number of factors, such as race, ethnicity, socioeconomic status, age, marital status, education, co-occurring mental health disorders, and prior treatment history, differently affected the likelihood of perceiving a need for treatment among men and women. Conclusion: Perceived treatment need among individuals who meet criteria for substance use disorders is rare, but identifying factors associated with an increased likelihood of perceiving a need for treatment can inform the development of gender-appropriate outreach and recruitment for social work treatment, as well as public health messages.
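The moderation hypothesis translates into a gender-by-severity interaction term in a logistic regression. The hedged sketch below shows that specification with statsmodels on synthetic data; variable names are assumed, and the survey weights and rare-events (Relogit) correction used in the study are omitted for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for an NSDUH-style extract (variable names assumed);
# survey weights and the rare-events (Relogit) correction are omitted here.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "severity": rng.integers(0, 7, n),          # number of SUD criteria met
    "gender": rng.choice(["F", "M"], n),
    "age": rng.integers(18, 65, n),
    "prior_treatment": rng.integers(0, 2, n),
})
logit_p = -4 + 0.5 * df["severity"] + 0.3 * (df["gender"] == "M") * df["severity"]
df["perceived_need"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# Moderation test: the severity x gender interaction captures whether severity
# relates to perceived treatment need differently for men and women.
model = smf.logit("perceived_need ~ severity * C(gender) + age + prior_treatment",
                  data=df)
print(model.fit().summary())
```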
Abstract:
The multivariate normal distribution is commonly encountered in many fields, and missing values are a frequent issue in practice. The purpose of this research was to estimate the parameters of the three-dimensional normal distribution with permutation-symmetric covariance from complete data and from all possible patterns of incomplete data. In this study, the maximum likelihood estimators (MLEs) under missing data were derived, and the properties of the MLEs as well as their sampling distributions were obtained. A Monte Carlo simulation study was used to evaluate the performance of the considered estimators both when ρ was known and when it was unknown. All results indicated that, compared to estimators obtained by omitting observations with missing data, the estimators derived in this article performed better. Furthermore, when ρ was unknown, using the estimate of ρ led to the same conclusion.
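For reference, the permutation-symmetric (exchangeable) covariance of a trivariate normal is usually written in the intraclass-correlation form below; this is a standard parametrisation, and the thesis may state it differently.

```latex
\Sigma \;=\; \sigma^{2}
\begin{pmatrix}
1 & \rho & \rho \\
\rho & 1 & \rho \\
\rho & \rho & 1
\end{pmatrix}
\;=\; \sigma^{2}\bigl[(1-\rho)I_{3} + \rho J_{3}\bigr],
\qquad -\tfrac{1}{2} < \rho < 1,
```

where $I_3$ is the identity matrix, $J_3$ is the all-ones matrix, and the constraint on $\rho$ ensures positive definiteness.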
Abstract:
With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization, and the resource management of such services are becoming increasingly important for delivering the Quality of Service (QoS) that users expect. First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-sourced Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running on top of the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share analysis results. TerraFly GeoCloud also provides the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation, while autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly is a set of techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS targets. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly predicts workload demands 18.91% more accurately and allocates resources efficiently to meet the QoS target, improving QoS by 26.19% and saving resource usage by 20.83% compared to traditional peak-load-based resource allocation.
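As a hedged illustration of what a top-k spatial Boolean query returns, the sketch below filters objects by a simple keyword conjunction and ranks the survivors by distance; sksOpen's actual index structures and scoring are more sophisticated, and all data and names here are invented.

```python
from math import hypot

def topk_spatial_boolean(objects, query_point, required_terms, k=3):
    """Keep only objects whose keyword set satisfies the Boolean predicate
    (here a simple conjunction), then return the k nearest to the query point."""
    qx, qy = query_point
    matches = [o for o in objects if required_terms <= o["keywords"]]
    matches.sort(key=lambda o: hypot(o["x"] - qx, o["y"] - qy))
    return matches[:k]

# Invented points of interest with keyword tags.
pois = [
    {"name": "clinic_a", "x": 1.0, "y": 2.0, "keywords": {"clinic", "pediatric"}},
    {"name": "clinic_b", "x": 4.0, "y": 1.0, "keywords": {"clinic"}},
    {"name": "pharmacy", "x": 0.5, "y": 0.5, "keywords": {"pharmacy"}},
]
print(topk_spatial_boolean(pois, (0.0, 0.0), {"clinic"}, k=2))
```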
Abstract:
Ensemble stream modeling and data-cleaning are sensor information processing systems with different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for events such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from the sensor streams. The second step is feature induction by cross-validating attributes with single- or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features, and a sensitive variance measure such as the F-test is performed at each node's split to select the best attribute. The ensemble stream model approach proved to improve when complicated features were combined with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast number of real-time mobile streams generated today.
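Since the F-measure is central to the evaluation described above, a minimal sketch of its computation from detection counts is shown below; the counts are invented and the helper is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """F-measure (F1) as the harmonic mean of precision and recall; the false
    alarms correspond to the false positives (fp)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented counts for detected fire events against the sensed ground truth (BA*).
print(precision_recall_f1(tp=42, fp=8, fn=15))
```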