979 results for Methods : Statistical
Abstract:
Microarray technology provides a high-throughput technique for studying gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data; effective data processing and analysis are therefore critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiates between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. Classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or with different lab protocols or samples. In such situations, it is important to combine the results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation.
These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
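Two of the four pooling techniques named above can be sketched compactly. The following is an illustrative implementation (not the dissertation's code) of Fisher's inverse chi-square method and Stouffer's Z-transform, including the Liptak-Stouffer weighted variant, for combining one-sided p-values from independent experiments:

```python
import math
from statistics import NormalDist

_N = NormalDist()

def fisher_pool(p_values):
    """Fisher's inverse chi-square method: X = -2*sum(ln p_i) ~ chi^2 with
    2k degrees of freedom under the null of no effect in any experiment."""
    x = -2.0 * sum(math.log(p) for p in p_values)
    k = len(p_values)
    # chi^2 survival function for even df = 2k has the closed (Erlang) form
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def stouffer_pool(p_values, weights=None):
    """Stouffer's Z-transform; supplying weights (e.g. sqrt of sample sizes)
    gives the Liptak-Stouffer weighted Z-method."""
    if weights is None:
        weights = [1.0] * len(p_values)
    zs = [_N.inv_cdf(1.0 - p) for p in p_values]  # p-value -> z-score
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(sum(w * w for w in weights))
    return 1.0 - _N.cdf(z)  # pooled one-sided p-value
```

For example, three experiments that each give p = 0.05 pool to a much smaller combined p-value under Fisher's method, while uninformative inputs (p = 0.5) pool to 0.5.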
Abstract:
The purpose of the present dissertation was to evaluate the internal validity of symptoms of four common anxiety disorders included in the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (text revision) (DSM-IV-TR; American Psychiatric Association, 2000), namely separation anxiety disorder (SAD), social phobia (SOP), specific phobia (SP), and generalized anxiety disorder (GAD), in a sample of 625 youth (ages 6 to 17 years) referred to an anxiety disorders clinic and 479 parents. Confirmatory factor analyses (CFAs) were conducted on the dichotomous items of the SAD, SOP, SP, and GAD sections of the youth and parent versions of the Anxiety Disorders Interview Schedule for DSM-IV (ADIS-IV: C/P; Silverman & Albano, 1996) to test and compare a number of factor models, including a factor model based on the DSM. Contrary to predictions, findings from the CFAs showed that a correlated model with five factors of SAD, SOP, SP, GAD worry, and GAD somatic distress provided the best fit of the youth data as well as the parent data. Multiple-group CFAs supported the metric invariance of the correlated five-factor model across boys and girls. Thus, the present study's findings support the internal validity of DSM-IV SAD, SOP, and SP, but raise doubt regarding the internal validity of GAD.
Abstract:
The adverse health effects of long-term exposure to lead are well established, with major uptake into the human body occurring mainly through oral ingestion by young children. Lead-based paint was frequently used in homes built before 1978, particularly in inner-city areas. Minority populations experience the effects of lead poisoning disproportionately.

Lead-based paint abatement is costly. In the United States, residents of about 400,000 homes, occupied by 900,000 young children, lack the means to correct lead-based paint hazards. The magnitude of this problem demands research on affordable methods of hazard control. One method is encapsulation, defined as any covering or coating that acts as a permanent barrier between the lead-based paint surface and the environment.

Two encapsulants were tested for reliability and effective life span through an accelerated lifetime experiment that applied stresses exceeding those encountered under normal use conditions. The resulting time-to-failure data were used to extrapolate the failure time under conditions of normal use. Statistical analysis and models of the test data allow forecasting of long-term reliability relative to the 20-year encapsulation requirement. Typical housing material specimens simulating walls and doors coated with lead-based paint were overstressed before encapsulation. A second, un-aged set was also tested. Specimens were monitored after the stress test with a surface chemical testing pad to identify the presence of lead breaking through the encapsulant.

Graphical analysis proposed by Shapiro and Meeker and the general log-linear model developed by Cox were used to obtain results. Findings for the 80% reliability time to failure varied, with close to 21 years of life under normal use conditions for encapsulant A. The application of product A on the aged gypsum and aged wood substrates yielded slightly lower times. Encapsulant B had an 80% reliable life of 19.78 years.
This study reveals that encapsulation technologies can offer safe and effective control of lead-based paint hazards and may be less expensive than other options. The U.S. Department of Health and Human Services and the CDC are committed to eliminating childhood lead poisoning by 2010. This ambitious target is feasible, provided there is an efficient application of innovative technology, a goal to which this study aims to contribute.
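To make the "80% reliable life" notion concrete: under a Weibull lifetime model (a common, though here only assumed, choice for accelerated-life extrapolation; the study itself cites Shapiro-Meeker graphical analysis and Cox's log-linear model), the time at which reliability falls to a target level has a closed form. The parameter values below are illustrative, not the study's fitted values:

```python
import math

def weibull_reliable_life(shape, scale_years, reliability=0.80):
    """Time t at which R(t) = exp(-(t/scale)^shape) drops to the target
    reliability, i.e. t = scale * (-ln R)^(1/shape)."""
    return scale_years * (-math.log(reliability)) ** (1.0 / shape)

def use_level_life(accelerated_life, acceleration_factor):
    """Under a log-linear acceleration model, failure times at use stress are
    the accelerated-test failure times scaled by a constant factor."""
    return accelerated_life * acceleration_factor
```

With a hypothetical shape of 2 and characteristic life of 50 years, the 80%-reliable life comes out near 23.6 years, comfortably above a 20-year requirement.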
Abstract:
Gene-based tests of association are frequently applied to common SNPs (MAF>5%) as an alternative to single-marker tests. In this analysis we conduct a variety of simulation studies applied to five popular gene-based tests investigating general trends related to their performance in realistic situations. In particular, we focus on the impact of non-causal SNPs and a variety of LD structures on the behavior of these tests. Ultimately, we find that non-causal SNPs can significantly impact the power of all gene-based tests. On average, we find that the “noise” from 6–12 non-causal SNPs will cancel out the “signal” of one causal SNP across five popular gene-based tests. Furthermore, we find complex and differing behavior of the methods in the presence of LD within and between non-causal and causal SNPs. Ultimately, better approaches for a priori prioritization of potentially causal SNPs (e.g., predicting functionality of non-synonymous SNPs), application of these methods to sequenced or fully imputed datasets, and limited use of window-based methods for assigning inter-genic SNPs to genes will improve power. However, significant power loss from non-causal SNPs may remain unless alternative statistical approaches robust to the inclusion of non-causal SNPs are developed.
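A crude intuition for why non-causal SNPs erode power (a toy calculation, not the paper's simulation design): in an unweighted sum-based ("burden"-style) gene statistic over independent SNPs, one causal SNP's expected z-score is diluted by the square root of the total number of SNPs in the gene:

```python
import math

def burden_expected_z(causal_z, n_noncausal):
    """Expected Z of an unweighted sum-based gene statistic combining one
    causal SNP (expected per-SNP z = causal_z) with independent non-causal
    SNPs (expected z = 0): the signal shrinks by sqrt(1 + n_noncausal).
    Illustrative only; real LD structure makes the behavior more complex."""
    return causal_z / math.sqrt(1 + n_noncausal)
```

A causal SNP that alone would reach z ≈ 3.3 falls below z = 1 once roughly ten independent non-causal SNPs join the gene, consistent in spirit with the 6-12 SNP range reported above.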
Abstract:
Research endeavors on spoken dialogue systems in the 1990s and 2000s have led to the deployment of commercial spoken dialogue systems (SDS) in microdomains such as customer service automation, reservation/booking, and question-answering systems. Recent research in SDS has focused on the development of applications in different domains (e.g., virtual counseling, personal coaches, social companions) that require more sophistication than the previous generation of commercial SDS. The focus of this research project is the delivery of behavior change interventions based on the brief intervention counseling style via spoken dialogue systems.

Brief interventions (BIs) are evidence-based, short, well-structured, one-on-one counseling sessions. Many challenges are involved in delivering BIs to people in need, such as finding the time to administer them in busy doctors' offices, obtaining the extra training that helps staff become comfortable providing these interventions, and managing the cost of delivering the interventions. Fortunately, recent developments in spoken dialogue systems make the development of systems that can deliver brief interventions possible.

The overall objective of this research is to develop a data-driven, adaptable dialogue system for brief interventions for problematic drinking behavior, based on reinforcement learning methods. The implications of this research project include, but are not limited to, assessing the feasibility of delivering structured brief health interventions with a data-driven spoken dialogue system. Furthermore, while the experimental system focuses on harmful alcohol drinking as a target behavior, the knowledge and experience produced may also lead to implementations of similarly structured health interventions and assessments in domains other than alcohol (e.g., obesity, drug use, lack of exercise), using statistical machine learning approaches.
In addition to the design of the dialogue system itself, the semantic and emotional meanings of user utterances have a high impact on the interaction. To perform domain-specific reasoning and recognize concepts in user utterances, a named-entity recognizer and an ontology are designed and evaluated. To understand affective information conveyed through text, lexicons and a sentiment analysis module are developed and tested.
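The reinforcement-learning approach mentioned above can be illustrated with a tabular Q-learning toy. Everything here is hypothetical: the states, actions, and rewards are a made-up miniature of a brief-intervention session, not the project's actual dialogue design:

```python
import random

# Hypothetical brief-intervention MDP: stages of the session are states,
# candidate counselor prompts are actions.
STATES = ["rapport", "feedback", "advice", "done"]
ACTIONS = ["open_question", "reflect", "summarize"]
GOOD = {"rapport": "open_question", "feedback": "reflect", "advice": "summarize"}

def step(state, action):
    """Deterministic toy transition: the 'right' prompt advances the session,
    a wrong one mildly disengages the user and stays in the same stage."""
    if action == GOOD[state]:
        nxt = STATES[STATES.index(state) + 1]
        return nxt, (10.0 if nxt == "done" else 1.0)
    return state, -1.0

def train_policy(episodes=2000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning of which prompt to use at each stage."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = "rapport"
        while s != "done":
            if rng.random() < eps:                    # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            s2, r = step(s, a)
            best_next = 0.0 if s2 == "done" else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES[:-1]}
```

After training, the greedy policy recovers the intended prompt for each stage; a data-driven system would learn an analogous mapping from real dialogue state features to intervention moves.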
Abstract:
It was recently shown [Phys. Rev. Lett. 110, 227201 (2013)] that the critical behavior of the random-field Ising model in three dimensions is ruled by a single universality class. This conclusion was reached only after a proper taming of the large scaling corrections of the model by applying a combined approach of various techniques coming from the zero- and positive-temperature toolboxes of statistical physics. In the present contribution we provide a detailed description of this combined scheme, explaining in detail the zero-temperature numerical scheme and developing the generalized fluctuation-dissipation formula that allowed us to compute connected and disconnected correlation functions of the model. We discuss the error evolution of our method and we illustrate the infinite-size extrapolation of several observables within phenomenological renormalization. We present an extension of the quotients method that allows us to obtain estimates of the critical exponent α of the specific heat of the model via the scaling of the bond energy, and we discuss the self-averaging properties of the system and the algorithmic aspects of the maximum-flow algorithm used.
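For readers unfamiliar with the quotients method, the following is a generic sketch of the idea (not the manuscript's exact formulation): an observable $O$ scaling as $O \sim L^{x_O/\nu}$ is compared at sizes $L$ and $2L$ at the value of the control parameter where a dimensionless ratio such as $\xi/L$ coincides for the two sizes,

```latex
\left.\frac{O(2L)}{O(L)}\right|_{\xi/L\ \text{crossing}}
  = 2^{\,x_O/\nu}\,\bigl(1 + \mathcal{O}(L^{-\omega})\bigr)
\quad\Longrightarrow\quad
\frac{x_O}{\nu} \approx \frac{1}{\ln 2}\,
  \ln\frac{O(2L)}{O(L)}\Bigg|_{\text{crossing}} .
```

Applied to the singular part of the bond energy, whose exponent is $(\alpha-1)/\nu$, such quotients yield an estimate of the specific-heat exponent $\alpha$.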
Abstract:
This thesis introduces two related lines of study on the classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with its novel use of nonlinear classification methods, has reached beyond the state of the art in classification accuracy on commonly used benchmarking HSI data [6], [13]. More importantly, it provides a clutter map with respect to a predetermined set of classes, a step toward real application situations in which image pixels do not necessarily fall into the predetermined set of classes to be identified, detected, or classified.
The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives, c) graph spectral transformation via locally linear embedding for dimension reduction, and d) a statistical ensemble for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.
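Component a) can be illustrated with a minimal sketch: score each spectral band by the Shannon entropy of its intensity histogram and keep the most informative bands. This is one crude stand-in for an entropy-based criterion; the thesis' exact "entropy spread" selection rule may differ:

```python
import math
from collections import Counter

def band_entropy(band_pixels, n_bins=32):
    """Shannon entropy (bits) of one band's intensity histogram."""
    lo, hi = min(band_pixels), max(band_pixels)
    width = (hi - lo) / n_bins or 1.0          # constant band -> single bin
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in band_pixels)
    n = len(band_pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def select_bands(cube, k):
    """Keep the indices of the k highest-entropy bands.
    `cube` is a list of bands, each a flat list of pixel intensities."""
    ranked = sorted(range(len(cube)), key=lambda b: band_entropy(cube[b]), reverse=True)
    return sorted(ranked[:k])
```

A nearly constant band carries almost no information (entropy near zero) and is dropped first, while bands whose intensities spread over many histogram bins are retained.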
Secondly, the work extends the HSI classification pipeline from a single HSI data cube to multiple HSI data cubes. Each cube, with its feature variation, is to be classified into one of multiple classes. The main challenge is deriving the cube-wise classification from the pixel-wise classification. The thesis presents an initial attempt to address this challenge and discusses the potential for further improvement.
Abstract:
OBJECTIVE: To demonstrate the application of causal inference methods to observational data in the obstetrics and gynecology field, particularly causal modeling and semi-parametric estimation. BACKGROUND: Human immunodeficiency virus (HIV)-positive women are at increased risk for cervical cancer and its treatable precursors. Determining whether potential risk factors such as hormonal contraception are true causes is critical for informing public health strategies as longevity increases among HIV-positive women in developing countries. METHODS: We developed a causal model of the factors related to combined oral contraceptive (COC) use and cervical intraepithelial neoplasia 2 or greater (CIN2+) and modified the model to fit the observed data, drawn from women in a cervical cancer screening program at HIV clinics in Kenya. Assumptions required for substantiation of a causal relationship were assessed. We estimated the population-level association using semi-parametric methods: g-computation, inverse probability of treatment weighting, and targeted maximum likelihood estimation. RESULTS: We identified 2 plausible causal paths from COC use to CIN2+: via HPV infection and via increased disease progression. Study data enabled estimation of the latter only with strong assumptions of no unmeasured confounding. Of 2,519 women under 50 screened per protocol, 219 (8.7%) were diagnosed with CIN2+. Marginal modeling suggested a 2.9% (95% confidence interval 0.1%, 6.9%) increase in prevalence of CIN2+ if all women under 50 were exposed to COC; the significance of this association was sensitive to method of estimation and exposure misclassification. CONCLUSION: Use of causal modeling enabled clear representation of the causal relationship of interest and the assumptions required to estimate that relationship from the observed data. Semi-parametric estimation methods provided flexibility and reduced reliance on correct model form. 
Although selected results suggest an increased prevalence of CIN2+ associated with COC, evidence is insufficient to conclude causality. Priority areas for future studies to better satisfy causal criteria are identified.
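Of the three semi-parametric estimators named above, g-computation is the most direct to sketch. The toy below standardizes over a single categorical confounder with empirical stratum-specific risks; the study itself used richer outcome models plus IPTW and TMLE, so this is only an illustration of the estimand:

```python
def g_computation(records):
    """Standardized ("marginal") risk difference for binary exposure A and
    outcome Y, adjusting for one categorical confounder W:
    sum_w P(W=w) * [ E(Y | A=1, W=w) - E(Y | A=0, W=w) ].
    records: list of (w, a, y) tuples with a, y in {0, 1}.
    Assumes positivity: every stratum contains both exposure arms."""
    strata = {}
    for w, a, y in records:
        strata.setdefault(w, {0: [0, 0], 1: [0, 0]})
        strata[w][a][0] += y        # event count
        strata[w][a][1] += 1        # arm size
    n = len(records)
    risk = {0: 0.0, 1: 0.0}
    for w, arms in strata.items():
        p_w = (arms[0][1] + arms[1][1]) / n          # P(W = w)
        for a in (0, 1):
            events, total = arms[a]
            risk[a] += p_w * (events / total)        # standardized risk
    return risk[1] - risk[0]
```

On data where the exposed-vs-unexposed risk difference is 0.1 within every stratum, the standardized marginal difference is likewise 0.1; confounded crude comparisons need not share this property.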
Abstract:
Current state-of-the-art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR-specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.
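The hyperbolic shape of a subsurface point-target response follows from simple two-way travel-time geometry, which is the kind of physics a GPR-specific augmentation can exploit. The sketch below renders one synthetic hyperbola into a blank B-scan; all parameter values are illustrative, not calibrated to any real radar or to this work's augmentation method:

```python
import math

def synthetic_hyperbola(n_traces=64, n_samples=64, x0=32, depth=10.0,
                        velocity=2.0, dt=0.5, amplitude=1.0):
    """Render a point-target response into a 2-D scan (rows = time samples,
    columns = antenna positions). Two-way travel time at trace x:
        t(x) = (2 / velocity) * sqrt(depth^2 + (x - x0)^2)."""
    scan = [[0.0] * n_traces for _ in range(n_samples)]
    for x in range(n_traces):
        t = (2.0 / velocity) * math.sqrt(depth ** 2 + (x - x0) ** 2)
        row = int(round(t / dt))
        if 0 <= row < n_samples:
            scan[row][x] = amplitude
    return scan
```

The apex of the hyperbola sits directly above the target, and traces farther from the apex record later arrivals, giving the characteristic downward-opening curve that augmented training sets can vary in depth, position, and velocity.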
Abstract:
Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease where the heart muscle is partially thickened and blood flow is - potentially fatally - obstructed. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests due to considerations of cost and time involved in interpreting the results of these tests by an expert cardiologist. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients into two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature-set. We address these by proposing imputation and feature-selection methods. We develop heartbeat-classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. 
The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
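The patient-level aggregation step described above, classifying a full 12-lead ECG from the proportion of its heartbeats labeled HCM, can be sketched directly. The 0.5 cutoff here is an illustrative default, not necessarily the dissertation's tuned threshold:

```python
def classify_patient(beat_labels, threshold=0.5):
    """Aggregate per-heartbeat classifier outputs into a patient-level call:
    flag HCM when the fraction of beats labeled 'HCM' exceeds the threshold.
    Returns (label, fraction)."""
    if not beat_labels:
        raise ValueError("no classified heartbeats")
    frac = sum(1 for b in beat_labels if b == "HCM") / len(beat_labels)
    return ("HCM" if frac > threshold else "non-HCM"), frac
```

In practice the threshold would be chosen on validation data to trade sensitivity against specificity for the screening setting.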
Abstract:
Energy saving, reduction of greenhouse gases, and increased use of renewables are key policies for achieving the European 2020 targets. In particular, distributed renewable energy sources, integrated with spatial planning, require novel methods to optimise supply and demand. In contrast with large-scale wind turbines, small and medium wind turbines (SMWTs) have a less extensive impact on the use of space and the power system; nevertheless, a significant spatial footprint remains, and good spatial planning is a necessity. To optimise the location of SMWTs, detailed knowledge of the spatial distribution of the average wind speed is essential. Hence, in this article, wind measurements and roughness maps were used to create a reliable annual mean wind speed map of Flanders at 10 m above the Earth's surface. Via roughness transformation, the surface wind speed measurements were converted into meso- and macroscale wind data. The data were further processed using seven different spatial interpolation methods in order to develop regional wind resource maps. Based on statistical analysis, it was found that the transformation into mesoscale wind, in combination with Simple Kriging, was the most adequate method for creating reliable maps for decision-making on optimal production sites for SMWTs in Flanders (Belgium).
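As a sketch of the simplest family of spatial interpolators typically benchmarked in such comparisons, here is inverse distance weighting (IDW). Note this is a stand-in for illustration; the study found Simple Kriging on mesoscale-transformed wind the most adequate, and Kriging additionally models spatial covariance rather than raw distance:

```python
def idw_interpolate(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate of wind speed at (x, y).
    stations: list of (xi, yi, speed) tuples; nearer stations get larger
    weights w_i = 1 / d_i^power."""
    num = den = 0.0
    for xi, yi, v in stations:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return v                      # exact interpolation at a station
        w = 1.0 / d2 ** (power / 2.0)
        num += w * v
        den += w
    return num / den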
Abstract:
Shape-based registration methods are frequently encountered in the domains of computer vision, image processing, and medical imaging. The registration problem is to find an optimal transformation/mapping between sets of rigid or nonrigid objects and to automatically solve for correspondences. In this paper we present a comparison of two different probabilistic methods, entropy and the growing neural gas network (GNG), as general feature-based registration algorithms. With entropy, shape modelling is performed by connecting the point sets with the highest probability of curvature information, while with GNG the point sets are connected using nearest-neighbour relationships derived from competitive Hebbian learning. In order to compare performances, we use different levels of shape deformation, starting with simple 2D MRI brain-ventricle shapes and moving to more complicated shapes such as hands. Both quantitative and qualitative results are given for both sets.
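The nearest-neighbour correspondence step underlying GNG-style matching can be sketched minimally (this is only the pairing rule, not the competitive Hebbian learning that builds the network's topology):

```python
def nearest_neighbour_correspondences(source, target):
    """For each (x, y) point in `source`, return the index of its nearest
    point in `target` under squared Euclidean distance."""
    pairs = []
    for sx, sy in source:
        j = min(range(len(target)),
                key=lambda k: (sx - target[k][0]) ** 2 + (sy - target[k][1]) ** 2)
        pairs.append(j)
    return pairs
```

In a full registration loop, these correspondences would be re-estimated after each update of the transformation, as in ICP-style schemes.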
Abstract:
Harmful algal blooms (HABs) are a natural global phenomenon increasing in severity and extent. Incidents have many economic, ecological, and human health impacts. Monitoring and providing early warning of toxic HABs are critical for protecting public health. Current monitoring programmes include measuring the number of toxic phytoplankton cells in the water and biotoxin levels in shellfish tissue. As these efforts are demanding and labour-intensive, methods that improve their efficiency are essential. This study compares the utilisation of a multitoxin surface plasmon resonance (multitoxin SPR) biosensor and an RNA microarray with enzyme-linked immunosorbent assay (ELISA) and analytical methods such as high-performance liquid chromatography with fluorescence detection (HPLC-FLD) and liquid chromatography–tandem mass spectrometry (LC–MS/MS) for toxic HAB monitoring efforts in Europe. Seawater samples (n = 256) from European waters, collected 2009–2011, were analysed for biotoxins: saxitoxin and analogues, okadaic acid and dinophysistoxins 1/2 (DTX1/DTX2), and domoic acid, responsible for paralytic shellfish poisoning (PSP), diarrheic shellfish poisoning (DSP), and amnesic shellfish poisoning (ASP), respectively. Biotoxins were detected mainly in samples from Spain and Ireland. France and Norway appeared to have the lowest number of toxic samples. Both the multitoxin SPR biosensor and the RNA microarray were more sensitive at detecting toxic HABs than standard light-microscopy phytoplankton monitoring. Correlations between the detection methods were assessed, with the overall agreement, based on statistical 2 × 2 comparison tables, between each pair of testing platforms ranging between 32% and 74% for all three toxin families, illustrating that no single testing method may be an ideal solution. An efficient early-warning monitoring system for the detection of toxic HABs could therefore be achieved by combining the multitoxin SPR biosensor and the RNA microarray.
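The overall agreement figures quoted above come from 2 × 2 comparison tables; the computation itself is simple percent agreement between two platforms' toxic/non-toxic calls on the same samples (a sketch of the statistic, not the study's code):

```python
def percent_agreement(results_a, results_b):
    """Overall agreement from a 2x2 comparison of two detection platforms:
    100 * (both positive + both negative) / total samples.
    Inputs are parallel lists of booleans (toxic / not toxic)."""
    if len(results_a) != len(results_b):
        raise ValueError("platforms must score the same samples")
    agree = sum(1 for a, b in zip(results_a, results_b) if a == b)
    return 100.0 * agree / len(results_a)
```

Percent agreement ignores chance agreement, which is one reason a complementary statistic such as Cohen's kappa is often reported alongside it.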