6 resultados para automated thematic analysis of textual data
em Digital Commons at Florida International University
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
This dissertation introduces a new approach for assessing the effects of pediatric epilepsy on the language connectome. Two novel data-driven network construction approaches are presented. These methods rely on connecting different brain regions using either extent or intensity of language related activations as identified by independent component analysis of fMRI data. An auditory description decision task (ADDT) paradigm was used to activate the language network for 29 patients and 30 controls recruited from three major pediatric hospitals. Empirical evaluations illustrated that pediatric epilepsy can cause, or is associated with, a network efficiency reduction. Patients showed a propensity to inefficiently employ the whole brain network to perform the ADDT language task; on the contrary, controls seemed to efficiently use smaller segregated network components to achieve the same task. To explain the causes of the decreased efficiency, graph theoretical analysis was carried out. The analysis revealed no substantial global network feature differences between the patient and control groups. It also showed that for both subject groups the language network exhibited small-world characteristics; however, the patient's extent of activation network showed a tendency towards more random networks. It was also shown that the intensity of activation network displayed ipsilateral hub reorganization on the local level. The left hemispheric hubs displayed greater centrality values for patients, whereas the right hemispheric hubs displayed greater centrality values for controls. This hub hemispheric disparity was not correlated with a right atypical language laterality found in six patients. Finally it was shown that a multi-level unsupervised clustering scheme based on self-organizing maps, a type of artificial neural network, and k-means was able to fairly and blindly separate the subjects into their respective patient or control groups. The clustering was initiated using the local nodal centrality measurements only. Compared to the extent of activation network, the intensity of activation network clustering demonstrated better precision. This outcome supports the assertion that the local centrality differences presented by the intensity of activation network can be associated with focal epilepsy.^
Resumo:
This dissertation introduces a new approach for assessing the effects of pediatric epilepsy on the language connectome. Two novel data-driven network construction approaches are presented. These methods rely on connecting different brain regions using either extent or intensity of language related activations as identified by independent component analysis of fMRI data. An auditory description decision task (ADDT) paradigm was used to activate the language network for 29 patients and 30 controls recruited from three major pediatric hospitals. Empirical evaluations illustrated that pediatric epilepsy can cause, or is associated with, a network efficiency reduction. Patients showed a propensity to inefficiently employ the whole brain network to perform the ADDT language task; on the contrary, controls seemed to efficiently use smaller segregated network components to achieve the same task. To explain the causes of the decreased efficiency, graph theoretical analysis was carried out. The analysis revealed no substantial global network feature differences between the patient and control groups. It also showed that for both subject groups the language network exhibited small-world characteristics; however, the patient’s extent of activation network showed a tendency towards more random networks. It was also shown that the intensity of activation network displayed ipsilateral hub reorganization on the local level. The left hemispheric hubs displayed greater centrality values for patients, whereas the right hemispheric hubs displayed greater centrality values for controls. This hub hemispheric disparity was not correlated with a right atypical language laterality found in six patients. Finally it was shown that a multi-level unsupervised clustering scheme based on self-organizing maps, a type of artificial neural network, and k-means was able to fairly and blindly separate the subjects into their respective patient or control groups. The clustering was initiated using the local nodal centrality measurements only. Compared to the extent of activation network, the intensity of activation network clustering demonstrated better precision. This outcome supports the assertion that the local centrality differences presented by the intensity of activation network can be associated with focal epilepsy.
Resumo:
Historical accuracy is only one of the components of a scholarly college textbook used to teach the history of jazz music. Textbooks in this field should include accurate ethnic representation of the most important musical figures as jazz is considered the only original American art form. As college and universities celebrate diversity, it is important that jazz history be accurate and complete. ^ The purpose of this study was to examine the content of the most commonly used jazz history textbooks currently used at American colleges and universities. This qualitative study utilized grounded and textual analysis to explore the existence of ethnic representation in these texts. The methods used were modeled after the work of Kane and Selden each of whom conducted a content analysis focused on a limited field of study. This study is focused on key jazz artists and composers whose work was created in the periods of early jazz (1915-1930), swing (1930-1945) and modern jazz (1945-1960). ^ This study considered jazz notables within the texts in terms of ethnic representation, authors' use of language, contributions to the jazz canon, and place in the standard jazz repertoire. Appropriate historical sections of the selected texts were reviewed and coded using predetermined rubrics. Data were then aggregated into categories and then analyzed according to the character assigned to the key jazz personalities noted in the text as well as the comparative standing afforded each personality. ^ The results of this study demonstrate that particular key African-American jazz artists and composers occupy a significant place in these texts while other significant individuals representing other ethnic groups are consistently overlooked. This finding suggests that while America and the world celebrates the quality of the product of American jazz as great musically and significant socially, many ethnic contributors are not mentioned with the result being a less than complete picture of the evolution of this American art form. ^
Resumo:
The primary goal of this dissertation is the study of patterns of viral evolution inferred from serially-sampled sequence data, i.e., sequence data obtained from strains isolated at consecutive time points from a single patient or host. RNA viral populations have an extremely high genetic variability, largely due to their astronomical population sizes within host systems, high replication rate, and short generation time. It is this aspect of their evolution that demands special attention and a different approach when studying the evolutionary relationships of serially-sampled sequence data. New methods that analyze serially-sampled data were developed shortly after a groundbreaking HIV-1 study of several patients from which viruses were isolated at recurring intervals over a period of 10 or more years. These methods assume a tree-like evolutionary model, while many RNA viruses have the capacity to exchange genetic material with one another using a process called recombination. ^ A genealogy involving recombination is best described by a network structure. A more general approach was implemented in a new computational tool, Sliding MinPD, one that is mindful of the sampling times of the input sequences and that reconstructs the viral evolutionary relationships in the form of a network structure with implicit representations of recombination events. The underlying network organization reveals unique patterns of viral evolution and could help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. In order to comprehensively test the developed methods and to carry out comparison studies with other methods, synthetic data sets are critical. Therefore, appropriate sequence generators were also developed to simulate the evolution of serially-sampled recombinant viruses, new and more through evaluation criteria for recombination detection methods were established, and three major comparison studies were performed. The newly developed tools were also applied to "real" HIV-1 sequence data and it was shown that the results represented within an evolutionary network structure can be interpreted in biologically meaningful ways. ^
Resumo:
Disasters are complex events characterized by damage to key infrastructure and population displacements into disaster shelters. Assessing the living environment in shelters during disasters is a crucial health security concern. Until now, jurisdictional knowledge and preparedness on those assessment methods, or deficiencies found in shelters is limited. A cross-sectional survey (STUSA survey) ascertained knowledge and preparedness for those assessments in all 50 states, DC, and 5 US territories. Descriptive analysis of overall knowledge and preparedness was performed. Fisher’s exact statistics analyzed differences between two groups: jurisdiction type and population size. Two logistic regression models analyzed earthquakes and hurricane risks as predictors of knowledge and preparedness. A convenience sample of state shelter assessments records (n=116) was analyzed to describe environmental health deficiencies found during selected events. Overall, 55 (98%) of jurisdictions responded (states and territories) and appeared to be knowledgeable of these assessments (states 92%, territories 100%, p = 1.000), and engaged in disaster planning with shelter partners (states 96%, territories 83%, p = 0.564). Few had shelter assessment procedures (states 53%, territories 50%, p = 1.000); or training in disaster shelter assessments (states 41%, 60% territories, p = 0.638). Knowledge or preparedness was not predicted by disaster risks, population size, and jurisdiction type in neither model. Knowledge: hurricane (Adjusted OR 0.69, 95% C.I. 0.06-7.88); earthquake (OR 0.82, 95% C.I. 0.17-4.06); and both risks (OR 1.44, 95% C.I. 0.24-8.63); preparedness model: hurricane (OR 1.91, 95% C.I. 0.06-20.69); earthquake (OR 0.47, 95% C.I. 0.7-3.17); and both risks (OR 0.50, 95% C.I. 0.06-3.94). Environmental health deficiencies documented in shelter assessments occurred mostly in: sanitation (30%); facility (17%); food (15%); and sleeping areas (12%); and during ice storms and tornadoes. More research is needed in the area of environmental health assessments of disaster shelters, particularly, in those areas that may provide better insight into the living environment of all shelter occupants and potential effects in disaster morbidity and mortality. Also, to evaluate the effectiveness and usefulness of these assessments methods and the data available on environmental health deficiencies in risk management to protect those at greater risk in shelter facilities during disasters.