12 results for Data Interpretation, Statistical

at Digital Commons at Florida International University


Relevance:

90.00%

Publisher:

Abstract:

The multivariate normal distribution is commonly encountered across fields, and missing values are a frequent issue in practice. The purpose of this research was to estimate the parameters of the three-dimensional permutation-symmetric covariance normal distribution with complete data and under all possible patterns of incomplete data. In this study, maximum likelihood estimators (MLEs) with missing data were derived, and the properties of the MLEs as well as their sampling distributions were obtained. A Monte Carlo simulation study was used to evaluate the performance of the considered estimators for both cases, when the covariance parameter was known and when it was unknown. All results indicated that, compared to estimators obtained by omitting observations with missing data, the estimators derived in this article led to better performance. Furthermore, when the parameter was unknown, using its estimate led to the same conclusion.
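The kind of comparison run in such a simulation can be sketched in a few lines. The snippet below is a simplified illustration, not the dissertation's estimator: it contrasts a complete-case mean (rows with any missing value dropped) against an available-case mean under a permutation-symmetric covariance, with all parameter values (correlation 0.5, 20% missingness) chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

# Permutation-symmetric 3-D covariance: equal variances, equal correlations
# (illustrative values; the dissertation's parameters are not given).
rho, n, reps = 0.5, 50, 2000
cov = np.full((3, 3), rho) + (1 - rho) * np.eye(3)
mu = np.zeros(3)

mse_complete_case, mse_available = [], []
for _ in range(reps):
    x = rng.multivariate_normal(mu, cov, size=n)
    miss = rng.random((n, 3)) < 0.2            # 20% of values missing at random
    x_missing = np.where(miss, np.nan, x)
    # Complete-case analysis: drop every row with a missing value.
    cc = x_missing[~np.isnan(x_missing).any(axis=1)]
    # Available-case analysis: use every observed value per coordinate.
    ac_mean = np.nanmean(x_missing, axis=0)
    mse_complete_case.append(np.sum(cc.mean(axis=0) ** 2))
    mse_available.append(np.sum(ac_mean ** 2))

# Using all observed values estimates the (zero) mean more accurately
# than discarding partially observed rows.
print(np.mean(mse_complete_case), np.mean(mse_available))
```

With 20% missingness per coordinate, roughly half the rows are discarded by complete-case analysis, which is the gap the dissertation's estimators exploit.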

Relevance:

80.00%

Publisher:

Abstract:

Higher education is a distribution center of knowledge and economic, social, and cultural power (Cervero & Wilson, 2001). A critical approach to understanding a higher education classroom begins with recognizing the instructor's position of power and authority (Tisdell, Hanley, & Taylor, 2000). The power instructors wield exists mostly unquestioned, allowing for teaching practices that reproduce the existing societal patterns of inequity in the classroom (Brookfield, 2000).

The purpose of this hermeneutic phenomenological study was to explore students' experiences with the power of their instructors in a higher education classroom. A hermeneutic phenomenological study intertwines the interpretations of both the participants and the researcher about a lived experience to uncover layers of meaning, because the meanings of lived experiences are usually not readily apparent (van Manen, 1990). Fifteen participants were selected using criterion, convenience, and snowball sampling. The primary data-gathering method was semi-structured interviews guided by an interview protocol (Creswell, 2003). Data were interpreted using thematic reflection (van Manen, 1990).

Three themes emerged from data interpretation: (a) structuring of instructor-student relationships, (b) connecting power to instructor personality, and (c) learning to navigate the terrains of higher education. How interpersonal relationships were structured in a higher education classroom shaped how students perceived power in that classroom. Positive relationships were described using the metaphor of family and a perceived ethic of caring and nurturing by the instructor. As participants were consistently exposed to exercises of instructor power in a higher education classroom, they attributed those exercises of power to particular instructor traits rather than to systemic exercises of power. As participants progressed from undergraduate to graduate studies, they perceived the benefits of expertise in content or knowledge development as secondary to expertise in successfully navigating the social, cultural, political, and interpersonal terrains of higher education. Ultimately, participants expressed that higher education is not about what you know; it is about learning how to play the game. Implications for teaching in higher education and considerations for future research conclude the study.

Relevance:

80.00%

Publisher:

Abstract:

Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in applications such as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem by using a new transaction scheduling technique that allows a large transaction to commit safely with a probability that can exceed that of regular optimistic concurrency algorithms by several orders of magnitude. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented.

This dissertation also proposes a new query optimization technique (lazy queries). Lazy queries are an adaptive query execution scheme that optimizes itself as the query runs. Lazy queries can be used to find an intersection of sub-queries very efficiently, requiring neither full execution of large sub-queries nor any statistical knowledge about the data.

An efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be used effectively in a variety of database types. In particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. This algorithm ensures serializability and external consistency by using logical clocks and backward validation of transactional queries. A formal proof of correctness of the proposed algorithm is also presented.
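The backward-validation idea mentioned above can be sketched in a few lines. This is a generic illustration of optimistic concurrency with backward validation, not the dissertation's scheduling algorithm, and all class and method names are hypothetical:

```python
# Minimal sketch of optimistic concurrency with backward validation:
# a transaction records the version of everything it reads, and at
# commit time it aborts if any of those versions has since changed.
class Store:
    def __init__(self):
        self.data, self.version = {}, {}

    def begin(self):
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        txn["reads"][key] = self.version.get(key, 0)
        return self.data.get(key)

    def write(self, txn, key, value):
        txn["writes"][key] = value          # buffered until commit

    def commit(self, txn):
        # Backward validation: check reads against committed state.
        for key, seen in txn["reads"].items():
            if self.version.get(key, 0) != seen:
                return False                 # conflict: abort
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True

store = Store()
t1, t2 = store.begin(), store.begin()
store.read(t1, "x")
store.write(t2, "x", 42)
t2_ok = store.commit(t2)   # t2 commits first and bumps x's version
t1_ok = store.commit(t1)   # t1's read of x is now stale
print(t2_ok, t1_ok)        # True False
```

Because validation compares only version numbers at commit time, no locks are held while the large transaction runs, which is why the abort rate, not lock contention, becomes the central problem the dissertation addresses.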

Relevance:

80.00%

Publisher:

Abstract:

There is an increasing demand for DNA analysis because of the sensitivity of the method and its ability to uniquely identify and distinguish individuals with a high degree of certainty. But this demand has led to huge backlogs in evidence lockers, since current DNA extraction protocols require long processing times. The DNA analysis procedure becomes more complicated when analyzing sexual assault casework samples, where the evidence contains more than one contributor. Additional processing to separate different cell types in order to simplify the final data interpretation further adds to the existing cumbersome protocols. The goal of the present project is to develop a rapid and efficient extraction method that permits selective digestion of mixtures.

Selective recovery of male DNA was achieved with as little as 15 minutes of lysis time upon exposure to high pressure under alkaline conditions. Pressure cycling technology (PCT) is carried out in a barocycler that has a small footprint and is semi-automated. Whereas typically less than 10% of the male DNA is recovered using the standard extraction protocol for rape kits, almost seven times more male DNA was recovered from swabs using this novel method. Various parameters, including instrument settings and buffer composition, were optimized to achieve selective recovery of sperm DNA. Developmental validation studies were also performed to determine the efficiency of this method in processing samples exposed to various conditions that can affect the quality of the extraction and the final DNA profile.

An easy-to-use interface, minimal manual interference, and the ability to achieve high yields with simple reagents in a relatively short time make this an ideal method for potential application in analyzing sexual assault samples.


Relevance:

80.00%

Publisher:

Abstract:

A purpose of this research study was to demonstrate the practical linguistic study and evaluation of dissertations using two examples of the latest technology: the microcomputer and the optical scanner. That involved developing efficient methods for data entry and creating computer algorithms appropriate for personal linguistic studies. The goal was to develop a prototype investigation demonstrating practical solutions for maximizing the linguistic potential of the dissertation database. The mode of text entry was a Dest PC Scan 1000 optical scanner, whose function was to copy the complete stack of educational dissertations from the Florida Atlantic University Library into an I.B.M. XT microcomputer. The optical scanner demonstrated its practical value by copying 15,900 pages of dissertation text directly into the microcomputer. A total of 199 dissertations, or 72% of the entire stack of education dissertations (277), were successfully copied into the microcomputer's word processor, where each dissertation was analyzed for a variety of syntax frequencies. The results of the study demonstrated the practical use of the optical scanner for data entry, the microcomputer for data and statistical analysis, and the availability of the college library as a natural setting for text studies. A supplemental benefit was the establishment of a computerized dissertation corpus which could be used for future research and study. The final step was to build a linguistic model of the differences in dissertation writing styles by creating 7 factors from 55 dependent variables through principal components factor analysis. The 7 factors (textual components) were then named and described on a hypothetical construct defined as a continuum from a conversational, interactional style to a formal, academic writing style. The 7 factors were then grouped through discriminant analysis to create discriminant functions for each of the 7 independent variables.

The results indicated that a conversational, interactional writing style was associated with more recent dissertations (1972-1987), an increase in author's age, females, and the department of Curriculum and Instruction. A formal, academic writing style was associated with older dissertations, younger authors, males, and the department of Administration and Supervision. It was concluded that there were no significant differences in writing style due to subject matter (community college studies) compared to other subject matter. It was also concluded that there were no significant differences in writing style due to the location of dissertation origin (Florida Atlantic University, University of Central Florida, Florida International University).
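The factor-extraction step described above can be sketched with a dependency-free principal components computation. The data here are synthetic stand-ins for the study's 199 x 55 syntax-frequency matrix; the study's actual measurements and factor labels are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in data: 199 dissertations x 55 syntax-frequency variables.
X = rng.normal(size=(199, 55))

# Principal components "factor" extraction: eigendecomposition of the
# correlation matrix of the standardized variables, keeping the 7
# components with the largest eigenvalues.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:7]
loadings = eigvecs[:, order]     # 55 x 7 factor loadings
factors = Xs @ loadings          # 199 x 7 factor scores per dissertation
print(factors.shape)             # (199, 7)
```

The resulting factor scores are what a discriminant analysis would then take as its 7 independent variables.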

Relevance:

40.00%

Publisher:

Abstract:

This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as histogram binning, inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data.

Researchers in the field have contended with this dilemma for many years, resorting either to hardware approaches, which are rather costly and carry inherent calibration and noise effects, or to software techniques based on filtering out the binning effect but without successfully preserving the statistical content of the original data.

The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of this inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content.

In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer, multi-channel fashion. Although the mathematics of this new mapping technique seems intricate, the concise nature of the derivations allows for an implementation procedure that lends itself to real-time implementation using lookup tables, a task that is also introduced in this dissertation.
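The dissertation's exact mapping is not given in the abstract, but the general idea of non-integer, multi-channel accumulation can be illustrated by splitting each count fractionally between the two channels nearest its non-integer position on the log axis. The channel count and axis range below are arbitrary:

```python
import numpy as np

def accumulate_log(values, n_channels=256, lo=1.0, hi=10_000.0):
    """Accumulate values into a log-spaced histogram, splitting each
    count fractionally between the two nearest channels (a generic
    illustration, not the dissertation's exact mapping)."""
    hist = np.zeros(n_channels)
    # Non-integer channel position of each value on the log axis.
    pos = (np.log(values) - np.log(lo)) / (np.log(hi) - np.log(lo))
    pos = np.clip(pos * (n_channels - 1), 0, n_channels - 1)
    low = np.floor(pos).astype(int)
    frac = pos - low
    np.add.at(hist, low, 1 - frac)                              # lower channel
    np.add.at(hist, np.minimum(low + 1, n_channels - 1), frac)  # upper channel
    return hist

h = accumulate_log(np.array([10.0, 100.0, 1000.0]))
print(h.sum())   # total count is preserved: ~3.0
```

In a real-time setting the `pos` computation can be replaced by a precomputed lookup table indexed by the raw channel value, in the spirit of the implementation the abstract describes.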

Relevance:

40.00%

Publisher:

Abstract:

This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and for carrying out feasibility studies were provided by Beckman-Coulter. Although the data used are flow cytometric in nature, the mathematics are general in their derivation and apply to other types of data as long as their statistical behavior approximates a Gaussian distribution.

Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to accommodate the accumulation process involved with histogram data. When data are accumulated into a frequency histogram, the data are inherently smoothed in a linear fashion, since an averaging effect takes place as the histogram is generated. This new filtering scheme addresses data accumulated in the uneven resolution of the channels of the frequency histogram.

The qualitative interpretation of flow cytometric data is currently a time-consuming and imprecise method for evaluating histogram data. This method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis.
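A minimal sketch of the two quantities the figure of merit relates, using common textbook definitions (the dissertation's exact formulation is not given in the abstract): overlap as the integral of the pointwise minimum of two densities, and a Fisher-style separation measure:

```python
import numpy as np

def gaussian_overlap(mu1, s1, mu2, s2, n=100_000):
    """Overlap of two 1-D Gaussians: the integral of the pointwise
    minimum of their densities (1.0 = identical, 0.0 = disjoint)."""
    lo = min(mu1 - 6 * s1, mu2 - 6 * s2)
    hi = max(mu1 + 6 * s1, mu2 + 6 * s2)
    x = np.linspace(lo, hi, n)
    p = np.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))
    q = np.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * np.sqrt(2 * np.pi))
    return float(np.sum(np.minimum(p, q)) * (x[1] - x[0]))

def fisher_distance(mu1, s1, mu2, s2):
    """A Fisher-style separation measure: squared mean difference
    normalized by the summed variances."""
    return (mu1 - mu2) ** 2 / (s1 ** 2 + s2 ** 2)

ov_same = gaussian_overlap(0, 1, 0, 1)   # identical distributions: ~1.0
ov_far = gaussian_overlap(0, 1, 5, 1)    # well separated: ~0.01
print(ov_same, ov_far)
```

As the means separate, the Fisher-style distance grows while the overlap shrinks, which is the monotone relationship the figure of merit exploits.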

Relevance:

40.00%

Publisher:

Abstract:

Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data, so effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified.

The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
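Two of the four pooling techniques have compact closed forms and can be sketched without external dependencies; the `weights` argument gives the Liptak-Stouffer weighted variant, and the input p-values below are purely illustrative:

```python
import math
from statistics import NormalDist

def fisher_pool(pvals):
    """Fisher's inverse chi-square method: -2*sum(ln p) ~ chi2 with 2k df.
    For even degrees of freedom the chi-square survival function has a
    closed form, used here to stay dependency-free."""
    x = -2.0 * sum(math.log(p) for p in pvals)
    k = len(pvals)
    # P(chi2 with 2k df > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                  for i in range(k))

def stouffer_pool(pvals, weights=None):
    """Stouffer's Z-transform method (Liptak-Stouffer when weights given)."""
    nd = NormalDist()
    w = weights or [1.0] * len(pvals)
    z = sum(wi * nd.inv_cdf(1 - p) for wi, p in zip(w, pvals))
    z /= math.sqrt(sum(wi ** 2 for wi in w))
    return 1 - nd.cdf(z)

pf = fisher_pool([0.04, 0.03])
ps = stouffer_pool([0.04, 0.03])
print(pf, ps)   # both pooled p-values are well below either input
```

Two individually marginal results pool into much stronger evidence, which is exactly the benefit the dissertation seeks when combining independent microarray experiments.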

Relevance:

40.00%

Publisher:

Abstract:

Flow cytometry analyzers have become trusted companions due to their ability to perform fast and accurate analyses of human blood. The aim of these analyses is to determine the possible existence of abnormalities in the blood that have been correlated with serious disease states, such as infectious mononucleosis, leukemia, and various cancers. Though these analyzers provide important feedback, it is always desirable to improve the accuracy of the results, as evidenced by the misclassifications reported by some users of these devices. It is advantageous to provide a pattern interpretation framework that offers better classification ability than is currently available. Toward this end, the purpose of this dissertation was to establish a feature extraction and pattern classification framework capable of providing improved accuracy for detecting specific hematological abnormalities in flow cytometric blood data.

This involved extracting a unique and powerful set of shift-invariant statistical features from the multi-dimensional flow cytometry data and then using these features as inputs to a pattern classification engine composed of an artificial neural network (ANN). The contribution of this method consisted of developing a descriptor matrix that can be used to reliably assess whether a donor's blood pattern exhibits a clinically abnormal level of variant lymphocytes, which are blood cells potentially indicative of disorders such as leukemia and infectious mononucleosis.

This study showed that the set of shift-and-rotation-invariant statistical features extracted from the eigensystem of the flow cytometric data pattern performs better than other commonly used features in this type of disease detection, exhibiting an accuracy of 80.7%, a sensitivity of 72.3%, and a specificity of 89.2%. This performance represents a major improvement for this type of hematological classifier, which has historically been plagued by poor performance, with accuracies as low as 60% in some cases. This research ultimately shows that an improved feature space was developed that can deliver improved performance for the detection of variant lymphocytes in human blood, thus providing significant utility in the realm of suspect-flagging algorithms for the detection of blood-related diseases.
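The three reported rates follow directly from confusion-matrix counts. The counts below are hypothetical, chosen only so the arithmetic reproduces the reported 80.7% / 72.3% / 89.2%; the dissertation's raw counts are not given in the abstract:

```python
def classifier_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # abnormal samples correctly flagged
        "specificity": tn / (tn + fp),   # normal samples correctly passed
    }

# Hypothetical counts (65 abnormal, 65 normal donors) consistent with
# the reported rates.
m = classifier_metrics(tp=47, tn=58, fp=7, fn=18)
print(m)   # accuracy ~0.808, sensitivity ~0.723, specificity ~0.892
```

Sensitivity and specificity are reported separately because, for a suspect-flagging algorithm, a missed abnormal sample (false negative) and a falsely flagged normal sample (false positive) carry very different costs.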

Relevance:

40.00%

Publisher:

Abstract:

Stable isotope analysis has become a standard ecological tool for elucidating feeding relationships of organisms and determining food web structure and connectivity. There remain important questions concerning rates at which stable isotope values are incorporated into tissues (turnover rates) and the change in isotope value between a tissue and a food source (discrimination values). These gaps in our understanding necessitate experimental studies to adequately interpret field data. Tissue turnover rates and discrimination values vary among species and have been investigated in a broad array of taxa. However, little attention has been paid to ectothermic top predators in this regard. We quantified the turnover rates and discrimination values for three tissues (scutes, red blood cells, and plasma) in American alligators (Alligator mississippiensis). Plasma turned over faster than scutes or red blood cells, but turnover rates of all three tissues were very slow in comparison to those in endothermic species. Alligator ¹⁵N discrimination values were surprisingly low in comparison to those of other top predators and varied between experimental and control alligators. The variability of ¹⁵N discrimination values highlights the difficulties in using ¹⁵N to assign absolute and possibly even relative trophic levels in field studies. Our results suggest that interpreting stable isotope data based on parameter estimates from other species can be problematic and that large ectothermic tetrapod tissues may be characterized by unique stable isotope dynamics relative to species occupying lower trophic levels and endothermic tetrapods.
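Turnover of this kind is commonly described with a first-order exponential model in the stable-isotope literature. The sketch below uses that standard form with invented isotope values and half-lives; the paper's fitted parameters are not reproduced here:

```python
import math

def tissue_isotope(t, delta_initial, delta_final, half_life):
    """Standard first-order model of isotopic turnover after a diet switch:
    delta(t) = delta_final + (delta_initial - delta_final) * exp(-lam * t),
    where lam is derived from the tissue's turnover half-life (in days)."""
    lam = math.log(2) / half_life
    return delta_final + (delta_initial - delta_final) * math.exp(-lam * t)

# A slow-turnover tissue (long half-life) is still far from equilibrium
# after 200 days, while a fast-turnover tissue has nearly converged.
# Half-lives and delta values are illustrative only.
slow = tissue_isotope(200, -25.0, -18.0, half_life=300)
fast = tissue_isotope(200, -25.0, -18.0, half_life=30)
print(slow, fast)
```

The contrast between the two calls illustrates why very slow tissue turnover, as reported for the alligators, makes field isotope values lag the current diet by months or more.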
