12 results for Data processing and analysis
in Digital Commons at Florida International University
Abstract:
This paper details the research methods an introductory qualitative research class used both to study an issue related to race and identity and to familiarize themselves with data collection strategies. Throughout the paper the authors attempt to capture the challenges, disagreements, and consensus building that marked this unusual research endeavor.
Abstract:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (the Fisher inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified.

The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
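The four p-value pooling techniques named in this abstract have standard closed forms. A minimal sketch in Python, assuming per-experiment p-values for one gene are already available (the p-values and weights below are purely illustrative, not data from the dissertation):

```python
import numpy as np
from scipy import stats

# Hypothetical per-experiment p-values for one gene, and illustrative weights
# (e.g., sample sizes) for the weighted method.
p = np.array([0.04, 0.20, 0.01])
weights = np.array([30.0, 50.0, 20.0])
k = len(p)

# Fisher inverse chi-square method: -2 * sum(ln p_i) follows chi-square with 2k df.
fisher_stat = -2.0 * np.sum(np.log(p))
fisher_p = stats.chi2.sf(fisher_stat, df=2 * k)

# Stouffer's Z-transform method: Z = sum(z_i) / sqrt(k), where z_i = Phi^{-1}(1 - p_i).
z = stats.norm.isf(p)
stouffer_p = stats.norm.sf(z.sum() / np.sqrt(k))

# Liptak-Stouffer weighted Z-method: Z = sum(w_i * z_i) / sqrt(sum(w_i^2)).
liptak_p = stats.norm.sf(np.dot(weights, z) / np.sqrt(np.sum(weights**2)))

# Logit method: statistic -sum(ln(p_i / (1 - p_i))); its null distribution is
# approximately a scaled t distribution (conversion to a p-value omitted here).
logit_stat = -np.sum(np.log(p / (1.0 - p)))

print(fisher_p, stouffer_p, liptak_p, logit_stat)
```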
Abstract:
This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in a Distributed GIS Database. Modern Geographic Information System (GIS) databases are often large and complicated. Therefore, data partitioning and replication management problems need to be addressed in the development of an efficient and scalable solution.

Part of the research is to study the patterns of geographical raster data processing and to propose algorithms to improve the availability of such data. These algorithms and approaches target the granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service (QoS), considering distributed data delivery and processing. To achieve this goal, a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images on demand and generates mosaicked tiles only for the required region, according to the user's requirements such as resolution, temporal range, and target bands, to reduce redundancy in storage and to utilize available computing and storage resources more efficiently.

Another part of the research pursued methods for efficiently acquiring GIS data from external heterogeneous databases and Web services, as well as end-user GIS data delivery enhancements, automation, and 3D virtual reality presentation.

Vast numbers of computing, network, and storage resources on the Internet are idle or not fully utilized. The proposed "Crawling Distributed Operating System" (CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in a GIS database context.

The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in the creation of the TerraFly GIS database, which is used by the US government, researchers, and the general public to facilitate Web access to remotely sensed imagery and GIS vector information.
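A minimal sketch of the on-demand tile selection idea described above. The metadata fields, selection rule, and data are hypothetical stand-ins; this is not the TerraFly implementation, only an illustration of filtering source imagery by region, resolution, temporal range, and bands before mosaicking:

```python
from dataclasses import dataclass

@dataclass
class SourceImage:
    path: str
    bbox: tuple        # (min_lon, min_lat, max_lon, max_lat)
    resolution: float  # meters per pixel (smaller = finer)
    year: int
    bands: set

def overlaps(a, b):
    """Axis-aligned bounding-box intersection test."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def select_for_tile(images, tile_bbox, max_resolution, year_range, bands):
    """Pick source images that can contribute to the requested tile, finest
    resolution first, so a mosaic is assembled only for the requested region."""
    candidates = [
        img for img in images
        if overlaps(img.bbox, tile_bbox)
        and img.resolution <= max_resolution
        and year_range[0] <= img.year <= year_range[1]
        and bands <= img.bands
    ]
    return sorted(candidates, key=lambda img: img.resolution)

# Usage with made-up imagery metadata for one requested tile.
imgs = [SourceImage("a.tif", (-80.5, 25.5, -80.0, 26.0), 1.0, 2004, {"R", "G", "B"}),
        SourceImage("b.tif", (-80.3, 25.6, -80.1, 25.9), 0.3, 2006, {"R", "G", "B"})]
print(select_for_tile(imgs, (-80.25, 25.7, -80.0, 25.95), 1.0, (2003, 2007), {"R", "G"}))
```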
Abstract:
Accurately assessing the extent of myocardial tissue injury induced by myocardial infarction (MI) is critical to the planning and optimization of MI patient management. With this in mind, this study investigated the feasibility of using combined fluorescence and diffuse reflectance spectroscopy to characterize a myocardial infarct at the different stages of its development. An animal study was conducted using twenty male Sprague-Dawley rats with MI. In vivo fluorescence spectra at 337 nm excitation and diffuse reflectance between 400 nm and 900 nm were measured from the heart using a portable fiber-optic spectroscopic system. Spectral acquisition was performed on (1) the normal heart region, (2) the region immediately surrounding the infarct, and (3) the infarcted region, at one, two, three, and four weeks into MI development. The spectral data were divided into six subgroups according to the histopathological features associated with various degrees/severities of myocardial tissue injury as well as various stages of myocardial tissue remodeling post infarction. Various data processing and analysis techniques were employed to recognize the representative spectral features corresponding to various histopathological features associated with myocardial infarction. The identified spectral features were utilized in discriminant analysis to further evaluate their effectiveness in classifying tissue injuries induced by MI. In this study, it was observed that MI induced significant alterations (p < 0.05) in the diffuse reflectance spectra, especially between 450 nm and 600 nm, from myocardial tissue within the infarcted and surrounding regions. In addition, MI induced a significant elevation in fluorescence intensities at 400 and 460 nm from the myocardial tissue in the same regions. The extent of these spectral alterations was related to the duration of the infarction. Using the spectral features identified, an effective tissue injury classification algorithm was developed that produced a satisfactory overall classification result (87.8%). The findings of this research support the concept that optical spectroscopy represents a useful tool for non-invasively determining the in vivo pathophysiological features of a myocardial infarct and its surrounding tissue, thereby providing valuable real-time feedback to surgeons during various surgical interventions for MI.
Abstract:
Accurately assessing the extent of myocardial tissue injury induced by myocardial infarction (MI) is critical to the planning and optimization of MI patient management. With this in mind, this study investigated the feasibility of using combined fluorescence and diffuse reflectance spectroscopy to characterize a myocardial infarct at the different stages of its development. An animal study was conducted using twenty male Sprague-Dawley rats with MI. In vivo fluorescence spectra at 337 nm excitation and diffuse reflectance between 400 nm and 900 nm were measured from the heart using a portable fiber-optic spectroscopic system. Spectral acquisition was performed on (1) the normal heart region, (2) the region immediately surrounding the infarct, and (3) the infarcted region, at one, two, three, and four weeks into MI development. The spectral data were divided into six subgroups according to the histopathological features associated with various degrees/severities of myocardial tissue injury as well as various stages of myocardial tissue remodeling post infarction. Various data processing and analysis techniques were employed to recognize the representative spectral features corresponding to various histopathological features associated with myocardial infarction. The identified spectral features were utilized in discriminant analysis to further evaluate their effectiveness in classifying tissue injuries induced by MI. In this study, it was observed that MI induced significant alterations (p < 0.05) in the diffuse reflectance spectra, especially between 450 nm and 600 nm, from myocardial tissue within the infarcted and surrounding regions. In addition, MI induced a significant elevation in fluorescence intensities at 400 and 460 nm from the myocardial tissue in the same regions. The extent of these spectral alterations was related to the duration of the infarction. Using the spectral features identified, an effective tissue injury classification algorithm was developed that produced a satisfactory overall classification result (87.8%). The findings of this research support the concept that optical spectroscopy represents a useful tool for non-invasively determining the in vivo pathophysiological features of a myocardial infarct and its surrounding tissue, thereby providing valuable real-time feedback to surgeons during various surgical interventions for MI.
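The discriminant-analysis step mentioned in the two abstracts above can be sketched as follows, assuming a feature matrix of reflectance and fluorescence intensities at the wavelengths of interest has already been extracted. The arrays here are random placeholders (not the study's data), and linear discriminant analysis with cross-validation is used simply as one concrete form of the classification step:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder features: e.g., diffuse reflectance values sampled between
# 450 and 600 nm plus fluorescence intensities at 400 and 460 nm,
# one row per measurement site.
X = rng.normal(size=(120, 8))
# Placeholder labels for the six histopathology-defined subgroups.
y = rng.integers(0, 6, size=120)

clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())
```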
Abstract:
This research pursued the conceptualization and real-time verification of a system that allows a computer user to control the cursor of a computer interface without using his/her hands. The target user groups for this system are individuals who are unable to use their hands due to spinal dysfunction or other afflictions, and individuals who must use their hands for higher priority tasks while still requiring interaction with a computer.

The system receives two forms of input from the user: electromyogram (EMG) signals from muscles in the face and point-of-gaze coordinates produced by an Eye Gaze Tracking (EGT) system. In order to produce reliable cursor control from the two forms of user input, the development of this EMG/EGT system addressed three key requirements: an algorithm was created to accurately translate EMG signals due to facial movements into cursor actions, a separate algorithm was created that recognized an eye gaze fixation and provided an estimate of the associated eye gaze position, and an information fusion protocol was devised to efficiently integrate the outputs of these algorithms.

Experiments were conducted to compare the performance of EMG/EGT cursor control to EGT-only control and mouse control. These experiments took the form of two different types of point-and-click trials. The data produced by these experiments were evaluated using statistical analysis, Fitts' Law analysis, and target re-entry (TRE) analysis.

The experimental results revealed that though EMG/EGT control was slower than EGT-only and mouse control, it provided effective hands-free control of the cursor without a spatial accuracy limitation, and it also facilitated a reliable click operation. This combination of qualities is not possessed by either EGT-only or mouse control, making EMG/EGT cursor control a unique and practical alternative for a user's cursor control needs.
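The Fitts' Law analysis referred to above fits movement time against an index of difficulty computed from target distance and width. A minimal sketch with made-up point-and-click trial data (the Shannon formulation is used here; the dissertation's exact formulation and data are not reproduced):

```python
import numpy as np

# Hypothetical trials: target distance D and width W in pixels,
# and measured movement time MT in seconds.
D = np.array([200, 400, 200, 400, 600])
W = np.array([ 40,  40,  80,  80,  40])
MT = np.array([0.9, 1.3, 0.7, 1.0, 1.6])

# Shannon formulation of the index of difficulty: ID = log2(D / W + 1), in bits.
ID = np.log2(D / W + 1)

# Fit Fitts' Law MT = a + b * ID by least squares; a common throughput
# summary is the mean of ID / MT across trials.
b, a = np.polyfit(ID, MT, 1)
throughput = np.mean(ID / MT)
print(f"a = {a:.3f} s, b = {b:.3f} s/bit, throughput ~ {throughput:.2f} bits/s")
```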
Abstract:
This dissertation established a software-hardware integrated design for a multisite data repository in pediatric epilepsy. A total of 16 institutions formed a consortium for this web-based application. This innovative, fully operational web application allows users to upload and retrieve information through a unique human-computer graphical interface that is remotely accessible to all users of the consortium. A solution based on a Linux platform with MySQL and Personal Home Page (PHP) scripts was selected. Research was conducted to evaluate mechanisms to electronically transfer diverse datasets from different hospitals and to collect the clinical data in concert with their related functional magnetic resonance imaging (fMRI). What was unique in the approach considered is that all pertinent clinical information about patients is synthesized, with input from clinical experts, into 4 different forms: Clinical, fMRI scoring, Image information, and Neuropsychological data entry forms. A first contribution of this dissertation was in proposing an integrated processing platform that was site and scanner independent in order to uniformly process the varied fMRI datasets and to generate comparative brain activation patterns. The data collection from the consortium complied with the IRB requirements and provides all the safeguards for security and confidentiality requirements. An fMRI-based software library was used to perform data processing and statistical analysis to obtain the brain activation maps. The Lateralization Index (LI) of healthy control (HC) subjects was evaluated in contrast to that of localization-related epilepsy (LRE) subjects. Over 110 activation maps were generated, and their respective LIs were computed, yielding the following groups: (a) strong right lateralization (HC=0%, LRE=18%), (b) right lateralization (HC=2%, LRE=10%), (c) bilateral (HC=20%, LRE=15%), (d) left lateralization (HC=42%, LRE=26%), and (e) strong left lateralization (HC=36%, LRE=31%). Moreover, nonlinear multidimensional decision functions were used to seek an optimal separation between typical and atypical brain activations on the basis of the demographics as well as the extent and intensity of these brain activations. The intent was not to seek the highest output measures given the inherent overlap of the data, but rather to assess which of the many dimensions were critical in the overall assessment of typical and atypical language activations, with the freedom to select any number of dimensions and impose any degree of complexity in the nonlinearity of the decision space.
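The lateralization index used in this kind of fMRI language analysis is conventionally computed from left- and right-hemisphere activation measures. A minimal sketch; the formula is the standard convention, but the category cutoffs and the example numbers below are assumptions, not the dissertation's exact criteria:

```python
def lateralization_index(left_activation: float, right_activation: float) -> float:
    """Conventional LI = (L - R) / (L + R); +1 is fully left, -1 fully right."""
    return (left_activation - right_activation) / (left_activation + right_activation)

def categorize(li: float) -> str:
    """Map an LI value to the five groups reported above (hypothetical cutoffs)."""
    if li >= 0.5:
        return "strong left lateralization"
    if li >= 0.2:
        return "left lateralization"
    if li > -0.2:
        return "bilateral"
    if li > -0.5:
        return "right lateralization"
    return "strong right lateralization"

# Usage with made-up activated-voxel counts for one subject.
print(categorize(lateralization_index(820, 310)))
```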
Abstract:
The purpose of this ethnographic study was to describe and explain the congruency of psychological preferences identified by the Myers-Briggs Type Indicator (MBTI) and the human resource development (HRD) role of instructor/facilitator. This investigation was conducted with 23 HRD professionals who worked in the Miami, Florida area as instructors/facilitators with adult learners in job-related contexts.

The study was conducted using qualitative strategies of data collection and analysis. The research participants were selected through a purposive sampling strategy. Data collection strategies included: (a) administration and scoring of the MBTI, Form G, (b) open-ended and semi-structured interviews, (c) participant observations of the research subjects at their respective work sites and while conducting training sessions, (d) field notes, and (e) contact summary sheets to record field research encounters. Data analysis was conducted with the use of a computer program for qualitative analysis called FolioViews 3.1 for Windows. This included: (a) coding of transcribed interviews and field notes, (b) theme analysis, (c) memoing, and (d) cross-case analysis.

The three major themes that emerged in relation to the congruency of psychological preferences and the role of instructor/facilitator were: (1) designing and preparing instruction/facilitation, (2) conducting training and managing group process, and (3) interpersonal relations and perspectives among instructors/facilitators.

The first two themes were analyzed through the combination of the four Jungian personality functions. These combinations are: sensing-thinking (ST), sensing-feeling (SF), intuition-thinking (NT), and intuition-feeling (NF). The third theme was analyzed through the combination of the attitudes or energy focus and the judgment function. These combinations are: extraversion-thinking (ET), extraversion-feeling (EF), introversion-thinking (IT), and introversion-feeling (IF).

A last area uncovered by this ethnographic study was the influence exerted by a training and development culture on the instructor/facilitator role. This professional culture is described and explained in terms of the shared values and expectations reported by the study respondents.
Abstract:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, due to both the large size of the data sets and their high dimensionality. In the same direction as other research based on MapReduce, this dissertation develops effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
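As a rough illustration of the second problem, one round of the classic multiplicative-update rule for nonnegative matrix factorization is shown below in plain NumPy on random data. In a MapReduce setting, each of the large matrix products in the update would be expressed as map and reduce jobs over row/column partitions; this in-memory sketch only shows the numerical step being scaled up, not the dissertation's distributed implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 1000, 500, 10          # small stand-ins for million-scale dimensions
V = rng.random((m, n))           # data matrix to factorize, V ~ W @ H
W = rng.random((m, k))
H = rng.random((k, n))

eps = 1e-9
for _ in range(50):
    # Lee-Seung multiplicative updates; each dense product below is what a
    # MapReduce implementation would distribute across data partitions.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ (H @ H.T) + eps)

print("relative reconstruction error:",
      np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```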
Abstract:
Petri nets are a formal, graphical, and executable modeling technique for the specification and analysis of concurrent and distributed systems and have been widely applied in computer science and many other engineering disciplines. Low-level Petri nets are simple and useful for modeling control flows but not powerful enough to define data and system functionality. High-level Petri nets (HLPNs) have been developed to support data and functionality definitions, such as using complex structured data as tokens and algebraic expressions as transition formulas. Compared to low-level Petri nets, HLPNs result in compact system models that are easier to understand. Therefore, HLPNs are more useful in modeling complex systems.

There are two issues in using HLPNs: modeling and analysis. Modeling concerns abstracting and representing the systems under consideration using HLPNs, and analysis deals with effective ways to study the behaviors and properties of the resulting HLPN models. In this dissertation, several modeling and analysis techniques for HLPNs are studied, which are integrated into a framework that is supported by a tool.

For modeling, this framework integrates two formal languages: a type of HLPN called Predicate Transition Net (PrT Net) is used to model a system's behavior, and a first-order linear-time temporal logic (FOLTL) is used to specify the system's properties. The main contribution of this dissertation with regard to modeling is the development of a software tool to support the formal modeling capabilities in this framework.

For analysis, this framework combines three complementary techniques: simulation, explicit-state model checking, and bounded model checking (BMC). Simulation is a straightforward and speedy method, but it only covers some execution paths in an HLPN model. Explicit-state model checking covers all the execution paths but suffers from the state explosion problem. BMC is a tradeoff, as it provides a certain level of coverage while being more efficient than explicit-state model checking. The main contribution of this dissertation with regard to analysis is adapting BMC to analyze HLPN models and integrating the three complementary analysis techniques in a software tool to support the formal analysis capabilities in this framework.

The SAMTools developed for this framework in this dissertation integrates three tools: PIPE+ for HLPN behavioral modeling and simulation, SAMAT for hierarchical structural modeling and property specification, and PIPE+Verifier for behavioral verification.
Abstract:
Petri nets are a formal, graphical, and executable modeling technique for the specification and analysis of concurrent and distributed systems and have been widely applied in computer science and many other engineering disciplines. Low-level Petri nets are simple and useful for modeling control flows but not powerful enough to define data and system functionality. High-level Petri nets (HLPNs) have been developed to support data and functionality definitions, such as using complex structured data as tokens and algebraic expressions as transition formulas. Compared to low-level Petri nets, HLPNs result in compact system models that are easier to understand. Therefore, HLPNs are more useful in modeling complex systems.

There are two issues in using HLPNs: modeling and analysis. Modeling concerns abstracting and representing the systems under consideration using HLPNs, and analysis deals with effective ways to study the behaviors and properties of the resulting HLPN models. In this dissertation, several modeling and analysis techniques for HLPNs are studied, which are integrated into a framework that is supported by a tool.

For modeling, this framework integrates two formal languages: a type of HLPN called Predicate Transition Net (PrT Net) is used to model a system's behavior, and a first-order linear-time temporal logic (FOLTL) is used to specify the system's properties. The main contribution of this dissertation with regard to modeling is the development of a software tool to support the formal modeling capabilities in this framework.

For analysis, this framework combines three complementary techniques: simulation, explicit-state model checking, and bounded model checking (BMC). Simulation is a straightforward and speedy method, but it only covers some execution paths in an HLPN model. Explicit-state model checking covers all the execution paths but suffers from the state explosion problem. BMC is a tradeoff, as it provides a certain level of coverage while being more efficient than explicit-state model checking. The main contribution of this dissertation with regard to analysis is adapting BMC to analyze HLPN models and integrating the three complementary analysis techniques in a software tool to support the formal analysis capabilities in this framework.

The SAMTools developed for this framework in this dissertation integrates three tools: PIPE+ for HLPN behavioral modeling and simulation, SAMAT for hierarchical structural modeling and property specification, and PIPE+Verifier for behavioral verification.
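A toy sketch of how a predicate/transition-style net can be executed, with places holding structured tokens and a transition guarded by a formula over the bound variables. The net, place names, and guard here are hypothetical, and this simplified interpreter only illustrates the idea of simulation over an HLPN-like model; it is not the PIPE+ implementation:

```python
from itertools import product

# Places hold collections of structured tokens (here: tuples and plain values).
marking = {"ready": [("job1", 3), ("job2", 7)], "slots": [1, 2]}

def fire_assign(marking):
    """Transition 'assign': consumes a (job, size) token from 'ready' and a
    slot id from 'slots' when the guard size <= 5 holds, and produces a
    (job, slot) token in 'running'. Returns the new marking, or None if the
    transition is not enabled under any binding."""
    for (job, size), slot in product(marking["ready"], marking["slots"]):
        if size <= 5:                                  # transition guard (formula)
            new = {place: list(tokens) for place, tokens in marking.items()}
            new["ready"].remove((job, size))
            new["slots"].remove(slot)
            new.setdefault("running", []).append((job, slot))
            return new
    return None

# One simulation step: job1 satisfies the guard and is assigned; job2 does not.
print(fire_assign(marking))
```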
Abstract:
A purpose of this research study was to demonstrate the practical linguistic study and evaluation of dissertations by using two examples of the latest technology, the microcomputer and optical scanner. That involved developing efficient methods for data entry plus creating computer algorithms appropriate for personal, linguistic studies. The goal was to develop a prototype investigation which demonstrated practical solutions for maximizing the linguistic potential of the dissertation data base. The mode of text entry was from a Dest PC Scan 1000 Optical Scanner. The function of the optical scanner was to copy the complete stack of educational dissertations from the Florida Atlantic University Library into an I.B.M. XT microcomputer. The optical scanner demonstrated its practical value by copying 15,900 pages of dissertation text directly into the microcomputer. A total of 199 dissertations or 72% of the entire stack of education dissertations (277) were successfully copied into the microcomputer's word processor where each dissertation was analyzed for a variety of syntax frequencies. The results of the study demonstrated the practical use of the optical scanner for data entry, the microcomputer for data and statistical analysis, and the availability of the college library as a natural setting for text studies. A supplemental benefit was the establishment of a computerized dissertation corpus which could be used for future research and study. The final step was to build a linguistic model of the differences in dissertation writing styles by creating 7 factors from 55 dependent variables through principal components factor analysis. The 7 factors (textual components) were then named and described on a hypothetical construct defined as a continuum from a conversational, interactional style to a formal, academic writing style. The 7 factors were then grouped through discriminant analysis to create discriminant functions for each of the 7 independent variables. The results indicated that a conversational, interactional writing style was associated with more recent dissertations (1972-1987), an increase in author's age, females, and the department of Curriculum and Instruction. A formal, academic writing style was associated with older dissertations (1972-1987), younger authors, males, and the department of Administration and Supervision. It was concluded that there were no significant differences in writing style due to subject matter (community college studies) compared to other subject matter. It was also concluded that there were no significant differences in writing style due to the location of dissertation origin (Florida Atlantic University, University of Central Florida, Florida International University).
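The factor-reduction and discriminant steps described above correspond to a standard pipeline. A minimal sketch with a random placeholder matrix of syntax frequencies (199 dissertations by 55 variables, reduced to 7 components and then fed to a discriminant classifier); PCA stands in here for the principal components factor analysis, and the grouping variable is a made-up example:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
freq = rng.random((199, 55))         # placeholder: 199 dissertations x 55 syntax frequencies
group = rng.integers(0, 2, size=199)  # placeholder grouping variable (e.g., department)

# Reduce the 55 dependent variables to 7 "textual components".
factors = PCA(n_components=7).fit_transform(freq)

# Build discriminant functions over the 7 factors for the grouping variable.
lda = LinearDiscriminantAnalysis().fit(factors, group)
print("resubstitution accuracy on placeholder data:", lda.score(factors, group))
```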