4 results for square method
in Digital Commons at Florida International University
Abstract:
Microarray technology provides a high-throughput technique for studying gene expression. Microarrays can help us diagnose different types of cancer, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data, so effective data processing and analysis are critical for making reliable inferences from the data.
The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.
The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified.
The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
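As a rough illustration of the pooling techniques named in this abstract, the sketch below combines p-values from independent experiments with Fisher's inverse chi-square method and with Stouffer's Z transform (optionally weighted, which gives the Liptak-Stouffer form). It is not the dissertation's implementation: the function names and the example p-values are made up, the Logit method is left out, and the code assumes independent one-sided p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # For even degrees of freedom 2k the chi-square upper tail is
    # exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def stouffer_combined(p_values, weights=None):
    """Stouffer's Z; passing weights gives the Liptak-Stouffer weighted Z."""
    if weights is None:
        weights = [1.0] * len(p_values)
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    combined_z = numerator / math.sqrt(sum(w * w for w in weights))
    return 1.0 - normal.cdf(combined_z)


# Hypothetical p-values for one gene from three independent experiments.
example = [0.04, 0.03, 0.05]
print(fisher_combined(example))
print(stouffer_combined(example))
print(stouffer_combined(example, weights=[10, 20, 15]))  # e.g. sample sizes
```

A gene with modest evidence in each experiment can end up with a much smaller pooled p-value, which is the sense in which pooling can yield an improved gene set.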
Abstract:
Annual average daily traffic (AADT) is important information for many transportation planning, design, operation, and maintenance activities, as well as for the allocation of highway funds. Many studies have attempted AADT estimation using the factor approach, regression analysis, time series, and artificial neural networks. However, these methods are unable to account for the spatially varying influence of independent variables on the dependent variable, even though it is well known that spatial context is important in many transportation problems, including AADT estimation.
In this study, applications of geographically weighted regression (GWR) methods to estimating AADT were investigated. The GWR-based methods considered the influence of correlations among the variables over space and the spatial non-stationarity of the variables. A GWR model allows different relationships between the dependent and independent variables to exist at different points in space. In other words, model parameters vary from location to location, and the locally linear regression parameters at a point are affected more by observations near that point than by observations farther away.
The study area was Broward County, Florida, which lies on the Atlantic coast between Palm Beach and Miami-Dade counties. A total of 67 variables were considered as potential AADT predictors, and six variables (lanes, speed, regional accessibility, direct access, density of roadway length, and density of seasonal households) were selected to develop the models.
To investigate the predictive power of the various AADT predictors over space, statistics including local R-square, local parameter estimates, and local errors were examined and mapped. The local variations in relationships among parameters were investigated, measured, and mapped to assess the usefulness of GWR methods.
The results indicated that the GWR models were able to better explain the variation in the data and to predict AADT with smaller errors than the ordinary linear regression models for the same dataset. Additionally, GWR was able to model the spatial non-stationarity in the data, i.e., the spatially varying relationship between AADT and predictors, which cannot be modeled in ordinary linear regression.
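The idea that GWR parameters are estimated locally, with nearby observations weighted more heavily, can be sketched as follows. This is a simplified illustration, not the study's model: it uses a single predictor instead of six, and the Gaussian kernel, the fixed bandwidth, the function name, and the data are all assumptions made for the example.

```python
import math


def gwr_local_fit(observations, target, bandwidth):
    """Fit y = b0 + b1 * x by weighted least squares centred on `target`.

    observations: iterable of (east, north, x, y) tuples
    target: (east, north) location where the coefficients are estimated
    bandwidth: Gaussian kernel bandwidth, in the units of the coordinates
    """
    target_east, target_north = target
    sw = swx = swy = swxx = swxy = 0.0
    for east, north, x, y in observations:
        dist_sq = (east - target_east) ** 2 + (north - target_north) ** 2
        # Gaussian kernel: weight decays with distance from the target.
        w = math.exp(-0.5 * dist_sq / bandwidth ** 2)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx ** 2
    if denom == 0.0:
        raise ValueError("predictor has no weighted variation near target")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Made-up data: the response grows twice as fast as the predictor in the
# west and five times as fast in the east, so the relationship is
# spatially non-stationary.
west = [(0.0, float(i), float(i + 1), 2.0 * (i + 1)) for i in range(4)]
east = [(100.0, float(i), float(i + 1), 5.0 * (i + 1)) for i in range(4)]
observations = west + east

for location in [(0.0, 0.0), (100.0, 0.0)]:
    intercept, slope = gwr_local_fit(observations, location, bandwidth=10.0)
    print(location, round(intercept, 3), round(slope, 3))
```

A single ordinary linear regression over the same eight points would return one compromise slope, whereas the local fits recover a different slope at each location. Mapping such local estimates is what the abstract refers to as examining local parameter estimates.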
Abstract:
A study of the angular distribution of photon plus jet events in pp collisions at √s = 7 TeV with the Compact Muon Solenoid (CMS) detector is presented. The photon is restricted to the central region of the detector (|η| < 1.4442), while the jet is allowed to be present in both the central and forward regions of CMS (|η| < 2.4). Dominant backgrounds due to jets fragmenting into neutral mesons are accounted for through the use of a template method that discriminates between signal and background. The angular variable, |η*|, is defined as the absolute value of the difference in η between the leading photon and the leading jet in an event, divided by two. The angular distribution over the range from 0 to 1.4 was examined and compared with next-to-leading order QCD predictions and was found to be in good agreement.
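The angular variable defined in this abstract is simple enough to write down directly. The sketch below is only an illustration: the function names are hypothetical, the acceptance limits are the two pseudorapidity cuts quoted above, and the event values are invented.

```python
def eta_star(eta_photon, eta_jet):
    """Half the absolute pseudorapidity difference of photon and jet."""
    return abs(eta_photon - eta_jet) / 2.0


def passes_acceptance(eta_photon, eta_jet, photon_max=1.4442, jet_max=2.4):
    """Apply the photon and jet pseudorapidity limits quoted in the abstract."""
    return abs(eta_photon) < photon_max and abs(eta_jet) < jet_max


# Invented (eta_photon, eta_jet) pairs for leading objects in three events.
events = [(0.3, -1.1), (1.2, 2.1), (1.5, 0.0)]
for eta_photon, eta_jet in events:
    if passes_acceptance(eta_photon, eta_jet):
        print(eta_photon, eta_jet, eta_star(eta_photon, eta_jet))
```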
Abstract:
In an effort to improve instruction and better accommodate the needs of students, community colleges are offering courses in a variety of delivery formats that require students to have some level of technology fluency to be successful in the course. This study was conducted to investigate the relationship of student socioeconomic status (SES), course delivery method, and course type to enrollment, final course grades, course completion status, and course passing status at a state college.
A dataset for 20,456 students of low and not low SES enrolled in science, technology, engineering, and mathematics (STEM) course types delivered using traditional, online, blended, and web-enhanced course delivery formats at Miami Dade College (MDC), a large open-access 4-year state college located in Miami-Dade County, Florida, was analyzed. A factorial ANOVA on course type, course delivery method, and student SES, used to determine whether course delivery methods were equally effective for students of low and not low SES taking STEM course types, found no significant differences in final course grades. Additionally, three chi-square goodness-of-fit tests were used to test for differences in enrollment, course completion status, and course passing status by SES, course type, and course delivery method. The findings of the chi-square tests indicated that: (a) there were significant differences in enrollment by SES and course delivery method for the Engineering/Technology, Math, and overall course types but not for the Natural Science course type, and (b) there were no significant differences in course completion status and course passing status by SES and course types overall or by SES and course delivery methods overall. However, there were statistically significant but weak relationships between course passing status, SES, and the Math course type, as well as between course passing status, SES, and the online and traditional course delivery methods.
The mixed findings in the study indicate that strides have been made in closing the theoretical gap in education and technology skills that may exist for students of different SES levels. MDC's course delivery and student support models may help other institutions address student success in courses that require students to have some level of technology fluency.
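For readers unfamiliar with the chi-square goodness-of-fit test mentioned in this abstract, a minimal one-way version is sketched below. It does not reproduce the study's analysis, which crossed SES with course type and delivery method. The function names, the enrollment counts, and the expected shares are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)  # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    # Step up two degrees of freedom at a time using
    # Q(a + 1, x/2) = Q(a, x/2) + (x/2)^a * exp(-x/2) / Gamma(a + 1).
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, expected_shares):
    """Return (statistic, df, p_value) for observed counts vs expected shares."""
    total = sum(observed)
    expected = [total * share for share in expected_shares]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical enrollment counts for one SES group across four delivery
# formats (traditional, online, blended, web-enhanced), tested against the
# shares that would be expected if enrollment matched the overall mix.
observed = [520, 180, 90, 210]
expected_shares = [0.50, 0.20, 0.10, 0.20]
print(goodness_of_fit(observed, expected_shares))
```

A small p-value would indicate that the group's enrollment pattern differs from the expected mix. As the abstract notes for its own results, statistical significance in a large sample can coexist with a weak relationship.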