12 results for Population set-based methods
in Digital Commons at Florida International University
Abstract:
Gene-based tests of association are frequently applied to common SNPs (MAF>5%) as an alternative to single-marker tests. In this analysis we conduct a variety of simulation studies applied to five popular gene-based tests investigating general trends related to their performance in realistic situations. In particular, we focus on the impact of non-causal SNPs and a variety of LD structures on the behavior of these tests. Ultimately, we find that non-causal SNPs can significantly impact the power of all gene-based tests. On average, we find that the “noise” from 6–12 non-causal SNPs will cancel out the “signal” of one causal SNP across five popular gene-based tests. Furthermore, we find complex and differing behavior of the methods in the presence of LD within and between non-causal and causal SNPs. Ultimately, better approaches for a priori prioritization of potentially causal SNPs (e.g., predicting functionality of non-synonymous SNPs), application of these methods to sequenced or fully imputed datasets, and limited use of window-based methods for assigning inter-genic SNPs to genes will improve power. However, significant power loss from non-causal SNPs may remain unless alternative statistical approaches robust to the inclusion of non-causal SNPs are developed.
Abstract:
The primary goal of this dissertation is to develop point-based rigid and non-rigid image registration methods that have better accuracy than existing methods. We first present point-based PoIRe, which provides the framework for point-based global rigid registrations. It allows a choice of different search strategies including (a) branch-and-bound, (b) probabilistic hill-climbing, and (c) a novel hybrid method that takes advantage of the best characteristics of the other two methods. We use a robust similarity measure that is insensitive to noise, which is often introduced during feature extraction. We show the robustness of PoIRe by using it to register images obtained with an electronic portal imaging device (EPID), which have large amounts of scatter and low contrast. To evaluate PoIRe we used (a) simulated images and (b) images with fiducial markers; PoIRe was extensively tested with 2D EPID images and images generated by 3D Computed Tomography (CT) and Magnetic Resonance (MR) images. PoIRe was also evaluated using benchmark data sets from the blind retrospective evaluation project (RIRE). We show that PoIRe is better than existing methods such as Iterative Closest Point (ICP) and methods based on mutual information. We also present a novel point-based local non-rigid shape registration algorithm. We extend the robust similarity measure used in PoIRe to non-rigid registrations, adapting it to a free form deformation (FFD) model and making it robust to local minima, which are a drawback common to existing non-rigid point-based methods. For non-rigid registrations we show that it performs better than existing methods and that it is less sensitive to starting conditions. We test our non-rigid registration method using available benchmark data sets for shape registration. Finally, we also explore the extraction of features invariant to changes in perspective and illumination, and explore how they can help improve the accuracy of multi-modal registration.
For multimodal registration of EPID-DRR images we present a method based on a local descriptor defined by a vector of complex responses to a circular Gabor filter.
Abstract:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily lives as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it has become necessary to develop methods and tools for analyzing this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news, and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based summarization framework can be applied to different types of summarization problems, while the learning-to-rank based summarization framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
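The abstract above does not spell out how the dominating set based framework selects sentences. As a rough illustration only, the sketch below builds a sentence graph and picks an approximate minimum dominating set greedily, so that every sentence is either in the summary or similar to one that is. The Jaccard word-overlap similarity, the 0.3 threshold, the greedy heuristic, and the sample posts are all assumptions made for this example, not details taken from the dissertation.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (an assumed measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def build_graph(sentences, threshold=0.3):
    """Connect sentences i and j when their similarity reaches the threshold."""
    adjacency = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def greedy_dominating_set(adjacency):
    """Pick vertices until every vertex is chosen or adjacent to a chosen one."""
    uncovered = set(adjacency)
    chosen = []
    while uncovered:
        # Take the vertex that covers the most still-uncovered vertices;
        # ties go to the lowest index.
        best = max(sorted(adjacency),
                   key=lambda v: len(({v} | adjacency[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | adjacency[best]
    return chosen


# Hypothetical posts about a sports game, used only to exercise the sketch.
posts = [
    "the team scored a goal",
    "the team scored a late goal",
    "fans left the stadium early",
    "fans left the stadium",
]
graph = build_graph(posts)
summary = [posts[i] for i in greedy_dominating_set(graph)]
print(summary)
```

The greedy rule is a standard approximation for the minimum dominating set problem, which is NP-hard in general; a different similarity measure or threshold would change which sentences end up in the summary.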
Abstract:
Annual average daily traffic (AADT) is important information for many transportation planning, design, operation, and maintenance activities, as well as for the allocation of highway funds. Many studies have attempted AADT estimation using the factor approach, regression analysis, time series, and artificial neural networks. However, these methods are unable to account for the spatially variable influence of independent variables on the dependent variable, even though it is well known that spatial context is important for many transportation problems, including AADT estimation. In this study, applications of geographically weighted regression (GWR) methods to estimating AADT were investigated. The GWR based methods considered the influence of correlations among the variables over space and the spatial non-stationarity of the variables. A GWR model allows different relationships between the dependent and independent variables to exist at different points in space. In other words, model parameters vary from location to location, and the locally linear regression parameters at a point are affected more by observations near that point than by observations further away. The study area was Broward County, Florida. Broward County lies on the Atlantic coast between Palm Beach and Miami-Dade counties. In this study, a total of 67 variables were considered as potential AADT predictors, and six variables (lanes, speed, regional accessibility, direct access, density of roadway length, and density of seasonal households) were selected to develop the models. To investigate the predictive powers of various AADT predictors over space, statistics including local r-square, local parameter estimates, and local errors were examined and mapped. The local variations in relationships among parameters were investigated, measured, and mapped to assess the usefulness of GWR methods.
The results indicated that the GWR models were able to better explain the variation in the data and to predict AADT with smaller errors than the ordinary linear regression models for the same dataset. Additionally, GWR was able to model the spatial non-stationarity in the data, i.e., the spatially varying relationship between AADT and predictors, which cannot be modeled in ordinary linear regression.
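The core idea of GWR described above, that regression parameters at a point are influenced more by nearby observations than by distant ones, can be sketched as a weighted least-squares fit repeated at each location. The example below is a minimal single-predictor illustration, assuming a Gaussian distance kernel and a fixed bandwidth; the kernel choice, the bandwidth value, and the toy coordinates, lane counts, and AADT values are all hypothetical and are not taken from the study.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Weight that decays smoothly with distance from the target point."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least-squares fit of y = intercept + slope * x at `target`.

    Observations close to `target` receive larger weights, so the returned
    parameters describe the relationship in that neighbourhood only.
    """
    weights = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: two clusters of count stations 100 distance units apart.
coords = [(0, 0), (1, 0), (0, 1), (1, 1),
          (100, 0), (101, 0), (100, 1), (101, 1)]
lanes = [1, 2, 3, 4, 1, 2, 3, 4]
# AADT in thousands: each lane adds 1 in the west cluster and 5 in the east.
aadt = [11, 12, 13, 14, 15, 20, 25, 30]

west = local_fit((0.5, 0.5), coords, lanes, aadt, bandwidth=10.0)
east = local_fit((100.5, 0.5), coords, lanes, aadt, bandwidth=10.0)
print("west intercept, slope:", west)
print("east intercept, slope:", east)
```

A single global regression on the same data would report one compromise slope for both clusters, which is the limitation of ordinary linear regression that the abstract points to. A full GWR implementation would also handle several predictors and choose the bandwidth from the data.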
Abstract:
In recent years, wireless communication infrastructures have been widely deployed for both personal and business applications. The IEEE 802.11 series of Wireless Local Area Network (WLAN) standards attracts a lot of attention due to its low cost and high data rate. Wireless ad hoc networks that use IEEE 802.11 standards are one of the hot spots of recent network research. Designing appropriate Media Access Control (MAC) layer protocols is one of the key issues for wireless ad hoc networks. Existing wireless applications typically use omni-directional antennas. When using an omni-directional antenna, the gain of the antenna in all directions is the same. Due to the nature of the Distributed Coordination Function (DCF) mechanism of IEEE 802.11 standards, only one of the one-hop neighbors can send data at one time. Nodes other than the sender and the receiver must be in either the idle or the listening state, otherwise collisions could occur. The downside of the omni-directionality of antennas is that the spatial reuse ratio is low and the capacity of the network is considerably limited. Directional antennas have therefore been introduced to improve spatial reuse. A directional antenna has the following benefits. It can improve transport capacity by decreasing interference of a directional main lobe. It can increase coverage range due to a higher SINR (Signal to Interference plus Noise Ratio), i.e., with the same power consumption, better connectivity can be achieved. And the usage of power can be reduced, i.e., for the same coverage, a transmitter can reduce its power consumption. To utilize the advantages of directional antennas, we propose a relay-enabled MAC protocol. Two relay nodes are chosen to forward data when the channel condition of the direct link from the sender to the receiver is poor. The two relay nodes can transfer data at the same time, and a pipelined data transmission can be achieved by using directional antennas.
The throughput can be improved significantly when introducing the relay-enabled MAC protocol. Besides these strong points, directional antennas also have some explicit drawbacks, such as the hidden terminal and deafness problems and the requirement of retaining location information for each node. Therefore, an omni-directional antenna should be used in some situations. The combined use of omni-directional and directional antennas leads to the problem of configuring heterogeneous antennas, i.e., given a network topology and a traffic pattern, we need to find a tradeoff between using omni-directional and using directional antennas to obtain better network performance over this configuration. Directly and mathematically establishing the relationship between the network performance and the antenna configurations is extremely difficult, if not intractable. Therefore, in this research, we propose several clustering-based methods to obtain approximate solutions for the heterogeneous antenna configuration problem, which can improve network performance significantly. Our proposed methods consist of two steps. The first step (i.e., clustering links) is to cluster the links into different groups based on the matrix-based system model. After being clustered, the links in the same group have similar neighborhood nodes and will use the same type of antenna. The second step (i.e., labeling links) is to decide the type of antenna for each group. For heterogeneous antennas, some groups of links will use directional antennas and others will adopt omni-directional antennas. Experiments are conducted to compare the proposed methods with existing methods. Experimental results demonstrate that our clustering-based methods can improve the network performance significantly.
Abstract:
The accurate and reliable estimation of travel time based on point detector data is needed to support Intelligent Transportation System (ITS) applications. It has been found that the quality of travel time estimation is a function of the method used in the estimation and varies for different traffic conditions. In this study, two hybrid on-line travel time estimation models, and their corresponding off-line methods, were developed to achieve better estimation performance under various traffic conditions, including recurrent congestion and incidents. The first model combines the Mid-Point method, which is a speed-based method, with a traffic flow-based method. The second model integrates two speed-based methods: the Mid-Point method and the Minimum Speed method. In both models, the switch between travel time estimation methods is based on the congestion level and queue status automatically identified by clustering analysis. During incident conditions with rapidly changing queue lengths, shock wave analysis-based refinements are applied for on-line estimation to capture the fast queue propagation and recovery. Travel time estimates obtained from existing speed-based methods, traffic flow-based methods, and the models developed were tested using both simulation and real-world data. The results indicate that all tested methods performed at an acceptable level during periods of low congestion. However, their performances vary with an increase in congestion. Comparisons with other estimation methods also show that the developed hybrid models perform well in all cases. Further comparisons between the on-line and off-line travel time estimation methods reveal that off-line methods perform significantly better only during fast-changing congested conditions, such as during incidents. 
The impacts of major influential factors on the performance of travel time estimation, including data preprocessing procedures, detector errors, detector spacing, frequency of travel time updates to traveler information devices, travel time link length, and posted travel time range, were investigated in this study. The results show that these factors have more significant impacts on the estimation accuracy and reliability under congested conditions than during uncongested conditions. For the incident conditions, the estimation quality improves with the use of a short rolling period for data smoothing, more accurate detector data, and frequent travel time updates.
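The two speed-based methods named in this abstract can be illustrated with a short calculation. The abstract does not give their formulas, so the sketch below relies on the definitions commonly used in the travel time literature, which is an assumption here: the Mid-Point method applies each detector's speed to half of the link between two detectors, and the Minimum Speed method applies the lower of the two detector speeds to the whole link. The link length and speeds are hypothetical values.

```python
SECONDS_PER_HOUR = 3600.0


def midpoint_travel_time(length_mi, upstream_mph, downstream_mph):
    """Each detector's speed is assumed to hold over half of the link."""
    half = length_mi / 2.0
    hours = half / upstream_mph + half / downstream_mph
    return hours * SECONDS_PER_HOUR


def minimum_speed_travel_time(length_mi, upstream_mph, downstream_mph):
    """The slower detector speed is assumed to hold over the whole link."""
    hours = length_mi / min(upstream_mph, downstream_mph)
    return hours * SECONDS_PER_HOUR


# Hypothetical 1-mile link with free flow upstream and a queue downstream.
mid = midpoint_travel_time(1.0, 60.0, 30.0)
low = minimum_speed_travel_time(1.0, 60.0, 30.0)
print(f"Mid-Point: {mid:.1f} s, Minimum Speed: {low:.1f} s")
```

When the two detector speeds are equal, both functions return the same value; they diverge as congestion builds between the detectors, which is one reason to switch between estimation methods by congestion level. The clustering-based switching logic and the shock wave refinements described in the abstract are not reproduced here.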
Abstract:
Reduced organic sulfur (ROS) compounds are environmentally ubiquitous and play an important role in sulfur cycling as well as in biogeochemical cycles of toxic metals, in particular mercury. Development of effective methods for analysis of ROS in environmental samples and investigations of the interactions of ROS with mercury are critical for understanding the role of ROS in mercury cycling, yet both are poorly studied. Covalent affinity chromatography-based methods were attempted for analysis of ROS in environmental water samples. A method was developed for analysis of environmental thiols by preconcentration using a covalent affinity chromatographic column or solid phase extraction, followed by release of the thiols from the thiopropyl sepharose gel using TCEP and analysis using HPLC-UV or HPLC-FL. Under the optimized conditions, the detection limits of the method using HPLC-FL detection were 0.45 and 0.36 nM for Cys and GSH, respectively. Our results suggest that covalent affinity methods are efficient for thiol enrichment and interference elimination, demonstrating their promising applications in developing a sensitive, reliable, and useful technique for thiol analysis in environmental water samples. The dissolution of mercury sulfide (HgS) in the presence of ROS and dissolved organic matter (DOM) was investigated by quantifying the effects of ROS on HgS dissolution and determining the speciation of the mercury released from ROS-induced HgS dissolution. It was observed that the presence of small ROS (e.g., Cys and GSH) and large molecule DOM, in particular at high concentrations, could significantly enhance the dissolution of HgS. The dissolved Hg during HgS dissolution determined using the conventional 0.22 μm cutoff method could include colloidal Hg (e.g., HgS colloids) and truly dissolved Hg (e.g., Hg-ROS complexes).
A centrifugal filtration method (with 3 kDa MWCO) was employed to characterize the speciation and reactivity of the Hg released during ROS-enhanced HgS dissolution. The presence of small ROS could produce a considerable fraction (about 40% of total mercury in the solution) of truly dissolved mercury (< 3 kDa), probably due to the formation of Hg-Cys or Hg-GSH complexes. The truly dissolved Hg formed during GSH- or Cys-enhanced HgS dissolution was directly reducible (100% for GSH and 40% for Cys) by stannous chloride, demonstrating its potential role in Hg transformation and bioaccumulation.
Abstract:
The assessment of organic matter (OM) sources in sediments and soils is key to better understanding the biogeochemical cycling of carbon in aquatic environments. While traditional molecular marker-based methods have provided such information for typical two end member (allochthonous/terrestrial vs. autochthonous/microbial)-dominated systems, more detailed, biomass-specific assessments are needed for ecosystems with complex OM inputs such as tropical and sub-tropical wetlands and estuaries where aquatic macrophytes and macroalgae may play an important role as OM sources. The aim of this study was to assess the utility of a combined approach using compound specific stable carbon isotope analysis and an n-alkane based proxy (Paq) to differentiate submerged and emergent/terrestrial vegetation OM inputs to soils/sediments from a sub-tropical wetland and estuarine system, the Florida Coastal Everglades. Results show that Paq values (0.13–0.51) for the emergent/terrestrial plants were generally lower than those for freshwater/marine submerged vegetation (0.45–1.00) and that compound specific δ13C values for the n-alkanes (C23 to C31) were distinctively different for terrestrial/emergent and freshwater/marine submerged plants. While crossplots of the Paq and n-alkane stable isotope values for the C23 n-alkane suggest that OM inputs are controlled by vegetation changes along the freshwater to marine transect, further resolution regarding OM input changes along this landscape was obtained through principal component analysis (PCA), successfully grouping the study sites according to the OM source strengths. The data show the potential for this n-alkane based multi-proxy approach as a means of assessing OM inputs to complex ecosystems.
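The abstract reports Paq values without stating how the proxy is computed. As it is commonly defined in the literature, Paq is the abundance of the mid-chain n-alkanes (C23 and C25) relative to the sum of the mid-chain and long-chain (C29 and C31) n-alkanes. The sketch below assumes that definition; the dictionary layout and the abundance values are hypothetical and serve only to show the arithmetic.

```python
def paq(abundance):
    """Paq = (C23 + C25) / (C23 + C25 + C29 + C31).

    `abundance` maps n-alkane labels to concentrations in any consistent unit.
    """
    mid_chain = abundance["C23"] + abundance["C25"]
    long_chain = abundance["C29"] + abundance["C31"]
    return mid_chain / (mid_chain + long_chain)


# Hypothetical n-alkane abundances for one sample (arbitrary units).
sample = {"C23": 10.0, "C25": 10.0, "C29": 40.0, "C31": 40.0}
print(f"Paq = {paq(sample):.2f}")
```

A value computed this way can be compared against the ranges reported in the abstract (0.13–0.51 for emergent/terrestrial plants, 0.45–1.00 for submerged vegetation). Because those ranges overlap between 0.45 and 0.51, Paq alone cannot separate the sources in that interval, which is consistent with the study pairing it with compound specific δ13C values.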
Abstract:
The general method for determining organomercurials in environmental and biological samples is gas chromatography with electron capture detection (GC-ECD). However, tedious sample work-up protocols and poor chromatographic response show the need for the development of new methods. Here, atomic fluorescence-based methods that are free from these deficiencies are described. The organomercurials in soil, sediment, and tissue samples are first released from the matrices with acidic KBr and cupric ions and extracted into dichloromethane. The initial extracts are subjected to thiosulfate clean-up, and the organomercury species are isolated as their chloride derivatives by cupric chloride and subsequent extraction into a small volume of dichloromethane. In water samples the organomercurials are pre-concentrated using a sulfhydryl cotton fiber adsorbent, followed by elution with acidic KBr and CuSO4 and extraction into dichloromethane. Analysis of the organomercurials is accomplished by capillary column chromatography with atomic fluorescence detection.
Changing Bacterial Growth Efficiencies across a Natural Nutrient Gradient in an Oligotrophic Estuary
Abstract:
Recent studies have characterized coastal estuarine systems as important components of the global carbon cycle. This study investigated carbon cycling through the microbial loop of Florida Bay by use of bacterial growth efficiency calculations. Bacterial production, bacterial respiration, and other environmental parameters were measured at three sites located along a historic phosphorus-limitation gradient in Florida Bay and compared to a relatively nutrient enriched site in Biscayne Bay. A new method for measuring bacterial respiration in oligotrophic waters involving tracing respiration of 13C-glucose was developed. The results of the study indicate that 13C tracer assays may provide a better means of measuring bacterial respiration in low nutrient environments than traditional dissolved oxygen consumption-based methods due to strong correlations between incubation length and δ13C values. Results also suggest that overall bacterial growth efficiency may be lower at the most nutrient limited sites.
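Bacterial growth efficiency (BGE) is conventionally calculated as the share of total bacterial carbon demand that goes into biomass: bacterial production divided by the sum of production and respiration. The abstract does not restate the formula, so this standard definition is assumed in the sketch below, and the site names and rates are hypothetical placeholders rather than measurements from the study.

```python
def bacterial_growth_efficiency(production, respiration):
    """BGE = BP / (BP + BR), with both rates in the same carbon units."""
    return production / (production + respiration)


# Hypothetical rates (same carbon units per volume per time), not study data.
sites = {
    "site A": {"production": 1.0, "respiration": 3.0},
    "site B": {"production": 2.0, "respiration": 2.0},
}
for name, rates in sites.items():
    bge = bacterial_growth_efficiency(rates["production"], rates["respiration"])
    print(f"{name}: BGE = {bge:.2f}")
```

Because respiration sits in the denominator, any bias in the respiration measurement carries directly into BGE, which is why the choice between 13C tracer assays and dissolved oxygen consumption-based methods matters for the comparison across sites.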