16 resultados para Weighted histogram analysis method
em Digital Commons at Florida International University
Resumo:
Annual average daily traffic (AADT) is important information for many transportation planning, design, operation, and maintenance activities, as well as for the allocation of highway funds. Many studies have attempted AADT estimation using factor approach, regression analysis, time series, and artificial neural networks. However, these methods are unable to account for spatially variable influence of independent variables on the dependent variable even though it is well known that to many transportation problems, including AADT estimation, spatial context is important. ^ In this study, applications of geographically weighted regression (GWR) methods to estimating AADT were investigated. The GWR based methods considered the influence of correlations among the variables over space and the spatially non-stationarity of the variables. A GWR model allows different relationships between the dependent and independent variables to exist at different points in space. In other words, model parameters vary from location to location and the locally linear regression parameters at a point are affected more by observations near that point than observations further away. ^ The study area was Broward County, Florida. Broward County lies on the Atlantic coast between Palm Beach and Miami-Dade counties. In this study, a total of 67 variables were considered as potential AADT predictors, and six variables (lanes, speed, regional accessibility, direct access, density of roadway length, and density of seasonal household) were selected to develop the models. ^ To investigate the predictive powers of various AADT predictors over the space, the statistics including local r-square, local parameter estimates, and local errors were examined and mapped. The local variations in relationships among parameters were investigated, measured, and mapped to assess the usefulness of GWR methods. ^ The results indicated that the GWR models were able to better explain the variation in the data and to predict AADT with smaller errors than the ordinary linear regression models for the same dataset. Additionally, GWR was able to model the spatial non-stationarity in the data, i.e., the spatially varying relationship between AADT and predictors, which cannot be modeled in ordinary linear regression. ^
Resumo:
Given the importance of color processing in computer vision and computer graphics, estimating and rendering illumination spectral reflectance of image scenes is important to advance the capability of a large class of applications such as scene reconstruction, rendering, surface segmentation, object recognition, and reflectance estimation. Consequently, this dissertation proposes effective methods for reflection components separation and rendering in single scene images. Based on the dichromatic reflectance model, a novel decomposition technique, named the Mean-Shift Decomposition (MSD) method, is introduced to separate the specular from diffuse reflectance components. This technique provides a direct access to surface shape information through diffuse shading pixel isolation. More importantly, this process does not require any local color segmentation process, which differs from the traditional methods that operate by aggregating color information along each image plane. ^ Exploiting the merits of the MSD method, a scene illumination rendering technique is designed to estimate the relative contributing specular reflectance attributes of a scene image. The image feature subset targeted provides a direct access to the surface illumination information, while a newly introduced efficient rendering method reshapes the dynamic range distribution of the specular reflectance components over each image color channel. This image enhancement technique renders the scene illumination reflection effectively without altering the scene’s surface diffuse attributes contributing to realistic rendering effects. ^ As an ancillary contribution, an effective color constancy algorithm based on the dichromatic reflectance model was also developed. This algorithm selects image highlights in order to extract the prominent surface reflectance that reproduces the exact illumination chromaticity. This evaluation is presented using a novel voting scheme technique based on histogram analysis. ^ In each of the three main contributions, empirical evaluations were performed on synthetic and real-world image scenes taken from three different color image datasets. The experimental results show over 90% accuracy in illumination estimation contributing to near real world illumination rendering effects. ^
Resumo:
Thanks to the advanced technologies and social networks that allow the data to be widely shared among the Internet, there is an explosion of pervasive multimedia data, generating high demands of multimedia services and applications in various areas for people to easily access and manage multimedia data. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, which ranges from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. At last, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
Resumo:
Adaptability and invisibility are hallmarks of modern terrorism, and keeping pace with its dynamic nature presents a serious challenge for societies throughout the world. Innovations in computer science have incorporated applied mathematics to develop a wide array of predictive models to support the variety of approaches to counterterrorism. Predictive models are usually designed to forecast the location of attacks. Although this may protect individual structures or locations, it does not reduce the threat—it merely changes the target. While predictive models dedicated to events or social relationships receive much attention where the mathematical and social science communities intersect, models dedicated to terrorist locations such as safe-houses (rather than their targets or training sites) are rare and possibly nonexistent. At the time of this research, there were no publically available models designed to predict locations where violent extremists are likely to reside. This research uses France as a case study to present a complex systems model that incorporates multiple quantitative, qualitative and geospatial variables that differ in terms of scale, weight, and type. Though many of these variables are recognized by specialists in security studies, there remains controversy with respect to their relative importance, degree of interaction, and interdependence. Additionally, some of the variables proposed in this research are not generally recognized as drivers, yet they warrant examination based on their potential role within a complex system. This research tested multiple regression models and determined that geographically-weighted regression analysis produced the most accurate result to accommodate non-stationary coefficient behavior, demonstrating that geographic variables are critical to understanding and predicting the phenomenon of terrorism. This dissertation presents a flexible prototypical model that can be refined and applied to other regions to inform stakeholders such as policy-makers and law enforcement in their efforts to improve national security and enhance quality-of-life.
Resumo:
A methodology for formally modeling and analyzing software architecture of mobile agent systems provides a solid basis to develop high quality mobile agent systems, and the methodology is helpful to study other distributed and concurrent systems as well. However, it is a challenge to provide the methodology because of the agent mobility in mobile agent systems.^ The methodology was defined from two essential parts of software architecture: a formalism to define the architectural models and an analysis method to formally verify system properties. The formalism is two-layer Predicate/Transition (PrT) nets extended with dynamic channels, and the analysis method is a hierarchical approach to verify models on different levels. The two-layer modeling formalism smoothly transforms physical models of mobile agent systems into their architectural models. Dynamic channels facilitate the synchronous communication between nets, and they naturally capture the dynamic architecture configuration and agent mobility of mobile agent systems. Component properties are verified based on transformed individual components, system properties are checked in a simplified system model, and interaction properties are analyzed on models composing from involved nets. Based on the formalism and the analysis method, this researcher formally modeled and analyzed a software architecture of mobile agent systems, and designed an architectural model of a medical information processing system based on mobile agents. The model checking tool SPIN was used to verify system properties such as reachability, concurrency and safety of the medical information processing system. ^ From successful modeling and analyzing the software architecture of mobile agent systems, the conclusion is that PrT nets extended with channels are a powerful tool to model mobile agent systems, and the hierarchical analysis method provides a rigorous foundation for the modeling tool. The hierarchical analysis method not only reduces the complexity of the analysis, but also expands the application scope of model checking techniques. The results of formally modeling and analyzing the software architecture of the medical information processing system show that model checking is an effective and an efficient way to verify software architecture. Moreover, this system shows a high level of flexibility, efficiency and low cost of mobile agent technologies. ^
Resumo:
The standard highway assignment model in the Florida Standard Urban Transportation Modeling Structure (FSUTMS) is based on the equilibrium traffic assignment method. This method involves running several iterations of all-or-nothing capacity-restraint assignment with an adjustment of travel time to reflect delays encountered in the associated iteration. The iterative link time adjustment process is accomplished through the Bureau of Public Roads (BPR) volume-delay equation. Since FSUTMS' traffic assignment procedure outputs daily volumes, and the input capacities are given in hourly volumes, it is necessary to convert the hourly capacities to their daily equivalents when computing the volume-to-capacity ratios used in the BPR function. The conversion is accomplished by dividing the hourly capacity by a factor called the peak-to-daily ratio, or referred to as CONFAC in FSUTMS. The ratio is computed as the highest hourly volume of a day divided by the corresponding total daily volume. ^ While several studies have indicated that CONFAC is a decreasing function of the level of congestion, a constant value is used for each facility type in the current version of FSUTMS. This ignores the different congestion level associated with each roadway and is believed to be one of the culprits of traffic assignment errors. Traffic counts data from across the state of Florida were used to calibrate CONFACs as a function of a congestion measure using the weighted least squares method. The calibrated functions were then implemented in FSUTMS through a procedure that takes advantage of the iterative nature of FSUTMS' equilibrium assignment method. ^ The assignment results based on constant and variable CONFACs were then compared against the ground counts for three selected networks. It was found that the accuracy from the two assignments was not significantly different, that the hypothesized improvement in assignment results from the variable CONFAC model was not empirically evident. It was recognized that many other factors beyond the scope and control of this study could contribute to this finding. It was recommended that further studies focus on the use of the variable CONFAC model with recalibrated parameters for the BPR function and/or with other forms of volume-delay functions. ^
Resumo:
Despite widespread recognition of the problem of adolescent alcohol and other drug (AOD) abuse, research on its most common treatment modality, group work, is lacking. This research gap is alarming given that outcomes range from positive to potentially iatrogenic. This study sought to identify change mechanisms and/or treatment factors that are observable within group treatment sessions and that may predict AOD use outcomes. This NIH (F31 DA 020233-01A1) study evaluated 108, 10-19 year olds and the 19 school-based treatment groups to which they were previously assigned (R01 AA10246; PI: Wagner). Associations between motivational interviewing (MI) based change talk variables, group leader MI skills, and alcohol and marijuana use outcomes up to 12-months following treatment were evaluated. Treatment session audio recordings and transcripts (1R21AA015679-01; PI: Macgowan) were coded using a new discourse analysis coding scheme for measuring group member change talk (Amrhein, 2003). Therapist MI skills were similarly measured using the Motivational Interviewing Treatment Integrity instrument. Group member responses to commitment predicted group marijuana use at the 1-month follow up. Also, group leader empathy was significantly associated with group commitment for marijuana use at the middle and ending stages of treatment. Both of the above process measures were applied in a group setting for the first time. Building upon MI and social learning theory principles, group commitment and group member responses to commitment are new observable, in-session, process constructs that may predict positive and negative adolescent group treatment outcomes. These constructs, as well as the discourse analysis method and instruments used to measure them, raise many possibilities for future group work process research and practice.
Resumo:
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built using Petri nets from user requirements, which is formally verified using model checking. Second, Petri nets models are automatically mined from the event traces generated from scientific workflows. Third, partial order models are automatically extracted from some instrumented concurrent program execution, and potential atomicity violation bugs are automatically verified based on the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the world wide effort in developing a verified software repository. Our method to mine Petri net models automatically from provenance offers a new approach to build scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real world systems including one that evades several other existing tools. McPatom is efficient and scalable as it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at one time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.
Resumo:
Airborne Light Detection and Ranging (LIDAR) technology has become the primary method to derive high-resolution Digital Terrain Models (DTMs), which are essential for studying Earth's surface processes, such as flooding and landslides. The critical step in generating a DTM is to separate ground and non-ground measurements in a voluminous point LIDAR dataset, using a filter, because the DTM is created by interpolating ground points. As one of widely used filtering methods, the progressive morphological (PM) filter has the advantages of classifying the LIDAR data at the point level, a linear computational complexity, and preserving the geometric shapes of terrain features. The filter works well in an urban setting with a gentle slope and a mixture of vegetation and buildings. However, the PM filter often removes ground measurements incorrectly at the topographic high area, along with large sizes of non-ground objects, because it uses a constant threshold slope, resulting in "cut-off" errors. A novel cluster analysis method was developed in this study and incorporated into the PM filter to prevent the removal of the ground measurements at topographic highs. Furthermore, to obtain the optimal filtering results for an area with undulating terrain, a trend analysis method was developed to adaptively estimate the slope-related thresholds of the PM filter based on changes of topographic slopes and the characteristics of non-terrain objects. The comparison of the PM and generalized adaptive PM (GAPM) filters for selected study areas indicates that the GAPM filter preserves the most "cut-off" points removed incorrectly by the PM filter. The application of the GAPM filter to seven ISPRS benchmark datasets shows that the GAPM filter reduces the filtering error by 20% on average, compared with the method used by the popular commercial software TerraScan. The combination of the cluster method, adaptive trend analysis, and the PM filter allows users without much experience in processing LIDAR data to effectively and efficiently identify ground measurements for the complex terrains in a large LIDAR data set. The GAPM filter is highly automatic and requires little human input. Therefore, it can significantly reduce the effort of manually processing voluminous LIDAR measurements.
Resumo:
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built using Petri nets from user requirements, which is formally verified using model checking. Second, Petri nets models are automatically mined from the event traces generated from scientific workflows. Third, partial order models are automatically extracted from some instrumented concurrent program execution, and potential atomicity violation bugs are automatically verified based on the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the world wide effort in developing a verified software repository. Our method to mine Petri net models automatically from provenance offers a new approach to build scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real world systems including one that evades several other existing tools. McPatom is efficient and scalable as it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at one time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.
Resumo:
This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and in carrying out feasibility studies were provided by Beckman-Coulter. It is noted that although the data used is flow cytometric in nature, the mathematics are general in their derivation to include other types of data as long as their statistical behavior approximate Gaussian distributions. ^ Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to accommodate for the accumulation process involved with histogram data. When data is accumulated into a frequency histogram, the data is inherently smoothed in a linear fashion, since an averaging effect is taking place as the histogram is generated. This new filtering scheme addresses data that is accumulated in the uneven resolution of the channels of the frequency histogram. ^ The qualitative interpretation of flow cytometric data is currently a time consuming and imprecise method for evaluating histogram data. This method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis. ^
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
The goal of this project was to develop a rapid separation and detection method for analyzing organic compounds in smokeless powders and then test its applicability on gunshot residue (GSR) samples. In this project, a total of 20 common smokeless powder additives and their decomposition products were separated by ultra performance liquid chromatography (UPLC) and confirmed by tandem mass spectrometry (MS/MS) using multiple reaction monitoring mode (MRM). Some of the targeted compounds included diphenylamines, centralites, nitrotoluenes, nitroglycerin, and various phthalates. The compounds were ionized in the MS source using simultaneous positive and negative electrospray ionization (ESI) with negative atmospheric pressure chemical ionization (APCI) in order to detect all compounds in a single analysis. The developed UPLC/MS/MS method was applied to commercially available smokeless powders and gunshot residue samples recovered from the hands of shooters, spent cartridges, and smokeless powder retrieved from unfired cartridges. Distinct compositions were identified for smokeless powders from different manufacturers and from separate manufacturing lots. The procedure also produced specific chemical profiles when tested on gunshot residues from different manufacturers. Overall, this thesis represents the development of a rapid and reproducible procedure capable of simultaneously detecting the widest possible range of components present in organic gunshot residue.^
Resumo:
In an effort to improve instruction and better accommodate the needs of students, community colleges are offering courses delivered in a variety of delivery formats that require students to have some level of technology fluency to be successful in the course. This study was conducted to investigate the relationship between student socioeconomic status (SES), course delivery method, and course type on enrollment, final course grades, course completion status, and course passing status at a state college. ^ A dataset for 20,456 students of low and not low SES enrolled in science, technology, engineering, and mathematics (STEM) course types delivered using traditional, online, blended, and web enhanced course delivery formats at Miami Dade College, a large open access 4-year state college located in Miami-Dade County, Florida, was analyzed. A factorial ANOVA using course type, course delivery method, and student SES found no significant differences in final course grades when used to determine if course delivery methods were equally effective for students of low and not low SES taking STEM course types. Additionally, three chi-square goodness-of-fit tests were used to investigate for differences in enrollment, course completion and course passing status by SES, course type, and course delivery method. The findings of the chi-square tests indicated that: (a) there were significant differences in enrollment by SES and course delivery methods for the Engineering/Technology, Math, and overall course types but not for the Natural Science course type and (b) there were no significant differences in course completion status and course passing status by SES and course types overall and SES and course delivery methods overall. However, there were statistically significant but weak relationships between course passing status, SES and the math course type as well as between course passing status, SES, and online and traditional course delivery methods. ^ The mixed findings in the study indicate that strides have been made in closing the theoretical gap in education and technology skills that may exist for students of different SES levels. MDC's course delivery and student support models may assist other institutions address student success in courses that necessitate students having some level of technology fluency. ^