12 resultados para temporal visualization techniques
em Digital Commons at Florida International University
Resumo:
Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.
Resumo:
This dissertation establishes the foundation for a new 3-D visual interface integrating Magnetic Resonance Imaging (MRI) to Diffusion Tensor Imaging (DTI). The need for such an interface is critical for understanding brain dynamics, and for providing more accurate diagnosis of key brain dysfunctions in terms of neuronal connectivity. ^ This work involved two research fronts: (1) the development of new image processing and visualization techniques in order to accurately establish relational positioning of neuronal fiber tracts and key landmarks in 3-D brain atlases, and (2) the obligation to address the computational requirements such that the processing time is within the practical bounds of clinical settings. The system was evaluated using data from thirty patients and volunteers with the Brain Institute at Miami Children's Hospital. ^ Innovative visualization mechanisms allow for the first time white matter fiber tracts to be displayed alongside key anatomical structures within accurately registered 3-D semi-transparent images of the brain. ^ The segmentation algorithm is based on the calculation of mathematically-tuned thresholds and region-detection modules. The uniqueness of the algorithm is in its ability to perform fast and accurate segmentation of the ventricles. In contrast to the manual selection of the ventricles, which averaged over 12 minutes, the segmentation algorithm averaged less than 10 seconds in its execution. ^ The registration algorithm established searches and compares MR with DT images of the same subject, where derived correlation measures quantify the resulting accuracy. Overall, the images were 27% more correlated after registration, while an average of 1.5 seconds is all it took to execute the processes of registration, interpolation, and re-slicing of the images all at the same time and in all the given dimensions. ^ This interface was fully embedded into a fiber-tracking software system in order to establish an optimal research environment. This highly integrated 3-D visualization system reached a practical level that makes it ready for clinical deployment. ^
Resumo:
Background: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed as too time consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data. Results: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low-overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. Conclusions: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.
Resumo:
Land use and transportation interaction has been a research topic for several decades. There have been efforts to identify impacts of transportation on land use from several different perspectives. One focus has been the role of transportation improvements in encouraging new land developments or relocation of activities due to improved accessibility. The impacts studied have included property values and increased development. Another focus has been on the changes in travel behavior due to better mobility and accessibility. Most studies to date have been conducted in metropolitan level, thus unable to account for interactions spatially and temporally at smaller geographic scales. ^ In this study, a framework for studying the temporal interactions between transportation and land use was proposed and applied to three selected corridor areas in Miami-Dade County, Florida. The framework consists of two parts: one is developing of temporal data and the other is applying time series analysis to this temporal data to identify their dynamic interactions. Temporal GIS databases were constructed and used to compile building permit data and transportation improvement projects. Two types of time series analysis approaches were utilized: univariate models and multivariate models. Time series analysis is designed to describe the dynamic consequences of time series by developing models and forecasting the future of the system based on historical trends. Model estimation results from the selected corridors were then compared. ^ It was found that the time series models predicted residential development better than commercial development. It was also found that results from three study corridors varied in terms of the magnitude of impacts, length of lags, significance of the variables, and the model structure. Long-run effect or cumulated impact of transportation improvement on land developments was also measured with time series techniques. The study offered evidence that congestion negatively impacted development and transportation investments encouraged land development. ^
Resumo:
An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system that is capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make a large amount of transit performance data available that are valuable for transit performance management and planning purposes. However, the difficulties of extracting useful information from the huge spatial-temporal database have hindered off-line applications of the AVL data. ^ In this study, a data mining process, including data integration, cluster analysis, and multiple regression, is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. The model-based cluster method is employed to investigate the spatial and temporal patterns of transit travel speeds, which may be easily translated into travel time. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening periods are determined based on analyses of transit travel speed variations for different times of day. The seasonal patterns of transit performance are investigated by using the analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using important factors identified as having significant effects on speed for different time-of-day periods. ^ It has been found that transit performance varied from different seasons and different time-of-day periods. The geographic location of a transit route segment also plays a role in the variation of the transit performance. The results of this research indicate that advanced data mining techniques have good potential in providing automated techniques of assisting transit agencies in service planning, scheduling, and operations control. ^
Resumo:
Since the Morris worm was released in 1988, Internet worms continue to be one of top security threats. For example, the Conficker worm infected 9 to 15 million machines in early 2009 and shut down the service of some critical government and medical networks. Moreover, it constructed a massive peer-to-peer (P2P) botnet. Botnets are zombie networks controlled by attackers setting out coordinated attacks. In recent years, botnets have become the number one threat to the Internet. The objective of this research is to characterize spatial-temporal infection structures of Internet worms, and apply the observations to study P2P-based botnets formed by worm infection. First, we infer temporal characteristics of the Internet worm infection structure, i.e., the host infection time and the worm infection sequence, and thus pinpoint patient zero or initially infected hosts. Specifically, we apply statistical estimation techniques on Darknet observations. We show analytically and empirically that our proposed estimators can significantly improve the inference accuracy. Second, we reveal two key spatial characteristics of the Internet worm infection structure, i.e., the number of children and the generation of the underlying tree topology formed by worm infection. Specifically, we apply probabilistic modeling methods and a sequential growth model. We show analytically and empirically that the number of children has asymptotically a geometric distribution with parameter 0.5, and the generation follows closely a Poisson distribution. Finally, we evaluate bot detection strategies and effects of user defenses in P2P-based botnets formed by worm infection. Specifically, we apply the observations of the number of children and demonstrate analytically and empirically that targeted detection that focuses on the nodes with the largest number of children is an efficient way to expose bots. However, we also point out that future botnets may self-stop scanning to weaken targeted detection, without greatly slowing down the speed of worm infection. We then extend the worm spatial infection structure and show empirically that user defenses, e.g. , patching or cleaning, can significantly mitigate the robustness and the effectiveness of P2P-based botnets. To counterattack, we evaluate a simple measure by future botnets that enhances topology robustness through worm re-infection.
Resumo:
With the exponential increasing demands and uses of GIS data visualization system, such as urban planning, environment and climate change monitoring, weather simulation, hydrographic gauge and so forth, the geospatial vector and raster data visualization research, application and technology has become prevalent. However, we observe that current web GIS techniques are merely suitable for static vector and raster data where no dynamic overlaying layers. While it is desirable to enable visual explorations of large-scale dynamic vector and raster geospatial data in a web environment, improving the performance between backend datasets and the vector and raster applications remains a challenging technical issue. This dissertation is to implement these challenging and unimplemented areas: how to provide a large-scale dynamic vector and raster data visualization service with dynamic overlaying layers accessible from various client devices through a standard web browser, and how to make the large-scale dynamic vector and raster data visualization service as rapid as the static one. To accomplish these, a large-scale dynamic vector and raster data visualization geographic information system based on parallel map tiling and a comprehensive performance improvement solution are proposed, designed and implemented. They include: the quadtree-based indexing and parallel map tiling, the Legend String, the vector data visualization with dynamic layers overlaying, the vector data time series visualization, the algorithm of vector data rendering, the algorithm of raster data re-projection, the algorithm for elimination of superfluous level of detail, the algorithm for vector data gridding and re-grouping and the cluster servers side vector and raster data caching.
Resumo:
Interferometric synthetic aperture radar (InSAR) techniques can successfully detect phase variations related to the water level changes in wetlands and produce spatially detailed high-resolution maps of water level changes. Despite the vast details, the usefulness of the wetland InSAR observations is rather limited, because hydrologists and water resources managers need information on absolute water level values and not on relative water level changes. We present an InSAR technique called Small Temporal Baseline Subset (STBAS) for monitoring absolute water level time series using radar interferograms acquired successively over wetlands. The method uses stage (water level) observation for calibrating the relative InSAR observations and tying them to the stage's vertical datum. We tested the STBAS technique with two-year long Radarsat-1 data acquired during 2006–2008 over the Water Conservation Area 1 (WCA1) in the Everglades wetlands, south Florida (USA). The InSAR-derived water level data were calibrated using 13 stage stations located in the study area to generate 28 successive high spatial resolution maps (50 m pixel resolution) of absolute water levels. We evaluate the quality of the STBAS technique using a root mean square error (RMSE) criterion of the difference between InSAR observations and stage measurements. The average RMSE is 6.6 cm, which provides an uncertainty estimation of the STBAS technique to monitor absolute water levels. About half of the uncertainties are attributed to the accuracy of the InSAR technique to detect relative water levels. The other half reflects uncertainties derived from tying the relative levels to the stage stations' datum.
Resumo:
Petri Nets are a formal, graphical and executable modeling technique for the specification and analysis of concurrent and distributed systems and have been widely applied in computer science and many other engineering disciplines. Low level Petri nets are simple and useful for modeling control flows but not powerful enough to define data and system functionality. High level Petri nets (HLPNs) have been developed to support data and functionality definitions, such as using complex structured data as tokens and algebraic expressions as transition formulas. Compared to low level Petri nets, HLPNs result in compact system models that are easier to be understood. Therefore, HLPNs are more useful in modeling complex systems. ^ There are two issues in using HLPNs—modeling and analysis. Modeling concerns the abstracting and representing the systems under consideration using HLPNs, and analysis deals with effective ways study the behaviors and properties of the resulting HLPN models. In this dissertation, several modeling and analysis techniques for HLPNs are studied, which are integrated into a framework that is supported by a tool. ^ For modeling, this framework integrates two formal languages: a type of HLPNs called Predicate Transition Net (PrT Net) is used to model a system's behavior and a first-order linear time temporal logic (FOLTL) to specify the system's properties. The main contribution of this dissertation with regard to modeling is to develop a software tool to support the formal modeling capabilities in this framework. ^ For analysis, this framework combines three complementary techniques, simulation, explicit state model checking and bounded model checking (BMC). Simulation is a straightforward and speedy method, but only covers some execution paths in a HLPN model. Explicit state model checking covers all the execution paths but suffers from the state explosion problem. BMC is a tradeoff as it provides a certain level of coverage while more efficient than explicit state model checking. The main contribution of this dissertation with regard to analysis is adapting BMC to analyze HLPN models and integrating the three complementary analysis techniques in a software tool to support the formal analysis capabilities in this framework. ^ The SAMTools developed for this framework in this dissertation integrates three tools: PIPE+ for HLPNs behavioral modeling and simulation, SAMAT for hierarchical structural modeling and property specification, and PIPE+Verifier for behavioral verification.^
Resumo:
Many systems and applications are continuously producing events. These events are used to record the status of the system and trace the behaviors of the systems. By examining these events, system administrators can check the potential problems of these systems. If the temporal dynamics of the systems are further investigated, the underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict the future system behaviors or to mitigate the potential risks of the systems. Moreover, the system administrators can utilize the temporal patterns to set up event management rules to make the system more intelligent. With the popularity of data mining techniques in recent years, these events grad- ually become more and more useful. Despite the recent advances of the data mining techniques, the application to system event mining is still in a rudimentary stage. Most of works are still focusing on episodes mining or frequent pattern discovering. These methods are unable to provide a brief yet comprehensible summary to reveal the valuable information from the high level perspective. Moreover, these methods provide little actionable knowledge to help the system administrators to better man- age the systems. To better make use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered to be helpful for system management: (1) Provide concise yet comprehensive summaries about the running status of the systems; (2) Make the systems more intelligence and autonomous; (3) Effectively detect the abnormal behaviors of the systems. Due to the richness of the event logs, all these directions can be solved in the data-driven manner. And in this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached. This dissertation mainly focuses on the foregoing directions that leverage tem- poral mining techniques to facilitate system management. More specifically, three concrete topics will be discussed, including event, resource demand prediction, and streaming anomaly detection. Besides the theoretic contributions, the experimental evaluation will also be presented to demonstrate the effectiveness and efficacy of the corresponding solutions.
Resumo:
Petri Nets are a formal, graphical and executable modeling technique for the specification and analysis of concurrent and distributed systems and have been widely applied in computer science and many other engineering disciplines. Low level Petri nets are simple and useful for modeling control flows but not powerful enough to define data and system functionality. High level Petri nets (HLPNs) have been developed to support data and functionality definitions, such as using complex structured data as tokens and algebraic expressions as transition formulas. Compared to low level Petri nets, HLPNs result in compact system models that are easier to be understood. Therefore, HLPNs are more useful in modeling complex systems. There are two issues in using HLPNs - modeling and analysis. Modeling concerns the abstracting and representing the systems under consideration using HLPNs, and analysis deals with effective ways study the behaviors and properties of the resulting HLPN models. In this dissertation, several modeling and analysis techniques for HLPNs are studied, which are integrated into a framework that is supported by a tool. For modeling, this framework integrates two formal languages: a type of HLPNs called Predicate Transition Net (PrT Net) is used to model a system's behavior and a first-order linear time temporal logic (FOLTL) to specify the system's properties. The main contribution of this dissertation with regard to modeling is to develop a software tool to support the formal modeling capabilities in this framework. For analysis, this framework combines three complementary techniques, simulation, explicit state model checking and bounded model checking (BMC). Simulation is a straightforward and speedy method, but only covers some execution paths in a HLPN model. Explicit state model checking covers all the execution paths but suffers from the state explosion problem. BMC is a tradeoff as it provides a certain level of coverage while more efficient than explicit state model checking. The main contribution of this dissertation with regard to analysis is adapting BMC to analyze HLPN models and integrating the three complementary analysis techniques in a software tool to support the formal analysis capabilities in this framework. The SAMTools developed for this framework in this dissertation integrates three tools: PIPE+ for HLPNs behavioral modeling and simulation, SAMAT for hierarchical structural modeling and property specification, and PIPE+Verifier for behavioral verification.
Resumo:
With the exponential growth of the usage of web-based map services, the web GIS application has become more and more popular. Spatial data index, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-sourced an Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for the end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geo spatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU and I/O intensive tiers, which make it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly are techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands: 18.91% more accurate; and efficiently allocate resources to meet the QoS target: improves the QoS by 26.19% and saves resource usages by 20.83% compared to traditional peak load-based resource allocation.