13 resultados para High-dimensional data visualization
em Digital Commons at Florida International University
Resumo:
Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.
Resumo:
Because some Web users will be able to design a template to visualize information from scratch, while other users need to automatically visualize information by changing some parameters, providing different levels of customization of the information is a desirable goal. Our system allows the automatic generation of visualizations given the semantics of the data, and the static or pre-specified visualization by creating an interface language. We address information visualization taking into consideration the Web, where the presentation of the retrieved information is a challenge. ^ We provide a model to narrow the gap between the user's way of expressing queries and database manipulation languages (SQL) without changing the system itself thus improving the query specification process. We develop a Web interface model that is integrated with the HTML language to create a powerful language that facilitates the construction of Web-based database reports. ^ As opposed to other papers, this model offers a new way of exploring databases focusing on providing Web connectivity to databases with minimal or no result buffering, formatting, or extra programming. We describe how to easily connect the database to the Web. In addition, we offer an enhanced way on viewing and exploring the contents of a database, allowing users to customize their views depending on the contents and the structure of the data. Current database front-ends typically attempt to display the database objects in a flat view making it difficult for users to grasp the contents and the structure of their result. Our model narrows the gap between databases and the Web. ^ The overall objective of this research is to construct a model that accesses different databases easily across the net and generates SQL, forms, and reports across all platforms without requiring the developer to code a complex application. This increases the speed of development. In addition, using only the Web browsers, the end-user can retrieve data from databases remotely to make necessary modifications and manipulations of data using the Web formatted forms and reports, independent of the platform, without having to open different applications, or learn to use anything but their Web browser. We introduce a strategic method to generate and construct SQL queries, enabling inexperienced users that are not well exposed to the SQL world to build syntactically and semantically a valid SQL query and to understand the retrieved data. The generated SQL query will be validated against the database schema to ensure harmless and efficient SQL execution. (Abstract shortened by UMI.)^
Resumo:
With the exponential increasing demands and uses of GIS data visualization system, such as urban planning, environment and climate change monitoring, weather simulation, hydrographic gauge and so forth, the geospatial vector and raster data visualization research, application and technology has become prevalent. However, we observe that current web GIS techniques are merely suitable for static vector and raster data where no dynamic overlaying layers. While it is desirable to enable visual explorations of large-scale dynamic vector and raster geospatial data in a web environment, improving the performance between backend datasets and the vector and raster applications remains a challenging technical issue. This dissertation is to implement these challenging and unimplemented areas: how to provide a large-scale dynamic vector and raster data visualization service with dynamic overlaying layers accessible from various client devices through a standard web browser, and how to make the large-scale dynamic vector and raster data visualization service as rapid as the static one. To accomplish these, a large-scale dynamic vector and raster data visualization geographic information system based on parallel map tiling and a comprehensive performance improvement solution are proposed, designed and implemented. They include: the quadtree-based indexing and parallel map tiling, the Legend String, the vector data visualization with dynamic layers overlaying, the vector data time series visualization, the algorithm of vector data rendering, the algorithm of raster data re-projection, the algorithm for elimination of superfluous level of detail, the algorithm for vector data gridding and re-grouping and the cluster servers side vector and raster data caching.
Resumo:
Lake Analyzer is a numerical code coupled with supporting visualization tools for determining indices of mixing and stratification that are critical to the biogeochemical cycles of lakes and reservoirs. Stability indices, including Lake Number, Wedderburn Number, Schmidt Stability, and thermocline depth are calculated according to established literature definitions and returned to the user in a time series format. The program was created for the analysis of high-frequency data collected from instrumented lake buoys, in support of the emerging field of aquatic sensor network science. Available outputs for the Lake Analyzer program are: water temperature (error-checked and/or down-sampled), wind speed (error-checked and/or down-sampled), metalimnion extent (top and bottom), thermocline depth, friction velocity, Lake Number, Wedderburn Number, Schmidt Stability, mode-1 vertical seiche period, and Brunt-Väisälä buoyancy frequency. Secondary outputs for several of these indices delineate the parent thermocline depth (seasonal thermocline) from the shallower secondary or diurnal thermocline. Lake Analyzer provides a program suite and best practices for the comparison of mixing and stratification indices in lakes across gradients of climate, hydro-physiography, and time, and enables a more detailed understanding of the resulting biogeochemical transformations at different spatial and temporal scales.
Resumo:
With the recent explosion in the complexity and amount of digital multimedia data, there has been a huge impact on the operations of various organizations in distinct areas, such as government services, education, medical care, business, entertainment, etc. To satisfy the growing demand of multimedia data management systems, an integrated framework called DIMUSE is proposed and deployed for distributed multimedia applications to offer a full scope of multimedia related tools and provide appealing experiences for the users. This research mainly focuses on video database modeling and retrieval by addressing a set of core challenges. First, a comprehensive multimedia database modeling mechanism called Hierarchical Markov Model Mediator (HMMM) is proposed to model high dimensional media data including video objects, low-level visual/audio features, as well as historical access patterns and frequencies. The associated retrieval and ranking algorithms are designed to support not only the general queries, but also the complicated temporal event pattern queries. Second, system training and learning methodologies are incorporated such that user interests are mined efficiently to improve the retrieval performance. Third, video clustering techniques are proposed to continuously increase the searching speed and accuracy by architecting a more efficient multimedia database structure. A distributed video management and retrieval system is designed and implemented to demonstrate the overall performance. The proposed approach is further customized for a mobile-based video retrieval system to solve the perception subjectivity issue by considering individual user's profile. Moreover, to deal with security and privacy issues and concerns in distributed multimedia applications, DIMUSE also incorporates a practical framework called SMARXO, which supports multilevel multimedia security control. SMARXO efficiently combines role-based access control (RBAC), XML and object-relational database management system (ORDBMS) to achieve the target of proficient security control. A distributed multimedia management system named DMMManager (Distributed MultiMedia Manager) is developed with the proposed framework DEMUR; to support multimedia capturing, analysis, retrieval, authoring and presentation in one single framework.
Resumo:
Background: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed as too time consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data. Results: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low-overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. Conclusions: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.
Resumo:
This paper reflects a research project on the influence of online news media (from print, radio, and televised outlets) on disaster response. Coverage on the October 2010 Indonesian tsunami and earthquake was gathered from 17 sources from October 26 through November 30. This data was analyzed quantitatively with respect to coverage intensity over time and among outlets. Qualitative analyses were also conducted using keywords and value scale that assessed the degree of positivity or negativity associated with that keyword in the context of accountability. Results yielded insights into the influence of online media on actors' assumption of accountability and quality of response. It also provided information as to the optimal time window in which advocates and disaster management specialists can best present recommendations to improve policy and raise awareness. Coverage of outlets was analyzed individually, in groups, and as a whole, in order to discern behavior patterns for a better understanding of media interdependency. This project produced analytical insights but is primarily intended as a prototype for more refined and extensive research.
Resumo:
The need for a high quality tourism database is well known. For example, planners and managers need high quality data for budgeting, forecasting, planning marketing and advertising strategies, and staffing. Thus the concepts of quality and need are intertwined to pose a problem to the tourism professional, be they private sector or public sector employees. One could argue that collaboration by public and private sector tourism professionals could provide the best sources and uses of high quality tourism data. This discussion proposes just such a collaboration and a detailed methodology for operationalizing this arrangement.
Resumo:
An automated on-line SPE-LC-MS/MS method was developed for the quantitation of multiple classes of antibiotics in environmental waters. High sensitivity in the low ng/L range was accomplished by using large volume injections with 10-mL of sample. Positive confirmation of analytes was achieved using two selected reaction monitoring (SRM) transitions per antibiotic and quantitation was performed using an internal standard approach. Samples were extracted using online solid phase extraction, then using column switching technique; extracted samples were immediately passed through liquid chromatography and analyzed by tandem mass spectrometry. The total run time per each sample was 20 min. The statistically calculated method detection limits for various environmental samples were between 1.2 and 63 ng/L. Furthermore, the method was validated in terms of precision, accuracy and linearity. ^ The developed analytical methodology was used to measure the occurrence of antibiotics in reclaimed waters (n=56), surface waters (n=53), ground waters (n=8) and drinking waters (n=54) collected from different parts of South Florida. In reclaimed waters, the most frequently detected antibiotics were nalidixic acid, erythromycin, clarithromycin, azithromycin trimethoprim, sulfamethoxazole and ofloxacin (19.3-604.9 ng/L). Detection of antibiotics in reclaimed waters indicates that they can't be completely removed by conventional wastewater treatment process. Furthermore, the average mass loads of antibiotics released into the local environment through reclaimed water were estimated as 0.248 Kg/day. Among the surface waters samples, Miami River (reaching up to 580 ng/L) and Black Creek canal (up to 124 ng/L) showed highest concentrations of antibiotics. No traces of antibiotics were found in ground waters. On the other hand, erythromycin (monitored as anhydro erythromycin) was detected in 82% of the drinking water samples (n.d-66 ng/L). The developed approach is suitable for both research and monitoring applications.^ Major metabolites of antibiotics in reclaimed wates were identified and quantified using high resolution benchtop Q-Exactive orbitrap mass spectrometer. A phase I metabolite of erythromycin was tentatively identified in full scan based on accurate mass measurement. Using extracted ion chromatogram (XIC), high resolution data-dependent MS/MS spectra and metabolic profiling software the metabolite was identified as desmethyl anhydro erythromycin with molecular formula C36H63NO12 and m/z 702.4423. The molar concentration of the metabolite to erythromycin was in the order of 13 %. To my knowledge, this is the first known report on this metabolite in reclaimed water. Another compound acetyl-sulfamethoxazole, a phase II metabolite of sulfamethoxazole was also identified in reclaimed water and mole fraction of the metabolite represent 36 %, of the cumulative sulfamethoxazole concentration. The results were illustrating the importance to include metabolites also in the routine analysis to obtain a mass balance for better understanding of the occurrence, fate and distribution of antibiotics in the environment. ^ Finally, all the antibiotics detected in reclaimed and surface waters were investigated to assess the potential risk to the aquatic organisms. The surface water antibiotic concentrations that represented the real time exposure conditions revealed that the macrolide antibiotics, erythromycin, clarithromycin and tylosin along with quinolone antibiotic, ciprofloxacin were suspected to induce high toxicity to aquatic biota. Preliminary results showing that, among the antibiotic groups tested, macrolides posed the highest ecological threat, and therefore, they may need to be further evaluated with, long-term exposure studies considering bioaccumulation factors and more number of species selected. Overall, the occurrence of antibiotics in aquatic environment is posing an ecological health concern.^
Resumo:
An automated on-line SPE-LC-MS/MS method was developed for the quantitation of multiple classes of antibiotics in environmental waters. High sensitivity in the low ng/L range was accomplished by using large volume injections with 10-mL of sample. Positive confirmation of analytes was achieved using two selected reaction monitoring (SRM) transitions per antibiotic and quantitation was performed using an internal standard approach. Samples were extracted using online solid phase extraction, then using column switching technique; extracted samples were immediately passed through liquid chromatography and analyzed by tandem mass spectrometry. The total run time per each sample was 20 min. The statistically calculated method detection limits for various environmental samples were between 1.2 and 63 ng/L. Furthermore, the method was validated in terms of precision, accuracy and linearity. The developed analytical methodology was used to measure the occurrence of antibiotics in reclaimed waters (n=56), surface waters (n=53), ground waters (n=8) and drinking waters (n=54) collected from different parts of South Florida. In reclaimed waters, the most frequently detected antibiotics were nalidixic acid, erythromycin, clarithromycin, azithromycin trimethoprim, sulfamethoxazole and ofloxacin (19.3-604.9 ng/L). Detection of antibiotics in reclaimed waters indicates that they can’t be completely removed by conventional wastewater treatment process. Furthermore, the average mass loads of antibiotics released into the local environment through reclaimed water were estimated as 0.248 Kg/day. Among the surface waters samples, Miami River (reaching up to 580 ng/L) and Black Creek canal (up to 124 ng/L) showed highest concentrations of antibiotics. No traces of antibiotics were found in ground waters. On the other hand, erythromycin (monitored as anhydro erythromycin) was detected in 82% of the drinking water samples (n.d-66 ng/L). The developed approach is suitable for both research and monitoring applications. Major metabolites of antibiotics in reclaimed wates were identified and quantified using high resolution benchtop Q-Exactive orbitrap mass spectrometer. A phase I metabolite of erythromycin was tentatively identified in full scan based on accurate mass measurement. Using extracted ion chromatogram (XIC), high resolution data-dependent MS/MS spectra and metabolic profiling software the metabolite was identified as desmethyl anhydro erythromycin with molecular formula C36H63NO12 and m/z 702.4423. The molar concentration of the metabolite to erythromycin was in the order of 13 %. To my knowledge, this is the first known report on this metabolite in reclaimed water. Another compound acetyl-sulfamethoxazole, a phase II metabolite of sulfamethoxazole was also identified in reclaimed water and mole fraction of the metabolite represent 36 %, of the cumulative sulfamethoxazole concentration. The results were illustrating the importance to include metabolites also in the routine analysis to obtain a mass balance for better understanding of the occurrence, fate and distribution of antibiotics in the environment. Finally, all the antibiotics detected in reclaimed and surface waters were investigated to assess the potential risk to the aquatic organisms. The surface water antibiotic concentrations that represented the real time exposure conditions revealed that the macrolide antibiotics, erythromycin, clarithromycin and tylosin along with quinolone antibiotic, ciprofloxacin were suspected to induce high toxicity to aquatic biota. Preliminary results showing that, among the antibiotic groups tested, macrolides posed the highest ecological threat, and therefore, they may need to be further evaluated with, long-term exposure studies considering bioaccumulation factors and more number of species selected. Overall, the occurrence of antibiotics in aquatic environment is posing an ecological health concern.
Resumo:
This work is the first work using patterned soft underlayers in multilevel three-dimensional vertical magnetic data storage systems. The motivation stems from an exponentially growing information stockpile, and a corresponding need for more efficient storage devices with higher density. The world information stockpile currently exceeds 150EB (ExaByte=1x1018Bytes); most of which is in analog form. Among the storage technologies (semiconductor, optical and magnetic), magnetic hard disk drives are posed to occupy a big role in personal, network as well as corporate storage. However; this mode suffers from a limit known as the Superparamagnetic limit; which limits achievable areal density due to fundamental quantum mechanical stability requirements. There are many viable techniques considered to defer superparamagnetism into the 100's of Gbit/in2 such as: patterned media, Heat-Assisted Magnetic Recording (HAMR), Self Organized Magnetic Arrays (SOMA), antiferromagnetically coupled structures (AFC), and perpendicular magnetic recording. Nonetheless, these techniques utilize a single magnetic layer; and can thusly be viewed as two-dimensional in nature. In this work a novel three-dimensional vertical magnetic recording approach is proposed. This approach utilizes the entire thickness of a magnetic multilayer structure to store information; with potential areal density well into the Tbit/in2 regime. ^ There are several possible implementations for 3D magnetic recording; each presenting its own set of requirements, merits and challenges. The issues and considerations pertaining to the development of such systems will be examined, and analyzed using empirical and numerical analysis techniques. Two novel key approaches are proposed and developed: (1) Patterned soft underlayer (SUL) which allows for enhanced recording of thicker media, (2) A combinatorial approach for 3D media development that facilitates concurrent investigation of various film parameters on a predefined performance metric. A case study is presented using combinatorial overcoats of Tantalum and Zirconium Oxides for corrosion protection in magnetic media. ^ Feasibility of 3D recording is demonstrated, and an emphasis on 3D media development is emphasized as a key prerequisite. Patterned SUL shows significant enhancement over conventional "un-patterned" SUL, and shows that geometry can be used as a design tool to achieve favorable field distribution where magnetic storage and magnetic phenomena are involved. ^
Resumo:
Current reform initiatives recommend that school geometry teaching and learning include the study of three-dimensional geometric objects and provide students with opportunities to use spatial abilities in mathematical tasks. Two ways of using Geometer's Sketchpad (GSP), a dynamic and interactive computer program, in conjunction with manipulatives enable students to investigate and explore geometric concepts, especially when used in a constructivist setting. Research on spatial abilities has focused on visual reasoning to improve visualization skills. This dissertation investigated the hypothesis that connecting visual and analytic reasoning may better improve students' spatial visualization abilities as compared to instruction that makes little or no use of the connection of the two. Data were collected using the Purdue Spatial Visualization Tests (PSVT) administered as a pretest and posttest to a control and two experimental groups. Sixty-four 10th grade students in three geometry classrooms participated in the study during 6 weeks. Research questions were answered using statistical procedures. An analysis of covariance was used for a quantitative analysis, whereas a description of students' visual-analytic processing strategies was presented using qualitative methods. The quantitative results indicated that there were significant differences in gender, but not in the group factor. However, when analyzing a sub sample of 33 participants with pretest scores below the 50th percentile, males in one of the experimental groups significantly benefited from the treatment. A review of previous research also indicated that students with low visualization skills benefited more than those with higher visualization skills. The qualitative results showed that girls were more sophisticated in their visual-analytic processing strategies to solve three-dimensional tasks. It is recommended that the teaching and learning of spatial visualization start in the middle school, prior to students' more rigorous mathematics exposure in high school. A duration longer than 6 weeks for treatments in similar future research studies is also recommended.
Resumo:
Buildings and other infrastructures located in the coastal regions of the US have a higher level of wind vulnerability. Reducing the increasing property losses and causalities associated with severe windstorms has been the central research focus of the wind engineering community. The present wind engineering toolbox consists of building codes and standards, laboratory experiments, and field measurements. The American Society of Civil Engineers (ASCE) 7 standard provides wind loads only for buildings with common shapes. For complex cases it refers to physical modeling. Although this option can be economically viable for large projects, it is not cost-effective for low-rise residential houses. To circumvent these limitations, a numerical approach based on the techniques of Computational Fluid Dynamics (CFD) has been developed. The recent advance in computing technology and significant developments in turbulence modeling is making numerical evaluation of wind effects a more affordable approach. The present study targeted those cases that are not addressed by the standards. These include wind loads on complex roofs for low-rise buildings, aerodynamics of tall buildings, and effects of complex surrounding buildings. Among all the turbulence models investigated, the large eddy simulation (LES) model performed the best in predicting wind loads. The application of a spatially evolving time-dependent wind velocity field with the relevant turbulence structures at the inlet boundaries was found to be essential. All the results were compared and validated with experimental data. The study also revealed CFD’s unique flow visualization and aerodynamic data generation capabilities along with a better understanding of the complex three-dimensional aerodynamics of wind-structure interactions. With the proper modeling that realistically represents the actual turbulent atmospheric boundary layer flow, CFD can offer an economical alternative to the existing wind engineering tools. CFD’s easy accessibility is expected to transform the practice of structural design for wind, resulting in more wind-resilient and sustainable systems by encouraging optimal aerodynamic and sustainable structural/building design. Thus, this method will help ensure public safety and reduce economic losses due to wind perils.