9 results for PARTITION
in Digital Commons at Florida International University
Abstract:
Disk drives are the bottleneck in processing the large amounts of data used in almost all common applications. File systems attempt to reduce this bottleneck by storing data sequentially on the disk drives, thereby reducing access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real-world workloads are not necessarily sequential, and this mismatch degrades storage I/O performance. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk drives to match the way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns change frequently over short durations of time. We propose, implement and evaluate layout strategies for each of these. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our layout strategies for static policies using tree-structured XML data where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better by 5–54X. For the non-deep-focused queries, our native layout mechanism shows an improvement of 3–127X. To improve performance for dynamic access patterns, we implement a self-optimizing storage system that rearranges frequently accessed blocks on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times over a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.
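The idea of copying popular blocks to a dedicated partition can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the block-number trace, the partition start address and the use of total head movement as a cost proxy are all assumptions made for illustration.

```python
from collections import Counter

def build_remap(trace, capacity):
    """Assign the `capacity` most frequently accessed blocks to
    consecutive slots in a (hypothetical) dedicated partition."""
    counts = Counter(trace)
    hot = [block for block, _ in counts.most_common(capacity)]
    return {block: slot for slot, block in enumerate(hot)}

def seek_distance(trace, remap, partition_start):
    """Total head movement, in blocks, when remapped blocks are
    served from the dedicated partition instead of their home location."""
    pos = None
    total = 0
    for block in trace:
        addr = partition_start + remap[block] if block in remap else block
        if pos is not None:
            total += abs(addr - pos)
        pos = addr
    return total

# Hypothetical trace: three popular blocks that are far apart on disk.
trace = [10, 9000, 10, 52000, 9000, 10, 52000, 9000]
remap = build_remap(trace, capacity=3)
before = seek_distance(trace, {}, partition_start=100000)
after = seek_distance(trace, remap, partition_start=100000)
print(before, after)
```

In this toy trace the remapped layout places all popular blocks next to each other, so head movement drops sharply. A real system would also have to handle writes, consistency and changing popularity, which the sketch ignores.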
Abstract:
A wireless mesh network is a mesh network implemented over a wireless network system such as a wireless LAN. Wireless Mesh Networks (WMNs) are promising for numerous applications such as broadband home networking, enterprise networking, transportation systems, health and medical systems, and security surveillance systems. They have therefore received considerable attention from both industrial and academic researchers. This dissertation explores schemes for resource management and optimization in WMNs by means of network routing and network coding. In this dissertation, we propose three optimization schemes. (1) First, a triple-tier optimization scheme is proposed for the load-balancing objective. The first-tier mechanism achieves long-term routing optimization, and the second-tier mechanism, using the optimization results obtained from the first tier, performs short-term adaptation to deal with the impact of dynamic channel conditions. A greedy sub-channel allocation algorithm is developed as the third-tier optimization scheme to further reduce the congestion level in the network. We conduct a thorough theoretical analysis to show the correctness of our design and give the properties of our scheme. (2) Then, a Relay-Aided Network Coding scheme called RANC is proposed to improve the performance gain of network coding by exploiting the physical-layer multi-rate capability in WMNs. We conduct a rigorous analysis to find the design principles and study the tradeoff in the performance gain of RANC. Based on the analytical results, we provide a practical solution by decomposing the original design problem into two sub-problems, a flow partition problem and a scheduling problem. (3) Lastly, a joint optimization scheme of routing in the network layer and network coding-aware scheduling in the MAC layer is introduced. We formulate the network optimization problem and exploit the structure of the problem via dual decomposition. We find that the original problem is composed of two sub-problems, a routing problem in the network layer and a scheduling problem in the MAC layer, which are coupled through the link capacities. We solve the routing problem with two different adaptive routing algorithms. We then provide a distributed coding-aware scheduling algorithm. According to the corresponding experimental results, the proposed schemes can significantly improve network performance.
Abstract:
With the explosive growth in the volume and complexity of document data (e.g., news, blogs, web pages), it has become necessary to semantically understand documents and deliver meaningful information to users. These problems span data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One underexplored question in document clustering is how to generate a meaningful interpretation for each document cluster resulting from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To help people capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
Abstract:
Annual Average Daily Traffic (AADT) is a critical input to many transportation analyses. By definition, AADT is the average 24-hour volume at a highway location over a full year. Traditionally, AADT is estimated using a mix of permanent and temporary traffic counts. Because field collection of traffic counts is expensive, it is usually done only for the major roads, leaving most local roads without any AADT information. However, AADTs for local roads are needed in many applications. For example, AADTs are used by state Departments of Transportation (DOTs) to calculate the crash rates of all local roads in order to identify the top five percent of hazardous locations for annual reporting to the U.S. DOT. This dissertation develops a new method for estimating AADTs for local roads using travel demand modeling. A major component of the new method is a parcel-level trip generation model that estimates the trips generated by each parcel. The model uses tax parcel data together with the trip generation rates and equations provided by the ITE Trip Generation Report. The generated trips are then distributed to existing traffic count sites using a parcel-level trip distribution gravity model. The all-or-nothing assignment method is then used to assign the trips onto the roadway network to estimate the final AADTs. The entire process was implemented in the Cube demand modeling system with extensive spatial data processing using ArcGIS. To evaluate the performance of the new method, data from several study areas in Broward County, Florida, were used. The estimated AADTs were compared with those from two existing methods, using actual traffic counts as the ground truth. The results show that the new method performs better than both existing methods. One limitation of the new method is that it relies on Cube, which limits the number of zones to 32,000. Accordingly, a study area exceeding this limit must be partitioned into smaller areas. Because AADT estimates for roads near the boundary areas were found to be less accurate, further research could examine the best way to partition a study area to minimize this impact.
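The trip distribution step can be illustrated with a generic singly constrained gravity model. This is a minimal sketch, not the dissertation's model: the power-law impedance function, the `beta` exponent, the parcel and site names, and all numbers are assumptions chosen for illustration.

```python
def gravity_distribute(productions, attractions, cost, beta=2.0):
    """Split each origin's trips across destinations in proportion to
    attraction * cost**(-beta), so each origin's total is preserved."""
    trips = {}
    for origin, produced in productions.items():
        weights = {
            dest: attr * cost[origin][dest] ** -beta
            for dest, attr in attractions.items()
        }
        total = sum(weights.values())
        trips[origin] = {dest: produced * w / total for dest, w in weights.items()}
    return trips

# Hypothetical inputs: daily trips generated per parcel, an attraction
# weight per traffic count site, and a travel cost between each pair.
productions = {"parcel_A": 120.0, "parcel_B": 40.0}
attractions = {"site_1": 5000.0, "site_2": 2000.0}
cost = {
    "parcel_A": {"site_1": 2.0, "site_2": 1.0},
    "parcel_B": {"site_1": 1.0, "site_2": 4.0},
}
trips = gravity_distribute(productions, attractions, cost)
for origin, row in trips.items():
    print(origin, {dest: round(t, 1) for dest, t in row.items()})
```

Each parcel's trips are conserved, and nearer or more attractive sites receive a larger share. An all-or-nothing assignment would then load each origin-destination flow entirely onto its single least-cost path.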
Abstract:
An LLE-GC-MS method was developed to detect pharmaceuticals and personal care products (PPCPs) in surface water samples from Big Cypress National Park, Everglades National Park and Biscayne National Park in South Florida. The most frequently found PPCPs were caffeine, DEET and triclosan, with maximum detected concentrations of 169 ng/L, 27.9 ng/L and 10.9 ng/L, respectively. The detection frequencies of hormones were lower than those of PPCPs. The maximum detected concentrations of estrone, 17β-estradiol, coprostan-3-ol, coprostane and coprostan-3-one were 5.98 ng/L, 3.34 ng/L, 16.5 ng/L, 13.5 ng/L and 6.79 ng/L, respectively. An ASE-SPE-GC-MS method was developed and applied to the analysis of sediment and soil from an area where reclaimed water was used for irrigation. Most analytes were below detection limits, even though some analytes were detected in the reclaimed water at relatively high concentrations, corroborating the fact that PPCPs do not significantly partition to mineral phases. An online SPE-HPLC-APPI-MS/MS method and an online SPE-HPLC-HESI-MS/MS method were developed to analyze reclaimed water and drinking water samples. In the reclaimed water study, samples were collected from a sprinkler over a year-long period at the Florida International University Biscayne Bay Campus, where reclaimed water was reused for irrigation. The results showed that several analytes were continuously detected in all reclaimed water samples. The maximum concentrations of coprostanol, bisphenol A and DEET exceeded 10 μg/L (ppb). The four most frequently detected compounds were diphenhydramine (100%), DEET (98%), atenolol (98%) and carbamazepine (96%). In the drinking water study, 54 tap water samples were collected from the Miami-Dade area. The maximum concentrations of salicylic acid, ibuprofen and DEET were 521 ng/L, 301 ng/L and 290 ng/L, respectively. The three most frequently detected compounds were DEET (93%), carbamazepine (43%) and salicylic acid (37%). Because the source of drinking water in Miami-Dade County is the relatively pristine Biscayne aquifer, these findings suggest the presence of wastewater intrusions into the delivery system or the onset of direct influence of surface waters on the shallow aquifer.
Abstract:
The purpose of this work is to increase ecological understanding of Avicennia germinans L. and Laguncularia racemosa (L.) Gaertn. F. growing in hypersaline habitats with a seasonal climate. The area has a dry season (DS) with low temperature and vapour pressure deficit (vpd), and a wet season (WS) with high temperature and slightly higher vpd. Seasonal patterns in interstitial soil water salinity suggested a lack of tidal flushing in this area to remove salt along the soil profile. The soil solution sodium/potassium (Na+/K+) ratio differed slightly along the soil profile during the DS, but during the WS it was significantly higher at the soil surface. Diurnal changes in xylem osmolality between predawn (higher) and midday (lower) were observed in both species. However, A. germinans had higher xylem osmolality than L. racemosa. Xylem Na+/K+ suggested higher selectivity for K+ over Na+ in both species and seasons. The water relations parameters derived from pressure–volume (P–V) curves were relatively stable between seasons for each species. The range of water potentials (Ψ) measured in the field was within the values estimated for turgor maintenance from the P–V curves. Thus the leaves of both species were osmotically adapted to maintain continued water uptake in this hypersaline mangrove environment.
Abstract:
We evaluated metacommunity hypotheses of landscape arrangement (indicative of dispersal limitation) and environmental gradients (hydroperiod and nutrients) in structuring macroinvertebrate and fish communities in the southern Everglades. We used samples collected at sites along the eastern boundary of the southern Everglades and in Shark River Slough to evaluate the role of these factors in metacommunity structure. We used eigenfunction spatial analysis to model community structure among sites and distance-based redundancy analysis to partition the variability in communities between spatial and environmental filters. For most animal communities, hydrological parameters had a greater influence on structure than nutrient enrichment; however, both had large effects. The influence of spatial effects indicative of dispersal limitation was weak, and only periphyton infauna appeared to be limited by regional dispersal. At the landscape scale, communities were well mixed but strongly influenced by hydrology. Local-scale species dominance was influenced by water permanence and nutrient enrichment. Nutrient enrichment is limited to water inflow points associated with canals, which may explain its impact in this data set. Hydroperiod and nutrient enrichment are controlled by water managers; our analysis indicates that the decisions they make have strong effects on the communities at the base of the Everglades food web.
Mercury interactions with suspended solids at the Upper East Fork Poplar Creek, Oak Ridge, Tennessee
Abstract:
A water quality model was developed to analyze the impact of hydrological events on mercury contamination of the Upper East Fork Poplar Creek, Tennessee. The model simulates surface and subsurface hydrology and transport (MIKE SHE and MIKE 11) and is coupled with the reactive transport of sediments and mercury (ECOLAB). The model was used to simulate the distribution of mercury contamination in the water and sediments as a function of daily hydrological events. Results from the model show a high correlation between suspended solids and mercury in the water due to the affinity of mercury for suspended organics. The governing parameters for the distribution of total suspended solids and mercury contamination were the critical velocity of the stream for particle resuspension, the rates of resuspension and production of particles, the settling velocity, the soil-water partition coefficient, and the desorption rate of mercury in the water. Flow and load duration curves at the watershed exit were used to calibrate the model and to determine the impact of hydrological events on the total maximum daily load at Station 17. The results confirmed the strong link between hydrology and mercury transport.
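The role of a solid-water partition coefficient can be illustrated with the standard linear equilibrium relation, in which the particle-bound fraction of a contaminant is K_d·TSS / (1 + K_d·TSS). This is a sketch under stated assumptions, not the ECOLAB formulation used in the study: the abstract also lists a desorption rate, which implies kinetic effects that this equilibrium sketch ignores, and the K_d and TSS values below are hypothetical.

```python
def particulate_fraction(kd_l_per_kg, tss_mg_per_l):
    """Fraction of total mercury bound to suspended solids, assuming
    linear equilibrium partitioning (sorbed conc. = Kd * dissolved conc.)."""
    tss_kg_per_l = tss_mg_per_l * 1e-6  # mg/L -> kg/L
    x = kd_l_per_kg * tss_kg_per_l      # dimensionless Kd * TSS
    return x / (1.0 + x)

# Hypothetical values: Kd = 1e5 L/kg, suspended solids rising during a storm.
for tss in (1.0, 10.0, 100.0):
    print(tss, round(particulate_fraction(1e5, tss), 3))
```

Under this assumption the particle-bound share rises with suspended solids, which is consistent with the correlation between suspended solids and mercury reported above.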
Abstract:
Disk drives are the bottleneck in processing the large amounts of data used in almost all common applications. File systems attempt to reduce this bottleneck by storing data sequentially on the disk drives, thereby reducing access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real-world workloads are not necessarily sequential, and this mismatch degrades storage I/O performance. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk drives to match the way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns change frequently over short durations of time. We propose, implement and evaluate layout strategies for each of these. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our layout strategies for static policies using tree-structured XML data where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better by 5–54X. For the non-deep-focused queries, our native layout mechanism shows an improvement of 3–127X. To improve performance for dynamic access patterns, we implement a self-optimizing storage system that rearranges frequently accessed blocks on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times over a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.