2 results for Semi-markov and markov renewal
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytics platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content, and only the informational content is compressed. Thus, the compressed data remains transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
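For readers unfamiliar with the Huffman algorithm on which the second contribution builds, the following is a minimal Python sketch of textbook Huffman coding. It is not the thesis's Approximated Huffman Compression and does not implement its hybrid data structure or compressed-domain search; the function names `huffman_code`, `encode`, and `decode` are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free code (symbol -> bit string) for the symbols in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, partial_code_table).
    # The integer tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the bit strings of all symbols in text."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Decode by accumulating bits until they match a code word."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

Because the code is prefix-free, decoding can proceed bit by bit without separators. Frequent symbols receive shorter code words than rare ones.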
Abstract:
Reliable, fine-resolution estimates of surface net-radiation are required for estimating latent and sensible heat fluxes between the land surface and the atmosphere. Currently, however, fine-resolution estimates of net-radiation are not available, and consequently it is challenging to develop multi-year estimates of evapotranspiration at scales that can capture land surface heterogeneity and are relevant for policy and decision-making. We developed and evaluated a global net-radiation product at 5 km and 8-day resolution by combining mutually consistent atmosphere and land data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on board Terra. Comparison with net-radiation measurements from 154 globally distributed sites (414 site-years) from the FLUXNET and Surface Radiation Budget Network (SURFRAD) showed that the net-radiation product agreed well with measurements across seasons and climate types in the extratropics (Willmott’s index ranged from 0.74 for boreal to 0.63 for Mediterranean sites). Mean absolute deviation between the MODIS and measured net-radiation ranged from 38.0 ± 1.8 W·m⁻² in boreal to 72.0 ± 4.1 W·m⁻² in tropical climates. The mean bias was small and constituted only 11%, 0.7%, 8.4%, 4.2%, 13.3%, and 5.4% of the mean absolute error in daytime net-radiation in boreal, Mediterranean, temperate-continental, temperate, semi-arid, and tropical climates, respectively. To assess the accuracy of the broader spatiotemporal patterns, we upscaled error-quantified MODIS net-radiation and compared it with the net-radiation estimates from the coarse spatial (1° × 1°) but high temporal resolution gridded net-radiation product from the Clouds and Earth’s Radiant Energy System (CERES). Our estimates agreed closely with the net-radiation estimates from CERES. The difference between the two was less than 10 W·m⁻² in 94% of the total land area. The MODIS net-radiation product will be a valuable resource for the science community studying turbulent fluxes and energy budget at the Earth’s surface.
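The agreement statistics named in the abstract above (mean bias, mean absolute deviation, and Willmott's index) can be sketched in a few lines of Python. This is an illustration only: it assumes the original Willmott (1981) form of the index, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², which may differ from the variant the authors used, and the function names and sample values are hypothetical.

```python
def mean_bias(pred, obs):
    """Average signed difference (predicted minus observed)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def mean_absolute_deviation(pred, obs):
    """Average magnitude of the difference between predicted and observed."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def willmott_index(pred, obs):
    """Willmott (1981) index of agreement; 1.0 means perfect agreement.

    Assumption: the original squared-difference form is used here.
    """
    obs_mean = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2
              for p, o in zip(pred, obs))
    if den == 0:
        raise ValueError("index undefined: no variation around the observed mean")
    return 1.0 - num / den


# Made-up net-radiation values in W·m⁻², for illustration only.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
bias = mean_bias(predicted, observed)
mad = mean_absolute_deviation(predicted, observed)
d = willmott_index(predicted, observed)
```

Expressing the bias as a percentage of the mean absolute error, as the abstract does, would then be `100 * abs(bias) / mad`.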