8 results for Arabic alphabet Data processing
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. 
In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
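The CaPC idea in the abstract above (compress only the informational content and leave the functional content untouched) can be illustrated with a toy sketch. This is not the scheme developed in the thesis: the choice of commas and newlines as functional content, the `~` code prefix, and the helper names `build_codebook`, `compress` and `decompress` are all assumptions made for illustration, and the input is assumed never to contain `~`.

```python
import re
from collections import Counter

# Assumption: commas and newlines are the "functional" content that
# downstream tools rely on, so they are always kept verbatim.
DELIMS = re.compile(r"([,\n])")


def build_codebook(text, min_count=2):
    """Map frequently repeated field values to short codes (toy dictionary coder)."""
    fields = [f for f in DELIMS.split(text) if f and f not in {",", "\n"}]
    counts = Counter(fields)
    # Only code values that repeat and are longer than the shortest code.
    frequent = [f for f, c in counts.most_common() if c >= min_count and len(f) > 2]
    return {f: "~" + format(i, "x") for i, f in enumerate(frequent)}


def compress(text, codebook):
    """Replace coded field values; delimiters pass through unchanged."""
    return "".join(codebook.get(part, part) for part in DELIMS.split(text))


def decompress(text, codebook):
    """Invert compress() using the same codebook."""
    reverse = {code: field for field, code in codebook.items()}
    return "".join(reverse.get(part, part) for part in DELIMS.split(text))


if __name__ == "__main__":
    sample = "dublin,rain,12\ncork,rain,14\ndublin,sun,18\n"
    book = build_codebook(sample)
    packed = compress(sample, book)
    # Record and field boundaries survive, so line/field splitting still works.
    print(packed.splitlines())
    print(decompress(packed, book) == sample)
```

Because the delimiters are untouched, code that splits records on newlines and fields on commas can run on the compressed text without modification, which is the kind of transparency the abstract describes.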
Abstract:
Two complementary wireless sensor nodes for building two-tiered heterogeneous networks are presented. A larger node, 25 mm by 25 mm in size, acts as the backbone of the network and can handle complex data processing. A smaller, cheaper node, 10 mm by 10 mm in size, can perform simpler sensor-interfacing tasks. The 25mm node is based on previous work at the Tyndall National Institute that created a modular wireless sensor node. In this work, a new 25mm module is developed operating in the 433/868 MHz frequency bands, with a range of 3.8 km. The 10mm node is highly miniaturised, while retaining a high level of modularity. It has been designed to support very energy-efficient operation for applications with low duty cycles, with a sleep current of 3.3 μA. Both nodes use commercially available components and have low manufacturing costs to allow the construction of large networks. In addition, interface boards for communicating with nodes have been developed for both the 25mm and 10mm nodes. These interface boards provide a USB connection, and support recharging of a Li-ion battery from the USB power supply. This paper discusses the design goals, the design methods, and the resulting implementation.
Abstract:
The power consumption of wireless sensor network (WSN) modules is an important practical concern in building energy management (BEM) system deployments. A set of metrics is created to assess the power profiles of WSNs in real-world conditions. The aim of this work is to understand and eventually eliminate the uncertainties in WSN power consumption during long-term deployments, and to assess compatibility with existing and emerging energy harvesting technologies. This paper investigates the key metrics in data processing, wireless data transmission, data sensing and the duty cycle parameter to understand the system power profile from a practical deployment perspective. Based on the proposed analysis, the impact of each individual metric on power consumption in a typical BEM application is presented and the subsequent low-power solutions are investigated.
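As a rough illustration of how sensing, processing, transmission and duty cycle combine into a power profile, the sketch below computes a time-weighted average current over one duty-cycle period and a naive battery lifetime. It is a textbook-style estimate, not the metric set proposed in the paper; the function names and every numeric value are hypothetical.

```python
def average_current_ma(phases, sleep_current_ma, period_s):
    """Time-weighted average current over one duty-cycle period.

    phases: list of (current_ma, duration_s) for each active phase,
            e.g. sensing, processing, radio transmission.
    """
    active_time = sum(duration for _, duration in phases)
    if active_time > period_s:
        raise ValueError("active phases exceed the duty-cycle period")
    # Charge drawn per period: active phases plus sleep for the remainder.
    charge = sum(current * duration for current, duration in phases)
    charge += sleep_current_ma * (period_s - active_time)
    return charge / period_s


def lifetime_days(capacity_mah, avg_current_ma):
    """Naive lifetime estimate that ignores self-discharge and derating."""
    return capacity_mah / avg_current_ma / 24.0


if __name__ == "__main__":
    # Hypothetical figures: (current in mA, duration in s) per phase.
    phases = [(1.5, 0.05), (4.0, 0.02), (20.0, 0.01)]  # sense, process, transmit
    avg = average_current_ma(phases, sleep_current_ma=0.005, period_s=60.0)
    print(f"average current: {avg:.4f} mA")
    print(f"lifetime on 1000 mAh: {lifetime_days(1000.0, avg):.0f} days")
```

Lengthening the period (a lower duty cycle) shifts the average towards the sleep current, which is why sleep current dominates in low-duty-cycle deployments.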
Abstract:
Body Sensor Network (BSN) technology is seeing a rapid emergence in application areas such as health, fitness and sports monitoring. Current BSN wireless sensors typically operate on a single frequency band (e.g. utilizing the IEEE 802.15.4 standard that operates at 2.45GHz) employing a single radio transceiver for wireless communications. This allows a simple wireless architecture to be realized with low cost and power consumption. However, network congestion or failure can create potential issues in terms of reliability of data transfer, quality-of-service (QoS) and data throughput for the sensor. These issues can be especially critical in healthcare monitoring applications where data availability and integrity are crucial. The addition of more than one radio has the potential to address some of the above issues. For example, multi-radio implementations can allow access to more than one network, providing increased coverage and data processing as well as improved interoperability between networks. A small number of multi-radio wireless sensor solutions exist at present, but they require more than one radio transceiver device to achieve multi-band operation. This paper presents the design of a novel prototype multi-radio hardware platform that uses a single radio transceiver. The proposed design allows multi-band operation in the 433/868MHz ISM bands and this, together with its low complexity and small form factor, makes it suitable for a wide range of BSN applications.
Abstract:
Buried heat sources can be investigated by examining thermal infrared images and comparing these with the results of theoretical models which predict the thermal anomaly a given heat source may generate. Key factors influencing surface temperature include the geometry and temperature of the heat source, the surface meteorological environment, and the thermal conductivity and anisotropy of the rock. In general, a geothermal heat flux of greater than 2% of solar insolation is required to produce a detectable thermal anomaly in a thermal infrared image. A heat source of, for example, 2-300K greater than the average surface temperature must be at a depth shallower than 50m for the detection of the anomaly in a thermal infrared image, for typical terrestrial conditions. Atmospheric factors are of critical importance. While the mean atmospheric temperature has little significance, convection is a dominant factor, and can act to swamp the thermal signature entirely. Given a steady state heat source that produces a detectable thermal anomaly, it is possible to loosely constrain the physical properties of the heat source and surrounding rock, using the surface thermal anomaly as a basis. The success of this technique is highly dependent on the degree to which the physical properties of the host rock are known. Important parameters include the surface thermal properties and thermal conductivity of the rock. Modelling of transient thermal situations was carried out to assess the effect of time-dependent thermal fluxes. One-dimensional finite element models can be readily and accurately applied to the investigation of diurnal heat flow, as with thermal inertia models. Diurnal thermal models of environments on Earth, the Moon and Mars were carried out using finite elements and found to be consistent with published measurements. The heat flow from an injection of hot lava into a near surface lava tube was considered.
While this approach was useful for study and long-term monitoring in inhospitable areas, it was found to have little hazard warning utility, as the time taken for the thermal energy to propagate to the surface in dry rock (several months) is very long. The resolution of the thermal infrared imaging system is an important factor. Presently available satellite-based systems such as Landsat (resolution of 120m) are inadequate for detailed study of geothermal anomalies. Airborne systems, such as TIMS (variable resolution of 3-6m), are much more useful for discriminating small buried heat sources. Planned improvements in the resolution of satellite-based systems will broaden the potential for application of the techniques developed in this thesis. It is important to note, however, that adequate spatial resolution is a necessary but not sufficient condition for successful application of these techniques.
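The detectability rule quoted in the abstract above (geothermal flux greater than 2% of solar insolation) can be tried out with a back-of-envelope calculation. The sketch below uses steady one-dimensional conduction (Fourier's law, q = k ΔT / d), which is a much simpler stand-in for the models in the thesis; the function names, the conductivity, the temperature excess and the insolation values are all assumptions chosen for illustration.

```python
def conductive_flux_w_m2(conductivity_w_mk, delta_t_k, depth_m):
    """Steady 1-D conductive heat flux through a uniform slab: q = k * dT / d."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return conductivity_w_mk * delta_t_k / depth_m


def is_detectable(flux_w_m2, insolation_w_m2, threshold=0.02):
    """Apply the 2%-of-insolation rule of thumb quoted in the abstract."""
    return flux_w_m2 > threshold * insolation_w_m2


if __name__ == "__main__":
    # Hypothetical inputs: rock conductivity 2 W/(m K), source 300 K above
    # the mean surface temperature, buried 50 m deep.
    flux = conductive_flux_w_m2(2.0, 300.0, 50.0)
    print(f"flux: {flux:.1f} W/m^2")
    for insolation in (500.0, 1000.0):  # assumed insolation values
        print(insolation, is_detectable(flux, insolation))
```

This ignores convection and surface meteorology, which the abstract identifies as dominant factors, so it only indicates the order of magnitude involved.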
Abstract:
This work considers the static calculation of a program’s average-case time. The number of systems that currently tackle this research problem is quite small due to the difficulties inherent in average-case analysis. While each of these systems makes a pertinent contribution, and is individually discussed in this work, only one of them forms the basis of this research. That particular system is known as MOQA. The MOQA system consists of the MOQA language and the MOQA static analysis tool. Its technique for statically determining average-case behaviour centres on maintaining strict control over both the data structure type and the labeling distribution. This research develops and evaluates the MOQA language implementation, and adds to the functions already available in this language. Furthermore, the theory that backs MOQA is generalised and the range of data structures for which the MOQA static analysis tool can determine average-case behaviour is increased. Also, some of the MOQA applications and extensions suggested in other works are logically examined here. For example, the accuracy of classifying the MOQA language as reversible is investigated, along with the feasibility of incorporating duplicate labels into the MOQA theory. Finally, the analyses that take place during the course of this research reveal some of the MOQA strengths and weaknesses. This thesis aims to be pragmatic when evaluating the current MOQA theory, the advancements set forth in the following work and the benefits of MOQA when compared to similar systems. Succinctly, this work’s significant expansion of the MOQA theory is accompanied by a realistic assessment of MOQA’s accomplishments and a serious deliberation of the opportunities available to MOQA in the future.
Abstract:
This paper describes implementations of two mobile cloud applications, file synchronisation and intensive data processing, using the Context Aware Mobile Cloud Services middleware and the Cloud Personal Assistant (CPA). Both are part of the same mobile cloud project, which is actively developed and currently at its second version. We describe recent changes to the middleware, along with our experimental results for the two application models. We discuss challenges faced during the development of the middleware and their implications. The paper includes a performance analysis of the CPA support for the two applications with respect to existing solutions.
Abstract:
Long reach passive optical networks (LR-PONs), which integrate fibre-to-the-home with metro networks, have been the subject of intensive research in recent years and are considered one of the most promising candidates for the next generation of optical access networks. Such systems ideally have reaches greater than 100km and bit rates of at least 10Gb/s per wavelength in the downstream and upstream directions. Due to the limited equipment sharing that is possible in access networks, the laser transmitters in the terminal units, which are usually the most expensive components, must be as cheap as possible. However, the requirement for low cost is generally incompatible with the need for a transmitter chirp characteristic that is optimised for such long reaches at 10Gb/s, and hence dispersion compensation is required. In this thesis electronic dispersion compensation (EDC) techniques are employed to increase the chromatic dispersion tolerance and to enhance the system performance at the expense of moderate additional implementation complexity. In order to use such EDC in LR-PON architectures, a number of challenges associated with the burst-mode nature of the upstream link need to be overcome. In particular, the EDC must be made adaptive from one burst to the next (burst-mode EDC, or BM-EDC) in time scales on the order of tens to hundreds of nanoseconds. Burst-mode operation of EDC has received little attention to date. The main objective of this thesis is to demonstrate the feasibility of such a concept and to identify the key BM-EDC design parameters required for applications in a 10Gb/s burst-mode link. This is achieved through a combination of simulations and transmission experiments utilising off-line data processing. The research shows that burst-to-burst adaptation can in principle be implemented efficiently, opening the possibility of low overhead, adaptive EDC-enabled burst-mode systems.