947 results for "Data pre-processing"
Abstract:
This thesis studies three techniques for improving the performance of some standard forecasting models, applied to energy demand and prices. We focus on forecasting demand and price one day ahead. First, the wavelet transform was used as a pre-processing procedure with two approaches: multicomponent forecasts and direct forecasts. We compared these approaches empirically and found that the former consistently outperformed the latter. Second, adaptive models were introduced to continuously update model parameters in the testing period by combining filters with standard forecasting methods. Among these adaptive models, the adaptive LR-GARCH model is proposed for the first time in this thesis. Third, with regard to the noise distributions of the dependent variables in the forecasting models, we used either Gaussian or Student-t distributions. The thesis proposes a novel algorithm to infer the parameters of Student-t noise models. The method extends earlier work for models that are linear in their parameters to the non-linear multilayer perceptron, and therefore broadens the range of models that can use a Student-t noise distribution. Because these techniques cannot stand alone, they must be combined with prediction models to improve their performance. We combined them with several standard forecasting models: the multilayer perceptron, radial basis functions, linear regression, and linear regression with GARCH. These techniques and forecasting models were applied to two datasets from the UK energy markets: daily electricity demand (which is stationary) and gas forward prices (non-stationary). The results showed that these techniques provided a good improvement in prediction performance.
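The multicomponent idea can be sketched with a toy example: decompose the series with a one-level Haar transform (a stand-in here, since the abstract does not name the wavelet family used), forecast each component separately with a naive persistence model, and recombine the component forecasts. The series, the Haar choice and the persistence model are all illustrative assumptions, not the thesis's actual configuration.

```python
import math

def haar_decompose(x):
    """One-level Haar DWT: split an even-length series into
    approximation (trend) and detail (fluctuation) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse one-level Haar DWT."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def persistence_forecast(series):
    """Naive one-step-ahead model: tomorrow equals today."""
    return series[-1]

# Synthetic "daily demand": trend plus a weekly-like oscillation.
demand = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 7) for t in range(64)]

# Direct forecast: one model applied to the raw series.
direct = persistence_forecast(demand)

# Multicomponent forecast: forecast each wavelet component
# separately, then recombine the component forecasts.
approx, detail = haar_decompose(demand)
a_next = persistence_forecast(approx)
d_next = persistence_forecast(detail)
multicomponent = (a_next + d_next) / math.sqrt(2.0)
```

In the thesis's setting each component would get its own trained model (MLP, RBF, etc.) rather than persistence; the point of the sketch is only the decompose-forecast-recombine structure.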
Abstract:
The purpose of this work is to gain knowledge of the kinetics of biomass decomposition under oxidative atmospheres, mainly examining the effect of heating rate on different biomass species. Two sets of experiments were carried out: the first is the thermal decomposition of four different wood particles, namely aspen, birch, oak and pine, under an oxidative atmosphere, analysed with TGA; the second uses large samples of wood under different heat fluxes in a purpose-built furnace, where the temperature distribution, mass loss and ignition characteristics are recorded and analysed by a data post-processing system. The experimental data are then used to develop a two-step reaction kinetic scheme with low- and high-temperature regions, while the activation energy for the reactions of the species under different heating rates is calculated. It is found that the activation energy of the second-stage reaction for species with similar constituent fractions tends to converge to a similar value at the high heating rate.
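Activation-energy calculations of this kind typically rest on the Arrhenius relation k = A exp(-Ea/(R T)), which becomes linear after taking logarithms: ln k = ln A - Ea/(R T). A minimal sketch of the fit, using synthetic rate data rather than the thesis's measurements:

```python
import math

R = 8.314  # J/(mol K), universal gas constant

def fit_arrhenius(temps_K, rates):
    """Estimate activation energy Ea and pre-exponential factor A
    from rate constants via a linear fit of ln k against 1/T."""
    xs = [1.0 / T for T in temps_K]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    Ea = -slope * R          # J/mol; slope of ln k vs 1/T is -Ea/R
    A = math.exp(intercept)
    return Ea, A

# Synthetic data: a reaction with Ea = 120 kJ/mol, A = 1e9 s^-1.
Ea_true, A_true = 120e3, 1e9
temps = [500.0, 550.0, 600.0, 650.0, 700.0]
rates = [A_true * math.exp(-Ea_true / (R * T)) for T in temps]

Ea_est, A_est = fit_arrhenius(temps, rates)
```

Real TGA-derived rate constants carry noise and model error, so the recovered Ea would not match exactly as it does on this clean synthetic data.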
Abstract:
Neuroimaging is increasingly used to understand conditions like stroke and epilepsy. However, there is growing recognition that neuroimaging can raise ethical issues. We used interpretative phenomenological analysis of interview data, collected pre- and post-scan, to explore these issues. Findings show that participants can become anxious prior to scanning and that the protocol for managing incidental findings is unclear. Participants lacked a frame of reference to contextualize their expectations and often drew on medical narratives. Recommendations to reduce anxiety include dialogue between researcher and participant to clarify understanding during consent, and the use of a 'virtual tour' of the neuroimaging experience.
Abstract:
This thesis documents the design, manufacture and testing of a passive and non-invasive micro-scale planar particle-from-fluid filter for segregating cell types from a homogeneous suspension. The microfluidic system can be used to separate spermatogenic cells from testis biopsy samples, providing a mechanism for filtrate retrieval for assisted reproduction therapy. The system can also be used for point-of-service diagnostics in hospitals, lab-on-a-chip pre-processing, and field applications such as clinical testing in the third world. Various design concepts are developed and manufactured, and are assessed based on etched structure morphology, robustness to variations in the manufacturing process, and design impacts on fluid flow and particle separation characteristics. Segregation was measured using image processing algorithms that demonstrate an efficiency of more than 55% for 1 µl volumes at populations exceeding 1 × 10⁷. The technique supports a significant reduction in time over conventional processing in the separation and identification of particle groups, offering a potential reduction in the associated cost of the targeted procedure. The thesis develops a model of quasi-steady wetting flow within the micro-channel and identifies the forces across the system during post-wetting equalisation. The model and its underlying assumptions are validated empirically in microfabricated test structures through a novel Micro-Particle Image Velocimetry technique. The prototype devices require neither ancillary equipment nor additional filtration media, and therefore offer fewer opportunities for sample contamination than conventional processing methods. The devices are disposable, with minimal reagent volumes and process waste. Optimal processing parameters and production methods are identified, along with improvements that could enhance performance in a number of identified potential applications.
Abstract:
Secondary fibre paper mills are significant users of both heat and electricity, which are mainly derived from the combustion of fossil fuels. The cost of producing this energy is increasing year upon year. These mills are also significant producers of fibrous sludge and reject waste material which can contain high amounts of useful energy. Currently the majority of these waste fractions are disposed of by landfill, land-spread or incineration using natural gas. These disposal methods not only present environmental problems but are also very costly. The focus of this work was to utilise the waste fractions produced at secondary fibre paper mills for the on-site production of combined heat and power (CHP) using advanced thermal conversion methods (gasification and pyrolysis), well suited to relatively small scales of throughput. The heat and power can either be used on-site or exported. The first stage of the work was the development of methods to condition selected paper industry wastes to enable thermal conversion. This stage required detailed characterisation of the waste streams in terms of proximate and ultimate analysis and heat content. Suitable methods to dry and condition the wastes in preparation for thermal conversion were also explored. Through trials at pilot scale with both fixed-bed downdraft gasification and intermediate pyrolysis systems, the energy recovered from selected wastes and waste blends in the form of product gas and pyrolysis products was quantified. The optimal process routes were selected based on the experimental results, and implementation studies were carried out at the selected candidate mills. The studies consider the pre-processing of the wastes, thermal conversion, and full integration of the energy products. The final stage of work was an economic analysis to quantify economic gain, return on investment and environmental benefits from the proposed processes.
Abstract:
The morphology of an asphalt mixture can be defined as a set of parameters describing the geometrical characteristics of its constituent materials, their relative proportions, and their spatial arrangement in the mixture. The present study investigates the effect of morphology on the mixture's meso- and macro-mechanical response. An analysis approach based on X-ray computed tomography (CT) data is used for the meso-structural characterisation. Image processing techniques are used to systematically vary the internal structure to obtain structures with different morphologies. A morphology framework is used to characterise the average mastic coating thickness around the main load-carrying structure. The uniaxial tension simulation shows that the mixtures with the lowest coating thickness exhibit better inter-particle interaction, with more continuous load-distribution chains between adjacent aggregate particles, fewer stress concentrations and less strain localisation in the mastic phase.
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted into uniform vectors by auto- and cross-covariance (ACC) transformation. Each protein was represented as a vector of 45 variables. Five machine learning methods for classification were applied to derive models for allergen prediction: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. AllerTOP outperforms other servers for allergen prediction, with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins.
Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
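The ACC transformation described above maps sequences of different lengths onto uniform vectors: with three z-descriptors and lags 1 to 5, the auto- and cross-terms give 3 × 3 × 5 = 45 variables. A sketch under illustrative assumptions (the z-scale numbers below are placeholders for demonstration, not the published z-descriptors):

```python
# Placeholder z-descriptor values (z1, z2, z3) for a few amino acids.
Z_SCALES = {
    "A": (0.07, -1.73, 0.09),
    "G": (2.23, -5.36, 0.30),
    "L": (-4.19, -1.03, -0.98),
    "K": (2.84, 1.41, -3.14),
    "S": (1.96, -1.63, 0.57),
}

def acc_transform(sequence, max_lag=5):
    """Auto- and cross-covariance (ACC) transform: map a variable-length
    amino acid sequence onto a fixed-length vector of
    3 * 3 * max_lag variables."""
    zs = [Z_SCALES[aa] for aa in sequence]
    n = len(zs)
    features = []
    for j in range(3):            # descriptor of residue i
        for k in range(3):        # descriptor of residue i + lag
            for lag in range(1, max_lag + 1):
                acc = sum(zs[i][j] * zs[i + lag][k]
                          for i in range(n - lag)) / (n - lag)
                features.append(acc)
    return features

# Two sequences of different lengths yield vectors of the same size,
# which is what makes the representation alignment-independent.
vec = acc_transform("ALKSGALKSG")
```

Each fixed-length vector can then be fed to any of the classifiers listed in the abstract (DA-PLS, LR, DT, NB, kNN).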
Abstract:
In 2002, 2003 and 2004, we took macroinvertebrate samples on a total of 36 occasions at the Badacsony bay of Lake Balaton. Our sampling site was characterised by areas of open water (in 2003 and 2004 full of reed-grass) as well as by areas covered by common reed (Phragmites australis) and narrowleaf cattail (Typha angustifolia). Samples were taken from both the water body and the benthic ooze using a stiff hand net. Our data were obtained by processing 208 individual samples. We sampled frequently from early spring until late autumn to gain a deeper understanding of the processes of seasonal dynamics. The main seasonal patterns and temporal changes of diversity are described. We constructed a weather-dependent simulation model of the processes of seasonal dynamics, with a view to the further utilization of our data in climate change research. We described the total number of individuals, biovolume and diversity of all macroinvertebrate species with a single index and used the temporal trends of this index for simulation modelling. Our discrete deterministic model includes only the impact of temperature; other interactions may appear only implicitly. Running the model under different climate change scenarios made it possible to estimate conditions for the period 2070-2100. The results, however, should be treated very cautiously, not only because our model is very simple but also because the scenarios are themselves the results of different models.
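A discrete deterministic model driven only by temperature, of the general kind described above, can be sketched as a temperature-modulated logistic update of the community index. The response curve, parameters and temperature course below are all invented for illustration and are not taken from the study.

```python
def temp_response(T, T_opt=20.0, width=8.0):
    """Simple bell-shaped growth response to temperature:
    maximal at T_opt, falling to zero outside T_opt +/- width."""
    return max(0.0, 1.0 - ((T - T_opt) / width) ** 2)

def simulate(index0, temperatures, r_max=0.3, K=100.0):
    """Discrete deterministic model: the community index changes
    only through a temperature-dependent logistic growth term."""
    index = index0
    trajectory = [index]
    for T in temperatures:
        growth = r_max * temp_response(T) * index * (1.0 - index / K)
        index = max(0.0, index + growth)
        trajectory.append(index)
    return trajectory

# A schematic seasonal temperature course (spring warming, autumn cooling).
season = ([8 + 14 * t / 30 for t in range(30)]
          + [22 - 14 * t / 30 for t in range(30)])
run = simulate(10.0, season)
```

Feeding the model a temperature series from a climate scenario instead of `season` is what allows the kind of 2070-2100 projection the abstract mentions, with the same caveats about model simplicity.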
Abstract:
Shadows and illumination play an important role in generating a realistic scene in computer graphics. Most Augmented Reality (AR) systems track markers placed in a real scene and retrieve their position and orientation to serve as a frame of reference for added computer-generated content, thereby producing an augmented scene. Realistic depiction of augmented content with coherent visual cues is a desired goal in many AR applications. However, rendering an augmented scene with realistic illumination is a complex task. Many existing approaches rely on a non-automated pre-processing phase to retrieve illumination parameters from the scene. Other techniques rely on specific markers that contain light probes to perform environment lighting estimation. This study aims at designing a method to create AR applications with coherent illumination and shadows, using a textured cuboid marker that does not require a training phase to provide lighting information. Such a marker may easily be found in common environments: most product packaging satisfies these characteristics. We propose a way to estimate a directional light configuration using multiple texture tracking to render AR scenes in a realistic fashion. We also propose a novel feature descriptor, named the discrete descriptor, that is used to perform the multiple texture tracking. Our descriptor is an extension of the binary descriptor and outperforms current state-of-the-art methods in speed while maintaining their accuracy.
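The binary-descriptor family that the discrete descriptor extends can be illustrated with a BRIEF-style sketch: each bit records one pixel-intensity comparison inside a patch, and patches are matched by Hamming distance on the resulting bit strings. The sampling pattern and patches below are invented; this is not the paper's descriptor.

```python
import random

def binary_descriptor(patch, pairs):
    """BRIEF-style binary descriptor: one bit per sampled pixel pair,
    set when the first pixel is darker than the second."""
    bits = 0
    for (y1, x1), (y2, x2) in pairs:
        bits = (bits << 1) | (1 if patch[y1][x1] < patch[y2][x2] else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distance means similar patches."""
    return bin(a ^ b).count("1")

SIZE = 8
rng = random.Random(42)
# 64 comparison pairs; the first is fixed so the demo is deterministic.
PAIRS = [((0, 0), (0, 1))] + [
    ((rng.randrange(SIZE), rng.randrange(SIZE)),
     (rng.randrange(SIZE), rng.randrange(SIZE)))
    for _ in range(63)]

patch_a = [[(y * SIZE + x) % 17 for x in range(SIZE)] for y in range(SIZE)]
patch_b = [[v + 1 for v in row] for row in patch_a]    # same pattern, brighter
patch_c = [[16 - v for v in row] for row in patch_a]   # inverted pattern

d_same = hamming(binary_descriptor(patch_a, PAIRS),
                 binary_descriptor(patch_b, PAIRS))
d_diff = hamming(binary_descriptor(patch_a, PAIRS),
                 binary_descriptor(patch_c, PAIRS))
```

Because the descriptor depends only on intensity orderings, the uniformly brightened patch matches perfectly, which is one reason such descriptors are robust to lighting changes and cheap enough for real-time multiple texture tracking.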
Abstract:
Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.
Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.
Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest neighbours algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83, while for unstructured reports precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones.
Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
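The layout-classification step (binary term-occurrence vectors plus a k nearest neighbours vote) can be sketched in a few lines. The toy "reports" below are invented and far simpler than real pathology text; RapidMiner's actual pipeline also applies stopword filtering and pruning, which are omitted here.

```python
from collections import Counter

def binary_vector(text, vocabulary):
    """Binary term-occurrence vector: 1 if the term appears, else 0."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def knn_predict(vec, training, k=3):
    """Classify by majority vote among the k nearest training vectors
    (Hamming distance on binary vectors)."""
    dists = sorted(
        (sum(a != b for a, b in zip(vec, tv)), label)
        for tv, label in training)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy training reports (invented for illustration).
reports = [
    ("tumour size: 22 mm er status: positive nodes: 3", "semi-structured"),
    ("morphology: ductal carcinoma size: 15 mm pr status: negative", "semi-structured"),
    ("grade: 2 size: 30 mm nodes: 0 er status: positive", "semi-structured"),
    ("the specimen shows an invasive carcinoma measuring about 22 mm", "unstructured"),
    ("sections reveal a tumour with positive hormone receptors", "unstructured"),
    ("there is no evidence of nodal involvement in the specimen", "unstructured"),
]
vocab = sorted({w for text, _ in reports for w in text.lower().split()})
training = [(binary_vector(text, vocab), label) for text, label in reports]

new_report = "er status: positive size: 18 mm nodes: 1"
label = knn_predict(binary_vector(new_report, vocab), training)
```

The field-name tokens ("size:", "status:") act as strong layout cues, which is intuitively why layout prediction reaches such high accuracy in the study.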
Abstract:
A simple but efficient voice activity detector based on the Hilbert transform and a dynamic threshold is presented for use in the pre-processing of audio signals. The algorithm that defines the dynamic threshold is a modification of a convex combination found in the literature. This scheme allows the detection of prosodic and silence segments in speech under non-ideal conditions such as spectrally overlapped noise. The present work shows preliminary results on a database built from political speeches. The tests were performed by adding artificial and natural noise to the audio signals, and several algorithms are compared. In future work, the results will be extrapolated to adaptive filtering of monophonic signals and the analysis of speech pathologies.
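The detector's two ingredients, a Hilbert-transform envelope and a threshold formed as a convex combination, can be sketched as follows. The DFT-based analytic signal and the fixed mixing weight `alpha` are simplifications for illustration; the paper's modified convex combination is not reproduced here.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via the DFT: zero the negative frequencies and
    double the positive ones (naive O(n^2) transform, fine for a demo)."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
         for k in range(n)]
    H = [0.0] * n
    H[0] = 1.0
    if n % 2 == 0:
        H[n // 2] = 1.0
        for k in range(1, n // 2):
            H[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            H[k] = 2.0
    Xa = [X[k] * H[k] for k in range(n)]
    return [sum(Xa[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def vad(x, frame=16, alpha=0.5):
    """Frame-wise voice activity decisions from the Hilbert envelope.
    The threshold is a convex combination of the minimum and maximum
    frame energies; alpha is an illustrative tuning weight."""
    env = [abs(z) for z in analytic_signal(x)]
    energies = [sum(env[i:i + frame]) / frame
                for i in range(0, len(env), frame)]
    thr = alpha * min(energies) + (1 - alpha) * max(energies)
    return [e > thr for e in energies], env

# Synthetic test signal: silence followed by a tone standing in for speech.
n = 128
signal = ([0.0] * (n // 2)
          + [math.sin(2 * math.pi * 8 * t / n) for t in range(n // 2)])
decisions, envelope = vad(signal)
```

A dynamic version would update the minimum and maximum estimates frame by frame rather than from the whole recording, which is what lets the threshold track non-stationary noise.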
Abstract:
Master's dissertation, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2015.
Abstract:
We propose an adaptive mesh refinement strategy based on exploiting a combination of a pre-processing mesh re-distribution algorithm employing a harmonic mapping technique, and standard (isotropic) mesh subdivision for discontinuous Galerkin approximations of advection-diffusion problems. Numerical experiments indicate that the resulting adaptive strategy can efficiently reduce the computed discretization error by clustering the nodes in the computational mesh where the analytical solution undergoes rapid variation.
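The node-clustering effect of mesh re-distribution can be illustrated in one dimension by equidistributing a monitor function, so that each cell carries an equal share of the monitor integral. This is a simple stand-in for the harmonic-mapping technique, not the authors' algorithm, and the layer solution below is invented for the demo.

```python
import math

def redistribute(nodes, monitor, n_samples=2000):
    """Equidistribute a 1-D mesh: keep the same number of nodes but place
    them so each cell carries an equal share of the monitor integral."""
    a, b = nodes[0], nodes[-1]
    xs = [a + (b - a) * i / n_samples for i in range(n_samples + 1)]
    # Cumulative trapezoidal integral of the monitor function.
    cum = [0.0]
    for i in range(n_samples):
        cum.append(cum[-1] + 0.5 * (monitor(xs[i]) + monitor(xs[i + 1]))
                   * (xs[i + 1] - xs[i]))
    total = cum[-1]
    n_cells = len(nodes) - 1
    new_nodes = [a]
    j = 0
    for c in range(1, n_cells):
        target = total * c / n_cells
        while cum[j + 1] < target:
            j += 1
        # Linear interpolation inside sample interval j.
        frac = (target - cum[j]) / (cum[j + 1] - cum[j])
        new_nodes.append(xs[j] + frac * (xs[j + 1] - xs[j]))
    new_nodes.append(b)
    return new_nodes

# Analytical solution with a sharp interior layer at x = 0.5:
# u(x) = tanh(50 (x - 0.5)), so |u'| peaks steeply at the layer.
def u_prime(x):
    return 50.0 / math.cosh(50.0 * (x - 0.5)) ** 2

mesh = [i / 20 for i in range(21)]                    # 21 uniform nodes
adapted = redistribute(mesh, lambda x: 1.0 + abs(u_prime(x)))
```

The adapted mesh concentrates most of its nodes inside the layer, which is the clustering behaviour the numerical experiments above exploit to reduce the discretization error.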
Abstract:
Due to the sensitive nature of patient data, the secondary use of electronic health records (EHR) is restricted in scientific research and product development. Such restrictions aim to preserve the privacy of the patients concerned by limiting the availability and variety of sensitive patient data. However, current limitations do not correspond to the actual needs of potential secondary users. In this thesis, the secondary use of Finnish and Swedish EHR data is explored with the purpose of enhancing the availability of such data for clinical research and product development. The EHR-related procedures and technologies involved are analysed to identify the issues limiting the secondary use of patient data. Successful secondary use of patient data increases the data's value. To explore the identified circumstances, a case study of potential secondary users and use intentions regarding EHR data was carried out in Finland and Sweden. Data collection for the case study was performed using semi-structured interviews. In total, 14 Finnish and Swedish experts representing scientific research, health management, and business were interviewed. The motivation for the interviews was to evaluate the protection of EHR data used for secondary purposes. The efficiency of the implemented procedures and technologies was analysed in terms of data availability and privacy preservation. The results of the case study show that the factors affecting EHR availability fall into three categories: management of patient data, preservation of patients' privacy, and potential secondary users. Identified issues regarding data management included laborious and inconsistent data request procedures and the role and effect of external service providers. Based on the study findings, two approaches enabling the secondary use of EHR data are identified: data alteration and a protected processing environment.
Data alteration increases the availability of relevant EHR data, while decreasing the value of such data. The protected processing approach restricts the number of potential users and use intentions while providing more valuable data content.
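The data-alteration approach can be sketched as a small record-transformation step: pseudonymise direct identifiers and generalise quasi-identifiers before release. The field names, salt and generalisation rules below are invented for illustration and are not a standard or the thesis's scheme; the loss of detail in the output is exactly the value trade-off described above.

```python
import hashlib

def alter_record(record, salt="research-project-7"):
    """Data-alteration sketch: pseudonymise the identifier and
    generalise quasi-identifiers before secondary use."""
    altered = {}
    # Replace the direct identifier with a salted one-way pseudonym.
    pid = record["patient_id"]
    altered["pseudonym"] = hashlib.sha256((salt + pid).encode()).hexdigest()[:12]
    # Generalise age to a 10-year band.
    age = record["age"]
    altered["age_band"] = f"{(age // 10) * 10}-{(age // 10) * 10 + 9}"
    # Keep only the leading characters of the postcode (region level).
    altered["region"] = record["postcode"][:3]
    # Clinical content is kept as-is; it is the data of research interest.
    altered["diagnosis"] = record["diagnosis"]
    return altered

ehr = {"patient_id": "19750812-1234", "age": 47,
       "postcode": "00100", "diagnosis": "I21.9"}
released = alter_record(ehr)
```

A protected processing environment takes the opposite trade-off: the record is left intact, but analysis happens inside a controlled enclave available only to vetted users.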