417 results for Pre-processing


Relevance:

60.00%

Publisher:

Abstract:

This study aims to assess the accuracy of a Digital Elevation Model (DEM) generated using Toutin’s model. Toutin’s model was run using OrthoEngineSE of PCI Geomatics 10.3. Along-track stereo images from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor, with 15 m resolution, were used to produce a DEM of an area of low, near Mean Sea Level (MSL) elevation in Johor, Malaysia. Despite satisfactory pre-processing results, visual assessment of the DEM generated from Toutin’s model showed that it contained many outliers and incorrect values. The failure of Toutin’s model is most likely due to the inaccuracy and insufficiency of ASTER ephemeris data for low terrain, as well as the large water body in the stereo images.

Relevance:

60.00%

Publisher:

Abstract:

Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for the mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families: decision tree, probabilistic, neural network, instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, of the classification outcome. Records with a null entry in the injury description are removed. Misspellings are corrected by finding and replacing each misspelt word with a sound-alike word. Meaningful phrases are identified and kept intact, rather than having parts of them removed as stop words. Abbreviations appearing in many forms of entry are manually identified and normalised to a single form. Clustering is used to discriminate between non-frequent and frequent terms. This process reduces the number of text features dramatically, from about 28,000 to 5,000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterised as high-dimensional and sparse: few features are irrelevant, but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) are used to map the processed feature space to a lower-dimensional feature space, and classifiers are built on the reduced feature space. In our experiments, a set of tests is conducted to determine which classification method is best suited to medical text classification. Non-Negative Matrix Factorization combined with a Support Vector Machine achieves 93% precision, higher than all the tested traditional classifiers. We also find that TF/IDF weighting, which works well for long text classification, is inferior to binary weighting for short document classification. Another finding is that the top-n terms should be removed only in consultation with medical experts, as their removal affects classification performance.
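As a rough illustration of the pipeline this abstract reports, the sketch below chains binary term weighting (which the paper found better than TF/IDF for short documents), NNMF dimensionality reduction and a linear SVM using scikit-learn. The toy narratives, codes and the dimension value are hypothetical placeholders, not the paper's data or settings.

```python
# Minimal sketch: binary weighting -> NNMF -> linear SVM (scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["fell from ladder fractured left wrist",   # hypothetical injury narratives
        "burn to right hand from hot oil",
        "dog bite laceration lower leg"]
labels = [0, 1, 2]                                  # hypothetical injury codes

pipeline = make_pipeline(
    CountVectorizer(binary=True),                   # binary weighting for short texts
    NMF(n_components=2, init="nndsvda", max_iter=500),  # map to a lower-dimensional space
    LinearSVC(),                                    # linear SVM on the reduced features
)
pipeline.fit(docs, labels)
print(pipeline.predict(["fractured wrist after fall"]))
```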

Relevance:

60.00%

Publisher:

Abstract:

Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning techniques is promising, as it can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes matrix factorization approaches, along with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches, and the impact of parameter settings on the classification of a medical text dataset is discussed. With the right choice of dimension k, the Non-Negative Matrix Factorization-based method achieves a 10-fold cross-validation accuracy of 0.93.
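A minimal sketch of how the dimension k might be selected by 10-fold cross-validation, as the abstract describes. The random count matrix stands in for a real document-term matrix, and the candidate k values are illustrative, not the paper's settings.

```python
# Selecting the NMF dimension k by 10-fold cross-validation (sketch).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 50)).astype(float)  # stand-in term-count matrix
y = rng.integers(0, 4, size=100)                      # stand-in class labels

search = GridSearchCV(
    Pipeline([("nmf", NMF(init="nndsvda", max_iter=500)),
              ("svm", LinearSVC())]),
    param_grid={"nmf__n_components": [5, 10, 20]},    # candidate dimensions k
    cv=10,                                            # 10-fold CV, as in the abstract
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```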

Relevance:

60.00%

Publisher:

Abstract:

Increasingly large-scale applications are generating an unprecedented amount of data. However, the widening gap between computation and I/O capacity on High End Computing (HEC) machines makes I/O a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs considerable computing resource contention with the simulations, which severely degrades simulation performance on HEC platforms. Since different data processing strategies have different impacts on performance and cost, there is a consequent need for flexibility in the placement of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path, and propose a flexible data analytics (FlexAnalytics) framework to find the best strategy for reducing data movement in a given situation. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. The FlexAnalytics system enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as large-scale data transfer. Two use cases – scientific data compression and remote visualization – are studied to verify the performance of FlexAnalytics. Experimental results demonstrate that the FlexAnalytics framework increases data transfer bandwidth and improves end-to-end application transfer performance.
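The abstract does not give the framework's decision logic, but a placement choice of this kind can be sketched as a simple cost model: run an analytics/compression stage in situ only when the compute time it steals from the simulation is less than the transfer time it saves. Everything below (names, cost figures) is a hypothetical illustration, not the FlexAnalytics API.

```python
# Hypothetical placement decision along the I/O path (illustrative only).
def choose_placement(data_bytes, compress_ratio, net_bw_bps,
                     in_situ_cpu_seconds, cpu_budget_seconds):
    """Return 'in-situ' or 'offline' for one analytics/compression stage."""
    t_raw = data_bytes / net_bw_bps                          # transfer time, uncompressed
    t_compressed = (data_bytes / compress_ratio) / net_bw_bps
    saved = t_raw - t_compressed                             # transfer time avoided
    # Run in situ only if it pays for itself without exceeding the CPU budget.
    if in_situ_cpu_seconds <= cpu_budget_seconds and saved > in_situ_cpu_seconds:
        return "in-situ"
    return "offline"

print(choose_placement(data_bytes=1e9, compress_ratio=4.0,
                       net_bw_bps=1e8, in_situ_cpu_seconds=2.0,
                       cpu_budget_seconds=5.0))
```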

Relevance:

60.00%

Publisher:

Abstract:

Frog protection has become increasingly important due to the rapid decline of frog biodiversity, so it is valuable to develop new methods for studying it. In this paper, a novel feature extraction method based on perceptual wavelet packet decomposition is proposed for classifying frog calls in noisy environments. Pre-processing and syllable segmentation are first applied to each frog call. A spectral peak track is then extracted from each syllable where possible, and track duration, dominant frequency and oscillation rate are read directly from the track. Using the k-means clustering algorithm, the dominant frequencies of all frog species are clustered into k groups, which yields a frequency scale for wavelet packet decomposition. Wavelet packet decomposition is then applied to the frog calls using this adaptive frequency scale, and from the decomposition coefficients a new feature set, perceptual wavelet packet decomposition sub-band cepstral coefficients, is extracted. Finally, a k-nearest neighbour (k-NN) classifier is used for classification. The experimental results show that the proposed features achieve an average classification accuracy of 97.45%, outperforming syllable features (86.87%) and Mel-frequency cepstral coefficient (MFCC) features (90.80%).
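A rough sketch of the general shape of this pipeline, using a plain dyadic wavelet packet decomposition with per-sub-band log-energy features in place of the paper's perceptual frequency scale and sub-band cepstral coefficients. The wavelet, decomposition level and random data are illustrative assumptions.

```python
# Wavelet packet sub-band features + k-NN classification (simplified sketch).
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier

def wp_features(syllable, wavelet="db4", level=4):
    """Log-energy of each level-`level` wavelet packet sub-band, low to high freq."""
    wp = pywt.WaveletPacket(syllable, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    return np.log([np.sum(n.data ** 2) + 1e-12 for n in nodes])

rng = np.random.default_rng(0)
X = np.array([wp_features(rng.standard_normal(1024)) for _ in range(20)])
y = rng.integers(0, 2, size=20)                       # stand-in species labels

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # k-NN, as in the paper
print(clf.predict(X[:2]))
```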

Relevance:

40.00%

Publisher:

Abstract:

Pcf11p, an essential subunit of the yeast cleavage factor IA, is required for pre‐mRNA 3′ end processing, binds to the C‐terminal domain (CTD) of the largest subunit of RNA polymerase II (RNAP II) and is involved in transcription termination. We show that the conserved CTD interaction domain (CID) of Pcf11p is essential for cell viability. Interestingly, the CTD binding and 3′ end processing activities of Pcf11p can be functionally uncoupled from each other and provided by distinct Pcf11p fragments in trans. Impaired CTD binding did not affect the 3′ end processing activity of Pcf11p and a deficiency of Pcf11p in 3′ end processing did not prevent CTD binding. Transcriptional run‐on analysis with the CYC1 gene revealed that loss of cleavage activity did not correlate with a defect in transcription termination, whereas loss of CTD binding did. We conclude that Pcf11p is a bifunctional protein and that transcript cleavage is not an obligatory step prior to RNAP II termination.

Relevance:

30.00%

Publisher:

Abstract:

This paper develops and evaluates an enhanced corpus-based approach for semantic processing. Corpus-based models that build representations of words directly from text do not require pre-existing linguistic knowledge and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance on a number of semantic processing tasks when compared to a robust baseline model.
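The abstract does not specify the tensor encoding itself, but the general idea of capturing structural information with tensors can be sketched: binding a word's left and right contexts in an outer product makes the representation order-sensitive, which a flat co-occurrence vector is not. The toy below is illustrative only, not the paper's model.

```python
# Toy order-2 tensor encoding of word context (illustrative only).
import numpy as np

corpus = [["man", "bites", "dog"], ["dog", "bites", "man"]]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

tensors = {w: np.zeros((V, V)) for w in vocab}   # one matrix per target word
for sent in corpus:
    for i in range(1, len(sent) - 1):
        left, target, right = sent[i - 1], sent[i], sent[i + 1]
        tensors[target][idx[left], idx[right]] += 1   # bind left and right context

# "man bites dog" and "dog bites man" increment different cells,
# so word order survives in the representation of "bites".
print(tensors["bites"])
```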

Relevance:

30.00%

Publisher:

Abstract:

Pretreatment is an essential and expensive processing step in the manufacture of ethanol from lignocellulosic raw materials. Ionic liquids (ILs) are a new class of solvents with the potential to be used as pretreatment agents. This thesis investigates the attractive characteristics of ionic liquid pretreatment of lignocellulosics, such as thermal stability, dissolution properties, fractionation potential, cellulose decrystallisation capacity and saccharification impact. Dissolution of bagasse in 1-butyl-3-methylimidazolium chloride ([C4mim]Cl) at high temperatures (110 °C to 160 °C) is investigated as a pretreatment process. Material balances are reported and used, along with enzymatic saccharification data, to identify optimum pretreatment conditions (150 °C for 90 min). At these conditions, the dissolved and reprecipitated material is enriched in cellulose, has low crystallinity, and its cellulose component is efficiently hydrolysed (93 %, 3 h, 15 FPU). At pretreatment temperatures < 150 °C, the undissolved material has only slightly lower crystallinity than the starting material. At pretreatment temperatures ≥ 150 °C, the undissolved material has low crystallinity and, when combined with the dissolved material, shows a saccharification rate and extent similar to completely dissolved material (100 %, 3 h, 15 FPU). Complete dissolution is therefore not necessary to maximise saccharification efficiency at temperatures ≥ 150 °C. Fermentation of [C4mim]Cl-pretreated, enzyme-saccharified bagasse to ethanol is successfully conducted (85 % molar glucose-to-ethanol conversion efficiency). Compared with standard dilute acid pretreatment, the optimised [C4mim]Cl pretreatment achieves substantially higher ethanol yields (79 % cf. 52 %) in less than half the processing time (pretreatment, saccharification, fermentation). Fractionation of bagasse partially dissolved in [C4mim]Cl into a polysaccharide-rich and a lignin-rich fraction is attempted using aqueous biphasic systems (ABSs) and single-phase systems with preferential precipitation. ABSs of ILs and concentrated aqueous inorganic salt solutions are achievable (e.g. [C4mim]Cl with 200 g L-1 NaOH), although they exhibit a number of technical problems, including phase convergence (which increases with increasing biomass loading) and deprotonation of imidazolium ILs (5-8 mol %). Single-phase fractionation systems comprising lignin solvents / cellulose antisolvents, viz. NaOH (2 M) and acetone in water (1:1, volume basis), afford solids with, respectively, 40 % and 29 % less lignin by mass than water-precipitated solids. However, this delignification imparts little increase in the saccharification rates and extents of these solids. An alternative single-phase fractionation is achieved simply by using water as an antisolvent: regulating the water : IL ratio yields a solution that precipitates cellulose while keeping lignin in solution (0.5 water : IL mass ratio), in both [C4mim]Cl and 1-ethyl-3-methylimidazolium acetate ([C2mim]OAc). This water-based fractionation is applied in three IL pretreatments of bagasse ([C4mim]Cl, 1-ethyl-3-methylimidazolium chloride ([C2mim]Cl) and [C2mim]OAc). Lignin removal of 10 %, 50 % and 60 % by mass, respectively, is achieved, although only 0.3 %, 1.5 % and 11.7 % is recoverable even after ample water addition (3.5 water : IL mass ratio) and acidification (pH ≤ 1). In addition, the recovered lignin fraction contains 70 % hemicelluloses by mass.
The delignified, cellulose-rich bagasse recovered from these three ILs is exposed to enzymatic saccharification. The saccharification (24 h, 15 FPU) of the cellulose mass in the starting bagasse achieved by these pretreatments ranks as: [C2mim]OAc (83 %) >> [C2mim]Cl (53 %) = [C4mim]Cl (53 %). Mass balance determinations account for 97 % of the starting bagasse mass for the [C4mim]Cl pretreatment, 81 % for [C2mim]Cl and 79 % for [C2mim]OAc. For all three IL treatments, the remaining bagasse mass (not accounted for by the mass balance determinations) is mainly (more than half) lignin that is not recoverable from the liquid fraction. After pretreatment, 100 % of the mass of both ions of all three ILs is recovered in the liquid fraction. Compositional characteristics of [C2mim]OAc-treated solids, such as low lignin content, low acetyl group content and preservation of arabinosyl groups, are opposite to those of chloride-IL-treated solids. The former resemble the characteristics imparted by aqueous alkali pretreatment, while the latter resemble those of aqueous acid pretreatments. The 100 % mass recovery of cellulose in [C2mim]OAc, as opposed to 53 % in [C2mim]Cl, further demonstrates this, since cellulose glycosidic bonds are protected under alkali conditions. Decreasing the alkyl chain length of the imidazolium cation of these ILs imparts higher rates of dissolution and higher losses, increasing the severity of the treatment without changing the chemistry involved.

Relevance:

30.00%

Publisher:

Abstract:

Diagnostics of rolling element bearings involves a combination of different signal enhancement and analysis techniques. The most common procedure comprises a first step of order tracking and synchronous averaging, which removes undesired components synchronous with the shaft harmonics from the signal, followed by a final step of envelope analysis to obtain the squared envelope spectrum. This indicator has been studied thoroughly, and statistically based criteria have been derived for identifying damaged bearings. These statistical thresholds are valid only if all deterministic components have been removed from the signal. Unfortunately, in various industrial applications characterised by heterogeneous vibration sources, the first step of synchronous averaging is not sufficient to eliminate the deterministic components completely, and an additional pre-whitening step is needed before the envelope analysis. Different techniques have been proposed for this purpose, the most widespread being linear prediction filters and spectral kurtosis. Recently, a new pre-whitening technique based on cepstral analysis has been proposed: the so-called cepstrum pre-whitening. Owing to its low computational requirements and simplicity, it is a good candidate for the intermediate pre-whitening step in an automatic damage recognition algorithm. In this paper, the effectiveness of the new technique is tested on data measured on a full-scale industrial bearing test-rig capable of reproducing harsh operating conditions. A benchmark comparison with the traditional pre-whitening techniques is made as a final step in verifying the potential of cepstrum pre-whitening.
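Cepstrum pre-whitening itself is compact enough to sketch: forcing the whole magnitude spectrum to unity (equivalent to zeroing the real cepstrum) flattens every deterministic tone in one pass while keeping the phase, which carries the impulsive fault signature. A minimal sketch on a synthetic signal follows; all signal parameters are illustrative.

```python
# Cepstrum pre-whitening followed by the squared envelope spectrum (sketch).
import numpy as np
from scipy.signal import hilbert

fs = 10_000                                        # sampling rate (Hz), 1 s of data
t = np.arange(fs) / fs
x = 3 * np.sin(2 * np.pi * 50 * t)                 # deterministic shaft component
x += np.random.default_rng(0).standard_normal(fs)  # broadband noise
imp = np.zeros(fs)
imp[::93] = 5.0                                    # ~107.5 Hz impulse train (fault stand-in)
x += imp

X = np.fft.fft(x)
x_white = np.real(np.fft.ifft(X / (np.abs(X) + 1e-12)))  # cepstrum pre-whitening

env2 = np.abs(hilbert(x_white)) ** 2               # squared envelope
ses = np.abs(np.fft.rfft(env2 - env2.mean()))      # squared envelope spectrum (1 Hz bins)
print(ses.argmax(), "Hz")                          # expected near the ~107.5 Hz fault rate
```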

Relevance:

30.00%

Publisher:

Abstract:

Cleavage and polyadenylation factor (CPF) is a multi‐protein complex that functions in pre‐mRNA 3′‐end formation and in the RNA polymerase II (RNAP II) transcription cycle. Ydh1p/Cft2p is an essential component of CPF, but its precise role in 3′‐end processing has remained unclear. We found that mutations in YDH1 inhibited both the cleavage and the polyadenylation steps of the 3′‐end formation reaction in vitro. Recently, we demonstrated that an important function of CPF lies in the recognition of poly(A) site sequences, and RNA binding analyses suggested that Ydh1p/Cft2p interacts with the poly(A) site region. Here we show that mutant ydh1 strains are deficient in the recognition of the ACT1 cleavage site in vivo. The C‐terminal domain (CTD) of RNAP II plays a major role in coupling 3′‐end processing and transcription. We provide evidence that Ydh1p/Cft2p interacts with the CTD of RNAP II, with several other subunits of CPF, and with Pcf11p, a component of CF IA. We propose that Ydh1p/Cft2p contributes to the formation of important interaction surfaces that mediate the dynamic association of CPF with RNAP II, the recognition of poly(A) site sequences and the assembly of the polyadenylation machinery on the RNA substrate.

Relevance:

30.00%

Publisher:

Abstract:

This work explored the applicability of electrocoagulation (EC) with aluminium electrodes for removing contaminants that can scale and foul reverse osmosis membranes from a coal seam (CS) water sample predominantly comprising sodium chloride and sodium bicarbonate. In general, the removal efficiency of species responsible for scaling and fouling was enhanced by increasing the applied current density/voltage and the contact time (30–60 s) in the EC chamber. High removal efficiencies were achieved for species potentially responsible for scale formation in reverse osmosis units, such as calcium (100%), magnesium (87.9%), strontium (99.3%), barium (100%) and silicates (98.3%). Boron was more difficult to eliminate (13.3%), which was postulated to be due to the elevated solution pH. Similarly, fluoride removal (44%) was inhibited by the presence of hydroxide ions in the pH range 9–10. Analysis of the produced flocs suggested the dominant presence of relatively amorphous boehmite (AlOOH), although the formation of Al(OH)3 was not ruled out, as the drying process employed may have converted aluminium hydroxide to aluminium oxyhydroxide species. Evidence for adsorption of contaminants on floc surface sites was obtained from FTIR studies. The quantity of aluminium released during the electrocoagulation process was higher than the Faradaic amount, which suggested that the high salt concentrations in the coal seam water had chemically reacted with the aluminium electrodes.
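For reference, the Faradaic amount mentioned above follows from Faraday's law, m = ItM/(zF), with z = 3 for Al → Al3+ and M = 26.98 g/mol. A quick sketch; the current and time values are illustrative, not values from the study.

```python
# Theoretical Faradaic aluminium dose from the anode: m = I*t*M / (z*F).
F = 96485.0          # Faraday constant, C/mol
M_AL = 26.98         # molar mass of aluminium, g/mol
Z = 3                # electrons transferred per Al3+ ion

def faradaic_al_mass(current_a, time_s):
    """Theoretical mass of Al (g) dissolved electrochemically."""
    return current_a * time_s * M_AL / (Z * F)

print(f"{faradaic_al_mass(2.0, 60):.4f} g")   # e.g. 2 A for 60 s -> ~0.0112 g
```

A measured aluminium release above this theoretical mass points to additional chemical dissolution of the electrodes, which is the interpretation the abstract draws.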