876 resultados para Feature extraction
Resumo:
Organic compounds in Australian coal seam gas produced water (CSG water) are poorly understood despite their environmental contamination potential. In this study, the presence of some organic substances is identified from government-held CSG water-quality data from the Bowen and Surat Basins, Queensland. These records revealed the presence of polycyclic aromatic hydrocarbons (PAHs) in 27% of samples of CSG water from the Walloon Coal Measures at concentrations <1 µg/L, and it is likely these compounds leached from in situ coals. PAHs identified from wells include naphthalene, phenanthrene, chrysene and dibenz[a,h]anthracene. In addition, the likelihood of coal-derived organic compounds leaching to groundwater is assessed by undertaking toxicity leaching experiments using coal rank and water chemistry as variables. These tests suggest higher molecular weight PAHs (including benzo[a]pyrene) leach from higher rank coals, whereas lower molecular weight PAHs leach at greater concentrations from lower rank coal. Some of the identified organic compounds have carcinogenic or health risk potential, but they are unlikely to be acutely toxic at the observed concentrations which are almost negligible (largely due to the hydrophobicity of such compounds). Hence, this study will be useful to practitioners assessing CSG water related environmental and health risk.
Resumo:
Representation of facial expressions using continuous dimensions has shown to be inherently more expressive and psychologically meaningful than using categorized emotions, and thus has gained increasing attention over recent years. Many sub-problems have arisen in this new field that remain only partially understood. A comparison of the regression performance of different texture and geometric features and investigation of the correlations between continuous dimensional axes and basic categorized emotions are two of these. This paper presents empirical studies addressing these problems, and it reports results from an evaluation of different methods for detecting spontaneous facial expressions within the arousal-valence dimensional space (AV). The evaluation compares the performance of texture features (SIFT, Gabor, LBP) against geometric features (FAP-based distances), and the fusion of the two. It also compares the prediction of arousal and valence, obtained using the best fusion method, to the corresponding ground truths. Spatial distribution, shift, similarity, and correlation are considered for the six basic categorized emotions (i.e. anger, disgust, fear, happiness, sadness, surprise). Using the NVIE database, results show that the fusion of LBP and FAP features performs the best. The results from the NVIE and FEEDTUM databases reveal novel findings about the correlations of arousal and valence dimensions to each of six basic emotion categories.
Resumo:
The strain data acquired from structural health monitoring (SHM) systems play an important role in the state monitoring and damage identification of bridges. Due to the environmental complexity of civil structures, a better understanding of the actual strain data will help filling the gap between theoretical/laboratorial results and practical application. In the study, the multi-scale features of strain response are first revealed after abundant investigations on the actual data from two typical long-span bridges. Results show that, strain types at the three typical temporal scales of 10^5, 10^2 and 10^0 sec are caused by temperature change, trains and heavy trucks, and have their respective cut-off frequency in the order of 10^-2, 10^-1 and 10^0 Hz. Multi-resolution analysis and wavelet shrinkage are applied for separating and extracting these strain types. During the above process, two methods for determining thresholds are introduced. The excellent ability of wavelet transform on simultaneously time-frequency analysis leads to an effective information extraction. After extraction, the strain data will be compressed at an attractive ratio. This research may contribute to a further understanding of actual strain data of long-span bridges; also, the proposed extracting methodology is applicable on actual SHM systems.
Resumo:
Text is the main method of communicating information in the digital age. Messages, blogs, news articles, reviews, and opinionated information abounds on the Internet. People commonly purchase products online and post their opinions about purchased items. This feedback is displayed publicly to assist others with their purchasing decisions, creating the need for a mechanism with which to extract and summarize useful information for enhancing the decision-making process. Our contribution is to improve the accuracy of extraction by combining different techniques from three major areas, named Data Mining, Natural Language Processing techniques and Ontologies. The proposed framework sequentially mines product’s aspects and users’ opinions, groups representative aspects by similarity, and generates an output summary. This paper focuses on the task of extracting product aspects and users’ opinions by extracting all possible aspects and opinions from reviews using natural language, ontology, and frequent “tag” sets. The proposed framework, when compared with an existing baseline model, yielded promising results.
Resumo:
AN ENGINEERING Workshop was held from 21 to 24 November 2006 in Veracruz, Mexico. Forty delegates from 12 countries attended the workshop on theory and practice of milling and diffusion extraction. This report provides a general overview of activities undertaken during that workshop which consisted of five technical sessions over two days with presentations and discussions plus two days of field and factory visits. Topics covered during the technical sessions included: power transmissions, cane preparation, diffusers, mills, and a comparison of milling and diffusion.
Resumo:
BACKGROUND: The use of salivary diagnostics is increasing because of its noninvasiveness, ease of sampling, and the relatively low risk of contracting infectious organisms. Saliva has been used as a biological fluid to identify and validate RNA targets in head and neck cancer patients. The goal of this study was to develop a robust, easy, and cost-effective method for isolating high yields of total RNA from saliva for downstream expression studies. METHODS: Oral whole saliva (200 mu L) was collected from healthy controls (n = 6) and from patients with head and neck cancer (n = 8). The method developed in-house used QIAzol lysis reagent (Qiagen) to extract RNA from saliva (both cell-free supernatants and cell pellets), followed by isopropyl alcohol precipitation, cDNA synthesis, and real-time PCR analyses for the genes encoding beta-actin ("housekeeping" gene) and histatin (a salivary gland-specific gene). RESULTS: The in-house QIAzol lysis reagent produced a high yield of total RNA (0.89 -7.1 mu g) from saliva (cell-free saliva and cell pellet) after DNase treatment. The ratio of the absorbance measured at 260 nm to that at 280 nm ranged from 1.6 to 1.9. The commercial kit produced a 10-fold lower RNA yield. Using our method with the QIAzol lysis reagent, we were also able to isolate RNA from archived saliva samples that had been stored without RNase inhibitors at -80 degrees C for >2 years. CONCLUSIONS: Our in-house QIAzol method is robust, is simple, provides RNA at high yields, and can be implemented to allow saliva transcriptomic studies to be translated into a clinical setting.
Resumo:
It is well established that the traditional taxonomy and nomenclature of Chironomidae relies on adult males whose usually characteristic genitalia provide evidence of species distinction. In the early days some names were based on female adults of variable distinctiveness – but females are difficult to identify (Ekrem et al. 2010) and many of these names remain dubious. In Russia especially, a system based on larval morphology grew in parallel to the conventional adult-based system. The systems became reconciled with the studies that underlay the production of the Holarctic generic keys to Chironomidae, commencing notably with the larval volume (Wiederholm, 1983). Ever since Thienemann’s pioneering studies, it has been evident that the pupa, notably the cast skins (exuviae) provide a wealth of features that can aid in identification (e.g. Wiederholm, 1986). Furthermore, the pupae can be readily associated with name-bearing adults when a pharate (‘cloaked’) adult stage is visible within the pupa. Association of larvae with the name-bearing later stages has been much more difficult, time-consuming and fraught with risk of failure. Yet it is identification of the larval stage that is needed by most applied researchers due to the value of the immature stages of the family in aquatic monitoring for water quality, although the pupal stage also has advocates (reviewed by Sinclair & Gresens, 2008). Few use the adult stage for such purposes as their provenance and association with the water body can be verified only by emergence trapping, and sampling of adults lies outside regular aquatic monitoring protocols.
Resumo:
This project is a step forward in the study of text mining where enhanced text representation with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several start-of-art benchmarking methods on real-life data-sets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles clustering documents with multiple feature spaces.
Resumo:
Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation of performance when small variation of the training data or a small variation of the parameters occur, is a major issue for machine learning models, but even more so in the active learning framework which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) Gaussian prior variance as one of the important CRFs parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features as a group of basic features play an important role in learning effective models across all iterations.
Resumo:
This paper proposes a highly reliable fault diagnosis approach for low-speed bearings. The proposed approach first extracts wavelet-based fault features that represent diverse symptoms of multiple low-speed bearing defects. The most useful fault features for diagnosis are then selected by utilizing a genetic algorithm (GA)-based kernel discriminative feature analysis cooperating with one-against-all multicategory support vector machines (OAA MCSVMs). Finally, each support vector machine is individually trained with its own feature vector that includes the most discriminative fault features, offering the highest classification performance. In this study, the effectiveness of the proposed GA-based kernel discriminative feature analysis and the classification ability of individually trained OAA MCSVMs are addressed in terms of average classification accuracy. In addition, the proposedGA- based kernel discriminative feature analysis is compared with four other state-of-the-art feature analysis approaches. Experimental results indicate that the proposed approach is superior to other feature analysis methodologies, yielding an average classification accuracy of 98.06% and 94.49% under rotational speeds of 50 revolutions-per-minute (RPM) and 80 RPM, respectively. Furthermore, the individually trained MCSVMs with their own optimal fault features based on the proposed GA-based kernel discriminative feature analysis outperform the standard OAA MCSVMs, showing an average accuracy of 98.66% and 95.01% for bearings under rotational speeds of 50 RPM and 80 RPM, respectively.
Resumo:
There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among them, CVTree method, feature frequency profiles method and dynamical language approach were used to investigate the whole-proteome phylogeny of large dsDNA viruses. Using the data set of large dsDNA viruses from Gao and Qi (BMC Evol. Biol. 2007), the phylogenetic results based on the CVTree method and the dynamical language approach were compared in Yu et al. (BMC Evol. Biol. 2010). In this paper, we first apply dynamical language approach to the data set of large dsDNA viruses from Wu et al. (Proc. Natl. Acad. Sci. USA 2009) and compare our phylogenetic results with those based on the feature frequency profiles method. Then we construct the whole-proteome phylogeny of the larger dataset combining the above two data sets. According to the report of The International Committee on the Taxonomy of Viruses (ICTV), the trees from our analyses are in good agreement to the latest classification of large dsDNA viruses.