1000 resultados para EXTRACTION


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an overview of the QUT plant classification system submitted to LifeCLEF 2014. This system uses generic features extracted from a convolutional neural network previously used to perform general object classification. We examine the effectiveness of these features to perform plant classification when used in combination with an extremely randomised forest. Using this system, with minimal tuning, we obtained relatively good results with a score of 0:249 on the test set of LifeCLEF 2014.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Erythropoietin (EPO), a glycoprotein hormone of ∼34 kDa, is an important hematopoietic growth factor, mainly produced in the kidney and controls the number of red blood cells circulating in the blood stream. Sensitive and rapid recombinant human EPO (rHuEPO) detection tools that improve on the current laborious EPO detection techniques are in high demand for both clinical and sports industry. A sensitive aptamer-functionalized biosensor (aptasensor) has been developed by controlled growth of gold nanostructures (AuNS) over a gold substrate (pAu/AuNS). The aptasensor selectively binds to rHuEPO and, therefore, was used to extract and detect the drug from horse plasma by surface enhanced Raman spectroscopy (SERS). Due to the nanogap separation between the nanostructures, the high population and distribution of hot spots on the pAu/AuNS substrate surface, strong signal enhancement was acquired. By using wide area illumination (WAI) setting for the Raman detection, a low RSD of 4.92% over 150 SERS measurements was achieved. The significant reproducibility of the new biosensor addresses the serious problem of SERS signal inconsistency that hampers the use of the technique in the field. The WAI setting is compatible with handheld Raman devices. Therefore, the new aptasensor can be used for the selective extraction of rHuEPO from biological fluids and subsequently screened with handheld Raman spectrometer for SERS based in-field protein detection.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of incremental active learning framework across different selection criteria and datasets is determined. Materials and methods The comparative performance of an active learning framework and a fully supervised approach were investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional Random Fields as the supervised method, and least confidence and information density as two selection criteria for active learning framework were used. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Results The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the Random sampling baseline, the saving is at least doubled. Discussion Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction of annotation effort is always above random sampling and longest sequence baselines. Conclusion Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy allows to achieve, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the necessary high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrated that 1) amongst existing query strategies, the ones based on the classification model’s confidence are a better choice for clinical data as they perform equally well with a much lighter computational load, and 2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% less tokens and concepts to manually annotate than with state-of-the-art query strategies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An automated method for extracting brain volumes from three commonly acquired three-dimensional (3D) MR images (proton density, T1 weighted, and T2-weighted) of the human head is described. The procedure is divided into four levels: preprocessing, segmentation, scalp removal, and postprocessing. A user-provided reference point is the sole operator-dependent input required. The method's parameters were first optimized and then fixed and applied to 30 repeat data sets from 15 normal older adult subjects to investigate its reproducibility. Percent differences between total brain volumes (TBVs) for the subjects' repeated data sets ranged from .5% to 2.2%. We conclude that the method is both robust and reproducible and has the potential for wide application.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Currently we are facing an overburdening growth of the number of reliable information sources on the Internet. The quantity of information available to everyone via Internet is dramatically growing each year [15]. At the same time, temporal and cognitive resources of human users are not changing, therefore causing a phenomenon of information overload. World Wide Web is one of the main sources of information for decision makers (reference to my research). However our studies show that, at least in Poland, the decision makers see some important problems when turning to Internet as a source of decision information. One of the most common obstacles raised is distribution of relevant information among many sources, and therefore need to visit different Web sources in order to collect all important content and analyze it. A few research groups have recently turned to the problem of information extraction from the Web [13]. The most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (related to as data extraction or information extraction from the Web) and towards understanding natural language texts by means of fact, entity, and association recognition (related to as information extraction). Data extraction efforts show some interesting results, however proper integration of web databases is still beyond us. Information extraction field has been recently very successful in retrieving information from natural language texts, however it is still lacking abilities to understand more complex information, requiring use of common sense knowledge, discourse analysis and disambiguation techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative XPath expressions, although not widely used, should be used in preference to absolute XPath expressions in extracting content from human-created Web documents. Evaluation of robustness covers four thousand queries executed on several hundred webpages. We show that in referencing parts of real world dynamic HTML documents, relative XPath expressions are on average significantly more robust than absolute XPath ones.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A method for determination of tricyclazole in water using solid phase extraction and high performance liquid chromatography (HPLC) with UV detection at 230nm and a mobile phase of acetonitrile:water (20:80, v/v) was developed. A performance comparison between two types of solid phase sorbents, the C18 sorbent of Supelclean ENVI-18 cartridge and the styrene-divinyl benzene copolymer sorbent of Sep-Pak PS2-Plus cartridge was conducted. The Sep-Pak PS2-Plus cartridges were found more suitable for extracting tricyclazole from water samples than the Supelclean ENVI-18 cartridges. For this cartridge, both methanol and ethyl acetate produced good results. The method was validated with good linearity and with a limit of detection of 0.008gL-1 for a 500-fold concentration through the SPE procedure. The recoveries of the method were stable at 80% and the precision was from 1.1-6.0% within the range of fortified concentrations. The validated method was also applied to measure the concentrations of tricyclazole in real paddy water.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Semi-rigid molecular tweezers 1, 3 and 4 bind picric acid with more than tenfold increment in tetrachloromethane as compared to chloroform.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Frog protection has become increasingly essential due to the rapid decline of its biodiversity. Therefore, it is valuable to develop new methods for studying this biodiversity. In this paper, a novel feature extraction method is proposed based on perceptual wavelet packet decomposition for classifying frog calls in noisy environments. Pre-processing and syllable segmentation are first applied to the frog call. Then, a spectral peak track is extracted from each syllable if possible. Track duration, dominant frequency and oscillation rate are directly extracted from the track. With k-means clustering algorithm, the calculated dominant frequency of all frog species is clustered into k parts, which produce a frequency scale for wavelet packet decomposition. Based on the adaptive frequency scale, wavelet packet decomposition is applied to the frog calls. Using the wavelet packet decomposition coefficients, a new feature set named perceptual wavelet packet decomposition sub-band cepstral coefficients is extracted. Finally, a k-nearest neighbour (k-NN) classifier is used for the classification. The experiment results show that the proposed features can achieve an average classification accuracy of 97.45% which outperforms syllable features (86.87%) and Mel-frequency cepstral coefficients (MFCCs) feature (90.80%).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dendrocalamus strictus and Bambusa arundinacea are monocarpic, gregariously flowering species of bamboo, common in the deciduous forests of the State of Karnataka in India. Their populations have significantly declined, especially since the last flowering. This decline parelleis increasing incidence of grazing, fire and extraction in recent decades. Results of an experiment in which the intensities of grazing and fire were varied, indicate that while grazing significantly depresses the survival of seedlings and the recruitment of new eulms of bamboo clumps, fire appeared to enhance seedling survival, presumably by reducing competition of lass fire-resistant species. New shoots of bamboo are destroyed by insects and a variety of herbivorous mammals. In areas of intense herbivore pressure, a bamboo clump initiates the production of a much larger number of new culrm, but results in many fewer and shorter intact culms. Extraction renders the new shoots more susceptible to herbivore pressure by removal of the protective covering of branches at the base of a bamboo clump. Hence, regular and extensive extraction by the paper mills in conjuction with intense grazing pressure strongly depresses the addition of new culms to bamboo clumps. Regulation of grazing in the forest by domestic livestock along with maintenance of the cover at the base of the clumps by extracting the culms at a higher level should reduce the rate of decline of the bamboo stocks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that indicate that the use of word embeddings significantly improve the effectiveness of concept extraction models; however, we further determine the influence that the corpora used to generate such features have. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.