906 results for Text feature extraction


Relevance:

30.00%

Publisher:

Abstract:

Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality and need to address it in order to be effective. Examples of such data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning task. There is thus a clear need for adequate techniques for feature representation, reduction, and selection, to improve both classification accuracy and memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. Experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
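
A hedged sketch of this kind of two-stage pipeline, using generic scikit-learn components (KBinsDiscretizer, SelectKBest with mutual_info_classif) on synthetic data; it illustrates discretize-then-select in general, not the specific techniques proposed in the paper:

```python
# Generic discretize-then-select sketch, NOT the paper's algorithm.
from sklearn.datasets import make_classification
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Synthetic high-dimensional data: many features, few instances,
# and only a small informative subset.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)

# Step 1: unsupervised discretization of each feature into a few bins.
X_disc = KBinsDiscretizer(n_bins=5, encode="ordinal",
                          strategy="uniform").fit_transform(X)

# Step 2: keep the features most informative about the class label.
X_sel = SelectKBest(mutual_info_classif, k=50).fit_transform(X_disc, y)

# Step 3: a simple classifier on the reduced representation.
print(cross_val_score(MultinomialNB(), X_sel, y, cv=5).mean())
```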

Relevance:

30.00%

Publisher:

Abstract:

Arguably, the most difficult task in text classification is choosing an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g., nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approximate, and sometimes even outperform, previous state-of-the-art techniques, while being much simpler in the sense that they require no text pre-processing or feature engineering.
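
One widely used universal, information-theoretic dissimilarity is the normalized compression distance (NCD); the paper's exact measures may differ, so the following is only a generic sketch of nearest-neighbor classification in a dissimilarity space, over a hypothetical toy corpus:

```python
# Generic NCD-based nearest-neighbor sketch; no feature engineering needed.
import zlib

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts."""
    bx, by = x.encode(), y.encode()
    cx, cy = len(zlib.compress(bx, 9)), len(zlib.compress(by, 9))
    cxy = len(zlib.compress(bx + by, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical labeled corpus (stand-in for real training documents).
train = [("the movie was wonderful and deeply moving", "pos"),
         ("a delightful, touching and well acted film", "pos"),
         ("dull plot and terrible acting throughout", "neg"),
         ("a boring, poorly acted and tedious movie", "neg")]

def classify(text: str) -> str:
    # Map the text into dissimilarity space; return the nearest neighbor's label.
    return min((ncd(text, t), label) for t, label in train)[1]

print(classify("a wonderful, touching film"))
```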

Relevance:

30.00%

Publisher:

Abstract:

Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science

Relevance:

30.00%

Publisher:

Abstract:

Molecular characterization of Cryptosporidium spp. oocysts in clinical samples is useful for public health, since it allows the study of contamination sources as well as of transmission in different geographical regions. Although widely used in developed countries, in Brazil it is restricted to academic studies, mostly using commercial kits for the extraction of genomic DNA or relying on collaboration with external reference centers, which renders the method expensive and limited. This study proposes applying recently introduced modifications to the method, improving its feasibility at a lower cost. The method was efficient for clinical samples preserved at -20 °C for up to six years, and a low number of oocysts may be overcome by repeating the extraction.

Relevance:

30.00%

Publisher:

Abstract:

Asymptomatic Plasmodium infection is a new challenge for public health in the American region. The polymerase chain reaction (PCR) is the best method for diagnosing subpatent parasitemias. In endemic areas, blood collection is hampered by geographical distances and by deficient transport and storage conditions for the samples. Because DNA extraction from blood collected on filter paper is an efficient method for molecular studies in highly parasitemic individuals, we investigated whether the technique could be an alternative for Plasmodium diagnosis among asymptomatic and pauciparasitemic subjects. In this report, we compared three different methods (Chelex®-saponin, methanol, and TRIS-EDTA) of DNA extraction from blood collected on filter paper from asymptomatic Plasmodium-infected individuals. PCR assays for the detection of Plasmodium species showed the best results when the Chelex®-saponin method was used. However, even though its detection sensitivity was approximately 66% for P. falciparum and 31% for P. vivax, this method did not show the DNA-extraction effectiveness required for the molecular diagnosis of Plasmodium. The development of better methods for extracting DNA from blood collected on filter paper is important for diagnosing subpatent malarial infections in remote areas and would contribute to establishing the epidemiology of this form of infection.

Relevance:

30.00%

Publisher:

Abstract:

Currently, there are several methods to extract bacterial DNA based on different principles. However, the amount and quality of the DNA obtained by each of these methods are highly variable and microorganism-dependent, as illustrated by coagulase-negative staphylococci (CoNS), which have a thick cell wall that is difficult to lyse. This study was designed to compare the quality and amount of CoNS DNA extracted by four different techniques: two in-house protocols and two commercial kits. DNA amount and quality were determined by spectrophotometry, and the extracted DNA was also analyzed by agarose gel electrophoresis and by PCR. A total of 267 CoNS isolates were used in this study. The column method and thermal lysis showed the best results with regard to DNA quality (mean A260/280 ratio = 1.95) and average DNA concentration (), respectively. All four methods provided DNA suitable for PCR amplification, although with different yields. DNA quality is important because it allows the application of a large number of molecular biology techniques, as well as storage for longer periods. In this sense, the column-based extraction method presented the best results for CoNS.

Relevance:

30.00%

Publisher:

Abstract:

The extraction of relevant terms from texts is an extensively researched task in Text Mining. Relevant terms have been applied in areas such as Information Retrieval and document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus regarding their semantics and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract from texts what I have called generic concepts (all concepts) and to postpone the decision about relevance to downstream applications, according to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology explained in Part I of this thesis. In Part II, I show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, especially for multi-word expressions. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I present some work that, although not yet published, is briefly discussed in this document.
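
A rough illustration of the Tf-Idf-on-concepts idea; the documents and the concept list below are hypothetical stand-ins for the ConceptExtractor's output, not the thesis's actual pipeline:

```python
# Rank pre-extracted concepts as keywords by their Tf-Idf weight.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the president of the republic addressed the national assembly",
    "the republic held elections and the president won a second term",
    "the novel was easy to read and quick to finish",
]
# Hypothetical single- and multi-word concepts from a concept extractor.
concepts = ["president", "republic", "national assembly", "second term"]

# Restrict Tf-Idf to the concept vocabulary; ngram_range must cover the
# multi-word concepts so the analyzer can produce them.
vec = TfidfVectorizer(vocabulary=concepts, ngram_range=(1, 2))
tfidf = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

for i, row in enumerate(tfidf.toarray()):
    ranked = sorted(zip(terms, row), key=lambda p: -p[1])
    print(f"doc {i}:", [t for t, w in ranked if w > 0])
```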

Relevance:

30.00%

Publisher:

Abstract:

Introduction: Polymerase chain reaction (PCR) may offer an alternative diagnostic option when clinical signs and symptoms suggest visceral leishmaniasis (VL) but microscopic scanning and serological tests provide negative results. PCR using urine is sensitive enough to diagnose human VL; however, DNA quality is a crucial factor for successful amplification. Methods: A comparative performance evaluation of DNA extraction methods from the urine of patients with VL, using two commercially available extraction kits and two phenol-chloroform protocols, was conducted to determine which method produces the highest-quality DNA suitable for PCR amplification, as well as which is the most sensitive, fastest, and least expensive. All commercially available kits were able to shorten the duration of DNA extraction. Results: With regard to detection limits, both phenol:chloroform extraction and the QIAamp DNA Mini Kit provided good results (0.1 pg of DNA) for the extraction of DNA from a parasite smaller than Leishmania (Leishmania) infantum (<100 fg of DNA). However, among 11 urine samples from subjects with VL, better performance was achieved with the phenol:chloroform method (8/11) than with the QIAamp DNA Mini Kit (4/11), with a greater number of positive samples detected at a lower cost using PCR. Conclusion: Our results demonstrate that phenol:chloroform with an ethanol precipitation prior to extraction is the most efficient method in terms of yield and cost, using urine as a non-invasive source of DNA and providing an alternative diagnostic method at a low cost.

Relevance:

30.00%

Publisher:

Abstract:

The world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the sole domain of broadcast television. Video is used for different purposes, such as entertainment, information, education, and communication. The rapid growth of today's video archives, with sparsely available editorial data, makes retrieval a major problem. Humans see a video as a complex interplay of cognitive concepts, so there is a need to build a bridge between numeric values and semantic concepts, a connection that will facilitate video retrieval by humans. The critical aspect of this bridge is video annotation, which can be done manually or automatically. Manual annotation is tedious, subjective, and expensive; therefore, automatic annotation is being actively studied. In this thesis, we focus on the automatic annotation of multimedia content, namely the use of information-retrieval analysis techniques to automatically extract metadata from video in a videomail system, including the identification of text, people, actions, spaces, and objects (including animals and plants). This makes it possible to align multimedia content with the text presented in the email message and to create applications for semantic video database indexing and retrieval.

Relevance:

30.00%

Publisher:

Abstract:

INTRODUCTION: Molecular analyses are auxiliary tools for detecting Koch's bacilli in clinical specimens from patients with suspected tuberculosis (TB). However, there are still no efficient diagnostic tests that combine high sensitivity and specificity and yield rapid results in the detection of TB. This study evaluated single-tube nested polymerase chain reaction (STNPCR), a molecular diagnostic test with low risk of cross-contamination, for detecting Mycobacterium tuberculosis in clinical samples. METHODS: Mycobacterium tuberculosis deoxyribonucleic acid (DNA) was detected in blood and urine samples by STNPCR followed by agarose gel electrophoresis. In this system, reaction tubes were not opened between the two stages of PCR (simple and nested). RESULTS: STNPCR demonstrated good accuracy in clinical samples, with no cross-contamination between microtubes. Sensitivity in blood and urine, analyzed in parallel, was 35%-62% for pulmonary and 41%-72% for extrapulmonary TB. The specificity of STNPCR was 100% in most analyses, depending on the type of clinical sample (blood or urine) and the clinical form of the disease (pulmonary or extrapulmonary). CONCLUSIONS: STNPCR was effective in detecting TB, especially the extrapulmonary form, for which sensitivity was higher, and had the advantage of less invasive sample collection for patients from whom a spontaneous sputum sample was unavailable. With its low risk of cross-contamination, STNPCR can be used as an adjunct to conventional methods for diagnosing TB.

Relevance:

30.00%

Publisher:

Abstract:

INTRODUCTION: Before 2004, the occurrence of acute Chagas disease (ACD) by oral transmission associated with food was scarcely known or investigated. Originally sporadic and circumstantial, ACD occurrences have now become frequent in the Amazon region, with recent outbreaks spreading to several Brazilian states. These cases are associated with the consumption of açaí juice contaminated by the waste of reservoir animals or insect vectors infected with Trypanosoma cruzi in endemic areas. Although guidelines exist for processing the fruit to minimize contamination by microorganisms and parasites, açaí-based products must be assessed for quality, which demands appropriate methodologies. METHODS: Dilutions ranging from 5 to 1,000 T. cruzi CL Brener cells were mixed with 2 mL of açaí juice. Four methods for extracting T. cruzi DNA from the fruit were used, and the cetyltrimethylammonium bromide (CTAB) method was selected according to JRC, 2005. RESULTS: DNA extraction by the CTAB method yielded satisfactory results with regard to purity and concentration for use in PCR. Overall, the methods employed showed that not only extraction efficiency but also high amplification sensitivity was important. CONCLUSIONS: The method for T. cruzi detection in food is a powerful tool in the epidemiological investigation of outbreaks, as it turns epidemiological evidence into supporting data that confirm T. cruzi infection in foods. It also facilitates food quality control and the assessment of good manufacturing practices involving açaí-based products.

Relevance:

30.00%

Publisher:

Abstract:

Transcriptional Regulatory Networks (TRNs) are a powerful tool for representing the many interactions that occur within a cell. Recent studies have provided information to help researchers in the tasks of building and understanding these networks. One of the major sources of information for building TRNs is the biomedical literature. However, due to the rapidly increasing number of scientific papers, it is quite difficult to analyse the large volume of publications on this subject, which has heightened the importance of Biomedical Text Mining approaches to this task. Also, owing to the lack of adequate standards, inconsistencies in gene and protein names and identifiers become common as the number of databases increases. In this work, we developed an integrated approach for the reconstruction of TRNs that retrieves the relevant information from important biological databases and inserts it into a single repository, named KREN, and applied text mining techniques over this integrated repository to build TRNs. To this end, it was necessary to create a dictionary of names and synonyms associated with these entities, and to develop an approach that retrieves the abstracts of the related scientific papers stored in PubMed, in order to create a corpus of data about genes. Furthermore, these tasks were integrated into @Note, a software system that provides methods from the Biomedical Text Mining field, including algorithms for Named Entity Recognition (NER), extraction of relevant terms from publication abstracts, and extraction of relationships between biological entities (genes, proteins, and transcription factors). Finally, this tool was extended to allow the reconstruction of Transcriptional Regulatory Networks from the scientific literature.
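
A sketch of the PubMed-retrieval step using the public NCBI E-utilities API (esearch to find article IDs, efetch to pull the abstracts); this is an illustration under stated assumptions, not the @Note implementation, and the query is hypothetical:

```python
# Fetch PubMed abstracts for a query via NCBI E-utilities (public API).
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_abstracts(query: str, max_results: int = 5) -> str:
    # Step 1: esearch returns the PubMed IDs matching the query.
    r = requests.get(f"{EUTILS}/esearch.fcgi",
                     params={"db": "pubmed", "term": query,
                             "retmax": max_results, "retmode": "json"})
    ids = r.json()["esearchresult"]["idlist"]
    if not ids:
        return ""
    # Step 2: efetch returns the corresponding abstracts as plain text.
    r = requests.get(f"{EUTILS}/efetch.fcgi",
                     params={"db": "pubmed", "id": ",".join(ids),
                             "rettype": "abstract", "retmode": "text"})
    return r.text

# Hypothetical query: build a small corpus about a transcription factor.
print(pubmed_abstracts("lacI transcriptional regulation")[:500])
```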

Relevance:

30.00%

Publisher:

Abstract:

Dissertation for the integrated master's degree in Engenharia e Gestão de Sistemas de Informação (Information Systems Engineering and Management)

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVE: To analyze the results of laser-assisted extraction of permanent pacemaker and defibrillator leads. METHODS: We operated on 36 patients (mean age 54.2 years) and extracted 56 leads. The reasons for extraction were infection in 19 patients, elective replacement in 13, and other causes in 4. The mean time since lead placement was 7.5±5.5 years. Forty-seven leads were from pacemakers and 9 from defibrillators; 38 leads were in use, 14 had been abandoned in the pacemaker pocket, and 4 had been abandoned inside the venous system. RESULTS: We successfully extracted 54 leads, a 96.4% success rate, with complete extraction in 82.1%. The 2 unsuccessful cases were due to the presence of calcium along the trajectory of the lead. The mean duration of laser light application was 123.0±104.5 s, using 5,215.2±4,924.0 pulses over a total of 24.4±24.2 application cycles. Thirty-four leads were extracted from the myocardium with countertraction after complete progression of the laser sheath, 12 leads came loose during the progression of the laser sheath, and the remaining 10 were extracted with other maneuvers. One patient experienced cardiac tamponade after extraction of a defibrillator lead, requiring emergency open surgery. CONCLUSION: The excimer laser allowed lead extraction with a 96% success rate; it was not effective in the 2 patients who had calcification on the lead. One patient (2.8%) had a complication that required emergency cardiac surgery.

Relevance:

30.00%

Publisher:

Abstract:

University of Magdeburg, Faculty of Computer Science, dissertation, 2011