998 results for News detection


Relevance:

60.00% 60.00%

Publisher:

Abstract:

The purpose of this master's thesis is to investigate what is required to automatically detect similarity between news items. The news items are text-based and retrieved from different news sources. The aim is to identify, first, the news items that report the same thing and, second, the news items that are not exactly the same but are nevertheless related to each other. The thesis examines which algorithms perform this detection most effectively on both Finnish and English text, and compares existing algorithms. The goal is to select a combination of algorithms such that 90% of the compared news items are identified correctly. The study uses 2 different clustering algorithms and 3 different stemming algorithms. These algorithms are compared in terms of both their news detection effectiveness and their runtime performance. Porter's algorithm proved to be the best stemming algorithm for comparing both Finnish and English news. Of the clustering algorithms, the simpler one, based on various key figures, proved more effective.
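The stem-then-compare step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method: `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer (it is not Porter's algorithm), Jaccard overlap is an assumed similarity measure, and the 0.5 threshold is an arbitrary illustrative value.

```python
import re

# Assumed toy suffix list for illustration only; these are NOT Porter's rules.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip at most one common English suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_set(text):
    """Lowercase the text, tokenize on ASCII letters, return the set of stems."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(text_a, text_b, threshold=0.5):
    """Treat two news texts as 'the same story' above an assumed threshold."""
    return jaccard(stem_set(text_a), stem_set(text_b)) >= threshold
```

With this sketch, "Government raises taxes" and "Government raised taxes" reduce to the same stem set, while headlines with no shared stems score 0.0. A real system would swap in a proper stemmer for each language and tune the threshold on labelled pairs.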

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. It is a difficult task because it requires first the detection of events from news articles published in different time periods and then the construction of storylines by linking events into coherent news stories. Moreover, each storyline has different hierarchical structures which are dependent across epochs. Existing approaches often ignore the dependency of hierarchical structures in storyline generation. In this paper, we propose an unsupervised Bayesian model, called dynamic storyline detection model, to extract structured representations and evolution patterns of storylines. The proposed model is evaluated on a large scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Master's degree in Management and Evaluation of Health Technologies

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The R-package “compositions” is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, now containing some facilities to work with and represent ilr bases built from balances, and an elaborate subsystem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” subcomposition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros) focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
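For readers unfamiliar with the log-ratio coordinates mentioned above, the following is a small Python sketch of the centred log-ratio (clr) transform and of a single ilr balance between two groups of parts. It is an illustration only and does not reproduce the API of the R package; the function names and the index-list interface are assumptions made for this example, and all parts are assumed to be strictly positive (no zeros or missing values).

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def balance(parts, numer, denom):
    """ilr balance between two disjoint groups of parts.

    numer and denom are lists of indices into parts. The balance is
    sqrt(r*s/(r+s)) * ln(gmean(numer parts) / gmean(denom parts)),
    where r and s are the group sizes.
    """
    r, s = len(numer), len(denom)
    mean_log_numer = sum(math.log(parts[i]) for i in numer) / r
    mean_log_denom = sum(math.log(parts[i]) for i in denom) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_numer - mean_log_denom)
```

Both transforms depend only on ratios between parts, so rescaling a composition (for example `[2, 1, 1]` versus `[4, 2, 2]`) leaves the clr values and the balance unchanged.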

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This thesis investigates how macroeconomic news announcements affect jumps and cojumps in foreign exchange markets, especially under different business cycles. We use high-frequency data at 5-minute intervals on Euro/Dollar, Pound/Dollar and Yen/Dollar from Nov. 1, 2004 to Feb. 28, 2015. The jump detection method was proposed by Andersen et al. (2007c) and Lee & Mykland (2008) and then modified by Boudt et al. (2011a) for robustness. We then apply the two-regime smooth transition regression model of Teräsvirta (1994) to explore news effects under different business cycles. We find that scheduled news related to employment, real activity, forward expectations, monetary policy, current account, price and consumption influences forex jumps, but only FOMC Rate Decisions have consistent effects on cojumps. Speeches given by major central bank officials near a crisis also significantly affect jumps and cojumps. However, the impacts of some macroeconomic news are not the same under different economic states.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The R-package “compositions” is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, now containing some facilities to work with and represent ilr bases built from balances, and an elaborate subsystem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” subcomposition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros) focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily lives as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it has become necessary to develop methods and tools for analyzing this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based summarization framework can be applied to different types of summarization problems, while the learning-to-rank based summarization framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily lives as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it has become necessary to develop methods and tools for analyzing this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based summarization framework can be applied to different types of summarization problems, while the learning-to-rank based summarization framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

To assess binocular detection grating acuity using the LEA GRATINGS test to establish age-related norms in healthy infants during their first 3 months of life. In this prospective, longitudinal study of healthy infants with clear red reflex at birth, responses to gratings were measured at 1, 2, and 3 months of age using LEA gratings at a distance of 28 cm. The results were recorded as detection grating acuity values, which were arranged in frequency tables and converted to a one-octave scale for statistical analysis. For the repeated measurements, analysis of variance (ANOVA) was used to compare the detection grating acuity results between ages. A total of 133 infants were included. The binocular responses to gratings showed development toward higher mean values and spatial frequencies, ranging from 0.55 ± 0.70 cycles per degree (cpd), or 1.74 ± 0.21 logMAR, in month 1 to 3.11 ± 0.54 cpd, or 0.98 ± 0.16 logMAR, in month 3. Repeated ANOVA indicated differences among grating acuity values in the three age groups. The LEA GRATINGS test allowed assessment of detection grating acuity and its development in a cohort of healthy infants during their first 3 months of life.
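The paired cpd and logMAR means quoted in this abstract are consistent with the standard conversion in which 30 cycles per degree corresponds to a minimum angle of resolution of 1 arcminute (logMAR 0). The sketch below shows that conversion; the function names are chosen for this example, and the conversion applies to the mean values only, not to the ± spreads, which do not transform linearly.

```python
import math


def cpd_to_logmar(cpd):
    """Convert grating acuity in cycles per degree to logMAR.

    One cycle spans two bars, so 30 cpd means 1-arcminute bars (logMAR 0).
    """
    return math.log10(30.0 / cpd)


def logmar_to_cpd(logmar):
    """Inverse conversion: logMAR back to cycles per degree."""
    return 30.0 / (10.0 ** logmar)


month_1 = round(cpd_to_logmar(0.55), 2)  # 1.74, matching the abstract
month_3 = round(cpd_to_logmar(3.11), 2)  # 0.98, matching the abstract
```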

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A novel capillary electrophoresis method using capacitively coupled contactless conductivity detection is proposed for the determination of the biocide tetrakis(hydroxymethyl)phosphonium sulfate. The feasibility of the electrophoretic separation of this biocide was attributed to the formation of an anionic complex between the biocide and borate ions in the background electrolyte. Evidence of this complex formation was provided by ¹¹B NMR spectroscopy. A linear relationship (R² = 0.9990) between the peak area of the complex and the biocide concentration (50-900 μmol/L) was found. The limit of detection and limit of quantification were 15.0 and 50.1 μmol/L, respectively. The proposed method was applied to the determination of tetrakis(hydroxymethyl)phosphonium sulfate in commercial formulations, and the results were in good agreement with those obtained by the standard iodometric titration method. The method was also evaluated for the analysis of tap water and cooling water samples treated with the biocide. The results of the recovery tests at three concentration levels (300, 400, and 600 μmol/L) varied from 75 to 99%, with a relative standard deviation no higher than 9%.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Infections of the central nervous system (CNS) present a diagnostic problem for which an accurate laboratory diagnosis is essential. Invasive practices, such as cerebral biopsy, have been replaced by polymerase chain reaction (PCR) diagnosis using cerebrospinal fluid (CSF) as the reference method. Tests on DNA extracted from plasma are noninvasive, thus avoiding the collateral effects and patient risks associated with CSF collection. This study aimed to determine whether plasma can replace CSF in nested PCR analysis for the detection of CNS human herpesvirus (HHV) diseases by analysing the proportion of patients whose CSF nested PCR results were positive for CNS HHV who also had the same organism identified by plasma nested PCR. In this study, CSF DNA was used as the gold standard, and nested PCR was performed on both types of samples. Fifty-two patients with symptoms of nervous system infection underwent CSF and blood collection. For the eight HHVs, one positive DNA result (in plasma and/or CSF nested PCR) was considered an active HHV infection, whereas the occurrence of two or more HHVs in the same sample was considered a coinfection. HHV infections were detected in 27/52 (51.9%) of the CSF samples and in 32/52 (61.5%) of the plasma samples; the difference was not significant, so nested PCR can be performed on plasma instead of CSF. In conclusion, these findings suggest that plasma is a useful material for diagnosis in cases where a CSF puncture is difficult to perform.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The aim of this study was to develop a methodology using Raman hyperspectral imaging and chemometric methods for identification of pre- and post-blast explosive residues on banknote surfaces. The explosives studied were of military, commercial and propellant uses. After acquisition of the hyperspectral images, independent component analysis (ICA) was applied to extract the pure spectra and the distribution of the corresponding image constituents. The performance of the methodology was evaluated by the explained variance and the lack of fit of the models, by comparing the ICA recovered spectra with the reference spectra using correlation coefficients and by the presence of rotational ambiguity in the ICA solutions. The methodology was applied to forensic samples to solve an automated teller machine explosion case. Independent component analysis proved to be a suitable method of resolving curves, achieving performance equivalent to that of the multivariate curve resolution with alternating least squares (MCR-ALS) method. At low concentrations, MCR-ALS presents some limitations, as it did not provide the correct solution. The detection limit of the methodology presented in this study was 50 μg cm⁻².

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A rapid and low-cost method to determine Cr(VI) in soils based upon alkaline metal extraction at room temperature is proposed as a semi-quantitative procedure to be performed in the field. A color comparison with standards with Cr(VI) contents in the range of 10 to 150 mg kg⁻¹ was used throughout. For the different types of soils studied, more than 75% of the fortified soluble Cr(VI) was recovered at all spike levels tested for both the proposed and standard methods. Recoveries of 83 and 99% were obtained for the proposed and the standard methods, respectively, in the analysis of a heavily contaminated soil sample.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The fungus Metarhizium anisopliae is used on a large scale in Brazil as a microbial control agent against the sugar cane spittlebugs, Mahanarva posticata and M. fimbriolata (Hemiptera: Cercopidae). We applied strain E9 of M. anisopliae in a bioassay on soil, with field doses of conidia, to determine whether it can cause infection, disease and mortality in immature stages of Anastrepha fraterculus, the South American fruit fly. All the events were studied histologically and at the molecular level during the disease cycle, using a novel histological technique, light green staining, associated with light microscopy, and by PCR, using a specific DNA primer developed for M. anisopliae capable of identifying Brazilian strains like E9. The entire infection cycle, which starts with conidial adhesion to the cuticle of the host, followed by germination with or without the formation of an appressorium, penetration through the cuticle and colonisation, with development of a dimorphic phase, hyphal bodies in the hemocoel, and death of the host, lasted 96 hours under the bioassay conditions, similar to what occurs under field conditions. During the disease cycle, the propagules of the entomopathogenic fungus were detected by identifying DNA with the specific primer ITSMet (5' TCTGAATTTTTTATAAGTAT 3') with ITS4 (5' TCCTCCGCTTATTGATATGC 3') as a reverse primer. This simple methodology permits in situ studies of the infective process, contributing to our understanding of the host-pathogen relationship and allowing monitoring of the efficacy and survival of this entomopathogenic fungus in large-scale applications in the field. It also facilitates monitoring the environmental impact of M. anisopliae on non-target insects.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Previous studies indicated that patients with atherosclerosis are predominantly infected by human cytomegalovirus (HCMV), but rarely infected by type 1 Epstein-Barr virus (EBV-1). In this study, atheromas of 30 patients who underwent aortocoronary bypass surgery with coronary endarterectomy were tested for the presence of these two viruses. HCMV occurred in 93.3% of the samples and EBV-1 was present in 50% of them. Concurrent presence of both pathogens was detected in 43.3% of the samples.
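The percentages in this abstract can be turned back into sample counts with a short calculation. The sketch below assumes one atheroma sample per patient (30 samples in total) and that each percentage is a rounded fraction of 30; neither assumption is stated explicitly in the abstract.

```python
def count_from_percent(percent, n):
    """Recover an integer count from a rounded percentage of n samples."""
    return round(percent / 100 * n)


N_SAMPLES = 30  # assumed: one sample per patient

hcmv = count_from_percent(93.3, N_SAMPLES)  # samples positive for HCMV
ebv1 = count_from_percent(50.0, N_SAMPLES)  # samples positive for EBV-1
both = count_from_percent(43.3, N_SAMPLES)  # samples positive for both

# Inclusion-exclusion: samples carrying at least one of the two viruses.
either = hcmv + ebv1 - both
```

Under these assumptions the counts are 28, 15 and 13, so every one of the 30 samples would carry at least one of the two viruses.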