930 resultados para Probabilistic latent semantic analysis (PLSA)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The level of information provided by ink evidence to the criminal and civil justice system is limited. The limitations arise from the weakness of the interpretative framework currently used, as proposed in the ASTM 1422-05 and 1789-04 on ink analysis. It is proposed to use the likelihood ratio from the Bayes theorem to interpret ink evidence. Unfortunately, when considering the analytical practices, as defined in the ASTM standards on ink analysis, it appears that current ink analytical practices do not allow for the level of reproducibility and accuracy required by a probabilistic framework. Such framework relies on the evaluation of the statistics of the ink characteristics using an ink reference database and the objective measurement of similarities between ink samples. A complete research programme was designed to (a) develop a standard methodology for analysing ink samples in a more reproducible way, (b) comparing automatically and objectively ink samples and (c) evaluate the proposed methodology in a forensic context. This report focuses on the first of the three stages. A calibration process, based on a standard dye ladder, is proposed to improve the reproducibility of ink analysis by HPTLC, when these inks are analysed at different times and/or by different examiners. The impact of this process on the variability between the repetitive analyses of ink samples in various conditions is studied. The results show significant improvements in the reproducibility of ink analysis compared to traditional calibration methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Prior genome-wide association studies (GWAS) of major depressive disorder (MDD) have met with limited success. We sought to increase statistical power to detect disease loci by conducting a GWAS mega-analysis for MDD. In the MDD discovery phase, we analyzed more than 1.2 million autosomal and X chromosome single-nucleotide polymorphisms (SNPs) in 18 759 independent and unrelated subjects of recent European ancestry (9240 MDD cases and 9519 controls). In the MDD replication phase, we evaluated 554 SNPs in independent samples (6783 MDD cases and 50 695 controls). We also conducted a cross-disorder meta-analysis using 819 autosomal SNPs with P<0.0001 for either MDD or the Psychiatric GWAS Consortium bipolar disorder (BIP) mega-analysis (9238 MDD cases/8039 controls and 6998 BIP cases/7775 controls). No SNPs achieved genome-wide significance in the MDD discovery phase, the MDD replication phase or in pre-planned secondary analyses (by sex, recurrent MDD, recurrent early-onset MDD, age of onset, pre-pubertal onset MDD or typical-like MDD from a latent class analyses of the MDD criteria). In the MDD-bipolar cross-disorder analysis, 15 SNPs exceeded genome-wide significance (P<5 × 10(-8)), and all were in a 248 kb interval of high LD on 3p21.1 (chr3:52 425 083-53 822 102, minimum P=5.9 × 10(-9) at rs2535629). Although this is the largest genome-wide analysis of MDD yet conducted, its high prevalence means that the sample is still underpowered to detect genetic effects typical for complex traits. Therefore, we were unable to identify robust and replicable findings. We discuss what this means for genetic research for MDD. The 3p21.1 MDD-BIP finding should be interpreted with caution as the most significant SNP did not replicate in MDD samples, and genotyping in independent samples will be needed to resolve its status.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sampling issues represent a topic of ongoing interest to the forensic science community essentially because of their crucial role in laboratory planning and working protocols. For this purpose, forensic literature described thorough (Bayesian) probabilistic sampling approaches. These are now widely implemented in practice. They allow, for instance, to obtain probability statements that parameters of interest (e.g., the proportion of a seizure of items that present particular features, such as an illegal substance) satisfy particular criteria (e.g., a threshold or an otherwise limiting value). Currently, there are many approaches that allow one to derive probability statements relating to a population proportion, but questions on how a forensic decision maker - typically a client of a forensic examination or a scientist acting on behalf of a client - ought actually to decide about a proportion or a sample size, remained largely unexplored to date. The research presented here intends to address methodology from decision theory that may help to cope usefully with the wide range of sampling issues typically encountered in forensic science applications. The procedures explored in this paper enable scientists to address a variety of concepts such as the (net) value of sample information, the (expected) value of sample information or the (expected) decision loss. All of these aspects directly relate to questions that are regularly encountered in casework. Besides probability theory and Bayesian inference, the proposed approach requires some additional elements from decision theory that may increase the efforts needed for practical implementation. In view of this challenge, the present paper will emphasise the merits of graphical modelling concepts, such as decision trees and Bayesian decision networks. These can support forensic scientists in applying the methodology in practice. How this may be achieved is illustrated with several examples. The graphical devices invoked here also serve the purpose of supporting the discussion of the similarities, differences and complementary aspects of existing Bayesian probabilistic sampling criteria and the decision-theoretic approach proposed throughout this paper.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Asylum seekers may have a higher rate of latenttuberculosis infection (LTBI) than resident populations in Westerncountries. LTBI can be detected by an Interferon Gamma ReleaseAssay (IGRA). Screening asylum seekers at highest risk for LTBI orfuture tuberculosis by IGRA could be considered. The aims of this pilotstudy were to assess the prevalence and the risk factors of LTBI amonga group of asylum seekers recently arrived in Switzerland.Methods: A prospective cross-sectional study was performed amongadult asylum seekers, staying in two migrant centers of the Vaud county,Switzerland, after a first screening for active tuberculosis at the border.The participants were offered IGRA screening using T-SPOT.TB andwere questioned about risk factors associated with LTBI. Migrants with apositive test had a chest radiograph and a medical examination. Thosewith active tuberculosis were excluded and were treated. The migrantswith LTBI received a preventive treatment, if indicated. The risk factorswere analyzed by univariate and multivariate logistical regression.Results: Among 788 migrants recently arrived, 639 were adults, 393agreed to be screened (61.50%) and 98 of them had a positive T-SPOT.TB (24.93%) of which 5 (5.1%) had an active tuberculosis (previouslynot detected at the border), and 2 had already been treated for activetuberculosis. In univariate analysis, the major risk factors associatedwith LTBI were country of origin and travel conditions. Compared withmigrants from Balkanic countries, migrants from Africa had an OR forLTBI of 3.68, migrants from Asia an OR of 4.3 and migrants fromFormer Soviet Union an OR of 4.5. Migrants who crossed severalborders before arriving in Switzerland had an OR of LTBI of 2.49compared with migrants who came directly from the home country.Age, cough and prior exposure to tuberculosis had a non-significantinfluence on the rate of test positivity. In multivariate analysis, thecombination of country of origin, travel conditions, age, cough andexposure to tuberculosis resulted in a score with optimal predictivevalue (Roc = 81%).Conclusions: Asylum seekers recently arrived in Vaud county had ahigh prevalence of LTBI and active tuberculosis. The major risk factorswere country of origin and travel conditions. Selecting for screening byIGRA the asylum seekers with the highest risk factors seems possible.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Part I of this series of articles focused on the construction of graphical probabilistic inference procedures, at various levels of detail, for assessing the evidential value of gunshot residue (GSR) particle evidence. The proposed models - in the form of Bayesian networks - address the issues of background presence of GSR particles, analytical performance (i.e., the efficiency of evidence searching and analysis procedures) and contamination. The use and practical implementation of Bayesian networks for case pre-assessment is also discussed. This paper, Part II, concentrates on Bayesian parameter estimation. This topic complements Part I in that it offers means for producing estimates useable for the numerical specification of the proposed probabilistic graphical models. Bayesian estimation procedures are given a primary focus of attention because they allow the scientist to combine (his/her) prior knowledge about the problem of interest with newly acquired experimental data. The present paper also considers further topics such as the sensitivity of the likelihood ratio due to uncertainty in parameters and the study of likelihood ratio values obtained for members of particular populations (e.g., individuals with or without exposure to GSR).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper analyzes and evaluates, in the context of Ontology learning, some techniques to identify and extract candidate terms to classes of a taxonomy. Besides, this work points out some inconsistencies that may be occurring in the preprocessing of text corpus, and proposes techniques to obtain good terms candidate to classes of a taxonomy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present a description of the role of definitional verbal patterns for the extraction of semantic relations. Several studies show that semantic relations can be extracted from analytic definitions contained in machine-readable dictionaries (MRDs). In addition, definitions found in specialised texts are a good starting point to search for different types of definitions where other semantic relations occur. The extraction of definitional knowledge from specialised corpora represents another interesting approach for the extraction of semantic relations. Here, we present a descriptive analysis of definitional verbal patterns in Spanish and the first steps towards the development of a system for the automatic extraction of definitional knowledge.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Deciding whether two fingerprint marks originate from the same source requires examination and comparison of their features. Many cognitive factors play a major role in such information processing. In this paper we examined the consistency (both between- and within-experts) in the analysis of latent marks, and whether the presence of a 'target' comparison print affects this analysis. Our findings showed that the context of a comparison print affected analysis of the latent mark, possibly influencing allocation of attention, visual search, and threshold for determining a 'signal'. We also found that even without the context of the comparison print there was still a lack of consistency in analysing latent marks. Not only was this reflected by inconsistency between different experts, but the same experts at different times were inconsistent with their own analysis. However, the characterization of these inconsistencies depends on the standard and definition of what constitutes inconsistent. Furthermore, these effects were not uniform; the lack of consistency varied across fingerprints and experts. We propose solutions to mediate variability in the analysis of friction ridge skin.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper a method for extracting semantic informationfrom online music discussion forums is proposed. The semantic relations are inferred from the co-occurrence of musical concepts in forum posts, using network analysis. The method starts by defining a dictionary of common music terms in an art music tradition. Then, it creates a complex network representation of the online forum by matchingsuch dictionary against the forum posts. Once the complex network is built we can study different network measures, including node relevance, node co-occurrence andterm relations via semantically connecting words. Moreover, we can detect communities of concepts inside the forum posts. The rationale is that some music terms are more related to each other than to other terms. All in all, this methodology allows us to obtain meaningful and relevantinformation from forum discussions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Using data from the Spanish household budget survey, we investigate life- cycle effects on several product expenditures. A latent-variable model approach is adopted to evaluate the impact of income on expenditures, controlling for the number of members in the family. Two latent factors underlying repeated measures of monetary and non-monetary income are used as explanatory variables in the expenditure regression equations, thus avoiding possible bias associated to the measurement error in income. The proposed methodology also takes care of the case in which product expenditures exhibit a pattern of infrequent purchases. Multiple-group analysis is used to assess the variation of key parameters of the model across various household life-cycle typologies. The analysis discloses significant life-cycle effects on the mean levels of expenditures; it also detects significant life-cycle effects on the way expenditures are affected by income and family size. Asymptotic robust methods are used to account for possible non-normality of the data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The sample dimension, types of variables, format used for measurement, and construction of instruments to collect valid and reliable data must be considered during the research process. In the social and health sciences, and more specifically in nursing, data-collection instruments are usually composed of latent variables or variables that cannot be directly observed. Such facts emphasize the importance of deciding how to measure study variables (using an ordinal scale or a Likert or Likert-type scale). Psychometric scales are examples of instruments that are affected by the type of variables that comprise them, which could cause problems with measurement and statistical analysis (parametric tests versus non-parametric tests). Hence, investigators using these variables must rely on suppositions based on simulation studies or recommendations based on scientific evidence in order to make the best decisions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Using data from the Spanish household budget survey, we investigate life-cycle effects on several product expenditures. A latent-variable model approach is adopted to evaluate the impact of income on expenditures, controlling for the number of members in the family. Two latent factors underlying repeated measures of monetary and non-monetary income are used as explanatory variables in the expenditure regression equations, thus avoiding possible bias associated to the measurement error in income. The proposed methodology also takes care of the case in which product expenditures exhibit a pattern of infrequent purchases. Multiple-group analysis is used to assess the variation of key parameters of the model across various household life-cycle typologies. The analysis discloses significant life-cycle effects on the mean levels of expenditures; it also detects significant life-cycle effects on the way expenditures are affected by income and family size. Asymptotic robust methods are used to account for possible non-normality of the data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Standard methods for the analysis of linear latent variable models oftenrely on the assumption that the vector of observed variables is normallydistributed. This normality assumption (NA) plays a crucial role inassessingoptimality of estimates, in computing standard errors, and in designinganasymptotic chi-square goodness-of-fit test. The asymptotic validity of NAinferences when the data deviates from normality has been calledasymptoticrobustness. In the present paper we extend previous work on asymptoticrobustnessto a general context of multi-sample analysis of linear latent variablemodels,with a latent component of the model allowed to be fixed across(hypothetical)sample replications, and with the asymptotic covariance matrix of thesamplemoments not necessarily finite. We will show that, under certainconditions,the matrix $\Gamma$ of asymptotic variances of the analyzed samplemomentscan be substituted by a matrix $\Omega$ that is a function only of thecross-product moments of the observed variables. The main advantage of thisis thatinferences based on $\Omega$ are readily available in standard softwareforcovariance structure analysis, and do not require to compute samplefourth-order moments. An illustration with simulated data in the context ofregressionwith errors in variables will be presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dorsal and ventral pathways for syntacto-semantic speech processing in the left hemisphere are represented in the dual-stream model of auditory processing. Here we report new findings for the right dorsal and ventral temporo-frontal pathway during processing of affectively intonated speech (i.e. affective prosody) in humans, together with several left hemispheric structural connections, partly resembling those for syntacto-semantic speech processing. We investigated white matter fiber connectivity between regions responding to affective prosody in several subregions of the bilateral superior temporal cortex (secondary and higher-level auditory cortex) and of the inferior frontal cortex (anterior and posterior inferior frontal gyrus). The fiber connectivity was investigated by using probabilistic diffusion tensor based tractography. The results underscore several so far underestimated auditory pathway connections, especially for the processing of affective prosody, such as a right ventral auditory pathway. The results also suggest the existence of a dual-stream processing in the right hemisphere, and a general predominance of the dorsal pathways in both hemispheres underlying the neural processing of affective prosody in an extended temporo-frontal network.