958 results for Probabilistic generalization
Abstract:
Early detection of breast cancer (BC) with mammography may cause overdiagnosis and overtreatment, detecting tumors that would have remained undiagnosed during a lifetime. The aims of this study were: first, to model invasive BC incidence trends in Catalonia (Spain) taking into account reproductive and screening data; and second, to quantify the extent of BC overdiagnosis. We modeled the incidence of invasive BC using a Poisson regression model. Explanatory variables were: age at diagnosis and cohort characteristics (completed fertility rate, percentage of women using mammography at age 50, and year of birth). This model was also used to estimate the background incidence in the absence of screening. We used a probabilistic model to estimate the expected BC incidence if women in the population used mammography as reported in health surveys. The difference between the observed and expected cumulative incidences provided an estimate of overdiagnosis. Incidence of invasive BC increased, especially in cohorts born from 1940 to 1955. The largest increase was observed in these cohorts between the ages of 50 and 65 years, where the final BC incidence rates more than doubled the initial ones. Dissemination of mammography was significantly associated with BC incidence and overdiagnosis. Our estimates of overdiagnosis ranged from 0.4% to 46.6% for women born around 1935 and 1950, respectively. Our results support the existence of overdiagnosis in Catalonia attributable to mammography usage, and the limited malignant potential of some tumors may play an important role. Women should be better informed about this risk. Research should be oriented towards personalized screening and risk assessment tools.
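A minimal sketch of a Poisson regression of this kind, with case counts modelled on age and cohort covariates and person-years as exposure; the synthetic data and variable names (fertility, screen50) are illustrative assumptions, not the study's registry data:

```python
# Sketch: Poisson regression of invasive BC case counts on age and cohort
# covariates, with person-years as exposure. All data below are synthetic
# placeholders, not the Catalan registry data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(40, 80, 200),            # age at diagnosis
    "fertility": rng.uniform(1.2, 2.8, 200),     # cohort completed fertility rate
    "screen50": rng.uniform(0.0, 0.9, 200),      # share using mammography at age 50
    "person_years": rng.uniform(1e3, 1e4, 200),  # exposure
})
# Synthetic counts loosely increasing with age and screening uptake.
rate = np.exp(-9 + 0.08 * df["age"] + 0.8 * df["screen50"])
df["cases"] = rng.poisson(rate * df["person_years"])

X = sm.add_constant(df[["age", "fertility", "screen50"]])
model = sm.GLM(df["cases"], X, family=sm.families.Poisson(),
               exposure=df["person_years"]).fit()
print(model.summary())
# Background incidence in the absence of screening: predict with screen50 = 0.
X0 = X.assign(screen50=0.0)
print(model.predict(X0, exposure=df["person_years"])[:5])
```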
Abstract:
The sociocultural changes that led to the genesis of the Romance languages widened the gap between oral and written patterns, which display different discursive and linguistic devices. In early documents, the discursive implicatures connecting propositions were generally not codified, so the reader had to supply the correct interpretation according to his own perception of real facts, something that can still be attested in current oral utterances. Once the Romance languages had undergone several levelling processes that concluded in the first standardizations, implicatures became explicatures and were syntactically codified by means of univocal new complex conjunctions. As a consequence of the emergence of these new subordination strategies, a freer distribution of the information conveyed by the utterances became possible. The success of complex structural patterns ran alongside the genesis of new narrative genres and the generalization of a learned rhetoric. Both facts are a spontaneous effect of new approaches to the act of reading. Ancient texts were written to be read aloud to a wide audience, whereas those printed by the end of the 15th century were conceived to be read quietly, in a low voice, by a private reader. The goal of this paper is twofold, since we will show that: a) the development of new complex conjunctions throughout the history of the Romance languages conforms to four structural patterns that range from parataxis to hypotaxis; b) this development is a reflex of the well-known grammaticalization path from discourse to syntax that implies the codification of discursive strategies (Givón 1979, Sperber and Wilson 1986, Carston 1988, Grice 1989, Bach 1994, Blakemore 2002, among others).
Abstract:
Background The 'database search problem', that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view of the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions. The method's graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants additional modes of interaction, concise representation, and coherent communication.
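The core of the debated result can be reproduced without graphical-model software. A minimal sketch under assumed toy numbers (population size, database size, random-match probability), comparing the posterior that the matching individual is the source in the single-suspect and database-search settings:

```python
# Sketch: posterior that the matching individual is the crime-stain source,
# with and without the database search. Toy numbers are assumptions.
N = 1_000_000   # size of the population of potential sources
n = 10_000      # size of the searched database
gamma = 1e-5    # random-match probability of the profile

# Single-suspect case: one person tested, found to match.
post_single = 1.0 / (1.0 + (N - 1) * gamma)

# Database-search case: the matching person is found by trawling the
# database, and the other n-1 database members are excluded.
post_search = 1.0 / (1.0 + (N - n) * gamma)

print(f"P(source | single match)    = {post_single:.4f}")
print(f"P(source | database search) = {post_search:.4f}")
# The exclusion of n-1 others slightly *strengthens* the case -- the
# reputedly counter-intuitive conclusion the paper's networks reproduce.
```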
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag-of-visual-words representation for each image, and subsequently training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag-of-visual-words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve classification performance superior to that of recent publications using a bag-of-visual-words representation, in all cases on the authors' own data sets and testing protocols. We also investigate the gain from adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
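A compact sketch of the pLSA-plus-classifier idea: EM estimation of P(z|d) and P(w|z) on a bag-of-visual-words count matrix, followed by a k-NN classifier on the topic vectors. The random counts and labels are placeholders for the paper's dense color SIFT data:

```python
# Sketch: pLSA by EM on a document (image) x word count matrix, then k-NN
# on the per-image topic distributions P(z|d).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
D, W, K = 100, 500, 8            # images, visual words, latent topics
C = rng.poisson(1.0, (D, W))     # D x W count matrix (placeholder)

p_z_d = rng.dirichlet(np.ones(K), size=D)   # P(z|d), D x K
p_w_z = rng.dirichlet(np.ones(W), size=K)   # P(w|z), K x W

for _ in range(50):              # EM iterations
    # E-step: responsibilities P(z|d,w), shape D x K x W
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]
    resp = joint / joint.sum(axis=1, keepdims=True).clip(1e-12)
    # M-step: expected counts, then renormalize both factors
    nz = C[:, None, :] * resp
    p_w_z = nz.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True).clip(1e-12)
    p_z_d = nz.sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True).clip(1e-12)

# Topic vectors P(z|d) now feed a discriminative classifier.
labels = rng.integers(0, 4, D)   # placeholder scene labels
knn = KNeighborsClassifier(n_neighbors=5).fit(p_z_d, labels)
print(knn.score(p_z_d, labels))
```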
Abstract:
Reinforcement learning (RL) is a very suitable technique for robot learning, as it can learn in unknown environments and in real-time computation. The main difficulties in adapting classic RL algorithms to robotic systems are the generalization problem and the correct observation of the Markovian state. This paper attempts to solve the generalization problem by proposing the semi-online neural Q-learning algorithm (SONQL). The algorithm uses the classic Q-learning technique with two modifications. First, a neural network (NN) approximates the Q-function, allowing the use of continuous states and actions. Second, a database of the most representative learning samples accelerates and stabilizes the convergence. The term semi-online refers to the fact that the algorithm uses not only the current but also past learning samples. Nevertheless, the algorithm is able to learn in real time while the robot is interacting with the environment. The paper shows simulated results with the "mountain-car" benchmark and also real results with an underwater robot in a target-following behavior.
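A hedged sketch of the two SONQL ingredients, not the authors' exact algorithm: a neural network approximating Q(s, a) (here scikit-learn's MLPRegressor as a stand-in) and a bounded database of learning samples replayed at every step:

```python
# Sketch: neural Q-function over continuous (state, action) inputs plus a
# bounded sample database replayed each step. Environment, dimensions and
# hyperparameters are placeholders.
import random
import numpy as np
from sklearn.neural_network import MLPRegressor

GAMMA, ACTIONS, BUFFER = 0.95, [-1.0, 0.0, 1.0], 500

q = MLPRegressor(hidden_layer_sizes=(32,))
q.partial_fit(np.zeros((1, 3)), [0.0])   # init: 2-D state + 1-D action input
db = []                                  # learning-sample database

def q_value(s, a):
    return q.predict(np.append(s, a).reshape(1, -1))[0]

def step(s, a, r, s_next):
    """Store the sample, then replay the database once (semi-online)."""
    db.append((s, a, r, s_next))
    if len(db) > BUFFER:
        db.pop(random.randrange(BUFFER))  # keep a bounded, varied set
    batch = random.sample(db, min(32, len(db)))
    X = np.array([np.append(s_, a_) for s_, a_, _, _ in batch])
    y = np.array([r_ + GAMMA * max(q_value(sn, b) for b in ACTIONS)
                  for _, _, r_, sn in batch])
    q.partial_fit(X, y)                   # one gradient pass per robot step

# Demo call with a toy 2-D state:
step(np.array([0.3, -0.1]), 1.0, 0.5, np.array([0.31, -0.08]))
```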
Abstract:
We investigated procedural learning in 18 children with basal ganglia (BG) lesions or dysfunctions of various aetiologies, using a visuo-motor learning test, the Serial Reaction Time (SRT) task, and a cognitive learning test, the Probabilistic Classification Learning (PCL) task. We compared patients with early onset (<1 year old, n=9), later onset (>6 years old, n=7) or progressive disorder (idiopathic dystonia, n=2). All patients showed deficits in both the visuo-motor and cognitive domains, except those with idiopathic dystonia, who displayed preserved classification learning skills. Impairments appear to be independent of the age of onset of the pathology. As far as we know, this study is the first to investigate motor and cognitive procedural learning in children with BG damage. Procedural impairments were documented whatever the aetiology of the BG damage/dysfunction and the time of pathology onset, thus supporting the claim of very early skill-learning development and a lack of plasticity in the case of damage.
Abstract:
PURPOSE: All kinds of blood manipulations aim to increase the total hemoglobin mass (tHb-mass). To establish tHb-mass as an effective screening parameter for detecting blood doping, knowledge of its normal variation over time is necessary. The aim of the present study, therefore, was to determine the intraindividual variance of tHb-mass in elite athletes during a training year encompassing off, training, and race seasons at sea level. METHODS: tHb-mass and hemoglobin concentration ([Hb]) were determined in 24 endurance athletes five times during a year and were compared with a control group (n = 6). An analysis of covariance was used to test the effects of training phases, age, gender, competition level, body mass, and training volume. Three error models, based on 1) a total percentage error of measurement, 2) the combination of a typical percentage error (TE) of analytical origin with an absolute SD of biological origin, and 3) between-subject and within-subject variance components as obtained by an analysis of variance, were tested. RESULTS: In addition to the expected influence of performance status, the main results were that the effects of training volume (P = 0.20) and training phases (P = 0.81) on tHb-mass were not significant. We found that within-subject variations mainly have an analytical origin (TE approximately 1.4%) and a very small SD (7.5 g) of biological origin. CONCLUSION: tHb-mass shows very low individual oscillation during a training year (<6%), and this oscillation is below the expected change in tHb-mass due to erythropoietin (EPO) application or blood infusion (approximately 10%). The high stability of tHb-mass over a period of 1 year suggests that it should be included in an athlete's biological passport and analyzed by recently developed probabilistic inference techniques that define subject-based reference ranges.
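A worked sketch of error model 2 above: the analytical typical percentage error and the biological absolute SD combine in quadrature into a subject-based reference range. The baseline value is an assumed example, not study data:

```python
# Sketch: combine a typical percentage error of analytical origin with an
# absolute SD of biological origin. The athlete's baseline is an assumed
# example; TE and biological SD are the values reported in the abstract.
import math

thb_baseline = 900.0   # g, example baseline tHb-mass (assumption)
te_analytical = 0.014  # ~1.4 % typical error (analytical)
sd_biological = 7.5    # g (biological)

sd_total = math.hypot(te_analytical * thb_baseline, sd_biological)
lo, hi = thb_baseline - 1.96 * sd_total, thb_baseline + 1.96 * sd_total
print(f"total SD = {sd_total:.1f} g; 95% subject-based range = {lo:.0f}-{hi:.0f} g")
# An observed jump of ~10 % (the EPO / transfusion scale) falls well
# outside this range, which is the screening rationale.
```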
Abstract:
BACKGROUND Drugs for inhalation are the cornerstone of therapy in obstructive lung disease. We have observed that up to 75% of patients do not perform a correct inhalation technique. The inability of patients to use their inhaler device correctly may be a direct consequence of insufficient or poor inhaler technique instruction. The objective of this study is to test the efficacy of two educational interventions to improve the inhalation techniques of patients with Chronic Obstructive Pulmonary Disease (COPD). METHODS This study uses both a multicenter patients' preference trial and a comprehensive cohort design with 495 COPD-diagnosed patients selected by a non-probabilistic sampling method from seven Primary Care Centers. The participants will be divided into two groups and five arms. The two groups are: 1) the patients' preference group, with two arms, and 2) the randomized group, with three arms. In the preference group, the two arms correspond to the two educational interventions (Intervention A and Intervention B) designed for this study. In the randomized group, the three arms comprise intervention A, intervention B, and a control arm. Intervention A is written information (a leaflet describing the correct inhalation techniques). Intervention B is written information about inhalation techniques plus training by an instructor. Every patient in each group will be visited six times during the year of the study at the health care center. DISCUSSION Our hypothesis is that the application of the two educational interventions in patients with COPD who are treated with inhaled therapy will increase the number of patients who perform a correct inhalation technique by at least 25%. We will evaluate the effectiveness of these interventions in improving patient inhalation technique, considering that it will be adequate and feasible within the context of clinical practice.
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel approach, alternative to geostatistics, to modelling the spatial distribution of petrophysical properties in complex reservoirs. The approach is based on semi-supervised learning, which handles both 'labelled' observed data and 'unlabelled' data, which have no measured value but describe prior knowledge and other relevant data in the form of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe the stochastic variability and non-uniqueness of spatial properties. At the same time, it is able to capture and preserve key spatial dependencies such as the connectivity of high-permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR, as a data-driven algorithm, is designed to integrate various kinds of conditioning information and learn dependencies from them. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in the available data. In this work, the stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify the uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history-matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, and smoothness/variability of the spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of models with different combinations of unknown parameters and discusses sensitivity issues.
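A minimal sketch of the history-matching loop: Metropolis-Hastings sampling over geomodel parameters against a Gaussian misfit to observed production. The toy forward model stands in for the semi-supervised SVR geomodel plus flow simulation:

```python
# Sketch: Metropolis-Hastings over two geomodel parameters (stand-ins for
# correlation size and smoothness) with a Gaussian misfit likelihood.
import numpy as np

rng = np.random.default_rng(1)

def forward_model(theta):
    """Toy stand-in for the SVR geomodel + flow simulation: maps
    (scale, decay) parameters to a 12-step 'production' curve."""
    t = np.arange(12)
    return theta[0] * np.exp(-theta[1] * t)

obs = forward_model(np.array([50.0, 0.3])) + rng.normal(0, 1.0, 12)

def log_post(theta, sigma=1.0):
    if np.any(theta <= 0):
        return -np.inf                          # flat prior on theta > 0
    misfit = obs - forward_model(theta)
    return -0.5 * np.sum((misfit / sigma) ** 2)

theta, lp = np.array([40.0, 0.5]), -np.inf
samples = []
for _ in range(20_000):
    prop = theta + rng.normal(0, [1.0, 0.02])   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
post = np.array(samples[5_000:])                # discard burn-in
print(post.mean(axis=0), post.std(axis=0))
# Forecasting forward_model over these posterior samples yields the
# probabilistic production envelopes described above.
```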
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in Alzheimer's Disease (AD) diagnosis. However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to be located within a predefined brain activation mask. In order to address the small sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and energy-based metrics were compared. RESULTS Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) a linear transformation of the PLS- or PCA-reduced data, ii) a feature reduction technique, and iii) a classifier (with Euclidean, Mahalanobis or energy-based methodology). The system was evaluated by means of k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when the NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be valid solutions for the presented problem. One of the advances is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also makes (in combination with NMSE and PLS) this rate's variation more stable. In addition, their generalization ability is another advance, since the experiments were performed on two image modalities (SPECT and PET).
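A sketch of the reduction-plus-classification stage using PLS scores and an SVM under k-fold cross-validation; random data stand in for the NMSE features, and LMNN itself (an implementation exists in the metric-learn package) is omitted for brevity:

```python
# Sketch: PLS dimension reduction fitted against the labels, then an SVM,
# scored with stratified k-fold cross-validation. Feature vectors and
# labels are random placeholders for the NMSE/SPECT/PET data.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

class PLSTransform(BaseEstimator, TransformerMixin):
    """Use PLS scores (fit against the class labels) as reduced features."""
    def __init__(self, n_components=10):
        self.n_components = n_components
    def fit(self, X, y):
        self.pls_ = PLSRegression(n_components=self.n_components).fit(X, y)
        return self
    def transform(self, X):
        return self.pls_.transform(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(97, 300))   # placeholder NMSE feature vectors
y = rng.integers(0, 2, 97)       # AD vs. normal labels (placeholder)

pipe = Pipeline([("pls", PLSTransform(10)), ("svm", SVC(kernel="linear"))])
scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=10))
print(scores.mean())
```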
Abstract:
Study of the likelihood and prevalence of COPD in patients over a year in a family medicine practice, during 2012 and the first two months of 2013. In a health center practice of about 1500 patients, the probabilistic evolution was studied every 6 months according to the theory of Laplace. We analyze COPD, its symptoms, etiology, clinical consultation and treatment in Family Medicine.
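The abstract does not specify which of Laplace's results is meant; read as Laplace's rule of succession, a minimal sketch with invented numbers would be:

```python
# Sketch under an assumption: 'theory of Laplace' read as the rule of
# succession, estimating the probability that the next patient seen has
# COPD after k cases among n patients. Numbers are invented.
k, n = 90, 1500               # COPD cases among patients seen (toy values)
p_next = (k + 1) / (n + 2)    # Laplace's rule of succession
print(f"P(next patient has COPD) ~ {p_next:.3f}")
```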
Abstract:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single and fixed number of contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoid such complications. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic, using Bayes' theorem, and provides a probability distribution over a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued, since quantitative information such as peak area and height data is not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given value for N, that is, the number of contributors, and the actual value taken by N. Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categorical assumptions about N.
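A toy sketch contrasting the two procedures at a single locus: the deterministic minimum (each contributor supplies at most two alleles) versus a Bayesian posterior over N; the allele frequencies and the prior over N are invented:

```python
# Sketch: deterministic minimum number of contributors vs. a Bayesian
# posterior over N for one locus. Frequencies and priors are toy values.
import math
from itertools import product

alleles_seen = {"12", "14", "16"}              # observed alleles at one locus
freqs = {"12": 0.10, "14": 0.20, "16": 0.05}   # assumed allele frequencies

# Deterministic: each contributor supplies at most 2 alleles.
n_min = math.ceil(len(alleles_seen) / 2)
print("minimum N =", n_min)

def p_evidence_given_n(n):
    """P(exactly the seen alleles | N = n unknown contributors), by
    enumerating all 2n allele draws over {seen alleles, 'other'}."""
    space = list(alleles_seen) + ["other"]
    pr = {**freqs, "other": 1.0 - sum(freqs.values())}
    total = 0.0
    for draw in product(space, repeat=2 * n):
        if set(draw) == alleles_seen:          # all seen, nothing else
            p = 1.0
            for a in draw:
                p *= pr[a]
            total += p
    return total

prior = {1: 0.1, 2: 0.5, 3: 0.3, 4: 0.1}       # toy prior over N
joint = {n: prior[n] * p_evidence_given_n(n) for n in prior}
z = sum(joint.values())
print({n: v / z for n, v in joint.items()})    # N = 1 gets probability 0
```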
Abstract:
A ubiquitous assessment of swimming velocity (the main metric of performance) is essential for the coach to provide tailored feedback to the trainee. We present a probabilistic framework for the data-driven estimation of swimming velocity at every cycle using a low-cost wearable inertial measurement unit (IMU). The statistical validation of the method on 15 swimmers shows that an average relative error of 0.1 ± 9.6% and a high correlation with the tethered reference system (r = 0.91) are achievable. In addition, a simple tool to analyze the influence of sacrum kinematics on performance is provided.
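A hedged sketch of the estimation idea, not the paper's exact model: per-cycle IMU features mapped to cycle velocity by a probabilistic regressor (scikit-learn's BayesianRidge as a stand-in), which also yields per-cycle predictive uncertainty:

```python
# Sketch: per-cycle sacrum-IMU features regressed onto cycle velocity with
# a probabilistic linear model. Features and velocities are synthetic toys.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))     # per-cycle IMU features (placeholder)
w = rng.normal(size=6)
v = X @ w * 0.05 + 1.4 + rng.normal(0, 0.05, 200)  # cycle velocity, m/s (toy)

model = BayesianRidge().fit(X, v)
v_hat, v_std = model.predict(X[:5], return_std=True)
for m, s in zip(v_hat, v_std):
    print(f"{m:.2f} +/- {1.96 * s:.2f} m/s")  # predictive uncertainty per cycle
```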
Abstract:
The genetic characterization of unbalanced mixed stains remains an important area where improvement is imperative. In fact, with current methods for DNA analysis (Polymerase Chain Reaction with the SGM Plus™ multiplex kit), it is generally not possible to obtain a conventional autosomal DNA profile of the minor contributor if the ratio between the two contributors in a mixture is smaller than 1:10. This is a consequence of the fact that the major contributor's profile 'masks' that of the minor contributor. Besides known remedies to this problem, such as Y-STR analysis, a new compound genetic marker consisting of a Deletion/Insertion Polymorphism (DIP) linked to a Short Tandem Repeat (STR) polymorphism has recently been developed and proposed in the literature [1]. The present paper reports on the derivation of an approach for the probabilistic evaluation of DIP-STR profiling results obtained from unbalanced DNA mixtures. The procedure is based on object-oriented Bayesian networks (OOBNs) and uses the likelihood ratio as an expression of the probative value. OOBNs are retained in this paper because they allow one to provide a clear description of the genotypic configuration observed for the mixed stain as well as for the various potential contributors (e.g., victim and suspect). These models also allow one to depict the assumed relevance relationships and perform the necessary probabilistic computations.
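In its simplest form, the likelihood-ratio logic reduces to a genotype-frequency computation; a toy sketch with invented DIP-STR allele frequencies (the paper's OOBNs handle far richer configurations):

```python
# Sketch: simplest likelihood ratio for a DIP-STR correspondence. Under Hp
# the suspect is the minor contributor (evidence probability 1); under Hd
# an unknown person from the population is. Frequencies are invented.
f_dipstr = {"L-12": 0.08, "S-14": 0.15}   # assumed DIP-STR allele frequencies

# Suspect heterozygous L-12 / S-14; Hardy-Weinberg genotype frequency:
p_genotype = 2 * f_dipstr["L-12"] * f_dipstr["S-14"]

lr = 1.0 / p_genotype     # P(E|Hp) = 1, P(E|Hd) = genotype frequency
print(f"LR = {lr:.0f}")   # probative value of the DIP-STR correspondence
```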
Abstract:
The term landscape and its applications are increasingly used by public administrations and other bodies as a tool for land management. Taking advantage of the large amount of data in GIS-compatible (Geographic Information Systems) databases available for Catalonia, a cartographic synthesis has been developed that identifies the Functional Landscapes (FL) of Catalonia, a concept referring to the physico-ecological behavior of the terrain, derived from suitably transformed and aggregated topographic and climatic variables. A semi-automatic, iterative unsupervised classification (clustering) method was used, which allows the creation of a hierarchical legend, or levels of generalization. The result is the Map of Functional Landscapes of Catalonia (Mapa de Paisatges Funcionals de Catalunya, MPFC), with a legend of 26 landscape categories and 5 levels of generalization at a spatial resolution of 180 m. In parallel, indirect validations of the resulting map were carried out on the basis of naturalist knowledge and existing cartography, as well as an uncertainty map (applying fuzzy logic), which provide information on the reliability of the classification. The Functional Landscapes obtained make it possible to relate zones of homogeneous topo-climatic conditions and to divide the territory into zones characterized environmentally rather than politically, with the intention of being useful for improving the management of natural resources and the planning of human actions.
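A minimal sketch of the unsupervised classification step: k-means over standardized topo-climatic variables per raster cell, with nested cluster counts giving the hierarchical legend levels; the inputs are random placeholders for the GIS layers:

```python
# Sketch: clustering raster cells described by transformed topographic and
# climatic variables into landscape classes at several generalization
# levels. The per-cell variables are random placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cells = rng.normal(size=(5000, 6))   # per-cell topo-climatic variables (toy)
X = StandardScaler().fit_transform(cells)

for k in (5, 12, 26):                # coarse-to-fine legend levels
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, np.bincount(labels)[:5])   # class sizes at this level
```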