924 results for Automatic Analysis of Multivariate Categorical Data Sets
Abstract:
Collecting and analyzing data is an important element in any field of human activity and research. In sports, too, the collection and analysis of statistical data is attracting growing interest. Typical use cases include improving technical and tactical aspects for team coaches, defining game strategies based on the opposing team's play, and evaluating player performance. Other advantages concern more precise and impartial refereeing decisions: a wrong call can change the outcome of an important match. Finally, such systems can provide better representations and graphic effects that make the game more engaging for the audience during the match. Nowadays it is possible to delegate this type of task to automatic software systems that use cameras or hardware sensors to collect images or data and process them. One of the most efficient ways to collect such data is to process video footage of the sporting event with machine-learning techniques applied to computer vision. As in other domains where computer vision is applied, the main tasks in sports are object detection, player tracking, and athlete pose estimation. The goal of the present thesis is to apply different CNN models to analyze volleyball matches. Starting from video frames of a volleyball match, we reconstruct a bird's-eye view of the playing court onto which all players are projected, also reporting, for each player, the type of action he or she is performing.
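The abstract does not state how the frame-to-court projection is computed; a common approach is a planar homography estimated from the court corners. The sketch below assumes that approach with OpenCV; the corner coordinates and the player foot points are hypothetical placeholders, not values from the thesis.

```python
import cv2
import numpy as np

# Hypothetical pixel coordinates of the four court corners in a video frame,
# and their positions (in metres) on the bird's-eye court plan (9 m x 18 m court).
frame_corners = np.float32([[312, 410], [1605, 402], [1820, 980], [95, 995]])
court_corners = np.float32([[0, 0], [9, 0], [9, 18], [0, 18]])

# Estimate the frame-to-court homography from the corner correspondences.
H, _ = cv2.findHomography(frame_corners, court_corners)

# Example detector output: one (x, y) foot point per detected player (invented values).
player_feet = np.float32([[[640, 700]], [[1200, 850]], [[900, 560]]])

# Project the detected players onto the bird's-eye court plan.
court_positions = cv2.perspectiveTransform(player_feet, H).reshape(-1, 2)
print(court_positions)  # (x, y) in metres on the 9 m x 18 m court
```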
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is by maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonically increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonically increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright (C) 2003 John Wiley & Sons, Ltd.
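For reference, a standard way of writing the mixture formulation described above is the following (notation assumed here for illustration, not quoted from the paper): the overall survival function is decomposed over the g failure types, the mixing probabilities follow a multinomial logistic model, and each component hazard follows a proportional hazards model whose baseline hazard is left completely unspecified.

```latex
% Mixture decomposition of the survival function over g failure types
S(t \mid \mathbf{x}) \;=\; \sum_{j=1}^{g} \pi_j(\mathbf{x})\, S_j(t \mid \mathbf{x}),
\qquad
\pi_j(\mathbf{x}) \;=\; \frac{\exp(\mathbf{w}_j^{\top}\mathbf{x})}
                             {\sum_{l=1}^{g}\exp(\mathbf{w}_l^{\top}\mathbf{x})}
% Logistic model for the probability of failure from cause j

% Proportional hazards model for the conditional hazard of cause j,
% with the component-baseline hazard \lambda_{0j}(t) left unspecified
\lambda_j(t \mid \mathbf{x}) \;=\; \lambda_{0j}(t)\,
      \exp\!\bigl(\boldsymbol{\beta}_j^{\top}\mathbf{x}\bigr)
```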
Abstract:
Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries and cause a serious disaster. This paper analyzes industrial accident data series from the perspective of dynamical systems. First, we process real-world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For early years, the data reveal double PL behavior, while for more recent time periods a single PL fits the experimental data better. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data, and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter that expresses relationships within the data. Both the classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.
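The generalized (fractional) entropy used in the paper is not reproduced here; as a minimal sketch of the classical part of the pipeline, the following assumes surrogate heavy-tailed fatality data and shows a symmetrised Kullback–Leibler comparison between time periods followed by an MDS embedding for visualisation.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.manifold import MDS

# Surrogate heavy-tailed samples of fatalities per accident, one row per time period.
rng = np.random.default_rng(0)
samples = rng.pareto(1.5, size=(6, 200))

# Empirical distributions on a common support (small constant avoids log(0)).
dists = np.array([np.histogram(s, bins=30, range=(0, 20), density=True)[0] + 1e-12
                  for s in samples])
dists /= dists.sum(axis=1, keepdims=True)

# Symmetrised Kullback-Leibler divergence between every pair of periods.
n = len(dists)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D[i, j] = 0.5 * (entropy(dists[i], dists[j]) + entropy(dists[j], dists[i]))

# Multidimensional scaling of the divergence matrix for visualisation.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
print(coords)
```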
Abstract:
Magdeburg, University, Faculty of Computer Science, Dissertation, 2014
Abstract:
Real-world objects are often endowed with features that violate Gestalt principles. In our experiment, we examined the neural correlates of binding under conflict conditions in terms of the binding-by-synchronization hypothesis. We presented an ambiguous stimulus ("diamond illusion") to 12 observers. The display consisted of four oblique gratings drifting within circular apertures. Its interpretation fluctuates between bound ("diamond") and unbound (component gratings) percepts. To model a situation in which Gestalt-driven analysis contradicts the perceptually explicit bound interpretation, we modified the original diamond (OD) stimulus by speeding up one grating. Using OD and modified diamond (MD) stimuli, we managed to dissociate the neural correlates of Gestalt-related (OD vs. MD) and perception-related (bound vs. unbound) factors. Their interaction was expected to reveal the neural networks synchronized specifically in the conflict situation. The synchronization topography of EEG was analyzed with the multivariate S-estimator technique. We found that good Gestalt (OD vs. MD) was associated with a higher posterior synchronization in the beta-gamma band. The effect of perception manifested itself as reciprocal modulations over the posterior and anterior regions (theta/beta-gamma bands). Specifically, higher posterior and lower anterior synchronization supported the bound percept, and the opposite was true for the unbound percept. The interaction showed that binding under challenging perceptual conditions is sustained by enhanced parietal synchronization. We argue that this distributed pattern of synchronization relates to the processes of multistage integration ranging from early grouping operations in the visual areas to maintaining representations in the frontal networks of sensory memory.
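The multivariate S-estimator mentioned above is commonly defined from the eigenvalue spread of the equal-time correlation matrix of the channels; a minimal sketch under that assumed definition (not code from the study) is shown below.

```python
import numpy as np

def s_estimator(x):
    """Multivariate synchronization S-estimator for a (channels x samples) EEG segment.

    Assumed definition: S = 1 + sum_i(l_i * log(l_i)) / log(N), where l_i are the
    eigenvalues of the equal-time correlation matrix normalised to sum to 1.
    S is near 0 for uncorrelated channels and approaches 1 for full synchronization.
    """
    n_channels = x.shape[0]
    c = np.corrcoef(x)                                    # equal-time correlation matrix
    eigvals = np.linalg.eigvalsh(c)
    eigvals = np.clip(eigvals, 1e-12, None) / n_channels  # normalise, guard log(0)
    return 1.0 + np.sum(eigvals * np.log(eigvals)) / np.log(n_channels)

# Hypothetical usage on a band-pass-filtered EEG segment (e.g. beta-gamma band).
segment = np.random.randn(16, 512)                        # 16 channels, 512 samples
print(s_estimator(segment))
```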
Abstract:
Despite the central role of quantitative PCR (qPCR) in the quantification of mRNA transcripts, most analyses of qPCR data are still delegated to the software that comes with the qPCR apparatus. This is especially true for the handling of the fluorescence baseline. This article shows that baseline estimation errors are directly reflected in the observed PCR efficiency values and are thus propagated exponentially in the estimated starting concentrations as well as 'fold-difference' results. Because of the unknown origin and kinetics of the baseline fluorescence, the fluorescence values monitored in the initial cycles of the PCR reaction cannot be used to estimate a useful baseline value. An algorithm that estimates the baseline by reconstructing the log-linear phase downward from the early plateau phase of the PCR reaction was developed and shown to lead to very reproducible PCR efficiency values. PCR efficiency values were determined per sample by fitting a regression line to a subset of data points in the log-linear phase. The variability, as well as the bias, in qPCR results was significantly reduced when the mean of these PCR efficiencies per amplicon was used in the calculation of an estimate of the starting concentration per sample.
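The per-sample efficiency estimate described above follows from the exponential model F_c = F_0 · E^c, so log10(F_c) is linear in the cycle number c with slope log10(E). A minimal sketch with synthetic fluorescence values (the article's baseline-reconstruction algorithm itself is not reproduced here):

```python
import numpy as np

# Synthetic baseline-corrected fluorescence for one sample: F_c = F_0 * E**c (no plateau shown).
cycles = np.arange(12, 22)                 # hypothetical log-linear window
f0_true, eff_true = 1e-3, 1.92
fluorescence = f0_true * eff_true ** cycles

# Fit a regression line to log10(fluorescence) over the log-linear phase.
slope, intercept = np.polyfit(cycles, np.log10(fluorescence), 1)

efficiency = 10 ** slope                   # estimated PCR efficiency per cycle
f0_estimate = 10 ** intercept              # estimate of the starting fluorescence/concentration
print(efficiency, f0_estimate)
```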
Abstract:
A version of Matheron's discrete Gaussian model is applied to cell composition data. The examples are for map patterns of felsic metavolcanics in two different areas. Q-Q plots of the model for cell values representing the proportion of a 10 km x 10 km cell area underlain by this rock type are approximately linear, and the line of best fit can be used to estimate the parameters of the model. It is also shown that felsic metavolcanics in the Abitibi area of the Canadian Shield can be modeled as a fractal.
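As a minimal illustration of the Q-Q-plot check (synthetic proportions; this is not Matheron's discrete Gaussian transformation, only the straight-line check and the slope/intercept read-off against normal quantiles):

```python
import numpy as np
from scipy import stats

# Synthetic cell values: fraction of each 10 km x 10 km cell underlain by the rock type.
rng = np.random.default_rng(1)
proportions = np.clip(rng.normal(0.15, 0.08, size=400), 1e-4, 1.0)

# Normal Q-Q plot of the cell values; an approximately straight pattern supports the model,
# and the slope and intercept of the best-fit line provide parameter estimates.
(osm, osr), (slope, intercept, r) = stats.probplot(proportions, dist="norm")
print(slope, intercept, r)
```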
Abstract:
Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their influence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control (Medas Islands) area. Since the chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them, from an inter-population viewpoint. Hypothesis testing on the adequate balance-coordinates allows us to confirm intuition-based hypotheses and some previous results. The main statistical goal was to test equal means of balance-coordinates for the two defined populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances.
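A minimal sketch of one inter-group balance comparison, assuming a hypothetical split of the composition into a "toxic" and an "essential" sub-group of elements (groupings and values invented for illustration; not the paper's actual balances):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def balance(parts_num, parts_den):
    """Isometric log-ratio balance between two groups of parts (rows = samples).

    b = sqrt(r*s/(r+s)) * ln( g(numerator parts) / g(denominator parts) ),
    where g() is the geometric mean and r, s are the numbers of parts in each group.
    """
    r, s = parts_num.shape[1], parts_den.shape[1]
    g_num = np.exp(np.mean(np.log(parts_num), axis=1))
    g_den = np.exp(np.mean(np.log(parts_den), axis=1))
    return np.sqrt(r * s / (r + s)) * np.log(g_num / g_den)

# Hypothetical bone-composition sub-matrices (toxic vs. essential metals) per population.
rng = np.random.default_rng(2)
polluted_toxic, polluted_ess = rng.lognormal(2.0, 0.3, (40, 3)), rng.lognormal(5.0, 0.2, (40, 4))
control_toxic, control_ess = rng.lognormal(1.5, 0.3, (35, 3)), rng.lognormal(5.1, 0.2, (35, 4))

b_polluted = balance(polluted_toxic, polluted_ess)
b_control = balance(control_toxic, control_ess)

# If normality is doubtful, compare the inter-group balance with a Mann-Whitney test.
print(mannwhitneyu(b_polluted, b_control))
```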
Abstract:
MOTIVATION: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. RESULTS: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high-sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
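As an illustration of the kind of filtering and testing described above (site names, counts and thresholds are hypothetical; this is not the paper's published pipeline): sites with an excess of clonal reads are discarded, and the remaining sites are tested for allelic imbalance with a binomial test against an expected 0.5 allele fraction.

```python
from scipy.stats import binomtest

# Hypothetical per-site allele-specific read counts at heterozygous SNPs:
# (site, reference-allele reads, alternative-allele reads, fraction of clonal reads).
sites = [
    ("chr1:1203456", 48, 12, 0.18),
    ("chr2:887721", 35, 30, 0.05),
    ("chr7:5512209", 90, 4, 0.62),   # suspiciously clonal site
]

MAX_CLONAL_FRACTION = 0.5            # hypothetical filter threshold

for name, ref, alt, clonal_frac in sites:
    if clonal_frac > MAX_CLONAL_FRACTION:
        print(f"{name}: filtered (excess clonal reads)")
        continue
    # Allele-specific binding test: under the null, reads come from either allele with p = 0.5.
    p = binomtest(ref, ref + alt, p=0.5).pvalue
    print(f"{name}: p = {p:.3g}")
```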
Abstract:
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
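The packages compared in the paper are R libraries; purely as a language-agnostic illustration of the idea of combining a variance-stabilizing transformation with a per-gene two-group test (this sketch is not limma, voom or SAMseq, and uses invented counts):

```python
import numpy as np
from scipy import stats

# Hypothetical count matrix: rows = genes, columns = samples (first 4 condition A, last 4 condition B).
rng = np.random.default_rng(3)
counts = rng.negative_binomial(n=10, p=0.3, size=(1000, 8))
group_a, group_b = slice(0, 4), slice(4, 8)

# Library-size normalisation followed by a simple log2 variance-stabilizing transformation.
cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
log_cpm = np.log2(cpm + 0.5)

# Ordinary per-gene t-test on the transformed values (limma would instead fit a linear
# model with empirical-Bayes moderated variances).
t_stat, p_values = stats.ttest_ind(log_cpm[:, group_a], log_cpm[:, group_b], axis=1)
print(p_values[:10])
```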
Abstract:
Whether for investigative or intelligence aims, crime analysts often face the need to analyse the spatiotemporal distribution of crimes or of traces left by suspects. This article presents a visualisation methodology supporting recurrent practical analytical tasks such as the detection of crime series or the analysis of traces left by digital devices like mobile phones or GPS devices. The proposed approach has led to the development of a dedicated tool that has proven its effectiveness in real inquiries and intelligence practice. It supports a more fluent visual analysis of the collected data and may provide critical clues to support police operations, as exemplified by the presented case studies.
Abstract:
The recently measured inclusive electron-proton cross section in the nucleon resonance region, obtained with the CLAS detector at the Thomas Jefferson Laboratory, has provided new data for the nucleon structure function F2 with previously unavailable precision. In this paper we propose a description of these experimental data based on a Regge-dual model for F2. The basic inputs in the model are nonlinear complex Regge trajectories producing both isobar resonances and a smooth background. The model is tested against the experimental data, and the Q2 dependence of the moments is calculated. The fitted model for the structure function (inclusive cross section) is a limiting case of the more general scattering amplitude equally applicable to deeply virtual Compton scattering. The connection between the two is discussed.
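The Q2-dependent moments referred to above are conventionally defined from the structure function as follows (standard Cornwall-Norton form, assumed here rather than quoted from the paper):

```latex
% n-th (Cornwall-Norton) moment of the structure function F_2
M_n(Q^2) \;=\; \int_0^1 x^{\,n-2}\, F_2(x, Q^2)\, \mathrm{d}x
```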
Abstract:
Termed the "silent epidemic", traumatic brain injury is the most debilitating outcome of injury, characterized by the irreversibility of its damage, long-term effects on quality of life, and healthcare costs. The latest data available from the Centers for Disease Control and Prevention (CDC) estimate that nationally 50,000 people with traumatic brain injury (TBI) die each year; three times as many are hospitalized and more than twenty times as many are released from emergency departments (EDs) (CDC, 2008). The purpose of this report is to describe the epidemiology of TBI in Iowa to help guide policy and programming. TBI is the result of an external force that transfers energy to the brain. Stroke is caused by a disruption of blood flow in the brain that leads to brain injury. Though stroke is recognized as the 3rd leading cause of death nationally and is an injury that affects the brain, it does not meet the definition of a traumatic brain injury and is not included in this report.
Abstract:
The incidence of hepatocellular carcinoma (HCC) is increasing in Western countries. Although several clinical factors have been identified, many individuals never develop HCC, suggesting a genetic susceptibility. However, to date, only a few single-nucleotide polymorphisms have been reproducibly shown to be linked to HCC onset. A variant (rs738409 C>G, encoding p.I148M) in the PNPLA3 gene is associated with liver damage in chronic liver diseases. Interestingly, several studies have reported that the minor rs738409[G] allele is over-represented in HCC cases in chronic hepatitis C (CHC) and alcoholic liver disease (ALD). However, a significant association with HCC related to CHC has not been consistently observed, and the strength of the association between rs738409 and HCC remains unclear. We performed a meta-analysis of individual participant data including 2,503 European patients with cirrhosis to assess the association between rs738409 and HCC, particularly in ALD and CHC. We found that rs738409 was strongly associated with overall HCC (odds ratio [OR] per G allele, additive model = 1.77; 95% confidence interval [CI]: 1.42-2.19; P = 2.78 × 10⁻⁷). This association was more pronounced in ALD (OR = 2.20; 95% CI: 1.80-2.67; P = 4.71 × 10⁻¹⁵) than in CHC patients (OR = 1.55; 95% CI: 1.03-2.34; P = 3.52 × 10⁻²). After adjustment for age, sex, and body mass index, the variant remained strongly associated with HCC. Conclusion: Overall, these results suggest that rs738409 exerts a marked influence on hepatocarcinogenesis in patients with cirrhosis of European descent and provide a strong argument for further mechanistic studies to better understand the role of PNPLA3 in HCC development.
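As an illustration of the "OR per G allele, additive model" quantity reported above, a minimal sketch of a covariate-adjusted logistic regression on invented per-patient data (this is not the meta-analysis itself):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical per-patient data: number of G alleles at rs738409 (0/1/2, additive coding),
# age, sex (1 = male), BMI, and HCC status (1 = case). All values invented for illustration.
rng = np.random.default_rng(4)
n = 500
g_alleles = rng.integers(0, 3, n)
age = rng.normal(60, 8, n)
sex = rng.integers(0, 2, n)
bmi = rng.normal(27, 4, n)
logit = -2.0 + 0.55 * g_alleles + 0.02 * (age - 60)
hcc = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Logistic regression adjusted for age, sex and BMI; the odds ratio per G allele is exp(coef).
X = sm.add_constant(np.column_stack([g_alleles, age, sex, bmi]))
fit = sm.Logit(hcc, X).fit(disp=0)
print(np.exp(fit.params[1]))   # OR per G allele under the additive model
```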