930 results for Data-driven
Abstract:
The synthetic control (SC) method has recently been proposed as an alternative way to estimate treatment effects in comparative case studies. Abadie et al. [2010] and Abadie et al. [2015] argue that one advantage of the SC method is that it imposes a data-driven process for selecting the comparison units, providing more transparency and leaving less discretionary power to the researcher. However, an important limitation of the SC method is that it does not provide clear guidance on the choice of predictor variables used to estimate the SC weights. We show that this lack of specific guidance gives the researcher significant opportunities to search for specifications with statistically significant results, undermining one of the main advantages of the method. Considering six alternative specifications commonly used in SC applications, we calculate in Monte Carlo simulations the probability of finding a statistically significant result at the 5% level in at least one specification. We find that this probability can be as high as 13% (23% for a 10% significance test) when there are 12 pre-intervention periods, and that it decays slowly with the number of pre-intervention periods. With 230 pre-intervention periods, this probability is still around 10% (18% for a 10% significance test). We show that the specification that uses the average pre-treatment outcome values to estimate the weights performed particularly badly in our simulations. However, the specification-searching problem remains relevant even when we do not consider this specification. We also show that this specification-searching problem is relevant in simulations with real datasets looking at placebo interventions in the Current Population Survey (CPS). To mitigate this problem, we propose a criterion for selecting among different SC specifications based on the prediction error of each specification in placebo estimations.
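As a rough illustration of why searching over several specifications inflates the false-positive rate, the sketch below estimates the probability that at least one of six tests rejects under the null. It is not the simulation design of the abstract: it assumes two-sided z-tests whose statistics share a pairwise correlation `rho`, and the function name and parameters are hypothetical.

```python
import random


def prob_any_rejection(n_specs=6, rho=0.5, crit=1.96, n_sims=20000, seed=0):
    """Estimate P(at least one of n_specs two-sided z-tests rejects) under the null.

    Assumption (not from the abstract): each specification's statistic is
    sqrt(rho) * common + sqrt(1 - rho) * idiosyncratic, so that rho is the
    pairwise correlation between the test statistics.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        common = rng.gauss(0.0, 1.0)
        for _ in range(n_specs):
            z = rho ** 0.5 * common + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            if abs(z) > crit:
                hits += 1  # one rejection is enough for a "significant" finding
                break
    return hits / n_sims


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 1.0):
        print(f"rho={rho:.1f}  P(any rejection) ~ {prob_any_rejection(rho=rho):.3f}")
```

With perfectly correlated statistics the rate stays near the nominal 5%, while fully independent statistics push it towards 1 - 0.95^6, roughly 26%. Inflated rates such as those reported in the abstract would fall between these two extremes.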
Abstract:
We present a measurement of the semileptonic mixing asymmetry for $B^0$ mesons, $a^d_{sl}$, using two independent decay channels: $B^0 \to \mu^+ D^- X$, with $D^- \to K^+ \pi^- \pi^-$; and $B^0 \to \mu^+ D^{*-} X$, with $D^{*-} \to \bar{D}^0 \pi^-$, $\bar{D}^0 \to K^+ \pi^-$ (and charge conjugate processes). We use a data sample corresponding to 10.4 fb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV, collected with the D0 experiment at the Fermilab Tevatron collider. We extract the charge asymmetries in these two channels as a function of the visible proper decay length of the $B^0$ meson, correct for detector-related asymmetries using data-driven methods, and account for dilution from charge-symmetric processes using Monte Carlo simulation. The final measurement combines four signal visible proper decay length regions for each channel, yielding $a^d_{sl} = [0.68 \pm 0.45\,(\text{stat}) \pm 0.14\,(\text{syst})]\%$. This is the single most precise measurement of this parameter, with uncertainties smaller than the current world average of B factory measurements. © 2012 American Physical Society.
Abstract:
Speech processing has become a technology increasingly based on the automatic modeling of vast amounts of data. The success of research in this area is therefore directly tied to the existence of public-domain corpora and other specific resources, such as a phonetic dictionary. In Brazil, unlike the situation for English, for example, there is currently no public-domain Automatic Speech Recognition (ASR) system for Brazilian Portuguese with large-vocabulary support. Against this background, the main goal of this work is to discuss efforts within the FalaBrasil initiative [1], created by the Signal Processing Laboratory (LaPS) at UFPA, presenting research and software in the area of ASR for Brazilian Portuguese. More specifically, this work discusses the implementation of a large-vocabulary speech recognition system for Brazilian Portuguese, using the HTK toolkit, which is based on hidden Markov models (HMMs), and the creation of a grapheme-to-phone conversion module using machine learning techniques.
Abstract:
The Common Reflection Surface (CRS) stack method was originally introduced as a data-driven method for simulating zero-offset sections from 2-D prestack seismic reflection data acquired along a straight acquisition line. The method is based on a second-order hyperbolic traveltime approximation parameterized by three kinematic wavefield attributes. In land data, topographic effects play an important role in seismic data processing and imaging, and this feature has therefore recently been taken into account by the CRS method. In this work we present a review of the CRS traveltime approximations that account for smooth and rugged topography. In addition, we also review the Multifocus traveltime approximation for the case of rugged topography. Finally, by means of a simple synthetic example, we provide the first comparisons between the different traveltime expressions.
Abstract:
A measurement of differential cross sections for the production of a pair of isolated photons in proton-proton collisions at $\sqrt{s} = 7$ TeV is presented. The data sample corresponds to an integrated luminosity of 5.0 fb$^{-1}$ collected with the CMS detector. A data-driven isolation template method is used to extract the prompt diphoton yield. The measured cross section for two isolated photons, with transverse energy above 40 and 25 GeV respectively, in the pseudorapidity range $|\eta| < 2.5$, $|\eta| \notin [1.44, 1.57]$, and with an angular separation $\Delta R > 0.45$, is $17.2 \pm 0.2\,(\text{stat}) \pm 1.9\,(\text{syst}) \pm 0.4\,(\text{lumi})$ pb. Differential cross sections are measured as a function of the diphoton invariant mass, the diphoton transverse momentum, the azimuthal angle difference between the two photons, and the cosine of the polar angle in the Collins-Soper reference frame of the diphoton system. The results are compared to theoretical predictions at leading, next-to-leading, and next-to-next-to-leading order in quantum chromodynamics.
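The kinematic selection quoted in the abstract can be sketched as a small filter. This is only an illustration, not the CMS analysis code: it assumes the usual definition $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$, treats the excluded pseudorapidity interval as closed, and uses hypothetical function names and a hypothetical `(et_gev, eta, phi)` tuple layout.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def in_acceptance(eta):
    """|eta| < 2.5, excluding 1.44 <= |eta| <= 1.57 (closed interval is an assumption)."""
    abs_eta = abs(eta)
    return abs_eta < 2.5 and not (1.44 <= abs_eta <= 1.57)


def passes_selection(photon_a, photon_b):
    """Each photon is a hypothetical (et_gev, eta, phi) tuple; order does not matter."""
    lead, sub = sorted((photon_a, photon_b), key=lambda p: p[0], reverse=True)
    return (
        lead[0] > 40.0
        and sub[0] > 25.0
        and in_acceptance(lead[1])
        and in_acceptance(sub[1])
        and delta_r(lead[1], lead[2], sub[1], sub[2]) > 0.45
    )
```

For example, `passes_selection((50.0, 0.5, 0.0), (30.0, -0.5, 2.0))` accepts the pair, whereas moving either photon to `eta = 1.5` rejects it because that value lies inside the excluded interval.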
Abstract:
Objective. To define inactive disease (ID) and clinical remission (CR) and to delineate variables that can be used to measure ID/CR in childhood-onset systemic lupus erythematosus (cSLE). Methods. Delphi questionnaires were sent to an international group of pediatric rheumatologists. Respondents provided information about variables to be used in future algorithms to measure ID/CR. The usefulness of these variables was assessed in 35 children with ID and 31 children with minimally active lupus (MAL). Results. While ID reflects cSLE status at a specific point in time, CR requires the presence of ID for >6 months and considers treatment. There was consensus that patients in ID/CR can have <2 mild nonlimiting symptoms (i.e., fatigue, arthralgia, headaches, or myalgia) but not Raynaud's phenomenon, chest pain, or objective physical signs of cSLE; antinuclear antibody positivity and erythrocyte sedimentation rate elevation can be present. Complete blood count, renal function testing, and complement C3 all must be within the normal range. Based on consensus, only damage-related laboratory or clinical findings of cSLE are permissible with ID. The above parameters were suitable to differentiate children with ID/CR from those with MAL (area under the receiver operating characteristic curve >0.85). Disease activity scores with or without the physician global assessment of disease activity and patient symptoms were well suited to differentiate children with ID from those with MAL. Conclusion. Consensus has been reached on common definitions of ID/CR with cSLE and relevant patient characteristics with ID/CR. Further studies must assess the usefulness of the data-driven candidate criteria for ID in cSLE.
Abstract:
The construction and use of multimedia corpora has been advocated for a while in the literature as one of the expected future application fields of Corpus Linguistics. This research project represents a pioneering experience aimed at applying a data-driven methodology to the study of the field of AVT, similarly to what has been done in the last few decades in the macro-field of Translation Studies. This research was based on the experience of Forlixt 1, the Forlì Corpus of Screen Translation, developed at the University of Bologna’s Department of Interdisciplinary Studies in Translation, Languages and Culture. As a matter of fact, in order to quantify strategies of linguistic transfer of an AV product, we need to take into consideration not only the linguistic aspect of such a product but all the meaning-making resources deployed in the filmic text. Provided that one major benefit of Forlixt 1 is the combination of audiovisual and textual data, this corpus allows the user to access primary data for scientific investigation, and thus no longer rely on pre-processed material such as traditional annotated transcriptions. Based on this rationale, the first chapter of the thesis sets out to illustrate the state of the art of research in the disciplinary fields involved. The primary objective was to underline the main repercussions on multimedia texts resulting from the interaction of a double support, audio and video, and, accordingly, on procedures, means, and methods adopted in their translation. By drawing on previous research in semiotics and film studies, the relevant codes at work in visual and acoustic channels were outlined. Subsequently, we concentrated on the analysis of the verbal component and on the peculiar characteristics of filmic orality as opposed to spontaneous dialogic production. In the second part, an overview of the main AVT modalities was presented (dubbing, voice-over, interlinguistic and intra-linguistic subtitling, audio-description, etc.) 
in order to define the different technologies, processes and professional qualifications that this umbrella term presently includes. The second chapter focuses diachronically on various theories’ contribution to the application of Corpus Linguistics’ methods and tools to the field of Translation Studies (i.e. Descriptive Translation Studies, Polysystem Theory). In particular, we discussed how the use of corpora can favourably help reduce the gap existing between qualitative and quantitative approaches. Subsequently, we reviewed the tools traditionally employed by Corpus Linguistics in regard to the construction of traditional “written language” corpora, to assess whether and how they can be adapted to meet the needs of multimedia corpora. In particular, we reviewed existing speech and spoken corpora, as well as multimedia corpora specifically designed to investigate Translation. The third chapter reviews Forlixt 1's main development steps, from a technical (IT design principles, data query functions) and methodological point of view, by laying down extensive scientific foundations for the annotation methods adopted, which presently encompass categories of pragmatic, sociolinguistic, linguacultural and semiotic nature. Finally, we described the main query tools (free search, guided search, advanced search and combined search) and the main intended uses of the database from a pedagogical perspective. The fourth chapter lists specific compilation criteria retained, as well as statistics of the two sub-corpora, by presenting data broken down by language pair (French-Italian and German-Italian) and genre (film comedies, television soap operas and crime series). Next, we concentrated on the discussion of the results obtained from the analysis of summary tables reporting the frequency of categories applied to the French-Italian sub-corpus. 
The detailed observation of the distribution of categories identified in the original and dubbed corpus allowed us to empirically confirm some of the theories put forward in the literature and notably concerning the nature of the filmic text, the dubbing process and Italian dubbed language’s features. This was possible by looking into some of the most problematic aspects, like the rendering of socio-linguistic variation. The corpus equally allowed us to consider so far neglected aspects, such as pragmatic, prosodic, kinetic, facial, and semiotic elements, and their combination. At the end of this first exploration, some specific observations concerning possible macrotranslation trends were made for each type of sub-genre considered (cinematic and TV genre). On the grounds of this first quantitative investigation, the fifth chapter intended to further examine data, by applying ad hoc models of analysis. Given the virtually infinite number of combinations of categories adopted, and of the latter with searchable textual units, three possible qualitative and quantitative methods were designed, each of which was to concentrate on a particular translation dimension of the filmic text. The first one was the cultural dimension, which specifically focused on the rendering of selected cultural references and on the investigation of recurrent translation choices and strategies justified on the basis of the occurrence of specific clusters of categories. The second analysis was conducted on the linguistic dimension by exploring the occurrence of phrasal verbs in the Italian dubbed corpus and by ascertaining the influence on the adoption of related translation strategies of possible semiotic traits, such as gestures and facial expressions. Finally, the main aim of the third study was to verify whether, under which circumstances, and through which modality, graphic and iconic elements were translated into Italian from an original corpus of both German and French films. 
After having reviewed the main translation techniques at work, an exhaustive account of possible causes for their non-translation was equally provided. By way of conclusion, the discussion of results obtained from the distribution of annotation categories on the French-Italian corpus, as well as the application of specific models of analysis, allowed us to underline possible advantages and drawbacks related to the adoption of a corpus-based approach to AVT studies. Even though possible updates and improvements were proposed in order to help solve some of the problems identified, it is argued that the added value of Forlixt 1 lies ultimately in having created a valuable instrument that makes it possible to carry out empirically sound contrastive studies that may be usefully replicated on different language pairs and several types of multimedia texts. Furthermore, multimedia corpora can also play a crucial role in L2 and translation teaching, two disciplines in which their use still lacks systematic investigation.
Abstract:
In this thesis, three measurements of the top-antitop differential cross section at a center-of-mass energy of 7 TeV are presented, as a function of the transverse momentum, the mass and the rapidity of the top-antitop system. The analysis was carried out on a data sample of about 5 fb$^{-1}$ recorded with the ATLAS detector. The events were selected with a cut-based approach in the "one lepton plus jets" channel, where the lepton can be either an electron or a muon. The most relevant backgrounds (multi-jet QCD and W+jets) were extracted using data-driven methods; the others (Z+jets, diboson and single top) were simulated with Monte Carlo techniques. The final background-subtracted distributions were corrected for detector and selection effects using unfolding methods. Finally, the results were compared with theoretical predictions. The measurements are dominated by systematic uncertainties and show no relevant deviation from the Standard Model predictions.
Abstract:
The production of the Z boson in proton-proton collisions at the LHC serves as a standard candle at the ATLAS experiment during early data-taking. The decay of the Z into an electron-positron pair gives a clean signature in the detector that allows for calibration and performance studies. The cross-section of ~1 nb allows first LHC measurements of parton density functions. In this thesis, simulations of 10 TeV collisions at the ATLAS detector are studied. The challenges for an experimental measurement of the cross-section with an integrated luminosity of 100 pb$^{-1}$ are discussed. In preparation for the cross-section determination, the single-electron efficiencies are determined via a simulation-based method and in a test of a data-driven ansatz. The two methods show very good agreement and differ by ~3% at most. The ingredients of an inclusive and a differential Z production cross-section measurement at ATLAS are discussed and their possible contributions to systematic uncertainties are presented. For a combined sample of signal and background, the expected uncertainty on the inclusive cross-section for an integrated luminosity of 100 pb$^{-1}$ is determined to be 1.5% (stat) ± 4.2% (syst) ± 10% (lumi). The possibilities for single-differential cross-section measurements in rapidity and transverse momentum of the Z boson, which are important quantities because of their impact on parton density functions and their capability to check for non-perturbative effects in pQCD, are outlined. The issues of an efficiency correction based on electron efficiencies as a function of the electron’s transverse momentum and pseudorapidity are studied. A possible alternative is demonstrated by expanding the two-dimensional efficiencies with the additional dimension of the invariant mass of the two leptons of the Z decay.
Abstract:
Falls are common and burdensome accidents among the elderly. About one third of the population aged 65 years or more experience at least one fall each year. Fall risk assessment is believed to be beneficial for fall prevention. This thesis is about prognostic tools for falls in community-dwelling older adults. We provide an overview of the state of the art. We then take different approaches: we propose a theoretical probabilistic model to investigate some properties of prognostic tools for falls; we present a tool whose parameters were derived from data in the literature; and we train and test a data-driven prognostic tool. Finally, we present some preliminary results on the prediction of falls through features extracted from wearable inertial sensors. Heterogeneity in validation results is expected from theoretical considerations and is observed in empirical data. Differences in study design hinder comparability and collaborative research. Given the multifactorial etiology of falls, assessment of multiple risk factors is needed in order to achieve good predictive accuracy.
Abstract:
The Standard Model of particle physics was developed to describe the fundamental particles, which form matter, and their interactions via the strong, electromagnetic and weak force. Although most measurements are described with high accuracy, some observations indicate that the Standard Model is incomplete. Numerous extensions were developed to solve these limitations. Several of these extensions predict heavy resonances, so-called Z' bosons, that can decay into an electron positron pair. The particle accelerator Large Hadron Collider (LHC) at CERN in Switzerland was built to collide protons at unprecedented center-of-mass energies, namely 7 TeV in 2011. With the data set recorded in 2011 by the ATLAS detector, a large multi-purpose detector located at the LHC, the electron positron pair mass spectrum was measured up to high masses in the TeV range. The properties of electrons and the probability that other particles are mis-identified as electrons were studied in detail. Using the obtained information, a sophisticated Standard Model expectation was derived with data-driven methods and Monte Carlo simulations. In the comparison of the measurement with the expectation, no significant deviations from the Standard Model expectations were observed. Therefore exclusion limits for several Standard Model extensions were calculated. For example, Sequential Standard Model (SSM) Z' bosons with masses below 2.10 TeV were excluded with 95% Confidence Level (C.L.).
Abstract:
A critical point in the analysis of ground displacement time series is the development of data-driven methods that allow the different sources that generate the observed displacements to be discerned and characterised. A widely used multivariate statistical technique is Principal Component Analysis (PCA), which reduces the dimensionality of the data space while retaining most of the variance of the dataset. However, PCA does not perform well in finding the solution to the so-called Blind Source Separation (BSS) problem, i.e. in recovering and separating the original sources that generated the observed data. This is mainly due to the assumptions on which PCA relies: it looks for a new Euclidean space where the projected data are uncorrelated. Independent Component Analysis (ICA) is a popular technique adopted to approach this problem. However, the independence condition is not easy to impose, and it is often necessary to introduce some approximations. To work around this problem, I use a variational Bayesian ICA (vbICA) method, which models the probability density function (pdf) of each source signal using a mixture of Gaussian distributions. This technique allows for more flexibility in the description of the pdf of the sources, giving a more reliable estimate of them. Here I present the application of the vbICA technique to GPS position time series. First, I use vbICA on synthetic data that simulate a seismic cycle (interseismic + coseismic + postseismic + seasonal + noise) and a volcanic source, and I study the ability of the algorithm to recover the original (known) sources of deformation. Secondly, I apply vbICA to different tectonically active scenarios, such as the 2009 L'Aquila (central Italy) earthquake, the 2012 Emilia (northern Italy) seismic sequence, and the 2006 Guerrero (Mexico) Slow Slip Event (SSE).
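The difference between decorrelating the data (PCA) and separating the sources (ICA) can be seen in a toy two-channel example. The sketch below is not the vbICA method of the abstract: it swaps in a much simpler kurtosis-based rotation search, and the mixing coefficients, the uniform sources and all function names are assumptions chosen for illustration.

```python
import math
import random


def pca_whiten(x1, x2):
    """Centre 2-D data, rotate onto the principal axes, scale each axis to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    a = sum(v * v for v in c1) / n
    c = sum(v * v for v in c2) / n
    b = sum(u * v for u, v in zip(c1, c2)) / n
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # orientation of the first principal axis
    ct, st = math.cos(theta), math.sin(theta)
    p1 = [ct * u + st * v for u, v in zip(c1, c2)]
    p2 = [-st * u + ct * v for u, v in zip(c1, c2)]
    s1 = math.sqrt(sum(v * v for v in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [v / s1 for v in p1], [v / s2 for v in p2]


def excess_kurtosis(y):
    """Excess kurtosis of zero-mean data (0 for a Gaussian)."""
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0


def ica_rotate(z1, z2, steps=90):
    """Grid-search the rotation of whitened data that maximises non-Gaussianity."""
    best = None
    for k in range(steps):
        phi = 0.5 * math.pi * k / steps
        cp, sp = math.cos(phi), math.sin(phi)
        y1 = [cp * u + sp * v for u, v in zip(z1, z2)]
        y2 = [-sp * u + cp * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def abs_corr(u, v):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return abs(num / den)


def demo(n=3000, seed=1):
    """Return (worst PCA match, worst ICA match) against the true sources."""
    rng = random.Random(seed)
    s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # independent non-Gaussian sources
    s2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]  # hypothetical mixing
    x2 = [0.3 * a + 1.0 * b for a, b in zip(s1, s2)]
    z1, z2 = pca_whiten(x1, x2)
    y1, y2 = ica_rotate(z1, z2)
    pca_match = min(max(abs_corr(z, s) for s in (s1, s2)) for z in (z1, z2))
    ica_match = min(max(abs_corr(y, s) for s in (s1, s2)) for y in (y1, y2))
    return pca_match, ica_match
```

Calling `demo()` returns, for the PCA components and for the rotated components, the weakest correlation with its best-matching true source. The PCA components are uncorrelated yet each still blends both sources, while the extra rotation brings each component close to a single source, up to sign and ordering.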
Abstract:
The Standard Model of particle physics is a very successful theory which describes nearly all known processes of particle physics very precisely. Nevertheless, there are several observations which cannot be explained within the existing theory. In this thesis, two analyses with high-energy electrons and positrons using data from the ATLAS detector are presented. One probes the Standard Model of particle physics, and the other searches for phenomena beyond the Standard Model. The production of an electron-positron pair via the Drell-Yan process leads to a very clean signature in the detector with low background contributions. This allows for a very precise measurement of the cross-section and can be used as a precision test of perturbative quantum chromodynamics (pQCD), where this process has been calculated at next-to-next-to-leading order (NNLO). The invariant mass spectrum $m_{ee}$ is sensitive to parton distribution functions (PDFs), in particular to the poorly known distribution of antiquarks at large momentum fraction (Bjorken x). The measurement of the high-mass Drell-Yan cross-section in proton-proton collisions at a center-of-mass energy of $\sqrt{s} = 7$ TeV is performed on a dataset collected with the ATLAS detector, corresponding to an integrated luminosity of 4.7 fb$^{-1}$. The differential cross-section of $pp \to Z/\gamma + X \to e^+e^- + X$ is measured as a function of the invariant mass in the range 116 GeV < $m_{ee}$ < 1500 GeV. The background is estimated using a data-driven method and Monte Carlo simulations. The final cross-section is corrected for detector effects and different levels of final state radiation corrections. A comparison is made to various event generators and to predictions of pQCD calculations at NNLO. 
Good agreement within the uncertainties between measured cross-sections and Standard Model predictions is observed. Examples of observed phenomena which cannot be explained by the Standard Model are the amount of dark matter in the universe and neutrino oscillations. To explain these phenomena, several extensions of the Standard Model have been proposed, some of them leading to new processes with a high multiplicity of electrons and/or positrons in the final state. A model-independent search in multi-object final states, with objects defined as electrons and positrons, is performed to search for these phenomena. The dataset collected at a center-of-mass energy of $\sqrt{s} = 8$ TeV, corresponding to an integrated luminosity of 20.3 fb$^{-1}$, is used. The events are separated into different categories using the object multiplicity. The data-driven background method already used for the cross-section measurement was developed further for up to five objects to obtain an estimate of the number of events including fake contributions. Within the uncertainties, the comparison between data and Standard Model predictions shows no significant deviations.
Abstract:
Computer science and its technologies are often summed up in modern society by a misleading axiom: they are commonly tied to the idea that what technology offers can be accessed by everyone and exploited, in everyday life, in more or less simple ways. Although this is a fundamental goal of the high-tech world, one point needs to be clarified at once: computer science is not simply everything that technology offers, because this summary view suggests a "generalizing" computer science; instead, computer science is divided among many areas and touches several interdisciplinary worlds. The importance of these technologies in modern society should prompt us to ask questions and reflect on why computer science, in all its facets, has in recent decades brought a real revolution to our lives, our habits and, no less importantly, our working and business environments, and has no intention (fortunately) of halting its development. In this work we define a particular modern technique belonging to one part of the complex world known as "Artificial Intelligence". Artificial Intelligence (AI) is a science that has developed alongside technological progress and its powerful tools, which are not only computational but above all theoretical and mathematical (probabilistic), and which also relate to electronics and telecommunications (consider robotics): hence the interdisciplinarity. This concept is fundamental to the core of the path presented in the second chapter of the document: the two possible approaches, semantic and probabilistic, to natural language processing (NLP), a fundamental branch of AI. 
Although the thesis devotes a good deal of space to how semantic and statistical NLP techniques have developed over time, attention is paid above all to the fundamental concepts of these fields because, as noted above, even though it is essential to build a foundation and to know how these technologies evolved, the goal at a certain point is to move on and study the current state of the technology in this field, with an eye to the future as well: in this case, Sentiment Analysis (chapter 3). Sentiment Analysis (SA) is an NLP technique that is taking shape in our own time and that has developed mainly in connection with the explosion of the social network phenomenon, which we live with and "touch" constantly. The central part of the thesis presents some modern examples and models of SA covering both approaches (statistical and semantic), with particular attention to SA models proposed for Twitter in recent years, and assesses the scenarios this modern technique opens up and the contextual (and other) consequences it could lead to.
Abstract:
Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions of the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as definition of a distributed network of multisensory candidate regions including superior temporal, ventral occipito-temporal, posterior parietal and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine out of these twelve regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and using independent datasets to test hypotheses generated from a data-driven analysis.
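The max-criterion (A < AV > V) mentioned in the abstract can be written as a simple comparison of response estimates. The sketch below is only an illustration: the region labels and response values are hypothetical, and a real analysis would test the inequality statistically rather than compare single point estimates.

```python
def meets_max_criterion(a, v, av):
    """True when the audio-visual response exceeds both unimodal responses (A < AV > V)."""
    return av > a and av > v


def multisensory_regions(responses):
    """Return the names of regions whose (A, V, AV) responses meet the max-criterion.

    `responses` maps a region name to a hypothetical (a, v, av) tuple of
    response estimates; the names and numbers are illustrative only.
    """
    return [name for name, (a, v, av) in responses.items() if meets_max_criterion(a, v, av)]


example = {
    "region_1": (0.8, 0.3, 1.1),  # AV above both unimodal responses
    "region_2": (0.2, 0.9, 0.7),  # AV below the visual response
    "region_3": (0.5, 0.5, 0.5),  # no enhancement
}
print(multisensory_regions(example))
```

Only regions whose AV response is strictly larger than the stronger of the two unimodal responses are kept, which is why ties do not count as integration in this sketch.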