940 results for data-driven modelling
Abstract:
As demand for water and hydropower grows, it is becoming more important to operate hydraulic structures efficiently while meeting multiple demands. Companies, government agencies and consulting offices in particular need effective, practical, integrated tools and decision support frameworks to operate reservoirs, cascades of run-of-river plants and related elements such as canals, by merging hydrological and reservoir simulation/optimization models with numerical weather predictions and radar and satellite data. Model performance depends strongly on the streamflow forecast, on its uncertainty, and on how that uncertainty is considered in decision making. Deterministic weather predictions and their corresponding streamflow forecasts restrict the manager to a single deterministic trajectory, whereas probabilistic forecasts can be a key solution because they include uncertainty in the flow forecast scenarios used for dam operation. The objective of this study is to compare deterministic and probabilistic streamflow forecasts on a previously developed basin/reservoir model for short-term reservoir management. The study is applied to the Yuvacık Reservoir and its upstream basin, the main water supply of Kocaeli City in northwestern Turkey. The reservoir is a typical example because of its limited capacity, downstream channel restrictions and high snowmelt potential. Mesoscale Model 5 and Ensemble Prediction System (EPS) data are used as the main inputs, and flow forecasts are produced for the year 2012 using HEC-HMS. A hydrometeorological rule-based reservoir simulation model is built with HEC-ResSim and integrated with the forecasts. Because the EPS-based hydrological model produces a large number of equally probable scenarios, it indicates how uncertainty spreads into the future. Compared with the deterministic approach, it therefore provides the operator with risk ranges for spillway discharges and reservoir level.
The framework is fully data-driven, applicable and useful to the profession, and the knowledge can be transferred to other similar reservoir systems.
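As an illustration of how a set of equally probable EPS scenarios can be turned into a risk range, the following is a minimal sketch, not the study's HEC-HMS/HEC-ResSim workflow. The discharge values, the channel limit and the function names are hypothetical and chosen only for the example.

```python
from typing import Sequence, Tuple


def exceedance_probability(members: Sequence[float], threshold: float) -> float:
    """Share of equally probable ensemble members strictly above `threshold`."""
    if not members:
        raise ValueError("the ensemble needs at least one member")
    return sum(1 for value in members if value > threshold) / len(members)


def ensemble_range(members: Sequence[float]) -> Tuple[float, float]:
    """Lowest and highest member, the simplest possible risk range."""
    if not members:
        raise ValueError("the ensemble needs at least one member")
    return min(members), max(members)


# Hypothetical peak spillway discharges (m3/s), one value per ensemble member.
peak_discharge = [42.0, 55.5, 61.0, 48.2, 70.3]
# Hypothetical downstream channel limit (m3/s).
channel_limit = 60.0

risk = exceedance_probability(peak_discharge, channel_limit)
low, high = ensemble_range(peak_discharge)
print(f"P(discharge > {channel_limit}) = {risk:.2f}, range = [{low}, {high}]")
```

A deterministic forecast corresponds to an ensemble with a single member, for which the exceedance probability can only be 0 or 1; this is the restriction the probabilistic approach is meant to relax.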
Abstract:
Guides for mineral exploration are usually based on conceptual deposit models. These guides normally draw on the experience of geologists and on descriptive and genetic data. Numerical modelling, probabilistic and non-probabilistic, to estimate the occurrence of mineral deposits is a new procedure whose use and acceptance by the geological community is growing steadily. This thesis uses recent methodologies to generate mineral favourability maps. The so-called Ilha Cristalina de Rivera, an erosional window of the Paraná Basin located in northern Uruguay, was chosen as the case study for applying the methodologies. The mineral favourability maps were built from the following types of data, information and prospecting results: 1) orbital images; 2) geochemical prospecting; 3) airborne geophysical prospecting; 4) geological-structural mapping; and 5) altimetry. This information was selected and processed on the basis of a mineral deposit model (conceptual model) developed from the San Gregorio Gold Mine. The conceptual model (San Gregorio model) included descriptive and genetic characteristics of the San Gregorio Mine, which encompasses the significant characteristic elements of the other known mineral occurrences in the Ilha Cristalina de Rivera. Generating the mineral favourability maps involved building a database, processing the data and integrating the data. The database construction and data processing stages comprised collecting, selecting and treating the data so as to form the so-called Information Layers (Planos de Informação). These Information Layers were generated and processed in organised groups to form the Integration Factors for mineral favourability mapping in the Ilha Cristalina de Rivera.
The data were integrated using two different methodologies: 1) Weights of Evidence (data-driven) and 2) Fuzzy Logic (knowledge-driven). The mineral favourability maps produced by the two integration methodologies were first analysed and interpreted individually, and the results were then compared. Both methodologies succeeded in identifying the known mineralised areas, as well as other areas not yet worked, as areas of high favourability. The maps produced by the two methodologies coincided with respect to the areas of highest favourability. The Weights of Evidence methodology produced the more conservative mineral favourability map in terms of areal extent, but the more optimistic one in terms of favourability values, compared with the maps produced by the Fuzzy Logic methodology. New targets for mineral exploration were identified and should be investigated in detail.
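For readers unfamiliar with the Weights of Evidence (Pesos de Evidência) method, the sketch below computes the positive weight, the negative weight and the contrast for one binary evidence layer, using the textbook definitions in terms of conditional probabilities. It is not the thesis's implementation; the cell counts and the function name are hypothetical.

```python
import math
from typing import Tuple


def weights_of_evidence(
    n_total: int, n_evidence: int, n_deposit: int, n_both: int
) -> Tuple[float, float, float]:
    """Return (W+, W-, contrast) for one binary evidence layer.

    n_total    : number of unit cells in the study area
    n_evidence : cells where the evidence pattern B is present
    n_deposit  : cells that contain a known deposit D
    n_both     : cells with both the pattern and a deposit
    """
    p_b_given_d = n_both / n_deposit
    p_b_given_not_d = (n_evidence - n_both) / (n_total - n_deposit)
    for p in (p_b_given_d, p_b_given_not_d):
        if not 0.0 < p < 1.0:
            raise ValueError("weights are undefined when a probability is 0 or 1")
    # W+ = ln[P(B|D) / P(B|not D)], W- = ln[P(not B|D) / P(not B|not D)]
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1.0 - p_b_given_d) / (1.0 - p_b_given_not_d))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 1000 cells, 200 with the pattern, 10 deposits, 8 overlaps.
w_plus, w_minus, contrast = weights_of_evidence(1000, 200, 10, 8)
print(f"W+ = {w_plus:.3f}, W- = {w_minus:.3f}, C = {contrast:.3f}")
```

A positive W+ together with a negative W- indicates that the pattern is spatially associated with the known deposits; the weights come entirely from the counts, which is why the method is described as data-driven.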
Abstract:
In this dissertation a Monte Carlo experiment was carried out to reveal some characteristics of the finite-sample distributions of the Backfitting (B) and Marginal Integration (MI) estimators for a bivariate additive regression. We are particularly interested in providing evidence on how different methods of selecting the bandwidth h_n, such as plug-in methods, affect the small-sample properties of the estimators. We are also interested in providing evidence on the behaviour of different estimators of h_n relative to the optimal sequence of h_n that minimises a chosen loss function. The impact of ignoring the dependence between the regressors when estimating the bandwidth is also investigated; this is common practice and should affect the performance of the estimators. Moreover, no routine is currently available in statistical/econometric packages for estimating additive regressions via the Backfitting and Marginal Integration methods, so one of the objectives is to create Gauss routines for the practical implementation of these estimators. Finally, in contrast to current practice, in which the B and MI estimators are used in a completely ad hoc manner, the aim is to give users information that allows a more objective choice of which estimator to use when working with a finite sample.
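The Backfitting idea for a bivariate additive regression, y = alpha + f1(x1) + f2(x2) + error, can be sketched as follows. This is an illustrative Python version, not the Gauss routines mentioned in the abstract. It assumes a Nadaraya-Watson smoother with a Gaussian kernel and a single fixed bandwidth `h`, whereas the dissertation studies how the bandwidth should be selected; the test data are synthetic.

```python
import math
from typing import List, Sequence, Tuple


def nw_smooth(x: Sequence[float], r: Sequence[float], h: float) -> List[float]:
    """Nadaraya-Watson fit of r on x (Gaussian kernel), evaluated at the sample points."""
    fitted = []
    for x0 in x:
        weights = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
        fitted.append(sum(w * ri for w, ri in zip(weights, r)) / sum(weights))
    return fitted


def centre(values: Sequence[float]) -> List[float]:
    """Subtract the sample mean so each component is identified."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def backfit(
    x1: Sequence[float], x2: Sequence[float], y: Sequence[float],
    h: float, n_iter: int = 20,
) -> Tuple[float, List[float], List[float]]:
    """Alternate between smoothing partial residuals on x1 and on x2."""
    n = len(y)
    alpha = sum(y) / n
    f1 = [0.0] * n
    f2 = [0.0] * n
    for _ in range(n_iter):
        f1 = centre(nw_smooth(x1, [yi - alpha - b for yi, b in zip(y, f2)], h))
        f2 = centre(nw_smooth(x2, [yi - alpha - a for yi, a in zip(y, f1)], h))
    return alpha, f1, f2


# Synthetic, noise-free design on [-1, 1]; x2 is a permutation of the x1 grid.
n = 60
x1 = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
x2 = [-1.0 + 2.0 * ((17 * i) % n) / (n - 1) for i in range(n)]
y = [a * a + math.sin(math.pi * b) for a, b in zip(x1, x2)]

alpha, f1, f2 = backfit(x1, x2, y, h=0.2)
tss = sum((v - alpha) ** 2 for v in y)
rss = sum((v - alpha - a - b) ** 2 for v, a, b in zip(y, f1, f2))
print(f"residual / total sum of squares = {rss / tss:.3f}")
```

The fixed number of iterations stands in for a convergence check, and the bandwidth value 0.2 is arbitrary; both would need more care in a real application.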
Abstract:
We study semiparametric two-step estimators which have the same structure as parametric doubly robust estimators in their second step. The key difference is that we do not impose any parametric restriction on the nuisance functions that are estimated in a first stage, but retain a fully nonparametric model instead. We call these estimators semiparametric doubly robust estimators (SDREs), and show that they possess superior theoretical and practical properties compared to generic semiparametric two-step estimators. In particular, our estimators have substantially smaller first-order bias, allow for a wider range of nonparametric first-stage estimates, rate-optimal choices of smoothing parameters and data-driven estimates thereof, and their stochastic behavior can be well-approximated by classical first-order asymptotics. SDREs exist for a wide range of parameters of interest, particularly in semiparametric missing data and causal inference models. We illustrate our method with a simulation exercise.
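To make the second-step structure concrete, here is a minimal sketch of a doubly robust moment for one of the simplest missing-data parameters: the mean of an outcome that is observed only for some units. It takes first-stage estimates as given inputs and does not reproduce the paper's nonparametric first stage. The function name, the argument names and the toy numbers are assumptions made for the example.

```python
from typing import Optional, Sequence


def doubly_robust_mean(
    y: Sequence[Optional[float]],
    observed: Sequence[int],
    m_hat: Sequence[float],
    p_hat: Sequence[float],
) -> float:
    """Average of m_hat(X) + D * (Y - m_hat(X)) / p_hat(X).

    y        : outcome, None where it is missing
    observed : 1 if the outcome is observed (D = 1), else 0
    m_hat    : first-stage estimate of E[Y | X, D = 1] for every unit
    p_hat    : first-stage estimate of P(D = 1 | X) for every unit
    """
    total = 0.0
    for yi, di, mi, pi in zip(y, observed, m_hat, p_hat):
        if not 0.0 < pi <= 1.0:
            raise ValueError("estimated propensities must lie in (0, 1]")
        correction = (yi - mi) / pi if di and yi is not None else 0.0
        total += mi + correction
    return total / len(observed)


# Toy inputs: two observed outcomes, two missing, constant nuisance estimates.
estimate = doubly_robust_mean(
    y=[1.0, None, 3.0, None],
    observed=[1, 0, 1, 0],
    m_hat=[2.0, 2.0, 2.0, 2.0],
    p_hat=[0.5, 0.5, 0.5, 0.5],
)
print(estimate)
```

The correction term averages to zero when the outcome regression is right, and the regression term is offset by the weighting when the propensity is right, which is the sense in which the moment is doubly robust.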
Abstract:
This research attempts to analyze the effects of open government data on the administration and practice of the educational process by comparing the contexts of Brazil and England. The findings illustrate two principal dynamics: control and collaboration. In the case of control, or what is called the "data-driven" paradigm, data help advance the cause of political accountability through the disclosure of school performance. In collaboration, or what is referred to as the "data-informed" paradigm, data is intended to support the decision-making process of administrators through dialogical processes with other social actors.
Abstract:
The synthetic control (SC) method has recently been proposed as an alternative method to estimate treatment effects in comparative case studies. Abadie et al. [2010] and Abadie et al. [2015] argue that one of the advantages of the SC method is that it imposes a data-driven process to select the comparison units, providing more transparency and less discretionary power to the researcher. However, an important limitation of the SC method is that it does not provide clear guidance on the choice of predictor variables used to estimate the SC weights. We show that this lack of specific guidance provides significant opportunities for the researcher to search for specifications with statistically significant results, undermining one of the main advantages of the method. Considering six alternative specifications commonly used in SC applications, we calculate in Monte Carlo simulations the probability of finding a statistically significant result at 5% in at least one specification. We find that this probability can be as high as 13% (23% for a 10% significance test) when there are 12 pre-intervention periods, and that it decays slowly with the number of pre-intervention periods. With 230 pre-intervention periods, this probability is still around 10% (18% for a 10% significance test). We show that the specification that uses the average pre-treatment outcome values to estimate the weights performed particularly badly in our simulations. However, the specification-searching problem remains relevant even when we do not consider this specification. We also show that this specification-searching problem is relevant in simulations with real datasets looking at placebo interventions in the Current Population Survey (CPS). In order to mitigate this problem, we propose a criterion to select among different SC specifications based on the prediction error of each specification in placebo estimations.
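The reported rejection rates can be put in context with two simple reference cases for a search over k specifications, each tested at level alpha: perfectly dependent tests, where searching adds nothing, and fully independent tests. The sketch below computes both; it is back-of-the-envelope arithmetic, not the paper's Monte Carlo design, and the function name is hypothetical.

```python
from typing import Tuple


def reference_rates(alpha: float, k: int) -> Tuple[float, float]:
    """P(at least one rejection among k tests) in two reference cases.

    Returns (perfectly dependent tests, independent tests).
    """
    if not 0.0 < alpha < 1.0 or k < 1:
        raise ValueError("need 0 < alpha < 1 and k >= 1")
    return alpha, 1.0 - (1.0 - alpha) ** k


for level in (0.05, 0.10):
    dependent, independent = reference_rates(level, 6)
    print(f"alpha = {level:.2f}: {dependent:.3f} (dependent), {independent:.3f} (independent)")
```

With six specifications, the 13% and 23% figures quoted for 12 pre-intervention periods fall between the two reference cases, which is what one would expect if the specifications produce correlated but not identical test statistics.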
Abstract:
We present a measurement of the semileptonic mixing asymmetry for $B^0$ mesons, $a^d_{sl}$, using two independent decay channels: $B^0 \to \mu^+ D^- X$, with $D^- \to K^+ \pi^- \pi^-$; and $B^0 \to \mu^+ D^{*-} X$, with $D^{*-} \to \bar{D}^0 \pi^-$, $\bar{D}^0 \to K^+ \pi^-$ (and charge conjugate processes). We use a data sample corresponding to 10.4 fb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV, collected with the D0 experiment at the Fermilab Tevatron collider. We extract the charge asymmetries in these two channels as a function of the visible proper decay length of the $B^0$ meson, correct for detector-related asymmetries using data-driven methods, and account for dilution from charge-symmetric processes using Monte Carlo simulation. The final measurement combines four signal visible proper decay length regions for each channel, yielding $a^d_{sl} = [0.68 \pm 0.45\,(\text{stat}) \pm 0.14\,(\text{syst})]\%$. This is the single most precise measurement of this parameter, with uncertainties smaller than the current world average of B factory measurements. © 2012 American Physical Society.
Abstract:
Speech processing has become a technology increasingly based on the automatic modelling of vast amounts of data. The success of research in this area is therefore directly tied to the existence of public-domain corpora and other specific resources, such as a phonetic dictionary. In Brazil, unlike the situation for English, for example, there is currently no public-domain Automatic Speech Recognition (ASR) system for Brazilian Portuguese with large-vocabulary support. Given this scenario, the main objective of this work is to discuss efforts within the FalaBrasil initiative [1], created by the Signal Processing Laboratory (LaPS) at UFPA, presenting research and software in the area of ASR for Brazilian Portuguese. More specifically, the work discusses the implementation of a large-vocabulary speech recognition system for Brazilian Portuguese using the HTK toolkit, which is based on hidden Markov models (HMMs), and the creation of a grapheme-to-phone conversion module using machine learning techniques.
Abstract:
The Common Reflection Surface (CRS) stack method was originally introduced as a data-driven method to simulate zero-offset sections from 2-D pre-stack seismic reflection data acquired along a straight acquisition line. The method is based on a second-order hyperbolic traveltime approximation parameterised by three kinematic wavefield attributes. In land data, topographic effects play an important role in seismic data processing and imaging, so this feature has recently been taken into account by the CRS method. In this work we present a review of the CRS traveltime approximations that consider smooth and rugged topography. In addition, we also review the Multifocus traveltime approximation for the case of rugged topography. Finally, by means of a simple synthetic example, we provide the first comparisons between the different traveltime expressions.
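As a point of reference, the sketch below evaluates the hyperbolic CRS traveltime in the form it is commonly written for a flat, straight acquisition line. It assumes the three kinematic attributes are the emergence angle `alpha` and the radii of curvature `r_nip` and `r_n`, together with a near-surface velocity `v0`. These names are assumptions, and the topography-dependent expressions reviewed in the paper are not reproduced here.

```python
import math


def crs_traveltime(
    xm: float, h: float, x0: float, t0: float,
    v0: float, alpha: float, r_nip: float, r_n: float,
) -> float:
    """Hyperbolic CRS traveltime for midpoint xm and half-offset h.

    t^2 = (t0 + 2 sin(alpha) (xm - x0) / v0)^2
          + (2 t0 cos^2(alpha) / v0) * ((xm - x0)^2 / r_n + h^2 / r_nip)

    x0, t0 : location and zero-offset time of the central ray
    v0     : near-surface velocity
    alpha  : emergence angle of the central ray (radians)
    r_nip  : radius of curvature of the NIP wavefront
    r_n    : radius of curvature of the normal wavefront
    """
    dx = xm - x0
    linear = t0 + 2.0 * math.sin(alpha) * dx / v0
    curvature = 2.0 * t0 * math.cos(alpha) ** 2 / v0 * (dx * dx / r_n + h * h / r_nip)
    t_squared = linear * linear + curvature
    if t_squared < 0.0:
        raise ValueError("the hyperbolic approximation is not valid at this point")
    return math.sqrt(t_squared)


# Illustrative values only (SI units): the central ray itself returns t0.
print(crs_traveltime(xm=0.0, h=0.0, x0=0.0, t0=1.0, v0=2000.0,
                     alpha=0.1, r_nip=1500.0, r_n=3000.0))
```

In a data-driven stack the attributes are not known in advance; they are searched for at each zero-offset sample so that this surface best fits the coherent energy in the pre-stack data.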
Abstract:
A measurement of differential cross sections for the production of a pair of isolated photons in proton-proton collisions at $\sqrt{s} = 7$ TeV is presented. The data sample corresponds to an integrated luminosity of 5.0 fb$^{-1}$ collected with the CMS detector. A data-driven isolation template method is used to extract the prompt diphoton yield. The measured cross section for two isolated photons, with transverse energy above 40 and 25 GeV respectively, in the pseudorapidity range $|\eta| < 2.5$, $|\eta| \notin [1.44, 1.57]$ and with an angular separation $\Delta R > 0.45$, is $17.2 \pm 0.2\,(\text{stat}) \pm 1.9\,(\text{syst}) \pm 0.4\,(\text{lumi})$ pb. Differential cross sections are measured as a function of the diphoton invariant mass, the diphoton transverse momentum, the azimuthal angle difference between the two photons, and the cosine of the polar angle in the Collins-Soper reference frame of the diphoton system. The results are compared to theoretical predictions at leading, next-to-leading, and next-to-next-to-leading order in quantum chromodynamics.
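The kinematic part of the selection quoted in the abstract can be written down compactly. The sketch below is a hypothetical illustration, not CMS software: it reads the garbled pseudorapidity condition as an exclusion of the interval [1.44, 1.57], uses the standard definition of Delta R as the distance in the (eta, phi) plane, and does not model the isolation requirement.

```python
import math
from typing import Tuple

Photon = Tuple[float, float, float]  # (transverse energy in GeV, eta, phi in radians)


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular separation, with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def in_acceptance(eta: float) -> bool:
    """|eta| < 2.5, excluding 1.44 <= |eta| <= 1.57."""
    abs_eta = abs(eta)
    return abs_eta < 2.5 and not (1.44 <= abs_eta <= 1.57)


def passes_selection(photon_a: Photon, photon_b: Photon) -> bool:
    """Apply the transverse-energy, pseudorapidity and separation cuts."""
    leading, subleading = sorted((photon_a, photon_b), key=lambda p: p[0], reverse=True)
    return (
        leading[0] > 40.0
        and subleading[0] > 25.0
        and in_acceptance(leading[1])
        and in_acceptance(subleading[1])
        and delta_r(leading[1], leading[2], subleading[1], subleading[2]) > 0.45
    )


# Hypothetical photon candidates.
print(passes_selection((45.0, 0.3, 0.0), (30.0, -0.5, 2.0)))
```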
Abstract:
Objective. To define inactive disease (ID) and clinical remission (CR) and to delineate variables that can be used to measure ID/CR in childhood-onset systemic lupus erythematosus (cSLE). Methods. Delphi questionnaires were sent to an international group of pediatric rheumatologists. Respondents provided information about variables to be used in future algorithms to measure ID/CR. The usefulness of these variables was assessed in 35 children with ID and 31 children with minimally active lupus (MAL). Results. While ID reflects cSLE status at a specific point in time, CR requires the presence of ID for >6 months and considers treatment. There was consensus that patients in ID/CR can have <2 mild nonlimiting symptoms (i.e., fatigue, arthralgia, headaches, or myalgia) but not Raynaud's phenomenon, chest pain, or objective physical signs of cSLE; antinuclear antibody positivity and erythrocyte sedimentation rate elevation can be present. Complete blood count, renal function testing, and complement C3 all must be within the normal range. Based on consensus, only damage-related laboratory or clinical findings of cSLE are permissible with ID. The above parameters were suitable to differentiate children with ID/CR from those with MAL (area under the receiver operating characteristic curve >0.85). Disease activity scores with or without the physician global assessment of disease activity and patient symptoms were well suited to differentiate children with ID from those with MAL. Conclusion. Consensus has been reached on common definitions of ID/CR with cSLE and relevant patient characteristics with ID/CR. Further studies must assess the usefulness of the data-driven candidate criteria for ID in cSLE.
Abstract:
Held in the Sala de Grado of the Facultad de Ciencias del Mar (ULPGC) on 18 June 2013
Abstract:
Nuclear Magnetic Resonance (NMR) is a branch of spectroscopy based on the fact that many atomic nuclei can be oriented by a strong magnetic field and will absorb radiofrequency radiation at characteristic frequencies. The parameters that can be measured on the resulting spectral lines (line positions, intensities, line widths, multiplicities and transients in time-dependent experiments) can be interpreted in terms of molecular structure, conformation, molecular motion and other rate processes. In this way, high-resolution (HR) NMR allows qualitative and quantitative analysis of samples in solution, including but not limited to determining the structure of molecules in solution. In the past, high-field NMR spectroscopy was mainly concerned with the elucidation of chemical structure in solution, but today it is emerging as a powerful exploratory tool for probing biochemical and physical processes. It is a versatile tool for the analysis of foods. Many NMR studies have been reported in the literature on different types of food, such as wine, olive oil, coffee, fruit juices, milk, meat, egg, starch granules and flour, using different NMR techniques. Traditionally, univariate analytical methods have been used to explore spectroscopic data. Such methods measure or select a single descriptive variable from the whole spectrum, and in the end only this variable is analysed. Applied to HR-NMR data, this univariate approach leads to several problems, mainly because of the complexity of an NMR spectrum. The spectrum is composed of different signals belonging to different molecules, but the same molecule can also be represented by several signals, which are generally strongly correlated. Univariate methods take into account only one or a few variables, causing a loss of information.
Thus, when dealing with complex samples such as foodstuffs, univariate analysis of spectral data is not powerful enough. Spectra need to be considered as a whole, and analysing them requires taking the whole data matrix into consideration: chemometric methods are designed to treat such multivariate data. Multivariate data analysis is used for a number of distinct purposes, and the aims can be divided into three main groups:
• data description (explorative data structure modelling of any generic n-dimensional data matrix, PCA for example);
• regression and prediction (PLS);
• classification and prediction of class membership for new samples (LDA, PLS-DA and ECVA).
The aim of this PhD thesis was to verify the possibility of identifying and classifying plants or foodstuffs into different classes, based on the concerted variation in metabolite levels detected in NMR spectra, using multivariate data analysis as a tool to interpret NMR information. It is important to underline that the results obtained are useful for pointing out the metabolic consequences of a specific modification of foodstuffs, avoiding the need for a targeted analysis of the different metabolites. The data analysis is performed by applying chemometric multivariate techniques to the acquired NMR spectra. The research work presented in this thesis is the result of a three-year PhD study. The thesis reports the main results obtained from two main activities: A1) evaluation of a data pre-processing system to minimise unwanted sources of variation due to different instrumental set-ups, manual spectra processing and sample preparation artefacts; A2) application of multivariate chemometric models in data analysis.
Abstract:
The construction and use of multimedia corpora has been advocated for a while in the literature as one of the expected future application fields of Corpus Linguistics. This research project represents a pioneering experience aimed at applying a data-driven methodology to the study of the field of AVT, similarly to what has been done in the last few decades in the macro-field of Translation Studies. This research was based on the experience of Forlixt 1, the Forlì Corpus of Screen Translation, developed at the University of Bologna’s Department of Interdisciplinary Studies in Translation, Languages and Culture. As a matter of fact, in order to quantify strategies of linguistic transfer of an AV product, we need to take into consideration not only the linguistic aspect of such a product but all the meaning-making resources deployed in the filmic text. Provided that one major benefit of Forlixt 1 is the combination of audiovisual and textual data, this corpus allows the user to access primary data for scientific investigation, and thus no longer rely on pre-processed material such as traditional annotated transcriptions. Based on this rationale, the first chapter of the thesis sets out to illustrate the state of the art of research in the disciplinary fields involved. The primary objective was to underline the main repercussions on multimedia texts resulting from the interaction of a double support, audio and video, and, accordingly, on procedures, means, and methods adopted in their translation. By drawing on previous research in semiotics and film studies, the relevant codes at work in visual and acoustic channels were outlined. Subsequently, we concentrated on the analysis of the verbal component and on the peculiar characteristics of filmic orality as opposed to spontaneous dialogic production. In the second part, an overview of the main AVT modalities was presented (dubbing, voice-over, interlinguistic and intra-linguistic subtitling, audio-description, etc.) 
in order to define the different technologies, processes and professional qualifications that this umbrella term presently includes. The second chapter focuses diachronically on various theories’ contribution to the application of Corpus Linguistics’ methods and tools to the field of Translation Studies (i.e. Descriptive Translation Studies, Polysystem Theory). In particular, we discussed how the use of corpora can help reduce the gap between qualitative and quantitative approaches. Subsequently, we reviewed the tools traditionally employed by Corpus Linguistics for the construction of traditional “written language” corpora, to assess whether and how they can be adapted to meet the needs of multimedia corpora. In particular, we reviewed existing speech and spoken corpora, as well as multimedia corpora specifically designed to investigate Translation. The third chapter reviews Forlixt 1's main development steps, from a technical (IT design principles, data query functions) and methodological point of view, by laying down extensive scientific foundations for the annotation methods adopted, which presently encompass categories of a pragmatic, sociolinguistic, linguacultural and semiotic nature. Finally, we described the main query tools (free search, guided search, advanced search and combined search) and the main intended uses of the database from a pedagogical perspective. The fourth chapter lists the specific compilation criteria retained, as well as statistics for the two sub-corpora, by presenting data broken down by language pair (French-Italian and German-Italian) and genre (cinema comedies, television soap operas and crime series). Next, we concentrated on the discussion of the results obtained from the analysis of summary tables reporting the frequency of categories applied to the French-Italian sub-corpus. 
The detailed observation of the distribution of categories identified in the original and dubbed corpus allowed us to empirically confirm some of the theories put forward in the literature and notably concerning the nature of the filmic text, the dubbing process and Italian dubbed language’s features. This was possible by looking into some of the most problematic aspects, like the rendering of socio-linguistic variation. The corpus equally allowed us to consider so far neglected aspects, such as pragmatic, prosodic, kinetic, facial, and semiotic elements, and their combination. At the end of this first exploration, some specific observations concerning possible macrotranslation trends were made for each type of sub-genre considered (cinematic and TV genre). On the grounds of this first quantitative investigation, the fifth chapter intended to further examine data, by applying ad hoc models of analysis. Given the virtually infinite number of combinations of categories adopted, and of the latter with searchable textual units, three possible qualitative and quantitative methods were designed, each of which was to concentrate on a particular translation dimension of the filmic text. The first one was the cultural dimension, which specifically focused on the rendering of selected cultural references and on the investigation of recurrent translation choices and strategies justified on the basis of the occurrence of specific clusters of categories. The second analysis was conducted on the linguistic dimension by exploring the occurrence of phrasal verbs in the Italian dubbed corpus and by ascertaining the influence on the adoption of related translation strategies of possible semiotic traits, such as gestures and facial expressions. Finally, the main aim of the third study was to verify whether, under which circumstances, and through which modality, graphic and iconic elements were translated into Italian from an original corpus of both German and French films. 
After having reviewed the main translation techniques at work, an exhaustive account of possible causes for their non-translation was also provided. By way of conclusion, the discussion of results obtained from the distribution of annotation categories on the French-Italian corpus, as well as the application of specific models of analysis, allowed us to underline possible advantages and drawbacks related to the adoption of a corpus-based approach to AVT studies. Even though possible updates and improvements were proposed in order to help solve some of the problems identified, it is argued that the added value of Forlixt 1 lies ultimately in having created a valuable instrument that allows empirically sound contrastive studies to be carried out and usefully replicated on different language pairs and several types of multimedia texts. Furthermore, multimedia corpora can also play a crucial role in L2 and translation teaching, two disciplines in which their use still lacks systematic investigation.
Abstract:
In this thesis, three measurements of the top-antitop differential cross section at a centre-of-mass energy of 7 TeV are presented, as a function of the transverse momentum, the mass and the rapidity of the top-antitop system. The analysis was carried out on a data sample of about 5 fb$^{-1}$ recorded with the ATLAS detector. The events were selected with a cut-based approach in the "one lepton plus jets" channel, where the lepton can be either an electron or a muon. The most relevant backgrounds (multi-jet QCD and W+jets) were extracted using data-driven methods; the others (Z+jets, diboson and single top) were simulated with Monte Carlo techniques. The final background-subtracted distributions were corrected for detector and selection effects using unfolding methods. Finally, the results were compared with the theoretical predictions. The measurements are dominated by systematic uncertainties and show no relevant deviation from the Standard Model predictions.