4 results for Cross-Lingual Information Retrieval
in ArchiMeD - Elektronische Publikationen der Universität Mainz - Germany
Abstract:
Apart from the article that forms the main content, most HTML documents on the WWW contain additional content such as navigation menus, design elements or commercial banners. Several applications need to distinguish between main and additional content automatically. Content extraction and template detection are the two approaches to this task. This thesis gives an extensive overview of existing algorithms from both areas. It contributes an objective way to measure and evaluate the performance of content extraction algorithms under different aspects. These evaluation measures make it possible to draw the first objective comparison of existing extraction solutions. The newly introduced content code blurring algorithm overcomes several drawbacks of previous approaches and, at the time of writing, proves to be the best content extraction algorithm. The third major contribution of this thesis is an analysis of methods for clustering web documents according to their underlying templates. In combination with a localised crawling process, this clustering analysis can be used to automatically create sets of training documents for template detection algorithms. Because the whole process can be automated, template detection can be performed on a single document, thereby combining the advantages of single-document and multi-document algorithms.
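The abstract names content code blurring but does not describe how it works. The sketch below follows a character-based formulation, and that formulation is an assumption. Every character of the HTML source is coded 1 if it is text and 0 if it belongs to a tag, giving a content code vector. The vector is then repeatedly smoothed with a Gaussian kernel to give a content code ratio. Text characters whose ratio exceeds a threshold are kept as main content. The regular-expression tag detection is deliberately naive. The values of `sigma`, `passes` and `threshold` are illustrative and are not taken from the thesis.

```python
import math
import re

def content_code_vector(html: str) -> list[float]:
    """Code each character 1.0 if it is text content, 0.0 if it lies inside a tag."""
    codes = [1.0] * len(html)
    for match in re.finditer(r"<[^>]*>", html):  # naive tag detection (assumption)
        for i in range(match.start(), match.end()):
            codes[i] = 0.0
    return codes

def gaussian_kernel(sigma: float) -> list[float]:
    """Normalised discrete Gaussian truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(values: list[float], kernel: list[float]) -> list[float]:
    """Convolve values with the kernel, renormalising where it overhangs the borders."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = norm = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(values):
                acc += w * values[j]
                norm += w
        out.append(acc / norm)
    return out

def extract_main_content(html: str, sigma: float = 20.0, passes: int = 2,
                         threshold: float = 0.5) -> str:
    """Keep text characters whose blurred content code ratio exceeds the threshold."""
    codes = content_code_vector(html)
    kernel = gaussian_kernel(sigma)
    ratios = codes
    for _ in range(passes):
        ratios = blur(ratios, kernel)
    kept = [ch for ch, code, ratio in zip(html, codes, ratios)
            if code == 1.0 and ratio > threshold]
    return "".join(kept)
```

Navigation menus consist of short text fragments separated by long tags, so their blurred ratio stays low. A long article paragraph keeps a ratio close to 1. A larger `sigma` smooths over short interruptions such as inline links.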
Abstract:
I present a new experimental method called Total Internal Reflection Fluorescence Cross-Correlation Spectroscopy (TIR-FCCS). The method can probe hydrodynamic flows near solid surfaces on length scales of tens of nanometres. Fluorescent tracers flowing with the liquid are excited by evanescent light, produced by epi-illumination through the periphery of a high-NA oil-immersion objective. Because the evanescent wave decays rapidly, fluorescence occurs only for tracers within ~100 nm of the surface, which gives very high resolution normal to the surface. Two observation volumes are shifted laterally along the flow direction and defined by two confocal pinholes. The time-resolved fluorescence intensity signals from these volumes are measured and recorded independently. The cross-correlation of these signals provides important information about the tracers' motion and thus their flow velocity. Owing to the high sensitivity of the method, fluorescent species of different sizes, down to single dye molecules, can be used as tracers. The aim of my work was to build an experimental setup for TIR-FCCS and to use it to measure the shear rate and slip length of water flowing over hydrophilic and hydrophobic surfaces. Extracting these parameters from the measured correlation curves, however, requires quantitative data analysis. This is not a straightforward task: the problem is so complex that analytical expressions for the correlation functions needed to fit the experimental data cannot be derived. To process and interpret the experimental results, I therefore also describe a new numerical method for analysing the acquired auto- and cross-correlation curves. Brownian Dynamics techniques are used to produce simulated auto- and cross-correlation functions and to fit the corresponding experimental data.
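The cross-correlation step can be illustrated with a minimal sketch. It assumes two equally long, uniformly sampled intensity traces. It also assumes the usual normalised fluctuation correlation G12(τ) = ⟨δI1(t) δI2(t+τ)⟩ / (⟨I1⟩⟨I2⟩). The synthetic traces are an illustrative assumption, not measured data: in them, the downstream volume simply sees a delayed copy of the upstream signal.

```python
import random

def cross_correlation(i1: list[float], i2: list[float], max_lag: int) -> list[float]:
    """Normalised cross-correlation G12(lag) = <dI1(t) dI2(t + lag)> / (<I1> <I2>)."""
    n = len(i1)  # both traces are assumed to have the same length
    mean1 = sum(i1) / n
    mean2 = sum(i2) / n
    d1 = [x - mean1 for x in i1]  # intensity fluctuations in observation volume 1
    d2 = [x - mean2 for x in i2]  # intensity fluctuations in observation volume 2
    g = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        acc = sum(d1[t] * d2[t + lag] for t in range(pairs))
        g.append(acc / pairs / (mean1 * mean2))
    return g

# Synthetic traces (illustrative only): the downstream volume 2 sees the same
# fluctuations as volume 1, but 25 samples later.
random.seed(1)
delay, n = 25, 5000
source = [10.0 + random.gauss(0.0, 1.0) for _ in range(n + delay)]
i1 = source[delay:]  # i1[t] = source[t + delay]
i2 = source[:n]      # i2[t + delay] = i1[t]
g12 = cross_correlation(i1, i2, max_lag=60)
peak_lag = max(range(len(g12)), key=g12.__getitem__)
```

If the lateral separation of the two observation volumes is known, the peak lag times the sampling interval gives a rough transit time and hence a velocity. Real tracers also diffuse and sample a velocity gradient inside the evanescent field, which broadens the curves. A simple peak reading is therefore not a full analysis.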
I show how to combine detailed and fairly realistic theoretical modelling of the phenomena with accurate measurements of the correlation functions. Together they establish a fully quantitative method for retrieving the flow properties from the experiments. An importance-sampling Monte Carlo procedure is employed to fit the experiments. This yields the optimum parameter values together with their statistical error bars. The approach is well suited both to modern desktop PCs and to massively parallel computers; the latter allow the data analysis to be completed within short computing times. I applied this method to study the flow of an aqueous electrolyte solution near smooth hydrophilic and hydrophobic surfaces. Slip is generally not expected on hydrophilic surfaces, whereas some slippage may exist on hydrophobic ones. Our results cover both hydrophilic and moderately hydrophobic (contact angle ~85°) surfaces. On both, the slip length is ~10-15 nm or lower and, within the limitations of the experiments and the model, indistinguishable from zero.
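The Brownian Dynamics modelling mentioned above can be sketched for a single tracer. This minimal sketch rests on simplifying assumptions that are not taken from the thesis:

- a point-like tracer with constant diffusion coefficient `diffusion`;
- a linear shear flow with Navier slip, v_x(z) = γ̇ (z + b), where γ̇ is `shear_rate` and b is `slip_length`;
- a reflecting wall at z = 0 and no motion in y;
- a plain Euler-Maruyama time step.

Hindered diffusion near the wall, finite tracer size and the detection profiles are left out.

```python
import math
import random

def shear_velocity(z: float, shear_rate: float, slip_length: float) -> float:
    """Linear shear profile with Navier slip: v_x(z) = shear_rate * (z + slip_length)."""
    return shear_rate * (z + slip_length)

def simulate_tracer(x0: float, z0: float, steps: int, dt: float, diffusion: float,
                    shear_rate: float, slip_length: float,
                    rng: random.Random) -> list[tuple[float, float]]:
    """Euler-Maruyama trajectory of one point tracer; returns the (x, z) positions.

    x is the flow direction and z the distance from the wall; motion in y is ignored.
    """
    step_std = math.sqrt(2.0 * diffusion * dt)  # std of one Brownian step per axis
    x, z = x0, z0
    path = [(x, z)]
    for _ in range(steps):
        # Advection by the local flow velocity plus a random Brownian displacement.
        x += shear_velocity(z, shear_rate, slip_length) * dt + step_std * rng.gauss(0.0, 1.0)
        z += step_std * rng.gauss(0.0, 1.0)
        if z < 0.0:
            z = -z  # reflect the tracer at the wall z = 0
        path.append((x, z))
    return path
```

Many such trajectories, weighted by the detection profile of each observation volume, could be turned into simulated intensity traces and correlation curves. In a fit of this kind, `shear_rate` and `slip_length` would be the parameters varied by the Monte Carlo procedure.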
Abstract:
The dominant process in hard proton-proton collisions is the production of hadronic jets. These sprays of particles are produced by colored partons that are struck out of their confinement within the proton. Previous measurements of inclusive jet cross sections have provided valuable information for the determination of parton density functions and allow stringent tests of perturbative QCD at the highest accessible energies.
This thesis presents a measurement of inclusive jet cross sections in proton-proton collisions at a center-of-mass energy of 7 TeV, using the ATLAS detector at the LHC. Jets are identified with the anti-kt algorithm using jet radii of R = 0.6 and R = 0.4. They are calibrated with a dedicated pT- and η-dependent jet calibration scheme. The cross sections are measured for 40 GeV < pT ≤ 1 TeV and |y| < 2.8 in four bins of absolute rapidity. The measurement uses data recorded in 2010, corresponding to an integrated luminosity of 3 pb^-1. The data are fully corrected for detector effects and compared to theoretical predictions calculated at next-to-leading order, including non-perturbative effects. The theoretical predictions agree with the data within the experimental and theoretical uncertainties.
The ratio of the cross sections for R = 0.4 and R = 0.6 is measured, exploiting the significant correlations of the systematic uncertainties, and is compared to recently developed theoretical predictions. The underlying event can be characterized by the amount of transverse momentum per unit rapidity and azimuth, called ρ_UE. Using analytical approaches to the calculation of non-perturbative corrections to jets, ρ_UE at the LHC is estimated from the ratio measurement. A feasibility study is presented for a combined measurement of ρ_UE and the average strong coupling in the non-perturbative regime, α_0. Proposals for future jet measurements at the LHC are also made.
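The anti-kt algorithm used for jet finding can be illustrated with a naive sketch. It uses the standard anti-kt distances d_ij = min(pT_i^-2, pT_j^-2) ΔR_ij^2 / R^2 and d_iB = pT_i^-2, with ΔR_ij^2 = Δy^2 + Δφ^2. Objects are merged by adding four-momenta (E-scheme). Input particles are assumed to be massless and given as (pT, y, φ). This O(N^3) version is for illustration only. Production analyses typically rely on an optimised library such as FastJet, and no minimum jet pT is applied here.

```python
import math

def to_four_vector(pt, y, phi):
    """Massless particle (pT, rapidity y, azimuth phi) -> (px, py, pz, E)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def kinematics(p):
    """(px, py, pz, E) -> (pT, rapidity y, azimuth phi)."""
    px, py, pz, e = p
    return math.hypot(px, py), 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)

def delta_r2(a, b):
    """Squared distance Delta y^2 + Delta phi^2, with Delta phi wrapped into [0, pi]."""
    _, y1, phi1 = kinematics(a)
    _, y2, phi2 = kinematics(b)
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (y1 - y2) ** 2 + dphi ** 2

def anti_kt(particles, radius):
    """Naive anti-kt clustering; returns jets as (pT, y, phi) in the order they are found."""
    objs = [to_four_vector(*p) for p in particles]
    jets = []
    while objs:
        inv_pt2 = [kinematics(p)[0] ** -2 for p in objs]  # 1 / pT^2 for every object
        best_d, best_i, best_j = math.inf, -1, None      # best_j None means "beam distance"
        for i in range(len(objs)):
            if inv_pt2[i] < best_d:                      # d_iB
                best_d, best_i, best_j = inv_pt2[i], i, None
            for j in range(i + 1, len(objs)):            # d_ij
                d_ij = min(inv_pt2[i], inv_pt2[j]) * delta_r2(objs[i], objs[j]) / radius ** 2
                if d_ij < best_d:
                    best_d, best_i, best_j = d_ij, i, j
        if best_j is None:
            jets.append(kinematics(objs.pop(best_i)))    # closest to the beam: final jet
        else:
            merged = tuple(u + v for u, v in zip(objs[best_i], objs[best_j]))  # E-scheme
            objs.pop(best_j)  # best_j > best_i, so index best_i stays valid
            objs[best_i] = merged
    return jets
```

Increasing `radius` from 0.4 to 0.6 merges particles over a larger area in (y, φ). This is the jet-radius dependence probed by the cross-section ratio described above.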
Abstract:
Human molecular biology is a highly complex and diverse field in which research is carried out in many areas. The focus is in particular on genomics, proteomics, transcriptomics and metabolomics, and years of research have accumulated large amounts of valuable data. This collection is growing steadily, and no stagnation is foreseeable. Meanwhile, however, this constant flood of information has buried valuable knowledge in unmanageable digital mountains of data. Gathering research-specific and reliable information has become a major challenge as a result. The work presented in this dissertation has generated a comprehensive compendium of human tissues for biomedical analyses. It is called medicalgenomics.org and has solved various biomedical problems that arise when searching numerous databases for specific knowledge. The compendium is the first of its kind. The knowledge it provides will help scientists gain a better systematic overview of specific genes or functional profiles with respect to regulation and to pathological and physiological conditions. Furthermore, various query methods enable efficient analysis of signalling events and metabolic pathways, as well as the study of genes at the expression level. The full range of these query options allows scientists to create highly specialised genetic roadmaps with which future experiments can be planned more precisely. As a result, valuable resources and time can be saved while the chances of success increase. In addition, the comprehensive knowledge in the compendium can be used to generate and test biomedical hypotheses.