966 results for PDF, estrazione, Linked Open Data, dataset RDF
Abstract:
Work carried out to build a citation network starting from scientific articles encoded in JATS XML. An introduction is given to semantic publishing, the reference ontologies, and the main datasets on scientific publications. Finally, the prototype CiNeX is presented, which extracts an RDF graph from a JATS XML dataset using the SPAR ontology.
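As a sketch of the kind of extraction CiNeX performs (not its actual code), the following parses a minimal JATS fragment with the standard library and emits `cito:cites` triples from the SPAR suite. The DOIs and the exact document shape are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal JATS fragment: one article citing two others by DOI.
JATS = """<article>
  <front><article-meta>
    <article-id pub-id-type="doi">10.1000/a</article-id>
  </article-meta></front>
  <back><ref-list>
    <ref><element-citation>
      <pub-id pub-id-type="doi">10.1000/b</pub-id>
    </element-citation></ref>
    <ref><element-citation>
      <pub-id pub-id-type="doi">10.1000/c</pub-id>
    </element-citation></ref>
  </ref-list></back>
</article>"""

CITO_CITES = "<http://purl.org/spar/cito/cites>"

def jats_to_ntriples(xml_text):
    """Extract citing/cited DOI pairs from a JATS document and
    serialize them as N-Triples using the CiTO 'cites' property."""
    root = ET.fromstring(xml_text)
    citing = root.findtext(".//article-meta/article-id[@pub-id-type='doi']")
    triples = []
    for ref in root.iterfind(".//ref-list/ref"):
        cited = ref.findtext(".//pub-id[@pub-id-type='doi']")
        if cited:
            triples.append(
                f"<http://dx.doi.org/{citing}> {CITO_CITES} "
                f"<http://dx.doi.org/{cited}> ."
            )
    return triples

for t in jats_to_ntriples(JATS):
    print(t)
```

A real pipeline would also extract titles, authors, and venues, and use the other SPAR ontologies (FaBiO, BiRO) for bibliographic records.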
Abstract:
The thesis describes a system called GARTP, which visualises the analysis of earliness and delay in public transport on a cartographic map.
Abstract:
The ability to extract entities from texts, link them to one another, and resolve possible ambiguities among them is one of the goals of the Semantic Web. Also called Web 3.0, it introduces numerous innovations aimed at enriching the Web with structured data understandable by both humans and machines. In retrieving these terms and defining the entities, their uniqueness is of fundamental importance. Our working domain is that of Italian universities, and the entities we want to extract, link, and make unambiguous are the names of Italian professors. The starting set of information is, by its nature, affected by ambiguity. Staying as close as possible to its semantics, we studied these data and resolved the collisions occurring among professors' names. Arald, our software architecture for the Semantic Web, extracts entities and links them, but above all resolves ambiguities and homonymies among professors of Italian universities. To do so, it relies on the semantics of their academic works and on the co-author network inferable from the articles they have published, represented through a data cluster.
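The co-author-based disambiguation described above can be illustrated with a toy sketch (this is not Arald's actual implementation): publication records bearing the same author name are merged into one identity when their co-author sets overlap. All names below are invented:

```python
from itertools import combinations

# Hypothetical records: the same name "M. Rossi" appears on three papers.
records = [
    {"id": 1, "author": "M. Rossi", "coauthors": {"A. Bianchi", "L. Verdi"}},
    {"id": 2, "author": "M. Rossi", "coauthors": {"L. Verdi", "P. Neri"}},
    {"id": 3, "author": "M. Rossi", "coauthors": {"K. Tanaka"}},
]

def disambiguate(records):
    """Union-find style clustering: records whose coauthor sets
    intersect are assumed to belong to the same person."""
    parent = {r["id"]: r["id"] for r in records}
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for a, b in combinations(records, 2):
        if a["coauthors"] & b["coauthors"]:
            parent[find(a["id"])] = find(b["id"])
    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())

# Record 3 shares no coauthors with the others -> treated as a different person.
print(disambiguate(records))
```

A production system would weight the evidence (shared venues, topics of the papers) rather than rely on exact co-author overlap alone.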
Abstract:
The aim of the Bird-A project is to provide an ontology-based tool for designing a collaborative web interface for the creation, viewing, editing and deletion of RDF data, and to deliver a first working implementation of it. The vision driving the Semantic Web community in recent years is that of a Web built on interlinked structured data rather than on documents. This architectural model is called Linked Data and is based on the possibility of treating things, concepts and people as resources identifiable by URIs, and of providing information about these resources and describing links between them using standard formats such as RDF. What has held back the spread of such structured, interconnected data, however, is the high level of technical expertise required both to create and to consume it. The Bird-A project aims to simplify the creation and consumption of RDF data, fostering its sharing and diffusion even among people without specific technical knowledge.
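A minimal sketch of the create/read/update/delete operations over RDF triples that such an interface exposes (this is not the Bird-A API; the prefixed names such as `ex:alice` are illustrative):

```python
class TripleStore:
    """Toy in-memory triple store exposing CRUD over (s, p, o) triples."""
    def __init__(self):
        self.triples = set()

    def create(self, s, p, o):
        self.triples.add((s, p, o))

    def read(self, s=None, p=None, o=None):
        """Pattern match; None acts as a wildcard, as in a SPARQL variable."""
        return {t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}

    def update(self, s, p, old_o, new_o):
        self.triples.discard((s, p, old_o))
        self.triples.add((s, p, new_o))

    def delete(self, s=None, p=None, o=None):
        self.triples -= self.read(s, p, o)

store = TripleStore()
store.create("ex:alice", "foaf:name", "Alice")
store.create("ex:alice", "foaf:knows", "ex:bob")
store.update("ex:alice", "foaf:name", "Alice", "Alice W.")
print(store.read(s="ex:alice", p="foaf:name"))
```

The point of Bird-A is precisely to hide this triple-level machinery behind ontology-driven web forms, so that non-technical users never manipulate triples directly.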
Abstract:
Riparian zones are dynamic, transitional ecosystems between aquatic and terrestrial ecosystems with well-defined vegetation and soil characteristics. Because of their high variability, developing an all-encompassing definition for riparian ecotones is challenging. However, there are two primary factors on which all riparian ecotones depend: the watercourse and its associated floodplain. Previous approaches to riparian boundary delineation have utilized fixed-width buffers, but this methodology has proven inadequate, as it only takes the watercourse into consideration and ignores critical geomorphology and the associated vegetation and soil characteristics. Our approach offers advantages over previously used methods by utilizing: the geospatial modeling capabilities of ArcMap GIS; a better sampling technique along the watercourse that can distinguish the 50-year floodplain, which is the optimal hydrologic descriptor of riparian ecotones; the Soil Survey Geographic (SSURGO) and National Wetland Inventory (NWI) databases to distinguish contiguous areas beyond the 50-year floodplain; and land use/cover characteristics associated with the delineated riparian zones. The model utilizes spatial data readily available from Federal and State agencies and geospatial clearinghouses. An accuracy assessment was performed to evaluate the impact of varying the 50-year flood height, changing the DEM spatial resolution (1, 3, 5 and 10 m), and positional inaccuracies in the National Hydrography Dataset (NHD) streams layer on the boundary placement of the delineated variable-width riparian ecotones. The result of this study is a robust, automated GIS-based model attached to ESRI ArcMap software that delineates and classifies variable-width riparian ecotones.
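The core delineation step, marking cells that lie within the 50-year flood height of the watercourse, can be sketched outside ArcMap as a toy threshold over a DEM grid. The elevations and flood height below are invented, and the real model additionally incorporates the SSURGO, NWI and land-cover layers:

```python
# Toy DEM (elevations in metres); the stream is assumed to run
# along column 0 of each row.
dem = [
    [10.0, 10.4, 11.2, 13.0],
    [10.1, 10.3, 10.9, 14.5],
    [10.2, 10.8, 12.5, 15.0],
]

FLOOD_HEIGHT_50YR = 1.0  # assumed 50-year flood rise above stream level (m)

def delineate(dem, flood_height):
    """Flag cells whose elevation is within flood_height of the
    stream elevation in the same row (stream in column 0)."""
    zone = []
    for row in dem:
        stream_elev = row[0]
        zone.append([elev - stream_elev <= flood_height for elev in row])
    return zone

for row in delineate(dem, FLOOD_HEIGHT_50YR):
    print(row)
```

In the actual model the flood surface follows the stream network in two dimensions and the sensitivity analysis varies both the flood height and the DEM resolution, as described above.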
Abstract:
This paper presents an overview of the Mobile Data Challenge (MDC), a large-scale research initiative aimed at generating innovations around smartphone-based research, as well as community-based evaluation of mobile data analysis methodologies. First, we review the Lausanne Data Collection Campaign (LDCC), an initiative to collect a unique longitudinal smartphone dataset for the MDC. Then, we introduce the Open and Dedicated Tracks of the MDC, describe the specific datasets used in each of them, discuss the key design and implementation aspects introduced in order to generate privacy-preserving and scientifically relevant mobile data resources for wider use by the research community, and summarize the main research trends found among the 100+ challenge submissions. We conclude by discussing the main lessons learned from the participation of several hundred researchers worldwide in the MDC Tracks.
Abstract:
Purpose: Traditional patient-specific IMRT QA measurements are labor intensive and consume machine time. Calculation-based IMRT QA methods typically are not comprehensive. We have developed a comprehensive calculation-based IMRT QA method to detect uncertainties introduced by the initial dose calculation, the data transfer through the Record-and-Verify (R&V) system, and various aspects of the physical delivery. Methods: We recomputed the treatment plans in the patient geometry for 48 cases using data from the R&V, and from the delivery unit to calculate the “as-transferred” and “as-delivered” doses respectively. These data were sent to the original TPS to verify transfer and delivery or to a second TPS to verify the original calculation. For each dataset we examined the dose computed from the R&V record (RV) and from the delivery records (Tx), and the dose computed with a second verification TPS (vTPS). Each verification dose was compared to the clinical dose distribution using 3D gamma analysis and by comparison of mean dose and ROI-specific dose levels to target volumes. Plans were also compared to IMRT QA absolute and relative dose measurements. Results: The average 3D gamma passing percentages using 3%-3mm, 2%-2mm, and 1%-1mm criteria for the RV plan were 100.0 (σ=0.0), 100.0 (σ=0.0), and 100.0 (σ=0.1); for the Tx plan they were 100.0 (σ=0.0), 100.0 (σ=0.0), and 99.0 (σ=1.4); and for the vTPS plan they were 99.3 (σ=0.6), 97.2 (σ=1.5), and 79.0 (σ=8.6). When comparing target volume doses in the RV, Tx, and vTPS plans to the clinical plans, the average ratios of ROI mean doses were 0.999 (σ=0.001), 1.001 (σ=0.002), and 0.990 (σ=0.009) and ROI-specific dose levels were 0.999 (σ=0.001), 1.001 (σ=0.002), and 0.980 (σ=0.043), respectively. 
Comparing the clinical, RV, Tx, and vTPS calculated doses to the IMRT QA measurements for all 48 patients, the average ratios for absolute doses were 0.999 (σ=0.013), 0.998 (σ=0.013), 0.999 (σ=0.015), and 0.990 (σ=0.012), respectively, and the average 2D gamma (5%-3mm) passing percentages for relative doses for 9 patients were 99.36 (σ=0.68), 99.50 (σ=0.49), 99.13 (σ=0.84), and 98.76 (σ=1.66), respectively. Conclusions: Together with mechanical and dosimetric QA, our calculation-based IMRT QA method promises to minimize the need for patient-specific QA measurements by identifying outliers in need of further review.
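The gamma analysis used above combines a dose-difference criterion and a distance-to-agreement criterion into a single pass/fail index per point. A one-dimensional sketch (toy dose profiles, not clinical data, and a brute-force search rather than an optimized implementation) is:

```python
import math

def gamma_pass_rate(ref, eval_, spacing, dose_tol, dist_tol):
    """1-D gamma analysis: for each evaluated point, take the minimum
    gamma over all reference points; the point passes if gamma <= 1."""
    passed = 0
    for i, d_eval in enumerate(eval_):
        gammas = []
        for j, d_ref in enumerate(ref):
            dist = (i - j) * spacing          # spatial separation (mm)
            dose_diff = d_eval - d_ref        # dose difference
            gammas.append(math.sqrt((dist / dist_tol) ** 2
                                    + (dose_diff / dose_tol) ** 2))
        if min(gammas) <= 1.0:
            passed += 1
    return 100.0 * passed / len(eval_)

# Toy relative-dose profiles sampled at 1 mm spacing.
ref   = [1.00, 1.02, 1.05, 1.02, 1.00]
eval_ = [1.00, 1.03, 1.05, 1.01, 1.00]
# 3% dose tolerance (of unit dose), 3 mm distance-to-agreement.
print(gamma_pass_rate(ref, eval_, spacing=1.0, dose_tol=0.03, dist_tol=3.0))
```

The 3D analyses reported above apply the same index over volumetric dose grids, with criteria such as 3%-3mm meaning dose_tol = 3% and dist_tol = 3 mm.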
Abstract:
An underwater georeferenced photo-transect survey was conducted on September 23-27, 2007 at different sections of the reef flat, reef crest and reef slope of Heron Reef. For this survey a snorkeler or diver swam over the bottom while taking photos of the benthos at a set height using a standard digital camera and towing a surface-float GPS which logged its track every five seconds. A standard digital compact camera was placed in an underwater housing and fitted with a 16 mm lens, which provided a 1.0 m x 1.0 m footprint at 0.5 m height above the benthos. Horizontal distance between photos was estimated by three fin kicks of the survey diver/snorkeler, which corresponded to a surface distance of approximately 2.0-4.0 m. The GPS was placed in a dry-bag and logged its position as it floated at the surface while being towed by the photographer. A total of 3,586 benthic photos were taken. The floating GPS setup, connected to the swimmer/diver by a line, enabled recording of the coordinates of each benthic photo. The coordinates of each photo were approximated from the photo timestamp and the GPS coordinate timestamps using GPS Photo Link Software (www.geospatialexperts.com): the coordinates of each photo were interpolated from the GPS coordinates logged at a set time before and after the photo was captured. Benthic or substrate cover data was derived from each photo by randomly placing 24 points over each image using the Coral Point Count with Excel extensions (CPCe) program (Kohler and Gill, 2006). Each point was then assigned to one of 80 cover types, representing the benthic feature beneath it. A benthic cover composition summary for each photo was generated automatically by the CPCe program. The resulting benthic cover data for each photo was linked to its GPS coordinates, saved as an ArcMap point shapefile, and projected to Universal Transverse Mercator WGS84 Zone 56 South.
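The timestamp-based interpolation of photo coordinates can be sketched as follows (toy track values; the actual processing used GPS Photo Link Software, not this code):

```python
def interpolate_position(fixes, t_photo):
    """Linearly interpolate (lat, lon) between the GPS fixes logged
    just before and just after the photo timestamp."""
    fixes = sorted(fixes)
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        if t0 <= t_photo <= t1:
            f = (t_photo - t0) / (t1 - t0)
            return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))
    raise ValueError("photo timestamp outside the GPS track")

# GPS track logged every 5 seconds: (time_s, lat, lon) -- invented values
# loosely in the Heron Reef area.
track = [
    (0,  -23.4420, 151.9100),
    (5,  -23.4421, 151.9102),
    (10, -23.4423, 151.9105),
]

# A photo taken at t = 7.5 s falls midway between the second and third fix.
print(interpolate_position(track, 7.5))
```

Over 5-second fix intervals and swim speeds of a metre or two per second, linear interpolation introduces positional error well below the GPS unit's own accuracy.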
Abstract:
Fluctuations in oxygen (δ18O) and carbon (δ13C) isotope values of benthic foraminiferal calcite from the tropical Pacific and Southern Oceans indicate rapid reversals in the dominant mode and direction of the thermohaline circulation during a 1 m.y. interval (71-70 Ma) in the Maastrichtian. At the onset of this change, benthic foraminiferal δ18O values increased and were highest in low-latitude Pacific Ocean waters, whereas benthic and planktic foraminiferal δ13C values decreased and benthic values were lowest in the Southern Ocean. Subsequently, benthic foraminiferal δ18O values in the Indo-Pacific decreased, and benthic and planktic δ13C values increased globally. These isotopic patterns suggest that cool intermediate-depth waters, derived from high-latitude regions, penetrated temporarily to the tropics. The low benthic δ13C values at the Southern Ocean sites, however, suggest that these cool waters may have been derived from high northern rather than high southern latitudes. Correlation with eustatic sea-level curves suggests that sea-level change was the most likely mechanism to change the circulation and/or source(s) of intermediate-depth waters. We thus propose that oceanic circulation during the latest Cretaceous was vigorous and that competing sources of intermediate- and deep-water formation, linked to changes in climate and sea level, may have alternated in importance.
Abstract:
Monitoring the impact of sea storms on coastal areas is fundamental to studying beach evolution and the vulnerability of low-lying coasts to erosion and flooding. Modelling wave runup on a beach is possible, but it requires accurate topographic data and model tuning, which can be done by comparing observed and modelled runup. In this study we collected aerial photos using an Unmanned Aerial Vehicle (UAV) after two different swells at the same study area. We merged the point cloud obtained with photogrammetry with multibeam data in order to obtain a complete beach topography. Then, on each set of rectified and georeferenced UAV orthophotos, we identified the maximum wave runup of each event by recognizing the wet area left by the waves. We then used our topography and numerical models to simulate the wave runup and compared the model results to the values observed during the two events. Our results highlight the potential of the presented methodology, which integrates UAV platforms, photogrammetry and Geographic Information Systems to provide faster and cheaper information on beach topography and geomorphology than traditional techniques, without losing accuracy. We used the results obtained from this technique as a topographic base for a model that calculates runup for the two swells. The observed and modelled runups are consistent, opening new directions for future research.
Abstract:
There is generally a lack of knowledge on how marine organic carbon accumulation is linked to vertical export and primary productivity patterns. In this study, a multi-proxy geochemical and organic-sedimentological approach is coupled with organic facies modelling focusing on regional calculations of carbon cycling and carbon burial on the western Barents Shelf between northern Scandinavia and Svalbard. OF-Mod 3D, an organic facies modelling software tool, is used to reconstruct the marine and terrestrial organic carbon fractions and to make inferences about marine primary productivity in this region. The model is calibrated with an extensive sample dataset and reproduces the present-day regional distribution of the organic carbon fractions well. Based on this new organic facies model, we present regional carbon mass accumulation rate calculations for the western Barents Sea. The calibration dataset includes location and water depth, sand fraction, organic carbon and nitrogen data and calculated marine and terrestrial organic carbon fractions.
Abstract:
Publishing Linked Data is a process that involves several design decisions and technologies. Although some initial guidelines have already been provided by Linked Data publishers, these are still far from covering all the steps that are necessary (from data source selection to publication) or giving enough details about all these steps, technologies, intermediate products, etc. Furthermore, given the variety of data sources from which Linked Data can be generated, we believe that it is possible to have a single and unified method for publishing Linked Data, but we should rely on different techniques, technologies and tools for particular datasets of a given domain. In this paper we present a general method for publishing Linked Data and the application of the method to cover different sources from different domains.
Abstract:
The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies for describing different domains. Particular LD development characteristics, such as agility and web-based architecture, necessitate the revision, adaptation, and lightening of existing methodologies for ontology development. This thesis proposes a lightweight method for ontology development in an LD context, based on data-driven agile development, the reuse of existing resources, and the evaluation of the obtained products considering both classical ontological engineering principles and LD characteristics.
Abstract:
Due to recent scientific and technological advances in information systems, it is now possible to perform almost every application on a mobile device. The need to make such devices more intelligent opens an opportunity to design data mining algorithms that are able to execute autonomously on local devices to provide the device with knowledge. The problem behind autonomous mining is the proper configuration of the algorithm to produce the most appropriate results. Contextual information, together with resource information about the device, has a strong impact both on the feasibility of a particular execution and on the production of the proper patterns. On the other hand, the performance of an algorithm, expressed in terms of efficacy and efficiency, highly depends on the features of the dataset to be analyzed together with the parameter values of a particular implementation of the algorithm. However, few existing approaches deal with autonomous configuration of data mining algorithms, and in any case they do not deal with contextual or resource information. Both issues are of particular significance for social network applications. In fact, the widespread use of social networks, and consequently the amount of information shared, have made modeling context in social applications a priority. Resource consumption also plays a crucial role on such platforms, as users access social networks mainly from their mobile devices. This PhD thesis addresses the aforementioned open issues, focusing on i) analyzing the behavior of algorithms, ii) mapping contextual and resource information to find the most appropriate configuration, and iii) applying the model to the case of a social recommender. Four main contributions are presented:
- The EE-Model: predicts the behavior of a data mining algorithm in terms of resources consumed and the accuracy of the mining model it will obtain.
- The SC-Mapper: maps a situation, defined by the context and resource state, to a data mining configuration.
- SOMAR: a social activity (events and informal ongoings) recommender for mobile devices.
- D-SOMAR: an evolution of SOMAR which incorporates the configurator in order to provide updated recommendations.
Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.
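The SC-Mapper idea of mapping a situation to a configuration can be sketched as a simple lookup; every context, resource state, and configuration below is an invented placeholder, and the thesis's actual mapper is learned from the EE-Model rather than hand-written:

```python
# Hypothetical situation-to-configuration table: a (context, resource state)
# pair selects the mining configuration predicted to be feasible and accurate.
CONFIG_TABLE = {
    ("commuting", "low_battery"): {"algorithm": "kmeans", "k": 3,  "sample": 0.1},
    ("commuting", "charging"):    {"algorithm": "kmeans", "k": 10, "sample": 1.0},
    ("at_home",   "low_battery"): {"algorithm": "dbscan", "eps": 0.5, "sample": 0.2},
}
DEFAULT = {"algorithm": "kmeans", "k": 5, "sample": 0.5}

def map_situation(context, resource_state):
    """Return the configuration for the given situation,
    falling back to a default for unseen situations."""
    return CONFIG_TABLE.get((context, resource_state), DEFAULT)

print(map_situation("commuting", "low_battery"))
```

On low battery the mapper trades accuracy for efficiency (fewer clusters, heavier sampling), which is exactly the kind of trade-off the EE-Model is built to predict.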
Abstract:
Idea Management Systems are an implementation of the open innovation notion in the Web environment using crowdsourcing techniques. In this area, one of the popular methods for coping with large amounts of data is duplicate detection. With our research, we ask whether there is room to introduce more relationship types and to what degree this change would affect the amount of idea metadata and its diversity. Furthermore, based on hierarchical dependencies between idea relationships and relationship transitivity, we propose a number of methods for dataset summarization. To evaluate our hypotheses we annotate idea datasets with new relationships, using the contemporary methods of Idea Management Systems to detect idea similarity. Having datasets with relationship annotations at our disposal, we determine whether idea features not related to the idea topic (e.g. innovation size) have any relation to how annotators perceive types of idea similarity or dissimilarity.
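One way to exploit relationship transitivity for dataset summarization, sketched here with invented idea identifiers (not the paper's actual method), is to collapse ideas connected by the "duplicate of" relation into one summary entry per connected component:

```python
from collections import defaultdict, deque

def summarize(ideas, duplicate_pairs):
    """Group ideas by the transitive closure of the 'duplicate of'
    relation: each connected component becomes one summary group."""
    adj = defaultdict(set)
    for a, b in duplicate_pairs:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for idea in ideas:
        if idea in seen:
            continue
        # Breadth-first search collects the whole component.
        queue, comp = deque([idea]), []
        seen.add(idea)
        while queue:
            cur = queue.popleft()
            comp.append(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(comp))
    return groups

ideas = ["idea1", "idea2", "idea3", "idea4"]
pairs = [("idea1", "idea2"), ("idea2", "idea3")]  # idea4 stands alone
print(summarize(ideas, pairs))
```

With richer relationship types (e.g. "generalizes", "refines"), the same closure idea applies per type, but the hierarchy between types then dictates which edges may be merged.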