872 results for heterogeneous data sources
Abstract:
The present paper advocates the creation of a federated, hybrid database in the cloud that integrates law data from all available public sources in a single open-access system and, in the process, adds relevant metadata to the indexed documents, including the identification of social and semantic entities and the relationships between them, using linked open data techniques and standards such as RDF. Examples of potential benefits and applications of this approach are also provided, including, among others, experiences from our previous research, in which data integration, graph databases and social and semantic network analysis were used to identify power relations, litigation dynamics and cross-reference patterns both intra- and inter-institutionally, covering most of the world's international economic courts.
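As an illustration of the linked-data idea in the abstract above, the following minimal sketch stores documents, entities and their relationships as subject-predicate-object triples and serialises them as N-Triples. The namespace, the identifiers and the predicate names (`decidedBy`, `cites`, `title`) are hypothetical and are not taken from the paper.

```python
# Hypothetical namespace and identifiers, invented for illustration only.
EX = "http://example.org/law/"


def iri(local_name: str) -> str:
    """Wrap a local name in the assumed example namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(text: str) -> str:
    """Encode a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each triple is (subject, predicate, object).
triples = [
    (iri("decision/123"), iri("decidedBy"), iri("court/A")),
    (iri("decision/123"), iri("cites"), iri("decision/045")),
    (iri("decision/123"), iri("title"), literal('Award in "Case 123"')),
]


def to_ntriples(rows) -> str:
    """Serialise triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


def cited_by(rows, target: str) -> list:
    """Return the subjects that cite the given document (a cross-reference query)."""
    return sorted(s for s, p, o in rows if p == iri("cites") and o == target)


print(to_ntriples(triples))
```

A real system would more likely rely on an RDF library and a triple store; the point here is only that cross-reference patterns become simple queries once relationships are stored as explicit triples.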
Abstract:
There are a number of morphological analysers for Polish. Most of these, however, are non-free resources. What is more, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simple and universal framework to join different sources of morphological information, including the existing resources as well as user-provided dictionaries. We present such a configurable framework, in which users write simple configuration files that define tokenisation strategies and the behaviour of morphological analysers, including simple tagset conversion.
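The configuration-driven tagset conversion described above might look roughly like the sketch below. The analyser names, the tags and the dictionary-based configuration format are all invented for illustration; the framework's actual configuration syntax is not given in the abstract.

```python
# Hypothetical configuration: every analyser name and tag below is invented.
CONFIG = {
    "analyser_a": {"subst": "NOUN", "adj": "ADJ", "fin": "VERB"},
    "analyser_b": {"N": "NOUN", "A": "ADJ", "V": "VERB"},
}


def convert(analyser, analyses, config=CONFIG, unknown="X"):
    """Map (lemma, source_tag) pairs from one analyser onto the common tagset."""
    mapping = config[analyser]
    return [(lemma, mapping.get(tag, unknown)) for lemma, tag in analyses]


def merge(*converted):
    """Join analyses from several sources, dropping duplicates but keeping order."""
    seen = []
    for analyses in converted:
        for item in analyses:
            if item not in seen:
                seen.append(item)
    return seen


combined = merge(
    convert("analyser_a", [("kot", "subst")]),
    convert("analyser_b", [("kot", "N"), ("kot", "Q")]),
)
print(combined)
```

Tags missing from a mapping fall back to a placeholder rather than raising an error, which is one possible design choice when user-provided dictionaries may use unexpected labels.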
Abstract:
Gold in the quartz-pebble conglomerates of the late Archean Witwatersrand Basin, South Africa, is often intimately associated with carbonaceous matter of organic/biogenic origin, which occurs in the form of stratiform carbon seams and paragenetically late bitumen nodules. Both carbon forms are believed to have formed by solidification of migrating hydrocarbons. This paper presents bulk and molecular chemical and stable carbon isotope data for the carbonaceous matter, all of which are used to provide a clue to the source of the hydrocarbons. These data are compared with those from intra-basinal shales and overlying dolostone of the Transvaal Supergroup. The δ13C values of the extracts from the Witwatersrand carbonaceous material show small differences (up to 2.4‰) compared to the associated insoluble organic matter. This suggests that the auriferous rocks were stained by mobile hydrocarbons produced by thermal and oxidative alteration of indigenous bitumens; a contribution from hydrocarbons derived from intra-basinal Witwatersrand shales cannot be excluded. Individual aliphatic hydrocarbons of the various carbonaceous materials were subjected to compound-specific isotope analysis using on-line gas chromatography/combustion/stable isotope ratio mass spectrometry (GC/C/IRMS). The limited variability of the molecular parameters and uniform δ13C values of individual n-alkanes (-31.1 ± 1.7‰) and isoprenoids (-30.7 ± 1.1‰) in the Witwatersrand samples exclude the mixing of oils from different sources. Carbonaceous matter in the dolostones shows distinctly different bulk and molecular isotope characteristics and thus cannot have been the source of the hydrocarbons in the Witwatersrand deposits.
All the various forms of Witwatersrand carbon appear indigenous to the Witwatersrand Basin, and the differences between them are explained by variable, in general probably short (centimeter- to meter-scale) hydrocarbon migration during diagenesis and subsequent hydrothermal infiltration. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
Forecasting coal resources and reserves is critical for coal mine development. Thickness maps are commonly used for assessing coal resources and reserves; however, they are limited in their ability to capture coal splitting effects in thick and heterogeneous coal zones. As an alternative, three-dimensional geostatistical methods are used to populate the facies distribution within a densely drilled heterogeneous coal zone in the As Pontes Basin (NW Spain). Coal distribution in this zone is mainly characterized by coal-dominated areas in the central parts of the basin interfingering with terrigenous-dominated alluvial fan zones at the margins. The three-dimensional models obtained are applied to forecast coal resources and reserves. Predictions using subsets of the entire dataset are also generated to understand the performance of the methods under limited data constraints. Three-dimensional facies interpolation methods tend to overestimate coal resources and reserves due to interpolation smoothing. Facies simulation methods yield resource predictions similar to those of conventional thickness map approximations. Reserves predicted by facies simulation methods are mainly influenced by: a) the specific coal proportion threshold used to determine whether a block can be recovered or not, and b) the capability of the modelling strategy to reproduce areal trends in coal proportions and splitting between coal-dominated and terrigenous-dominated areas of the basin. Reserves predictions differ between the simulation methods, even with dense conditioning datasets. Simulation methods can be ranked according to the correlation of their outputs with predictions from the directly interpolated coal proportion maps: a) with low-density datasets, sequential indicator simulation with trends yields the best correlation; b) with high-density datasets, sequential indicator simulation with post-processing yields the best correlation, because the areal trends are provided implicitly by the dense conditioning data.
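To make the role of the coal proportion threshold concrete, here is a minimal sketch of how resources and reserves could be tallied from a block model. It assumes each block carries a simulated coal proportion between 0 and 1, that all blocks share one volume, and that a block is recoverable when its proportion meets the threshold. The function name and the numbers in the usage lines are hypothetical, not values from the study.

```python
def resources_and_reserves(coal_proportions, threshold, block_volume_m3, density_t_m3):
    """Return (resources, reserves) in tonnes for a list of block coal proportions.

    Resources count the coal in every block; reserves count only the coal in
    blocks whose coal proportion reaches the recovery threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    tonnes_per_block = block_volume_m3 * density_t_m3
    resources = sum(p * tonnes_per_block for p in coal_proportions)
    reserves = sum(p * tonnes_per_block for p in coal_proportions if p >= threshold)
    return resources, reserves


# Illustrative values only: three blocks, 100 m3 each, unit density.
total, recoverable = resources_and_reserves([0.2, 0.5, 0.8], 0.5, 100.0, 1.0)
print(total, recoverable)
```

Raising the threshold leaves resources unchanged but removes low-proportion blocks from reserves, which is why the abstract singles out the threshold as a main control on predicted reserves.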
Abstract:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application-specific information is data about the workload extracted from an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications.
Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. The MPIT allows for communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved from the MPIT show that considerable improvements over conventional MPI applications are achieved.
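The idea of feeding application-specific workload information to a scheduler can be sketched as follows. This is a generic longest-first greedy heuristic, not one of the three algorithms enhanced in the thesis; the per-unit cost estimates and relative worker speeds are assumed to be known in advance, and all names are hypothetical.

```python
def schedule(costs, speeds):
    """Assign work units to heterogeneous workers using known cost estimates.

    costs  -- estimated cost of each work unit (the application-specific information)
    speeds -- relative speed of each worker (higher is faster)
    Returns (assignment, makespan), where assignment[w] lists the unit indices
    given to worker w and makespan is the latest predicted finish time.
    """
    finish = [0.0] * len(speeds)
    assignment = [[] for _ in speeds]
    # Place the most expensive units first; each goes to the worker that
    # would finish it earliest given the load it already has.
    for unit in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        best = min(range(len(speeds)),
                   key=lambda w: finish[w] + costs[unit] / speeds[w])
        finish[best] += costs[unit] / speeds[best]
        assignment[best].append(unit)
    return assignment, max(finish)


plan, makespan = schedule([8, 4, 2, 2], [2.0, 1.0])
print(plan, makespan)
```

Without the cost estimates a scheduler would have to treat all units as equal, which is exactly the situation in which heterogeneous work units on heterogeneous workers lead to load imbalance.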
Abstract:
A newspaper content management system has to deal with a very heterogeneous information space, as our experience at the Diari Segre newspaper has shown. The greatest problem is to harmonise the different ways in which the users involved (journalists, archivists, etc.) structure the newspaper information space, i.e. news, topics, headlines, etc. Our approach is based on ontology and differentiated universes of discourse (UoD). Users interact with the system and, from this interaction, integration rules are derived. These rules are based on Description Logic ontological relations for subsumption and equivalence. They relate the different UoD and produce a shared conceptualisation of the newspaper information domain.
Abstract:
BACKGROUND: The need to contextualise wastewater-based figures about illicit drug consumption by comparing them with other indicators has been stressed by numerous studies. The objective of the present study was to further investigate the possibility of combining wastewater data with conventional statistics to assess the reliability of the former method and obtain a more balanced picture of illicit drug consumption in the investigated area. METHODS: Wastewater samples were collected between October 2013 and July 2014 in the metropolitan area of Lausanne (226,000 inhabitants), Switzerland. Methadone, its metabolite 2-ethylidene-1,5-dimethyl-3,3-diphenylpyrrolidine (EDDP), the exclusive metabolite of heroin, 6-monoacetylmorphine (6-MAM), and morphine loads were used to estimate the amounts of methadone and heroin consumed. RESULTS: Methadone consumption estimated from EDDP was in agreement with expectations. Heroin estimates based on 6-MAM loads were inconsistent. Estimates obtained from morphine loads, combined with prescription/sales data, were in agreement with figures derived from syringe distribution data and general population surveys. CONCLUSIONS: The results obtained for methadone made it possible to assess the reliability of the selected sampling strategy, supporting its ability to capture the consumption of a small cohort (i.e., 743 patients). Using morphine as a marker, in combination with prescription/sales data, estimates in accordance with other indicators of heroin use were obtained. Combining different sources of data strengthened the results and suggested that the different indicators (i.e., administration route, average dosage and number of consumers) contribute to depicting a realistic representation of the phenomenon in the investigated area. Heroin consumption was estimated at approximately 13 g/day (118 g/day at street level).
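The step from a measured metabolite load to an estimated amount consumed is usually a back-calculation of the following kind. The sketch assumes the commonly used form, in which the load is scaled by the parent-to-metabolite molar mass ratio and divided by the fraction of a dose excreted as that metabolite. The abstract does not state the study's exact formula or parameters, so the function and the numbers in the usage lines are purely illustrative.

```python
def back_calculate(load_g_per_day, mw_parent, mw_metabolite, excretion_fraction):
    """Estimate daily consumption of a parent drug from a metabolite load.

    load_g_per_day     -- metabolite load measured in wastewater (g/day)
    mw_parent          -- molar mass of the parent drug (g/mol)
    mw_metabolite      -- molar mass of the measured metabolite (g/mol)
    excretion_fraction -- assumed fraction of a dose excreted as the metabolite
    """
    if not 0.0 < excretion_fraction <= 1.0:
        raise ValueError("excretion_fraction must be in (0, 1]")
    return load_g_per_day * (mw_parent / mw_metabolite) / excretion_fraction


# Hypothetical round numbers, chosen only to make the arithmetic easy to follow.
estimate = back_calculate(10.0, 300.0, 150.0, 0.5)
print(estimate)
```

Because the result is inversely proportional to the excretion fraction, uncertainty in that parameter carries over directly to the consumption estimate, which is one reason such figures are cross-checked against other indicators.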
Abstract:
In this thesis, the author approaches the problem of automated text classification, which is one of the basic tasks in building an Intelligent Internet Search Agent. The work discusses various approaches to solving sub-problems of automated text classification, such as feature extraction and machine learning on text sources. The author also describes her own multiword approach to feature extraction and presents the results of testing this approach using a classifier based on linear discriminant analysis and a classifier that combines unsupervised learning for etalon extraction with supervised learning using the common backpropagation algorithm for a multilayer perceptron.
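As a rough illustration of multiword feature extraction, the sketch below counts contiguous word n-grams in a text. It is an assumption that the thesis's multiword features resemble n-grams; the abstract does not describe the actual method, and the tokenisation rule here (lowercased ASCII letters only) is a simplification.

```python
import re
from collections import Counter


def multiword_features(text, n=2):
    """Count contiguous n-word sequences in text, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


features = multiword_features("Text classification, text classification!")
print(features.most_common(2))
```

The resulting counts can serve as a feature vector for a downstream classifier in the same way single-word counts do.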
Abstract:
Many ants forage in complex environments and use a combination of trail pheromone information and route memory to navigate between food sources and the nest. Previous research has shown that foraging routes differ in how easily they are learned. In particular, it is easier to learn feeding locations that are reached by repeating (e.g. left-left or right-right) than alternating choices (left-right or right-left) along a route with two T-bifurcations. This raises the hypothesis that the learnability of the feeding sites may influence overall colony foraging patterns. We studied this in the mass-recruiting ant Lasius niger. We used mazes with two T-bifurcations, and allowed colonies to exploit two equidistant food sources that differed in how easily their locations were learned. In experiment 1, learnability was manipulated by using repeating versus alternating routes from nest to feeder. In experiment 2, we added visual landmarks along the route to one food source. Our results suggest that colonies preferentially exploited the feeding site that was easier to learn. This was the case even if the more difficult to learn feeding site was discovered first. Furthermore, we show that these preferences were at least partly caused by lower error rates (experiment 1) and greater foraging speeds (experiment 2) of foragers visiting the more easily learned feeder locations. Our results indicate that the learnability of feeding sites is an important factor influencing collective foraging patterns of ant colonies under more natural conditions, given that in natural environments foragers often face multiple bifurcations on their way to food sources.
Abstract:
Geophysical data may provide crucial information about hydrological properties, states, and processes that are difficult to obtain by other means. Large data sets can be acquired over widely different scales in a minimally invasive manner and at comparatively low costs, but their effective use in hydrology makes it necessary to understand the fidelity of geophysical models, the assumptions made in their construction, and the links between geophysical and hydrological properties. Geophysics has been applied for groundwater prospecting for almost a century, but it is only in the last 20 years that it has been regularly used together with classical hydrological data to build predictive hydrological models. A largely unexplored avenue for future work is to use geophysical data to falsify or rank competing conceptual hydrological models. A promising cornerstone for such a model selection strategy is the Bayes factor, but it can only be calculated reliably when considering the main sources of uncertainty throughout the hydrogeophysical parameter estimation process. Most classical geophysical imaging tools tend to favor models with smoothly varying property fields that are at odds with most conceptual hydrological models of interest. It is thus necessary to account for this bias or use alternative approaches in which proposed conceptual models are honored at all steps in the model building process.
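For reference, the Bayes factor mentioned above compares two conceptual models through the ratio of their evidences (marginal likelihoods). The symbols below are introduced here for illustration and do not come from the abstract: d denotes the data, M_1 and M_2 the competing models, and θ_k the parameters of model M_k.

```latex
B_{12} = \frac{p(d \mid M_1)}{p(d \mid M_2)},
\qquad
p(d \mid M_k) = \int p(d \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, \mathrm{d}\theta_k .
```

Values of B_{12} above 1 favor M_1. Because each evidence integrates the likelihood over the whole prior, leaving out an important source of uncertainty changes the integral, which is why the abstract stresses accounting for the main uncertainties throughout the estimation process.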
Abstract:
BACKGROUND: Evidence for the possible effect of vitamin E on head and neck cancers (HNCs) is limited. METHODS: We used individual-level pooled data from 10 case-control studies (5959 cases and 12 248 controls) participating in the International Head and Neck Cancer Epidemiology (INHANCE) consortium to assess the association between vitamin E intake from natural sources and cancer of the oral cavity/pharynx and larynx. Adjusted odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using unconditional logistic regression models applied to quintile categories of nonalcohol energy-adjusted vitamin E intake. RESULTS: Intake of vitamin E was inversely related to oral/pharyngeal cancer (OR for the fifth vs the first quintile category=0.59, 95% CI: 0.49-0.71; P for trend <0.001) and to laryngeal cancer (OR=0.67, 95% CI: 0.54-0.83, P for trend <0.001). There was, however, appreciable heterogeneity of the estimated effect across studies for oral/pharyngeal cancer. Inverse associations were generally observed for the anatomical subsites of oral and pharyngeal cancer and within covariate strata for both sites. CONCLUSION: Our findings suggest that greater vitamin E intake from foods may lower HNC risk, although we were not able to explain the heterogeneity observed across studies or rule out certain sources of bias.
Abstract:
Ingvaldsen et al. comment on our study assessing global fish interchanges between the North Atlantic and Pacific oceans for more than 500 species during the entire 21st century. They propose that discrepancies between our model projections and observed data for cod in the Barents Sea are the result of the choice of Atmosphere-Ocean General Circulation Models (AOGCMs). We address this assertion here, re-running the cod model with additional observation data from the Barents Sea (refs. 1, 3), and show that the lack of open-access, archived data for the Barents Sea was the primary cause of the local prediction mismatch. This finding underscores the importance of systematically depositing biodiversity data in global databases.
Abstract:
The modern generation of Cherenkov telescopes has revealed a new population of gamma-ray sources in the Galaxy. Some of them have been identified with previously known X-ray binary systems while others remain without clear counterparts at lower energies. Our initial goal here was to report on extensive radio observations of the first extended and still unidentified source, namely TeV J2032+4130. This object was originally detected by the HEGRA telescope in the direction of the Cygnus OB2 region and its nature has been a matter of debate in recent years. The situation has become more complex with the new TeV detections by the Whipple and MILAGRO telescopes in the same field, which could be consistent with the historic HEGRA source, although a different origin cannot be ruled out. Aims. We aim to pursue the radio exploration of the TeV J2032+4130 position that we initiated in a previous paper, now taking into account the latest results from the Whipple and MILAGRO TeV telescopes. The data presented here are an extended follow-up of our previous work. Methods. Our investigation is mostly based on interferometric radio observations with the Giant Metrewave Radio Telescope (GMRT) close to Pune (India) and the Very Large Array (VLA) in New Mexico (USA). We also conducted near-infrared observations with the 3.5 m telescope and the OMEGA2000 camera at the Centro Astronómico Hispano Alemán (CAHA) in Almería (Spain). Results. We present deep radio maps centered on the TeV J2032+4130 position at different wavelengths. In particular, our 49 and 20 cm maps cover a field of view larger than half a degree that fully includes the Whipple position and the peak of the MILAGRO emission. Our most important result here is a catalogue of 153 radio sources detected at 49 cm within the GMRT antenna primary beam, which has a full width at half maximum (FWHM) of 43 arcminutes.
Among them, peculiar sources inside the Whipple error ellipse are discussed in detail, including a likely double-double radio galaxy and a one-sided jet source of possible blazar nature. This last object adds another possible counterpart to be considered for the HEGRA, Whipple and MILAGRO emission. Moreover, our multi-configuration VLA images reveal, with improved angular resolution, the non-thermal extended emission previously reported by us. Its non-thermal spectral index is also confirmed thanks to matching-beam observations at the 20 and 6 cm wavelengths.
Abstract:
Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys. Database URL: http://www.phenol-explorer.eu
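A retention factor of the kind described above could be computed along these lines. The formula is an assumption based on the usual yield-corrected definition: the content after processing is scaled by the weight yield, so that water loss or uptake does not masquerade as a change in polyphenol content. The abstract does not spell out the database's exact calculation, and the function name and sample values are hypothetical.

```python
def retention_factor(content_raw, content_processed, weight_raw_g, weight_processed_g):
    """Proportion of a compound retained after processing, corrected for weight change.

    content_raw / content_processed -- concentration before and after processing
                                       (same units, e.g. mg per 100 g of food)
    weight_raw_g / weight_processed_g -- food weight before and after processing
    """
    if content_raw <= 0 or weight_raw_g <= 0:
        raise ValueError("raw content and raw weight must be positive")
    yield_factor = weight_processed_g / weight_raw_g
    return (content_processed * yield_factor) / content_raw


# Illustrative numbers: concentration halves while the food loses 20% of its weight.
rf = retention_factor(100.0, 50.0, 100.0, 80.0)
print(rf)
```

A value of 1 would mean the compound is fully retained, and values below 1 indicate loss during processing.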
Abstract:
In recent years, new analytical tools have allowed researchers to extract historical information contained in molecular data, which has fundamentally transformed our understanding of the processes governing biological invasions. However, the use of these new analytical tools has been largely restricted to studies of terrestrial organisms despite the growing recognition that the sea contains ecosystems that are amongst the most heavily affected by biological invasions, and that marine invasion histories are often remarkably complex. Here, we studied the routes of invasion and colonisation histories of an invasive marine invertebrate, Microcosmus squamiger (Ascidiacea), using microsatellite loci and mitochondrial DNA sequence data from 11 worldwide populations. Discriminant analysis of principal components, clustering methods and approximate Bayesian computation (ABC) methods showed that the most likely source of the introduced populations was a single admixture event that involved populations from two genetically differentiated ancestral regions: the western and eastern coasts of Australia. The ABC analyses revealed that colonisation of the introduced range of M. squamiger consisted of a series of non-independent introductions along the coastlines of Africa, North America and Europe. Furthermore, we inferred that the sequence of colonisation across continents was in line with historical taxonomic records: first the Mediterranean Sea and South Africa from an unsampled ancestral population, followed by sequential introductions in California and, more recently, the NE Atlantic Ocean. We revealed the most likely invasion history for world populations of M. squamiger, which is broadly characterized by the presence of multiple ancestral sources and non-independent introductions within the introduced range.
The results presented here illustrate the complexity of marine invasion routes and identify a cause-effect relationship between human-mediated transport and the success of widespread marine non-indigenous species, which benefit from stepping-stone invasions and admixture processes involving different sources for the spread and expansion of their range.
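The core idea of approximate Bayesian computation can be shown with a toy rejection sampler: draw a parameter from the prior, simulate data, and keep the draw only if the simulated data lie close to the observation. This is a generic sketch, not the authors' analysis, which compares invasion scenarios on genetic summary statistics. The toy model (a single proportion observed through 100 binary draws), the tolerance and all names are hypothetical.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance, tolerance, n_draws, seed=0):
    """Return the prior draws whose simulated data fall within tolerance of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted


def simulate_count(theta, rng, n=100):
    """Toy model: number of successes in n independent draws with probability theta."""
    return sum(rng.random() < theta for _ in range(n))


posterior = abc_rejection(
    observed=30,                           # hypothetical observed count
    simulate=simulate_count,
    prior_sample=lambda rng: rng.random(), # uniform prior on [0, 1)
    distance=lambda a, b: abs(a - b),
    tolerance=2,
    n_draws=5000,
)
print(len(posterior), sum(posterior) / len(posterior))
```

The accepted draws approximate the posterior distribution of the parameter. Scenario comparison of the kind reported in the abstract extends the same idea by also drawing the scenario itself and comparing acceptance frequencies across scenarios.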