909 results for classification and regression trees


Relevance: 100.00%

Abstract:

Economic historians have recently emphasized the importance of integrating economic and historical approaches in studying institutions. The literature on the Ottoman system of taxation, however, has continued to adopt a primarily historical approach, using ad hoc categories of classification and explaining the system through its continuities with the historical precedent. This paper integrates economic and historical approaches to examine the structure, efficiency, and regional diversity of the tax system. The structure of the system made it possible for the Ottomans to economize on the transaction cost of measuring the tax base. Regional variations resulted from both efficient adaptations and institutional rigidities.

Relevance: 100.00%

Abstract:

A census of 925 U.S. colleges and universities offering master's and doctoral degrees was conducted to study how many elements of an environmental management system (EMS), as defined by ISO 14001, are in place at small, medium and large institutions. The response rate was 30%, with 273 responses included in the final data analysis. Overall, the number of ISO 14001 elements implemented among the 273 institutions ranged from 0 to 16, with a median of 12. There was no significant association between the number of elements implemented and the size of the institution (p = 0.18; Kruskal-Wallis test) or the USEPA region (p = 0.12; Kruskal-Wallis test). The proportion of U.S. colleges and universities that reported having implemented a structured, comprehensive environmental management system, defined by answering yes to all 16 elements, was 10% (95% C.I. 6.6%–14.1%); however, 38% (95% C.I. 32.0%–43.8%) reported that they had implemented a structured, comprehensive environmental management system, while 30.0% (95% C.I. 24.7%–35.9%) are planning to implement one within the next five years. Stratified analyses were performed by institution size, Carnegie Classification and job title.

The Osnabruck model, and another under development by the South Carolina Sustainable Universities Initiative, are the only two environmental management system models that have been proposed specifically for colleges and universities, although several guides are now available. The Environmental Management System Implementation Model for U.S. Colleges and Universities developed here is an adaptation of the ISO 14001 standard and USEPA recommendations and has been tailored to U.S. colleges and universities for use in streamlining the implementation process. It is hoped that this implementation model, created for the U.S. research and academic setting, will provide these highly specialized institutions with a clearer and more cost-effective path towards implementing an EMS and towards greater compliance with local, state and federal environmental legislation.

Relevance: 100.00%

Abstract:

Cross-sectional age- and sex-specific distributions of serum total cholesterol were described for 1091 children aged 6-18 years in The Woodlands, Texas. Associations of serum total cholesterol with five anthropometric measurements (weight, height, body mass index, arm circumference, and triceps skinfold thickness) were examined by correlation and regression analyses. Examination of serum total cholesterol distributions showed lower levels in boys than in girls for most of the age groups studied. Mean levels of total cholesterol peaked at age 9 for boys and 8 for girls. Serum total cholesterol leveled off until age 14 for boys and 11 for girls, and then dropped through age 18 for both boys and girls. These results support the hypothesis that serum total cholesterol concentration drops at pre-adolescence.

Age-adjusted correlations were observed between serum total cholesterol and triceps skinfold thickness for both boys and girls. This association was stronger in boys. Triceps skinfold thickness and arm circumference were consistently the strongest correlates of serum total cholesterol in boys. Weight and arm circumference were consistently the strongest correlates of serum total cholesterol in girls.

Relevance: 100.00%

Abstract:

This study explored the relationship of attitudes, needs, and health services utilization patterns of elderly veterans who were identified and categorized by their expectation for and receipt of sick-role legitimation. Three prescription types (new, change, renewal) were defined as the operational variables. A questionnaire was sent to a population of 676 ambulatory, chronically ill veterans (average age 60 years), with a 74% response rate. In addition, a retrospective review of medical and prescription records was performed for a 45% sample of respondents. The results were analyzed using discriminant function and regression analysis. Fewer than 20% of the veterans responding expected to receive more prescriptions than were presently prescribed, whereas over 80% expected refill authorizations. Distinct attitudinal, need, and utilization patterns were identified.

Relevance: 100.00%

Abstract:

The interplay between obesity, physical activity, weight gain and genetic variants in the mTOR pathway has not been studied in renal cell carcinoma (RCC). We examined the associations between obesity, weight gain, physical activity and RCC risk. We also analyzed whether genetic variants in the mTOR pathway could modify these associations. Incident renal cell carcinoma cases and healthy controls were recruited from the University of Texas MD Anderson Cancer Center in Houston, Texas. Cases and controls were frequency-matched by age (±5 years), ethnicity, sex, and county of residence. Epidemiologic data were collected via in-person interview. A total of 577 cases and 593 healthy controls (all white) were included. Genotyping data for 192 single nucleotide polymorphisms (SNPs) from 22 genes were extracted from previous genome-wide association studies. Logistic regression and regression spline analyses were performed to obtain odds ratios. Obesity at age 20, at age 40, and 3 years prior to diagnosis/recruitment, and moderate and large weight gain from age 20 to 40, were each significantly associated with increased RCC risk. Low physical activity was associated with a 4.08-fold (95% CI: 2.92-5.70) increased risk. Five SNPs were significantly associated with RCC risk and their cumulative effect increased the risk by up to 72% (95% CI: 1.20-2.46). Stratum-specific effects for weight change and cumulative genotype groups were observed. However, no interaction was suggested by our study. In conclusion, energy-balance-related risk factors and genetic variants in the mTOR pathway may jointly influence susceptibility to RCC.

Relevance: 100.00%

Abstract:

Grain size of 139 unconsolidated sediment samples from seven DSDP sites in the Guaymas Basin and the southeastern tip of the Baja California Peninsula was determined by sieve and pipette techniques. The Shepard (1954) classification and Inman (1952) parameter correlations were used for all samples. Sediment texture ranged from sand to silty clay. On the basis of grain-size parameters, the sediments can be divided into three broad groups: (1) very fine sands and coarse silts; (2) medium- to very fine silts; and (3) clays and coarse silts.

Relevance: 100.00%

Abstract:

The micro-scale spatial distribution patterns of a demersal fish and decapod crustacean assemblage were assessed in a hard-bottom kelp environment in the southern North Sea. Using quadrats along line transects, we assessed the in situ fish and crustacean abundance in relation to substratum types (rock, cobbles and large pebbles) and the density of algae. Six fish and four crustacean species were abundant, with Ctenolabrus rupestris clearly dominating the fish community and Galathea squamifera dominating the crustacean community. Differences in the substratum types had an even stronger effect on the micro-scale distribution than the density of the dominating algae species. Kelp had a negative effect on the fish abundances, with significantly lower average densities in kelp beds compared with adjacent open areas. Averaged over all of the substrata, the most attractive substratum for the fish was large pebbles. In contrast, crustaceans did not show a specific substratum affinity. The results clearly indicate that, similar to other complex systems, significant micro-scale species-habitat associations occur in northern hard-bottom environments. However, because of the frequently harsh environmental conditions, these habitats are mainly sampled from ships with sampling gear, and the resulting data cannot be used to resolve small-scale species-habitat associations. A detailed substratum classification and community assessment, often only possible using SCUBA diving, is therefore important to reach a better understanding of the functional relationships between species and their environment in northern temperate waters, knowledge that is very important with respect to the increasing environmental pressure caused by global climate change.

Relevance: 100.00%

Abstract:

Shrubs and trees are expected to expand in the sub-Arctic due to global warming. Our study was conducted in Abisko, sub-arctic Sweden. We recorded the change in coverage of shrub and tree species over a 32- to 34-year period in three 50 x 50 m plots in the alpine tree-line ecotone. The cover of shrubs and trees (<3.5 cm diameter at breast height) was estimated during 2009-2010 and compared with historical documentation from 1976 to 1977. Similarly, all tree stems (>=3.5 cm) were noted and their positions determined. There has been a substantial increase in the cover of shrubs and trees, particularly dwarf birch (Betula nana) and mountain birch (Betula pubescens ssp. czerepanovii), and an establishment of aspen (Populus tremula). The other species, willows (Salix spp.), juniper (Juniperus communis), and rowan (Sorbus aucuparia), showed inconsistent changes among the plots. Although this study was unable to identify the causes of the change in shrubs and small trees, the changes are consistent with those anticipated from climate change and reduced herbivory.

Relevance: 100.00%

Abstract:

Forty sediment and four basement basalt samples from DSDP Hole 525A, Leg 74, as well as nine basalt samples from southern and offshore Brazil, were subjected to instrumental neutron activation analysis. Thirty-two major, minor, and trace elements were determined. The downcore element concentration profiles and regression analyses show that the rare earth elements (REE) are present in significant amounts in both the carbonate and noncarbonate phases in sediments; Sr is concentrated in the carbonate phase, and most of the other elements determined exist mainly in the noncarbonate phase. The calculated partition coefficients of the REE between the carbonate phase and the free ion concentrations in seawater are high and increase with decreasing REE ionic radii from 3.9 x 10**6 for La to 15 x 10**6 for Lu. Calculations show that the lanthanide concentrations in South Atlantic seawater have not been changed significantly over the past 70 Ma. The Ce anomaly observed in the carbonate phase is a redox indicator of ancient seawater. Study of the Ce anomaly reveals that seawater was anoxic over the Walvis Ridge during the late Campanian. As the gap between South America and West Africa widened and the Walvis Ridge subsided from late Campanian to late Paleocene times, the water circulation of the South Atlantic improved and achieved oxidation conditions about 54 Ma that are similar to present seawater redox conditions in the world oceans. The chemical compositions of the basement rocks correspond to alkalic basalts, not mid-ocean ridge basalts (MORBs). The results add more evidence to support the hypothesis that the Walvis Ridge was formed by a series of volcanos moving over a "hot spot" near the Mid-Atlantic Ridge. From the chemical composition and REE pattern, one 112 Ma old basalt on the Brazilian continental shelf has been identified as an early stage MORB. To date, this is the oldest oceanic tholeiite recovered from the South Atlantic. 
This direct evidence indicates that the continental split between South America and Africa commenced > 112 Ma.

Relevance: 100.00%

Abstract:

Manual and low-tech well drilling techniques have the potential to assist in reaching the United Nations' millennium development goal for water in sub-Saharan Africa. This study used publicly available geospatial data in a regression tree analysis to predict groundwater depth in the Zinder region of Niger and so identify suitable areas for manual well drilling. Regression trees were developed and tested on a database of 3681 wells in the Zinder region. A tree with 17 terminal leaves provided a range of groundwater depth estimates that were appropriate for manual drilling, though much of the tree's complexity was associated with depths that were beyond manual methods. A natural log transformation of groundwater depth was tested to see if rescaling the dataset variance would result in finer distinctions for regions of shallow groundwater. The RMSE for a log-transformed tree with only 10 terminal leaves was almost half that of the untransformed 17-leaf tree for groundwater depths less than 10 m. This analysis indicated important groundwater relationships for commonly available maps of geology, soils, elevation, and enhanced vegetation index from the MODIS satellite imaging system.
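The comparison described in this abstract, a regression tree fitted to raw depth versus one fitted to log-transformed depth and scored by RMSE on shallow wells, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the data are synthetic, the two predictors (elevation and a vegetation index) and their coefficients are invented, and the tree is a bare-bones depth-limited CART-style learner, not the software or settings used in the study.

```python
import math
import random

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets, min_leaf):
    """Return (feature index, threshold, cost) of the best binary split, or None."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (j, threshold, cost)
    return best

def grow(rows, targets, max_depth, min_leaf=5):
    """Grow a tree: a leaf is a float (mean target), a node is a 4-tuple."""
    split = best_split(rows, targets, min_leaf) if max_depth > 0 else None
    if split is None or split[2] >= sse(targets):
        return sum(targets) / len(targets)
    j, threshold, _ = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] > threshold]
    return (j, threshold,
            grow([r for r, _ in left], [t for _, t in left], max_depth - 1, min_leaf),
            grow([r for r, _ in right], [t for _, t in right], max_depth - 1, min_leaf))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Synthetic wells (hypothetical predictors and coefficients, not study data).
random.seed(0)
rows, depth_m = [], []
for _ in range(300):
    elevation = random.uniform(350.0, 550.0)
    evi = random.uniform(0.05, 0.45)
    noise = random.gauss(0.0, 0.3)
    rows.append([elevation, evi])
    depth_m.append(math.exp(0.012 * (elevation - 350.0) + 2.0 * (0.45 - evi) + noise))

raw_tree = grow(rows, depth_m, max_depth=4)                          # up to 16 leaves
log_tree = grow(rows, [math.log(d) for d in depth_m], max_depth=3)   # up to 8 leaves

# Score both trees only on shallow wells (< 10 m), back-transforming the log tree.
shallow = [i for i, d in enumerate(depth_m) if d < 10.0]
obs = [depth_m[i] for i in shallow]
raw_pred = [predict(raw_tree, rows[i]) for i in shallow]
log_pred = [math.exp(predict(log_tree, rows[i])) for i in shallow]
print("RMSE raw tree:", rmse(raw_pred, obs))
print("RMSE log tree:", rmse(log_pred, obs))
```

Exponentiating a leaf mean of log depth gives a geometric-mean estimate, which is one simple back-transformation; the abstract does not say which one the authors used.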

Relevance: 100.00%

Abstract:

Visual traces of iron reduction and oxidation are linked to the redox status of soils and have been used to characterise the quality of agricultural soils. We tested whether this feature could also be used to explain the spatial pattern of the natural vegetation of tidal habitats. If so, an easy assessment of the effect of rising sea level on tidal ecosystems would be possible. Our study was conducted at the salt marshes of the northern lagoon of Venice, which are strongly threatened by erosion and rising sea level and are part of the world heritage 'Venice and its lagoon'. We analysed the abundance of plant species at 255 sampling points along a land-sea gradient. In addition, we surveyed the redox morphology (presence/absence of red iron oxide mottles in the greyish topsoil horizons) of the soils and the presence of disturbances. We used indicator species analysis, correlation trees and multivariate regression trees to analyse relations between soil properties and plant species distribution. Plant species with known sensitivity to anaerobic conditions (e.g. Halimione portulacoides) were identified as indicators for oxic soils (showing iron oxide mottles within a greyish soil matrix). Plant species that tolerate a low redox potential (e.g. Spartina maritima) were identified as indicators for anoxic soils (greyish matrix without oxide mottles). Correlation trees and multivariate regression trees indicate the dominant role of the redox morphology of the soils in plant species distribution. In addition, the distance from the mainland and the presence of disturbances were identified as tree-splitting variables. The small-scale variation of oxygen availability plays a key role for the biodiversity of salt marsh ecosystems. Our results suggest that the redox morphology of salt marsh soils indicates the plant availability of oxygen.
Thus, the consideration of this indicator may enable an understanding of the heterogeneity of biological processes in oxygen-limited systems and may be a sensitive and easy-to-use tool to assess human impacts on salt marsh ecosystems.

Relevance: 100.00%

Abstract:

The use of fly ash (FA) as an admixture in concrete is widespread for two main reasons: the cost reduction obtained by substituting cement, and the microstructural changes produced by the mineral admixture. Regarding the second point, there is a consensus that the ash produces a more compact concrete and a reduction in pore size. However, the extent to which this is due to pozzolanic activity or to a filler effect is not well defined. Nor has the influence of the physical parameters, grain fineness and free water, on its behavior been explained. This work studies the use of FA as a partial substitute for cement in concretes of different workability (dry and wet) and its influence on the reactivity of the ash. The reference concrete of dry consistency uses a cement dose of 250 kg/m³, and the concrete of fluid consistency uses a cement dose of 350 kg/m³. Two brands of Portland cement Type 1 were used. The first reached its strength class through its grain fineness and the second through its composition. Moreover, three doses of FA were used, and the water/binder ratio was constant in all the mixtures. We studied the mechanical properties and the microstructure of the concretes by means of compressive strength tests, mercury intrusion porosimetry (MIP) and thermal analysis (TA). The compressive strength results show that concrete mixtures with cements of the same classification and a similar binder dosage do not present the same mechanical behavior. These results show that the effective water/binder ratio has a major role in the development of the mechanical properties of concrete. The study of the different dosages using TA (thermogravimetry and differential thermal analysis) revealed that the portlandite content is not limiting in any of the dosages studied. Again, this shows that the rheology of the material influences the reaction rate and the content of hydrated cement products. We conclude that the available free water determines the efficiency of the pozzolanic reaction. Depending on the availability of free water, the ash can react as an active admixture or simply change the pore distribution. MIP shows that some concretes exhibit no significant change in their mechanical behavior yet undergo significant variation in their pore structure.

Relevance: 100.00%

Abstract:

OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web

1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS

Computational Linguistics is already a consolidated research area. It builds upon the results of two other major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs. These tools for processing human language are commonly referred to as linguistic tools.

Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.

Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.

However, linguistic annotation tools have still some limitations, which can be summarised as follows:

1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.

A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.

In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.

Therefore, it would be quite useful to find a way to (i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools; (ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate. Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned.

Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.

2. GOALS OF THE PRESENT WORK

As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents).
This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based triples, as in the usual Semantic Web languages (namely RDF(S) and OWL), in order for the model to be considered suitable for the Semantic Web.

Besides, to be useful for the Semantic Web, this model should provide a way to automate the annotation of web pages. As for the present work, this requirement involved reusing the linguistic annotation tools purchased by the OEG research group (http://www.oeg-upm.net), but solving beforehand (or, at least, minimising) some of their limitations. Therefore, this model had to minimise these limitations by means of the integration of several linguistic annotation tools into a common architecture. Since this integration required the interoperation of tools and their annotations, ontologies were proposed as the main technological component to make them effectively interoperate. From the very beginning, it seemed that the formalisation of the elements and the knowledge underlying linguistic annotations within an appropriate set of ontologies would be a great step forward towards the formulation of such a model (henceforth referred to as OntoTag).

Obviously, first, to combine the results of the linguistic annotation tools that operated at the same level, their annotation schemas had to be unified (or, preferably, standardised) in advance. This entailed the unification (id. standardisation) of their tags (both their representation and their meaning), and their format or syntax. Second, to merge the results of the linguistic annotation tools operating at different levels, their respective annotation schemas had to be (a) made interoperable and (b) integrated. And third, in order for the resulting annotations to suit the Semantic Web, they had to be specified by means of an ontology-based vocabulary, and structured by means of ontology-based triples, as hinted above. Therefore, a new annotation scheme had to be devised, based both on ontologies and on this type of triples, which allowed for the combination and the integration of the annotations of any set of linguistic annotation tools. This annotation scheme was considered a fundamental part of the model proposed here, and its development was, accordingly, another major objective of the present work.

All these goals, aims and objectives could be re-stated more clearly as follows:

Goal 1: Development of a set of ontologies for the formalisation of the linguistic knowledge relating linguistic annotation.
  Sub-goal 1.1: Ontological formalisation of the EAGLES (1996a; 1996b) de facto standards for morphosyntactic and syntactic annotation, in a way that helps respect the triple structure recommended for annotations in these works (which is isomorphic to the triple structures used in the context of the Semantic Web).
  Sub-goal 1.2: Incorporation into this preliminary ontological formalisation of other existing standards and standard proposals relating the levels mentioned above, such as those currently under development within ISO/TC 37 (the ISO Technical Committee dealing with Terminology, which deals also with linguistic resources and annotations).
  Sub-goal 1.3: Generalisation and extension of the recommendations in EAGLES (1996a; 1996b) and ISO/TC 37 to the semantic level, for which no ISO/TC 37 standards have been developed yet.
  Sub-goal 1.4: Ontological formalisation of the generalisations and/or extensions obtained in the previous sub-goal as generalisations and/or extensions of the corresponding ontology (or ontologies).
  Sub-goal 1.5: Ontological formalisation of the knowledge required to link, combine and unite the knowledge represented in the previously developed ontology (or ontologies).

Goal 2: Development of OntoTag’s annotation scheme, a standard-based abstract scheme for the hybrid (linguistically-motivated and ontological-based) annotation of texts.
  Sub-goal 2.1: Development of the standard-based morphosyntactic annotation level of OntoTag’s scheme. This level should include, and possibly extend, the recommendations of EAGLES (1996a) and also the recommendations included in the ISO/MAF (2008) standard draft.
  Sub-goal 2.2: Development of the standard-based syntactic annotation level of the hybrid abstract scheme. This level should include, and possibly extend, the recommendations of EAGLES (1996b) and the ISO/SynAF (2010) standard draft.
  Sub-goal 2.3: Development of the standard-based semantic annotation level of OntoTag’s (abstract) scheme.
  Sub-goal 2.4: Development of the mechanisms for a convenient integration of the three annotation levels already mentioned. These mechanisms should take into account the recommendations included in the ISO/LAF (2009) standard draft.

Goal 3: Design of OntoTag’s (abstract) annotation architecture, an abstract architecture for the hybrid (semantic) annotation of texts (i) that facilitates the integration and interoperation of different linguistic annotation tools, and (ii) whose results comply with OntoTag’s annotation scheme.
  Sub-goal 3.1: Specification of the decanting processes that allow for the classification and separation, according to their corresponding levels, of the results of the linguistic tools annotating at several different levels.
  Sub-goal 3.2: Specification of the standardisation processes that allow (a) complying with the standardisation requirements of OntoTag’s annotation scheme, as well as (b) combining the results of those linguistic tools that share some level of annotation.
  Sub-goal 3.3: Specification of the merging processes that allow for the combination of the output annotations and the interoperation of those linguistic tools that share some level of annotation.
  Sub-goal 3.4: Specification of the merge processes that allow for the integration of the results and the interoperation of those tools performing their annotations at different levels.

Goal 4: Generation of OntoTagger’s schema, a concrete instance of OntoTag’s abstract scheme for a concrete set of linguistic annotations. These linguistic annotations result from the tools and the resources available in the research group, namely
  • Bitext’s DataLexica (http://www.bitext.com/EN/datalexica.asp),
  • LACELL’s (POS) tagger (http://www.um.es/grupos/grupo-lacell/quees.php),
  • Connexor’s FDG (http://www.connexor.eu/technology/machinese/glossary/fdg/), and
  • EuroWordNet (Vossen et al., 1998).
This schema should help evaluate OntoTag’s underlying hypotheses, stated below. Consequently, it should implement, at least, those levels of the abstract scheme dealing with the annotations of the set of tools considered in this implementation. This includes the morphosyntactic, the syntactic and the semantic levels.

Goal 5: Implementation of OntoTagger’s configuration, a concrete instance of OntoTag’s abstract architecture for this set of linguistic tools and annotations. This configuration (1) had to use the schema generated in the previous goal; and (2) should help support or refute the hypotheses of this work as well (see the next section).
Sub-goal 5.1: Implementation of the decanting processes that facilitate the classification and separation of the results of those linguistic resources that provide annotations at several different levels (on the one hand, LACELL’s tagger operates at the morphosyntactic level and, minimally, also at the semantic level; on the other hand, FDG operates at the morphosyntactic and the syntactic levels and, minimally, at the semantic level as well).
Sub-goal 5.2: Implementation of the standardisation processes that allow (i) specifying the results of those linguistic tools that share some level of annotation according to the requirements of OntoTagger’s schema, as well as (ii) combining these shared level results. In particular, all the tools selected perform morphosyntactic annotations and they had to be conveniently combined by means of these processes.
Sub-goal 5.3: Implementation of the merging processes that allow for the combination (and possibly the improvement) of the annotations and the interoperation of the tools that share some level of annotation (in particular, those relating to the morphosyntactic level, as in the previous sub-goal).
Sub-goal 5.4: Implementation of the merging processes that allow for the integration of the different standardised and combined annotations aforementioned, relating all the levels considered.
Sub-goal 5.5: Improvement of the semantic level of this configuration by adding a named entity recognition, (sub-)classification and annotation subsystem, which also uses the named entities annotated to populate a domain ontology, in order to provide a concrete application of the present work in the two areas involved (the Semantic Web and Corpus Linguistics).
3. MAIN RESULTS: ASSESSMENT OF ONTOTAG’S UNDERLYING HYPOTHESES
The model developed in the present thesis tries to shed some light on (i) whether linguistic annotation tools can effectively interoperate; (ii) whether their results can be combined and integrated; and, if they can, (iii) how they can, respectively, interoperate and be combined and integrated. Accordingly, several hypotheses had to be supported (or rejected) by the development of the OntoTag model and OntoTagger (its implementation). The hypotheses underlying OntoTag are surveyed below. Only one of the hypotheses (H.6) was rejected; the other five could be confirmed.
H.1 The annotations of different levels (or layers) can be integrated into a sort of overall, comprehensive, multilayer and multilevel annotation, so that their elements can complement and refer to each other.
• CONFIRMED by the development of:
o OntoTag’s annotation scheme,
o OntoTag’s annotation architecture,
o OntoTagger’s (XML, RDF, OWL) annotation schemas,
o OntoTagger’s configuration.
H.2 Tool-dependent annotations can be mapped onto a sort of tool-independent annotations and, thus, can be standardised.
• CONFIRMED by means of the standardisation phase incorporated into OntoTag and OntoTagger for the annotations yielded by the tools.
H.3 Standardisation should ease:
H.3.1: The interoperation of linguistic tools.
H.3.2: The comparison, combination (at the same level and layer) and integration (at different levels or layers) of annotations.
• H.3 was CONFIRMED by means of the development of OntoTagger’s ontology-based configuration:
o Interoperation, comparison, combination and integration of the annotations of three different linguistic tools (Connexor’s FDG, Bitext’s DataLexica and LACELL’s tagger);
o Integration of EuroWordNet-based, domain-ontology-based and named entity annotations at the semantic level;
o Integration of morphosyntactic, syntactic and semantic annotations.
H.4 Ontologies and Semantic Web technologies (can) play a crucial role in the standardisation of linguistic annotations, by providing consensual vocabularies and standardised formats for annotation (e.g., RDF triples).
• CONFIRMED by means of the development of OntoTagger’s RDF-triple-based annotation schemas.
H.5 The rate of errors introduced by a linguistic tool at a given level, when annotating, can be reduced automatically by contrasting and combining its results with the ones coming from other tools operating at the same level. However, these other tools might be built following a different technological (stochastic vs. rule-based, for example) or theoretical (dependency vs. HPS-grammar-based, for instance) approach.
• CONFIRMED by the results yielded by the evaluation of OntoTagger.
H.6 Each linguistic level can be managed and annotated independently.
• REJECTED in view of OntoTagger’s experiments and the dependencies observed among the morphosyntactic annotations, and between them and the syntactic annotations. In fact, Hypothesis H.6 was already rejected when OntoTag’s ontologies were developed. We observed then that several linguistic units stand on an interface between levels, belonging thereby to both of them (such as morphosyntactic units, which belong to both the morphological level and the syntactic level). Therefore, the annotations of these levels overlap and cannot be handled independently when merged into a unique multileveled annotation.
4. OTHER MAIN RESULTS AND CONTRIBUTIONS
First, interoperability is a hot topic for both the linguistic annotation community and the whole Computer Science field. The specification (and implementation) of OntoTag’s architecture for the combination and integration of linguistic (annotation) tools and annotations by means of ontologies shows a way to make these different linguistic annotation tools and annotations interoperate in practice.
Second, as mentioned above, the elements involved in linguistic annotation were formalised in a set (or network) of ontologies (OntoTag’s linguistic ontologies).
• On the one hand, OntoTag’s network of ontologies consists of
− The Linguistic Unit Ontology (LUO), which includes a mostly hierarchical formalisation of the different types of linguistic elements (i.e., units) identifiable in a written text;
− The Linguistic Attribute Ontology (LAO), which includes also a mostly hierarchical formalisation of the different types of features that characterise the linguistic units included in the LUO;
− The Linguistic Value Ontology (LVO), which includes the corresponding formalisation of the different values that the attributes in the LAO can take;
− The OIO (OntoTag’s Integration Ontology), which
o Includes the knowledge required to link, combine and unite the knowledge represented in the LUO, the LAO and the LVO;
o Can be viewed as a knowledge representation ontology that describes the most elementary vocabulary used in the area of annotation.
• On the other hand, OntoTag’s ontologies incorporate the knowledge included in the different standards and recommendations for linguistic annotation released so far, such as those developed within the EAGLES and the SIMPLE European projects or by the ISO/TC 37 committee:
− As far as morphosyntactic annotations are concerned, OntoTag’s ontologies formalise the terms in the EAGLES (1996a) recommendations and their corresponding terms within the ISO Morphosyntactic Annotation Framework (ISO/MAF, 2008) standard;
− As for syntactic annotations, OntoTag’s ontologies incorporate the terms in the EAGLES (1996b) recommendations and their corresponding terms within the ISO Syntactic Annotation Framework (ISO/SynAF, 2010) standard draft;
− Regarding semantic annotations, OntoTag’s ontologies generalise and extend the recommendations in EAGLES (1996a; 1996b) and, since no stable standards or standard drafts have been released for semantic annotation by ISO/TC 37 yet, they incorporate the terms in SIMPLE (2000) instead;
− The terms coming from all these recommendations and standards were supplemented by those within the ISO Data Category Registry (ISO/DCR, 2008) and also those of the ISO Linguistic Annotation Framework (ISO/LAF, 2009) standard draft when developing OntoTag’s ontologies.
Third, we showed that the combination of the results of tools annotating at the same level can yield better results (both in precision and in recall) than each tool separately. In particular,
1. OntoTagger clearly outperformed two of the tools integrated into its configuration, namely DataLexica and FDG, in all the combination sub-phases in which they overlapped (i.e. POS tagging, lemma annotation and morphological feature annotation). As far as the remaining tool is concerned, i.e. LACELL’s tagger, it was also outperformed by OntoTagger in POS tagging and lemma annotation, and it did not behave better than OntoTagger in the morphological feature annotation layer.
2. As an immediate result, this implies that
a) This type of combination architecture can be applied in order to improve significantly the accuracy of linguistic annotations; and
b) Concerning the morphosyntactic level, this could be regarded as a way of constructing more robust and more accurate POS tagging systems.
Fourth, Semantic Web annotations are usually performed by humans or else by machine learning systems. Both of them leave much to be desired: the former, with respect to their annotation rate; the latter, with respect to their (average) precision and recall. In this work, we showed how linguistic tools can be wrapped in order to annotate Semantic Web pages automatically using ontologies. This entails their fast, robust and accurate semantic annotation. By way of example, as mentioned in Sub-goal 5.5, we developed a particular OntoTagger module for the recognition, classification and labelling of named entities, according to the MUC and ACE tagsets (Chinchor, 1997; Doddington et al., 2004). These tagsets were further specified by means of a domain ontology, namely the Cinema Named Entities Ontology (CNEO). This module was applied to the automatic annotation of ten different web pages containing cinema reviews (that is, around 5000 words). In addition, the named entities annotated with this module were also labelled as instances (or individuals) of the classes included in the CNEO and, then, were used to populate this domain ontology.
• The statistical results obtained from the evaluation of this particular module of OntoTagger can be summarised as follows. On the one hand, as far as recall (R) is concerned, (R.1) the lowest value was 76.40% (for file 7); (R.2) the highest value was 97.50% (for file 3); and (R.3) the average value was 88.73%. On the other hand, as far as the precision rate (P) is concerned, (P.1) its minimum was 93.75% (for file 4); (P.2) its maximum was 100% (for files 1, 5, 7, 8, 9, and 10); and (P.3) its average value was 98.99%.
• These results, which apply to the tasks of named entity annotation and ontology population, are extraordinarily good for both of them. They can be explained on the basis of the high accuracy of the annotations provided by OntoTagger at the lower levels (mainly at the morphosyntactic level). However, they should be conveniently qualified, since they might be too domain- and/or language-dependent. Further experiments are needed to determine how our approach works in a different domain or a different language, such as French, English, or German.
• In any case, the results of this application of Human Language Technologies to Ontology Population (and, accordingly, to Ontological Engineering) seem very promising and encourage these two areas to collaborate and complement each other in the area of semantic annotation.
Fifth, as shown in the State of the Art of this work, there are different approaches and models for the semantic annotation of texts, but all of them focus on a particular view of the semantic level. Clearly, all these approaches and models should be integrated in order to yield a coherent and joint semantic annotation level. OntoTag shows how (i) these semantic annotation layers could be integrated together; and (ii) they could be integrated with the annotations associated with other annotation levels.
Sixth, we identified some recommendations, best practices and lessons learned for annotation standardisation, interoperation and merging. They show how standardisation (via ontologies, in this case) enables the combination, integration and interoperation of different linguistic tools and their annotations into a multilayered (or multileveled) linguistic annotation, which is one of the hot topics in the area of Linguistic Annotation.
And last but not least, OntoTag’s annotation scheme and OntoTagger’s annotation schemas show a way to formalise and annotate coherently and uniformly the different units and features associated to the different levels and layers of linguistic annotation. This is a great scientific step ahead towards the global standardisation of this area, which is the aim of ISO/TC 37 (in particular, Subcommittee 4, dealing with the standardisation of linguistic annotations and resources).
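The abstract reports that combining the results of tools annotating at the same level can beat each tool separately, but it does not say how the combination is computed. As a minimal sketch only, one common scheme is majority voting over tags that have already been standardised to a shared tagset. The function name `combine_tags`, the tool names and the tags below are hypothetical, and this is not claimed to be OntoTagger's actual procedure.

```python
from collections import Counter


def combine_tags(tool_outputs, priority):
    """Combine per-token tags from several tools by majority vote.

    tool_outputs: dict mapping a tool name to its list of tags, one per token,
                  already mapped onto a shared (standardised) tagset.
    priority:     list of all tool names, most trusted first; used only to
                  break ties between equally voted tags.
    """
    if set(priority) != set(tool_outputs):
        raise ValueError("priority must list every tool in tool_outputs")
    lengths = {len(tags) for tags in tool_outputs.values()}
    if len(lengths) != 1:
        raise ValueError("all tools must annotate the same token sequence")

    combined = []
    for position in range(lengths.pop()):
        # Count how many tools proposed each tag for this token.
        votes = Counter(tags[position] for tags in tool_outputs.values())
        top = max(votes.values())
        winners = {tag for tag, count in votes.items() if count == top}
        # Tie-break: take the tag of the most trusted tool among the winners.
        for tool in priority:
            tag = tool_outputs[tool][position]
            if tag in winners:
                combined.append(tag)
                break
    return combined


# Made-up outputs of three hypothetical taggers for a three-token sentence.
outputs = {
    "tool_a": ["DET", "NOUN", "VERB"],
    "tool_b": ["DET", "VERB", "VERB"],
    "tool_c": ["DET", "NOUN", "ADJ"],
}
print(combine_tags(outputs, ["tool_a", "tool_b", "tool_c"]))
# → ['DET', 'NOUN', 'VERB']
```

Voting only makes sense after the standardisation step described in H.2, since tool-specific tags cannot be compared directly; the explicit priority list is one simple way to resolve ties when only two tools overlap.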

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Applications of remote sensing for monitoring what happens on the land surface have multiplied and become more refined with the launch of new sensors by the different space agencies.
The need for up-to-date and spatially homogeneous data has led to the development of new programs such as the Earth Observing System (EOS) of the National Aeronautics and Space Administration (NASA). One of the sensors carried by the flagship of that program, the TERRA satellite, is the Multi-angle Imaging SpectroRadiometer (MISR), designed to capture multi-angle information about the Earth's surface. It has been known since the 1970s that the reflectance of land covers and land uses varies with the viewing and illumination angles, that is, that these surfaces are anisotropic. This variation is also related to the three-dimensional structure of the covers, so the relationship can be exploited to obtain information about that structure beyond what spectral information alone can provide. The MISR sensor incorporates 9 cameras at different angles to capture 9 almost simultaneous images of the same point, allowing relatively reliable estimates of the anisotropic response of the Earth's surface. Several studies have shown that variables related to vegetation structure can be estimated from the information provided by this sensor, so this thesis makes an initial application to the Iberian Peninsula to check its usefulness in estimating forest variables of interest. In a first step we analyzed the temporal variability in the data caused by changes in the acquisition geometry, i.e. the relative position of the sensors and the light source, which in this case is the Sun. The anisotropy was found to be greater from late fall through early spring, because the Sun's position is then closer to the plane of the sensors. The maximum and minimum values were also found to shift over time between the central and the outermost view angles. In the multi-angle characterization of CORINE Land Cover classes, the predominant shape in the images acquired with the Sun highest is convex, with a maximum in the camera closest to the light source. However, when the Sun is much lower, the maximum lies far out toward the outermost cameras.
Moreover, the data obtained for each land cover are much more variable in summer than in November, possibly due to the proportional increase in shadow areas. To check whether multi-angle information has any effect on the classification of images by land cover and land use, a series of classifications was produced with varying input data, from multispectral only to multi-angle plus multispectral. The results show that, while multi-angle information gives the worst results for the most generic classifications, it improves on the multispectral-only results as the number of classes increases. In addition, quantitative variables such as canopy cover and vegetation height were estimated from MISR information at different resolutions. In the valley of Alcudia (Ciudad Real), we estimated tree canopy cover for a 275 m pixel using neural networks. The results showed that using multispectral and multi-angle information can improve by almost 20% the estimates based only on multispectral data. Furthermore, the relationships obtained reached an R coefficient of 0.7 with errors below 10% in canopy cover, much better than those obtained with the product derived from multispectral data of the Moderate Resolution Imaging Spectroradiometer (MODIS), also onboard Terra, for the same variable. Finally, we estimated the canopy cover and the effective height of the vegetation for 700,000 hectares in the province of Murcia, with a spatial resolution of 1,100 m. The results show a relationship between the spectral and the multi-angle data, with Spearman’s coefficients of about 0.8 for vegetation canopy cover and 0.4 for effective height.
The estimates of both variables using neural networks and various combinations of data yield R coefficients greater than 0.85 for canopy cover and 0.6 for effective height. The multi-angle parameters provided in the MISR products at 1,100 m pixel size did not produce good results by themselves, but improved the results when added to the spectral information. The mean square errors were less than 0.016 for canopy cover (expressed as a fraction of one) and 0.7 m for effective height. Geographically weighted regressions also showed that even better results can be obtained locally, especially where the spatial variability of the estimated variables is high. In summary, the use of the data provided by MISR offers a promising way of improving remote sensing performance at low-medium spatial resolution, both for image classification and for the estimation of quantitative variables of the vegetation structure.
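The abstract above summarises its estimates with Spearman's coefficient and with errors on canopy cover and height. As a rough illustration of what those two figures measure, the following sketch computes a root-mean-square error and Spearman's rank correlation in plain Python. The function names and the sample values are invented for the example and are not data from the thesis.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """1-based ranks of the values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented canopy-cover fractions (0 to 1) for five hypothetical pixels.
field_cover = [0.10, 0.35, 0.40, 0.62, 0.80]
model_cover = [0.12, 0.30, 0.45, 0.60, 0.78]
print("RMSE:", rmse(field_cover, model_cover))
print("Spearman:", spearman(field_cover, model_cover))
```

Spearman's coefficient depends only on the ordering of the values, so it rewards estimates that rank pixels correctly even when the absolute values are biased, whereas the RMSE is expressed in the units of the variable itself (a fraction of one for canopy cover, metres for height).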

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aiming to address requirements concerning integration of services in the context of “big data”, this paper presents an innovative approach that (i) ensures a flexible, adaptable and scalable information and computation infrastructure, and (ii) exploits the competences of stakeholders and information workers to meaningfully confront information management issues such as information characterization, classification and interpretation, thus incorporating the underlying collective intelligence. Our approach pays much attention to the issues of usability and ease-of-use, not requiring any particular programming expertise from the end users. We report on a series of technical issues concerning the desired flexibility of the proposed integration framework and we provide related recommendations to developers of such solutions. Evaluation results are also discussed.