980 results for Syntactic Projection
Abstract:
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
Abstract:
By theorems of Ferguson and Lacey ($d=2$) and Lacey and Terwilleger ($d>2$), Nehari's theorem is known to hold on the polydisc $\D^d$ for $d>1$, i.e., if $H_\psi$ is a bounded Hankel form on $H^2(\D^d)$ with analytic symbol $\psi$, then there is a function $\varphi$ in $L^\infty(\T^d)$ such that $\psi$ is the Riesz projection of $\varphi$. A method proposed in Helson's last paper is used to show that the constant $C_d$ in the estimate $\|\varphi\|_\infty\le C_d \|H_\psi\|$ grows at least exponentially with $d$; it follows that there is no analogue of Nehari's theorem on the infinite-dimensional polydisc.
Abstract:
PURPOSE: Iterative algorithms introduce new challenges in the field of image quality assessment. The purpose of this study is to use a mathematical model to evaluate objectively the low contrast detectability in CT. MATERIALS AND METHODS: A QRM 401 phantom containing 5 and 8 mm diameter spheres with contrast levels of 10 and 20 HU was used. The images were acquired at 120 kV with CTDIvol equal to 5, 10, 15, 20 mGy and reconstructed using the filtered back-projection (FBP), adaptive statistical iterative reconstruction 50% (ASIR 50%) and model-based iterative reconstruction (MBIR) algorithms. The model observer used is the Channelized Hotelling Observer (CHO). The channels are dense difference of Gaussian channels (D-DOG). The CHO performances were compared to the outcomes of six human observers having performed four alternative forced choice (4-AFC) tests. RESULTS: For the same CTDIvol level and according to the CHO model, the MBIR algorithm yields the highest detectability index. When some noise is added to the CHO model, the outcomes of human observers and the results of the CHO are highly correlated regardless of the dose levels, the signals considered and the algorithms used. The Pearson coefficient between the human observers and the CHO is 0.93 for FBP and 0.98 for MBIR. CONCLUSION: The human observers' performances can be predicted by the CHO model. This opens the way for proposing, in parallel to the standard dose report, the level of low contrast detectability expected. The introduction of iterative reconstruction requires such an approach to ensure that dose reduction does not impair diagnostics.
Abstract:
The main objective of this bachelor's thesis is to undertake a legal translation with everything that this entails: researching reliable sources, using the appropriate tools, and delivering the work by the agreed deadline, among other things. In this case, the text translated is the set of laws that regulate adoption in India. The thesis also briefly explains civil law in Catalonia and compares it with that of India, since the two are based on very different ideas. Translations of this kind demand precision and clarity because the concepts and syntactic structures tend to be highly complex. Each step followed to achieve the main objective is then described in detail.
Abstract:
We attempt to summarise the international reach of the poetry of Miquel Martí i Pol over more than fifty years. It has been translated into fourteen different languages, and poetic samples from all of his titles have been translated.
Abstract:
The role of grammatical class in lexical access and representation is still not well understood. Grammatical effects obtained in picture-word interference experiments have been argued to show the operation of grammatical constraints during lexicalization when syntactic integration is required by the task. Alternative views hold that the ostensibly grammatical effects actually derive from the coincidence of semantic and grammatical differences between lexical candidates. We present three picture-word interference experiments conducted in Spanish. In the first two, the semantic relatedness (related or unrelated) and the grammatical class (nouns or verbs) of the target and the distracter were manipulated in an infinitive form action naming task in order to disentangle their contributions to verb lexical access. In the third experiment, a possible confound between grammatical class and semantic domain (objects or actions) was eliminated by using action-nouns as distracters. A condition in which participants were asked to name the action pictures using an inflected form of the verb was also included to explore whether the need for syntactic integration modulated the appearance of grammatical effects. Whereas action-words (nouns or verbs), but not object-nouns, produced longer reaction times irrespective of their grammatical class in the infinitive condition, only verbs slowed latencies in the inflected form condition. Our results suggest that speech production relies on the exclusion of candidate responses that do not fulfil task-pertinent criteria like membership in the appropriate semantic domain or grammatical class. Taken together, these findings are explained by a response-exclusion account of speech output. This and alternative hypotheses are discussed.
Abstract:
In this paper, we propose a new supervised linear feature extraction technique for multiclass classification problems that is especially suited to the nearest neighbor (NN) classifier. The problem of finding the optimal linear projection matrix is defined as a classification problem, and the AdaBoost algorithm is used to compute it iteratively. This strategy allows the introduction of a multitask learning (MTL) criterion in the method and results in a solution that makes no assumptions about the data distribution and that is especially appropriate for solving the small sample size problem. The performance of the method is illustrated by an application to the face recognition problem. The experiments show that the representation obtained following the multitask approach improves on the classic feature extraction algorithms when using the NN classifier, especially when only a few examples from each class are available.
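As a rough illustration of why a linear projection can help a nearest neighbor classifier, the following minimal sketch projects samples with a matrix W before measuring distances. It does not reproduce the AdaBoost-based learning of the projection described in the abstract: W is fixed by hand, and the names `project`, `nn_classify` and the toy data are assumptions made only for this example.

```python
from math import dist


def project(W, x):
    """Apply a linear projection: each row of W yields one output feature."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]


def nn_classify(W, train_X, train_y, x):
    """Return the label of the training sample nearest to x in the projected space."""
    z = project(W, x)
    best = min(range(len(train_X)), key=lambda i: dist(project(W, train_X[i]), z))
    return train_y[best]


# Hypothetical toy data: the first feature separates the classes,
# the second feature is large-amplitude noise.
train_X = [[0.0, 5.0], [1.0, -5.0]]
train_y = ["a", "b"]
query = [0.9, 5.0]

W_keep_first = [[1.0, 0.0]]              # assumed hand-picked projection, not learned
W_identity = [[1.0, 0.0], [0.0, 1.0]]    # no feature extraction at all

print(nn_classify(W_keep_first, train_X, train_y, query))  # nearest neighbor after projection
print(nn_classify(W_identity, train_X, train_y, query))    # nearest neighbor on raw features
```

With the hand-picked projection the noisy feature is discarded, so the query is matched on the informative feature alone; with the identity matrix the noise dominates the distance. A learned projection, as in the paper, would aim to find such a matrix from labeled data.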
Abstract:
This study aims to analyse the cultural offering of the municipality of Palafrugell and its tourism projection, in order to formulate proposals for improvement and thereby help increase the tourist presence at this type of resource.
Abstract:
This thesis addresses the problem of computing the minimal and maximal diameter of the Cayley graph of Coxeter groups. We first present relevant parts of polytope theory and related Coxeter theory. After this, a method of contracting the orthogonal projections of a polytope from R^d onto R^2 and R^3, d ≥ 3, is presented. This method is the Equality Set Projection (ESP) algorithm, which requires a constant number of linear programming problems per facet of the projection in the absence of degeneracy. The ESP algorithm also allows us to compute projected geometric diameters of high-dimensional polytopes. A representative set of projected polytopes is presented to illustrate the methods adopted in this thesis.
Abstract:
One of the major interests in soil analysis is the evaluation of its chemical, physical and biological parameters, which are indicators of soil quality (the most important being organic matter). There is also great interest in the study of humic substances and in the assessment of pollutants, such as pesticides and heavy metals, in soils. Chemometrics is a powerful tool for dealing with these problems and can help soil researchers extract much more information from their data. In spite of this, such strategies have gained prominence in the literature only recently. The utilization of chemometric methods in soil analysis is evaluated in this article. The applications are divided into four parts (with emphasis on the first two): (i) descriptive and exploratory methods based on Principal Component Analysis (PCA); (ii) multivariate calibration methods (MLR, PCR and PLS); (iii) methods such as Evolving Factor Analysis and SIMPLISMA; and (iv) artificial intelligence methods, such as Artificial Neural Networks.
Abstract:
The equilibria, the spectra and the identities of the species of Cr(VI) that are present in aqueous solution have long been an active subject of discussion in the literature. In this paper, three different chemometric methodologies are applied to sets of UV/Visible spectra of aqueous Cr(VI) solutions, in order to solve a chemical system where there is no available information concerning the composition of the samples nor spectral information about the pure species. Imbrie Q-mode factor analysis, followed by varimax rotation and Imbrie oblique projection, was used to estimate the composition of Cr(VI) equilibrium solutions and, by combining these results with the k-matrix method, to obtain the pure spectra of the species. Evolving factor analysis and self modeling curve resolution were used to confirm the number of the species and the resolution of the system, respectively. Sets of 3.3×10^-4 and 3.3×10^-5 mol L^-1 Cr(VI) solutions were analyzed in the pH range from 1 to 12. Two factors were identified, which were related to the chromate ion (CrO4(2-)) and bichromate ion (HCrO4-). The pK of the equilibrium was estimated as 5.8.
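To make the reported pK concrete, the sketch below computes the fraction of Cr(VI) present as chromate at a given pH. It assumes an idealised two-species equilibrium, HCrO4- <=> H+ + CrO4(2-), with activities equal to concentrations and no other species; the function name `chromate_fraction` is introduced only for this example and is not taken from the paper.

```python
def chromate_fraction(pH: float, pK: float = 5.8) -> float:
    """Fraction of Cr(VI) present as CrO4(2-), assuming only HCrO4- and CrO4(2-) exist.

    From K = [H+][CrO4(2-)] / [HCrO4-], the ratio [CrO4(2-)] / [HCrO4-] is
    10**(pH - pK), so the chromate fraction is 1 / (1 + 10**(pK - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pK - pH))


# Scan the pH range studied in the abstract (1 to 12).
for pH in (1.0, 5.8, 12.0):
    print(f"pH {pH:>4}: chromate fraction = {chromate_fraction(pH):.6f}")
```

Under these assumptions the two species are present in equal amounts at pH 5.8, bichromate dominates at the acidic end of the range and chromate at the basic end.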
Abstract:
This paper describes the development of a two-way shallow-transfer rule-based machine translation system between Bulgarian and Macedonian. It gives an account of the resources and the methods used for constructing the system, including the development of monolingual and bilingual dictionaries, syntactic transfer rules and constraint grammars. The system's performance was evaluated and compared to that of a commercially available MT system for the two languages. Directions for future work are suggested.
Abstract:
The goal of this article is to analyse the main criteria recommended by style guides aimed at making women more visible or, in other words, at making a non-sexist use of language. I will concentrate on two main aspects: the syntactic-semantic and the discursive. From a syntactic-semantic point of view, the main elements studied are those related to coordination, agreement and repetition or omission of elements (for instance, noun specifiers), and also the way the different options chosen affect the meaning of the sentence. From a discursive point of view, the elements analysed are those related to coreference, that is, the relationship between the different ways of expressing the same referent through nominal elements throughout the text, and the effect they produce in the text as a whole. With this goal in mind, the study analyses from a qualitative point of view the data from a corpus of texts from three different areas (politics, administration and education) in which this kind of criteria is often applied. Keywords: Catalan, non-sexist language, female linguistic visibility, syntax, cohesion, co-reference, androcentric language, style.
Abstract:
Most research on the underlying causes of social and communicative impairment in autism spectrum disorders (ASD) has been devoted to pragmatic aspects of language. The present research explores syntactic knowledge as a probable underlying mechanism of language deficit in ASD. Three groups comprising high-functioning ASD, low-functioning ASD, and typically developing 5-year-old Persian-speaking children were tested on comprehension of passive sentences. Results suggest that while low-functioning children with ASD might be impaired in the area of grammar, high-functioning children with ASD are not. The new results are compared to those of two recent studies on comprehension of passives in Greek-speaking and English-speaking subjects with ASD (Perovic et al., 2007; Terzi et al., to appear).
Abstract:
Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without a proper analysis. This has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This makes certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures or partial least squares), but there are also other methods that should be considered. The more advanced methods include multi block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should be different than in the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences in data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi block method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS.
The third paper considers applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.