226 results for lazy parsing
Abstract:
Integrated master's dissertation in Information Systems Engineering and Management
Abstract:
Decision support models in intensive care units are developed to support medical staff in their decision-making process. However, optimizing these models is particularly difficult because of their dynamic, complex and multidisciplinary nature. Thus, there is constant research into and development of new algorithms capable of extracting knowledge from large volumes of data, in order to obtain better predictive results than current algorithms. To test the optimization techniques, a case study with real data provided by the INTCare project was explored. These data concern extubation cases. On this dataset, several models such as Evolutionary Fuzzy Rule Learning, Lazy Learning, Decision Trees and many others were analysed in order to detect early extubation. The hybrids Decision Trees Genetic Algorithm, Supervised Classifier System and KNNAdaptive achieved the highest accuracy rates (93.2%, 93.1% and 92.97% respectively), thus showing their feasibility for use in a real environment.
Abstract:
An increasing number of studies have sprung up in recent years seeking to identify individual inventors from patent data. Different heuristics have been suggested for using their names and other information disclosed in patent documents to find out “who is who” in patents. This paper contributes to this literature by setting forth a methodology to identify inventors using patents applied for at the European Patent Office (EPO hereafter). As in much of this literature, we basically follow a three-step procedure: (1) the parsing stage, aimed at reducing the noise in the inventor’s name and other fields of the patent; (2) the matching stage, where name matching algorithms are used to group possibly similar names; (3) the filtering stage, where additional information and different scoring schemes are used to filter these potentially identical inventors. The paper includes some figures resulting from applying the algorithms to the set of European inventors applying to the EPO over a long period of time.
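The three-stage procedure described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, the `city` and `ipc` fields, the scoring rule and the sample records are all assumptions chosen only to show how parsing, matching and filtering fit together.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def parse_name(raw):
    """Parsing stage: remove accents, punctuation and case noise."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(text.lower().split())


def match_names(records, threshold=0.85):
    """Matching stage: pair records whose cleaned names are similar.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    pairs = []
    for a, b in combinations(records, 2):
        ratio = SequenceMatcher(
            None, parse_name(a["name"]), parse_name(b["name"])
        ).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


def filter_pairs(pairs, min_score=1):
    """Filtering stage: keep pairs supported by additional information.

    One point per shared attribute; the attributes are illustrative only.
    """
    kept = []
    for a, b in pairs:
        score = int(a["city"] == b["city"]) + int(a["ipc"] == b["ipc"])
        if score >= min_score:
            kept.append((a["id"], b["id"]))
    return kept


# Made-up records for illustration.
records = [
    {"id": 1, "name": "Müller, Hans", "city": "Munich", "ipc": "H04L"},
    {"id": 2, "name": "MULLER HANS", "city": "Munich", "ipc": "H04L"},
    {"id": 3, "name": "Muller, Hans", "city": "Lyon", "ipc": "A61K"},
    {"id": 4, "name": "Garcia, Ana", "city": "Madrid", "ipc": "H04L"},
]

same_inventor = filter_pairs(match_names(records))
print(same_inventor)
```

In this toy data, records 1, 2 and 3 share the same cleaned name, but only records 1 and 2 also share other attributes, so the filtering stage keeps that single pair.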
Abstract:
The Final Year Project consists of two essentially different parts, which share a common theme: HTML code validation. The first part focuses on the study of the validation process. It supplies a brief introduction to the evolution of HTML and XHTML, the new tags introduced in HTML5 and the most common errors found in today's websites. Existing HTML validation tools are analyzed and examined in detail in order to compare their features and evaluate their performance. Lastly, a comparison of the parsing process in today's most common browsers is provided. In the second part, the focus shifts towards the development of an XHTML5 validation tool. The input is an XHTML5 file whose content may or may not comply with the W3C specification and which, therefore, may or may not be a valid XHTML5 document. The output provided by this tool is a fixed XHTML5 document and an error log returned in the form of an XML file. The log also includes the location of each error and the course of action taken to fix it.
Abstract:
One major transition in evolution is the shift from solitary to social life. Sociality has evolved in a few taxa of the animal kingdom, most notably in the social insects, which have achieved the highest level of sociality: eusociality. Colonies of social insects are formed by a reproductive queen and many non-reproductive or sterile workers who help raise their mother queen's offspring. Kin selection theory explains worker behaviour in terms of the indirect fitness they gain from raising non-offspring kin. It therefore predicts that workers should stay faithful to their natal nests, to which they are most closely related. However, in the tropical paper wasp Polistes canadensis, high levels of nest-drifting, whereby workers spend time on other neighbouring nests, have been reported. This PhD aimed at understanding the mechanisms involved in this peculiar behaviour as well as its implications for kin selection theory. I examined nest-drifting through the study of the social dynamics of the tropical paper wasp P. canadensis. My results showed that populations of this species are composed of different aggregations of nests. The studied populations showed little genetic viscosity (limited dispersal), despite previous suggestions, but nests within these aggregations were more related to each other than to nests outside of aggregations. This suggested that drifters may gain indirect fitness benefits when helping on neighbouring nests. Drifting was unlikely to be accidental, since drifting rates varied and the observed movements between nests persisted over several time periods during monitoring. Workload (differences in colony-level foraging effort between visited and natal nests) was also a potential factor explaining nest-drifting in P. canadensis. Worker and brood removal experiments revealed that drifters do not respond to changes in the need for help in the non-natal nests they visit. Drifters thus bias their help towards their natal nests, from which they may benefit the most in terms of indirect fitness, before investing in others. Altogether, these results on nest-drifting in P. canadensis are consistent and suggest that nest-drifting is an important alternative reproductive strategy, contributing to the indirect fitness benefits gained by non-reproductive wasps. Additionally, this PhD provides information on the genetic structure of paper wasp populations and demonstrates the role of inactive or lazy wasps as a "reserve worker force", which provides resilience to the colony in the event of worker mortality. More generally, this work further highlights the complex organization and adaptability of individuals in insect societies.
Abstract:
The physiological basis of human cerebral asymmetry for language remains mysterious. We have used simultaneous physiological and anatomical measurements to investigate the issue. Concentrating on neural oscillatory activity in speech-specific frequency bands and exploring interactions between gestural (motor) and auditory-evoked activity, we find, in the absence of language-related processing, that left auditory, somatosensory, articulatory motor, and inferior parietal cortices show specific, lateralized, speech-related physiological properties. With the addition of ecologically valid audiovisual stimulation, activity in auditory cortex synchronizes with left-dominant input from the motor cortex at frequencies corresponding to syllabic, but not phonemic, speech rhythms. Our results support theories of language lateralization that posit a major role for intrinsic, hardwired perceptuomotor processing in syllabic parsing and are compatible both with the evolutionary view that speech arose from a combination of syllable-sized vocalizations and meaningful hand gestures and with developmental observations suggesting phonemic analysis is a developmentally acquired process.
Abstract:
BACKGROUND: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to these data very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format in which molecular interactions should be presented, but it focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, a detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format, MITAB2.5, has been developed for the benefit of users who require only minimal information in an easy-to-access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sectors, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
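As a rough illustration of why a tab-delimited format is quick to parse, the sketch below reads such a file lazily, one row at a time, in Python (the abstract mentions Perl; Python is used here only for illustration). It assumes that the first two columns hold the identifiers of the two interactors and that header lines start with `#`; the function name, the sample identifiers and the third column are made up.

```python
import csv
import io


def iter_interactions(handle):
    """Lazily yield (interactor_a, interactor_b) from a tab-delimited stream.

    Rows are produced one at a time, so a large file is never held in
    memory as a whole. Column positions are an assumption of this sketch.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, short rows and '#'-prefixed header lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        yield row[0], row[1]


# Made-up sample data standing in for a real file on disk.
sample = io.StringIO(
    "#ID A\tID B\tdetection method\n"
    "uniprotkb:P11111\tuniprotkb:P22222\tmethod-x\n"
    "\n"
    "uniprotkb:P33333\tuniprotkb:P44444\tmethod-y\n"
)

pairs = list(iter_interactions(sample))
print(pairs)
```

Because `iter_interactions` is a generator, a caller that only needs the first few interactions can stop iterating early without reading the rest of the file.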
Abstract:
This master's thesis studies how the Eclipse environment can be used in test case generation. One of the main topics of the thesis is to examine whether existing Eclipse components can be used to improve symbol knowledge, so that more information becomes available for test case generation. The thesis first gives a brief overview of software testing, so that the reader understands which area of software engineering the thesis addresses. The test case generation process itself is then described in more detail. Once the basics have been covered, the reader is introduced to the Eclipse environment: what it is, what it consists of and what can be done with it. More detailed information is given on the Eclipse components that can be used to support test case generation. As an integration example, the thesis presents the integration of an existing test case generator into the Eclipse environment. Finally, the Eclipse-based solution is compared with the earlier solution in terms of symbol knowledge and run time. The thesis resulted in a prototype which demonstrated that it is possible to integrate a test case generator into the Eclipse environment and that doing so can increase symbol knowledge. This increase in knowledge, however, also increased the required run time, in some cases significantly. It was also noted that projects are currently under way to improve the performance of the Eclipse components used, which may improve the results in the future.
Abstract:
We present a new branch and bound algorithm for weighted Max-SAT, called Lazy, which incorporates original data structures and inference rules, as well as a lower bound of better quality. We provide experimental evidence that our solver is very competitive and outperforms some of the best performing Max-SAT and weighted Max-SAT solvers on a wide range of instances.
Abstract:
Software security has recently taken on an increasingly important role. Software design must be handled from the outset so that security is taken into account. Ease of use must not take precedence over security, nor should failing to read the instructions mean a loss of security. Lawful use of the software is also an important part of software security. Preventing unlawful use, however, is very difficult to implement in current systems. The aim of the work was to study Intellitel Communications Oy's messaging gateway, Intellitel Messaging Gateway, from a product security perspective, to find any flaws in it and to fix them.
Abstract:
This work studied the parsing of a file conforming to the IFC (Industry Foundation Classes) data model, the further processing of the data, and data transfer between applications. It examined what options exist for implementing data transfer programmatically and in which direction data transfer is heading in the future. In the applied part, the parsing of an ISO 10303 (Part 21) file conforming to the IFC standard and its interpretation into XML format were implemented. The application, written in the C# programming language, parses and interprets an IFC file produced with CAD software and stores the data in an XML database to be read by cost calculation software.
Abstract:
This thesis evaluates methods for obtaining high performance in applications running on the mobile Java platform. Based on the evaluated methods, a Java extension API running on top of the Symbian operating system was optimized. The API provides location-based services for mobile Java applications. As part of this thesis, the JNI implementation in Symbian OS was also benchmarked. A benchmarking tool was implemented in the analysis phase in order to run an extensive performance test set. The benchmark results showed that the landmarks implementation of the API performed very slowly with large amounts of data. The existing implementation proved to be very inconvenient to optimize because the early implementers did not take performance and design issues into consideration. A completely new architecture was implemented for the API in order to provide scalable landmark initialization and data extraction by using lazy initialization methods. Runtime memory consumption was also an important part of the optimization. Measurements taken after the optimization showed the improvement to be very effective. Most of the common API use cases performed extremely well compared to the old implementation. Performance is an important quality attribute of any piece of software, especially in embedded mobile devices. Typically, projects get into trouble with performance because there are no clear performance targets and no knowledge of how to achieve them. Well-known guidelines and performance models help to achieve good overall performance in Java applications and programming interfaces.
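The lazy initialization idea mentioned above can be sketched in a few lines. The example is in Python rather than the thesis's Java, and the `Landmark` class, its `details` property and the `load_count` counter are hypothetical names invented for illustration; the thesis's actual architecture is not described in the abstract.

```python
from functools import cached_property


class Landmark:
    """Hypothetical landmark whose expensive data is loaded on first use."""

    load_count = 0  # counts how many times the expensive load has run

    def __init__(self, landmark_id):
        # Construction is cheap: only the identifier is stored.
        self.landmark_id = landmark_id

    @cached_property
    def details(self):
        # Stand-in for an expensive lookup (for example, a database read).
        Landmark.load_count += 1
        return {"id": self.landmark_id, "name": f"landmark-{self.landmark_id}"}


# Creating many objects does not trigger any expensive loading.
landmarks = [Landmark(i) for i in range(1000)]

# The data is loaded only when first accessed, then cached on the instance.
first = landmarks[0].details
again = landmarks[0].details
print(Landmark.load_count)
```

Under these assumptions, initializing a large collection costs little up front, and the loading work is paid only for the landmarks an application actually touches.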
Abstract:
An increasing number of studies in recent years have sought to identify individual inventors from patent data. A variety of heuristics have been proposed for using the names and other information disclosed in patent documents to establish who is who in patents. This paper contributes to this literature by describing a methodology for identifying inventors using patents applied for at the European Patent Office (EPO hereafter). As in much of this literature, we basically follow a three-step procedure: (1) the parsing stage, aimed at reducing the noise in the inventor’s name and other fields of the patent; (2) the matching stage, where name matching algorithms are used to group similar names; and (3) the filtering stage, where additional information and various scoring schemes are used to filter these similarly named inventors. The paper presents the results obtained by using the algorithms with the set of European inventors applying to the EPO over a long period of time.
Abstract:
The extensional theory of arrays is one of the most important theories for applications of SAT Modulo Theories (SMT) to hardware and software verification. Here we present a new T-solver for arrays in the context of the DPLL(T) approach to SMT. The main characteristics of our solver are: (i) no translation of writes into reads is needed, (ii) there is no axiom instantiation, and (iii) the T-solver interacts with the Boolean engine by asking it to split on equality literals between indices. As far as we know, this is the first accurate description of an array solver integrated in a state-of-the-art SMT solver and, unlike most state-of-the-art solvers, it is not based on a lazy instantiation of the array axioms. Moreover, it is very competitive in practice, especially on problems that require heavy reasoning on array literals.
Abstract:
Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. 
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the best established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
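The link between the pairwise view of ranking and AUC mentioned above can be made concrete with a small sketch. It is not the thesis's RankRLS or cross-validation code. It only computes AUC in the bipartite setting as the fraction of (positive, negative) pairs that a scoring function orders correctly, counting ties as half, which is a common convention assumed here. The function name and the sample scores are made up.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of correctly ordered (positive, negative) pairs.

    labels are 1 (positive) or 0 (negative); ties count as half a pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up predictions: one positive (0.4) is ranked below a negative (0.8).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

auc = pairwise_auc(scores, labels)
print(auc)
```

Here three of the four positive-negative pairs are ordered correctly, so the pairwise error is 1/4 and the AUC is 0.75; minimizing the probability of misordering a random pair is therefore the same as maximizing this quantity. The double loop is quadratic in the number of examples and is meant only to show the definition, not an efficient implementation.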