Abstract:
The purpose of this thesis is the automatic construction of ontologies from texts, which places it within the area known as Ontology Learning. This discipline aims to automate the construction of domain models from structured or unstructured information sources; it originated around the turn of the millennium, as a result of the exponential growth in the volume of information accessible on the Internet. Since most of this information is presented on the web in the form of text, automatic ontology learning has focused on the analysis of this type of source, drawing over the years on very diverse techniques from areas such as Information Retrieval, Information Extraction, Summarization and, in general, from areas related to natural language processing. The main contribution of this thesis is that, in contrast with the majority of current techniques, the proposed method does not analyze the surface syntactic structure of language, but explores its deep semantic level. Its objective, therefore, is to infer the domain model from the way the meanings of sentences are articulated in natural language. Since the deep semantic level is independent of the language, the method can operate in multilingual scenarios, in which it is necessary to combine information from texts in different languages. To access this level of language, the method uses the interlingua model. These formalisms, which come from the area of machine translation, make it possible to represent the meaning of sentences independently of the language. In particular, UNL (Universal Networking Language) will be used, considered to be the only general-purpose interlingua that is standardized.
The approach used in this thesis continues previous work carried out both by its author and by the research group to which he belongs, which studied how to use the interlingua model in the areas of multilingual information extraction and retrieval. Essentially, the procedure defined by the method tries to identify, in the UNL representation of the texts, certain regularities from which the pieces of the domain ontology can be deduced. Since UNL is a formalism based on semantic networks, these regularities take the form of graphs, generalized into structures called linguistic patterns. On the other hand, UNL still preserves certain discourse-cohesion mechanisms inherited from natural languages, such as the phenomenon of anaphora. In order to improve the understanding of expressions, the method provides, as another significant contribution, an algorithm for the resolution of pronominal anaphora within the interlingua model, limited to the case of third-person personal pronouns whose antecedent is a proper noun. The proposed method rests on a formal framework, elaborated by adapting certain definitions from graph theory and incorporating new ones, in order to accommodate the notions of UNL expression and linguistic pattern, as well as the pattern-matching operations that are the basis of the method's processes. Both the formal framework and all the processes defined by the method have been implemented in order to carry out the experimentation, which was applied to an article from the UNESCO "Encyclopedia of Life Support Systems" (EOLSS) collection.
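The pattern-matching idea at the heart of the method can be sketched in miniature: a UNL expression modeled as a set of relation triples, and a linguistic pattern whose variable nodes are bound by a backtracking matcher. All names here (the `aoj` relation label, the toy graph) are illustrative assumptions, not the thesis's actual formalism.

```python
# Hypothetical sketch of matching a linguistic pattern against a UNL-style
# semantic graph. A UNL expression is modeled as a set of relation triples
# (relation, source, target); pattern nodes starting with "?" are variables.

def match_pattern(pattern, graph):
    """Return every variable binding under which all pattern triples
    appear in the graph."""
    results = []

    def backtrack(remaining, binding):
        if not remaining:
            results.append(dict(binding))
            return
        rel, src, tgt = remaining[0]
        for grel, gsrc, gtgt in graph:
            if grel != rel:
                continue
            new = dict(binding)
            ok = True
            for p, g in ((src, gsrc), (tgt, gtgt)):
                if p.startswith("?"):
                    if new.get(p, g) != g:   # already bound to something else
                        ok = False
                        break
                    new[p] = g
                elif p != g:                 # constant node must match exactly
                    ok = False
                    break
            if ok:
                backtrack(remaining[1:], new)

    backtrack(list(pattern), {})
    return results

# Toy UNL-like graph asserting taxonomic facts via an "aoj" relation.
graph = {("aoj", "animal", "dog"), ("aoj", "animal", "cat")}
# Pattern: aoj(?class, ?x) -- a candidate taxonomic regularity.
pattern = [("aoj", "?class", "?x")]
bindings = match_pattern(pattern, graph)
```

Each binding returned here corresponds to one occurrence of the pattern in the expression graph, which is the raw material from which ontology pieces would be deduced.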
Abstract:
The mutagenic activity of the major DNA adduct formed by the liver carcinogen aflatoxin B1 (AFB1) was investigated in vivo. An oligonucleotide containing a single 8,9-dihydro-8-(N7-guanyl)-9-hydroxyaflatoxin B1 (AFB1-N7-Gua) adduct was inserted into the single-stranded genome of bacteriophage M13. Replication in SOS-induced Escherichia coli yielded a mutation frequency for AFB1-N7-Gua of 4%. The predominant mutation was G --> T, identical to the principal mutation in human liver tumors believed to be induced by aflatoxin. The G --> T mutations of AFB1-N7-Gua, unlike those of the AFB1-N7-Gua-derived apurinic site, were much more strongly dependent on MucAB than UmuDC, a pattern matching that in intact cells treated with the toxin. It is concluded that the AFB1-N7-Gua adduct, and not the apurinic site, has genetic requirements for mutagenesis that best explain mutations in aflatoxin-treated cells. While most mutations were targeted to the site of the lesion, a significant fraction (13%) occurred at the base 5' to the modified guanine. In contrast, the apurinic site-containing genome gave rise only to targeted mutations. The mutational asymmetry observed for AFB1-N7-Gua is consistent with structural models indicating that the aflatoxin moiety of the aflatoxin-guanine adduct is covalently intercalated on the 5' face of the guanine residue. These results suggest a molecular mechanism that could explain an important step in the carcinogenicity of aflatoxin B1.
Abstract:
This paper reports on the development of an artificial neural network (ANN) method to detect laminar defects following the pattern matching approach utilizing dynamic measurement. Although structural health monitoring (SHM) using ANN has attracted much attention in the last decade, the problem of how to select the optimal class of ANN models has not been investigated in great depth. It turns out that the lack of a rigorous ANN design methodology is one of the main reasons for the delay in the successful application of the promising technique in SHM. In this paper, a Bayesian method is applied in the selection of the optimal class of ANN models for a given set of input/target training data. The ANN design method is demonstrated for the case of the detection and characterisation of laminar defects in carbon fibre-reinforced beams using flexural vibration data for beams with and without non-symmetric delamination damage.
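The Bayesian selection of a model class can be illustrated with a deliberately simplified stand-in: instead of ANN classes, two regression model classes are compared with the Bayesian Information Criterion (an asymptotic approximation to Bayesian evidence). The models and data below are invented for illustration, not drawn from the paper.

```python
import math

def bic(rss, n, k):
    """Bayesian Information Criterion for a Gaussian-noise model
    (lower is better); k is the number of free parameters."""
    sigma2 = rss / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return k * math.log(n) - 2 * log_lik

def fit_mean(xs, ys):
    """Model class 0: constant model; returns residual sum of squares."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_line(xs, ys):
    """Model class 1: least-squares line; returns residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [0, 1, 2, 3, 4, 5]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]   # roughly y = 2x + 1 with noise
candidates = {"constant": (fit_mean, 1), "linear": (fit_line, 2)}
scores = {name: bic(fit(xs, ys), len(xs), k)
          for name, (fit, k) in candidates.items()}
best = min(scores, key=scores.get)
```

The same idea scales to ANN model classes: each candidate architecture is trained, and the class with the best evidence (here approximated by BIC) is selected rather than the one with the best raw training fit.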
Abstract:
This thesis begins by providing a review of techniques for interpreting the thermal response at the earth's surface acquired using remote sensing technology. Historic limitations in the precision with which imagery acquired from airborne platforms can be geometrically corrected and co-registered have meant that relatively little work has been carried out examining the diurnal variation of surface temperature over wide regions. Although emerging remote sensing systems provide the potential to register temporal image data within satisfactory levels of accuracy, this technology is still not widely available and does not address the issue of historic data sets which cannot be rectified using conventional parametric approaches. In overcoming these problems, the second part of this thesis describes the development of an alternative approach for rectifying airborne line-scanned imagery. The underlying assumption that scan lines within the imagery are straight greatly reduces the number of ground control points required to describe the image geometry. Furthermore, the use of pattern matching procedures to identify geometric disparities between raw line-scanned imagery and corresponding aerial photography enables the correction procedure to be almost fully automated. By reconstructing the raw image data on a truly line-by-line basis, it is possible to register the airborne line-scanned imagery to the aerial photography with an average accuracy of better than one pixel. Provided corresponding aerial photography is available, this approach can be applied in the absence of platform altitude information, allowing multi-temporal data sets to be corrected and registered.
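The pattern-matching step, locating a raw scan line against reference photography, is in essence a correlation search. A one-dimensional sketch with invented intensity profiles (the real procedure works on 2-D imagery and sub-pixel disparities):

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db) if da and db else 0.0

def best_offset(line, reference):
    """Slide the scan line along the reference profile and return the
    offset with the highest correlation score."""
    best, best_score = 0, -2.0
    for s in range(len(reference) - len(line) + 1):
        score = ncc(line, reference[s:s + len(line)])
        if score > best_score:
            best, best_score = s, score
    return best

# Invented reference profile (e.g. from aerial photography) and scan line.
reference = [0, 0, 1, 3, 7, 3, 1, 0, 0, 0]
scan_line = [1, 3, 7, 3, 1]
offset = best_offset(scan_line, reference)
```

Repeating this search line by line yields the geometric disparities from which each straight scan line can be re-registered to the photography.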
Abstract:
Purpose – To examine management literature for guidance on what constitutes a discipline. To examine supply management publications to determine whether the field constitutes a discipline or an emerging discipline. To contribute a structured evaluation to the body of supply management theory/discipline development knowledge. Design/methodology/approach – Literature review of what constitutes a discipline and an initial assessment of whether supply management is a discipline. Development of research questions used to design tests, using combinations of qualitative pattern matching, journal quality rankings, and social science citations index impact factor. Application of the tests, to evaluate field coherence, quality and the existence of a discipline-debate, to determine whether supply management is an emerging discipline. Findings – An initial literature review finds supply management not to be a discipline, as the field lacks quality of theoretical development and discussion, and coherence. Tests for increasing evidence of coherence, quality and impact yield positive results, indicating that supply management is progressing in its theoretical development. The test findings combined with the existence of the start of a discipline-debate indicate that supply management should be judged to be an emerging discipline. Originality/value – Drawing from the management literature, the paper provides a unique structured evaluation of the field of supply management, finding it not to be a discipline, but showing evidence of being an emerging discipline.
Abstract:
Purpose - The purpose of this paper is to assess high-dimensional visualisation, combined with pattern matching, as an approach to observing dynamic changes in the ways people tweet about science topics. Design/methodology/approach - The high-dimensional visualisation approach was applied to three scientific topics to test its effectiveness for longitudinal analysis of message framing on Twitter over two disjoint periods in time. The paper uses coding frames to drive categorisation and visual analytics of tweets discussing the science topics. Findings - The findings point to the potential of this mixed-methods approach, as it allows sufficiently high sensitivity to recognise and support the analysis of non-trending as well as trending topics on Twitter. Research limitations/implications - Three topics are studied and these illustrate a range of frames, but results may not be representative of all scientific topics. Social implications - Funding bodies increasingly encourage scientists to participate in public engagement. As social media provides an avenue actively utilised for public communication, understanding the nature of the dialogue on this medium is important for the scientific community and the public at large. Originality/value - This study differs from standard approaches to the analysis of microblog data, which tend to focus on machine-driven analysis of large-scale datasets. It provides evidence that this approach enables practical and effective analysis of the content of midsize to large collections of microposts.
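The coding-frame categorisation step can be sketched as keyword matching, a crude stand-in for the paper's manually developed frames; the frame names and indicator keywords below are entirely invented for illustration.

```python
# Toy sketch of coding-frame categorisation: each frame is a set of
# indicator keywords (hypothetical), and a tweet is assigned every frame
# whose keywords it mentions.
FRAMES = {
    "risk":    {"danger", "risk", "threat"},
    "benefit": {"cure", "breakthrough", "progress"},
}

def categorise(tweet):
    """Return the sorted list of frames whose keywords the tweet mentions."""
    words = set(tweet.lower().split())
    return sorted(frame for frame, keys in FRAMES.items() if words & keys)

tweets = ["New breakthrough in gene therapy",
          "GMO crops pose a hidden danger"]
labels = [categorise(t) for t in tweets]
```

Counting the resulting labels per time period is what would then feed the high-dimensional visualisation of framing changes over time.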
Abstract:
To promote regional or mutual improvement, numerous interjurisdictional efforts to share tax bases have been attempted. Most of these efforts fail to be consummated. Motivations to share revenues include: narrowing fiscal disparities, enhancing regional cooperation and economic development, rationalizing land-use, and minimizing revenue losses caused by competition to attract and keep businesses. Various researchers have developed theories to aid understanding of why interjurisdictional cooperation efforts succeed or fail. Walter Rosenbaum and Gladys Kammerer studied two contemporaneous Florida local-government consolidation attempts. Boyd Messinger subsequently tested their Theory of Successful Consolidation on nine consolidation attempts. Paul Peterson's dual theories on Modern Federalism posit that all governmental levels attempt to further economic development and that politicians act in ways that either further their futures or cement job security. Actions related to the latter theory often interfere with the former. Samuel Nunn and Mark Rosentraub sought to learn how interjurisdictional cooperation evolves. Through multiple case studies they developed a model framing interjurisdictional cooperation in four dimensions.
This dissertation investigates the ability of the above theories to help predict success or failure of regional tax-base revenue sharing attempts. A research plan was formed that used five sequenced steps to gather data, analyze it, and conclude whether hypotheses concerning the application of these theories were valid. The primary analytical tools were: multiple case studies, cross-case analysis, and pattern matching. Data was gathered from historical records, questionnaires, and interviews.
The results of this research indicate that the Rosenbaum-Kammerer theory can be a predictor of success or failure in implementing tax-base revenue sharing if it is amended as suggested by Messinger and further modified by a recommendation in this dissertation. Peterson's Functional and Legislative theories, considered together, were able to predict revenue sharing proposal outcomes. Many of the indicators of interjurisdictional cooperation forwarded in the Nunn-Rosentraub model appeared in the cases studied, but the model was not a reliable forecasting instrument.
Abstract:
After a productivity decrease of established national export industries in Finland such as mobile and paper industries, innovative, smaller companies with the intentions to internationalize right from the start have been proliferating. For software companies early internationalization is an especially good opportunity, as Internet usage becomes increasingly homogeneous across borders and software products often do not need a physical distribution channel. Globalization also makes Finnish companies turn to unfamiliar export markets like Latin America, a very untraditional market for Finns. Relationships consisting of Finnish and Latin American business partners have therefore not been widely studied, especially from a new-age software company’s perspective. To study these partnerships, relationship marketing theory was taken into the core of the study, as its practice focuses mainly on establishing and maintaining relationships with stakeholders at a profit, so that the objectives of all parties are met, which is done by a mutual exchange and fulfillment of promises. The most important dimensions of relationship marketing were identified as trust, commitment and attraction, which were then focused on, as the study aims to understand the implications Latin American business culture has for the understanding, and hence, effective application of relationship marketing in the Latin American market. The question to be answered, consequently, was: how should the dimensions of trust, commitment and attraction be understood in business relationships in Latin America? The study was conducted by first joining insights given by Latin American business culture literature with overall theories on the three dimensions. Through pattern matching, these insights were compared to empirical evidence collected from business professionals of the Latin American market and from the experiences of Finnish software businesses that had recently expanded into the market.
What was found was that previous literature on Latin American business culture had already named many implications for the relationship marketing dimensions that were relevant also for small Finnish software firms on the market. However, key findings also presented important new drivers for the three constructs. Local presence in the area where the Latin American partner is located was found to drive or enhance trust, commitment and attraction. High-frequency follow-up procedures were in turn found to drive commitment and attraction. Both local presence and follow-up were defined according to the respective evidence in the study. Also, in the context of Finnish software firms in relationships with Latin American partners, the national origin or foreignness of the Finnish party was seen to enhance trust and attraction in the relationship.
Abstract:
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XML-based document representation is linked via a shadow tree of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer, any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via "read aloud" or "play aloud") can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.
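The shadow tree of bi-directional pointers can be sketched as a node type carrying both an XML element id and a PDF text span, indexed in both directions; all names and the span representation here are invented for illustration, not the paper's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class ShadowNode:
    xml_id: str      # id of the element in the XML representation
    pdf_span: tuple  # (page, start, end) of the rendered text in the PDF
    children: list = field(default_factory=list)

class ShadowTree:
    """Bi-directional index over shadow nodes: XML id <-> PDF span."""

    def __init__(self):
        self.by_xml = {}
        self.by_pdf = {}

    def add(self, node):
        self.by_xml[node.xml_id] = node
        self.by_pdf[node.pdf_span] = node
        for child in node.children:
            self.add(child)

    def xml_to_pdf(self, xml_id):
        """Selection in the XML window -> highlight region in the PDF."""
        return self.by_xml[xml_id].pdf_span

    def pdf_to_xml(self, pdf_span):
        """Selection in the PDF window -> corresponding XML element."""
        return self.by_pdf[pdf_span].xml_id

tree = ShadowTree()
tree.add(ShadowNode("sec1", (1, 0, 120),
                    children=[ShadowNode("eq1", (1, 40, 55))]))
```

A two-window viewer would drive `pdf_to_xml` from mouse selections in the PDF pane and `xml_to_pdf` from selections in the XML pane.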
Abstract:
SQL Injection Attack (SQLIA) remains a technique used by computer network intruders to pilfer an organisation’s confidential data. An intruder re-crafts web form inputs and query strings used in web requests with the malicious intent of compromising the security of the confidential data stored in the organisation’s back-end database. The database is the most valuable data source, and intruders are therefore unrelenting in evolving new techniques to bypass the signature-based solutions currently provided in Web Application Firewalls (WAF) to mitigate SQLIA. There is therefore a need for an automated, scalable methodology for pre-processing SQLIA features fit for a supervised learning model. However, obtaining a ready-made scalable dataset whose items are feature-engineered into numerical attributes, suitable for training Artificial Neural Network (ANN) and Machine Learning (ML) models, is a known obstacle to applying artificial intelligence effectively against ever-evolving novel SQLIA signatures. The proposed approach applies a numerical attribute encoding ontology to encode features (both legitimate web requests and SQLIA) as numerical data items, so as to extract a scalable dataset as input to a supervised learning model, as a step towards an ML SQLIA detection and prevention model. In the numerical attribute encoding of features, the proposed model explores a hybrid of static and dynamic pattern matching by implementing a Non-Deterministic Finite Automaton (NFA). This is combined with a proxy and a SQL parser Application Programming Interface (API) to intercept and parse web requests in transit to the back-end database. In developing a solution to address SQLIA, this model allows processed web requests at the proxy that are deemed to contain an injected query string to be excluded from reaching the target back-end database.
This paper evaluates the performance metrics of a dataset obtained by numerical encoding of the feature ontology in Microsoft Azure Machine Learning (MAML) studio, using a Two-Class Support Vector Machine (TCSVM) binary classifier. This methodology then forms the subject of the empirical evaluation.
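The static pattern-matching component can be illustrated with a hand-built NFA that flags the classic tautology shape (a quote followed by OR value = value) in a tokenised query string; the token names and transition table below are invented for illustration, not the paper's automaton.

```python
# NFA transitions: state -> {token: set of next states}.
# "ANY" matches any token (this is where the nondeterminism comes from:
# a token can both advance the pattern and restart the scan).
NFA = {
    0: {"QUOTE": {1}, "ANY": {0}},
    1: {"OR": {2}, "ANY": {0}},
    2: {"VAL": {3}, "ANY": {0}},
    3: {"EQ": {4}, "ANY": {0}},
    4: {"VAL": {5}, "ANY": {0}},
}
ACCEPT = {5}

def nfa_match(tokens):
    """Run all NFA states in parallel over the token stream; report
    whether any path reaches an accepting state."""
    states = {0}
    for tok in tokens:
        nxt = set()
        for s in states:
            table = NFA.get(s, {})
            nxt |= table.get(tok, set())
            nxt |= table.get("ANY", set())
        states = nxt or {0}
        if states & ACCEPT:
            return True
    return False
```

For example, the token stream for a request like `name = 'x' OR 1 = 1` (VAL EQ QUOTE OR VAL EQ VAL) reaches the accepting state, while a benign `name = value` comparison does not.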
Abstract:
Recent years have seen an astronomical rise in SQL Injection Attacks (SQLIAs) used to compromise the confidentiality, authentication and integrity of organisations’ databases. Intruders are becoming smarter at obfuscating web requests to evade detection, and combined with increasing volumes of web traffic from the Internet of Things (IoT), cloud-hosted and on-premise business applications, it has become evident that existing, mostly static signature-based approaches lack the ability to cope with novel signatures. A SQLIA detection and prevention solution can be achieved by exploring an alternative bio-inspired supervised learning approach that takes as input a labelled dataset of numerical attributes for classifying true positives and negatives. We present in this paper Numerical Encoding to Tame SQLIA (NETSQLIA), which implements a proof of concept for scalable numerical encoding of features into dataset attributes with a labelled class, obtained from deep web traffic analysis. In the numerical attribute encoding, the model leverages a proxy for the interception and decryption of web traffic. The intercepted web requests are then assembled for front-end SQL parsing and pattern matching by applying a traditional Non-Deterministic Finite Automaton (NFA). This paper presents a technique for numerical attribute extraction at any scale, primed as an input dataset to Artificial Neural Network (ANN) and statistical Machine Learning (ML) algorithms, implemented using a Two-Class Averaged Perceptron (TCAP) and Two-Class Logistic Regression (TCLR) respectively. This methodology then forms the subject of an empirical evaluation of the suitability of this model for the accurate classification of both legitimate web requests and SQLIA payloads.
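The numerical-encoding-plus-classifier pipeline can be sketched in miniature: each web request is encoded as counts of suspicious tokens, and a classifier is trained on labelled examples. The token list, the tiny training set, and the use of a plain rather than averaged perceptron are all simplifications invented for illustration, not the paper's TCAP setup.

```python
# Hypothetical feature tokens whose counts form the numerical attributes.
SUSPICIOUS = ["'", "--", " or ", "=", ";", "union", "select"]

def encode(request):
    """Encode a web request as a vector of suspicious-token counts."""
    r = request.lower()
    return [r.count(tok) for tok in SUSPICIOUS]

def train_perceptron(data, epochs=20):
    """Plain perceptron on labelled (features, +1/-1) pairs."""
    w = [0.0] * len(SUSPICIOUS)
    b = 0.0
    for _ in range(epochs):
        for x, y in data:      # y: +1 = SQLIA payload, -1 = legitimate
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Tiny invented training set of legitimate requests and SQLIA payloads.
train = [(encode("id=5"), -1),
         (encode("name=alice"), -1),
         (encode("id=5 or 1=1 --"), 1),
         (encode("x' union select * from users; --"), 1)]
w, b = train_perceptron(train)
```

In the paper's setting the encoded dataset would instead be fed to MAML's TCAP and TCLR learners; the sketch only shows the shape of the encode-then-classify flow.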
Abstract:
A visibility/invisibility paradox of trust operates in the development of distributed educational leadership for online communities. If trust is to be established, the team-based informal ethos of online collaborative networked communities requires a different kind of leadership from that observed in more formal face-to-face positional hierarchies. Such leadership is more flexible and sophisticated, being capable of encompassing both ambiguity and agile response to change. Online educational leaders need to be partially invisible, delegating discretionary powers, to facilitate the effective distribution of leadership tasks in a highly trusting team-based culture. Yet, simultaneously, online communities are facilitated by the visibility and subtle control effected by expert leaders. This paradox, that leaders need to be both highly visible and invisible when appropriate, was derived during research on 'Trust and Leadership' and tested in the analysis of online community case study discussions using a pattern-matching process to measure conversational interactions. This paper argues that both leader visibility and invisibility are important for effective trusting collaboration in online distributed leadership. Advanced leadership responses to complex situations in online communities foster positive group interaction, mutual trust and effective decision-making, facilitated through the active distribution of tasks.
Abstract:
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.
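The likelihood-ratio idea behind probabilistic positional cross-matching can be sketched as follows; the Gaussian-versus-uniform model, the prior, and the background search box are illustrative assumptions, not the paper's actual framework.

```python
import math

def match_probability(dx, dy, sigma, prior=0.5):
    """Probability that two catalogue sources are the same object, given
    positional offset (dx, dy) in arcsec and combined 1-sigma positional
    uncertainty. Chance alignments are modelled as uniform over a
    1-arcmin box; prior and box size are arbitrary illustrations."""
    r2 = (dx ** 2 + dy ** 2) / sigma ** 2
    likelihood_match = math.exp(-0.5 * r2) / (2 * math.pi * sigma ** 2)
    likelihood_chance = 1.0 / (60.0 * 60.0)   # uniform over 60x60 arcsec
    num = prior * likelihood_match
    return num / (num + (1 - prior) * likelihood_chance)

# Large radio positional errors (sigma = 5 arcsec, HIPASS-like scale):
close = match_probability(1.0, 0.5, sigma=5.0)   # small offset -> likely match
far = match_probability(30.0, 25.0, sigma=5.0)   # large offset -> chance pair
```

Thresholding such probabilities at different levels is what gives the controllable completeness/efficiency trade-off mentioned in the abstract: a low threshold keeps more true matches (completeness) at the cost of more chance pairings (efficiency).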