837 results for semi binary based feature detectordescriptor
Abstract:
Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. 
We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
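The nested event structure described above can be sketched as a small recursive data type. This is only an illustration of the CAUSE(A, BIND(B, C)) example from the abstract, not the TEES graph format itself; the class names `Entity` and `Event`, the `render` helper, and the role labels "Cause" and "Theme" are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity such as a protein (hypothetical class for this sketch)."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a trigger type and directed, typed arguments.

    Each argument is a (role, target) pair; the target may itself be an
    Event, which is what allows events to nest.
    """
    trigger: str
    args: Tuple[Tuple[str, Union[Entity, "Event"]], ...]


def render(node):
    """Render an entity or event in the TYPE(arg, ...) notation."""
    if isinstance(node, Entity):
        return node.name
    inner = ", ".join(render(target) for _, target in node.args)
    return f"{node.trigger}({inner})"


# "Protein A causes protein B to bind protein C"
a, b, c = Entity("A"), Entity("B"), Entity("C")
bind = Event("BIND", (("Theme", b), ("Theme", c)))
cause = Event("CAUSE", (("Cause", a), ("Theme", bind)))

print(render(cause))
```

Because an argument target can be another `Event`, the binding event is simply passed as the theme of the causation event, which mirrors the nesting the abstract describes.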
Abstract:
Abstract: This doctoral thesis concerns the active galactic nucleus (AGN) most often referred to by the catalogue number OJ287. The publications in the thesis present new discoveries of the system in the context of a supermassive binary black hole model. In addition, the introduction discusses general characteristics of the OJ287 system and the physical fundamentals behind these characteristics. The place of OJ287 in the hierarchy of known types of AGN is also discussed. The introduction presents a large selection of fundamental physics required to have a basic understanding of active galactic nuclei, binary black holes, relativistic jets and accretion discs. In particular, the general relativistic nature of the orbits of close binaries of supermassive black holes is explored in some detail. Analytic estimates of some of the general relativistic effects in such a binary are presented, as well as numerical methods to calculate the effects more precisely. It is also shown how these results can be applied to the OJ287 system. The binary orbit model forms the basis for models of the recurring optical outbursts in the OJ287 system. In the introduction, two physical outburst models are presented in some detail and compared. The radiation hydrodynamics of the outbursts is discussed and optical light curve predictions are derived. The precursor outbursts studied in Paper III are also presented, and tied into the model of OJ287. To complete the discussion of the observable features of OJ287, the nature of the relativistic jets in the system, and in active galactic nuclei in general, is discussed. The basic physics of relativistic jets is presented, with additional detail added in the form of helical jet models. The results of Papers II, IV and V concerning the jet of OJ287 are presented, and their relation to other facets of the binary black hole model is discussed.
As a whole, the introduction serves as a guide, though terse, for the physics and numerical methods required to successfully understand and simulate a close binary of supermassive black holes. For this purpose, the introduction necessarily combines a large number of both fundamental and specific results from broad disciplines like general relativity and radiation hydrodynamics. With the material included in the introduction, the publications of the thesis, which present new results with a much narrower focus, can be readily understood. Of the publications, Paper I presents newly discovered optical data points for OJ287, detected on archival astronomical plates from the Harvard College Observatory. These data points show the 1900 outburst of OJ287 for the first time. In addition, new data points covering the 1913 outburst allowed the determination of the start of the outburst with more precision than was possible before. These outbursts were then successfully numerically modelled with an N-body simulation of the OJ287 binary and accretion disc. In Paper II, mechanisms for the spin-up of the secondary black hole in OJ287 via interaction with the primary accretion disc and the magnetic fields in the system are discussed. Timescales for spin-up and alignment via both processes are estimated. It is found that the secondary black hole likely has a high spin. Paper III reports a new outburst of OJ287 in March 2013. The outburst was found to be rather similar to the ones reported in 1993 and 2004. All these outbursts happened just before the main outburst season, and are called precursor outbursts. In this paper, a mechanism was proposed for the precursor outbursts, where the secondary black hole collides with a gas cloud in the primary accretion disc corona. From this, estimates of brightness and timescales for the precursor were derived, as well as a prediction of the timing of the next precursor outburst. 
In Paper IV, observations from the 2004–2006 OJ287 observing program are used to investigate the existence of short periodicities in OJ287. The existence of a ~50 day quasiperiodic component is confirmed. In addition, statistically significant 250 day and 3.5 day periods are found. Primary black hole accretion of a spiral density wave in the accretion disc is proposed as the source of the 50 day period, with numerical simulations supporting these results. Lorentz contracted jet re-emission is then proposed as the reason for the 3.5 day timescale. Paper V fits optical observations and mm and cm radio observations of OJ287 with a helical jet model. The jet is found to have a spine–sheath structure, with the sheath having a much lower Lorentz gamma factor than the spine. The sheath opening angle and Lorentz factor, as well as the helical wavelength of the jet, are reported for the first time.
Summary (from the Finnish Tiivistelmä): This doctoral research focuses on the active galactic nucleus OJ287. The scientific publications included in the thesis present new results on the OJ287 system in the context of the binary black hole model. The introduction discusses the general properties of OJ287 and the fundamental physical phenomena behind them. It also explains where the OJ287 system sits in the hierarchy of active galactic nuclei. The introduction reviews some results of fundamental physics needed to understand active galactic nuclei, black hole binaries, relativistic jets and accretion discs. The relativistic foundations of the mutual orbit of two black holes orbiting each other are covered in more detail. Some analytic results on the relativistic effects observable in such a binary are presented, along with numerical methods for computing these effects more precisely. The results are applied to the OJ287 system and compared with observations.
The orbital model of the black holes in OJ287 forms the basis for models of the system's recurring optical outbursts. Two physical outburst models are presented in more detail and compared. The radiation hydrodynamics of the outbursts is reviewed, and predictions for the outburst light curves are derived. The introduction also presents the precursor outburst model derived in Paper III and shows that it is consistent with the binary model of OJ287. It also presents the physics of relativistic jets, both for the OJ287 system and for active galactic nuclei in general. The basic physics of relativistic jets is presented, as are models of helical jets. The results of Papers II, IV and V concerning the jets of the OJ287 system are presented in the context of the binary model. As a whole, the introduction serves as a concise guide to the physics and numerical methods needed to understand and simulate a binary black hole system. To this end, it combines both basic results and some deeper results from broad areas of physics such as relativity and radiation hydrodynamics. With the material in the introduction, the publications of the thesis and the results they present can be readily understood. The first publication presents new data points for the OJ287 system, located on archival photographic plates of the Harvard College Observatory. The 1900 outburst of OJ287 is seen for the first time in these data points. The new data points also allowed the start of the 1913 outburst to be timed more precisely than was previously possible. The observed outbursts were successfully modelled by simulating the black hole pair and accretion disc of the OJ287 system. Paper II discusses mechanisms for the spin-up of the secondary black hole of OJ287 through interaction with the primary's accretion disc and the magnetic fields of the system.
The paper estimates the timescales for reaching maximum spin and for the spin direction to settle under each mechanism. The spin of the secondary is found to be probably high. Paper III presents an outburst of the OJ287 system that occurred in March 2013. The outburst was found to resemble those of 1993 and 2004, which are collectively called precursor outbursts. The paper presents a mechanism for the origin of the outburst in which the secondary black hole of the OJ287 system hits a gas cloud in the corona of the primary black hole's accretion disc. Using this mechanism, estimates of the brightness and timescale of precursor outbursts are derived, along with a prediction for the timing of the next precursor outburst. Paper IV uses observations of the OJ287 system collected in 2004–2006 to search for short periodicities. It confirms a quasiperiodicity of about 50 days in the system. In addition, statistically significant periodicities of 250 days and 3.5 days are detected. The paper proposes a model in which a spiral density wave in the primary black hole's accretion disc causes the 50 day periodicity; a numerical simulation of the model supports the result. Time-dilated radiation emitted by the system's relativistic jet is proposed as the cause of the 3.5 day periodicity timescale. Paper V fits a helical jet model to optical observations and to millimetre- and centimetre-wavelength radio observations of the OJ287 system. The jet is found to have a two-part structure consisting of a spine and a sheath. The sheath has a significantly lower Lorentz gamma factor than the spine. The sheath opening angle and Lorentz factor, as well as the helical wavelength of the jet, are reported for the first time.
Abstract:
Twenty areas from eight Brazilian states were compared according to a list of 224 species of Poaceae. To determine affinity patterns between the areas, a binary matrix was submitted to cluster and ordination analyses. The patterns found were then compared with climate and geographic position. The scores corresponding to the areas obtained from the cluster analysis showed a strong correlation with temperature. The scores corresponding to the species suggest a gradient that associates distribution patterns with the photosynthetic pathway (C3 or C4). The current results suggest that the traditional classification of the South American grasslands might require some modification in order to be broadly applicable in the Brazilian context.
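As a rough illustration of how a binary matrix can feed a cluster analysis, the sketch below computes pairwise similarities between areas. It assumes the matrix records presence (1) or absence (0) of each species per area and uses the Jaccard index; the abstract does not name the similarity measure used, and the area names and values here are invented toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two presence/absence rows of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Toy presence/absence matrix: one row per area, one column per species.
areas = {
    "area_1": [1, 1, 0, 0, 1],
    "area_2": [1, 1, 0, 1, 1],
    "area_3": [0, 0, 1, 1, 0],
}

# Pairwise similarities, the usual input to a hierarchical clustering step.
similarity = {
    (p, q): jaccard(areas[p], areas[q])
    for p, q in combinations(sorted(areas), 2)
}

# The most similar pair is the first to be merged in agglomerative clustering.
closest_pair = max(similarity, key=similarity.get)
print(closest_pair, similarity[closest_pair])
```

An agglomerative clustering would repeatedly merge the most similar pair and recompute similarities; only the first merge step is shown here.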
Abstract:
Nitric oxide (NO) donors produce NO-related activity when applied to biological systems. Among its diverse functions, NO has been implicated in vascular smooth muscle relaxation. Despite the great importance of NO in biological systems, its pharmacological and physiological studies have been limited due to its high reactivity and short half-life. In this review we will focus on our recent investigations of nitrosyl ruthenium complexes as NO-delivery agents and their effects on vascular smooth muscle cell relaxation. The high affinity of ruthenium for NO is a marked feature of its chemistry. The main signaling pathway responsible for the vascular relaxation induced by NO involves the activation of soluble guanylyl-cyclase, with subsequent accumulation of cGMP and activation of cGMP-dependent protein kinase. This in turn can activate several proteins such as K+ channels as well as induce vasodilatation by a decrease in cytosolic Ca2+. Oxidative stress and associated oxidative damage are mediators of vascular damage in several cardiovascular diseases, including hypertension. The increased production of the superoxide anion (O2-) by the vascular wall has been observed in different animal models of hypertension. Vascular relaxation to the endogenous NO-related response or to NO released from NO deliverers is impaired in vessels from renal hypertensive (2K-1C) rats. A growing amount of evidence supports the possibility that increased NO inactivation by excess O2- may account for the decreased NO bioavailability and vascular dysfunction in hypertension.
Abstract:
Previous assessments of verticality by means of rod and rod-and-frame tests indicated that human subjects can be more (field dependent) or less (field independent) influenced by a frame placed around a tilted rod. In the present study we propose a new approach to these tests. The judgment of visual verticality (rod test) was evaluated in 50 young subjects (28 males, ranging in age from 20 to 27 years) by randomly projecting a luminous rod tilted between -18 and +18° (negative values indicating left tilts) onto a tangent screen. In the rod-and-frame test the rod was displayed within a luminous fixed frame tilted at +18 or -18°. Subjects were instructed to verbally indicate the rod’s inclination direction (forced choice). Visual dependency was estimated by means of a Visual Index calculated from rod and rod-and-frame test values. Based on this index, volunteers were classified as field dependent, intermediate and field independent. A fourth category was created within the field-independent subjects for those whose number of correct guesses in the rod-and-frame test exceeded that of the rod test, thus indicating improved performance when a surrounding frame was present. In conclusion, the combined use of the subjective visual vertical and the rod-and-frame test provides a specific and reliable form of evaluation of verticality in healthy subjects and might be of use to probe changes in brain function after central or peripheral lesions.
Abstract:
Object detection is a fundamental task of computer vision that is utilized as a core part in a number of industrial and scientific applications, for example, in robotics, where objects need to be correctly detected and localized prior to being grasped and manipulated. Existing object detectors vary in (i) the amount of supervision they need for training, (ii) the type of a learning method adopted (generative or discriminative) and (iii) the amount of spatial information used in the object model (model-free, using no spatial information in the object model, or model-based, with the explicit spatial model of an object). Although some existing methods report good performance in the detection of certain objects, the results tend to be application specific and no universal method has been found that clearly outperforms all others in all areas. This work proposes a novel generative part-based object detector. The generative learning procedure of the developed method allows learning from positive examples only. The detector is based on finding semantically meaningful parts of the object (i.e. a part detector) that can provide additional information to object location, for example, pose. The object class model, i.e. the appearance of the object parts and their spatial variance, constellation, is explicitly modelled in a fully probabilistic manner. The appearance is based on bio-inspired complex-valued Gabor features that are transformed to part probabilities by an unsupervised Gaussian Mixture Model (GMM). The proposed novel randomized GMM enables learning from only a few training examples. The probabilistic spatial model of the part configurations is constructed with a mixture of 2D Gaussians. The appearance of the parts of the object is learned in an object canonical space that removes geometric variations from the part appearance model. 
Robustness to pose variations is achieved by object pose quantization, which is more efficient than previously used scale and orientation shifts in the Gabor feature space. Performance of the resulting generative object detector is characterized by high recall with low precision, i.e. the generative detector produces a large number of false positive detections. Thus, a discriminative classifier is used to prune false positive candidate detections produced by the generative detector, improving its precision while keeping recall high. Using only a small number of positive examples, the developed object detector performs comparably to state-of-the-art discriminative methods.
Abstract:
This work tested INCA Feature, an image-processing program used for analysing particle size distributions. Particle size distributions were determined with INCA Feature from projection images of particles in electron microscope images, for talc used as a coating pigment and for two different carbonate grades. Particle size distributions were also determined for silica and alumina particles used as aids in filtration and purification. The distributions obtained with the image-processing program were compared with those from the SediGraph 5100 analyser, which is based on particle settling velocity (sedimentation), and from the Coulter LS 230 method, which is based on laser diffraction. The SediGraph 5100 and the image analysis program gave very similar mean values for the size distribution of talc particles, whereas the mean given by the Coulter LS 230 differed from these. All the methods compared ranked the particles of the different samples in the same size order. However, the results of the methods cannot be compared numerically, because each method bases its particle size measurement on a different property of the particle. Based on this work, all the tested methods are suitable for determining particle size distributions of paper pigments. The work also determined the number of particles needed for a reliable image analysis result: at least 300 particles must be analysed. Too large a sample increases the scatter of the size distribution and lengthens the analysis time to several hours. Sample preparation requires further study, as it is the most important and critical step in particle size analysis performed with SEM and an image analysis program.
The spread of automated microscopes makes analyses easier and faster, so the popularity of the method will grow in paper pigment research as well. The high price of the equipment and the special expertise required of the user will limit its use to research institutes, at least for the time being.
Abstract:
There are a considerable number of programs and agencies that count on the existence of a unique relationship between nature and human development. In addition, there are significant bodies of literature dedicated to understanding developmentally focused nature-based experiences. This research project was designed to further the understanding of this phenomenon. Consequently, the purpose of this research endeavour was to discover the essence of the intersection of personal transformation and nature-based leisure, culminating in a rich and detailed account of this otherwise tacit phenomenon. As such, this research built on the assumption of this beneficial intersection of nature and personal transformation and contributes to the understanding of how this context supports or generates self-actualization and positive development. Heuristic methods were employed because heuristics is concerned with the quality and essence of an experience, not causal relationships (Moustakas, 1990). Heuristic inquiry begins with the primary researcher and her personal experience and knowledge of the phenomenon. This study also involved four other co-researchers who had also experienced this phenomenon intensely. Co-researchers were found through purposeful and snowball sampling. Rich narrative descriptions of their experiences were gathered through in-depth, semi-structured interviews, and artifact elicitation was employed as a means to get at co-researchers' tacit knowledge. Each co-researcher was interviewed twice (the first interview focused on personal transformation, the second on nature) for approximately four and a half hours in total. Transcripts were read repeatedly to discern patterns that emerged from the study of the narratives and were coded accordingly. Individual narratives were consolidated to create a composite narrative of the experience. Finally, a creative synthesis was developed to represent the essence of this tacit experience.
In conclusion, the essence of the intersection of nature-based leisure and personal transformation was found to lie in the convergence of the lived experience of authenticity. The physical environment of nature was perceived and experienced to be a space and context of authenticity, leisure experiences were experienced as an engagement of authenticity, and individuals themselves encountered a true or authentic self that emanated from within. The implications of these findings are many, ranging from reconsidered approaches to environmental education to support for self-directed human development.
Abstract:
Relation algebras are one of the state-of-the-art tools used by mathematicians and computer scientists for solving very complex problems. As a result, a computer algebra system for relation algebras called RelView has been developed at Kiel University. RelView works within the standard model of relation algebras. However, relation algebras also have other models, which may have different properties. For example, in the standard model we always have L;L=L (the composition of two (heterogeneous) universal relations yields a universal relation). This is not true in some non-standard models. Therefore, any example in RelView will always satisfy this property even though it is not true in general. On the other hand, it has been shown that every relation algebra with relational sums and subobjects can be seen as a matrix algebra, in the same way that binary relations between sets correspond to Boolean matrices. The aim of my research is to develop a new system that works with both standard and non-standard models for arbitrary relations using multiple-valued decision diagrams (MDDs). This system will implement relations as matrix algebras. The proposed structure is a library written in C which can be imported by other languages such as Java or Haskell.
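The correspondence between binary relations and Boolean matrices, and the L;L=L property of the standard model, can be illustrated with a short sketch. It uses plain nested lists rather than MDDs and is not related to the RelView or the proposed C library APIs; the function names `compose` and `universal` are assumptions, and the carrier sets are assumed to be non-empty.

```python
def compose(r, s):
    """Relational composition r;s of Boolean matrices.

    (r;s)[i][k] is True iff there is some j with r[i][j] and s[j][k].
    Assumes the column count of r equals the row count of s, and that
    all dimensions are non-zero.
    """
    inner = len(s)
    cols = len(s[0])
    return [
        [any(row[j] and s[j][k] for j in range(inner)) for k in range(cols)]
        for row in r
    ]


def universal(m, n):
    """The universal relation L between an m-element and an n-element set."""
    return [[True] * n for _ in range(m)]


# Heterogeneous universal relations: A x B and B x C with |A|=2, |B|=3, |C|=4.
l_ab = universal(2, 3)
l_bc = universal(3, 4)

# In the standard model the composition is again universal (here on A x C).
print(compose(l_ab, l_bc) == universal(2, 4))

# A non-universal example: the identity on a 2-element set composed with
# a relation r gives r back.
identity = [[True, False], [False, True]]
r = [[False, True], [True, True]]
print(compose(identity, r) == r)
```

In a non-standard model the matrix entries would come from a richer algebra than the two Boolean values, which is where the L;L=L identity can fail; that case is not modelled here.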
Abstract:
Researchers have conceptualized repetitive behaviours in individuals with Autism Spectrum Disorder (ASD) on a continuum of lower-level, motoric, repetitive behaviours and higher-order, repetitive behaviours that include symptoms of OCD (Hollander, Wang, Braun, & Marsh, 2009). Although obsessional, ritualistic, and stereotyped behaviours are a core feature of ASD, individuals with ASD frequently experience obsessions and compulsions that meet DSM-IV-TR (American Psychiatric Association, 2000) criteria for Obsessive-Compulsive Disorder (OCD). Given the acknowledged difficulty in differentiating between OCD and Autism-related obsessive-compulsive phenomena, the present study uses the term Obsessive Compulsive Behaviour (OCB) to represent both phenomena. This study used a multiple baseline design across behaviours and ABC designs (Cooper, Heron, & Heward, 2007) to investigate whether a 9-week Group Function-Based Cognitive Behavioural Therapy (CBT) decreased OCB in four children (ages 7-11 years) with High Functioning Autism (HFA). Key treatment components included traditional CBT components (awareness training, cognitive-behavioural skills training, exposure and response prevention) as well as function-based assessment and intervention. Time series data indicated significant decreases in OCBs. Standardized assessments showed decreases in symptom severity, and increases in quality of life for the participants and their families. Issues regarding symptom presentation, assessment, and treatment of a dually diagnosed child are discussed.
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees/sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with fewer bloated expressions.
FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.
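The idea of turning evolved feature frequencies into terminal-selection probabilities can be sketched as follows. This is not the FSALPS implementation: representing each GP tree as a flat list of the feature indices it uses, the additive `smoothing` term that keeps unused features selectable, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter


def feature_probabilities(trees, n_features, smoothing=1.0):
    """Map feature usage counts in a population to selection probabilities.

    trees: iterable of lists of feature indices (hypothetical stand-in for
    the terminals appearing in each GP tree).
    smoothing: assumed additive constant so unseen features keep a
    non-zero probability.
    """
    counts = Counter(f for tree in trees for f in tree)
    weights = [counts.get(i, 0) + smoothing for i in range(n_features)]
    total = sum(weights)
    return [w / total for w in weights]


def pick_terminal(probs, rng):
    """Sample a feature index to use as a terminal when growing a tree."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy population: feature 2 appears most often, feature 1 never.
population = [[0, 2, 2], [2, 3], [0, 2]]
probs = feature_probabilities(population, n_features=4)

rng = random.Random(0)
sampled = [pick_terminal(probs, rng) for _ in range(5)]
print(probs, sampled)
```

Frequently used features are sampled more often when new trees or sub-trees are built, which is the feedback loop the abstract describes between the evolving population and feature ranking.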
Abstract:
Feature selection plays an important role in knowledge discovery and data mining. In traditional rough set theory, feature selection using a reduct - the minimal discerning set of attributes - is an important area. Nevertheless, the original definition of a reduct is restrictive, so previous research proposed taking into account not only the horizontal reduction of information by feature selection, but also a vertical reduction considering suitable subsets of the original set of objects. Following that work, a new approach to generating bireducts using a multi-objective genetic algorithm is proposed. Although genetic algorithms have been used to calculate reducts in some previous works, we did not find any work where genetic algorithms were adopted to calculate bireducts. Compared to earlier work in this area, the proposed method has less randomness in generating bireducts. The genetic algorithm system estimated the quality of each bireduct by the values of two objective functions as evolution progressed, so that a set of bireducts with optimized values of these objectives was obtained. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied and the prediction accuracies were compared. Five datasets were used to test the proposed method and two datasets were used to perform a comparison study. Statistical analysis using the one-way ANOVA test was performed to determine whether the differences between the results were significant. The experiment showed that the proposed method was able to reduce the number of bireducts necessary to obtain good prediction accuracy. The influence of different genetic operators and fitness evaluation strategies on the prediction accuracy was also analyzed. It was shown that the prediction accuracies of the proposed method are comparable with the best results in the machine learning literature, and some of them outperformed those results.
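A minimal sketch of what a bireduct is may help here. It assumes the usual decision-bireduct definition from the rough set literature: a pair (B, X) of attributes and objects such that B discerns every pair of objects in X with different decisions, no attribute can be dropped from B, and no object can be added to X. The abstract itself does not spell the definition out, and this checker is not the genetic algorithm of the thesis; the function names and the toy table are invented for illustration.

```python
from itertools import combinations


def discerns(table, decisions, attrs, objs):
    """True if attrs tell apart every pair in objs with different decisions."""
    for i, j in combinations(sorted(objs), 2):
        if decisions[i] != decisions[j] and all(
            table[i][a] == table[j][a] for a in attrs
        ):
            return False
    return True


def is_bireduct(table, decisions, attrs, objs):
    """Check the assumed bireduct conditions for (attrs, objs).

    Testing single-attribute removals and single-object additions is
    enough, because discernibility can only be lost when attributes are
    removed or objects are added.
    """
    if not discerns(table, decisions, attrs, objs):
        return False
    if any(discerns(table, decisions, attrs - {a}, objs) for a in attrs):
        return False  # some attribute is redundant
    outside = set(range(len(table))) - objs
    return not any(
        discerns(table, decisions, attrs, objs | {o}) for o in outside
    )


# Toy decision table: rows are objects, columns are attributes.
# Objects 1 and 3 agree on all attributes but differ in decision.
table = [[0, 0], [0, 1], [1, 1], [0, 1]]
decisions = [0, 1, 1, 0]

print(is_bireduct(table, decisions, {1}, {0, 1, 2}))     # attribute 1 alone suffices
print(is_bireduct(table, decisions, {0, 1}, {0, 1, 2}))  # attribute 0 is redundant
print(is_bireduct(table, decisions, {1}, {0, 1}))        # object 2 could be added
```

A genetic algorithm along the lines described above would search over such (attribute subset, object subset) pairs, with objectives typically rewarding few attributes and many covered objects; that search is outside the scope of this sketch.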
Abstract:
Suzumura shows that a binary relation has a weak order extension if and only if it is consistent. However, consistency is demonstrably not sufficient to extend an upper semicontinuous binary relation to an upper semicontinuous weak order. Jaffray proves that any asymmetric (or reflexive), transitive and upper semicontinuous binary relation has an upper semicontinuous strict (or weak) order extension. We provide sufficient conditions for the existence of upper semicontinuous extensions of consistent rather than transitive relations. For asymmetric relations, consistency and upper semicontinuity suffice. For more general relations, we prove one theorem using a further consistency property and another with an additional continuity requirement.
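For finite relations, Suzumura consistency can be checked directly. The sketch below assumes the standard definition: a relation R is consistent if, whenever (x, y) lies in the transitive closure of R, the pair (y, x) is not in the strict (asymmetric) part of R. It covers only the finite, purely order-theoretic side and says nothing about semicontinuity; the function names are illustrative.

```python
def transitive_closure(rel):
    """Transitive closure of a finite relation given as a set of pairs."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_consistent(rel):
    """Suzumura consistency: no cycle containing a strict preference."""
    rel = set(rel)
    strict = {(x, y) for (x, y) in rel if (y, x) not in rel}
    return all((y, x) not in strict for (x, y) in transitive_closure(rel))


# Consistent but not transitive: (a, c) is missing.
chain = {("a", "b"), ("b", "c")}
# Inconsistent: a strict preference cycle a > b > c > a.
cycle = {("a", "b"), ("b", "c"), ("c", "a")}
# Consistent: the cycle consists of indifferences only.
indifferent = {("a", "b"), ("b", "a")}

print(is_consistent(chain), is_consistent(cycle), is_consistent(indifferent))
```

The first example shows why consistency is weaker than transitivity: the relation can still be extended to a weak order, for instance by adding the missing pairs.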
Abstract:
Recent work shows that a low correlation between the instruments and the included variables leads to serious inference problems. We extend the local-to-zero analysis of models with weak instruments to models with estimated instruments and regressors and with higher-order dependence between instruments and disturbances. This makes this framework applicable to linear models with expectation variables that are estimated non-parametrically. Two examples of such models are the risk-return trade-off in finance and the impact of inflation uncertainty on real economic activity. Results show that inference based on Lagrange Multiplier (LM) tests is more robust to weak instruments than Wald-based inference. Using LM confidence intervals leads us to conclude that no statistically significant risk premium is present in returns on the S&P 500 index, excess holding yields between 6-month and 3-month Treasury bills, or in yen-dollar spot returns.