Digital Library

786 results for Data mining models

What matters for predicting spatial distributions of trees: techniques, data, or species' characteristics?

Relevance: 90.00%

Abstract:

Data characteristics and species traits are expected to influence the accuracy with which species' distributions can be modeled and predicted. We compare 10 modeling techniques in terms of predictive power and sensitivity to location error, change in map resolution, and sample size, and assess whether some species traits can explain variation in model performance. We focused on 30 native tree species in Switzerland and used presence-only data to model current distribution, which we evaluated against independent presence-absence data. While there are important differences between the predictive performance of modeling methods, the variance in model performance is greater among species than among techniques. Within the range of data perturbations in this study, some extrinsic parameters of data affect model performance more than others: location error and sample size reduced performance of many techniques, whereas grain had little effect on most techniques. No technique can rescue species that are difficult to predict. The predictive power of species-distribution models can partly be predicted from a series of species characteristics and traits based on growth rate, elevational distribution range, and maximum elevation. Slow-growing species or species with narrow and specialized niches tend to be better modeled. The Swiss presence-only tree data produce models that are reliable enough to be useful in planning and management applications.

In silico pathway reconstruction: Iron-sulfur cluster biogenesis in Saccharomyces cerevisiae

Relevance: 90.00%

Abstract:

Background: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of Iron-Sulfur Clusters (ISC). Results: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role in evaluating the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted role of Arh1 and Yah1 and some of the interactions we predict for Grx5 both match experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is discarded from our analysis, which also agrees with published experimental results. Additionally, we propose a number of experiments for testing other predictions and further improving the identification of the network structure. Conclusion: We propose and apply an iterative in silico procedure for predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using the Iron-Sulfur Cluster biogenesis in S. cerevisiae as a test case, we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the results obtained through this procedure allows new wet lab experiments to be devised, either to confirm its predictions or to provide alternative explanations for further improving the models.

Automatic target validation based on neuroscientific literature mining for tractography.

Relevance: 90.00%

Abstract:

Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures that are validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus, and the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target was missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large number of publications and abstracts. We believe this tool will be useful in helping the neuroscience community to facilitate connectivity studies of particular brain regions. The text-mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/.
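As a reading aid for the recall and precision figures quoted in this abstract, the following minimal Python sketch shows how the two measures are computed when a set of text-mined targets is compared against a manually curated reference set. The function name and the region lists are hypothetical illustrations and are not taken from the study.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: share of suggested targets that the reference confirms.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference targets that were suggested at all.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data (not from the study).
curated_targets = {"thalamus", "substantia nigra", "ventral pallidum"}
mined_targets = {
    "thalamus", "substantia nigra", "ventral pallidum",
    "amygdala", "hippocampus", "cerebellum",
}

precision, recall = precision_recall(mined_targets, curated_targets)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High recall with low precision, as reported above, means that almost no curated target is missed while many of the suggested targets still need manual checking.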

Biotic interactions boost spatial models of species richness

Relevance: 90.00%

Abstract:

Biotic interactions are known to affect the composition of species assemblages via several mechanisms, such as competition and facilitation. However, most spatial models of species richness do not explicitly consider inter-specific interactions. Here, we test whether incorporating biotic interactions into high-resolution models alters predictions of species richness as hypothesised. We included key biotic variables (cover of three dominant arctic-alpine plant species) in two methodologically divergent species richness modelling frameworks, stacked species distribution models (SSDM) and macroecological models (MEM), for three ecologically and evolutionarily distinct taxonomic groups (vascular plants, bryophytes and lichens). Predictions from models including biotic interactions were compared to the predictions of models based on climatic and abiotic data only. Including plant-plant interactions consistently and significantly lowered bias in species richness predictions and increased predictive power for independent evaluation data when compared to the conventional models based on climatic and abiotic data. Improvements in predictions were constant irrespective of the modelling framework or taxonomic group used. The global biodiversity crisis necessitates accurate predictions of how changes in biotic and abiotic conditions will potentially affect species richness patterns. Here, we demonstrate that models of the spatial distribution of species richness can be improved by incorporating biotic interactions, and thus that these key predictor factors must be accounted for in biodiversity forecasts.

A Dependency Parsing Approach to Biomedical Text Mining

Relevance: 90.00%

Abstract:

Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

Asiakkuuksien strateginen arvottaminen ja luokittelu [Strategic valuation and classification of customer relationships]

Relevance: 90.00%

Abstract:

The target organization of this study is the internal raw material purchaser and supplier of a large industrial company. The study examines what the value of the target organization's procurement customer relationships consists of, and how existing business data can be used to study, assess and classify the value of transactions and customer relationships objectively, reliably and independently of time. The theoretical part presents approaches and methods for refining existing data into new stakeholder knowledge for business use, and reviews the foundations and models of customer profitability analysis, portfolio analysis and customer segmentation. Based on these theories and models, a valuation and classification model for customer relationships is built. It is tailored to the target organization and based on indexed price, volume and transaction-recurrence variables. The model is tested on a sample of 389,336 transaction rows drawn from business data for 2003–2007, which contains 42,186 customer relationships to be assessed. The most significant finding is a group of about 5,000 customer relationships that are clearly more expensive than average. The data and its anomalies are tested with statistical methods to identify the factors that affect and explain the value of a customer relationship. Finally, the significance of the valuation model as a tool for more analytical purchasing and customer relationship management is discussed, and a few suggestions for improvement are presented.

Comparison of diffusion models for description of osmotic dehydration of radish slices dipped in brine

Relevance: 90.00%

Abstract:

This paper describes the osmotic dehydration of radish cut into cylindrical pieces, using one- and two-dimensional analytical solutions of the diffusion equation with boundary conditions of the first and third kind. These solutions were coupled with an optimizer to determine the process parameters from experimental data. Three models were proposed to describe the osmotic dehydration of radish slices in brine at low temperature. The two-dimensional model with a boundary condition of the third kind described the mass transfer kinetics well and enabled prediction of moisture and solid distributions at any given time.
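The abstract does not give the analytical solutions or name the optimizer, so the following Python sketch is only an illustration of the general idea of coupling a series solution with an optimizer. It assumes the simplest case: radial diffusion in an infinite cylinder with a first-kind boundary condition, for which the average dimensionless moisture is the standard series over the roots of the Bessel function J0, truncated here to six terms (so it is inaccurate very close to t = 0). A golden-section search stands in for the optimizer, and the radius, times and diffusivity are hypothetical values used to generate synthetic data.

```python
import math

# First six positive roots of the Bessel function J0 (standard tabulated values).
J0_ROOTS = (2.4048, 5.5201, 8.6537, 11.7915, 14.9309, 18.0711)


def moisture_ratio(t, diffusivity, radius):
    """Average dimensionless moisture of an infinite cylinder (first-kind boundary)."""
    return sum(
        4.0 / mu**2 * math.exp(-mu**2 * diffusivity * t / radius**2)
        for mu in J0_ROOTS
    )


def fit_diffusivity(times, ratios, radius, lo=1e-11, hi=1e-8, iterations=200):
    """Golden-section search for the diffusivity that minimises the squared error."""
    def sse(d):
        return sum(
            (moisture_ratio(t, d, radius) - y) ** 2 for t, y in zip(times, ratios)
        )

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Synthetic "measurements" generated from hypothetical parameter values.
RADIUS = 0.005          # m, assumed
TRUE_D = 1e-9           # m^2/s, assumed
TIMES = [600, 1800, 3600, 7200, 14400]  # s
RATIOS = [moisture_ratio(t, TRUE_D, RADIUS) for t in TIMES]

fitted = fit_diffusivity(TIMES, RATIOS, RADIUS)
print(f"fitted diffusivity: {fitted:.3e} m^2/s")
```

A third-kind (convective) boundary condition or a two-dimensional finite cylinder would change the series terms and add a second fitted parameter, which a one-dimensional search like this one cannot handle.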

A platform for assessing the current state of data repositories in Korea, Japan and China

Relevance: 90.00%

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Visual analytics for behavioral and niche market segmentation

Relevance: 90.00%

Abstract:

Companies require information in order to gain an improved understanding of their customers. Data concerning customers, their interests and behavior are collected through different loyalty programs. The amount of data stored in company databases has increased exponentially over the years and has become difficult to handle. This research area is the subject of much current interest, not only in academia but also in practice, as is shown by several magazines and blogs that cover topics such as how to get to know your customers, Big Data, information visualization, and data warehousing. In this Ph.D. thesis, the Self-Organizing Map and two extensions of it, the Weighted Self-Organizing Map (WSOM) and the Self-Organizing Time Map (SOTM), are used as data mining methods for extracting information from large amounts of customer data. The thesis focuses on how data mining methods can be used to model and analyze customer data in order to gain an overview of the customer base and to analyze niche markets. The thesis uses real-world customer data to create models for customer profiling. The models are evaluated by CRM experts from the retailing industry. The experts considered the information gained with the help of the models to be valuable and useful for decision making and for strategic planning.

Location intelligence strategisen päätöksenteon tukena [Location intelligence in support of strategic decision-making]

Relevance: 90.00%

Abstract:

A company seeking competitive advantage must be able to refine information and use it to identify new future opportunities. To form images of the future, a company must know its operating environment and be sensitive to trends of change and other signals from that environment. Vital environmental signals relate to competitors, technological development, changes in values, global demographic trends, or even changes in the natural environment. Spatial relationships are cornerstones of how we conceptualize our world. Pitney (2015) has estimated that 80% of all business data contains some reference to location information. Nevertheless, location information is still poorly utilized in support of companies' strategic decisions. Advances in technology, fast data transfer and the integration of positioning technologies into various devices mean that services and solutions making use of location information will increasingly be seen in the business field. The aim of the study was to determine whether location intelligence can support strategic decision-making and, if so, how. The work was carried out using the constructive research method, which seeks to solve a relevant problem. The constructive research was conducted in close cooperation with three small and medium-sized enterprises, and six people responsible for strategy were interviewed. The study found that location intelligence can be used to support strategic decision-making at several levels. In the simplest map solution, the desired data is brought onto a map to create a visual presentation that makes it easier to draw conclusions. A second-level map solution contains both location and attribute data combined from different sources. This second-level solution is often descriptive analytics, which enables the analysis of various phenomena.
The third, highest-level map solution offers predictive analytics and models of the future. In this case, intelligence is coded into the software, with the relationships between pieces of information defined using either data mining or statistical analyses. The study concludes that location intelligence can provide added value in support of strategic decision-making if it is useful for the company to understand geographical differences in various phenomena, customer needs, competitors and market changes. At its best, a location intelligence solution provides a reliable analysis in which information passes unchanged from one decision-maker to another and the reasons that led to a conclusion can be revisited when needed.

Data and Inventory Management in Spare Part Business: Developing Operations in the Case Company

Relevance: 90.00%

Abstract:

After sales business is an effective way to create profit and increase customer satisfaction in manufacturing companies. Despite this, some special business characteristics linked to these functions make it exceptionally challenging in its own way. This Master’s Thesis examines the current state of data and inventory management in the case company with regard to the possibilities and challenges related to the consolidation of current business operations. The research examines the process steps, procedures, data requirements, data mining practices and data storage management of the spare part sales process, whereas the part focusing on inventory management reviews the current stock value and examines current practices and operational principles. There are two global after sales units which supply spare parts, and the issues reviewed in this study are examined from both units’ perspective. The analysis focuses on the operations of the unit where functions would be centralized by default if the change decisions are carried out. It was discovered that both data and inventory management include clear shortcomings, which result from a lack of internal instructions and established processes as well as a lack of cooperation with other stakeholders related to the product’s lifecycle. The main product of data management was a guideline for consolidating the functions, tailored to the company’s needs. Additionally, spare parts that could potentially be scrapped were listed and a proposal for inventory management instructions was drafted. If the suggested spare part materials are scrapped, stock value will decrease by 46 percent. A guideline that was reviewed and commented on in this thesis was chosen as the basis of the inventory management instructions.

Modelling software quality: a multidimensional approach

Relevance: 90.00%

Abstract:

Modern societies depend more and more on computer systems, and so there is increasing pressure on development teams to produce high-quality software. Many companies use quality models, suites of programs that analyze and evaluate the quality of other programs, but building quality models is difficult because several questions remain unanswered in the literature. We studied quality modelling practices at a large company and identified three dimensions where additional research is desirable: support for the subjectivity of quality, techniques for tracking quality as software evolves, and the composition of quality across different levels of abstraction. Regarding subjectivity, we proposed using Bayesian models because they can handle ambiguous data. We applied our models to the problem of detecting design defects. In a study of two open-source software systems, we found that our approach outperforms the rule-based techniques described in the state of the art. To support software evolution, we treated the scores produced by a quality model as signals that can be analyzed with data mining techniques to identify patterns of quality evolution. We studied how design defects appear in and disappear from software. Software is typically designed as a hierarchy of components, but quality models do not take this organization into account. In the last part of the dissertation, we present a two-level quality model.
These models have three parts: a component-level model, a model that evaluates the importance of each component, and another that evaluates the quality of a composite by combining the quality of its components. The approach was tested on predicting change-prone classes from the quality of their methods. We found that our two-level models allow better identification of change-prone classes. Finally, we applied our two-level models to evaluating the navigability of websites from the quality of their pages. Our models were able to distinguish between very high-quality sites and randomly chosen sites. Throughout the dissertation, we present not only theoretical problems and their solutions but also experiments conducted to demonstrate the advantages and limitations of our solutions. Our results indicate that the state of the art can be improved in all three dimensions presented. In particular, our work on quality composition and importance modelling is the first to target this problem. We believe that our two-level models are an interesting starting point for more in-depth research.

Mean Squared Residue Based Biclustering Algorithms for the Analysis of Gene Expression Data

Relevance: 90.00%

Abstract:

Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. Data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data give the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized as a matrix in which rows represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes, such as the similarity of their behavior, the nature of their interaction, their respective contribution to the same pathways and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research. In the medical domain they are used to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. Data mining techniques are essential for identifying patterns in gene expression data. Clustering is an important data mining technique for the analysis of gene expression data. Biclustering was introduced to overcome the problems associated with clustering. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix.
Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not otherwise apparent. It is therefore necessary to move beyond the clustering paradigm towards approaches that are capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns in the submatrix need not be contiguous in the gene expression data matrix. Biclusters are not disjoint. Computing biclusters is costly because all combinations of columns and rows must be considered in order to find all the biclusters. The search space for the biclustering problem has size 2^(m+n), where m and n are the numbers of genes and conditions respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms use a measure called the mean squared residue (MSR) to search for biclusters. The objective is to identify biclusters of maximum size with a mean squared residue lower than a given threshold.
All these algorithms begin the search from tightly coregulated submatrices called seeds. The seeds are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. Constraint-based algorithms use one or more of the following constraints: the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. The algorithms are implemented on the Yeast and Lymphoma datasets. All of them identify biologically relevant and statistically significant biclusters, which are validated against the Gene Ontology database. The algorithms are compared with some other biclustering algorithms. The algorithms developed in this work overcome some of the problems associated with existing algorithms. Some of them identify, in both the Yeast and Lymphoma datasets, biclusters with very high row variance, higher than that obtained by any other algorithm using mean squared residue. Such biclusters, which show significant change in expression level, are highly relevant biologically.
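The abstract names the mean squared residue but does not state its formula. The Python sketch below assumes the commonly used definition, in which each entry's residue is the entry minus its row mean, minus its column mean, plus the overall mean of the submatrix, and the score is the average of the squared residues. The function name and the small matrices are hypothetical illustrations, not data or code from the thesis.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by row and column indices."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 4.0],
    [0.0, 8.0, 1.0, 6.0],
]

# Rows 0-2 over columns 0-2 differ only by additive shifts, so the score is ~0.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
# Using every row and column breaks the pattern and gives a larger score.
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(f"coherent bicluster MSR={coherent:.4f}, whole matrix MSR={whole:.4f}")
```

Under this definition a perfectly additive submatrix scores zero, which is why the search described above looks for the largest submatrix whose score stays below a threshold. Rows and columns are passed as index lists because a bicluster need not be contiguous.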

Statistical Machine Learning Techniques for the Prediction of Learning Disabilities in School-Age Children

Relevance: 90.00%

Abstract:

Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs the ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it difficult for a child to learn as quickly or in the same way as a child who is not affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques is used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a highly accurate knowledge-based tool is proposed, built on statistical machine learning or data mining techniques and on knowledge obtained from clinical information. The basic idea of the developed knowledge-based tool is to increase the accuracy of the learning disability assessment and reduce the time needed for it. Different statistical machine learning techniques in data mining are used in the study.
Further objectives of this research work are to identify the important parameters of LD prediction using data mining techniques, to identify the hidden relationships between the symptoms of LD, and to estimate the relative significance of each symptom of LD. The developed tool has many advantages compared to the traditional method of using checklists to determine learning disabilities. To improve the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models is also developed for LD prediction, and here too the importance of preprocessing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge-based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts learning disability in school-age children. This thesis makes several major contributions in technical, general and social areas. The results were found to be very beneficial to parents, teachers and institutions, who are able to diagnose the child’s problem at an early stage and can seek proper treatment or counseling at the right time so as to avoid academic and social losses.

Reconstruction of Gene Regulatory Network from Expression Profile of Plasma RNA Data of Colorectal Cancer Patients using Soft Computing Techniques

Relevance: 90.00%

Abstract:

Microarray data analysis is a data mining tool used to extract meaningful information hidden in biological data. One of the major focuses of microarray data analysis is the reconstruction of gene regulatory networks, which may be used to provide a broader understanding of the functioning of complex cellular systems. Since cancer is a genetic disease arising from abnormal gene function, the identification of cancerous genes and the regulatory pathways they control will provide a better platform for understanding tumor formation and development. The major focus of this thesis is to understand the regulation of genes responsible for the development of cancer, particularly colorectal cancer, by analyzing microarray expression data. In this thesis, four computational algorithms, namely a fuzzy logic algorithm, a modified genetic algorithm, a dynamic neural fuzzy network and a Takagi-Sugeno-Kang-type recurrent neural fuzzy network, are used to extract a cancer-specific gene regulatory network from a plasma RNA dataset of colorectal cancer patients. Plasma RNA is highly attractive for cancer analysis since it requires collecting only a small amount of blood and can be obtained repeatedly at any time, allowing the analysis of disease progression and treatment response.
