989 resultados para Structure mining


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The principal topic of this work is the application of data mining techniques, in particular of machine learning, to the discovery of knowledge in a protein database. In the first chapter a general background is presented. Namely, in section 1.1 we overview the methodology of a Data Mining project and its main algorithms. In section 1.2 an introduction to the proteins and its supporting file formats is outlined. This chapter is concluded with section 1.3 which defines that main problem we pretend to address with this work: determine if an amino acid is exposed or buried in a protein, in a discrete way (i.e.: not continuous), for five exposition levels: 2%, 10%, 20%, 25% and 30%. In the second chapter, following closely the CRISP-DM methodology, whole the process of construction the database that supported this work is presented. Namely, it is described the process of loading data from the Protein Data Bank, DSSP and SCOP. Then an initial data exploration is performed and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced. It is also introduced the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter the results obtained are analyzed with statistical significance tests. Initially the several used classifiers (Neural Networks, C5.0, CART and Chaid) are compared and it is concluded that C5.0 is the most suitable for the problem at stake. It is also compared the influence of parameters like the amino acid information level, the amino acid window size and the SCOP class type in the accuracy of the predictive models. The fourth chapter starts with a brief revision of the literature about amino acid relative solvent accessibility. Then, we overview the main results achieved and finally discuss about possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis. Appendix B has a set of tables with additional information. Appendix C describes the software provided in the DVD accompanying this thesis that allows the reconstruction of the present work.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

ABSTRACT We aimed in this work to study natural populations of copaiba (Copaifera multijuga Hayne) on the Monte Branco mountain at Porto Trombetas-PA, in order to support sustainable management and the exploitation of oleoresin from copaiba. We studied the population structure of copaiba on hillsides and valleys of the south face of Monte Branco, within Saracá Taquera National Forest, where bauxite ore was extracted in the biennium 2013-2014 by Mineração Rio do Norte (MRN). We produced a 100% forest inventory of the specie and of oleoresin extraction in order to quantify the potential production of the remaining area. The density of copaiba individuals with DBH > 30 cm was 0.33 individuals per hectare in the hillside and 0.25 individuals per hectare in the valley. Both environments presented a density of 0.28 individuals per hectare. The average copaiba oleoresin yield was 0.661±0.334 liters in the hillside and 0.765±0.280 liters in the valley. The average value of both environments together (hillside and valley) was 0.714±0.218 liters. From all individuals with DBH over 30 cm, 38 (58%) produced some amount of oleoresin, averaging 1.113±0.562 liters in the hillside, 1.329±0.448 liters in the valley and 1.190±0.355 liters in both environments together. The results show the need for planning the use of the surroundings of the study area in order to reach the required volume of copaiba to make feasible the sustainable management of oleoresin extraction in the region.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Austenitic stainless steels cannot be conventionally nitrided at temperatures near 550 degrees C due to the intense precipitation of chromium nitrides in the diffusion zone. The precipitation of chro-mium nitrides increases the hardness but severely impairs corrosion resistance. Plasma nitriding allows introducing nitrogen in the steel at temperatures below 450 degrees C, forming pre-dominantly expanded austenite (gamma(N)), with a crystalline structure best represented by a special triclin-ic lattice, with a very high nitrogen atomic concentration promoting high compressive residual stresses at the surface, increasing substrate hardness from 4 GPa up to 14 GPa on the nitrided case.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data mining can be defined as the extraction of previously unknown and potentially useful information from large datasets. The main principle is to devise computer programs that run through databases and automatically seek deterministic patterns. It is applied in different fields of application, e.g., remote sensing, biometry, speech recognition, but has seldom been applied to forensic case data. The intrinsic difficulty related to the use of such data lies in its heterogeneity, which comes from the many different sources of information. The aim of this study is to highlight potential uses of pattern recognition that would provide relevant results from a criminal intelligence point of view. The role of data mining within a global crime analysis methodology is to detect all types of structures in a dataset. Once filtered and interpreted, those structures can point to previously unseen criminal activities. The interpretation of patterns for intelligence purposes is the final stage of the process. It allows the researcher to validate the whole methodology and to refine each step if necessary. An application to cutting agents found in illicit drug seizures was performed. A combinatorial approach was done, using the presence and the absence of products. Methods coming from the graph theory field were used to extract patterns in data constituted by links between products and place and date of seizure. A data mining process completed using graphing techniques is called ``graph mining''. Patterns were detected that had to be interpreted and compared with preliminary knowledge to establish their relevancy. The illicit drug profiling process is actually an intelligence process that uses preliminary illicit drug classes to classify new samples. Methods proposed in this study could be used \textit{a priori} to compare structures from preliminary and post-detection patterns. This new knowledge of a repeated structure may provide valuable complementary information to profiling and become a source of intelligence.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La présente étude est à la fois une évaluation du processus de la mise en oeuvre et des impacts de la police de proximité dans les cinq plus grandes zones urbaines de Suisse - Bâle, Berne, Genève, Lausanne et Zurich. La police de proximité (community policing) est à la fois une philosophie et une stratégie organisationnelle qui favorise un partenariat renouvelé entre la police et les communautés locales dans le but de résoudre les problèmes relatifs à la sécurité et à l'ordre public. L'évaluation de processus a analysé des données relatives aux réformes internes de la police qui ont été obtenues par l'intermédiaire d'entretiens semi-structurés avec des administrateurs clés des cinq départements de police, ainsi que dans des documents écrits de la police et d'autres sources publiques. L'évaluation des impacts, quant à elle, s'est basée sur des variables contextuelles telles que des statistiques policières et des données de recensement, ainsi que sur des indicateurs d'impacts construit à partir des données du Swiss Crime Survey (SCS) relatives au sentiment d'insécurité, à la perception du désordre public et à la satisfaction de la population à l'égard de la police. Le SCS est un sondage régulier qui a permis d'interroger des habitants des cinq grandes zones urbaines à plusieurs reprises depuis le milieu des années 1980. L'évaluation de processus a abouti à un « Calendrier des activités » visant à créer des données de panel permettant de mesurer les progrès réalisés dans la mise en oeuvre de la police de proximité à l'aide d'une grille d'évaluation à six dimensions à des intervalles de cinq ans entre 1990 et 2010. L'évaluation des impacts, effectuée ex post facto, a utilisé un concept de recherche non-expérimental (observational design) dans le but d'analyser les impacts de différents modèles de police de proximité dans des zones comparables à travers les cinq villes étudiées. Les quartiers urbains, délimités par zone de code postal, ont ainsi été regroupés par l'intermédiaire d'une typologie réalisée à l'aide d'algorithmes d'apprentissage automatique (machine learning). Des algorithmes supervisés et non supervisés ont été utilisés sur les données à haute dimensionnalité relatives à la criminalité, à la structure socio-économique et démographique et au cadre bâti dans le but de regrouper les quartiers urbains les plus similaires dans des clusters. D'abord, les cartes auto-organisatrices (self-organizing maps) ont été utilisées dans le but de réduire la variance intra-cluster des variables contextuelles et de maximiser simultanément la variance inter-cluster des réponses au sondage. Ensuite, l'algorithme des forêts d'arbres décisionnels (random forests) a permis à la fois d'évaluer la pertinence de la typologie de quartier élaborée et de sélectionner les variables contextuelles clés afin de construire un modèle parcimonieux faisant un minimum d'erreurs de classification. Enfin, pour l'analyse des impacts, la méthode des appariements des coefficients de propension (propensity score matching) a été utilisée pour équilibrer les échantillons prétest-posttest en termes d'âge, de sexe et de niveau d'éducation des répondants au sein de chaque type de quartier ainsi identifié dans chacune des villes, avant d'effectuer un test statistique de la différence observée dans les indicateurs d'impacts. De plus, tous les résultats statistiquement significatifs ont été soumis à une analyse de sensibilité (sensitivity analysis) afin d'évaluer leur robustesse face à un biais potentiel dû à des covariables non observées. L'étude relève qu'au cours des quinze dernières années, les cinq services de police ont entamé des réformes majeures de leur organisation ainsi que de leurs stratégies opérationnelles et qu'ils ont noué des partenariats stratégiques afin de mettre en oeuvre la police de proximité. La typologie de quartier développée a abouti à une réduction de la variance intra-cluster des variables contextuelles et permet d'expliquer une partie significative de la variance inter-cluster des indicateurs d'impacts avant la mise en oeuvre du traitement. Ceci semble suggérer que les méthodes de géocomputation aident à équilibrer les covariables observées et donc à réduire les menaces relatives à la validité interne d'un concept de recherche non-expérimental. Enfin, l'analyse des impacts a révélé que le sentiment d'insécurité a diminué de manière significative pendant la période 2000-2005 dans les quartiers se trouvant à l'intérieur et autour des centres-villes de Berne et de Zurich. Ces améliorations sont assez robustes face à des biais dus à des covariables inobservées et covarient dans le temps et l'espace avec la mise en oeuvre de la police de proximité. L'hypothèse alternative envisageant que les diminutions observées dans le sentiment d'insécurité soient, partiellement, un résultat des interventions policières de proximité semble donc être aussi plausible que l'hypothèse nulle considérant l'absence absolue d'effet. Ceci, même si le concept de recherche non-expérimental mis en oeuvre ne peut pas complètement exclure la sélection et la régression à la moyenne comme explications alternatives. The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources. The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In addition to the more reactive forms, metals can occur in the structure of minerals, and the sum of all these forms defines their total contents in different soil fractions. The isomorphic substitution of heavy metals for example alters the dimensions of the unit cell and mineral size. This study proposed a method of chemical fractionation of heavy metals, using more powerful extraction methods, to remove the organic and different mineral phases completely. Soil samples were taken from eight soil profiles (0-10, 10-20 and 20-40 cm) in a Pb mining and metallurgy area in Adrianópolis, Paraná, Brazil. The Pb and Zn concentrations were determined in the following fractions (complete phase removal in each sequential extraction): exchangeable; carbonates; organic matter; amorphous and crystalline Fe oxides; Al oxide, amorphous aluminosilicates and kaolinite; and residual fractions. The complete removal of organic matter and mineral phases in sequential extractions resulted in low participation of residual forms of Pb and Zn in the total concentrations of these metals in the soils: there was lower association of metals with primary and 2:1 minerals and refractory oxides. The powerful methods used here allow an identification of the complete metal-mineral associations, such as the occurrence of Pb and Zn in the structure of the minerals. The higher incidence of Zn than Pb in the structure of Fe oxides, due to isomorphic substitution, was attributed to a smaller difference between the ionic radius of Zn2+ and Fe3+.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The construction of a soil after surface coal mining involves heavy machinery traffic during the topographic regeneration of the area, resulting in compaction of the relocated soil layers. This leads to problems with water infiltration and redistribution along the new profile, causing water erosion and consequently hampering the revegetation of the reconstructed soil. The planting of species useful in the process of soil decompaction is a promising strategy for the recovery of the soil structural quality. This study investigated the influence of different perennial grasses on the recovery of reconstructed soil aggregation in a coal mining area of the Companhia Riograndense de Mineração, located in Candiota-RS, which were planted in September/October 2007. The treatments consisted of planting: T1- Cynodon dactylon cv vaquero; T2 - Urochloa brizantha; T3 - Panicum maximun; T4 - Urochloa humidicola; T5 - Hemarthria altissima; T6 - Cynodon dactylon cv tifton 85. Bare reconstructed soil, adjacent to the experimental area, was used as control treatment (T7) and natural soil adjacent to the mining area covered with native vegetation was used as reference area (T8). Disturbed and undisturbed soil samples were collected in October/2009 (layers 0.00-0.05 and 0.10-0.15 m) to determine the percentage of macro- and microaggregates, mean weight diameter (MWD) of aggregates, organic matter content, bulk density, and macro- and microporosity. The lower values of macroaggregates and MWD in the surface than in the subsurface layer of the reconstructed soil resulted from the high degree of compaction caused by the traffic of heavy machinery on the clay material. After 24 months, all experimental grass treatments showed improvements in soil aggregation compared to the bare reconstructed soil (control), mainly in the 0.00-0.05 m layer, particularly in the two Urochloa treatments (T2 and T4) and Hemarthria altissima (T5). However, the great differences between the treatments with grasses and natural soil (reference) indicate that the recovery of the pre-mining soil structure could take decades.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures, since validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus and, the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of the text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target has been missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large amount of publications and abstracts. We believe this tool will be useful in helping the neuroscience community to facilitate connectivity studies of particular brain regions. The text mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing—syntactic analysis of the entire structure of sentences—and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of theoriginal by 10% while also roughly halving parsing time. To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of idiverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships. Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective To construct a Portuguese language index of information on the practice of diagnostic radiology in order to improve the standardization of the medical language and terminology. Materials and Methods A total of 61,461 definitive reports were collected from the database of the Radiology Information System at Hospital das Clínicas – Faculdade de Medicina de Ribeirão Preto (RIS/HCFMRP) as follows: 30,000 chest x-ray reports; 27,000 mammography reports; and 4,461 thyroid ultrasonography reports. The text mining technique was applied for the selection of terms, and the ANSI/NISO Z39.19-2005 standard was utilized to construct the index based on a thesaurus structure. The system was created in *html. Results The text mining resulted in a set of 358,236 (n = 100%) words. Out of this total, 76,347 (n = 21%) terms were selected to form the index. Such terms refer to anatomical pathology description, imaging techniques, equipment, type of study and some other composite terms. The index system was developed with 78,538 *html web pages. Conclusion The utilization of text mining on a radiological reports database has allowed the construction of a lexical system in Portuguese language consistent with the clinical practice in Radiology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mining has severe impacts on its surrounding. Particularly in the developing countries it has degraded the environment and signigicantly altered the socio-economical dynamics of the hosts. Especially relocation disrupts people from their homes, livelihoods, cultures and social activities. Mining industry has failed to develop the local host and streghten its governance structures; instead it has further degraded the development of mineral rich third world countries, which are among the world poorest ones. Cash flows derived from mining companies have not benefitted the crass-root level that however, bears most of the detrimental impacts. Especially if the governance structure of the host is weak, the sudden wealth is likely to accelerate disparities, corruption and even fuel wars. Environmental degradation, miscommunication, mistrust and disputes over land use have created conflicts between the communities and a mining company in Obuasi, Ghana; a case study of this thesis. The disputes are deeply rooted and further fuelled by unrealistic expectations and broken promises. The relations with artisanal and illegal miners have been especially troublesome. Illegal activities, mainly encroachment of the land and assets of the mine, such as vandalising tailings pipes have resulted in profits losses, environmental degradation and security hazards. All challenges mentioned above have to be addressed locally with site-specific solutions. It is vital to increase two-way communication, initiate collaboration and build capacity of the stakeholders such as local communities, NGOs and governance authorities. The locals must be engaged to create livelihood opportunities that are designed with and for them. Capacity can also be strengthened through education and skills training, such as women’s literacy programs. In order to diminish the overdependence of locals to the mine, the activities have to be self -sufficient and able to survive without external financial and managerial inputs. Additionally adequate and fair compensation practises and dispute resolution methods that are understood and accepted by all parties have to be agreed on as early as possible.