873 resultados para deviance mining
Resumo:
Detecting bugs as early as possible plays an important role in ensuring software quality before shipping. We argue that mining previous bug fixes can produce good knowledge about why bugs happen and how they are fixed. In this paper, we mine the change history of 717 open source projects to extract bug-fix patterns. We also manually inspect many of the bugs we found to get insights into the contexts and reasons behind those bugs. For instance, we found out that missing null checks and missing initializations are very recurrent and we believe that they can be automatically detected and fixed.
Resumo:
Dynamically typed languages lack information about the types of variables in the source code. Developers care about this information as it supports program comprehension. Ba- sic type inference techniques are helpful, but may yield many false positives or negatives. We propose to mine information from the software ecosys- tem on how frequently given types are inferred unambigu- ously to improve the quality of type inference for a single system. This paper presents an approach to augment existing type inference techniques by supplementing the informa- tion available in the source code of a project with data from other projects written in the same language. For all available projects, we track how often messages are sent to instance variables throughout the source code. Predictions for the type of a variable are made based on the messages sent to it. The evaluation of a proof-of-concept prototype shows that this approach works well for types that are sufficiently popular, like those from the standard librarie, and tends to create false positives for unpopular or domain specific types. The false positives are, in most cases, fairly easily identifiable. Also, the evaluation data shows a substantial increase in the number of correctly inferred types when compared to the non-augmented type inference.
Artisanal and small scale mining in Mongolia: Statistical overview based on survey data by suom 2012
Resumo:
Background Simple Sequence Repeats (SSRs) are widely used in population genetic studies but their classical development is costly and time-consuming. The ever-increasing available DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSRs discovery. Expressed Sequence Tags (ESTs) have been widely used as SSR source for plants of economic relevance but their application to non-model species is still modest. Methods Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information-NCBI) for SSRs development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also search two model genera with fully annotated genomes for EST-SSRs, Arabidopsis and Oryza, and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera which were mined for SSRsand their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the sequences with SSR against the Oryza sativa and Arabidopsis thaliana reference genomes implemented in the Basal Local Alignment Tool (BLAST) of the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. Results We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identify a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types but the abundance of the various types of repeat differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions while dinucleotide repeats were largely associated with untranslated regions. Our results from the empirical test revealed considerable amplification success and transferability between congenerics. Conclusions The present work represents the first large-scale study developing SSRs by utilizing publicly accessible EST databases in threatened plants. Here we provide a very large number of ready-to-test EST-SSR (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons they might be useful in evolutionary and conservation studies. Altogether, our study highly supports the use of EST databases as an extremely affordable and fast alternative for SSR developing in threatened plants.
Resumo:
The rapid expansion of the mineral and metal mining sector in the past decade was accompanied by an increase in social conflicts. What are the impacts of large-scale mining operations? What are the strategies used by transnational corporations to gain access to underground resources and legitimize their activities? And how do local and indigenous communities confronted with mining react to, negotiate with and resist these activities? This book covers 13 case studies of copper, gold, uranium and other mining operations, situated in Latin America, Africa, Asia, Australia and Switzerland. With an extensive introduction to the subject and a systematic comparison across mining operations in different phases of development and social contexts, it serves as a primer and reference book for activists, students and researchers alike.
Resumo:
Index tracking has become one of the most common strategies in asset management. The index-tracking problem consists of constructing a portfolio that replicates the future performance of an index by including only a subset of the index constituents in the portfolio. Finding the most representative subset is challenging when the number of stocks in the index is large. We introduce a new three-stage approach that at first identifies promising subsets by employing data-mining techniques, then determines the stock weights in the subsets using mixed-binary linear programming, and finally evaluates the subsets based on cross validation. The best subset is returned as the tracking portfolio. Our approach outperforms state-of-the-art methods in terms of out-of-sample performance and running times.
Resumo:
In this paper, we describe NewsCATS (news categorization and trading system), a system implemented to predict stock price trends for the time immediately after the publication of press releases. NewsCATS consists mainly of three components. The first component retrieves relevant information from press releases through the application of text preprocessing techniques. The second component sorts the press releases into predefined categories. Finally, appropriate trading strategies are derived by the third component by means of the earlier categorization. The findings indicate that a categorization of press releases is able to provide additional information that can be used to forecast stock price trends, but that an adequate trading strategy is essential for the results of the categorization to be fully exploited.
Resumo:
Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^
Resumo:
At the end of the communist era, which was characterised as a closed social experiment, Romania found itself in the middle of a globalization process. Its industrial capacities have been considerably reduced through a poor and spendthrift management. There was a mass exodus of the labour force abroad and the educational background for the remaining part was no longer in agreement with the labour market. On these grounds, the vectors of globalization, in the form of foreign investments, entered Romania effortlessly. There even were local communities where the arrival of foreign investors was expected like a second coming of Christ. This is the context in which a Canadian company set forth the mining project Rosia Montana Gold Corporation. The implementation of the project should have started in 2005. Nevertheless, the project has not been effectively launched yet. This situation is based on what we call Romanian glocalization, namely a specific confrontation between global and local on Romanian land