845 results for mining data streams
Abstract:
We surveyed macroinvertebrate communities in 31 hill streams in the Vouga River and Mondego River catchments in central Portugal. Despite applying a "least-impacted" criterion, channel and bank management was common, with 38% of streams demonstrating channel modification (damming) and 80% showing evidence of bank modification. Principal component analysis (PCA) at the family and species level related the macroinvertebrates to habitat variables derived at three spatial scales -- site (20 m), reach (200 m), and catchment. Variation in community structure between sites was similar at the species and family level and was statistically related to pH, conductivity, temperature, flow, shade, and substrate size at the site scale; channel and bank habitat and riparian vegetation and land-use at the reach scale; and altitude and slope at the catchment scale. While the effects of river management were apparent in various ecologically important habitat features at the site and reach scale, a direct relationship with macroinvertebrate assemblages was only apparent between the extent of walled banks and the secondary PCA axis described by species data. The strong relationship between catchment scale variables and descriptors of physical structure at the reach and site scale suggests that catchment-scale parameters are valuable predictors of macroinvertebrate community structure in these streams despite the anthropogenic modifications of the natural habitat.
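A minimal sketch of this kind of ordination-plus-correlation analysis, assuming synthetic data and illustrative variable names (not the study's data or code):

```python
# Sketch: PCA ordination of macroinvertebrate communities, then correlation
# of ordination axes with habitat variables (illustrative data and names).
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_sites, n_families = 31, 40
abundance = rng.poisson(3.0, size=(n_sites, n_families))  # site x family counts
habitat = {"pH": rng.normal(6.8, 0.5, n_sites),
           "conductivity": rng.lognormal(4, 0.5, n_sites),
           "altitude": rng.uniform(50, 800, n_sites)}

# Log-transform counts to stabilise variance before ordination
X = np.log1p(abundance)
pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # site scores on the first two axes

# Relate each habitat variable to each PCA axis
for name, values in habitat.items():
    for axis in range(2):
        r, p = pearsonr(values, scores[:, axis])
        print(f"PC{axis+1} vs {name}: r={r:.2f}, p={p:.3f}")
```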
Abstract:
Considerable attention has been paid to the potentially confounding effects of geological and seasonal variation on outputs from bioassessments in temperate streams, but our understanding of these influences is limited for many tropical systems. We explored variation in macroinvertebrate assemblage composition and the environmental characteristics of 3rd- to 5th-order streams in a geologically heterogeneous tropical landscape in the wet and dry seasons. Study streams drained catchments with land cover ranging from predominantly forested to agricultural land, but data indicated that distinct water-chemistry and substratum conditions associated with predominantly calcareous and silicate geologies were key determinants of macroinvertebrate assemblage composition. Most notably, calcareous streams were characterized by a relatively abundant noninsect fauna, particularly a pachychilid gastropod snail. The association between geological variation and assemblage composition was apparent during both seasons, but significant temporal variation in compositional characteristics was detected only in calcareous streams, possibly because of limited statistical power to detect change at silicate sites, or the limited extent of our temporal data. We discuss the implications of our findings for tropical bioassessment programs. Our key findings suggest that geology can be an important determinant of macroinvertebrate assemblages in tropical streams and that geological heterogeneity may influence the scale of temporal response in characteristic macroinvertebrate assemblages.
Abstract:
Internet users consume online targeted advertising based on information collected about them and voluntarily share personal information in social networks. Sensor information and data from smartphones are collected and used by applications, sometimes in unclear ways. As happens today with smartphones, in the near future sensors will be shipped in all types of connected devices, enabling ubiquitous information gathering from the physical environment and realising the vision of Ambient Intelligence. The value of gathered data, if not obvious, can be harnessed through data mining techniques and put to use by enabling personalized and tailored services as well as business intelligence practices, fueling the digital economy. However, the ever-expanding gathering and use of information undermines the privacy conceptions of the past. Natural social practices of managing privacy in daily relations are overridden by socially awkward communication tools, service providers struggle with security issues resulting in harmful data leaks, governments use mass surveillance techniques, the incentives of the digital economy threaten consumer privacy, and the advancement of consumer-grade data-gathering technology enables new inter-personal abuses. A wide range of fields attempts to address technology-related privacy problems; however, they vary immensely in terms of assumptions, scope, and approach. Privacy in future use cases is typically handled vertically, instead of building upon previous work that can be re-contextualized, while current privacy problems are typically addressed per type in a more focused way. Because significant effort was required to make sense of the relations and structure of privacy-related work, this thesis attempts to transmit a structured view of it. It is multi-disciplinary, from cryptography to economics, including distributed systems and information theory, and addresses privacy issues of different natures. As existing work is framed and discussed, the contributions to the state of the art made in the scope of this thesis are presented. The contributions add to five distinct areas: 1) identity in distributed systems; 2) future context-aware services; 3) event-based context management; 4) low-latency information flow control; 5) high-dimensional dataset anonymity. Finally, having laid out such a landscape of privacy-preserving work, current and future privacy challenges are discussed, considering not only technical but also socio-economic perspectives.
Abstract:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, or the author(s) of a biomedical publication; implicit, such as the positive or negative sentiment that an author had when she wrote a product review; or complex, such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how to improve the quality of a search engine by optimizing the search results for particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; and analysis of blogs or scientific publications in the context of a social network can facilitate the discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, it is very important to incorporate it into analyzing and mining text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, a new paradigm of text mining that treats context information as the "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve text mining problems in many real-world applications. We further introduce general components of contextual topic analysis: adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. These refinements of the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proven effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns that generates meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further demonstrate the effectiveness of contextual text mining techniques quantitatively with large-scale datasets.
The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
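One way to picture the role of context in such generative models is the factorization below; the notation is a sketch of the general idea, not necessarily the thesis's actual model:

```latex
% Sketch of a context-conditioned topic mixture (illustrative notation):
% a word w in document d with context c is generated by first choosing a
% topic theta_j with a probability that depends on the pair (d, c).
p(w \mid d, c) = \sum_{j=1}^{k} p(\theta_j \mid d, c)\, p(w \mid \theta_j)
```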
Abstract:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving the analyst indications of which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system that extends DDA technology to semi-structured text documents, that is, text documents accompanied by some structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions, using a semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonably large document datasets, in real time, without any need for pre-computation.
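As an illustration of the discovery-driven step, the sketch below flags 'unexpected' cells in a small cube slice, using a simple additive row/column expectation model as a stand-in for the expectation models used in the DDA literature; the data and threshold are illustrative:

```python
# Sketch: flag 'unexpected' cells in a 2-D slice of an OLAP cube.
# A cell's expected value is modelled additively from row/column effects;
# cells whose residuals exceed 2 standard deviations are highlighted.
import numpy as np

cube_slice = np.array([[120, 130, 125],
                       [118, 128, 300],   # 300 is the planted anomaly
                       [122, 131, 127]], dtype=float)

row_mean = cube_slice.mean(axis=1, keepdims=True)
col_mean = cube_slice.mean(axis=0, keepdims=True)
grand = cube_slice.mean()
expected = row_mean + col_mean - grand      # additive expectation model
residual = cube_slice - expected

threshold = 2 * residual.std()
for i, j in zip(*np.where(np.abs(residual) > threshold)):
    print(f"cell ({i},{j}): value={cube_slice[i,j]:.0f}, "
          f"expected={expected[i,j]:.0f}  <- unexpected")
```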
Abstract:
Work during this reporting period focused on characterizing temperature, habitat, and biological communities at candidate coolwater sites. During the past year we have collected additional temperature data from 57 candidate streams and other locations and now have records from 232 stream reaches. Eighty-two sites in Illinois have been identified as cool- or coldwater based on these records. Physical habitat surveys have been conducted at 79 sites where temperature data were available. Fish and macroinvertebrate data were obtained from the cooperative basin survey program data managers for candidate sites whenever possible and added to collections made during previous project years. This report summarizes progress for the period beginning 1 October 2009 and ending 30 September 2010. Additional analyses are ongoing and will be presented in the final report upon completion of this project.
Abstract:
The speed with which data has moved from being scarce, expensive, and valuable, justifying detailed and careful verification and analysis, to a situation where streams of detailed data are almost too large to handle has caused a series of shifts. Legal systems already have severe problems keeping up with, or even in touch with, the rate at which unexpected outcomes flow from information technology. The capacity to harness massive quantities of existing data has driven Big Data applications until recently. Now real-time data flows are rising swiftly, becoming more invasive, and offering monitoring potential that is eagerly sought by commerce and government alike. The ambiguities as to who owns this often quite remarkably intrusive personal data need to be resolved, and rapidly, but resolution is likely to encounter rising resistance from industrial and commercial bodies who see this data flow as 'theirs'. There have been many changes in ICT that have led to stresses in the resolution of the conflicts between IP exploiters and their customers, but this one is of a different scale due to the wide potential for individual customisation of pricing, identification, and the rising commercial value of integrated streams of diverse personal data. A new reconciliation between the parties involved is needed: new business models, and a shift from the current confusion over who owns what data into alignments that are in better accord with community expectations. After all, they are the customers, and the emergence of information monopolies needs to be balanced by appropriate consumer/subject rights. This will be a difficult discussion, but one that is needed to realise the great benefits to all that are clearly available if these issues can be positively resolved. The customers need to make these data flows contestable in some form. These big data flows are only going to grow and become ever more intrusive. A better balance is necessary. For the first time these changes are directly affecting the governance of democracies, as the very effective micro-targeting tools deployed in recent elections have shown. Yet the data gathered is not available to the subjects. This is not a survivable social model. The Private Data Commons needs our help. Businesses and governments exploit big data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by big data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons.
Abstract:
This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed context-aware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams into an existing context-aware architecture. Current architectures require integration work for each new stream and each new context that is encountered. This manner of operation is acceptable for current fixed architectures. However, as systems become more mobile, the number of discoverable streams increases. Without the ability to discover and use these new streams, the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If a new source provides data of a type that an application currently requires, then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted, then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls; this local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on pre-pended metadata. This was the reference system. Developed for a mobile system, the data was processed based on the attached metadata; the code that performed the processing was static. The second system was developed to move away from static processing and introduce greater freedom in handling the data stream, which resulted in a heavyweight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata, it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together: by providing a small management class that loads a separate handler based on the incoming data, dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches. The first is a quantitative analysis of the code to understand the complexity of the three systems, done by evaluating what changes to the system were involved for a new context. The second takes a qualitative view of the work required by the software engineer to reconfigure the systems to support a new data stream. The evaluation highlights the scenarios in which each of the three systems is most suited. There is always a trade-off in the development of a system, and the three approaches highlight this fact.
The creation of a statically bound system can be quick to develop but may need to be completely re-written if the requirements move too far. Alternatively, a highly dynamic system may be able to cope with new requirements, but the developer time to create such a system may be greater than the creation of several simpler systems.
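A minimal sketch of the third approach, in which a small manager class loads a handler according to the type named in a stream's metadata; class and type names here are illustrative, not the thesis's code:

```python
# Sketch: a lightweight manager that dispatches incoming sensed-context
# streams to handlers registered per data type, so a new stream type is
# supported by registering a handler rather than editing the core code.
from typing import Callable, Dict

class StreamManager:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[bytes], object]] = {}

    def register(self, data_type: str, handler: Callable[[bytes], object]) -> None:
        """Associate a decoding handler with a metadata type tag."""
        self._handlers[data_type] = handler

    def dispatch(self, metadata: dict, payload: bytes) -> object:
        """Look up the handler named in the metadata and apply it."""
        data_type = metadata["type"]
        if data_type not in self._handlers:
            raise LookupError(f"no handler for stream type {data_type!r}")
        return self._handlers[data_type](payload)

manager = StreamManager()
manager.register("gps", lambda raw: raw.decode().split(","))   # "lat,lon"
manager.register("temp", lambda raw: float(raw.decode()))      # celsius

print(manager.dispatch({"type": "gps"}, b"57.1,-2.1"))
print(manager.dispatch({"type": "temp"}, b"21.5"))
```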
Abstract:
C3S2E '16 Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering
Abstract:
The aim of this project is to present the different features that Oracle offers in the field of data mining, in order to determine whether it can be a suitable platform for research and teaching at the university. The first part of the project studies the "Oracle Data Miner" application and how, through a visual and intuitive workflow, the different mining techniques (classification, regression, clustering, and association) can be applied. Datasets from the University of Irvine were used to demonstrate the execution of these techniques, making it possible to observe the behaviour of the different algorithms in real situations. For each technique, we show how to evaluate its reliability and how to interpret the results obtained from its application. The application of the techniques through the PL/SQL language is also shown, which makes it possible to integrate data mining into our applications in a simple way. In the second part of the project, a prototype application was developed that uses data mining, specifically classification, to obtain a diagnosis and the probability that a breast tumour is malignant or benign from the results of a cytology.
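To give a flavour of the classification task in the second part (outside Oracle, since the project itself uses Oracle Data Miner and PL/SQL), here is a minimal scikit-learn sketch on the breast cancer dataset bundled with scikit-learn; it is an illustration, not the project's prototype:

```python
# Sketch: classify breast tumours as malignant/benign and report class
# probabilities, analogous to the diagnosis prototype described above.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(f"accuracy: {clf.score(X_test, y_test):.3f}")

# Probability that the first test-set tumour is malignant vs benign
proba = clf.predict_proba(X_test[:1])[0]
for label, p in zip(clf.classes_, proba):
    print(f"class {data.target_names[label]}: p={p:.3f}")
```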
Abstract:
Gold-mining may play an important role in the maintenance of malaria worldwide. Gold-mining, mostly illegal, has significantly expanded in Colombia during the last decade in areas with limited health care and disease prevention. We report a descriptive study carried out to determine the malaria prevalence in gold-mining areas of Colombia, using data from the public health surveillance system (National Health Institute) for the period 2010-2013. Gold-mining was more prevalent in the departments of Antioquia, Córdoba, Bolívar, Chocó, Nariño, Cauca, and Valle, which contributed 89.3% (270,753 cases) of the national malaria incidence from 2010-2013; 31.6% of malaria cases were from mining areas. Mining regions, such as El Bagre, Zaragoza, and Segovia, in Antioquia, Puerto Libertador and Montelíbano, in Córdoba, and Buenaventura, in Valle del Cauca, were the most endemic areas. The annual parasite index (API) correlated with gold production (R2 = 0.82, p < 0.0001); for every 100 kg of gold produced, the API increased by 0.54 cases per 1,000 inhabitants. Lack of malaria control activities, together with high migration and the proliferation of mosquito breeding sites, contributes to malaria in gold-mining regions. Specific control activities must be introduced to control this significant source of malaria in Colombia.
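Read as a simple linear relation, the reported slope amounts to the following, with G the annual gold production in kg and API in cases per 1,000 inhabitants (a restatement of the figures above, not the authors' fitted model):

```latex
% Reported association between gold production and malaria incidence:
% a 100 kg increase in gold production corresponds to +0.54 API.
\mathrm{API} \approx \beta_0 + 0.0054\, G, \qquad R^2 = 0.82,\ p < 0.0001
```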
Abstract:
Streams in urban areas are often channelized, with other bank erosion control measures, to improve flood conveyance and reduce channel migration and overbank flooding. This leads to reductions in evapotranspiration and sediment storage on floodplains. The purpose of this study is to quantify evapotranspiration and sediment transport capacity in the Anacostia Watershed, a large Coastal Plain urban watershed, and to compare these processes to those of a similarly sized non-urban watershed. Time series data of hydrologic and hydraulic changes in the Anacostia, as urbanization progressed between 1939 and 2014, were also analyzed. The data indicate lower values of warm-season runoff in the non-urban stream, suggesting a shift from evapotranspiration to runoff in urban streams. Channelization in the Anacostia also increased flow velocities and decreased high-flow width. The high velocities associated with channelization, together with the removal of floodplain storage sites, allow for the continued downstream transport of sediment despite stream bank stabilization.
Abstract:
The garimpo gold mining activity released about 2,500 tons of mercury into the Brazilian Amazonian environment in the 1980-1995 period. The northern region of Mato Grosso State, an important gold mining and trading area during the Amazonian gold rush, is now at a turning point regarding its economic future. Nowadays, the activities related to gold mining have only low relevance to its economy, so the local communities are looking for economic alternatives for the development of the region. Cooperative fish farming is one such alternative. However, some projects are implemented directly on areas degraded by the former garimpo activity, and the mercury left behind still poses risks, especially through its potential accumulation in fish. The objective of the present study was to evaluate the levels of mercury contamination in two fish farming areas, Paranaita and Alta Floresta, with and without records of past gold-washing activity, respectively. Data such as mercury concentration in fish of different trophic levels, sizes, and weights, as well as the physical and chemical parameters of the water, were measured and considered. These preliminary data show no significant difference between these two fish farming areas with respect to mercury levels in fish.
Abstract:
The Exhibitium Project, awarded by the BBVA Foundation, is a data-driven project developed by an international consortium of research groups. One of its main objectives is to build a prototype that will serve as a base to produce a platform for the recording and exploitation of data about art exhibitions available on the Internet. Therefore, our proposal aims to expose the methods, procedures and decision-making processes that have governed the technological implementation of this prototype, especially with regard to the reuse of WordPress (WP) as a development framework.
Abstract:
Metaheuristics are widely used in discrete optimization. They make it possible to obtain a good-quality solution in reasonable time for problems that are large, complex, and difficult to solve. Metaheuristics often have many parameters that the user must tune manually for a given problem. The goal of an adaptive metaheuristic is to let the method adjust some of these parameters automatically, based on the instance being solved. By drawing on prior knowledge of the problem and on notions from machine learning and related fields, an adaptive metaheuristic yields a more general and automatic method for solving problems. Global optimization of mining complexes aims to determine the movements of materials in the mines and the processing flows so as to maximize the economic value of the system. Because of the large number of integer variables in the model and the presence of complex and nonlinear constraints, it often becomes prohibitive to solve these models using the optimizers available in industry. Consequently, metaheuristics are often used for the optimization of mining complexes. This thesis improves a simulated annealing method developed by Goodfellow & Dimitrakopoulos (2016) for the stochastic optimization of mining complexes. The method developed by the authors requires many parameters to work. One of them governs how the simulated annealing method searches the local neighbourhood of solutions. This thesis implements an adaptive neighbourhood search method to improve the quality of a solution. Numerical results show an increase of up to 10% in the value of the economic objective function.
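A minimal sketch of adaptive neighbourhood selection in simulated annealing, in the spirit described above but not the authors' implementation: each move is chosen with probability proportional to a weight that is rewarded when the move improves the incumbent solution.

```python
# Sketch: simulated annealing with adaptive selection among several
# neighbourhood moves; a move's weight grows when it improves the best
# solution found so far, and slowly decays to keep exploration alive.
import math
import random

def anneal(objective, initial, moves, t0=1.0, cooling=0.995, iters=5000):
    x, best = initial, initial
    weights = [1.0] * len(moves)      # one weight per neighbourhood move
    t = t0
    for _ in range(iters):
        i = random.choices(range(len(moves)), weights=weights)[0]
        candidate = moves[i](x)
        delta = objective(candidate) - objective(x)   # maximisation
        if delta > 0 or random.random() < math.exp(delta / t):
            x = candidate
            if objective(x) > objective(best):
                best = x
                weights[i] += 0.5     # reward the move that improved
        weights[i] = max(0.1, weights[i] * 0.999)     # slow decay
        t *= cooling
    return best

# Toy maximisation: find x near 3 maximising -(x-3)^2 with two move scales
f = lambda x: -(x - 3.0) ** 2
moves = [lambda x: x + random.uniform(-1, 1),
         lambda x: x + random.uniform(-0.1, 0.1)]
print(anneal(f, initial=0.0, moves=moves))
```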