775 results for mining data streams
Abstract:
Over 150 million cubic meters of sand-sized sediment has disappeared from the central region of the San Francisco Bay Coastal System during the last half century. This enormous loss may reflect numerous anthropogenic influences, such as watershed damming, bay-fill development, aggregate mining, and dredging. The reduction in Bay sediment also appears to be linked to a reduction in sediment supply and recent widespread erosion of adjacent beaches, wetlands, and submarine environments. A unique, multi-faceted provenance study was performed to definitively establish the primary sources, sinks, and transport pathways of beach-sized sand in the region, thereby identifying the activities and processes that directly limit supply to the outer coast. This integrative program is based on comprehensive surficial sediment sampling of the San Francisco Bay Coastal System, including the seabed, Bay floor, area beaches, adjacent rock units, and major drainages. Analyses of sample morphometrics and biological composition (e.g., Foraminifera) were then integrated with a suite of tracers, including 87Sr/86Sr and 143Nd/144Nd isotopes, rare earth elements, semi-quantitative X-ray diffraction mineralogy, and heavy minerals, and with process-based numerical modeling, in situ current measurements, and bedform asymmetry, to robustly determine the provenance of beach-sized sand in the region.
Abstract:
During the SINOPS project, an optimal, state-of-the-art simulation of the marine silicon cycle is attempted, employing a biogeochemical ocean general circulation model (BOGCM) for three particular time steps relevant to global (paleo-)climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that the data structure remains homogeneous throughout the project. The practical work routine comprises systematic progress from data acquisition, through preparation, processing, quality checking, and archiving, up to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize each analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, a Geographic Information System (GIS), 2-D plots, cross-section plots, etc., and whose multidimensional data model promotes scientific data mining. Besides the scientific and technical aspects, this alliance between the scientific project team and the data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.
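As a rough illustration of the n-dimensional catalogue idea, the sketch below keys each analytical value by a tuple of metadata dimensions, so that the key itself serves as the unambiguous identifier. The field names and sample record are hypothetical and do not reflect PANGAEA's actual schema.

```python
# A minimal sketch of an n-dimensional catalogue: each analytical value is
# keyed by a tuple of metadata dimensions, so the key doubles as an
# unambiguous identifier. All names here are illustrative assumptions.
from typing import NamedTuple

class CatalogueKey(NamedTuple):
    campaign: str      # e.g. research cruise or project leg
    station: str       # sampling station
    parameter: str     # measured quantity, e.g. "opal flux"
    depth_m: float     # water or sediment depth
    timestamp: str     # ISO 8601 acquisition time

catalogue: dict[CatalogueKey, float] = {}

def archive(key: CatalogueKey, value: float) -> None:
    """Store one analytical value under its n-dimensional identifier."""
    if key in catalogue:
        raise KeyError(f"duplicate identifier: {key}")  # keys must be unambiguous
    catalogue[key] = value

def query(**criteria) -> dict[CatalogueKey, float]:
    """Retrieve all values whose key matches the given dimensions."""
    return {k: v for k, v in catalogue.items()
            if all(getattr(k, dim) == val for dim, val in criteria.items())}

archive(CatalogueKey("SO-XYZ", "St-12", "opal flux", 2450.0, "1999-06-01T00:00"), 0.83)
print(query(parameter="opal flux"))
```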
Abstract:
Drawing upon critical, communications, and educational theories, this thesis develops a novel framing of the problem of social risk in the extractive sector as it relates to the building of respectful relationships with indigenous peoples. Building upon Bakhtin's dialogism, the thesis demonstrates the linkage of this aspect of social risk to professional education, and specifically to the undergraduate mining engineering curriculum, and develops a framework for the development of skills related to intercultural competence in the education of mining engineers. The knowledge of social risk, as well as the level of intercultural competence, of students in the mining engineering program is investigated through a mixture of surveys and focus groups, as is the impact of specific learning interventions. One aspect of this investigation is whether development of these attributes alters graduates' conception of their identity as mining engineers, i.e., the range and scope of their responsibilities, their understanding of to whom responsibilities are owed, and their role in building trusting relationships with communities. Survey results demonstrate that student openness to the perspectives of other cultures increases with exposure to the second-year curriculum. Students became more knowledgeable about the social dimensions of responsible mining, but not about the cultural dimensions. Analysis of focus group data shows that students are highly motivated to improve community perceptions and acceptance. It is observed that students want to show respect for the diverse peoples and communities where they will work, but they are hampered by their inability to appreciate the viewpoints of people who do not share their values. They embrace benefit sharing and environmental protection as norms, but they mistakenly conclude that opposition to mining is rooted in a lack of education rather than in cultural values. Three sequential threshold concepts are identified as impeding the development of intercultural competence: Awareness and Acknowledgement of Different Forms of Knowledge; Recognition that Value Systems are a Function of Culture; and Respect for Varied Perceptions of Social Wellbeing and Quality of Life. Future curriculum development in the undergraduate mining engineering program, as well as in other educational programs relevant to the extractive sector, can be effectively targeted by focusing on these threshold concepts.
Abstract:
Artisanal mining is a global phenomenon that poses threats to environmental health and safety. Ambiguities in the manner in which ore-processing facilities operate hinder the mining capacity of these miners in Ghana. These problems are reviewed on the basis of current socio-economic, health and safety, and environmental conditions, and the use of rudimentary technologies that limits fair-trade deals for miners. This research sought to use an established, data-driven, geographic information system (GIS)-based approach employing spatial analysis to locate a centralized processing facility within the Wassa Amenfi-Prestea Mining Area (WAPMA) in the Western region of Ghana. A spatial analysis technique that utilizes ModelBuilder within the ArcGIS geoprocessing environment, through suitability modeling, systematically and simultaneously analyzes a geographical dataset of selected criteria. The spatial overlay analysis methodology and the multi-criteria decision analysis approach were selected to identify the most preferred locations for siting a processing facility. For optimal site selection, seven major criteria were considered: proximity to settlements, water resources, artisanal mining sites, roads, railways, tectonic zones, and slopes. Site characterization and environmental considerations incorporated identified constraints, such as proximity to large-scale mines, forest reserves, and state lands. The analysis was limited to criteria that were selected as relevant to the area under investigation. Saaty's analytical hierarchy process was utilized to derive relative importance weights for the criteria, and a weighted linear combination technique was then applied to combine the factors and determine the degree of potential site suitability. The final map output indicates the potential sites identified for establishing a facility centre. The results obtained highlight intuitive areas suitable for consideration.
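To make the weighting step concrete, here is a minimal sketch of Saaty's analytical hierarchy process followed by a weighted linear combination. The pairwise judgements and criterion scores are invented for illustration, and only four of the seven criteria are shown.

```python
# A sketch of Saaty's analytic hierarchy process (AHP) followed by weighted
# linear combination (WLC). The comparison values and scores are made up.
import numpy as np

criteria = ["settlements", "water", "mine_sites", "roads"]  # subset of the 7 criteria

# Pairwise comparison matrix on Saaty's 1-9 scale: A[i, j] = importance of
# criterion i relative to criterion j; reciprocal by construction.
A = np.array([
    [1.0, 3.0, 2.0, 5.0],
    [1/3, 1.0, 1/2, 2.0],
    [1/2, 2.0, 1.0, 3.0],
    [1/5, 1/2, 1/3, 1.0],
])

# AHP weights: principal eigenvector of A, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

# Consistency ratio: CR = (lambda_max - n) / ((n - 1) * RI), RI from Saaty's table.
n = len(criteria)
lambda_max = np.max(np.real(eigvals))
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}[n]
CR = (lambda_max - n) / ((n - 1) * RI)
assert CR < 0.10, "pairwise judgements too inconsistent; revise the matrix"

# WLC: each candidate cell's suitability is the weighted sum of its
# standardised criterion scores (0 = unsuitable, 1 = ideal).
cells = np.array([
    [0.8, 0.6, 0.9, 0.7],   # candidate site 1
    [0.4, 0.9, 0.5, 0.8],   # candidate site 2
])
suitability = cells @ weights
print(dict(zip(criteria, np.round(weights, 3))), suitability)
```

The consistency check mirrors standard AHP practice: judgements with a consistency ratio above 0.10 are usually revised before the derived weights are used.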
Abstract:
This paper describes a face detection system which goes beyond traditional approaches normally designed for still images. First, the video stream context is considered in applying the detector; the resulting system is therefore designed around a main feature available in a video stream, i.e., temporal coherence. The system builds a feature-based model for each detected face and searches for the faces in the next frame using the model information. The results achieved for video stream processing outperform the Rowley-Kanade and Viola-Jones solutions, providing eye and face data in reduced time with a notable correct detection rate.
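The temporal-coherence idea can be sketched with a generic cascade detector: an exhaustive full-frame scan runs only when no faces are currently tracked; otherwise each face is searched for in a window around its previous position. The sketch below uses OpenCV Haar cascades as a stand-in and is not the paper's actual feature-based model.

```python
# A minimal sketch of exploiting temporal coherence in video face detection:
# a full-frame detector runs only when no faces are being tracked; otherwise
# each face is searched for in a window around its previous position.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
tracked = []  # list of (x, y, w, h) boxes from the previous frame

def detect(frame):
    global tracked
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    found = []
    for (x, y, w, h) in tracked:
        # Temporal coherence: search a window around the last known position.
        x0, y0 = max(0, x - w // 2), max(0, y - h // 2)
        roi = gray[y0:y0 + 2 * h, x0:x0 + 2 * w]
        hits = cascade.detectMultiScale(roi, 1.1, 4)
        found += [(x0 + fx, y0 + fy, fw, fh) for (fx, fy, fw, fh) in hits]
    if not found:  # fall back to an exhaustive full-frame scan
        found = list(cascade.detectMultiScale(gray, 1.1, 4))
    tracked = found
    return found

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    for (x, y, w, h) in detect(frame):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
```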
Abstract:
In this talk, I will describe various computational modelling and data mining solutions that form the basis of how the office of Deputy Head of Department (Resources) works to serve you. These include lessons I learn about, and from, optimisation issues in resource allocation, uncertainty analysis on league tables, modelling the process of winning external grants, and lessons we learn from student satisfaction surveys, some of which I have attempted to inject into our planning processes.
Abstract:
We surveyed macroinvertebrate communities in 31 hill streams in the Vouga River and Mondego River catchments in central Portugal. Despite applying a "least-impacted" criterion, channel and bank management was common, with 38% of streams demonstrating channel modification (damming) and 80% with evidence of bank modification. Principal component analysis (PCA) at the family and species level related the macroinvertebrates to habitat variables derived at three spatial scales -- site (20 m), reach (200 m), and catchment. Variation in community structure between sites was similar at the species and family level and was statistically related to pH, conductivity, temperature, flow, shade, and substrate size at the site scale; channel and bank habitat, riparian vegetation, and land-use at the reach scale; and altitude and slope at the catchment scale. While the effects of river management were apparent in various ecologically important habitat features at the site and reach scale, a direct relationship with macroinvertebrate assemblages was apparent only between the extent of walled banks and the secondary PCA axis described by species data. The strong relationship between catchment-scale variables and descriptors of physical structure at the reach and site scale suggests that catchment-scale parameters are valuable predictors of macroinvertebrate community structure in these streams despite the anthropogenic modifications of the natural habitat.
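A minimal sketch of the ordination step, assuming a site-by-taxa abundance matrix and a few habitat variables (randomly generated here as stand-ins for the survey data), might look as follows.

```python
# A sketch of the ordination approach: PCA on a log-transformed site-by-taxa
# abundance matrix, with axis scores then related to habitat variables.
# The arrays below are random stand-ins for the 31-site survey data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
abundance = rng.poisson(5, size=(31, 40))          # 31 sites x 40 taxa
habitat = rng.normal(size=(31, 3))                 # e.g. pH, conductivity, shade

X = np.log1p(abundance)                            # tame highly abundant taxa
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardise columns

pca = PCA(n_components=2)
scores = pca.fit_transform(X)                      # site scores on PCA axes 1-2

# Relate each ordination axis to each habitat variable (Pearson r).
for axis in range(2):
    r = [np.corrcoef(scores[:, axis], habitat[:, j])[0, 1] for j in range(3)]
    print(f"PCA axis {axis + 1}: r vs habitat vars = {np.round(r, 2)}")
```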
Abstract:
Considerable attention has been paid to the potentially confounding effects of geological and seasonal variation on outputs from bioassessments in temperate streams, but our understanding about these influences is limited for many tropical systems. We explored variation in macroinvertebrate assemblage composition and the environmental characteristics of 3rd- to 5th-order streams in a geologically heterogeneous tropical landscape in the wet and dry seasons. Study streams drained catchments with land cover ranging from predominantly forested to agricultural land, but data indicated that distinct water-chemistry and substratum conditions associated with predominantly calcareous and silicate geologies were key determinants of macroinvertebrate assemblage composition. Most notably, calcareous streams were characterized by a relatively abundant noninsect fauna, particularly a pachychilid gastropod snail. The association between geological variation and assemblage composition was apparent during both seasons, but significant temporal variation in compositional characteristics was detected only in calcareous streams, possibly because of limited statistical power to detect change at silicate sites, or the limited extent of our temporal data. We discuss the implications of our findings for tropical bioassessment programs. Our key findings suggest that geology can be an important determinant of macroinvertebrate assemblages in tropical streams and that geological heterogeneity may influence the scale of temporal response in characteristic macroinvertebrate assemblages.
Abstract:
Internet users consume online targeted advertising based on information collected about them and voluntarily share personal information in social networks. Sensor information and data from smartphones are collected and used by applications, sometimes in unclear ways. As happens today with smartphones, in the near future sensors will be shipped in all types of connected devices, enabling ubiquitous information gathering from the physical environment and realizing the vision of Ambient Intelligence. The value of gathered data, if not obvious, can be harnessed through data mining techniques and put to use by enabling personalized and tailored services as well as business intelligence practices, fueling the digital economy. However, the ever-expanding gathering and use of information undermines the privacy conceptions of the past. Natural social practices of managing privacy in daily relations are overridden by socially awkward communication tools, service providers struggle with security issues resulting in harmful data leaks, governments use mass surveillance techniques, the incentives of the digital economy threaten consumer privacy, and the advancement of consumer-grade data-gathering technology enables new inter-personal abuses. A wide range of fields attempts to address technology-related privacy problems; however, they vary immensely in terms of assumptions, scope, and approach. Privacy of future use cases is typically handled vertically, instead of building upon previous work that can be re-contextualized, while current privacy problems are typically addressed per type in a more focused way. Because significant effort was required to make sense of the relations and structure of privacy-related work, this thesis attempts to transmit a structured view of it. It is multi-disciplinary, spanning from cryptography to economics and including distributed systems and information theory, and addresses privacy issues of different natures. As existing work is framed and discussed, the contributions to the state of the art made in the scope of this thesis are presented. The contributions add to five distinct areas: 1) identity in distributed systems; 2) future context-aware services; 3) event-based context management; 4) low-latency information flow control; 5) high-dimensional dataset anonymity. Finally, having laid out such a landscape of privacy-preserving work, the current and future privacy challenges are discussed, considering not only technical but also socio-economic perspectives.
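As one concrete anchor for contribution area 5, the sketch below checks k-anonymity over a set of quasi-identifiers, a classic formalisation of dataset anonymity. It illustrates the general notion only and is not the thesis's own high-dimensional technique.

```python
# A minimal sketch of k-anonymity: every combination of quasi-identifier
# values must be shared by at least k records. Illustrative only.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff each quasi-identifier combination occurs in >= k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

rows = [
    {"age": "20-29", "zip": "537**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "537**", "diagnosis": "cold"},
    {"age": "30-39", "zip": "537**", "diagnosis": "flu"},
]
print(is_k_anonymous(rows, ["age", "zip"], k=2))  # False: one group of size 1
```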
Abstract:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written or the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when writing a product review; there may also be complex context, such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how to improve the quality of a search engine by optimizing the search results for particular users; analysis of customer reviews in the context of positive and negative sentiments can help summarize public opinion about a product; and analysis of blogs or scientific publications in the context of a social network can facilitate the discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, it is very important to incorporate it into the analysis and mining of text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, a new paradigm of text mining that treats context information as the "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve text mining problems in many real-world applications. We further introduce general components of contextual topic analysis: adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. These refinements of the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model prove effective in a variety of real applications involving topics and explicit, implicit, and complex contexts. We then introduce a postprocessing procedure for contextual patterns that generates meaningful labels for multinomial context models, providing a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further demonstrate the effectiveness of contextual text mining techniques in a quantitative way with large-scale datasets.
The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
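As a toy sketch of the paradigm, the following fits a PLSA-style model in which topic proportions are conditioned on a context variable, p(w|d,c) = sum_z p(w|z) p(z|d,c), by EM on synthetic counts. The thesis's contextual topic models are considerably richer; this only illustrates treating context as a first-class citizen in generative modeling.

```python
# A toy contextual topic model in the PLSA family, fitted by EM, where c is
# a context such as time or sentiment. Synthetic data; illustration only.
import numpy as np

rng = np.random.default_rng(1)
V, K = 50, 3                                 # vocabulary size, number of topics
counts = rng.poisson(1.0, size=(4, 2, V))    # n(d, c, w): 4 docs x 2 contexts

p_w_z = rng.dirichlet(np.ones(V), size=K)            # p(w | z), shape (K, V)
p_z_dc = rng.dirichlet(np.ones(K), size=(4, 2))      # p(z | d, c), shape (4, 2, K)

for _ in range(50):                                  # EM iterations
    # E-step: responsibility p(z | d, c, w), shape (4, 2, V, K)
    joint = p_z_dc[:, :, None, :] * p_w_z.T[None, None, :, :]
    resp = joint / joint.sum(axis=-1, keepdims=True)
    # M-step: re-estimate both distributions from expected counts.
    nz = counts[:, :, :, None] * resp                # expected topic counts
    p_w_z = nz.sum(axis=(0, 1)).T + 1e-12
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_dc = nz.sum(axis=2) + 1e-12
    p_z_dc /= p_z_dc.sum(axis=-1, keepdims=True)

# Contextual pattern: how topic usage shifts between the two contexts.
print(np.round(p_z_dc.mean(axis=0), 3))              # p(z | c), averaged over docs
```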
Abstract:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving the analyst indications of which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is applicable only to structured data, such as records in databases. We propose a system that extends DDA technology to semi-structured text documents, that is, text documents accompanied by a small amount of structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions using the semi-PLSA algorithm; then we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonably sized document datasets in real time, without any need for pre-computation.
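To make the highlighting step concrete, here is a minimal sketch assuming a simple additive expectation model over a 2-D cube: each cell's expected value is derived from its row, column, and grand means, and cells with large standardized residuals are flagged as 'unexpected'. This follows the spirit of classic discovery-driven exploration rather than the paper's exact method.

```python
# A sketch of the DDA core step on a 2-D cube: expected cell values come from
# a simple additive model of row, column, and grand means, and cells with
# large standardized residuals are highlighted as 'unexpected'.
import numpy as np

cube = np.array([                # e.g. document counts: dimension A x dimension B
    [12, 14, 13, 11],
    [22, 25, 24, 70],            # 70 is the planted anomaly
    [ 9, 10, 11, 10],
])

grand = cube.mean()
expected = cube.mean(axis=1, keepdims=True) + cube.mean(axis=0, keepdims=True) - grand
residual = cube - expected
z = residual / residual.std()

for i, j in zip(*np.where(np.abs(z) > 2)):   # highlight surprising cells
    print(f"cell ({i}, {j}): actual={cube[i, j]}, expected={expected[i, j]:.1f}")
```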
Abstract:
Work during this reporting period focused on characterizing temperature, habitat, and biological communities at candidate coolwater sites. During the past year we have collected additional temperature data from 57 candidate streams and other locations and now have records from 232 stream reaches. Eighty-two sites in Illinois have been identified as cool- or coldwater based on these records. Physical habitat surveys have been conducted at 79 sites where temperature data were available. Fish and macroinvertebrate data were obtained from the cooperative basin survey program data managers for candidate sites whenever possible and added to collections made during previous project years. This report summarizes progress for the period beginning 1 October 2009 and ending 30 September 2010. Additional analyses are ongoing and will be presented in the final report upon completion of this project.
Abstract:
The speed with which data has moved from being scarce, expensive, and valuable, justifying detailed and careful verification and analysis, to a situation where streams of detailed data are almost too large to handle has caused a series of shifts. Legal systems already have severe problems keeping up with, or even in touch with, the rate at which unexpected outcomes flow from information technology. Until recently, Big Data applications were driven by the capacity to harness massive quantities of existing data. Now real-time data flows are rising swiftly, becoming more invasive, and offering monitoring potential that is eagerly sought by commerce and government alike. The ambiguities as to who owns this often remarkably intrusive personal data need to be resolved, and rapidly, but resolution is likely to encounter rising resistance from industrial and commercial bodies who see this data flow as 'theirs'. There have been many changes in ICT that have led to stresses in resolving the conflicts between IP exploiters and their customers, but this one is of a different scale, owing to the wide potential for individual customisation of pricing and identification and the rising commercial value of integrated streams of diverse personal data. A new reconciliation between the parties involved is needed: new business models, and a shift in the current confusion over who owns what data into alignments that better accord with community expectations. After all, they are the customers, and the emergence of information monopolies needs to be balanced by appropriate consumer/subject rights. This will be a difficult discussion, but one that is needed to realise the great benefits that are clearly available to all if these issues can be positively resolved. Customers need to make these data flows contestable in some form. These Big Data flows are only going to grow and become ever more instructive. A better balance is necessary. For the first time these changes are directly affecting the governance of democracies, as the very effective micro-targeting tools deployed in recent elections have shown. Yet the data gathered is not available to its subjects. This is not a survivable social model. The Private Data Commons needs our help. Businesses and governments exploit big data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by big data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons. This Web extra is the audio part of a video in which author Marcus Wigan expands on his article "Big Data's Big Unintended Consequences".
Abstract:
This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed, context-aware mobile application. The investigation focuses on reducing the work required to integrate new sensed-context streams into an existing context-aware architecture. Current architectures require integration work for each new stream and each new context encountered. This mode of operation is acceptable for current fixed architectures; however, as systems become more mobile, the number of discoverable streams increases. Without the ability to discover and use these new streams, the functionality of any given device is limited to the streams it knows how to decode. Integrating a new stream requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires, then the new source should be connected to the application without any prior knowledge of that source. If the type is similar and can be converted, then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) providing semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. They must handle input from a variety of sensors, refining the data locally and managing its communication from the device under volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls; this local control can determine how the information is communicated to others. The investigation evaluates three approaches to sensor data management. The first system is characterised by static management based on pre-pended metadata; this was the reference system. Developed for a mobile system, it processed data based on the attached metadata, and the code that performed the processing was static. The second system was developed to move away from static processing and introduce greater freedom in handling the data stream, resulting in a heavyweight approach that pushed the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata, it is possible to be more flexible about the amount and type of data transmitted. The final system pulled the benefits of the other systems together: by providing a small management class that loads a separate handler based on the incoming data, dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches. The first was a quantitative analysis of the code to understand the relative complexity of the three systems, done by evaluating what changes to each system were required to support a new context. The second took a qualitative view of the work required by the software engineer to reconfigure the systems to support a new data stream. The evaluation highlights the scenarios in which each of the three systems is most suited. There is always a trade-off in the development of a system, and the three approaches highlight this fact.
A statically bound system can be quick to develop but may need to be completely re-written if the requirements move too far. Alternatively, a highly dynamic system may be able to cope with new requirements, but the developer time needed to create such a system may be greater than that needed to create several simpler systems.
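As an illustration of the final system's design, a small management class can dispatch each incoming sample to a handler selected by the stream's declared type, so that supporting a new stream amounts to registering one new handler. All class and type names below are hypothetical, not the thesis's code.

```python
# A minimal sketch of the third approach: a small manager dispatches each
# incoming sensed-context sample to a handler chosen by the stream's declared
# type, so a new stream only needs one new registered handler.
from typing import Callable

class ContextManager:
    def __init__(self):
        self._handlers: dict[str, Callable[[dict], None]] = {}

    def register(self, data_type: str):
        """Decorator: attach a handler for one sensed-context data type."""
        def wrap(handler):
            self._handlers[data_type] = handler
            return handler
        return wrap

    def dispatch(self, sample: dict):
        handler = self._handlers.get(sample["type"])
        if handler is None:
            # Unknown stream: a fuller system would try type conversion here.
            print(f"no handler for {sample['type']!r}; sample dropped")
            return
        handler(sample)

manager = ContextManager()

@manager.register("gps")
def handle_gps(sample):
    print("location:", sample["lat"], sample["lon"])

# A new stream type is supported by adding one handler; the manager and the
# existing handlers are untouched.
@manager.register("temperature")
def handle_temperature(sample):
    print("temperature (C):", sample["value"])

manager.dispatch({"type": "gps", "lat": 55.9, "lon": -3.2})
manager.dispatch({"type": "temperature", "value": 21.5})
manager.dispatch({"type": "heart_rate", "value": 72})   # unknown: dropped
```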
Abstract:
C3S2E '16 Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering