881 results for Opinion retrieval, mining and summarization framework
Abstract:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
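The abstract gives no equations, so the following is only a minimal sketch, assuming the simplest member of such a family: the symmetric aspect model P(x, y) = sum_z P(z) P(x|z) P(y|z), fitted to a dense count matrix n(x, y) by ordinary maximum-likelihood EM. The improved EM variants the abstract mentions are not shown, and the function and parameter names are hypothetical.

```python
import numpy as np


def aspect_model_em(counts, n_aspects, n_iter=100, seed=0):
    """Fit P(x, y) = sum_z P(z) P(x|z) P(y|z) to a co-occurrence count matrix
    counts[x, y] with plain EM (hypothetical helper, not the authors' code).
    Returns the parameters and the log-likelihood of the parameters in force
    at the start of each iteration."""
    rng = np.random.default_rng(seed)
    n_x, n_y = counts.shape
    tiny = 1e-300  # guards against log(0) and division by zero
    # Random positive initialisation, normalised to valid distributions.
    p_z = np.full(n_aspects, 1.0 / n_aspects)
    p_x_z = rng.random((n_x, n_aspects))
    p_x_z /= p_x_z.sum(axis=0, keepdims=True)
    p_y_z = rng.random((n_y, n_aspects))
    p_y_z /= p_y_z.sum(axis=0, keepdims=True)
    history = []
    for _ in range(n_iter):
        # E-step: joint[x, y, z] = P(z) P(x|z) P(y|z); posterior P(z | x, y).
        joint = p_z[None, None, :] * p_x_z[:, None, :] * p_y_z[None, :, :]
        p_xy = joint.sum(axis=2)
        post = joint / np.maximum(p_xy, tiny)[:, :, None]
        history.append(float(np.sum(counts * np.log(np.maximum(p_xy, tiny)))))
        # M-step: re-estimate from expected counts n(x, y) P(z | x, y).
        expected = counts[:, :, None] * post
        n_z = expected.sum(axis=(0, 1))
        p_x_z = expected.sum(axis=1) / np.maximum(n_z, tiny)
        p_y_z = expected.sum(axis=0) / np.maximum(n_z, tiny)
        p_z = n_z / n_z.sum()
    return p_z, p_x_z, p_y_z, history
```

Because each EM iteration cannot decrease the likelihood, the recorded `history` should be non-decreasing; the smoothed estimate `p_xy` assigns non-zero probability to pairs never seen in `counts`, which is the sparseness problem the abstract describes.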
Abstract:
In any data mining application, automated retrieval of text and image information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required first k singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g., a k x k matrix), from which we then find the first k triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the k largest eigenvalues using parallel Monte Carlo methods. We apply these methods both to the initial matrix and to the reduced one. The algorithms run on a cluster of workstations under MPI, and we present experimental results for textual retrieval of Web documents as well as a comparison of the proposed stochastic methods. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
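The first step (sample a smaller matrix, then apply a deterministic SVD) can be sketched as follows. This assumes length-squared column sampling, a standard Monte Carlo scheme in which column j is drawn with probability proportional to its squared norm and rescaled so that the sampled matrix C satisfies E[C C^T] = A A^T; the abstract does not say which sampling distribution the authors use, and the function name is hypothetical.

```python
import numpy as np


def sampled_top_k(a, k, c, seed=0):
    """Approximate the top-k singular values and left singular vectors of a
    term-by-document matrix `a` (terms x documents) from c sampled columns.
    Hypothetical helper illustrating length-squared column sampling."""
    rng = np.random.default_rng(seed)
    col_norms_sq = np.sum(a * a, axis=0)
    probs = col_norms_sq / col_norms_sq.sum()
    # Draw c document indices i.i.d. with probability proportional to ||a_j||^2.
    idx = rng.choice(a.shape[1], size=c, replace=True, p=probs)
    # Rescale each sampled column so that E[C C^T] = A A^T.
    small = a[:, idx] / np.sqrt(c * probs[idx])
    # Standard deterministic SVD on the much smaller m x c matrix.
    u, s, _ = np.linalg.svd(small, full_matrices=False)
    return s[:k], u[:, :k]
```

The returned left singular vectors span an approximate k-dimensional term space for LSI; the cost of the deterministic SVD now depends on c rather than on the full number of documents.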
Abstract:
Automatic indexing and retrieval of digital data poses major challenges. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system (Dynamic REtrieval Analysis and semantic metadata Management (DREAM)) designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM Demonstrator has been evaluated as deployed in the film post-production phase to support the process of storage, indexing and retrieval of large data sets of special effects video clips as an exemplar application domain. This paper presents the system's performance and usability results and highlights the scope for future enhancements of the DREAM architecture, which has proven successful in its first and possibly most challenging proving ground, namely film production, where it is already in routine use within our test bed Partners' creative processes. (C) 2009 Published by Elsevier B.V.
Abstract:
In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting these data to transform them into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behavior from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modeling, empirical data modeling, knowledge discovery, data mining, and data fusion.
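To make "data-based neurofuzzy modelling" concrete, here is a minimal one-input sketch in the spirit of B-spline neurofuzzy networks. It assumes triangular fuzzy sets centred on sorted knots (equivalent to order-2 B-splines, so memberships sum to one) and rules of the form "IF x is A_j THEN y is w_j"; because the output is linear in the rule weights, they are fitted by least squares. This is not the book's algorithm: the additive and multiplicative submodel construction is not shown, and all names are hypothetical.

```python
import numpy as np


def memberships(x, knots):
    """Triangular membership of each x in fuzzy sets centred at sorted `knots`.
    Interpolating a one-hot vector gives the hat function of set j; outside the
    knot range the end sets saturate at 1, preserving a partition of unity."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])


def fit_neurofuzzy(x, y, knots):
    # Model output = sum_j mu_j(x) * w_j, linear in the rule weights w_j,
    # so the weights are the linear least-squares solution.
    phi = memberships(x, knots)
    w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return w


def predict(x, knots, w):
    return memberships(x, knots) @ w
```

A useful property of this choice is interpretability: each fitted weight is the model output at the centre of its fuzzy set, so the weight vector reads directly as a table of linguistic rules.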
Abstract:
The A-Train constellation of satellites provides a new capability to measure vertical cloud profiles that leads to more detailed information on ice-cloud microphysical properties than has been possible up to now. A variational radar–lidar ice-cloud retrieval algorithm (VarCloud) takes advantage of the complementary nature of the CloudSat radar and Cloud–Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) lidar to provide a seamless retrieval of ice water content, effective radius, and extinction coefficient from the thinnest cirrus (seen only by the lidar) to the thickest ice cloud (penetrated only by the radar). In this paper, several versions of the VarCloud retrieval are compared with the CloudSat standard ice-only retrieval of ice water content, two empirical formulas that derive ice water content from radar reflectivity and temperature, and retrievals of vertically integrated properties from the Moderate Resolution Imaging Spectroradiometer (MODIS) radiometer. The retrieved variables typically agree to within a factor of 2, on average, and most of the differences can be explained by the different microphysical assumptions. For example, the ice water content comparison illustrates the sensitivity of the retrievals to assumed ice particle shape. If ice particles are modeled as oblate spheroids rather than spheres for radar scattering, then the retrieved ice water content is reduced by 50% on average in clouds with a reflectivity factor larger than 0 dBZ. VarCloud retrieves optical depths that are on average a factor of 2 lower than those from MODIS, which can be explained by the different assumptions on particle mass and area; if VarCloud mimics the MODIS assumptions, then better agreement is found in effective radius, and optical depth is overestimated. However, MODIS predicts the mean vertically integrated ice water content to be around a factor of 3 lower than that from VarCloud for the same retrievals, because the MODIS algorithm assumes that its retrieved effective radius (which is mostly representative of cloud top) is constant throughout the depth of the cloud. These comparisons highlight the need to refine microphysical assumptions in all retrieval algorithms and also for future studies to compare not only the mean values but also the full probability density function.
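Why a constant-effective-radius assumption biases the vertically integrated ice water content can be shown with a toy profile. This sketch assumes the common definition r_e = 3 IWC / (2 rho_i alpha), so that extinction is alpha = 3 IWC / (2 rho_i r_e), optical depth is the vertical integral of alpha, and, only when r_e is constant, IWP = (2/3) rho_i r_e tau. The profile values, ice density and function names are illustrative assumptions, and the sketch does not reproduce the factor of 3 reported above, only the direction of the bias when r_e grows below cloud top.

```python
import numpy as np

RHO_ICE = 917.0  # kg m^-3, density of solid ice (assumed)


def extinction(iwc, r_e):
    # alpha = 3 IWC / (2 rho_i r_e); IWC in kg m^-3, r_e in m, alpha in m^-1.
    return 3.0 * iwc / (2.0 * RHO_ICE * r_e)


def integrate(profile, dz):
    # Rectangle-rule vertical integral over equally spaced layers of depth dz (m).
    return float(np.sum(profile) * dz)


def iwp_constant_re(tau, r_e):
    # Inverts tau = 3 IWP / (2 rho_i r_e); exact only if r_e is constant with height.
    return 2.0 * RHO_ICE * r_e * tau / 3.0


# Hypothetical 1 km ice cloud: uniform IWC, effective radius growing downward.
dz = 100.0
iwc = np.full(10, 1e-5)                   # kg m^-3 (0.01 g m^-3)
r_e = np.linspace(20e-6, 80e-6, 10)       # index 0 = cloud top
tau = integrate(extinction(iwc, r_e), dz)
true_iwp = integrate(iwc, dz)             # kg m^-2
top_only_iwp = iwp_constant_re(tau, r_e[0])  # uses the cloud-top r_e everywhere
```

Because the small cloud-top radius is applied to the whole column, `top_only_iwp` comes out below `true_iwp`; with a height-independent radius the two agree exactly.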
Abstract:
The need for consistent assimilation of satellite measurements for numerical weather prediction led operational meteorological centers to assimilate satellite radiances directly using variational data assimilation systems. More recently there has been a renewed interest in assimilating satellite retrievals (e.g., to avoid the use of relatively complicated radiative transfer models as observation operators for data assimilation). The aim of this paper is to provide a rigorous and comprehensive discussion of the conditions for the equivalence between radiance and retrieval assimilation. It is shown that two requirements need to be satisfied for the equivalence: (i) the radiance observation operator needs to be approximately linear in a region of the state space centered at the retrieval and with a radius of the order of the retrieval error; and (ii) any prior information used to constrain the retrieval should not underrepresent the variability of the state, so as to retain the information content of the measurements. Both these requirements can be tested in practice. When these requirements are met, retrievals can be transformed so as to represent only the portion of the state that is well constrained by the original radiance measurements and can be assimilated in a consistent and optimal way, by means of an appropriate observation operator and a unit matrix as error covariance. Finally, specific cases when retrieval assimilation can be more advantageous (e.g., when the estimate sought by the operational assimilation system depends on the first guess) are discussed.
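Requirement (i) states that the radiance observation operator should be approximately linear within roughly one retrieval error of the retrieval. The abstract says this can be tested in practice but not how; one hypothetical diagnostic, sketched below, perturbs the retrieval with Gaussian noise whose standard deviation equals the assumed retrieval error and reports the worst relative departure of h from its tangent-linear (finite-difference Jacobian) prediction. The operator `h`, the Gaussian sampling and any acceptance threshold are assumptions for illustration.

```python
import numpy as np


def linearity_error(h, x_r, sigma, n_samples=200, eps=1e-6, seed=0):
    """Worst relative departure of h from its tangent-linear approximation
    over random perturbations of typical size `sigma` around retrieval x_r.
    Hypothetical diagnostic; h maps a state vector to simulated radiances."""
    rng = np.random.default_rng(seed)
    x_r = np.asarray(x_r, dtype=float)
    h0 = np.asarray(h(x_r), dtype=float)
    # Forward-difference Jacobian of h at the retrieval.
    jac = np.column_stack(
        [(np.asarray(h(x_r + eps * e)) - h0) / eps for e in np.eye(x_r.size)]
    )
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.normal(0.0, sigma, size=x_r.size)
        dh = np.asarray(h(x_r + delta)) - h0
        rel = np.linalg.norm(dh - jac @ delta) / max(np.linalg.norm(dh), 1e-300)
        worst = max(worst, rel)
    return worst
```

A small value suggests the linear-region condition holds for that retrieval error; the same operator can fail the test for a large retrieval error and pass it for a small one, which is why the radius in requirement (i) is tied to the retrieval error.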
Abstract:
The governance of water resources is prominent in both water policy agendas and academic scholarship. Political ecologists have made important advances in reconceptualising the relationship between water and society. Yet, while they have stressed both the scalar dimensions and the politicised nature of water governance, analyses of its scalar politics are relatively nascent. In this paper, we consider how the increased demand for water resources by the growing mining industry in Peru reconfigures and rescales water governance. In Peru, the mining industry’s thirst for water draws in, and reshapes, social relations, technologies, institutions and discourses that operate over varying spatial and temporal scales. We develop the concept of waterscape to examine the multiple ways in which water is co-produced through mining and becomes embedded in changing modes and structures of water governance, often beyond the watershed scale. We argue that an examination of waterscapes avoids the limitations of thinking about water in purely material terms, of structuring analysis of water issues according to traditional spatial scales and institutional hierarchies, and of taking these scales and structures for granted.
Abstract:
Since the implementation of Ghana's national Structural Adjustment Programme (SAP), policies associated with the programme have been criticized for perpetuating poverty within the country's subsistence economy. This article brings new evidence to bear on the contention that the SAP has both fuelled the uncontrolled growth of informal, poverty-driven artisanal gold mining and further marginalized its impoverished participants. Throughout the adjustment period, it has been a central goal of the government to promote the expansion of large-scale gold mining through foreign investment. Confronted with the challenge of resuscitating a deteriorating gold mining industry, the government introduced a number of tax breaks and policies in an effort to create an attractive investment climate for foreign multinational mining companies. The rapid rise in exploration and excavation activities that has since taken place has displaced thousands of previously undisturbed subsistence artisanal gold miners. This, along with a laissez-faire land concession allocation procedure, has exacerbated conflicts between mining parties. Despite legalizing small-scale mining in 1989, the Ghanaian government continues to implement procedurally complex and bureaucratically unwieldy regulations and policies for artisanal operators which have the effect of favouring the interests of established large-scale miners.
Abstract:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories, which is an important and challenging aspect of P-found because of the large data volumes involved and scientists' desire to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
Abstract:
Analyses of neo-liberal change in African mining tend to frame discussion through the lens of an overarching structural perspective. Far less attention has been paid to the way change is enacted within social relations in mining communities. To this end, our chapter considers how development in the Tanzanian mineral sector transforms people’s relationships and stimulates new iterations of power and agency within local trajectories of development, focusing on the case of artisanal gold mining in Mgusu village in Geita region, Tanzania. The aim is to trace how neo-liberal change configures market rationality and property relations in ways that can fundamentally alter social relationships within the local community, occupational groups and families, raising both opportunities for wealth accumulation and the potential to entrench poverty. The creative action involved in these processes generates new associational ties and repertoires of practice, as miners respond to change and to the need to protect their livelihoods.
Abstract:
Artisanal miners have tended to be portrayed in the literature and media as people who work hard and play hard, not infrequently depicted as ‘rough diamonds’ likely to cross the boundaries of appropriate behaviour through pursuit of wealth and flamboyant living, often at the cost of local environmental damage. A popular alternative image is that of marginalised labourers, driven by poverty to toil in harsh conditions and pursuing mining livelihoods in the face of national governments’ and large-scale mining companies’ subversion of their land and mineral rights. Both views reflect partial realities, but are inclined to exaggerate the position of miners as mischief-making rogues or victims. Through documentation of the multi-faceted nature of Tanzanian artisanal miners’ work and home lives during the country’s ongoing economic mineralisation, we endeavour to convey a balanced rendering of their aspirations, occupational identity and social ties. Our emphasis is on their working lives as artisans, how they organise themselves and contend with the risks of their occupation, including their engagement with government policy and large-scale mining interests.
Abstract:
Coal mining and incineration of solid residues of health services (SRHS) release several contaminants into the environment, such as heavy metals and dioxins. These xenobiotics can induce excessive oxidative stress in organisms and cause different kinds of pathologies, including cancer. In the present study, we measured the concentrations of heavy metals such as lead, copper, iron, manganese and zinc in urine, as well as several enzymatic and non-enzymatic biomarkers of oxidative stress in blood (contents of lipoperoxidation = TBARS, protein carbonyls = PC, protein thiols = PT, alpha-tocopherol = AT, reduced glutathione = GSH, and the activities of glutathione S-transferase = GST, glutathione reductase = GR, glutathione peroxidase = GPx, catalase = CAT and superoxide dismutase = SOD), in six different groups (n = 20 each) of subjects exposed to airborne contamination related to coal mining or to incineration of SRHS, after vitamin E (800 mg/day) and vitamin C (500 mg/day) supplementation for 6 months, and compared them with the situation before the antioxidant intervention (Avila et al., Ecotoxicology 18:1150-1157, 2009; Possamai et al., Ecotoxicology 18:1158-1164, 2009). Except for the decreased manganese contents, heavy metal concentrations were elevated in all groups exposed to both sources of airborne contamination when compared to controls. TBARS and PC concentrations, which were elevated before the antioxidant intervention, decreased after the antioxidant supplementation. Similarly, the contents of PT, AT and GSH, which were decreased before the antioxidant intervention, reached values near those found in controls; GPx activity was reestablished in underground miners, and SOD, CAT and GST activities were reestablished in all groups. The results showed that the oxidative stress condition detected before the antioxidant supplementation in both directly and indirectly exposed subjects, due to the airborne contamination from coal dusts and SRHS incineration, was attenuated after the antioxidant intervention.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Includes bibliography.