899 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining
Abstract:
The Cerrado is the second largest Brazilian biome and contains the headwaters of three major hydrological basins in Brazil. In spite of the biological and ecological relevance of this biome, there is little information about how land use changes affect the chemistry of low-order streams in the Cerrado. To evaluate these effects, streams that drain areas under natural, rural, and urban land cover were sampled near Brasilia, Brazil. Water samples were collected between September 2004 and December 2006. Chemical concentrations generally followed the pattern Urban > Rural > Natural. Median stream-water conductivity in urban streams, 21.6 (interquartile range: 22.7) µS/cm, was three- and five-fold greater than in rural and natural areas, respectively. In the wet season, despite increasing discharge, concentrations of many solutes were higher, particularly in rural and natural streams. Streams also showed higher total dissolved N (TDN) loads from natural to rural to urban, although DIN:DON ratios did not differ significantly. In natural and urban streams, TDN was 80 and 77% dissolved organic N, respectively. These results indicate that alterations in land cover from natural to rural and urban are changing stream water chemistry in the Cerrado with increasing solute concentrations, in addition to increased TDN output in areas under urban cover, with potential effects on ecosystem function.
Abstract:
Introduction: Internet users are increasingly using the worldwide web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. Objective: To apply and compare strategies that were developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier for Brazilian Portuguese-language web-based content within or outside of the field of healthcare, focusing on the lay public. Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The strategies proposed were constructed using content-based vector methods for text classification, such that Naive Bayes was used for the task of classifying vector patterns with characteristics obtained through the proposed strategies. Results: A strategy named InDeCS was developed specifically to adapt MeSH for the problem that was put forward. This approach achieved better accuracy for this pattern classification task (0.94 sensitivity, specificity and area under the ROC curve). Conclusions: Because of the significant results achieved by InDeCS, this tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saude. Furthermore, it could be shown that MeSH presents important results when used for the task of classifying web-based content focusing on the lay public. It was also possible to show from this study that MeSH was able to map out mutable non-deterministic characteristics of the web. (c) 2010 Elsevier Inc. All rights reserved.
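As a rough illustration of the kind of classifier this abstract describes, the sketch below trains a multinomial Naive Bayes model on tokenised pages and computes sensitivity and specificity. It is not the InDeCS strategy: the MeSH-based feature construction is not specified in the abstract, so plain word counts stand in for it, and the toy pages, the labels "health" and "other", and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def sensitivity_specificity(y_true, y_pred, positive="health"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy pages, already tokenised (not data from the study).
train_docs = [
    ["dor", "febre", "tratamento"],
    ["vacina", "sintomas", "febre"],
    ["futebol", "gol", "campeonato"],
    ["carro", "motor", "preço"],
]
train_labels = ["health", "health", "other", "other"]
model = train_naive_bayes(train_docs, train_labels)
```

In a setting like the one reported, the model would be fitted on the training pages and the two rates computed on the separate validation pages.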
Abstract:
This text addresses museums' role in the production of knowledge and how objects are transformed into documents when museums incorporate them. On accepting the effects of such transformation, museums start working not only with material goods, but also with symbolic goods. The collection manager or exhibition curator communicates through documents rather than bringing to light their intrinsic content. In this sense, every process involving museum documents, from the selection of collections to exhibitions, has a given rhetorical and ideological nature. Museums must search for meanings through correlations established in the process of producing information. Exhibitions should present objects in multiple contexts, giving visitors the opportunity to participate and attribute their own meanings to them.
Abstract:
Age-related changes in running kinematics have been reported in the literature using classical inferential statistics. However, this approach has been hampered by the increased number of biomechanical gait variables reported and subsequently the lack of differences presented in these studies. Data mining techniques have been applied in recent biomedical studies to solve this problem using a more general approach. In the present work, we re-analyzed lower extremity running kinematic data of 17 young and 17 elderly male runners using the Support Vector Machine (SVM) classification approach. In total, 31 kinematic variables were extracted to train the classification algorithm and test the generalized performance. The results revealed different accuracy rates across three different kernel methods adopted in the classifier, with the linear kernel performing the best. A subsequent forward feature selection algorithm demonstrated that with only six features, the linear kernel SVM achieved a 100% classification performance rate, showing that these features provided powerful combined information to distinguish age groups. The results of the present work demonstrate potential in applying this approach to improve knowledge about the age-related differences in running gait biomechanics and encourage the use of the SVM in other clinical contexts. (C) 2010 Elsevier Ltd. All rights reserved.
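The forward feature selection step mentioned above can be sketched as a greedy loop that adds, one at a time, the feature that most improves a classifier's score. This is only an illustration under stated assumptions: a nearest-centroid classifier scored by leave-one-out accuracy stands in for the linear-kernel SVM so that the example needs no external library, and the data, labels, and function names are hypothetical.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # Squared distance from sample i to the centroid of `label`.
            return sum(
                (X[i][f] - sums[label][k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )

        if min(sums, key=sq_dist) == y[i]:
            correct += 1
    return correct / len(X)


def forward_select(X, y, max_features):
    """Greedily add the feature that most improves the score; stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical data: column 0 is uninformative, column 1 separates the groups.
X = [[5, 1.0], [1, 1.2], [4, 0.9], [2, 3.0], [5, 3.1], [1, 2.9]]
y = ["young", "young", "young", "elderly", "elderly", "elderly"]
selected, score = forward_select(X, y, max_features=2)
```

With a real SVM, only the scoring function would change; the greedy loop itself stays the same.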
Abstract:
The brown rot fungus Wolfiporia cocos and the selective white rot fungus Perenniporia medulla-panis produce peptides and phenolate-derivative compounds as low molecular weight Fe³⁺-reductants. Phenolates were the major compounds with Fe³⁺-reducing activity in both fungi and displayed Fe³⁺-reducing activity at pH 2.0 and 4.5 in the absence and presence of oxalic acid. The chemical structures of these compounds were identified. Together with Fe³⁺ and H₂O₂ (mediated Fenton reaction) they produced oxygen radicals that oxidized lignocellulosic polysaccharides and lignin extensively in vitro under conditions similar to those found in vivo. These results indicate that, in addition to the extensively studied Gloeophyllum trabeum (a model brown rot fungus), other brown rot fungi as well as selective white rot fungi possess the means to promote Fenton chemistry to degrade cellulose and hemicellulose, and to modify lignin. Moreover, new information is provided, particularly regarding how lignin is attacked and either repolymerized or solubilized depending on the type of fungal attack, which suggests a new pathway for selective white rot degradation of wood. The importance of Fenton reactions mediated by phenolates operating separately or synergistically with carbohydrate-degrading enzymes in brown rot fungi, and lignin-modifying enzymes in white rot fungi, is discussed. This research improves our understanding of natural processes in carbon cycling in the environment, which may enable the exploration of novel methods for bioconversion of lignocellulose in the production of biofuels or polymers, in addition to the development of new and better ways to protect wood from degradation by microorganisms.
Abstract:
This paper deals with the H∞ recursive estimation problem for general rectangular time-variant descriptor systems in discrete time. Riccati-equation-based recursions for filtered and predicted estimates are developed based on a data fitting approach and game theory. In this approach, nature determines a state sequence seeking to maximize the estimation cost, whereas the estimator tries to find an estimate that brings the estimation cost to a minimum. A solution exists for a specified gamma-level if the resulting cost is positive. In order to present some computational alternatives to the H∞ filters developed, they are rewritten in information form along with the respective array algorithms. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
This work proposes a method based on both preprocessing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, and the loads and the power quality analyzers were placed at its output (all measurements were stored in a microcomputer). The data were then submitted to pre-processing, which was based on attribute selection techniques in order to minimize the complexity of identifying the loads. A new database was generated that retained only the selected attributes, and Artificial Neural Networks were then trained to perform the identification of loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (procedures for energy distribution employed by Brazilian utilities). The results obtained aim to validate the proposed methodology and provide a method that can serve as an alternative to conventional methods.
Abstract:
Since the 1990s several large companies have been publishing nonfinancial performance reports. Focusing initially on the physical environment, these reports evolved to consider social relations, as well as data on the firm's economic performance. A few mining companies pioneered this trend, and in recent years some of them incorporated the three dimensions of sustainable development, publishing so-called sustainability reports. This article reviews 31 reports published between 2001 and 2006 by four major mining companies. A set of 62 assessment items organized in six categories (namely context and commitment, management, environmental, social and economic performance, and accessibility and assurance) was selected to guide the review. The items were derived from international literature and recommended best practices, including the Global Reporting Initiative G3 framework. A content analysis was performed using the report as a sampling unit, and using phrases, graphics, or tables containing certain information as data collection units. A basic rating scale (0 or 1) was used for noting the presence or absence of information and a final percentage score was obtained for each report. Results show that there is a clear evolution in the reports' comprehensiveness and depth. The categories "accessibility and assurance" and "economic performance" featured the lowest scores and do not present a clear evolution trend in the period, whereas the categories "context and commitment" and "social performance" presented the best results and regular improvement; the category "environmental performance," despite not reaching the highest scores, also featured constant evolution. Description of data measurement techniques and more comprehensive third-party verification are the items most in need of improvement.
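The scoring scheme described above (each item rated 0 or 1, with a final percentage per report) can be sketched as follows. The item names and the grouping into categories are hypothetical placeholders, not the 62 items used in the article, and the function names are assumptions.

```python
def report_score(item_presence):
    """Percentage of assessment items present in one report (each item rated 0 or 1)."""
    if not item_presence:
        raise ValueError("at least one assessment item is required")
    if any(value not in (0, 1) for value in item_presence.values()):
        raise ValueError("items must be rated 0 (absent) or 1 (present)")
    return 100.0 * sum(item_presence.values()) / len(item_presence)


def category_scores(item_presence, item_category):
    """Percentage score per category, given a mapping from item to category."""
    grouped = {}
    for item, value in item_presence.items():
        grouped.setdefault(item_category[item], {})[item] = value
    return {category: report_score(items) for category, items in grouped.items()}


# Hypothetical ratings for one report.
ratings = {
    "states_reporting_period": 1,
    "describes_measurement_methods": 0,
    "third_party_verification": 0,
    "reports_community_investment": 1,
}
categories = {
    "states_reporting_period": "context and commitment",
    "describes_measurement_methods": "accessibility and assurance",
    "third_party_verification": "accessibility and assurance",
    "reports_community_investment": "social performance",
}
overall = report_score(ratings)
by_category = category_scores(ratings, categories)
```

Comparing the per-category percentages across publication years is one way the evolution described in the abstract could be tracked.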
Abstract:
Gene duplication followed by acquisition of specific targeting information and dual targeting were evolutionary strategies enabling organelles to cope with overlapping functions. We examined the evolutionary trend of dual-targeted single-gene products in Arabidopsis and rice genomes. The number of paralogous proteins encoded by gene families and the dual-targeted orthologous proteins were analysed. The number of dual-targeted proteins and the corresponding gene-family sizes were similar in Arabidopsis and rice irrespective of genome sizes. We show that dual targeting of methionine aminopeptidase, monodehydroascorbate reductase, glutamyl-tRNA synthetase, and tyrosyl-tRNA synthetase was maintained despite the occurrence of whole-genome duplications in Arabidopsis and rice as well as a polyploidization followed by a diploidization event (gene loss) in the latter.
Abstract:
One of the goals of the ARC-funded e-research project "Sharing access and analytical tools for ethnographic digital media using high speed networks", or simply EthnoER, is to take the outputs of normal linguistic analytical processes and present them online in a system we have called the EthnoER online presentation and annotation system, or EOPAS.
Abstract:
Two experiments were conducted on the nature of expert perception in the sport of squash. In the first experiment, ten expert and fifteen novice players attempted to predict the direction and force of squash strokes from either a film display (occluded at variable time periods before and after the opposing player had struck the ball) or a matched point-light display (containing only the basic kinematic features of the opponent's movement pattern). Experts outperformed the novices under both display conditions, and the same basic time windows that characterised expert and novice pick-up of information in the film task also persisted in the point-light task. This suggests that the experts' perceptual advantage is directly related to their superior pick-up of essential kinematic information. In the second experiment, the vision of six expert and six less skilled players was occluded by remotely triggered liquid-crystal spectacles at quasi-random intervals during simulated match play. Players were required to complete their current stroke even when the display was occluded and their prediction performance was assessed with respect to whether they moved to the correct half of the court to match the direction and depth of the opponent's stroke. Consistent with experiment 1, experts were found to be superior in their advance pick-up of both directional and depth information when the display was occluded during the opponent's hitting action. However, experts also remained better than chance, and clearly superior to less skilled players, in their prediction performance under conditions where occlusion occurred before any significant pre-contact preparatory movement by the opposing player was visible. This additional source of expert superiority is attributable to their superior attunement to the information contained in the situational probabilities and sequential dependences within their opponent's pattern of play.
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been reported in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms, a support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model, with promising results.
Abstract:
This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.
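One simple way to combine attribute similarity with spatial proximity, as the abstract describes, is a weighted sum of an attribute difference and a geographic distance. The sketch below is an assumption made for illustration rather than the paper's clustering model: the weighting scheme, the field names `xy` and `rate`, and the seed-assignment step are all hypothetical.

```python
import math


def combined_dissimilarity(a, b, weight=0.5):
    """Weighted mix of attribute difference and spatial distance.

    weight=1.0 uses only the attribute (e.g. a crime rate);
    weight=0.0 uses only the distance between locations.
    Both terms are assumed to be pre-scaled to comparable ranges.
    """
    attribute_gap = abs(a["rate"] - b["rate"])
    spatial_gap = math.dist(a["xy"], b["xy"])
    return weight * attribute_gap + (1.0 - weight) * spatial_gap


def assign_to_seeds(areas, seeds, weight=0.5):
    """Assign each area to the index of its least dissimilar seed."""
    return [
        min(
            range(len(seeds)),
            key=lambda k: combined_dissimilarity(area, seeds[k], weight),
        )
        for area in areas
    ]


# Hypothetical areas: location in arbitrary units and a property crime rate.
seeds = [
    {"xy": (0.0, 0.0), "rate": 10.0},
    {"xy": (10.0, 0.0), "rate": 50.0},
]
area = {"xy": (0.0, 0.0), "rate": 50.0}
```

Varying `weight` interactively shows how the same area moves between clusters depending on whether attribute similarity or spatial proximity dominates.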
Abstract:
Intelligence (IQ) can be seen as the efficiency of mental processes or cognition, as can basic information processing (IP) tasks like those used in our ongoing Memory, Attention and Problem Solving (MAPS) study. Measures of IQ and IP are correlated and both have a genetic component, so we are studying how the genetic variance in IQ is related to the genetic variance in IP. We measured intelligence with five subscales of the Multidimensional Aptitude Battery (MAB). The IP tasks included four variants of choice reaction time (CRT) and a visual inspection time (IT) task. The influence of genetic factors on the variances in each of the IQ, IP, and IT tasks was investigated in 250 identical and nonidentical twin pairs aged 16 years. For a subset of 50 pairs we have test–retest data that allow us to estimate the stability of the measures. MX was used for a multivariate genetic analysis that addresses whether the variance in IQ and IP measures is possibly mediated by common genetic factors. Analyses that show the modeled genetic and environmental influences on these measures of cognitive efficiency will be presented and their relevance to ideas on intelligence will be discussed.