45 resultados para File processing (Computer science)
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
This paper presents an approach for assisting low-literacy readers in accessing Web online information. The oEducational FACILITAo tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. Especially, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that oEducational FACILITAo improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This fact evokes a future solution in which the correct sense for a complex word in a sentence is identified, solving this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.
Resumo:
The amount of textual information digitally stored is growing every day. However, our capability of processing and analyzing that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes to extract relevant knowledge from textual information, such as the text mining process. One of the main and most expensive stages of the text mining process is the text pre-processing stage, where the unstructured text should be transformed to structured format such as an attribute-value table. The stemming process, i.e. linguistics normalization, is usually used to find the attributes of this table. However, the stemming process is strongly dependent on the language in which the original textual information is given. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements of the well know Porter stemming algorithm for the Portuguese language, which explore the characteristics of this language, are proposed. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
Resumo:
Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.
Resumo:
The goal of this paper is to study and propose a new technique for noise reduction used during the reconstruction of speech signals, particularly for biomedical applications. The proposed method is based on Kalman filtering in the time domain combined with spectral subtraction. Comparison with discrete Kalman filter in the frequency domain shows better performance of the proposed technique. The performance is evaluated by using the segmental signal-to-noise ratio and the Itakura-Saito`s distance. Results have shown that Kalman`s filter in time combined with spectral subtraction is more robust and efficient, improving the Itakura-Saito`s distance by up to four times. (C) 2007 Elsevier Ltd. All rights reserved.
Resumo:
Most post-processors for boundary element (BE) analysis use an auxiliary domain mesh to display domain results, working against the profitable modelling process of a pure boundary discretization. This paper introduces a novel visualization technique which preserves the basic properties of the boundary element methods. The proposed algorithm does not require any domain discretization and is based on the direct and automatic identification of isolines. Another critical aspect of the visualization of domain results in BE analysis is the effort required to evaluate results in interior points. In order to tackle this issue, the present article also provides a comparison between the performance of two different BE formulations (conventional and hybrid). In addition, this paper presents an overview of the most common post-processing and visualization techniques in BE analysis, such as the classical algorithms of scan line and the interpolation over a domain discretization. The results presented herein show that the proposed algorithm offers a very high performance compared with other visualization procedures.
Resumo:
In Natural Language Processing (NLP) symbolic systems, several linguistic phenomena, for instance, the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by the employment of a rule-based grammar. Another approach to NLP concerns the use of the connectionist model, which has the benefits of learning, generalization and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired on neuroscience, it is proposed a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor), designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation), and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to ""predict"" thematic (semantic) roles assigned to words in a sentence context, employing biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.
Resumo:
An implementation of a computational tool to generate new summaries from new source texts is presented, by means of the connectionist approach (artificial neural networks). Among other contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the sa-called biologically implausible algorithms.
Resumo:
This paper presents a small-area CMOS current-steering segmented digital-to-analog converter (DAC) design intended for RF transmitters in 2.45 GHz Bluetooth applications. The current-source design strategy is based on an iterative scheme whose variables are adjusted in a simple way, minimizing the area and the power consumption, and meeting the design specifications. A theoretical analysis of static-dynamic requirements and a new layout strategy to attain a small-area current-steering DAC are included. The DAC was designed and implemented in 0.35 mu m CMOS technology, requiring an active area of just 200 mu m x 200 mu m. Experimental results, with a full-scale output current of 700 mu A and a 3.3 V power supply, showed a spurious-free dynamic range of 58 dB for a 1 MHz output sine wave and sampling frequency of 50 MHz, with differential and integral nonlinearity of 0.3 and 0.37 LSB, respectively.
Resumo:
This work proposes a method based on both preprocessing and data mining with the objective of identify harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory essays, i.e., real data were acquired from residential loads. Thus, the residential system created in laboratory was fed by a configurable power source and in its output were placed the loads and the power quality analyzers (all measurements were stored in a microcomputer). So, the data were submitted to pre-processing, which was based on attribute selection techniques in order to minimize the complexity in identifying the loads. A newer database was generated maintaining only the attributes selected, thus, Artificial Neural Networks were trained to realized the identification of loads. In order to validate the methodology proposed, the loads were fed both under ideal conditions (without harmonics), but also by harmonic voltages within limits pre-established. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (procedures to delivery energy employed by Brazilian`s utilities). The results obtained seek to validate the methodology proposed and furnish a method that can serve as alternative to conventional methods.
Resumo:
In this paper, a framework for detection of human skin in digital images is proposed. This framework is composed of a training phase and a detection phase. A skin class model is learned during the training phase by processing several training images in a hybrid and incremental fuzzy learning scheme. This scheme combines unsupervised-and supervised-learning: unsupervised, by fuzzy clustering, to obtain clusters of color groups from training images; and supervised to select groups that represent skin color. At the end of the training phase, aggregation operators are used to provide combinations of selected groups into a skin model. In the detection phase, the learned skin model is used to detect human skin in an efficient way. Experimental results show robust and accurate human skin detection performed by the proposed framework.
Resumo:
In this work, an algorithm to compute the envelope of non-destructive testing (NDT) signals is proposed. This method allows increasing the speed and reducing the memory in extensive data processing. Also, this procedure presents advantage of preserving the data information for physical modeling applications of time-dependent measurements. The algorithm is conceived to be applied for analyze data from non-destructive testing. The comparison between different envelope methods and the proposed method, applied to Magnetic Bark Signal (MBN), is studied. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
The classical approach for acoustic imaging consists of beamforming, and produces the source distribution of interest convolved with the array point spread function. This convolution smears the image of interest, significantly reducing its effective resolution. Deconvolution methods have been proposed to enhance acoustic images and have produced significant improvements. Other proposals involve covariance fitting techniques, which avoid deconvolution altogether. However, in their traditional presentation, these enhanced reconstruction methods have very high computational costs, mostly because they have no means of efficiently transforming back and forth between a hypothetical image and the measured data. In this paper, we propose the Kronecker Array Transform ( KAT), a fast separable transform for array imaging applications. Under the assumption of a separable array, it enables the acceleration of imaging techniques by several orders of magnitude with respect to the fastest previously available methods, and enables the use of state-of-the-art regularized least-squares solvers. Using the KAT, one can reconstruct images with higher resolutions than was previously possible and use more accurate reconstruction techniques, opening new and exciting possibilities for acoustic imaging.
Resumo:
In Part I [""Fast Transforms for Acoustic Imaging-Part I: Theory,"" IEEE TRANSACTIONS ON IMAGE PROCESSING], we introduced the Kronecker array transform (KAT), a fast transform for imaging with separable arrays. Given a source distribution, the KAT produces the spectral matrix which would be measured by a separable sensor array. In Part II, we establish connections between the KAT, beamforming and 2-D convolutions, and show how these results can be used to accelerate classical and state of the art array imaging algorithms. We also propose using the KAT to accelerate general purpose regularized least-squares solvers. Using this approach, we avoid ill-conditioned deconvolution steps and obtain more accurate reconstructions than previously possible, while maintaining low computational costs. We also show how the KAT performs when imaging near-field source distributions, and illustrate the trade-off between accuracy and computational complexity. Finally, we show that separable designs can deliver accuracy competitive with multi-arm logarithmic spiral geometries, while having the computational advantages of the KAT.
Resumo:
The canonical representation of speech constitutes a perfect reconstruction (PR) analysis-synthesis system. Its parameters are the autoregressive (AR) model coefficients, the pitch period and the voiced and unvoiced components of the excitation represented as transform coefficients. Each set of parameters may be operated on independently. A time-frequency unvoiced excitation (TFUNEX) model is proposed that has high time resolution and selective frequency resolution. Improved time-frequency fit is obtained by using for antialiasing cancellation the clustering of pitch-synchronous transform tracks defined in the modulation transform domain. The TFUNEX model delivers high-quality speech while compressing the unvoiced excitation representation about 13 times over its raw transform coefficient representation for wideband speech.
Resumo:
The TCP/IP architecture was consolidated as a standard to the distributed systems. However, there are several researches and discussions about alternatives to the evolution of this architecture and, in this study area, this work presents the Title Model to contribute with the application needs support by the cross layer ontology use and the horizontal addressing, in a next generation Internet. For a practical viewpoint, is showed the network cost reduction for the distributed programming example, in networks with layer 2 connectivity. To prove the title model enhancement, it is presented the network analysis performed for the message passing interface, sending a vector of integers and returning its sum. By this analysis, it is confirmed that the current proposal allows, in this environment, a reduction of 15,23% over the total network traffic, in bytes.