3 resultados para FEC using Reed-Solomon-like codes
em Massachusetts Institute of Technology
Resumo:
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influences their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
Resumo:
In previous work (Olshausen & Field 1996), an algorithm was described for learning linear sparse codes which, when trained on natural images, produces a set of basis functions that are spatially localized, oriented, and bandpass (i.e., wavelet-like). This note shows how the algorithm may be interpreted within a maximum-likelihood framework. Several useful insights emerge from this connection: it makes explicit the relation to statistical independence (i.e., factorial coding), it shows a formal relationship to the algorithm of Bell and Sejnowski (1995), and it suggests how to adapt parameters that were previously fixed.
Resumo:
Each player in the financial industry, each bank, stock exchange, government agency, or insurance company operates its own financial information system or systems. By its very nature, financial information, like the money that it represents, changes hands. Therefore the interoperation of financial information systems is the cornerstone of the financial services they support. E-services frameworks such as web services are an unprecedented opportunity for the flexible interoperation of financial systems. Naturally the critical economic role and the complexity of financial information led to the development of various standards. Yet standards alone are not the panacea: different groups of players use different standards or different interpretations of the same standard. We believe that the solution lies in the convergence of flexible E-services such as web-services and semantically rich meta-data as promised by the semantic Web; then a mediation architecture can be used for the documentation, identification, and resolution of semantic conflicts arising from the interoperation of heterogeneous financial services. In this paper we illustrate the nature of the problem in the Electronic Bill Presentment and Payment (EBPP) industry and the viability of the solution we propose. We describe and analyze the integration of services using four different formats: the IFX, OFX and SWIFT standards, and an example proprietary format. To accomplish this integration we use the COntext INterchange (COIN) framework. The COIN architecture leverages a model of sources and receivers’ contexts in reference to a rich domain model or ontology for the description and resolution of semantic heterogeneity.