827 resultados para clustering accuracy
Resumo:
To identify chemical descriptors to distinguish Cuban from non-Cuban rums, analyses of 44 samples of rum from 15 different countries are described. To provide the chemical descriptors, analyses of the the mineral fraction, phenolic compounds, caramel, alcohols, acetic acid, ethyl acetate, ketones, and aldehydes were carried out. The analytical data were treated through the following chemometric methods: principal component analysis (PCA), partial least square-discriminate analysis (PLS-DA), and linear discriminate analysis (LDA). These analyses indicated 23 analytes as relevant chemical descriptors for the separation of rums into two distinct groups. The possibility of clustering the rum samples investigated through PCA analysis led to an accumulative percentage of 70.4% in the first three principal components, and isoamyl alcohol, n-propyl alcohol, copper, iron, 2-furfuraldehyde (furfuraldehyde), phenylmethanal (benzaldehyde), epicatechin, and vanillin were used as chemical descriptors. By applying the PLS-DA technique to the whole set of analytical data, the following analytes have been selected as descriptors: acetone, sec-butyl alcohol, isobutyl alcohol, ethyl acetate, methanol, isoamyl alcohol, magnesium, sodium, lead, iron, manganese, copper, zinc, 4-hydroxy3,5-dimethoxybenzaldehyde (syringaldehyde), methaldehyde (formaldehyde), 5-hydroxymethyl-2furfuraldehyde (5-HMF), acetalclehyde, 2-furfuraldehyde, 2-butenal (crotonaldehyde), n-pentanal (valeraldehyde), iso-pentanal (isovaleraldehyde), benzaldehyde, 2,3-butanodione monoxime, acetylacetone, epicatechin, and vanillin. By applying the LIDA technique, a model was developed, and the following analytes were selected as descriptors: ethyl acetate, sec-butyl alcohol, n-propyl alcohol, n-butyl alcohol, isoamyl alcohol, isobutyl alcohol, caramel, catechin, vanillin, epicatechin, manganese, acetalclehyde, 4-hydroxy-3-methoxybenzoic acid, 2-butenal, 4-hydroxy-3,5-dimethoxybenzoic acid, cyclopentanone, acetone, lead, zinc, calcium, barium, strontium, and sodium. This model allowed the discrimination of Cuban rums from the others with 88.2% accuracy.
The effect of teacher correction and student revision on university A-level student written accuracy
Resumo:
Data mining is a relatively new field of research that its objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and due to the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans in a short time to make diagnosis, prognosis and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications to develop a tool that can help make rather accurate decisions. In this thesis, the goal is finding a pattern among patients who got pneumonia by clustering of lab data values which have been recorded every day. By this pattern we can generalize it to the patients who did not have been diagnosed by this disease whose lab values shows the same trend as pneumonia patients does. There are 10 tables which have been extracted from a big data base of a hospital in Jena for my work .In ICU (intensive care unit), COPRA system which is a patient management system has been used. All the tables and data stored in German Language database.
Resumo:
In the field of operational water management, Model Predictive Control (MPC) has gained popularity owing to its versatility and flexibility. The MPC controller, which takes predictions, time delay and uncertainties into account, can be designed for multi-objective management problems and for large-scale systems. Nonetheless, a critical obstacle, which needs to be overcome in MPC, is the large computational burden when a large-scale system is considered or a long prediction horizon is involved. In order to solve this problem, we use an adaptive prediction accuracy (APA) approach that can reduce the computational burden almost by half. The proposed MPC scheme with this scheme is tested on the northern Dutch water system, which comprises Lake IJssel, Lake Marker, the River IJssel and the North Sea Canal. The simulation results show that by using the MPC-APA scheme, the computational time can be reduced to a large extent and a flood protection problem over longer prediction horizons can be well solved.
Resumo:
Using vector autoregressive (VAR) models and Monte-Carlo simulation methods we investigate the potential gains for forecasting accuracy and estimation uncertainty of two commonly used restrictions arising from economic relationships. The Örst reduces parameter space by imposing long-term restrictions on the behavior of economic variables as discussed by the literature on cointegration, and the second reduces parameter space by imposing short-term restrictions as discussed by the literature on serial-correlation common features (SCCF). Our simulations cover three important issues on model building, estimation, and forecasting. First, we examine the performance of standard and modiÖed information criteria in choosing lag length for cointegrated VARs with SCCF restrictions. Second, we provide a comparison of forecasting accuracy of Ötted VARs when only cointegration restrictions are imposed and when cointegration and SCCF restrictions are jointly imposed. Third, we propose a new estimation algorithm where short- and long-term restrictions interact to estimate the cointegrating and the cofeature spaces respectively. We have three basic results. First, ignoring SCCF restrictions has a high cost in terms of model selection, because standard information criteria chooses too frequently inconsistent models, with too small a lag length. Criteria selecting lag and rank simultaneously have a superior performance in this case. Second, this translates into a superior forecasting performance of the restricted VECM over the VECM, with important improvements in forecasting accuracy ñreaching more than 100% in extreme cases. Third, the new algorithm proposed here fares very well in terms of parameter estimation, even when we consider the estimation of long-term parameters, opening up the discussion of joint estimation of short- and long-term parameters in VAR models.
Resumo:
A descoberta e a análise de conglomerados textuais são processos muito importantes para a estruturação, organização e a recuperação de informações, assim como para a descoberta de conhecimento. Isto porque o ser humano coleta e armazena uma quantidade muito grande de dados textuais, que necessitam ser vasculhados, estudados, conhecidos e organizados de forma a fornecerem informações que lhe dêem o conhecimento para a execução de uma tarefa que exija a tomada de uma decisão. É justamente nesse ponto que os processos de descoberta e de análise de conglomerados (clustering) se insere, pois eles auxiliam na exploração e análise dos dados, permitindo conhecer melhor seu conteúdo e inter-relações. No entanto, esse processo, por ser aplicado em textos, está sujeito a sofrer interferências decorrentes de problemas da própria linguagem e do vocabulário utilizado nos mesmos, tais como erros ortográficos, sinonímia, homonímia, variações morfológicas e similares. Esta Tese apresenta uma solução para minimizar esses problemas, que consiste na utilização de “conceitos” (estruturas capazes de representar objetos e idéias presentes nos textos) na modelagem do conteúdo dos documentos. Para tanto, são apresentados os conceitos e as áreas relacionadas com o tema, os trabalhos correlatos (revisão bibliográfica), a metodologia proposta e alguns experimentos que permitem desenvolver determinados argumentos e comprovar algumas hipóteses sobre a proposta. As conclusões principais desta Tese indicam que a técnica de conceitos possui diversas vantagens, dentre elas a utilização de uma quantidade muito menor, porém mais representativa, de descritores para os documentos, o que torna o tempo e a complexidade do seu processamento muito menor, permitindo que uma quantidade muito maior deles seja analisada. Outra vantagem está no fato de o poder de expressão de conceitos permitir que os usuários analisem os aglomerados resultantes muito mais facilmente e compreendam melhor seu conteúdo e forma. Além do método e da metodologia proposta, esta Tese possui diversas contribuições, entre elas vários trabalhos e artigos desenvolvidos em parceria com outros pesquisadores e colegas.
Resumo:
Industrial companies in developing countries are facing rapid growths, and this requires having in place the best organizational processes to cope with the market demand. Sales forecasting, as a tool aligned with the general strategy of the company, needs to be as much accurate as possible, in order to achieve the sales targets by making available the right information for purchasing, planning and control of production areas, and finally attending in time and form the demand generated. The present dissertation uses a single case study from the subsidiary of an international explosives company based in Brazil, Maxam, experiencing high growth in sales, and therefore facing the challenge to adequate its structure and processes properly for the rapid growth expected. Diverse sales forecast techniques have been analyzed to compare the actual monthly sales forecast, based on the sales force representatives’ market knowledge, with forecasts based on the analysis of historical sales data. The dissertation findings show how the combination of both qualitative and quantitative forecasts, by the creation of a combined forecast that considers both client´s demand knowledge from the sales workforce with time series analysis, leads to the improvement on the accuracy of the company´s sales forecast.
Resumo:
Este estudo investiga o poder preditivo fora da amostra, um mês à frente, de um modelo baseado na regra de Taylor para previsão de taxas de câmbio. Revisamos trabalhos relevantes que concluem que modelos macroeconômicos podem explicar a taxa de câmbio de curto prazo. Também apresentamos estudos que são céticos em relação à capacidade de variáveis macroeconômicas preverem as variações cambiais. Para contribuir com o tema, este trabalho apresenta sua própria evidência através da implementação do modelo que demonstrou o melhor resultado preditivo descrito por Molodtsova e Papell (2009), o “symmetric Taylor rule model with heterogeneous coefficients, smoothing, and a constant”. Para isso, utilizamos uma amostra de 14 moedas em relação ao dólar norte-americano que permitiu a geração de previsões mensais fora da amostra de janeiro de 2000 até março de 2014. Assim como o critério adotado por Galimberti e Moura (2012), focamos em países que adotaram o regime de câmbio flutuante e metas de inflação, porém escolhemos moedas de países desenvolvidos e em desenvolvimento. Os resultados da nossa pesquisa corroboram o estudo de Rogoff e Stavrakeva (2008), ao constatar que a conclusão da previsibilidade da taxa de câmbio depende do teste estatístico adotado, sendo necessária a adoção de testes robustos e rigorosos para adequada avaliação do modelo. Após constatar não ser possível afirmar que o modelo implementado provém previsões mais precisas do que as de um passeio aleatório, avaliamos se, pelo menos, o modelo é capaz de gerar previsões “racionais”, ou “consistentes”. Para isso, usamos o arcabouço teórico e instrumental definido e implementado por Cheung e Chinn (1998) e concluímos que as previsões oriundas do modelo de regra de Taylor são “inconsistentes”. Finalmente, realizamos testes de causalidade de Granger com o intuito de verificar se os valores defasados dos retornos previstos pelo modelo estrutural explicam os valores contemporâneos observados. Apuramos que o modelo fundamental é incapaz de antecipar os retornos realizados.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)