903 resultados para Data selection
Resumo:
The thin-layer drying behaviour of bananas in a beat pump dehumidifier dryer was examined. Four pre-treatments (blanching, chilling, freezing and combined blanching and freezing) were applied to the bananas, which were dried at 50 degreesC with an air velocity of 3.1 m s(-1) and with the relative humidity of the inlet air of 10-35%. Three drying models, the simple model, the two-term exponential model and the Page model were examined. All models were evaluated using three statistical measures, correlation coefficient, root means square error, and mean absolute percent error. Moisture diffusivity was calculated based on the diffusion equation for an infinite cylindrical shape using the slope method. The rate of drying was higher for the pre-treatments involving freezing. The sample which was blanched only did not show any improvement in drying rate. In fact, a longer drying time resulted due to water absorption during blanching. There was no change in the rate for the chilled sample compared with the control. While all models closely fitted the drying data, the simple model showed greatest deviation from the experimental results. The two-term exponential model was found to be the best model for describing the drying curves of bananas because its parameters represent better the physical characteristics of the drying process. Moisture diffusivities of bananas were in the range 4.3-13.2 x 10(-10) m(2)s(-1). (C) 2002 Published by Elsevier Science Ltd.
Resumo:
The effect of number of samples and selection of data for analysis on the calculation of surface motor unit potential (SMUP) size in the statistical method of motor unit number estimates (MUNE) was determined in 10 normal subjects and 10 with amyotrophic lateral sclerosis (ALS). We recorded 500 sequential compound muscle action potentials (CMAPs) at three different stable stimulus intensities (10–50% of maximal CMAP). Estimated mean SMUP sizes were calculated using Poisson statistical assumptions from the variance of 500 sequential CMAP obtained at each stimulus intensity. The results with the 500 data points were compared with smaller subsets from the same data set. The results using a range of 50–80% of the 500 data points were compared with the full 500. The effect of restricting analysis to data between 5–20% of the CMAP and to standard deviation limits was also assessed. No differences in mean SMUP size were found with stimulus intensity or use of different ranges of data. Consistency was improved with a greater sample number. Data within 5% of CMAP size gave both increased consistency and reduced mean SMUP size in many subjects, but excluded valid responses present at that stimulus intensity. These changes were more prominent in ALS patients in whom the presence of isolated SMUP responses was a striking difference from normal subjects. Noise, spurious data, and large SMUP limited the Poisson assumptions. When these factors are considered, consistent statistical MUNE can be calculated from a continuous sequence of data points. A 2 to 2.5 SD or 10% window are reasonable methods of limiting data for analysis. Muscle Nerve 27: 320–331, 2003
Resumo:
This study aimed to evaluate the efficiency of simultaneous selection (selection indices) using estimated genetic gains in yellow passion fruit and to make a comparison between the methodologies of Mulamba & Mock and Elston. The study was conducted with 26 sib progenies of yellow passion fruit for intrinsic production characteristics including fruit number, fruit mass, fruit length and diameter, and for the fruit characteristics skin thickness, soluble solids and acidity. Two methodologies were applied: first, in the joint analysis of fruit characteristics and of intrinsic production characteristics in a single phase of selection; and second, in the analysis in two phases, in which priority was given to the intrinsic production characteristics in the first phase, and later, in the second phase, the best fruit characteristics were chosen among the progenies of the first phase. The analysis of variance was applied to the data to detect genetic variability among progenies. The Elston's selection indice was unable to provide distribution of genetic gains consistent with the purposes of the study, as it selected a single progeny of passion fruit. However, the index based on the sum of ranks of Mulamba & Mock was more suitable, as it provided a balanced distribution of gains, selecting a larger number of progenies. The methodology of selection using indices is advantageous in passion fruit, since it contributes to higher genetic gains for all the traits evaluated, and the selection in a single phase was proved efficient for progeny selection.
Resumo:
The objective of the present work was to evaluate 27 progenies of cocoa crosses considering the agronomic traits and select F1 plants within superior crosses. The experiment was installed in March 2005, in the Experimental Station Joaquim Bahiana (ESJOB), in Itajuipe, Bahia. The area of the experiment is of approximately 3 ha, with a total of 3240 plants. Thirteen evaluations of vegetative brooms, five of cushion brooms and 15 of number of pods per plant were accomplished. Thirty pollinations were made for each selected plant to test for self-compatibility. The production, based on the number of pods per plant, and resistance to witches´ broom indicated CEPEC 94 x CCN 10, RB 39 x CCN 51 and CCN 10 x VB 1151 as superior progenies. All selections tested were self-compatible. The analyses of progenies and individual tree data, associated to visual field observations, allowed the selection of 17 plants which were included in a network of regional tests to determine the phenotypic stability.
Resumo:
Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, it creates SI with rather significant motion compensated errors for some frame regions while rather small for some other regions depending on the video content. In this paper, a low complexity Infra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance with improvements up to 1.2 dB regarding a WZ coding mode only solution.
Resumo:
INTRODUCTION: The correct identification of the underlying cause of death and its precise assignment to a code from the International Classification of Diseases are important issues to achieve accurate and universally comparable mortality statistics These factors, among other ones, led to the development of computer software programs in order to automatically identify the underlying cause of death. OBJECTIVE: This work was conceived to compare the underlying causes of death processed respectively by the Automated Classification of Medical Entities (ACME) and the "Sistema de Seleção de Causa Básica de Morte" (SCB) programs. MATERIAL AND METHOD: The comparative evaluation of the underlying causes of death processed respectively by ACME and SCB systems was performed using the input data file for the ACME system that included deaths which occurred in the State of S. Paulo from June to December 1993, totalling 129,104 records of the corresponding death certificates. The differences between underlying causes selected by ACME and SCB systems verified in the month of June, when considered as SCB errors, were used to correct and improve SCB processing logic and its decision tables. RESULTS: The processing of the underlying causes of death by the ACME and SCB systems resulted in 3,278 differences, that were analysed and ascribed to lack of answer to dialogue boxes during processing, to deaths due to human immunodeficiency virus [HIV] disease for which there was no specific provision in any of the systems, to coding and/or keying errors and to actual problems. The detailed analysis of these latter disclosed that the majority of the underlying causes of death processed by the SCB system were correct and that different interpretations were given to the mortality coding rules by each system, that some particular problems could not be explained with the available documentation and that a smaller proportion of problems were identified as SCB errors. CONCLUSION: These results, disclosing a very low and insignificant number of actual problems, guarantees the use of the version of the SCB system for the Ninth Revision of the International Classification of Diseases and assures the continuity of the work which is being undertaken for the Tenth Revision version.
Resumo:
Copyright © 2013 Springer Netherlands.
Resumo:
27th Annual Conference of the European Cetacean Society. Setúbal, Portugal, 8-10 April 2013.
Resumo:
Dissertação para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
Resumo:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.
Resumo:
Electrocardiography (ECG) biometrics is emerging as a viable biometric trait. Recent developments at the sensor level have shown the feasibility of performing signal acquisition at the fingers and hand palms, using one-lead sensor technology and dry electrodes. These new locations lead to ECG signals with lower signal to noise ratio and more prone to noise artifacts; the heart rate variability is another of the major challenges of this biometric trait. In this paper we propose a novel approach to ECG biometrics, with the purpose of reducing the computational complexity and increasing the robustness of the recognition process enabling the fusion of information across sessions. Our approach is based on clustering, grouping individual heartbeats based on their morphology. We study several methods to perform automatic template selection and account for variations observed in a person's biometric data. This approach allows the identification of different template groupings, taking into account the heart rate variability, and the removal of outliers due to noise artifacts. Experimental evaluation on real world data demonstrates the advantages of our approach.
Resumo:
The problem of selecting suppliers/partners is a crucial and important part in the process of decision making for companies that intend to perform competitively in their area of activity. The selection of supplier/partner is a time and resource-consuming task that involves data collection and a careful analysis of the factors that can positively or negatively influence the choice. Nevertheless it is a critical process that affects significantly the operational performance of each company. In this work, there were identified five broad selection criteria: Quality, Financial, Synergies, Cost, and Production System. Within these criteria, it was also included five sub-criteria. After the identification criteria, a survey was elaborated and companies were contacted in order to understand which factors have more weight in their decisions to choose the partners. Interpreted the results and processed the data, it was adopted a model of linear weighting to reflect the importance of each factor. The model has a hierarchical structure and can be applied with the Analytic Hierarchy Process (AHP) method or Value Analysis. The goal of the paper it's to supply a selection reference model that can represent an orientation/pattern for a decision making on the suppliers/partners selection process
Resumo:
In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than the individual modalities. However, when combining these modalities, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques for silent speech data, in a dataset with 4 non-invasive and promising modalities, namely: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (meanmedian and arithmetic mean geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (knearest neighbors, support vector machines, and dynamic time warping). The key results of this study show that both unsupervised and supervised FS techniques improve on the classification accuracy on both individual and combined modalities. For instance, on the video component, we attain relative performance gains of 36.2% in error rates. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.