22 results for NIRS. Plum. Multivariate calibration. Variables selection
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters that both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.
Abstract:
Materials selection is a matter of great importance to engineering design, and software tools are valuable for informing decisions in the early stages of product development. However, when a set of alternative materials is available for the different parts a product is made of, the question of which material mix to choose for a group of parts is not trivial. The engineer/designer therefore goes about this in a part-by-part procedure. Optimizing each part in isolation can lead to a solution that is sub-optimal for the product as a whole. An optimization procedure is therefore needed that deals with products with multiple parts, each with discrete design variables, and that can determine the optimal solution under different objectives. To solve this multiobjective optimization problem, a new routine based on the Direct MultiSearch (DMS) algorithm is created. Results from the Pareto front can help the designer to align his/her materials selection for a complete set of materials with product attribute objectives, depending on the relative importance of each objective.
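The difference between part-by-part and whole-product selection can be illustrated with a minimal sketch. It assumes a hypothetical two-part product with invented material data (mass and cost per part, both to be minimized) and uses brute-force enumeration, not the Direct MultiSearch algorithm, which is not reproduced here. All names and numbers below are illustrative assumptions.

```python
from itertools import product

# Hypothetical candidate materials per part: name -> (mass_kg, cost_eur).
# These values are invented for illustration and do not come from the paper.
PARTS = {
    "housing": {"steel": (4.0, 2.0), "aluminium": (2.0, 5.0), "cfrp": (1.0, 12.0)},
    "bracket": {"steel": (1.0, 0.5), "aluminium": (0.5, 1.5), "brass": (1.2, 2.0)},
}

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def evaluate(choice):
    """Sum per-part values into product-level objectives (total mass, total cost)."""
    mass = sum(PARTS[part][mat][0] for part, mat in choice.items())
    cost = sum(PARTS[part][mat][1] for part, mat in choice.items())
    return (mass, cost)

def pareto_front():
    """Enumerate every material mix and keep the non-dominated ones."""
    names = list(PARTS)
    candidates = []
    for mats in product(*(PARTS[n] for n in names)):
        choice = dict(zip(names, mats))
        candidates.append((choice, evaluate(choice)))
    return [
        (c, f) for c, f in candidates
        if not any(dominates(g, f) for _, g in candidates)
    ]

front = pareto_front()
for choice, (mass, cost) in front:
    print(choice, mass, cost)
```

Each entry of `front` is a complete material mix that no other mix beats on both objectives; choosing among them depends on the relative importance given to each objective. Enumeration is only practical for small discrete problems like this one.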
Abstract:
Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, depending on the video content, it creates SI with rather significant motion compensation errors in some frame regions and rather small errors in others. In this paper, a low complexity Intra mode selection algorithm is proposed to select the most 'critical' blocks in the Wyner-Ziv (WZ) frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance, with improvements of up to 1.2 dB relative to a solution that uses the WZ coding mode only.
Abstract:
Reclaimed water from small wastewater treatment facilities in the rural areas of the Beira Interior region (Portugal) may constitute an alternative water source for aquifer recharge. A 21-month monitoring period in a constructed wetland treatment system has shown that 21,500 m³ year⁻¹ of treated wastewater (reclaimed water) could be used for aquifer recharge. A GIS-based multi-criteria analysis was performed, combining ten thematic maps and economic, environmental and technical criteria, in order to produce a suitability map for the location of sites for reclaimed water infiltration. The areas chosen for aquifer recharge with infiltration basins are mainly composed of anthrosol more than 1 m deep with a fine sand texture, which allows an average infiltration velocity of up to 1 m d⁻¹. These characteristics will provide a final polishing treatment of the reclaimed water after infiltration (soil aquifer treatment (SAT)), suitable for the removal of the residual load (trace organics, nutrients, heavy metals and pathogens). The risk of groundwater contamination is low since the water table in the anthrosol areas ranges from 10 m to 50 m. On the other hand, these depths guarantee an unsaturated zone suitable for SAT. An area of 13,944 ha was selected for study, but only 1607 ha are suitable for reclaimed water infiltration. Approximately 1280 m² were considered enough to set up 4 infiltration basins working in flooding and drying cycles.
Abstract:
This work focuses on the appraisal of public and environmental projects and, more specifically, on the calculation of the social discount rate (SDR) for this kind of very long-term investment project. As a rule, we can state that the instantaneous discount rate must be equal to the hazard rate of the public good or to the mortality rate of the population that the project is intended for. The hazard can be due to technical failures of the system but, in this paper, we consider different independent variables that can cause the hazard. That is, we consider a multivariate hazard rate. In our empirical application, the Spanish forest surface is the system and forest fire is the failure, which can be caused by several factors. The aim of this work is to integrate the different variables that produce the failure into the calculation of the SDR using a multivariate hazard rate approach.
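The rule stated above can be written out as follows. This is only a sketch, under the assumption that the hazard sources act as independent competing risks, so that their cause-specific hazard rates add. The symbols $r(t)$ (instantaneous discount rate), $h_i(t)$ (hazard rate due to cause $i$ of $k$) and $D(t)$ (discount factor) are notation introduced here, not taken from the paper.

```latex
\begin{align}
r(t) &= h(t) = \sum_{i=1}^{k} h_i(t), \\
D(t) &= \exp\!\left(-\int_0^t r(s)\,ds\right)
      = \prod_{i=1}^{k} \exp\!\left(-\int_0^t h_i(s)\,ds\right).
\end{align}
```

Under this assumption the discount factor $D(t)$ coincides with the probability that the system survives to time $t$, and it factorizes into one survival term per cause.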
Abstract:
The importance of Social Responsibility (SR) is greater when this business variable is related to others of a strategic nature in business activity (the competitive success that the company achieves, the performance that firms attain and the innovations that they carry out). The hypothesis is that organizations that focus on SR are those that obtain higher outputs and innovate more, achieving greater competitive success. A scale for measuring the orientation to SR has been defined in order to determine the degree of relationship between the above elements. This instrument is original because the literature contains no previous scale that measures, on the one hand, the three classic sub-constructs that SR is theoretically accepted to comprise and, on the other hand, the relationship between SR and the other variables. From the analysis of causal relationships we arrive at a scale of 21 indicators, validated with a sample of firms from the Autonomous Community of Extremadura. To our knowledge, this is the first empirical validation of these dimensions in this context.
Abstract:
We are concerned with providing more empirical evidence on forecast failure, developing forecast models, and examining the impact of events such as audit reports. A joint consideration of classic financial ratios and relevant external indicators leads us to build a basic prediction model focused on non-financial Galician SMEs. The explanatory variables are financial indicators that are relevant from the viewpoint of financial logic and financial failure theory. The paper explores three mathematical models: discriminant analysis (MDA), Logit, and linear multivariate regression. We conclude that, even though both offer high explanatory and predictive abilities, the Logit and MDA models should be used and interpreted jointly.
Abstract:
The aim of this work is to use the MANCOVA model to study the influence of the phenotype of an enzyme - Acid phosphatase - and a genetic factor - Haptoglobin genotype - on two dependent variables - Activity of Acid Phosphatase (ACP1) and the Body Mass Index (BMI). A general linear model is therefore used, namely a two-way multivariate analysis of covariance (MANCOVA). The covariate is the age of the subject. This covariate works as a control variable for the independent factors, serving to reduce the error term in the model. The main results showed that only the ACP1 phenotype has a significant effect on the activity of ACP1, and that the covariate has a significant effect on both dependent variables. The univariate analysis showed that the ACP1 phenotype accounts for about 12.5% of the variability in the activity of ACP1. The covariate accounts for about 4.6% of the variability in the activity of ACP1 and 37.3% of the variability in the BMI.
Abstract:
Master's dissertation submitted for the degree of Master in Mechanical Engineering, Maintenance and Production branch
Abstract:
The study aimed to compare the impact of stigma and subjective well-being in people with different chronic diseases. A total of 729 patients, recruited in hospitals in Portugal, who had returned to their normal lives after diagnosis, were assessed. Controlling for a set of socio-demographic and clinical variables, the application of multivariate analysis of covariance models revealed significant differences between the chronic disease groups only for perceived stigma. People with obesity, epilepsy and multiple sclerosis report more stigma, and people with type 1 diabetes and myasthenia gravis report less stigma.
Abstract:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the criteria mentioned above, is the speed of execution, which is especially relevant when dealing with large data sets.
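For readers unfamiliar with the underlying model, the following is a minimal sketch of plain EM for a finite mixture of multinomial (categorical) distributions with a fixed number of components. It is not the MML-based variant described in the abstract: it performs no selection of the number of clusters. The function name, the smoothing constant `alpha`, the random initialization and the toy data are all assumptions made for illustration.

```python
import random

def em_multinomial_mixture(data, n_levels, k, n_iter=100, alpha=1e-3, seed=0):
    """Plain EM for a k-component mixture of independent categorical features.

    data     : list of tuples of category indices, one tuple per observation
    n_levels : number of categories of each feature
    alpha    : small smoothing constant (an assumption of this sketch)
    """
    rng = random.Random(seed)
    n, m = len(data), len(n_levels)
    # Random soft assignments break the symmetry between components.
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.01 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])
    for _ in range(n_iter):
        # M-step: update mixing weights and per-feature category probabilities.
        weights, theta = [], []
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights.append((n_c + alpha) / (n + k * alpha))
            comp = []
            for j in range(m):
                counts = [alpha] * n_levels[j]
                for x, r in zip(data, resp):
                    counts[x[j]] += r[c]
                total = sum(counts)
                comp.append([v / total for v in counts])
            theta.append(comp)
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            joint = []
            for c in range(k):
                p = weights[c]
                for j in range(m):
                    p *= theta[c][j][x[j]]
                joint.append(p)
            total = sum(joint)
            resp.append([p / total for p in joint])
    return weights, theta, resp

# Toy data: two perfectly separated groups on three binary features.
data = [(0, 0, 0)] * 20 + [(1, 1, 1)] * 20
weights, theta, resp = em_multinomial_mixture(data, n_levels=[2, 2, 2], k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
```

Here `labels` holds the most probable component for each observation. A criterion-based workflow would rerun such a routine for several values of `k` and compare BIC or ICL afterwards, which is the repeated cost that an integrated estimation-and-selection algorithm avoids.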
Abstract:
Final Master's project submitted for the degree of Master in Civil Engineering
Abstract:
Electrocardiography (ECG) biometrics is emerging as a viable biometric trait. Recent developments at the sensor level have shown the feasibility of performing signal acquisition at the fingers and palms, using one-lead sensor technology and dry electrodes. These new locations lead to ECG signals with a lower signal-to-noise ratio that are more prone to noise artifacts; heart rate variability is another major challenge of this biometric trait. In this paper we propose a novel approach to ECG biometrics, with the purpose of reducing the computational complexity and increasing the robustness of the recognition process, enabling the fusion of information across sessions. Our approach is based on clustering, grouping individual heartbeats based on their morphology. We study several methods to perform automatic template selection and account for variations observed in a person's biometric data. This approach allows the identification of different template groupings, taking into account the heart rate variability, and the removal of outliers due to noise artifacts. Experimental evaluation on real world data demonstrates the advantages of our approach.