18 resultados para Mixture of data types
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
The aim of this paper is to develop models for experimental open-channel water delivery systems and assess the use of three data-driven modeling tools toward that end. Water delivery canals are nonlinear dynamical systems and thus should be modeled to meet given operational requirements while capturing all relevant dynamics, including transport delays. Typically, the derivation of first principle models for open-channel systems is based on the use of Saint-Venant equations for shallow water, which is a time-consuming task and demands for specific expertise. The present paper proposes and assesses the use of three data-driven modeling tools: artificial neural networks, composite local linear models and fuzzy systems. The canal from Hydraulics and Canal Control Nucleus (A parts per thousand vora University, Portugal) will be used as a benchmark: The models are identified using data collected from the experimental facility, and then their performances are assessed based on suitable validation criterion. The performance of all models is compared among each other and against the experimental data to show the effectiveness of such tools to capture all significant dynamics within the canal system and, therefore, provide accurate nonlinear models that can be used for simulation or control. The models are available upon request to the authors.
Resumo:
Conferência: CONTROLO’2012 - 16-18 July 2012 - Funchal
Resumo:
We generalize Wertheim's first order perturbation theory to account for the effect in the thermodynamics of the self-assembly of rings characterized by two energy scales. The theory is applied to a lattice model of patchy particles and tested against Monte Carlo simulations on a fcc lattice. These particles have 2 patches of type A and 10 patches of type B, which may form bonds AA or AB that decrease the energy by epsilon(AA) and by epsilon(AB) = r epsilon(AA), respectively. The angle theta between the 2 A-patches on each particle is fixed at 601, 90 degrees or 120 degrees. For values of r below 1/2 and above a threshold r(th)(theta) the models exhibit a phase diagram with two critical points. Both theory and simulation predict that rth increases when theta decreases. We show that the mechanism that prevents phase separation for models with decreasing values of theta is related to the formation of loops containing AB bonds. Moreover, we show that by including the free energy of B-rings ( loops containing one AB bond), the theory describes the trends observed in the simulation results, but that for the lowest values of theta, the theoretical description deteriorates due to the increasing number of loops containing more than one AB bond.
Resumo:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.
Resumo:
Storm- and tsunami-deposits are generated by similar depositional mechanisms making their discrimination hard to establish using classic sedimentologic methods. Here we propose an original approach to identify tsunami-induced deposits by combining numerical simulation and rock magnetism. To test our method, we investigate the tsunami deposit of the Boca do Rio estuary generated by the 1755 earthquake in Lisbon which is well described in the literature. We first test the 1755 tsunami scenario using a numerical inundation model to provide physical parameters for the tsunami wave. Then we use concentration (MS. SIRM) and grain size (chi(ARM), ARM, B1/2, ARM/SIRM) sensitive magnetic proxies coupled with SEM microscopy to unravel the magnetic mineralogy of the tsunami-induced deposit and its associated depositional mechanisms. In order to study the connection between the tsunami deposit and the different sedimentologic units present in the estuary, magnetic data were processed by multivariate statistical analyses. Our numerical simulation show a large inundation of the estuary with flow depths varying from 0.5 to 6 m and run up of similar to 7 m. Magnetic data show a dominance of paramagnetic minerals (quartz) mixed with lesser amount of ferromagnetic minerals, namely titanomagnetite and titanohematite both of a detrital origin and reworked from the underlying units. Multivariate statistical analyses indicate a better connection between the tsunami-induced deposit and a mixture of Units C and D. All these results point to a scenario where the energy released by the tsunami wave was strong enough to overtop and erode important amount of sand from the littoral dune and mixed it with reworked materials from underlying layers at least 1 m in depth. The method tested here represents an original and promising tool to identify tsunami-induced deposits in similar embayed beach environments.
Resumo:
Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
Resumo:
This paper is an elaboration of the DECA algorithm [1] to blindly unmix hyperspectral data. The underlying mixing model is linear, meaning that each pixel is a linear mixture of the endmembers signatures weighted by the correspondent abundance fractions. The proposed method, as DECA, is tailored to highly mixed mixtures in which the geometric based approaches fail to identify the simplex of minimum volume enclosing the observed spectral vectors. We resort then to a statitistical framework, where the abundance fractions are modeled as mixtures of Dirichlet densities, thus enforcing the constraints on abundance fractions imposed by the acquisition process, namely non-negativity and constant sum. With respect to DECA, we introduce two improvements: 1) the number of Dirichlet modes are inferred based on the minimum description length (MDL) principle; 2) The generalized expectation maximization (GEM) algorithm we adopt to infer the model parameters is improved by using alternating minimization and augmented Lagrangian methods to compute the mixing matrix. The effectiveness of the proposed algorithm is illustrated with simulated and read data.
Resumo:
Storm- and tsunami-deposits are generated by similar depositional mechanisms making their discrimination hard to establish using classic sedimentologic methods. Here we propose an original approach to identify tsunami-induced deposits by combining numerical simulation and rock magnetism. To test our method, we investigate the tsunami deposit of the Boca do Rio estuary generated by the 1755 earthquake in Lisbon which is well described in the literature. We first test the 1755 tsunami scenario using a numerical inundation model to provide physical parameters for the tsunami wave. Then we use concentration (MS. SIRM) and grain size (chi(ARM), ARM, B1/2, ARM/SIRM) sensitive magnetic proxies coupled with SEM microscopy to unravel the magnetic mineralogy of the tsunami-induced deposit and its associated depositional mechanisms. In order to study the connection between the tsunami deposit and the different sedimentologic units present in the estuary, magnetic data were processed by multivariate statistical analyses. Our numerical simulation show a large inundation of the estuary with flow depths varying from 0.5 to 6 m and run up of similar to 7 m. Magnetic data show a dominance of paramagnetic minerals (quartz) mixed with lesser amount of ferromagnetic minerals, namely titanomagnetite and titanohematite both of a detrital origin and reworked from the underlying units. Multivariate statistical analyses indicate a better connection between the tsunami-induced deposit and a mixture of Units C and D. All these results point to a scenario where the energy released by the tsunami wave was strong enough to overtop and erode important amount of sand from the littoral dune and mixed it with reworked materials from underlying layers at least 1 m in depth. The method tested here represents an original and promising tool to identify tsunami-induced deposits in similar embayed beach environments.
Resumo:
LHC has found hints for a Higgs particle of 125 GeV. We investigate the possibility that such a particle is a mixture of scalar and pseudoscalar states. For definiteness, we concentrate on a two-Higgs doublet model with explicit CP violation and soft Z(2) violation. Including all Higgs production mechanisms, we determine the current constraints obtained by comparing h -> yy with h -> VV*, and comment on the information which can be gained by measurements of h -> b (b) over bar. We find bounds vertical bar s(2)vertical bar less than or similar to 0.83 at one sigma, where vertical bar s(2)vertical bar = 0 (vertical bar s(2)vertical bar = 1) corresponds to a pure scalar (pure pseudoscalar) state.
Resumo:
Storm- and tsunami-deposits are generated by similar depositional mechanisms making their discrimination hard to establish using classic sedimentologic methods. Here we propose an original approach to identify tsunami-induced deposits by combining numerical simulation and rock magnetism. To test our method, we investigate the tsunami deposit of the Boca do Rio estuary generated by the 1755 earthquake in Lisbon which is well described in the literature. We first test the 1755 tsunami scenario using a numerical inundation model to provide physical parameters for the tsunami wave. Then we use concentration (MS. SIRM) and grain size (chi(ARM), ARM, B1/2, ARM/SIRM) sensitive magnetic proxies coupled with SEM microscopy to unravel the magnetic mineralogy of the tsunami-induced deposit and its associated depositional mechanisms. In order to study the connection between the tsunami deposit and the different sedimentologic units present in the estuary, magnetic data were processed by multivariate statistical analyses. Our numerical simulation show a large inundation of the estuary with flow depths varying from 0.5 to 6 m and run up of similar to 7 m. Magnetic data show a dominance of paramagnetic minerals (quartz) mixed with lesser amount of ferromagnetic minerals, namely titanomagnetite and titanohematite both of a detrital origin and reworked from the underlying units. Multivariate statistical analyses indicate a better connection between the tsunami-induced deposit and a mixture of Units C and D. All these results point to a scenario where the energy released by the tsunami wave was strong enough to overtop and erode important amount of sand from the littoral dune and mixed it with reworked materials from underlying layers at least 1 m in depth. The method tested here represents an original and promising tool to identify tsunami-induced deposits in similar embayed beach environments.
Resumo:
Tuberculosis (TB) is a worldwide infectious disease that has shown over time extremely high mortality levels. The urgent need to develop new antitubercular drugs is due to the increasing rate of appearance of multi-drug resistant strains to the commonly used drugs, and the longer durations of therapy and recovery, particularly in immuno-compromised patients. The major goal of the present study is the exploration of data from different families of compounds through the use of a variety of machine learning techniques so that robust QSAR-based models can be developed to further guide in the quest for new potent anti-TB compounds. Eight QSAR models were built using various types of descriptors (from ADRIANA.Code and Dragon software) with two publicly available structurally diverse data sets, including recent data deposited in PubChem. QSAR methodologies used Random Forests and Associative Neural Networks. Predictions for the external evaluation sets obtained accuracies in the range of 0.76-0.88 (for active/inactive classifications) and Q(2)=0.66-0.89 for regressions. Models developed in this study can be used to estimate the anti-TB activity of drug candidates at early stages of drug development (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
The aim of this article is to present the results of an action research project, which has been put into practice in Primary Education. This project was intended to develop students’ textual competence, considering both comprehension and textual production. Our starting hypothesis was that teaching the schematisation of text types, focusing on linguistic devices that underlie text production, would promote the development of textual competence, leading to the production of more coherent and cohesive texts. In order to test this hypothesis we implemented the project in three phases. First, before the intervention, we collected texts produced by the students. Secondly, we implemented a didactic program designed to develop students’ textual competence. Lastly, after the intervention, we collected students’ texts once again. Data was analyzed according to categories that confer cohesion and coherence to different types of texts. Narrative, descriptive, and explanatory texts were assessed in terms of 1) building an autonomous text; 2) hierarchisation of information, and 3) textual organisation. Overall, results indicate that students developed their text conceptualisations, their understanding of the different structures of texts, and produced better writing. Indeed, their written work shows a marked progression from the beginning of the intervention program to the end of the program.
Resumo:
Independent component analysis (ICA) has recently been proposed as a tool to unmix hyperspectral data. ICA is founded on two assumptions: 1) the observed spectrum vector is a linear mixture of the constituent spectra (endmember spectra) weighted by the correspondent abundance fractions (sources); 2)sources are statistically independent. Independent factor analysis (IFA) extends ICA to linear mixtures of independent sources immersed in noise. Concerning hyperspectral data, the first assumption is valid whenever the multiple scattering among the distinct constituent substances (endmembers) is negligible, and the surface is partitioned according to the fractional abundances. The second assumption, however, is violated, since the sum of abundance fractions associated to each pixel is constant due to physical constraints in the data acquisition process. Thus, sources cannot be statistically independent, this compromising the performance of ICA/IFA algorithms in hyperspectral unmixing. This paper studies the impact of hyperspectral source statistical dependence on ICA and IFA performances. We conclude that the accuracy of these methods tends to improve with the increase of the signature variability, of the number of endmembers, and of the signal-to-noise ratio. In any case, there are always endmembers incorrectly unmixed. We arrive to this conclusion by minimizing the mutual information of simulated and real hyperspectral mixtures. The computation of mutual information is based on fitting mixtures of Gaussians to the observed data. A method to sort ICA and IFA estimates in terms of the likelihood of being correctly unmixed is proposed.
Resumo:
Chapter in Book Proceedings with Peer Review First Iberian Conference, IbPRIA 2003, Puerto de Andratx, Mallorca, Spain, JUne 4-6, 2003. Proceedings
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.