12 results for Data Mining and its Application
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
The aim of this paper is to formulate an approximation of the US actuarial balance model and apply it to the Spanish public retirement pension system under various scenarios in order to determine a consistent indicator of the system's financial state comparable to those used by the most advanced social security systems. This will enable us to answer the question as to whether there is any justification for reforming the pension system in Spain. This type of actuarial balance uses projections to show future challenges to the financial side of the pension system deriving basically from ageing, the projected increase in longevity and fluctuations in economic activity. If one is compiled periodically it can provide various indicators to help depoliticize the management of the pay-as-you-go system by bringing the planning horizons of politicians and the system itself closer together.
Abstract:
A copper C2-symmetric bis(oxazoline), CuBox, was introduced into two forms of commercial Y zeolite: a sodium form (NaY) and an ultrastable form (NaUSY). CuBox was introduced by first partially exchanging the sodium cations of both zeolites for copper and then refluxing the resulting materials with a solution of bis(oxazoline) (Box). Two different loadings were prepared for each form of zeolite. The materials were characterized by copper ICP-AES, elemental analysis, XPS, FTIR, TG, and nitrogen adsorption isotherms at -196 °C. The techniques used provided evidence that the Box ligand is located in the supercages of the NaY and NaUSY zeolites and is coordinated to the exchanged copper(II). The materials were all active in the cyclopropanation of styrene with ethyl diazoacetate at room temperature and diastereoselective toward trans cyclopropanes. Although the materials containing Box showed low enantioselectivities, their catalytic activities were higher than those of the parent copper-exchanged zeolites and did not decrease with reuse, at least during three consecutive cycles.
Abstract:
Project work submitted for the degree of Master in Informatics and Computer Engineering
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
This paper discusses the results of applied research in the eco-driving domain based on a huge data set produced by a fleet of Lisbon's public transportation buses over a three-year period. This data set is based on events automatically extracted from the controller area network (CAN) bus and enriched with GPS coordinates, weather conditions, and road information. We apply online analytical processing (OLAP) and knowledge discovery (KD) techniques to deal with the high volume of this data set and to determine the major factors that influence the average fuel consumption, and then classify the drivers involved according to their driving efficiency. Consequently, we identify the most appropriate driving practices and styles. Our findings show that introducing simple practices, such as optimal clutch use, engine rotation, and engine idling, can reduce fuel consumption by 3 to 5 L/100 km on average, meaning a saving of 30 L per bus per day. These findings have been given strong consideration in the drivers' training sessions.
Abstract:
The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixture of components originating from the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], spectral signature matching [21], the spectral angle mapper [22], and subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures.
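The projection step described above can be illustrated with a minimal sketch. It is not the implementation from reference 23: the four-band spectra are made-up toy values, and the helper names (`project_out`, `osp_score`) are hypothetical. The sketch orthonormalizes the undesired signatures with Gram-Schmidt, removes their span from the pixel, and then correlates the residual with the signature of interest.

```python
# Orthogonal subspace projection (OSP) sketch; all spectra are made-up toy values.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project_out(pixel, undesired):
    """Project `pixel` onto the subspace orthogonal to the undesired signatures."""
    residual = list(pixel)
    for b in orthonormal_basis(undesired):
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return residual

def osp_score(pixel, target, undesired):
    """Correlate the target signature with the pixel after suppression."""
    return dot(target, project_out(pixel, undesired))

# Hypothetical signatures over four spectral bands.
target = [0.9, 0.1, 0.4, 0.7]
undesired = [[0.2, 0.8, 0.5, 0.1], [0.5, 0.5, 0.5, 0.5]]

# Noise-free linear mixture: 30% target, 50% and 20% undesired endmembers.
pixel = [0.3 * t + 0.5 * u1 + 0.2 * u2 for t, u1, u2 in zip(target, *undesired)]
clean = project_out(pixel, undesired)
```

In this noise-free toy case the projected pixel is orthogonal to both undesired signatures, and the score is proportional to the target's abundance (0.3 here), so dividing by `osp_score(target, target, undesired)` would recover it.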
As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other techniques, such as the maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27], have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists of finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed of the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) the number of samples is limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case for hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance.
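A small numerical sketch may make the dependence argument concrete. It assumes, purely for illustration, that abundance vectors are drawn from a flat Dirichlet density over three endmembers; the sample size and parameters are arbitrary choices, not values from the chapter. Because every abundance vector sums to one, each row of the sample covariance matrix sums to zero, which forces negative covariances between abundances and rules out independence.

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def covariance(samples):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    return [
        [sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

# Hypothetical abundance fractions for three endmembers (assumed flat Dirichlet).
abundances = [dirichlet([1.0, 1.0, 1.0]) for _ in range(5000)]
cov = covariance(abundances)

# Each row sums to (numerically) zero because the abundances sum to one.
row_sums = [sum(row) for row in cov]
```

Since the diagonal entries (variances) are positive, the off-diagonal entries in each row must be negative on balance, so the sources that ICA would try to recover are correlated by construction.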
Independent factor analysis (IFA) [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA proceeds in two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations lie in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. The minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collect spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, including unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR).
Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performance. The study considers simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications, namely signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm.
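The forward (generative) side of such a model can be sketched as follows. This is only an illustration under stated assumptions: the signatures, mixture weights, Dirichlet parameters, and noise level are all hypothetical, signature variability and topography modulation are left out, and the EM-type inference of the mixing matrix is not shown.

```python
import random

random.seed(1)

def dirichlet(alphas):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def sample_abundances(weights, alpha_sets):
    """Mixture of Dirichlet densities: pick a mode, then sample from it."""
    alphas = random.choices(alpha_sets, weights=weights, k=1)[0]
    return dirichlet(alphas)

def observe(signatures, abundances, noise_std):
    """Linear mixing: abundance-weighted sum of signatures plus Gaussian noise."""
    n_bands = len(signatures[0])
    return [
        sum(a * sig[b] for a, sig in zip(abundances, signatures))
        + random.gauss(0.0, noise_std)
        for b in range(n_bands)
    ]

# Hypothetical endmember signatures (three endmembers, four bands).
signatures = [[0.9, 0.1, 0.4, 0.7], [0.2, 0.8, 0.5, 0.1], [0.5, 0.5, 0.9, 0.3]]
# Hypothetical two-mode mixture of Dirichlet densities.
weights = [0.6, 0.4]
alpha_sets = [[8.0, 1.0, 1.0], [1.0, 4.0, 4.0]]

abundances = [sample_abundances(weights, alpha_sets) for _ in range(1000)]
pixels = [observe(signatures, a, noise_std=0.01) for a in abundances]
```

Every sampled abundance vector is nonnegative and sums to one by construction, which is how the Dirichlet choice encodes the positivity and full additivity constraints without requiring any pure pixel in the data.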
This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with a mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief review of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.
Abstract:
Today, information overload and the lack of systems that enable locating employees with the right knowledge or skills are common challenges that large organisations face. This forces knowledge workers to reinvent the wheel and makes it difficult for them to retrieve information from both internal and external resources. In addition, information is changing dynamically and ownership of data is moving from corporations to individuals. However, there is a set of web-based tools that may bring major progress in the way people collaborate and share their knowledge. This article aims to analyse the impact of ‘Web 2.0’ on organisational knowledge strategies. A comprehensive literature review was done to present the academic background, followed by a review of current ‘Web 2.0’ technologies and an assessment of their strengths and weaknesses. As the framework of this study is oriented to business applications, the characteristics of the involved segments and tools were reviewed from an organisational point of view. Moreover, the ‘Enterprise 2.0’ paradigm does not only imply tools but also changes the way people collaborate and the way the work is done (processes), and ultimately impacts other technologies. Finally, gaps in the literature in this area are outlined.
Abstract:
Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
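As a rough illustration of the minimum spanning tree idea mentioned in this abstract, the sketch below builds a plain minimum spanning tree over allelic profiles using Hamming distance and Kruskal's algorithm. It is not PHYLOViZ code and does not implement the goeBURST tie-breaking rules; the sequence type names and seven-locus profiles are made up.

```python
def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))

def minimum_spanning_tree(profiles):
    """Kruskal's algorithm on the complete graph of pairwise Hamming distances."""
    names = sorted(profiles)
    edges = sorted(
        (hamming(profiles[u], profiles[v]), u, v)
        for i, u in enumerate(names)
        for v in names[i + 1:]
    )
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for dist, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v, dist))
    return tree

# Hypothetical allelic profiles (seven loci each).
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (4, 4, 1, 1, 1, 1, 1),
}
tree = minimum_spanning_tree(profiles)
```

Ties between equal-distance edges are broken here only by name order, whereas a goeBURST-style analysis would apply its own ordered rules; the sketch is meant to show the tree construction, not to reproduce those results.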
Abstract:
Fullerenes (C60) have been reported to present interesting properties with potential application in clinical conditions related to oxidative stress. One of the most prominent features of fullerenes is the ability to quench free radicals. However, because of their poor solubility, this ability has been studied mostly in organic solutions, while the antioxidant activity and cytotoxicity of fullerenes and their derivatives in aqueous medium are not well characterized. The antioxidant capacity of synthesised C60-conjugates has been investigated and was higher than that of C60 alone. The aim of this study was to assess the viability of C60-conjugates by determining their antioxidant activity and cytotoxicity in bio-relevant media.
Abstract:
PURPOSE: Fatty liver disease (FLD) is an increasingly prevalent disease that can be reversed if detected early. Ultrasound is the safest and most ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret the liver ultrasound images, a lack of such experts will result in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy, added to the completely automated classification procedure, makes the authors' proposed technique highly suitable for clinical deployment and usage.
Abstract:
The Ni(II) and Zn(II) complexes [MCl(Tpms(Ph))] (Tpms(Ph) = SO3C(pz(Ph))3, pz = pyrazolyl; M = Ni, 2, or Zn, 3) and the Cu(II) complex [CuCl(Tpms(Ph))(H2O)] (4) have been prepared by treatment of the lithium salt of the sterically demanding and coordinatively flexible tris(3-phenyl-1-pyrazolyl)methanesulfonate (Tpms(Ph))- (1) with the respective metal chlorides. The (Tpms(Ph))- ligand shows the N3 coordination mode in 2 and 3 and the N2O coordination mode in 4. Upon reaction of 2 and 3 with Ag(CF3SO3) in acetonitrile, the complexes [M(Tpms(Ph))(MeCN)](CF3SO3) (M = Ni, 5, or Zn, 6, respectively) were formed. The compounds were obtained in good yields and characterized by analytical and spectral (IR, 1H and 13C{1H} NMR, ESI-MS) data, density functional theory (DFT) methods and, for 4 and [nBu4N](Tpms(Ph)) (7), the latter obtained upon Li+ replacement by [nBu4N]+ in Li(Tpms(Ph)), by single-crystal X-ray diffraction analysis. The Zn(II) and Cu(II) complexes (3 and 4, respectively) act as efficient catalyst precursors for the diastereoselective nitroaldol reaction of benzaldehydes and nitroethane to the corresponding β-nitroalkanols (up to 99% yield, at room temperature) with diastereoselectivity towards the formation of the anti isomer, whereas the Ni(II) complex 2 shows only modest catalytic activity.
Abstract:
Isoniazid (INH) is still one of the two most effective antitubercular drugs and is included in all recommended multitherapeutic regimens. Because of the increasing resistance of Mycobacterium tuberculosis to INH, mainly associated with mutations in the katG gene, new INH-based compounds have been proposed to circumvent this problem. In this work, we present a detailed comparative study of the molecular determinants of the interactions between wt KatG or its S315T mutant form and either INH or INH-C10, a new acylated INH derivative. MD simulations were used to explore the conformational space of both proteins, and results indicate that the S315T mutation did not have a significant impact on the average size of the access tunnel in the vicinity of these residues. Our simulations also indicate that the steric hindrance role assigned to Asp137 is transient and that electrostatic changes can be important in understanding the enzyme activity data of mutations in KatG. Additionally, molecular docking studies were used to determine the preferred modes of binding of the two substrates. Upon mutation, the apparently less favored docking solution for reaction became the most abundant, suggesting that S315T mutation favors less optimal binding modes. Moreover, the aliphatic tail in INH-C10 seems to bring the hydrazine group closer to the heme, thus favoring the apparent most reactive binding mode, regardless of the enzyme form. The ITC data is in agreement with our interpretation of the C10 alkyl chain role and helped to rationalize the significantly lower experimental MIC value observed for INH-C10. This compound seems to be able to counterbalance most of the conformational restrictions introduced by the mutation, which are thought to be responsible for the decrease in INH activity in the mutated strain. Therefore, INH-C10 appears to be a very promising lead compound for drug development.