942 resultados para Geospatial Data Model
Resumo:
This thesis presents the Fuzzy Monte Carlo Model for Transmission Power Systems Reliability based studies (FMC-TRel) methodology, which is based on statistical failure and repair data of the transmission power system components and uses fuzzyprobabilistic modeling for system component outage parameters. Using statistical records allows developing the fuzzy membership functions of system component outage parameters. The proposed hybrid method of fuzzy set and Monte Carlo simulation based on the fuzzy-probabilistic models allows catching both randomness and fuzziness of component outage parameters. A network contingency analysis to identify any overloading or voltage violation in the network is performed once obtained the system states. This is followed by a remedial action algorithm, based on Optimal Power Flow, to reschedule generations and alleviate constraint violations and, at the same time, to avoid any load curtailment, if possible, or, otherwise, to minimize the total load curtailment, for the states identified by the contingency analysis. For the system states that cause load curtailment, an optimization approach is applied to reduce the probability of occurrence of these states while minimizing the costs to achieve that reduction. This methodology is of most importance for supporting the transmission system operator decision making, namely in the identification of critical components and in the planning of future investments in the transmission power system. A case study based on Reliability Test System (RTS) 1996 IEEE 24 Bus is presented to illustrate with detail the application of the proposed methodology.
Resumo:
A detailed analytic and numerical study of baryogenesis through leptogenesis is performed in the framework of the standard model of electroweak interactions extended by the addition of three right-handed neutrinos, leading to the seesaw mechanism. We analyze the connection between GUT-motivated relations for the quark and lepton mass matrices and the possibility of obtaining a viable leptogenesis scenario. In particular, we analyze whether the constraints imposed by SO(10) GUTs can be compatible with all the available solar, atmospheric and reactor neutrino data and, simultaneously, be capable of producing the required baryon asymmetry via the leptogenesis mechanism. It is found that the Just-So(2) and SMA solar solutions lead to a viable leptogenesis even for the simplest SO(10) GUT, while the LMA, LOW and VO solar solutions would require a different hierarchy for the Dirac neutrino masses in order to generate the observed baryon asymmetry. Some implications on CP violation at low energies and on neutrinoless double beta decay are also considered. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
Given a set of mixed spectral (multispectral or hyperspectral) vectors, linear spectral mixture analysis, or linear unmixing, aims at estimating the number of reference substances, also called endmembers, their spectral signatures, and their abundance fractions. This paper presents a new method for unsupervised endmember extraction from hyperspectral data, termed vertex component analysis (VCA). The algorithm exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. In a series of experiments using simulated and real data, the VCA algorithm competes with state-of-the-art methods, with a computational complexity between one and two orders of magnitude lower than the best available method.
Resumo:
Proceedings of International Conference - SPIE 7477, Image and Signal Processing for Remote Sensing XV - 28 September 2009
Resumo:
We present a new model of the lepton sector that uses a family symmetry A(4) to make predictions for lepton mixing which are invariant under any permutation of the three flavours. We show that those predictions broadly agree with the experimental data, leading to a largish sin(2)theta(12) greater than or similar to 0.34, to vertical bar cos delta vertical bar greater than or similar to 0.7, and to vertical bar 0.5 - sin(2)theta(23)vertical bar greater than or similar to 0.08; cos delta and 0.5 - sin(2)theta(23) are predicted to have identical signs. (C) 2013 Elsevier B.V. All rights reserved.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
Resumo:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.
Resumo:
The principal topic of this work is the application of data mining techniques, in particular of machine learning, to the discovery of knowledge in a protein database. In the first chapter a general background is presented. Namely, in section 1.1 we overview the methodology of a Data Mining project and its main algorithms. In section 1.2 an introduction to the proteins and its supporting file formats is outlined. This chapter is concluded with section 1.3 which defines that main problem we pretend to address with this work: determine if an amino acid is exposed or buried in a protein, in a discrete way (i.e.: not continuous), for five exposition levels: 2%, 10%, 20%, 25% and 30%. In the second chapter, following closely the CRISP-DM methodology, whole the process of construction the database that supported this work is presented. Namely, it is described the process of loading data from the Protein Data Bank, DSSP and SCOP. Then an initial data exploration is performed and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced. It is also introduced the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter the results obtained are analyzed with statistical significance tests. Initially the several used classifiers (Neural Networks, C5.0, CART and Chaid) are compared and it is concluded that C5.0 is the most suitable for the problem at stake. It is also compared the influence of parameters like the amino acid information level, the amino acid window size and the SCOP class type in the accuracy of the predictive models. The fourth chapter starts with a brief revision of the literature about amino acid relative solvent accessibility. Then, we overview the main results achieved and finally discuss about possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis. Appendix B has a set of tables with additional information. Appendix C describes the software provided in the DVD accompanying this thesis that allows the reconstruction of the present work.
Resumo:
Estuaries are perhaps the most threatened environments in the coastal fringe; the coincidence of high natural value and attractiveness for human use has led to conflicts between conservation and development. These conflicts occur in the Sado Estuary since its location is near the industrialised zone of Peninsula of Setúbal and at the same time, a great part of the Estuary is classified as a Natural Reserve due to its high biodiversity. These facts led us to the need of implementing a model of environmental management and quality assessment, based on methodologies that enable the assessment of the Sado Estuary quality and evaluation of the human pressures in the estuary. These methodologies are based on indicators that can better depict the state of the environment and not necessarily all that could be measured or analysed. Sediments have always been considered as an important temporary source of some compounds or a sink for other type of materials or an interface where a great diversity of biogeochemical transformations occur. For all this they are of great importance in the formulation of coastal management system. Many authors have been using sediments to monitor aquatic contamination, showing great advantages when compared to the sampling of the traditional water column. The main objective of this thesis was to develop an estuary environmental management framework applied to Sado Estuary using the DPSIR Model (EMMSado), including data collection, data processing and data analysis. The support infrastructure of EMMSado were a set of spatially contiguous and homogeneous regions of sediment structure (management units). The environmental quality of the estuary was assessed through the sediment quality assessment and integrated in a preliminary stage with the human pressure for development. Besides the earlier explained advantages, studying the quality of the estuary mainly based on the indicators and indexes of the sediment compartment also turns this methodology easier, faster and human and financial resource saving. These are essential factors to an efficient environmental management of coastal areas. Data management, visualization, processing and analysis was obtained through the combined use of indicators and indices, sampling optimization techniques, Geographical Information Systems, remote sensing, statistics for spatial data, Global Positioning Systems and best expert judgments. As a global conclusion, from the nineteen management units delineated and analyzed three showed no ecological risk (18.5 % of the study area). The areas of more concern (5.6 % of the study area) are located in the North Channel and are under strong human pressure mainly due to industrial activities. These areas have also low hydrodynamics and are, thus associated with high levels of deposition. In particular the areas near Lisnave and Eurominas industries can also accumulate the contamination coming from Águas de Moura Channel, since particles coming from that channel can settle down in that area due to residual flow. In these areas the contaminants of concern, from those analyzed, are the heavy metals and metalloids (Cd, Cu, Zn and As exceeded the PEL guidelines) and the pesticides BHC isomers, heptachlor, isodrin, DDT and metabolits, endosulfan and endrin. In the remain management units (76 % of the study area) there is a moderate impact potential of occurrence of adverse ecological effects and in some of these areas no stress agents could be identified. This emphasizes the need for further research, since unmeasured chemicals may be causing or contributing to these adverse effects. Special attention must be taken to the units with moderate impact potential of occurrence of adverse ecological effects, located inside the natural reserve. Non-point source pollution coming from agriculture and aquaculture activities also seem to contribute with important pollution load into the estuary entering from Águas de Moura Channel. This pressure is expressed in a moderate impact potential for ecological risk existent in the areas near the entrance of this Channel. Pressures may also came from Alcácer Channel although they were not quantified in this study. The management framework presented here, including all the methodological tools may be applied and tested in other estuarine ecosystems, which will also allow a comparison between estuarine ecosystems in other parts of the globe.
Resumo:
Consider the problem of designing an algorithm for acquiring sensor readings. Consider specifically the problem of obtaining an approximate representation of sensor readings where (i) sensor readings originate from different sensor nodes, (ii) the number of sensor nodes is very large, (iii) all sensor nodes are deployed in a small area (dense network) and (iv) all sensor nodes communicate over a communication medium where at most one node can transmit at a time (a single broadcast domain). We present an efficient algorithm for this problem, and our novel algorithm has two desired properties: (i) it obtains an interpolation based on all sensor readings and (ii) it is scalable, that is, its time-complexity is independent of the number of sensor nodes. Achieving these two properties is possible thanks to the close interlinking of the information processing algorithm, the communication system and a model of the physical world.
Resumo:
Não existe uma definição única de processo de memória de longo prazo. Esse processo é geralmente definido como uma série que possui um correlograma decaindo lentamente ou um espectro infinito de frequência zero. Também se refere que uma série com tal propriedade é caracterizada pela dependência a longo prazo e por não periódicos ciclos longos, ou que essa característica descreve a estrutura de correlação de uma série de longos desfasamentos ou que é convencionalmente expressa em termos do declínio da lei-potência da função auto-covariância. O interesse crescente da investigação internacional no aprofundamento do tema é justificado pela procura de um melhor entendimento da natureza dinâmica das séries temporais dos preços dos ativos financeiros. Em primeiro lugar, a falta de consistência entre os resultados reclama novos estudos e a utilização de várias metodologias complementares. Em segundo lugar, a confirmação de processos de memória longa tem implicações relevantes ao nível da (1) modelação teórica e econométrica (i.e., dos modelos martingale de preços e das regras técnicas de negociação), (2) dos testes estatísticos aos modelos de equilíbrio e avaliação, (3) das decisões ótimas de consumo / poupança e de portefólio e (4) da medição de eficiência e racionalidade. Em terceiro lugar, ainda permanecem questões científicas empíricas sobre a identificação do modelo geral teórico de mercado mais adequado para modelar a difusão das séries. Em quarto lugar, aos reguladores e gestores de risco importa saber se existem mercados persistentes e, por isso, ineficientes, que, portanto, possam produzir retornos anormais. O objetivo do trabalho de investigação da dissertação é duplo. Por um lado, pretende proporcionar conhecimento adicional para o debate da memória de longo prazo, debruçando-se sobre o comportamento das séries diárias de retornos dos principais índices acionistas da EURONEXT. Por outro lado, pretende contribuir para o aperfeiçoamento do capital asset pricing model CAPM, considerando uma medida de risco alternativa capaz de ultrapassar os constrangimentos da hipótese de mercado eficiente EMH na presença de séries financeiras com processos sem incrementos independentes e identicamente distribuídos (i.i.d.). O estudo empírico indica a possibilidade de utilização alternativa das obrigações do tesouro (OT’s) com maturidade de longo prazo no cálculo dos retornos do mercado, dado que o seu comportamento nos mercados de dívida soberana reflete a confiança dos investidores nas condições financeiras dos Estados e mede a forma como avaliam as respetiva economias com base no desempenho da generalidade dos seus ativos. Embora o modelo de difusão de preços definido pelo movimento Browniano geométrico gBm alegue proporcionar um bom ajustamento das séries temporais financeiras, os seus pressupostos de normalidade, estacionariedade e independência das inovações residuais são adulterados pelos dados empíricos analisados. Por isso, na procura de evidências sobre a propriedade de memória longa nos mercados recorre-se à rescaled-range analysis R/S e à detrended fluctuation analysis DFA, sob abordagem do movimento Browniano fracionário fBm, para estimar o expoente Hurst H em relação às séries de dados completas e para calcular o expoente Hurst “local” H t em janelas móveis. Complementarmente, são realizados testes estatísticos de hipóteses através do rescaled-range tests R/S , do modified rescaled-range test M - R/S e do fractional differencing test GPH. Em termos de uma conclusão única a partir de todos os métodos sobre a natureza da dependência para o mercado acionista em geral, os resultados empíricos são inconclusivos. Isso quer dizer que o grau de memória de longo prazo e, assim, qualquer classificação, depende de cada mercado particular. No entanto, os resultados gerais maioritariamente positivos suportam a presença de memória longa, sob a forma de persistência, nos retornos acionistas da Bélgica, Holanda e Portugal. Isto sugere que estes mercados estão mais sujeitos a maior previsibilidade (“efeito José”), mas também a tendências que podem ser inesperadamente interrompidas por descontinuidades (“efeito Noé”), e, por isso, tendem a ser mais arriscados para negociar. Apesar da evidência de dinâmica fractal ter suporte estatístico fraco, em sintonia com a maior parte dos estudos internacionais, refuta a hipótese de passeio aleatório com incrementos i.i.d., que é a base da EMH na sua forma fraca. Atendendo a isso, propõem-se contributos para aperfeiçoamento do CAPM, através da proposta de uma nova fractal capital market line FCML e de uma nova fractal security market line FSML. A nova proposta sugere que o elemento de risco (para o mercado e para um ativo) seja dado pelo expoente H de Hurst para desfasamentos de longo prazo dos retornos acionistas. O expoente H mede o grau de memória de longo prazo nos índices acionistas, quer quando as séries de retornos seguem um processo i.i.d. não correlacionado, descrito pelo gBm(em que H = 0,5 , confirmando- se a EMH e adequando-se o CAPM), quer quando seguem um processo com dependência estatística, descrito pelo fBm(em que H é diferente de 0,5, rejeitando-se a EMH e desadequando-se o CAPM). A vantagem da FCML e da FSML é que a medida de memória de longo prazo, definida por H, é a referência adequada para traduzir o risco em modelos que possam ser aplicados a séries de dados que sigam processos i.i.d. e processos com dependência não linear. Então, estas formulações contemplam a EMH como um caso particular possível.
Resumo:
The simulation analysis is important approach to developing and evaluating the systems in terms of development time and cost. This paper demonstrates the application of Time Division Cluster Scheduling (TDCS) tool for the configuration of IEEE 802.15.4/ZigBee beaconenabled cluster-tree WSNs using the simulation analysis, as an illustrative example that confirms the practical applicability of the tool. The simulation study analyses how the number of retransmissions impacts the reliability of data transmission, the energy consumption of the nodes and the end-to-end communication delay, based on the simulation model that was implemented in the Opnet Modeler. The configuration parameters of the network are obtained directly from the TDCS tool. The simulation results show that the number of retransmissions impacts the reliability, the energy consumption and the end-to-end delay, in a way that improving the one may degrade the others.
Resumo:
Doctoral Thesis in Information Systems and Technologies Area of Engineering and Manag ement Information Systems
Resumo:
In this thesis we implement estimating procedures in order to estimate threshold parameters for the continuous time threshold models driven by stochastic di®erential equations. The ¯rst procedure is based on the EM (expectation-maximization) algorithm applied to the threshold model built from the Brownian motion with drift process. The second procedure mimics one of the fundamental ideas in the estimation of the thresholds in time series context, that is, conditional least squares estimation. We implement this procedure not only for the threshold model built from the Brownian motion with drift process but also for more generic models as the ones built from the geometric Brownian motion or the Ornstein-Uhlenbeck process. Both procedures are implemented for simu- lated data and the least squares estimation procedure is also implemented for real data of daily prices from a set of international funds. The ¯rst fund is the PF-European Sus- tainable Equities-R fund from the Pictet Funds company and the second is the Parvest Europe Dynamic Growth fund from the BNP Paribas company. The data for both funds are daily prices from the year 2004. The last fund to be considered is the Converging Europe Bond fund from the Schroder company and the data are daily prices from the year 2005.