59 resultados para Massive Data
Resumo:
In this work liver contour is semi-automatically segmented and quantified in order to help the identification and diagnosis of diffuse liver disease. The features extracted from the liver contour are jointly used with clinical and laboratorial data in the staging process. The classification results of a support vector machine, a Bayesian and a k-nearest neighbor classifier are compared. A population of 88 patients at five different stages of diffuse liver disease and a leave-one-out cross-validation strategy are used in the classification process. The best results are obtained using the k-nearest neighbor classifier, with an overall accuracy of 80.68%. The good performance of the proposed method shows a reliable indicator that can improve the information in the staging of diffuse liver disease.
Resumo:
Water covers over 70% of the Earth's surface, and is vital for all known forms of life. But only 3% of the Earth's water is fresh water, and less than 0.3% of all freshwater is in rivers, lakes, reservoirs and the atmosphere. However, rivers and lakes are an important part of fresh surface water, amounting to about 89%. In this Master Thesis dissertation, the focus is on three types of water bodies – rivers, lakes and reservoirs, and their water quality issues in Asian countries. The surface water quality in a region is largely determined both by the natural processes such as climate or geographic conditions, and the anthropogenic influences such as industrial and agricultural activities or land use conversion. The quality of the water can be affected by pollutants discharge from a specific point through a sewer pipe and also by extensive drainage from agriculture/urban areas and within basin. Hence, water pollutant sources can be divided into two categories: Point source pollution and Non-point source (NPS) pollution. Seasonal variations in precipitation and surface run-off have a strong effect on river discharge and the concentration of pollutants in water bodies. For example, in the rainy season, heavy and persistent rain wash off the ground, the runoff flow increases and may contain various kinds of pollutants and, eventually, enters the water bodies. In some cases, especially in confined water bodies, the quality may be positive related with rainfall in the wet season, because this confined type of fresh water systems allows high dilution of pollutants, decreasing their possible impacts. During the dry season, the quality of water is largely related to industrialization and urbanization pollution. The aim of this study is to identify the most common water quality problems in Asian countries and to enumerate and analyze the methodologies used for assessment of water quality conditions of both rivers and confined water bodies (lakes and reservoirs). Based on the evaluation of a sample of 57 papers, dated between 2000 and 2012, it was found that over the past decade, the water quality of rivers, lakes, and reservoirs in developing countries is being degraded. Water pollution and destruction of aquatic ecosystems have caused massive damage to the functions and integrity of water resources. The most widespread NPS in Asian countries and those which have the greatest spatial impacts are urban runoff and agriculture. Locally, mine waste runoff and rice paddy are serious NPS problems. The most relevant point pollution sources are the effluents from factories, sewage treatment plant, and public or household facilities. It was found that the most used methodology was unquestionably the monitoring activity, used in 49 of analyzed studies, accounting for 86%. Sometimes, data from historical databases were used as well. It can be seen that taking samples from the water body and then carry on laboratory work (chemical analyses) is important because it can give an understanding of the water quality. 6 papers (11%) used a method that combined monitoring data and modeling. 6 papers (11%) just applied a model to estimate the quality of water. Modeling is a useful resource when there is limited budget since some models are of free download and use. In particular, several of used models come from the U.S.A, but they have their own purposes and features, meaning that a careful application of the models to other countries and a critical discussion of the results are crucial. 5 papers (9%) focus on a method combining monitoring data and statistical analysis. When there is a huge data matrix, the researchers need an efficient way of interpretation of the information which is provided by statistics. 3 papers (5%) used a method combining monitoring data, statistical analysis and modeling. These different methods are all valuable to evaluate the water quality. It was also found that the evaluation of water quality was made as well by using other types of sampling different than water itself, and they also provide useful information to understand the condition of the water body. These additional monitoring activities are: Air sampling, sediment sampling, phytoplankton sampling and aquatic animal tissues sampling. Despite considerable progress in developing and applying control regulations to point and NPS pollution, the pollution status of rivers, lakes, and reservoirs in Asian countries is not improving. In fact, this reflects the slow pace of investment in new infrastructure for pollution control and growing population pressures. Water laws or regulations and public involvement in enforcement can play a constructive and indispensable role in environmental protection. In the near future, in order to protect water from further contamination, rapid action is highly needed to control the various kinds of effluents in one region. Environmental remediation and treatment of industrial effluent and municipal wastewaters is essential. It is also important to prevent the direct input of agricultural and mine site runoff. Finally, stricter environmental regulation for water quality is required to support protection and management strategies. It would have been possible to get further information based in the 57 sample of papers. For instance, it would have been interesting to compare the level of concentrations of some pollutants in the diferente Asian countries. However the limit of three months duration for this study prevented further work to take place. In spite of this, the study objectives were achieved: the work provided an overview of the most relevant water quality problems in rivers, lakes and reservoirs in Asian countries, and also listed and analyzed the most common methodologies.
Resumo:
Dissertação para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores
Resumo:
A detailed analytic and numerical study of baryogenesis through leptogenesis is performed in the framework of the standard model of electroweak interactions extended by the addition of three right-handed neutrinos, leading to the seesaw mechanism. We analyze the connection between GUT-motivated relations for the quark and lepton mass matrices and the possibility of obtaining a viable leptogenesis scenario. In particular, we analyze whether the constraints imposed by SO(10) GUTs can be compatible with all the available solar, atmospheric and reactor neutrino data and, simultaneously, be capable of producing the required baryon asymmetry via the leptogenesis mechanism. It is found that the Just-So(2) and SMA solar solutions lead to a viable leptogenesis even for the simplest SO(10) GUT, while the LMA, LOW and VO solar solutions would require a different hierarchy for the Dirac neutrino masses in order to generate the observed baryon asymmetry. Some implications on CP violation at low energies and on neutrinoless double beta decay are also considered. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
Independent component analysis (ICA) has recently been proposed as a tool to unmix hyperspectral data. ICA is founded on two assumptions: 1) the observed spectrum vector is a linear mixture of the constituent spectra (endmember spectra) weighted by the correspondent abundance fractions (sources); 2)sources are statistically independent. Independent factor analysis (IFA) extends ICA to linear mixtures of independent sources immersed in noise. Concerning hyperspectral data, the first assumption is valid whenever the multiple scattering among the distinct constituent substances (endmembers) is negligible, and the surface is partitioned according to the fractional abundances. The second assumption, however, is violated, since the sum of abundance fractions associated to each pixel is constant due to physical constraints in the data acquisition process. Thus, sources cannot be statistically independent, this compromising the performance of ICA/IFA algorithms in hyperspectral unmixing. This paper studies the impact of hyperspectral source statistical dependence on ICA and IFA performances. We conclude that the accuracy of these methods tends to improve with the increase of the signature variability, of the number of endmembers, and of the signal-to-noise ratio. In any case, there are always endmembers incorrectly unmixed. We arrive to this conclusion by minimizing the mutual information of simulated and real hyperspectral mixtures. The computation of mutual information is based on fitting mixtures of Gaussians to the observed data. A method to sort ICA and IFA estimates in terms of the likelihood of being correctly unmixed is proposed.
Resumo:
Chapter in Book Proceedings with Peer Review First Iberian Conference, IbPRIA 2003, Puerto de Andratx, Mallorca, Spain, JUne 4-6, 2003. Proceedings
Resumo:
Chapter in Book Proceedings with Peer Review First Iberian Conference, IbPRIA 2003, Puerto de Andratx, Mallorca, Spain, JUne 4-6, 2003. Proceedings
Resumo:
Given a set of mixed spectral (multispectral or hyperspectral) vectors, linear spectral mixture analysis, or linear unmixing, aims at estimating the number of reference substances, also called endmembers, their spectral signatures, and their abundance fractions. This paper presents a new method for unsupervised endmember extraction from hyperspectral data, termed vertex component analysis (VCA). The algorithm exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. In a series of experiments using simulated and real data, the VCA algorithm competes with state-of-the-art methods, with a computational complexity between one and two orders of magnitude lower than the best available method.
Resumo:
The hand is one of the most important instruments of the human body, mainly due to the possibility of grip movements. Grip strength has been described as an important predictor of functional capacity. There are several factors that may influence it, such as gender, age and anthropometric characteristics. Functional capacity refers to the ability to perform daily activities which allow the individual to self-care and to live with autonomy. Composite Physical Function (CPF) scale is an evaluation tool for functional capacity that includes daily activities, self-care, sports activities, upper limb function and gait capacity. In 2011, Portugal had 15% of young population (0-14years) and 19% of elderly population (over 65 years). Considering the double-ageing phenomen, it is important to understand the effect of the grip strength in elderly individuals, considering their characteristics, as the need to maintainin dependency as long as possible.
Resumo:
A estimativa da idade gestacional (IG) em restos cadavéricos fetais é importante em contextos forenses. Para esse efeito, os especialistas forenses recorrem à avaliação do padrão de calcificação dentária e/ou ao estudo do esqueleto. Neste último, o comprimento das diáfises de ossos longos é um dos métodos mais utilizados, sendo utilizadas equações de regressão de obras pouco atuais ou baseadas em dados ecográficos, cujas medições diferem das efetuadas diretamente no osso. Este trabalho tem como objetivo principal a obtenção de equações de regressão para a população Portuguesa, com base na medição das diáfises de fémur, tíbia e úmero, utilizando radiografias postmortem. A amostra é constituída por 80 fetos de IG conhecida. Tratando-se de um estudo retrospectivo, os casos foram selecionados com base nas informações clínicas e anatomopatológicas, excluindo-se aqueles cujo normal crescimento se encontrava efetiva ou potencialmente comprometido. Os resultados confirmaram uma forte correlação entre o comprimento das diáfises estudadas e a IG, apresentando o fémur a correlação mais forte (r=0.967; p <0,01). Assim, foi possível obter uma equação de regressão para cada um dos ossos estudados. Concluindo, os objetivos do estudo foram atingidos com a obtenção das equações de regressão para os ossos estudados. Pretende-se, futuramente, alargar a amostra para validar e consolidar os resultados obtidos neste estudo.
Resumo:
The aim of this paper is to develop models for experimental open-channel water delivery systems and assess the use of three data-driven modeling tools toward that end. Water delivery canals are nonlinear dynamical systems and thus should be modeled to meet given operational requirements while capturing all relevant dynamics, including transport delays. Typically, the derivation of first principle models for open-channel systems is based on the use of Saint-Venant equations for shallow water, which is a time-consuming task and demands for specific expertise. The present paper proposes and assesses the use of three data-driven modeling tools: artificial neural networks, composite local linear models and fuzzy systems. The canal from Hydraulics and Canal Control Nucleus (A parts per thousand vora University, Portugal) will be used as a benchmark: The models are identified using data collected from the experimental facility, and then their performances are assessed based on suitable validation criterion. The performance of all models is compared among each other and against the experimental data to show the effectiveness of such tools to capture all significant dynamics within the canal system and, therefore, provide accurate nonlinear models that can be used for simulation or control. The models are available upon request to the authors.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
Resumo:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.