989 results for web clustering
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are performed simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
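For orientation, the baseline this abstract compares against (fit a multinomial mixture for a fixed number of clusters, then score it with BIC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MML-based algorithm: the function names, the random soft initialization and the data encoding (each observation is a tuple of integer category codes) are all hypothetical.

```python
import math
import random


def em_multinomial_mixture(data, n_clusters, n_categories, n_iter=50, seed=0):
    """Plain EM for a finite mixture of multinomials on categorical data.

    data: list of tuples of category codes, data[i][j] in range(n_categories[j]).
    Returns (weights, theta, responsibilities, log-likelihood per iteration).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Assumed initialization: random soft assignments, normalized per observation.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([v / total for v in row])

    history = []
    for _ in range(n_iter):
        # M-step: mixing weights and per-cluster category probabilities.
        sizes = [sum(resp[i][k] for i in range(n)) for k in range(n_clusters)]
        weights = [s / n for s in sizes]
        theta = [[[0.0] * n_categories[j] for j in range(d)]
                 for _ in range(n_clusters)]
        for i, x in enumerate(data):
            for k in range(n_clusters):
                for j in range(d):
                    theta[k][j][x[j]] += resp[i][k] / sizes[k]

        # E-step: posterior cluster probabilities and observed log-likelihood.
        loglik = 0.0
        for i, x in enumerate(data):
            joint = [weights[k] * math.prod(theta[k][j][x[j]] for j in range(d))
                     for k in range(n_clusters)]
            total = sum(joint)
            resp[i] = [v / total for v in joint]
            loglik += math.log(total)
        history.append(loglik)
    return weights, theta, resp, history


def n_free_parameters(n_clusters, n_categories):
    """(K - 1) mixing weights plus K * sum_j (C_j - 1) category probabilities."""
    return (n_clusters - 1) + n_clusters * sum(c - 1 for c in n_categories)


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)
```

Selecting the number of clusters this way means running the EM once per candidate value and keeping the one with the lowest BIC, which is the repeated-fitting cost that the integrated approach described above avoids.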
Abstract:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal focus on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data from official statistics shows its usefulness.
Abstract:
When exploring a virtual environment, realism depends mainly on two factors: realistic images and real-time feedback (motion, behaviour, etc.). In this context, the photorealism and physical validity of computer-generated images required by emerging applications, such as advanced e-commerce, still pose major challenges for rendering research, while the complexity of lighting phenomena demands powerful and predictable computing if time constraints are to be met. In this technical report we survey the state of the art in rendering, focusing on approaches, techniques and technologies that might enable real-time interactive web-based client-server rendering systems. The focus is on the end systems and not on the networking technologies used to interconnect client(s) and server(s).
Abstract:
This paper focuses on the analysis of a demand response model in a smart grid context, considering a contingency scenario. A fuzzy clustering technique is applied to the developed demand response model and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The developed demand response model aims to support consumers' decisions regarding their consumption needs and possible economic benefits.
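The abstract does not say which fuzzy clustering technique is used. As an illustration only, the sketch below implements fuzzy c-means, a common choice, in which every point receives a degree of membership in each cluster instead of a hard label. The function name, the deterministic initialization and the parameter defaults are assumptions made for this example.

```python
import math


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy c-means on a list of equal-length numeric tuples.

    m > 1 is the fuzzifier. Returns (centers, memberships); the memberships
    are those computed in the last iteration, so on convergence they match
    the returned centers up to a shift smaller than tol.
    """
    n, dim = len(points), len(points[0])
    # Assumed initialization: evenly spaced data points serve as first centers.
    centers = [list(points[(k * n) // n_clusters]) for k in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits on a center: give it full membership there.
                hits = [1.0 if dd == 0.0 else 0.0 for dd in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [dd ** (-2.0 / (m - 1.0)) for dd in dists]
                row = [v / sum(inv) for v in inv]
            memberships.append(row)

        # Center update: mean of the points weighted by u_ik ** m.
        new_centers = []
        for k in range(n_clusters):
            w = [memberships[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / total
                 for j in range(dim)]
            )

        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

In a demand response setting, the points could be, for example, per-consumer load features, and the soft memberships would express how strongly each consumer resembles each consumption profile; that mapping is an assumption of this sketch, not something stated in the abstract.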
Abstract:
Proceedings of the Encontro sobre Web 2.0 (Meeting on Web 2.0). Braga: CIEd
Abstract:
Purpose - The education and training of nuclear medicine technologists (NMT) is not homogeneous among European countries, which leads to different scopes of practice and, therefore, to different technical skills being assigned. The goal of this research was to characterize the education and training of NMT in Europe. Materials and methods - This study was based on a literature search to characterize the education and training of NMT and to trace the historical evolution of this profession. It was divided into two phases: the first included an analysis of scientific articles and the second a survey of the curricula that allow health professionals to work as NMT in Europe. Results - The majority of the countries [N=31 (89%)] offer the NMT curriculum integrated into the higher education system, and in only four (11%) countries is the education provided by professional schools. The duration is not the same across education systems, varying between professional schools (2-3 years) and higher education (2-4 years), which means that different numbers of European Credit Transfer and Accumulation System (ECTS) credits, such as 240, 230, 222, 210 or 180, are attributed to the graduates. The professional title and scope of practice of NMT differ among European countries. In most countries of Europe, nuclear medicine training is not specific and the curriculum does not reflect the nuclear medicine competencies performed in clinical practice. Conclusion - Heterogeneity in education and training for NMT is prevalent among European countries. For the professional development of NMT, there is a pressing need to formalize and unify educational and training programmes in Europe.
Abstract:
Dissertation submitted to the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Master in Informatics Engineering.
Abstract:
With the advent of wearable sensing and mobile technologies, biosignals have seen a growing number of application areas, leading to the collection of large volumes of data. One of the difficulties in dealing with these data sets, and in developing automated machine learning systems that use them as input, is the lack of reliable ground truth information. In this paper we present a new web-based platform for visualization, retrieval and annotation of biosignals by non-technical users, aimed at improving the process of ground truth collection for biomedical applications. Moreover, a novel extendable and scalable data representation model and persistence framework is presented. The results of the experimental evaluation with prospective users have further confirmed the potential of the presented framework.
Abstract:
Biosignal analysis has become widespread, extending beyond its typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring, as a diagnostic tool in today's medicine and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insight into the data. Preliminary results are promising, as meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
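Consensus clustering combines several partitions of the same records into one. The abstract does not give the combination rule, so the sketch below shows one common scheme as an assumption: build a co-association matrix (the fraction of partitions in which two records share a cluster) and group records whose co-association exceeds a threshold. The function names and the threshold value are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of records shares a cluster.

    partitions: list of label lists, all of the same length.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base partitions of four records, e.g. from different algorithms.
base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
agreement = co_association(base)
print(consensus_labels(agreement))  # [0, 0, 1, 1]
```

Label names in the base partitions do not need to match across runs, since only co-membership is counted.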
Abstract:
Constrained and unconstrained Nonlinear Optimization Problems appear in many engineering areas. In some of these cases it is not possible to use derivative-based optimization methods because the objective function is not known, is too complex, or is non-smooth. In these cases Direct Search Methods might be the most suitable optimization methods. An Application Programming Interface (API) including some of these methods was implemented using Java Technology. This API can be accessed either by applications running on the same computer where it is installed or remotely, through a LAN or the Internet, using web services. From the engineering point of view, the information needed from the API is the solution to the provided problem. From the point of view of optimization methods researchers, however, the solution alone is not enough: additional information about the iterative process is also useful, such as the number of iterations, the value of the solution at each iteration, the stopping criteria, etc. This paper presents the features added to the API to allow users to access the iterative process data.
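To make the idea of exposing iterative process data concrete, here is a minimal sketch of one direct search method, compass (coordinate) search, that returns the iteration count, the per-iteration solution values and the stopping criterion alongside the solution. It is written in Python for brevity and is not the Java API described above; the function name, the result keys and the parameter defaults are assumptions.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=1000):
    """Derivative-free minimization by polling +/- step along each axis.

    Returns a dict with the solution and a record of the iterative process.
    """
    x = list(x0)
    fx = f(x)
    history = [(0, list(x), fx)]  # (iteration, point, objective value)
    stop_reason = "max_iter"
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Poll the 2n neighbouring points on the current mesh.
        candidates = []
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                candidates.append((f(trial), trial))
        best_f, best_x = min(candidates, key=lambda c: c[0])
        if best_f < fx:
            x, fx = best_x, best_f  # successful poll: move to the best point
        else:
            step /= 2.0             # unsuccessful poll: refine the mesh
        history.append((iteration, list(x), fx))
        if step < tol:
            stop_reason = "step_below_tol"
            break
    return {
        "x": x,
        "fx": fx,
        "iterations": iteration,
        "stop_reason": stop_reason,
        "history": history,
    }


result = compass_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])
print(result["x"], result["stop_reason"])
```

An engineer would read only `x` and `fx`, while a methods researcher could inspect `history` and `stop_reason`, which mirrors the two audiences distinguished in the abstract.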
Abstract:
Nonlinear Optimization Problems are common in many engineering fields. Due to its characteristics, the objective function of some problems might not be differentiable, or its derivatives might have complex expressions. There are even cases where an analytical expression of the objective function cannot be determined, either due to its complexity or its cost (monetary, computational, time, ...). In these cases Nonlinear Optimization methods must be used. An API including several methods and algorithms to solve constrained and unconstrained optimization problems was implemented. This API can be accessed not only in the traditional way, by installing it on the developer's and/or user's computer, but also remotely, using Web Services. As long as there is a network connection to the server where the API is installed, applications always access the latest API version. A Web-based application using the proposed API was also developed. This application is intended for users who do not want to integrate the methods into their own applications and simply want a tool to solve Nonlinear Optimization Problems.
Abstract:
Dissertation submitted to the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Master in Informatics Engineering.
Abstract:
Dissertation submitted to the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Master in Informatics Engineering.