873 results for agglomerative clustering


Relevance: 10.00%

Abstract:

Project Work presented as a partial requirement for obtaining the degree of Master in Statistics and Information Management

Relevance: 10.00%

Abstract:

Dissertation for obtaining the degree of Master in Informatics Engineering

Relevance: 10.00%

Abstract:

Dissertation to obtain the Master's degree in Biomedical Engineering

Relevance: 10.00%

Abstract:

The alignment of collective goals and individual behavior has been extensively studied by economists under a principal-agent framework. Two main solutions have been presented: explicit incentive contracts and monitoring. These solutions correspond to changes in the objective situation faced by individuals. However, an extensive literature in social psychology provides evidence that behavior is influenced not only by situational constraints but also by attitudes. An important aspect of organizational design is therefore to choose the structures and procedures that best contribute to the dissemination of the desired attitudes throughout the organization. This paper studies how the initial configuration of attitudes and the size of the organization affect the optimal organizational structure and the timing of information flows when the objective is to align the members' attitudes. We identify and characterize three factors that affect the optimal organizational structures and procedures and the degree of alignment of attitudes: (1) clustering effects; (2) member cross-influence effects; and (3) leader cross-influence effects.

Relevance: 10.00%

Abstract:

Dissertation presented for obtaining the Master's degree in Electrical Engineering and Computer Science at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia

Relevance: 10.00%

Abstract:

A thesis submitted in fulfillment of the requirements for the degree of Master in Molecular Genetics and Biomedicine

Relevance: 10.00%

Abstract:

The main objective of this work is to develop commodity price forecasting models in order to compare the predictive ability of Monte Carlo simulation with that of neural networks. Monte Carlo simulation is mainly used for option valuation, whereas neural networks are used for forecasting, classification, clustering or function approximation. The models developed were applied to forecasting the future price of corn, oil, gold and copper, over horizons of 1 day, 5 days, 20 days and 60 days. Analysis of the mean absolute percentage error (MAPE) showed that, overall, the individual model with the best predictive performance was the neural network. However, for the 1- and 5-day forecasts the results obtained were similar for both models. To try to improve on the individual models, several model-combination techniques were applied. Combining models generally improved on the individual results, but only for the 60-day horizon was the improvement significant.
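
As a small illustration of the evaluation metric and of model combination (the abstract does not spell out the combination scheme, so a simple equal-weight average of hypothetical forecasts is assumed here):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Hypothetical forecasts for the same horizon from the two individual models.
actual = np.array([100.0, 102.0, 101.5, 103.0])
mc_forecast = np.array([98.5, 103.0, 100.0, 104.5])   # Monte Carlo simulation
nn_forecast = np.array([99.5, 101.5, 102.0, 103.5])   # neural network

combined = (mc_forecast + nn_forecast) / 2  # equal-weight combination

for name, f in [("Monte Carlo", mc_forecast), ("Neural net", nn_forecast),
                ("Combined", combined)]:
    print(f"{name}: MAPE = {mape(actual, f):.2f}%")
```

Equal weighting is only one possible combination technique; weighting by inverse error variance is a common alternative.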

Relevance: 10.00%

Abstract:

Human Activity Recognition systems require objective and reliable methods that can be used in the daily routine and must offer consistent results according to the performed activities. These systems are under development and offer objective and personalized support for several applications, such as healthcare. This thesis aims to create a framework for human activity recognition based on accelerometry signals. Some new features and techniques inspired by audio recognition methodology are introduced in this work, namely the Log Scale Power Bandwidth and the application of Markov models. Forward Feature Selection was adopted as the feature selection algorithm in order to improve the clustering performance and limit the computational demands. This method selects the most suitable set of features for activity recognition in accelerometry from a 423-dimensional feature vector. Several machine learning algorithms were applied to the accelerometry databases used (the FCHA and PAMAP databases) and showed promising results in activity recognition. The developed set of algorithms constitutes a significant contribution to the development of reliable methods for evaluating movement disorders in diagnosis and treatment applications.
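
A minimal sketch of the forward feature selection step, assuming scikit-learn's SequentialFeatureSelector and synthetic data in place of the 423-dimensional accelerometry features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the accelerometry feature vectors and activity labels.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# Greedily add, one at a time, the feature that most improves
# cross-validated accuracy (forward selection).
selector = SequentialFeatureSelector(KNeighborsClassifier(),
                                     n_features_to_select=8,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("Selected feature indices:", np.flatnonzero(selector.get_support()))
```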

Relevance: 10.00%

Abstract:

Saccharomyces cerevisiae, like other microorganisms, is frequently used in industry to obtain different kinds of products applied in several areas (research, pharmaceutical compounds, etc.). In order to obtain high yields of the desired product, adequate medium supplementation during growth of the microorganism is necessary. The highest yields are typically reached using complex media; however, the exact formulation of these media is not known. Moreover, it is difficult to control the exact composition of complex media, leading to batch-to-batch variations. To overcome this problem, some industries choose defined media, with a defined and known chemical composition. However, such media often do not reach the high yields obtained with complex media. To obtain similar yields with defined media, the addition of many different compounds has to be tested experimentally, so industries use a set of empirical methods to formulate defined media that can reach the same high yields as complex media. In this thesis, a defined medium for Saccharomyces cerevisiae was developed using a rational design approach. In this approach, a given metabolic network of Saccharomyces cerevisiae is decomposed into unique, not further decomposable subnetworks of metabolic reactions that work coherently in steady state, so-called elementary flux modes (EFMs). The EFMtool algorithm was used to calculate the EFMs for two Saccharomyces cerevisiae metabolic networks (one supplemented with amino acids, one not). For the supplemented network, 1,352,172 EFMs were calculated and divided into 1,306,854 EFMs producing biomass and 18,582 EFMs exclusively producing CO2 (cellular respiration). For the non-supplemented network, 635 EFMs were calculated and divided into 215 EFMs producing biomass and 420 EFMs producing exclusively CO2. The EFMs of each group were normalized by their respective glucose consumption. The EFMs of the supplemented network were then grouped again: into 30 clusters for the 1,306,854 biomass-producing EFMs and 20 clusters for the 18,582 CO2-producing EFMs. For the non-supplemented network, the EFMs of each metabolic function were grouped into 10 clusters. After the clustering step, the concentrations of the other medium compounds were calculated by assuming a reasonable glucose amount and exploiting the proportionality between compound concentrations and glucose ratios. The approach adopted and developed in this thesis may allow a faster and more economical way of developing media.
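
A hedged sketch of the normalization and clustering steps; the abstract does not name the clustering algorithm, so k-means over glucose-normalized flux vectors is assumed, with random data standing in for the computed EFMs:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
efms = rng.random((1000, 50))          # stand-in: 1000 EFMs x 50 reactions
glucose_uptake = efms[:, 0] + 1e-9     # assume column 0 is glucose consumption

# Normalize each EFM by its glucose consumption (fluxes per unit glucose).
normalized = efms / glucose_uptake[:, None]

kmeans = KMeans(n_clusters=30, n_init=10, random_state=0).fit(normalized)
print(np.bincount(kmeans.labels_))     # number of EFMs per cluster
```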

Relevance: 10.00%

Abstract:

This study analyses financial data using the result characterization of a self-organizing neural network model. The goal was to prototype a tool that may help an economist or a market analyst to analyse stock market series. To reach this goal, the tool shows economic dependencies and statistical measures over stock market series. The SOM (self-organizing map) neural network model was used to extract behavioural patterns from the data analysed. Based on this model, an application was developed to analyse financial data. This application takes as input a portfolio of correlated or inversely correlated markets. After the analysis with the SOM, the result is represented by micro-clusters organized by their behavioural tendency. During the study, the need arose for a better analysis of the SOM algorithm's results. This problem was solved with a clustering technique that groups the micro-clusters from the SOM U-Matrix analysis. The study showed that correlated and inversely correlated markets project multiple clusters of data. These clusters represent multiple trend states that may be useful for technical professionals.
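
A minimal SOM sketch, assuming the third-party minisom package (the abstract does not name the SOM implementation used); each sample's winning node plays the role of its micro-cluster and the distance map that of the U-Matrix:

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(0)
series = rng.standard_normal((500, 10))   # stand-in for market feature vectors

som = MiniSom(8, 8, input_len=10, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(series, num_iteration=2000)

u_matrix = som.distance_map()             # U-Matrix: mean distance to neighbours
winners = np.array([som.winner(v) for v in series])  # micro-cluster per sample
print(u_matrix.shape, winners[:5])
```

The U-Matrix values can then be fed to an ordinary clustering algorithm to merge adjacent micro-clusters, as the study describes.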

Relevance: 10.00%

Abstract:

Botnets are groups of computers infected with a specific subset of a malware family and controlled by one individual, called the botmaster. Such networks are used for, among other purposes, virtual extortion, spam campaigns and identity theft. They implement different types of evasion techniques that make it harder to group and detect botnet traffic. This thesis introduces a methodology, called CONDENSER, that outputs clusters through a self-organizing map and identifies domain names generated from an unknown pseudo-random seed known to the botnet herder(s). Additionally, DNS Crawler is proposed: this system saves historic DNS data for fast-flux and double fast-flux detection, and is used to identify live C&C IPs used by real botnets. A program called CHEWER was developed to automate the calculation of the SVM parameters and features that perform best against the available domain names associated with DGAs. CONDENSER and DNS Crawler were developed with scalability in mind, so that the detection of fast-flux and double fast-flux networks becomes faster. We used an SVM for the DGA classifier, selecting a total of 11 attributes and achieving a precision of 77.9% and an F-measure of 83.2%. The feature selection method identified the 3 most significant attributes of the full attribute set. For clustering, a self-organizing map was used on a total of 81 attributes. The conclusions of this thesis were accepted at Botconf through a submitted article. Botconf is a well-known conference on the research, mitigation and discovery of botnets, tailored to the industry, where current work and research are presented; it is attended by security and anti-virus companies, law enforcement agencies and researchers.
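
The 11 attributes used by CHEWER are not listed in the abstract, so the sketch below substitutes three common lexical features (name length, character entropy, digit ratio) and synthetic domains to illustrate the SVM classification step:

```python
import math
from collections import Counter
from sklearn.svm import SVC

def features(domain):
    """Three illustrative lexical features of a domain's first label."""
    name = domain.split(".")[0]
    counts = Counter(name)
    entropy = -sum(c / len(name) * math.log2(c / len(name))
                   for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in name) / len(name)
    return [len(name), entropy, digit_ratio]

# Toy training data: hand-picked benign names vs. DGA-looking gibberish.
benign = ["google.com", "wikipedia.org", "github.com"]
dga = ["xjw3k9qpt0.net", "q8zr1vy4mm.com", "k2j9x0wqzl.org"]

X = [features(d) for d in benign + dga]
y = [0] * len(benign) + [1] * len(dga)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([features("plv7t2xq9a.com")]))  # expect class 1 (DGA-like)
```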

Relevance: 10.00%

Abstract:

The extraction of relevant terms from texts is an extensively researched task in text mining. Relevant terms have been applied in areas such as information retrieval or document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantics and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract from texts what I have called generic concepts (all concepts) and postpone the decision about relevance to downstream applications, according to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology explained in Part I of this thesis. In Part II, I show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, especially for multi-word expressions. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I present some work that, although not yet published, is briefly discussed in this document.
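
A minimal sketch of the Tf-Idf keyword-ranking idea, assuming scikit-learn; ordinary uni- and bigrams stand in for the extracted concepts, since the ConceptExtractor itself is not reproduced here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the president of the republic addressed the nation",
    "the republic held elections for president last year",
    "readers finish books at very different speeds",
]

# Single- and multi-word candidates scored with the same Tf-Idf metric.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Top-scoring terms of the first document, as a keyword extractor might rank them.
terms = vectorizer.get_feature_names_out()
row = tfidf[0].toarray().ravel()
top = row.argsort()[::-1][:5]
print([(terms[i], round(row[i], 3)) for i in top])
```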

Relevance: 10.00%

Abstract:

In the last few years we have observed an exponential increase in information systems, and parking information is one more example. Reliable and up-to-date information on parking slot availability is very important for the goal of traffic reduction, and parking slot prediction is a new topic that has already started to be applied: San Francisco in the United States and Santander in Spain are examples of projects carried out to obtain this kind of information. The aim of this thesis is the study and evaluation of methodologies for parking slot prediction and their integration in a web application, where all kinds of users will be able to see the current parking status as well as future status according to the models' predictions. The source of the data is ancillary in this work, but it still needs to be understood in order to understand parking behaviour. There are many modelling techniques used for this purpose, such as time series analysis, decision trees, neural networks and clustering. In this work the author explains the techniques that suited the task best, analyses the results and points out the advantages and disadvantages of each one. The model learns the periodic and seasonal patterns of the parking status behaviour and, with this knowledge, can predict future status values for a given date. The data comes from Smart Park Ontinyent and consists of parking occupancy status together with timestamps, stored in a database. After data acquisition, data analysis and pre-processing were needed for the model implementations. The first test used a boosting ensemble classifier, employed over a set of decision trees created with the C5.0 algorithm from a set of training samples, to assign a prediction value to each object. In addition to the predictions, this work reports measurement errors that indicate how reliable the outcome predictions are. The second test used the TBATS seasonal exponential smoothing model (a forecasting sketch follows this abstract). Finally, a model that combines the previous two was tried, to see the result of the combination. The results were quite good for all of them, with average errors of 6.2, 6.6 and 5.4 vacancies in the predictions of the three models respectively; for a car park of 47 places this means roughly a 10% average error in parking slot predictions. The results could be even better with more data available. In order to make this kind of information visible and reachable by anyone with an internet-connected device, a web application was built. Besides displaying the data, this application offers additional functions that improve the task of searching for parking. The new functions, apart from parking prediction, were:

- Park distances from the user's location: the distances from the user's current location to the different car parks in the city.
- Geocoding: matching a literal description or an address to a concrete location.
- Geolocation: positioning the user.
- Parking list panel: not a service or a function as such, just a better visualization and handling of the information.
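
As referenced above, a hedged sketch of the second test, assuming the third-party tbats package and synthetic occupancy data with daily and weekly seasonality in place of the Smart Park Ontinyent series:

```python
import numpy as np
from tbats import TBATS  # pip install tbats

hours = np.arange(24 * 21)  # three weeks of hourly occupancy counts
occupancy = (25 + 10 * np.sin(2 * np.pi * hours / 24)        # daily pattern
             + 5 * np.sin(2 * np.pi * hours / (24 * 7))      # weekly pattern
             + np.random.default_rng(0).normal(0, 2, hours.size))

# TBATS handles multiple seasonalities (here: daily and weekly).
estimator = TBATS(seasonal_periods=[24, 24 * 7])
model = estimator.fit(occupancy)
print(model.forecast(steps=24))  # next day's predicted occupancy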

Relevance: 10.00%

Abstract:

This work project (WP) is a study of a clustering strategy for Sport Zone. The general objective of a cluster study is to create groups such that within each group the individuals are similar to each other, but different among groups. Cluster creation is a mix of common sense, trial and error and some supporting statistical techniques. Our particular objective is to support category managers in better defining the product types to be displayed on the stores' shelves, by clustering stores. This research was carried out for Sport Zone and comprises an objective definition, a literature review, the clustering activity itself, some factor analysis and a discriminant analysis to better frame our work. Together with this quantitative part, a survey addressed to category managers was carried out, to better understand their key drivers for choosing the type of product for each store. Based on a non-random sample of 65 stores with data referring to 2013, the final result was the choice of 6 store clusters (Figure 1), which were individually characterized as the main outcome of this work. All of the selected variables were important for distinguishing between clusters, which confirms the adequacy of their choice. The interpretation of the results gives category managers a tool to understand which products best fit the clustered stores. Furthermore, as a side finding enabled by the clustering, an STP (Segmentation, Targeting and Positioning) exercise was initiated, this WP being the first step of a continuous process.
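
The WP does not specify its clustering algorithm, so the sketch below assumes k-means over standardized store attributes, with synthetic data standing in for the 65 stores:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
stores = rng.random((65, 5))   # 65 stores x 5 attributes (e.g. sales mix)

# Standardize so no attribute dominates the distance computation.
scaled = StandardScaler().fit_transform(stores)
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(scaled)
print(np.bincount(labels))     # store count per cluster
```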

Relevance: 10.00%

Abstract:

The higher-education evaluation paradigm changed in 2005 to take into account not only the number of admissions but also the number of graduating students. This change pressures academic institutions to improve student performance. A noticeable phenomenon when analysing that performance is that it is neither uniform nor constant throughout the student's stay in the degree programme. These variations are not being considered in the effort to improve academic performance, which motivates detecting the different performance profiles and using that knowledge to improve the performance of academic institutions. This document describes work towards a methodology for detecting patterns of academic performance in a higher-education degree programme. Data mining techniques, more precisely clustering algorithms, are used as the analysis tool. The case study for this work is the student population of the Informatics Engineering degree at FCT-UNL. Two student models are proposed as the basis for the analysis: one analyses students according to their performance in a single academic year, and the other analyses students according to their academic path through the programme, from admission until graduation, transfer or withdrawal. This analysis is carried out using clustering algorithms, among others: hierarchical agglomerative clustering, k-means, SOM and SNN.
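
A minimal sketch of the hierarchical agglomerative step, assuming SciPy and synthetic per-student performance features (e.g. credits completed, mean grade) in place of the FCT-UNL data:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
students = rng.random((200, 4))          # 200 students x 4 performance features

Z = linkage(students, method="ward")     # agglomerative merge tree
labels = fcluster(Z, t=5, criterion="maxclust")  # cut into 5 profiles
print(np.bincount(labels)[1:])           # students per performance profile
```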