994 resultados para Unstructured data


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A estimativa da idade gestacional (IG) em restos cadavéricos fetais é importante em contextos forenses. Para esse efeito, os especialistas forenses recorrem à avaliação do padrão de calcificação dentária e/ou ao estudo do esqueleto. Neste último, o comprimento das diáfises de ossos longos é um dos métodos mais utilizados, sendo utilizadas equações de regressão de obras pouco atuais ou baseadas em dados ecográficos, cujas medições diferem das efetuadas diretamente no osso. Este trabalho tem como objetivo principal a obtenção de equações de regressão para a população Portuguesa, com base na medição das diáfises de fémur, tíbia e úmero, utilizando radiografias postmortem. A amostra é constituída por 80 fetos de IG conhecida. Tratando-se de um estudo retrospectivo, os casos foram selecionados com base nas informações clínicas e anatomopatológicas, excluindo-se aqueles cujo normal crescimento se encontrava efetiva ou potencialmente comprometido. Os resultados confirmaram uma forte correlação entre o comprimento das diáfises estudadas e a IG, apresentando o fémur a correlação mais forte (r=0.967; p <0,01). Assim, foi possível obter uma equação de regressão para cada um dos ossos estudados. Concluindo, os objetivos do estudo foram atingidos com a obtenção das equações de regressão para os ossos estudados. Pretende-se, futuramente, alargar a amostra para validar e consolidar os resultados obtidos neste estudo.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A classical application of biosignal analysis has been the psychophysiological detection of deception, also known as the polygraph test, which is currently a part of standard practices of law enforcement agencies and several other institutions worldwide. Although its validity is far from gathering consensus, the underlying psychophysiological principles are still an interesting add-on for more informal applications. In this paper we present an experimental off-the-person hardware setup, propose a set of feature extraction criteria and provide a comparison of two classification approaches, targeting the detection of deception in the context of a role-playing interactive multimedia environment. Our work is primarily targeted at recreational use in the context of a science exhibition, where the main goal is to present basic concepts related with knowledge discovery, biosignal analysis and psychophysiology in an educational way, using techniques that are simple enough to be understood by children of different ages. Nonetheless, this setting will also allow us to build a significant data corpus, annotated with ground-truth information, and collected with non-intrusive sensors, enabling more advanced research on the topic. Experimental results have shown interesting findings and provided useful guidelines for future work. Pattern Recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Relatório apresentado à Escola Superior de Educação de Lisboa para obtenção de grau de mestre em Ensino do 1º e do 2º Ciclos do Ensino Básico

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this paper is to develop models for experimental open-channel water delivery systems and assess the use of three data-driven modeling tools toward that end. Water delivery canals are nonlinear dynamical systems and thus should be modeled to meet given operational requirements while capturing all relevant dynamics, including transport delays. Typically, the derivation of first principle models for open-channel systems is based on the use of Saint-Venant equations for shallow water, which is a time-consuming task and demands for specific expertise. The present paper proposes and assesses the use of three data-driven modeling tools: artificial neural networks, composite local linear models and fuzzy systems. The canal from Hydraulics and Canal Control Nucleus (A parts per thousand vora University, Portugal) will be used as a benchmark: The models are identified using data collected from the experimental facility, and then their performances are assessed based on suitable validation criterion. The performance of all models is compared among each other and against the experimental data to show the effectiveness of such tools to capture all significant dynamics within the canal system and, therefore, provide accurate nonlinear models that can be used for simulation or control. The models are available upon request to the authors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Seismic data is difficult to analyze and classical mathematical tools reveal strong limitations in exposing hidden relationships between earthquakes. In this paper, we study earthquake phenomena in the perspective of complex systems. Global seismic data, covering the period from 1962 up to 2011 is analyzed. The events, characterized by their magnitude, geographic location and time of occurrence, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of Earth or using a rectangular grid based in latitude and longitude coordinates. Two methods of analysis are considered and compared in this study. In a first method, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions and the parameters used to reveal the relationships among regions. In the second method, the mutual information is calculated and adopted as a measure of similarity between regions. In both cases, using clustering analysis, visualization maps are generated, providing an intuitive and useful representation of the complex relationships that are present among seismic data. Such relationships might not be perceived on classical geographic maps. Therefore, the generated charts are a valid alternative to other visualization tools, for understanding the global behavior of earthquakes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

OBJECTIVE: To examine the relationship between social contextual factors and child and adolescent labor. METHODS: Population-based cohort study carried out with 2,512 families living in 23 subareas of a large urban city in Brazil from 2000 to 2002. A random one-stage cluster sampling was used to select families. Data were obtained through individual household interviews using questionnaires. The annual cumulative incidence of child and adolescent labor was estimated for each district. New child and adolescent labor cases were those who had their first job over the two-year follow-up. The annual cumulative incidence of child and adolescent labor was the response variable and predictors were contextual factors such as lack of social support, social deprivation, unstructured family, perceived violence, poor school quality, poor environment conditions, and poor public services. Pearson's correlation and multiple linear regression were used to assess the associations. RESULTS: There were selected 943 families corresponding to 1,326 non-working children and adolescents aged 8 to 17 years. Lack of social support, social deprivation, perceived violence were all positively and individually associated with the annual cumulative incidence of child and adolescent labor. In the multiple linear regression model, however, only lack of social support and perceived violence in the neighborhood were positively associated to child and adolescent labor. No effect was found for poor school quality, poor environment conditions, poor public services or unstructured family. CONCLUSIONS: Poverty reduction programs can reduce the contextual factors associated with child and adolescent labor. Violence reduction programs and strengthening social support at the community level may contribute to reduce CAL.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A navegação e a interpretação do meio envolvente por veículos autónomos em ambientes não estruturados continua a ser um grande desafio na actualidade. Sebastian Thrun, descreve em [Thr02], que o problema do mapeamento em sistemas robóticos é o da aquisição de um modelo espacial do meio envolvente do robô. Neste contexto, a integração de sistemas sensoriais em plataformas robóticas, que permitam a construção de mapas do mundo que as rodeia é de extrema importância. A informação recolhida desses dados pode ser interpretada, tendo aplicabilidade em tarefas de localização, navegação e manipulação de objectos. Até à bem pouco tempo, a generalidade dos sistemas robóticos que realizavam tarefas de mapeamento ou Simultaneous Localization And Mapping (SLAM), utilizavam dispositivos do tipo laser rangefinders e câmaras stereo. Estes equipamentos, para além de serem dispendiosos, fornecem apenas informação bidimensional, recolhidas através de cortes transversais 2D, no caso dos rangefinders. O paradigma deste tipo de tecnologia mudou consideravelmente, com o lançamento no mercado de câmaras RGB-D, como a desenvolvida pela PrimeSense TM e o subsequente lançamento da Kinect, pela Microsoft R para a Xbox 360 no final de 2010. A qualidade do sensor de profundidade, dada a natureza de baixo custo e a sua capacidade de aquisição de dados em tempo real, é incontornável, fazendo com que o sensor se tornasse instantaneamente popular entre pesquisadores e entusiastas. Este avanço tecnológico deu origem a várias ferramentas de desenvolvimento e interacção humana com este tipo de sensor, como por exemplo a Point Cloud Library [RC11] (PCL). Esta ferramenta tem como objectivo fornecer suporte para todos os blocos de construção comuns que uma aplicação 3D necessita, dando especial ênfase ao processamento de nuvens de pontos de n dimensões adquiridas a partir de câmaras RGB-D, bem como scanners laser, câmaras Time-of-Flight ou câmaras stereo. Neste contexto, é realizada nesta dissertação, a avaliação e comparação de alguns dos módulos e métodos constituintes da biblioteca PCL, para a resolução de problemas inerentes à construção e interpretação de mapas, em ambientes indoor não estruturados, utilizando os dados provenientes da Kinect. A partir desta avaliação, é proposta uma arquitectura de sistema que sistematiza o registo de nuvens de pontos, correspondentes a vistas parciais do mundo, num modelo global consistente. Os resultados da avaliação realizada à biblioteca PCL atestam a sua viabilidade, para a resolução dos problemas propostos. Prova da sua viabilidade, são os resultados práticos obtidos, da implementação da arquitectura de sistema proposta, que apresenta resultados de desempenho interessantes, como também boas perspectivas de integração deste tipo de conceitos e tecnologia em plataformas robóticas desenvolvidas no âmbito de projectos do Laboratório de Sistemas Autónomos (LSA).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Consider the problem of disseminating data from an arbitrary source node to all other nodes in a distributed computer system, like Wireless Sensor Networks (WSNs). We assume that wireless broadcast is used and nodes do not know the topology. We propose new protocols which disseminate data faster and use fewer broadcasts than the simple broadcast protocol.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Nowadays, due to the incredible grow of the mobile devices market, when we want to implement a client-server applications we must consider mobile devices limitations. In this paper we discuss which can be the more reliable and fast way to exchange information between a server and an Android mobile application. This is an important issue because with a responsive application the user experience is more enjoyable. In this paper we present a study that test and evaluate two data transfer protocols, socket and HTTP, and three data serialization formats (XML, JSON and Protocol Buffers) using different environments and mobile devices to realize which is the most practical and fast to use.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dissertação de Natureza Científica para obtenção do grau de Mestre em Engenharia Civil na Área de Especialização de Edificações

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The goal of the this paper is to show that the DGPS data Internet service we designed and developed provides campus-wide real time access to Differential GPS (DGPS) data and, thus, supports precise outdoor navigation. First we describe the developed distributed system in terms of architecture (a three tier client/server application), services provided (real time DGPS data transportation from remote DGPS sources and campus wide data dissemination) and transmission modes implemented (raw and frame mode over TCP and UDP). Then we present and discuss the results obtained and, finally, we draw some conclusions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The principal topic of this work is the application of data mining techniques, in particular of machine learning, to the discovery of knowledge in a protein database. In the first chapter a general background is presented. Namely, in section 1.1 we overview the methodology of a Data Mining project and its main algorithms. In section 1.2 an introduction to the proteins and its supporting file formats is outlined. This chapter is concluded with section 1.3 which defines that main problem we pretend to address with this work: determine if an amino acid is exposed or buried in a protein, in a discrete way (i.e.: not continuous), for five exposition levels: 2%, 10%, 20%, 25% and 30%. In the second chapter, following closely the CRISP-DM methodology, whole the process of construction the database that supported this work is presented. Namely, it is described the process of loading data from the Protein Data Bank, DSSP and SCOP. Then an initial data exploration is performed and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced. It is also introduced the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter the results obtained are analyzed with statistical significance tests. Initially the several used classifiers (Neural Networks, C5.0, CART and Chaid) are compared and it is concluded that C5.0 is the most suitable for the problem at stake. It is also compared the influence of parameters like the amino acid information level, the amino acid window size and the SCOP class type in the accuracy of the predictive models. The fourth chapter starts with a brief revision of the literature about amino acid relative solvent accessibility. Then, we overview the main results achieved and finally discuss about possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis. Appendix B has a set of tables with additional information. Appendix C describes the software provided in the DVD accompanying this thesis that allows the reconstruction of the present work.