10 results for NCHS data brief (Series)
Abstract:
The principal topic of this work is the application of data mining techniques, in particular machine learning, to the discovery of knowledge in a protein database. The first chapter presents general background: section 1.1 gives an overview of the methodology of a data mining project and its main algorithms, and section 1.2 introduces proteins and their supporting file formats. The chapter concludes with section 1.3, which defines the main problem we intend to address in this work: determining whether an amino acid is exposed or buried in a protein, in a discrete way (i.e., not continuous), for five exposure levels: 2%, 10%, 20%, 25% and 30%. The second chapter, closely following the CRISP-DM methodology, presents the whole process of constructing the database that supported this work. Namely, it describes the process of loading data from the Protein Data Bank, DSSP and SCOP. An initial data exploration is then performed and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced. The Data Mining Table Creator, a program developed to produce the data mining tables required for this problem, is also introduced. In the third chapter the results obtained are analyzed with statistical significance tests. First, the classifiers used (Neural Networks, C5.0, CART and CHAID) are compared, and it is concluded that C5.0 is the most suitable for the problem at hand. The influence of parameters such as the amino acid information level, the amino acid window size and the SCOP class type on the accuracy of the predictive models is also compared. The fourth chapter starts with a brief review of the literature on amino acid relative solvent accessibility. We then give an overview of the main results achieved and finally discuss possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis.
Appendix B has a set of tables with additional information. Appendix C describes the software provided on the DVD accompanying this thesis, which allows the present work to be reconstructed.
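As an illustration of the discrete exposed/buried problem stated in this abstract, the following is a minimal sketch in Python. Only the five thresholds (2%, 10%, 20%, 25%, 30%) come from the abstract. The function names, the labels, and the choice of a strict "greater than" comparison are assumptions made for illustration and are not taken from the thesis.

```python
# Exposure cut-offs in percent, as listed in the abstract.
THRESHOLDS = (2, 10, 20, 25, 30)


def classify_exposure(rsa_percent, threshold):
    """Label a residue from its relative solvent accessibility (RSA, in percent).

    Assumption: a residue is 'exposed' when its RSA is strictly greater than
    the threshold, and 'buried' otherwise. The thesis may use a different rule.
    """
    return "exposed" if rsa_percent > threshold else "buried"


def classify_all(rsa_percent):
    """Return the exposed/buried label at each of the five thresholds."""
    return {t: classify_exposure(rsa_percent, t) for t in THRESHOLDS}


if __name__ == "__main__":
    # A hypothetical residue with 22% relative solvent accessibility.
    for threshold, label in classify_all(22.0).items():
        print(f"threshold {threshold:>2}%: {label}")
```

Under these assumptions, the same residue can be labelled differently depending on the threshold, which is why each exposure level defines a separate binary classification task.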
Abstract:
Dissertation presented as a partial requirement for obtaining the degree of Master in Statistics and Information Management
Abstract:
The section at Cristo Rei shows sandy beds with intercalated clayey lenses (IVb division of the Lisbon Miocene series) that correspond to a major regression event dated to between ca. 17.6 and 17 Ma. They also correspond to a distal position (relative to the typical fluviatile facies in Lisbon), nearer the basin's axis. Geologic data and paleontological analysis (plant fossils, fishes, crocodilians, land mammals) allow the reconstruction of the environments that were represented in the area concerned: an estuary with channels and ox-bows; upstream, areas occupied by brackish waters where Gryphaea griphoides banks developed; still farther upstream, freshwaters bordered by humid forests and low mountain subtropical forests under warm temperate and rainy conditions; and, not far away, seasonally dry environments (low-density tree or shrub cover, or steppe).
Abstract:
Thesis submitted in the fulfillment of the requirements for the Degree of Master in Biomedical Engineering
Abstract:
The last three decades have seen quite dramatic changes in the way we model time-dependent data. Linear processes have been at center stage in modeling time series. As far as second-order properties are concerned, the theory and the methodology are very adequate. However, there is more and more evidence that linear models are not sufficiently flexible and rich for modeling purposes, and that failure to account for non-linearities can be very misleading and have undesired consequences.
Abstract:
Dissertation submitted in the fulfillment of the requirements for the Degree of Master in Biomedical Engineering
Abstract:
This article develops a latent class model for estimating willingness-to-pay for public goods, simultaneously using contingent valuation (CV) data and attitudinal data capturing protest attitudes related to a lack of trust in the public institutions providing those goods. A measure of the social cost associated with protest responses and the consequent loss in potential contributions for providing the public good is proposed. The presence of potential justification biases is further considered, that is, the possibility that for psychological reasons the response to the CV question affects the answers to the attitudinal questions. The results from our empirical application suggest that psychological factors should not be ignored in CV estimation for policy purposes, since accounting for them allows a correct identification of protest responses.
Abstract:
The continued increase in the availability of economic data in recent years and, more importantly, the possibility of constructing higher-frequency time series have fostered the use (and development) of statistical and econometric techniques to treat them more accurately. This paper presents an exposition of structural time series models, by which a time series can be decomposed as the sum of trend, seasonal and irregular components. In addition to a detailed analysis of univariate specifications, we also address the SUTSE multivariate case and the issue of cointegration. Finally, recursive estimation and smoothing by means of the Kalman filter algorithm is described, taking into account its different stages, from initialisation to parameter estimation.
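To make the Kalman filter recursion mentioned in this abstract concrete, here is a minimal sketch for the simplest structural time series model, the local level model: y_t = mu_t + eps_t and mu_{t+1} = mu_t + eta_t, with independent Gaussian disturbances. The function name, argument names and the large initial variance used as an approximately diffuse initialisation are assumptions for illustration. The paper itself covers richer specifications (trend, seasonal, multivariate SUTSE) that this sketch does not attempt.

```python
def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for the local level model (illustrative sketch).

    y           : sequence of observations
    sigma_eps2  : variance of the irregular (observation) disturbance
    sigma_eta2  : variance of the level disturbance
    a0, p0      : initial state mean and variance; a large p0 is an assumed
                  stand-in for a diffuse initialisation
    Returns the list of filtered level estimates.
    """
    a, p = a0, p0
    filtered = []
    for obs in y:
        v = obs - a            # one-step-ahead prediction error
        f = p + sigma_eps2     # variance of the prediction error
        k = p / f              # Kalman gain
        a = a + k * v          # updated (filtered) level
        p = p * (1.0 - k)      # updated state variance
        filtered.append(a)
        p = p + sigma_eta2     # predict: the level follows a random walk
    return filtered


if __name__ == "__main__":
    series = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2]  # made-up data for illustration
    print(local_level_filter(series, sigma_eps2=1.0, sigma_eta2=0.1))
```

With `sigma_eta2` set to zero the level is constant and the filtered estimate reduces (up to the effect of the initial variance) to the running mean of the observations, which is a convenient sanity check on the recursion.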
Abstract:
This study analyses financial data using the result characterization of a self-organized neural network model. The goal was to prototype a tool that may help an economist or a market analyst analyse stock market series. To reach this goal, the tool shows economic dependencies and statistical measures over stock market series. The self-organizing map (SOM) neural network model was used to extract behavioural patterns from the data analysed. Based on this model, an application was developed to analyse financial data. This application uses a portfolio of correlated or inversely correlated markets as input. After the analysis with SOM, the result is represented by micro clusters that are organized by their behavioural tendency. During the study, the need arose for a better analysis of the SOM algorithm results. This problem was solved with a clustering technique that groups the micro clusters from the SOM U-Matrix analyses. The study showed that correlated and inversely correlated markets project multiple clusters of data. These clusters represent multiple trend states that may be useful for technical professionals.
Abstract:
Crisis-affected communities and global organizations for international aid are becoming increasingly digital as a consequence of the popularity of geotechnology. The humanitarian sector has changed in profound ways by adopting new technical approaches to obtain information from areas with difficult geographical or political access. Since 2011, Turkey has been hosting a growing number of Syrian refugees along its southeastern region. The Turkish policy of hosting them in camps, and the difficulties created by governors for international aid group expeditions seeking information, led such international organizations to investigate and adopt other approaches to obtain the information needed. They intensified their remote sensing approach. However, the majority of studies used very high-resolution satellite imagery (VHRSI). The study area is extensive and the temporal resolution of VHRSI is low; moreover, it is infeasible to use these sensors as the sole approach for the whole area. This research aims to investigate the potential of mid-resolution imagery (here only Landsat) to obtain information from a region in crisis (here, southeastern Turkey) through a new web-based platform called Google Earth Engine (GEE). It is also intended to verify the current reliability of GEE, since its Application Programming Interface (API) is still in beta version. The findings show that the basic functions are trustworthy. The results indicate that Landsat can clearly recognize change in the spectral resolution only for the first settlement. The ongoing modifications vary for each case. Overall, Landsat demonstrated strong limitations, but it needs further investigation and may be used, with restrictions, in support of VHRSI.