7 results for on-disk data layout
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretic measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far revealed that there is no statistically significant difference in terms of predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
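The evaluation measures named in this abstract can all be computed from a 2x2 confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' code. The function names and the example counts are hypothetical, and "positive" is assumed to mean a defaulting client.

```python
from math import log2


def confusion_metrics(tp, fp, fn, tn):
    """Standard measures from confusion-matrix counts.

    tp/fn: actual positives predicted positive/negative.
    fp/tn: actual negatives predicted positive/negative.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,
    }


def mutual_information(tp, fp, fn, tn):
    """Mutual information (in bits) between actual and predicted class."""
    total = tp + fp + fn + tn
    # Rows are actual classes, columns are predicted classes.
    joint = [[tp / total, fn / total], [fp / total, tn / total]]
    row = [sum(r) for r in joint]
    col = [joint[0][j] + joint[1][j] for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            p = joint[i][j]
            if p > 0:  # empty cells contribute nothing
                mi += p * log2(p / (row[i] * col[j]))
    return mi


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
```

Mutual information is zero when predictions are independent of the true class and reaches one bit for a perfect classifier on balanced classes, which makes it a useful complement to accuracy.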
Abstract:
Smoking cue-provoked craving is an intricate behavior associated with strong changes in neural networks. Craving is one of the main reasons subjects continue to smoke; therefore interventions that can modify activity in neural networks associated with craving can be useful tools in future research investigating novel treatments for smoking cessation. The goal of this study was to use a neuromodulatory technique associated with a powerful effect on spontaneous neuronal firing - transcranial direct current stimulation (tDCS) - to modify cue-provoked smoking craving. Based on preliminary data showing that craving can be modified after a single tDCS session, here we investigated the effects of repeated tDCS sessions on craving behavior. Twenty-seven subjects were randomized to receive sham or active tDCS (anodal tDCS of the left DLPFC). Our results show a significant cumulative effect of tDCS on modifying smoking cue-provoked craving. In fact, in the active stimulation group, smoking cues had an opposite effect on craving after stimulation - it decreased craving - as compared to sham stimulation, in which there was a small decrease or increase in craving. In addition, during the 5 days of stimulation there was a small but significant decrease in the number of cigarettes smoked in the active as compared to the sham tDCS group. Our findings extend the results of our previous study as they confirm the notion that tDCS has a specific effect on craving behavior and that several sessions can increase the magnitude of its effect. These results open avenues for the exploration of this method as a therapeutic alternative for smoking cessation and also as a means to change stimulus-induced behavior. (C) 2009 Elsevier Ireland Ltd. All rights reserved.
Abstract:
We estimate the impact of regulatory heterogeneity on agri-food trade using a gravity analysis that relies on detailed data on non-tariff measures (NTMs) collected by the NTM-Impact project. The data cover a broad range of import requirements for agricultural and food products for the EU and nine of its major trade partners. We find that trade is significantly reduced when importing countries have stricter maximum residue limits (MRLs) for plant products than exporting countries. For most other measures, due to their qualitative nature, we were unable to infer whether the importer has stricter standards relative to the exporter, and we do not find a robust relationship between these measures and trade. Our findings suggest that, at least for some import standards, harmonising regulations will increase trade. We also conclude that tariff reductions remain an effective means to increase trade even when NTMs abound.
Abstract:
The aim of this research was to evaluate the economic costs of respiratory and circulatory diseases in the municipality of Cubatão, in the state of São Paulo, Brazil. Data on hospital admissions and on missed working days due to hospitalization (for the age group 14 to 70 years old) from the database of the Sistema Único de Saúde (SUS, the Brazilian National Health System) were used. Results: Based on these data, it was calculated that R$ 22.1 million were spent in the period 2000 to 2009 due to diseases of the respiratory and circulatory systems. Part of these expenses can be directly related to the emission of atmospheric pollutants in the city. In order to estimate the costs related to air pollution, data on Cubatão were compared to data from two other municipalities that are also located on the coast (Guarujá and Peruíbe), but which have little industrial activity in comparison to Cubatão. It was verified that, in both, average per capita costs were lower than in Cubatão, but that this difference has been decreasing in recent years.
Abstract:
Empirical approaches and, more recently, physical approaches, have grounded the establishment of logical connections between radiometric variables derived from remote data and biophysical variables derived from vegetation cover. This study was aimed at evaluating correlations of dendrometric and density data from canopies of Eucalyptus spp., as collected in the Capão Bonito forest unit, with radiometric data from imagery acquired by the TM/Landsat-5 sensor on two orbital passages over the study site (dates close to field data collection). Results indicate that stronger correlations were identified for crown dimensions and canopy height with near-infrared spectral band data (ρs4), irrespective of the satellite passage date. Estimates of spatial distribution of dendrometric data and canopy density (D) using spectral characterization were consistent with the spatial distribution of tree ages during the study period. Statistical tests were applied to evaluate performance disparities of empirical models depending on the date on which data were acquired. Results indicated a significant difference between models based on distinct data acquisition dates.
Abstract:
Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data.
Results: We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data.
Conclusions: The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.
Abstract:
Semi-supervised learning is a classification paradigm in which just a few labeled instances are available for the training process. To overcome this small amount of initial label information, the information provided by the unlabeled instances is also considered. In this paper, we propose a nature-inspired semi-supervised learning technique based on attraction forces. Instances are represented as points in a k-dimensional space, and the movement of data points is modeled as a dynamical system. As the system runs, data items with the same label cooperate with each other, and data items with different labels compete with each other to attract unlabeled points by applying a specific force function. In this way, all unlabeled data items can be classified when the system reaches its stable state. Stability analysis for the proposed dynamical system is performed and some heuristics are proposed for parameter setting. Simulation results show that the proposed technique achieves good classification results on artificial data sets and is comparable to well-known semi-supervised techniques using benchmark data sets.
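The general idea of attraction-based labeling can be illustrated with a toy sketch. It is not the paper's method: the abstract does not specify the force function, so the inverse-square weighting, the fixed labeled points, the step size `eta`, and the function name `attract` below are all assumptions made for illustration.

```python
from math import dist


def attract(unlabeled, labeled, eta=0.5, eps=1e-9, steps=100):
    """Toy attraction dynamics (assumed force function, not the paper's).

    labeled: list of (point, label) pairs; labeled points stay fixed.
    unlabeled: list of points, each a tuple of coordinates.
    Returns one label per unlabeled point.
    """
    results = []
    for point in unlabeled:
        pos = list(point)
        for _ in range(steps):
            # Assumed pull: inversely proportional to squared distance,
            # so nearby labeled points dominate and same-label points add up.
            weights = [1.0 / (dist(pos, p) ** 2 + eps) for p, _ in labeled]
            total = sum(weights)
            # Weighted centroid of the labeled points, as seen from pos.
            target = [
                sum(w * p[k] for w, (p, _) in zip(weights, labeled)) / total
                for k in range(len(pos))
            ]
            # Move a fraction eta of the way toward that centroid.
            pos = [x + eta * (t - x) for x, t in zip(pos, target)]
        # After the moves, take the label of the closest labeled point.
        nearest = min(labeled, key=lambda item: dist(pos, item[0]))
        results.append(nearest[1])
    return results


# Hypothetical two-class example in two dimensions.
labeled = [((0.0, 0.0), "a"), ((0.5, 0.0), "a"), ((10.0, 0.0), "b")]
unlabeled = [(1.0, 0.5), (9.0, -0.5)]
labels = attract(unlabeled, labeled)
```

Because each update is a convex combination of the current position and the labeled points, an unlabeled point never leaves the region spanned by the data, which keeps this simplified system stable for any `eta` between 0 and 1.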