958 results for Open Data-bank


Relevance:

40.00% 40.00%

Publisher:

Abstract:

The open provenance architecture (OPA) approach to the challenge was distinct in several regards. In particular, it is based on an open, well-defined data model and architecture, allowing different components of the challenge workflow to independently record documentation, and for the workflow to be executed in any environment. Another noticeable feature is that we distinguish between the data recorded about what has occurred, process documentation, and the provenance of a data item, which is all that caused the data item to be as it is and is obtained as the result of a query over process documentation. This distinction allows us to tailor the system to best address the requirements of recording and querying documentation separately. Other notable features include the explicit recording of causal relationships between both events and data items, an interaction-based world model, intensional definition of data items in queries rather than relying on explicit naming mechanisms, and styling of documentation to support non-functional application requirements such as reducing storage costs or ensuring privacy of data. In this paper we describe how each of these features aids us in answering the challenge provenance queries.
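
The distinction drawn in this abstract, between recorded process documentation and provenance obtained by querying it, can be illustrated with a minimal sketch. It assumes that process documentation is stored as a flat list of (effect, cause) assertions; the record names and the `provenance` function are hypothetical and are not taken from the OPA itself.

```python
# Hypothetical process documentation: each entry asserts that `effect`
# was caused by `cause` (either an event or a data item).
process_documentation = [
    ("atlas_image", "softmean_run"),
    ("softmean_run", "resliced_image"),
    ("resliced_image", "reslice_run"),
    ("reslice_run", "warp_params"),
    ("unrelated_output", "other_run"),
]


def provenance(item, records):
    """Return everything that transitively caused `item`.

    Provenance is not stored; it is computed by a query (here, a graph
    traversal) over the recorded causal assertions.
    """
    # Index the assertions by effect for quick lookup.
    causes = {}
    for effect, cause in records:
        causes.setdefault(effect, set()).add(cause)

    found = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for cause in causes.get(current, ()):
            if cause not in found:
                found.add(cause)
                stack.append(cause)
    return found


result = provenance("atlas_image", process_documentation)
print(sorted(result))
```

Recording only appends assertions, while querying walks them, so the two concerns can be optimised separately, which is the design point the abstract makes.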

Relevance:

40.00% 40.00%

Publisher:

Abstract:

I start by presenting an explicit solution to Taylor's (2001) model, in order to illustrate the link between the target interest rate and the overnight interest rate prevailing in the economy. Next, I use Vector Auto Regressions to shed some light on the evolution of key macroeconomic variables after the Central Bank of Brazil increases the target interest rate by 1%. Point estimates show a four-year accumulated output loss ranging from 0.04% (whole sample, 1980:1-2004:2; quarterly data) to 0.25% (Post-Real data only), with a first-year peak output response between 0.04% and 1.0%, respectively. Prices decline between 2% and 4% in a 4-year horizon. The accumulated output response is found to be between 3.5 and 6 times higher after the Real Plan than when the whole sample is considered. The 95% confidence bands obtained using bias-corrected bootstrap always include the null output response when the whole sample is used, but not when the data is restricted to the Post-Real period. Innovations to interest rates explain between 4.9% (whole sample) and 9.2% (Post-Real sample) of the forecast error of GDP.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Studies on efficiency achieved greater relevance in organisations within an open market framework, which in Brazil began around 1990. The objective of this paper, applying the data envelopment analysis methodology, is to analyse the efficiency of banks operating in the country using the database termed 'the biggest banks', periodically published by the Central Bank of Brazil in 2010-2012. The methodology was applied to the 26 largest banking organisations via two approaches: financial intermediation and results. In the financial intermediation approach, the efficiency increase from 2010 to 2012 was highest among banks specialised in credit. Retail banks, especially the large ones, felt the effects of 2011 most intensely, a year considered the sector's low-performance year. In the results approach, the efficiency increase was higher among retail banks. Factors such as retractions in the SELIC rate and bank spreads impacted all banks, regardless of the segment.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretic measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far reveal that there is no statistically significant difference in terms of predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
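
Several of the measures named in this abstract (sensitivity, specificity, predictive values and accuracy) follow directly from the confusion matrix. The sketch below is only an illustration on made-up labels, where 1 is assumed to mean "default" and 0 "non-default"; the function name and the toy data are hypothetical and do not come from the paper.

```python
def classification_measures(y_true, y_pred):
    """Compute basic confusion-matrix measures for binary labels.

    Assumes 1 = default (positive class) and 0 = non-default, and that
    every denominator is non-zero (both classes occur in the labels and
    in the predictions).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of defaults detected
        "specificity": tn / (tn + fp),  # share of non-defaults detected
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: 3 defaults and 5 non-defaults.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
measures = classification_measures(y_true, y_pred)
print(measures)
```

Two models can score similarly on these thresholded measures while still producing different estimated default probabilities, which is the contrast the abstract reports.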

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods to analyse the data are today well developed and close to reaching a standard organization (through the efforts of proposed international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to stumble upon a clinician's question that has no compelling statistical method to answer it. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps involved in an experiment, from the production of the array to quality control, ending with the preprocessing steps that will be used in the data analysis in the rest of the dissertation. 
In Chapter 2, a critical review of standard analysis methods is provided, stressing most of the problems that [...]. In Chapter 3, a method is introduced to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4, a method to address similarity evaluation in a three-class problem by means of the Relevance Vector Machine [4] is described. 
In fact, when looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases, similarities can give useful and sometimes even more important information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or the second one. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership represents a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability of G2 samples being members of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. In the literature this result had been conjectured, but no measure of significance had been given before.
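
The resampling-and-scoring idea behind MultiSAM, as described in this abstract, can be sketched as follows. This is not the authors' implementation: a Welch-style t statistic with a fixed threshold stands in for the SAM analysis, and the function names, the threshold and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def t_like_statistic(a, b):
    """Welch-style t statistic; an assumed stand-in for the SAM statistic."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def resampling_scores(lpc, mpc, n_iter=1000, threshold=3.0, seed=0):
    """Count how often each probe is called differentially expressed.

    lpc, mpc: lists of samples for the less and more populated class;
    each sample is a list with one value per probe.
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    scores = [0] * n_probes
    for _ in range(n_iter):
        # Draw an MPC subsample of the same size as the LPC.
        subset = rng.sample(mpc, len(lpc))
        for j in range(n_probes):
            a = [sample[j] for sample in lpc]
            b = [sample[j] for sample in subset]
            if abs(t_like_statistic(a, b)) > threshold:
                scores[j] += 1  # probe j appears in this iteration's list
    return scores


# Toy data with two probes: probe 0 is shifted in the LPC, probe 1 is not.
offsets = (-0.2, -0.1, 0.0, 0.1, 0.2)
lpc = [[5.0 + d, d] for d in offsets]
mpc = [[0.1 * ((i % 7) - 3), 0.1 * ((i % 5) - 2)] for i in range(20)]

scores = resampling_scores(lpc, mpc, n_iter=200)
print(scores)
```

Each probe's score lies between 0 and `n_iter`, so a cutoff such as the abstract's score >300 out of 1,000 corresponds to a probe being selected in more than 30% of the resampled comparisons.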

Relevance:

40.00% 40.00%

Publisher:

Abstract:

BACKGROUND The population-based effectiveness of thoracic endovascular aortic repair (TEVAR) versus open surgery for descending thoracic aortic aneurysm remains in doubt. METHODS Patients aged over 50 years, without a history of aortic dissection, undergoing repair of a thoracic aortic aneurysm between 2006 and 2011 were assessed using mortality-linked individual patient data from Hospital Episode Statistics (England). The principal outcomes were 30-day operative mortality, long-term survival (5 years) and aortic-related reinterventions. TEVAR and open repair were compared using crude and multivariable models that adjusted for age and sex. RESULTS Overall, 759 patients underwent thoracic aortic aneurysm repair, mainly for intact aneurysms (618, 81·4 per cent). Median ages of TEVAR and open cohorts were 73 and 71 years respectively (P < 0·001), with more men undergoing TEVAR (P = 0·004). For intact aneurysms, the operative mortality rate was similar for TEVAR and open repair (6·5 versus 7·6 per cent; odds ratio 0·79, 95 per cent confidence interval (c.i.) 0·41 to 1·49), but the 5-year survival rate was significantly worse after TEVAR (54·2 versus 65·6 per cent; adjusted hazard ratio 1·45, 95 per cent c.i. 1·08 to 1·94). After 5 years, aortic-related mortality was similar in the two groups, but cardiopulmonary mortality was higher after TEVAR. TEVAR was associated with more aortic-related reinterventions (23·1 versus 14·3 per cent; adjusted HR 1·70, 95 per cent c.i. 1·11 to 2·60). There were 141 procedures for ruptured thoracic aneurysm (97 TEVAR, 44 open), with TEVAR showing no significant advantage in terms of operative mortality. CONCLUSION In England, operative mortality for degenerative descending thoracic aneurysm was similar after either TEVAR or open repair. Patients who had TEVAR appeared to have a higher reintervention rate and worse long-term survival, possibly owing to cardiopulmonary morbidity and other selection bias.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Despite the extensive work on currency mismatches, research on the determinants and effects of maturity mismatches is scarce. In this paper I show that emerging market maturity mismatches are negatively affected by capital inflows and price volatilities. Furthermore, I find that banks with low maturity mismatches are more profitable during crisis periods but less profitable otherwise. The latter result implies that banks face a tradeoff between higher returns and risk; hence, channeling short-term capital into long-term loans is caused by cronyism and implicit guarantees rather than by the depth of the financial market. The positive relationship between maturity mismatches and price volatility, on the other hand, shows that the banks of countries with high exchange rate and interest rate volatilities cannot, or choose not to, hedge themselves. These results follow from a panel regression on a data set I constructed by merging bank-level data with aggregate data. This is an advantage over traditional studies, which focus only on aggregate data.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Data from twenty buoy stations were used to compile a new chart of permanent currents in the surface layer (10 m depth) for the region of the Yucatan shelf (Campeche Bank). It was found that vertical variations in the direction of the currents are insignificant within the shallow plateau of the bank.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Providing descriptions of isolated sensors and sensor networks in natural language, understandable by the general public, is useful to help users find relevant sensors and analyze sensor data. In this paper, we discuss the feasibility of using geographic knowledge from public databases available on the Web (such as OpenStreetMap, Geonames, or DBpedia) to automatically construct such descriptions. We present a general method that uses such information to generate sensor descriptions in natural language. The results of the evaluation of our method on a national hydrologic sensor network showed that this approach is feasible and capable of generating adequate sensor descriptions with a lower development effort compared to other approaches. In the paper we also analyze certain problems that we found in public databases (e.g., heterogeneity, non-standard use of labels, or rigid search methods) and their impact on the generation of sensor descriptions.