930 results for data availability
Predicting sense of community and participation by applying machine learning to open government data
Abstract:
Community capacity is used to monitor socio-economic development. It is composed of a number of dimensions, which can be measured to understand the possible issues in the implementation of a policy or the outcome of a project targeting a community. Measuring community capacity dimensions is usually expensive and time-consuming, requiring locally organised surveys. We therefore investigate a technique to estimate them by applying the Random Forests algorithm to secondary open government data. This research focuses on the prediction of measures for two dimensions: sense of community and participation. The most important variables for this prediction were determined. The variables included in the datasets used to train the predictive models complied with two criteria: nationwide availability and a sufficiently fine-grained geographic breakdown, i.e. neighbourhood level. The models explained 77% of the sense of community measures and 63% of participation. Due to the low geographic detail of the available outcome measures, further research is required to apply the predictive models at the neighbourhood level. The variables found to be most determinant for prediction were only partially in agreement with the factors that, according to the social science literature consulted, are the most influential for sense of community and participation. This finding should be investigated further from a social science perspective in order to be understood in depth.
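The "explained 77%/63%" figures for regression models of this kind are typically the coefficient of determination, R². A minimal sketch of that computation (the numbers below are illustrative, not the study's data):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)       # total variance
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # residuals
    return 1.0 - ss_res / ss_tot

# A perfect prediction explains all of the variance.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 1.0
```

The same score can be read off any regression model, Random Forests included, by comparing its predictions against held-out observations.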
Abstract:
Satellite-based rainfall monitoring is widely used for climatological studies because of its full global coverage, but it is also of great importance for operational purposes, especially in areas such as Africa where ground-based rainfall data are lacking. Satellite rainfall estimates have enormous potential benefits as input to hydrological and agricultural models because of their real-time availability, low cost and full spatial coverage. One issue that needs to be addressed is the uncertainty on these estimates. This is particularly important in assessing the likely errors on the output from non-linear models (rainfall-runoff or crop yield) which use the rainfall estimates, aggregated over an area, as input. Correct assessment of the uncertainty on the rainfall is non-trivial as it must take account of:
• the difference in spatial support of the satellite information and the independent data used for calibration;
• uncertainties on the independent calibration data;
• the non-Gaussian distribution of rainfall amount;
• the spatial intermittency of rainfall;
• the spatial correlation of the rainfall field.
This paper describes a method for estimating the uncertainty on satellite-based rainfall values taking account of these factors. The method involves, firstly, a stochastic calibration which completely describes the probability of rainfall occurrence and the pdf of rainfall amount for a given satellite value and, secondly, the generation of an ensemble of rainfall fields based on the stochastic calibration but with the correct spatial correlation structure within each ensemble member. This is achieved by the use of geostatistical sequential simulation. The ensemble generated in this way may be used to estimate uncertainty at larger spatial scales. A case study of daily rainfall monitoring in The Gambia, West Africa, for the purpose of crop yield forecasting is presented to illustrate the method.
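The stochastic-calibration idea can be illustrated in miniature: for a given satellite value, draw rainfall occurrence from a Bernoulli distribution and, when rain occurs, an amount from a skewed distribution. The lognormal amount model and all parameter values below are hypothetical, and the sketch deliberately omits the geostatistical sequential-simulation step that imposes spatial correlation across pixels:

```python
import random

def sample_rainfall(p_rain, mu, sigma, rng):
    """One stochastic realisation of daily rainfall (mm) at a pixel.

    p_rain    -- probability of rainfall occurrence for this satellite value
    mu, sigma -- parameters of the (assumed) lognormal amount distribution
    """
    if rng.random() >= p_rain:
        return 0.0                          # dry: intermittency is explicit
    return rng.lognormvariate(mu, sigma)    # wet: skewed, non-Gaussian amount

# An ensemble of realisations for a single pixel and satellite value.
rng = random.Random(42)
ensemble = [sample_rainfall(0.6, 1.5, 0.8, rng) for _ in range(2000)]
```

Aggregating many such realisations, drawn jointly with the right spatial correlation, is what allows uncertainty to be quantified at larger spatial scales.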
Abstract:
The Olsen method is an indicator of plant-available phosphorus (P). The effect of time and temperature on residual phosphate in soils was measured using the Olsen method in a pot experiment. Four soils were investigated: two from Pakistan and one each from England (calcareous) and Colombia (acidic). Two levels of residual phosphate were developed in each soil after addition of phosphate by incubation at either 10 °C or 45 °C. The amount of phosphate added was based on the P maximum of each soil, calculated using the Langmuir equation. Ryegrass was used as the test crop. The pooled data for the four soils incubated at 10 °C showed good correlation between Olsen P and dry matter yield or P uptake (r² = 0.85 and 0.77, respectively), whereas at 45 °C each soil had its own relationship and pooled data did not show correlation of Olsen P with dry matter yield or P uptake. When the data at both temperatures were pooled, Olsen P was a good indicator of yield and uptake for the English soil. For the Pakistani soils, Olsen P after the 45 °C treatment was an underestimate relative to the 10 °C data, and for the Colombian soil it was an overestimate. The reasons for these differences need to be explored further before high-temperature incubation can be used to simulate long-term changes in the field.
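The Langmuir equation used to calculate each soil's P maximum relates sorbed P, q, to solution concentration, c, as q = q_max·K·c / (1 + K·c), where q_max is the sorption maximum. A minimal sketch with illustrative parameter values (not those fitted for the four soils):

```python
def langmuir(c, q_max, k):
    """Sorbed P at solution concentration c via the Langmuir isotherm.

    q_max -- sorption maximum (the 'P maximum' of the soil)
    k     -- binding-strength constant
    """
    return q_max * k * c / (1.0 + k * c)

# Sorption saturates: it approaches q_max as the concentration grows.
print(langmuir(100.0, 400.0, 0.5))  # close to 400
```

In practice q_max and k are obtained by fitting the isotherm to measured sorption data; the phosphate additions described above are then scaled from the fitted q_max.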
Abstract:
The article considers screening human populations with two screening tests. If either of the two tests is positive, then full evaluation of the disease status is undertaken; however, if both diagnostic tests are negative, then disease status remains unknown. This procedure leads to a data constellation in which, for each disease status, the 2 × 2 table associated with the two diagnostic tests used in screening has exactly one empty, unknown cell. To estimate the unobserved cell counts, previous approaches assume independence of the two diagnostic tests and use specific models, including the special mixture model of Walter or unconstrained capture–recapture estimates. Often, as is also demonstrated in this article by means of a simple test, the independence of the two screening tests is not supported by the data. Two new estimators are suggested that allow association between the screening tests, although the form of association must be assumed to be homogeneous over disease status. These estimators are modifications of the simple capture–recapture estimator and easy to construct. The estimators are investigated for several screening studies with fully evaluated disease status, in which the superior behavior of the new estimators compared to the previous conventional ones can be shown. Finally, the performance of the new estimators is compared with maximum likelihood estimators, which are more difficult to obtain in these models. The results indicate that the loss of efficiency is minor.
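The "one empty cell" situation can be made concrete. Writing a for both tests positive, b and c for exactly one test positive, and d for the unobserved both-negative cell, the simple capture–recapture estimator under independence is d̂ = b·c/a; allowing an assumed odds-ratio association ψ between the tests gives d̂ = ψ·b·c/a (from ψ = a·d/(b·c)). This is only a sketch of that arithmetic, not the article's full estimators, which also handle stratification by disease status:

```python
def estimate_missing_cell(a, b, c, odds_ratio=1.0):
    """Estimate the unobserved both-negative cell of the 2 x 2 screening table.

    a -- both tests positive
    b -- test 1 positive only
    c -- test 2 positive only
    odds_ratio -- assumed association between the tests (1.0 = independence)
    """
    return odds_ratio * b * c / a

print(estimate_missing_cell(10, 20, 30))                   # 60.0 (independence)
print(estimate_missing_cell(10, 20, 30, odds_ratio=2.0))   # 120.0 (association)
```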
Abstract:
Experimental data for the title reaction were modeled using master equation (ME)/RRKM methods based on the MultiWell suite of programs. The starting point for the exercise was the empirical fitting provided by the NASA (Sander, S. P.; Finlayson-Pitts, B. J.; Friedl, R. R.; Golden, D. M.; Huie, R. E.; Kolb, C. E.; Kurylo, M. J.; Molina, M. J.; Moortgat, G. K.; Orkin, V. L.; Ravishankara, A. R. Chemical Kinetics and Photochemical Data for Use in Atmospheric Studies, Evaluation Number 15; Jet Propulsion Laboratory: Pasadena, California, 2006) (1) and IUPAC (Atkinson, R.; Baulch, D. L.; Cox, R. A.; Hampson, R. F., Jr.; Kerr, J. A.; Rossi, M. J.; Troe, J. J. Phys. Chem. Ref. Data 2000, 29, 167) (2) data evaluation panels, which represents the data in the experimental pressure ranges rather well. Despite the availability of quite reliable parameters for these calculations (molecular vibrational frequencies (Parthiban, S.; Lee, T. J. J. Chem. Phys. 2000, 113, 145) (3) and a value (Orlando, J. J.; Tyndall, G. S. J. Phys. Chem. 1996, 100, 19398) (4) of the bond dissociation energy, D₂₉₈(BrO–NO₂) = 118 kJ mol⁻¹, corresponding to ΔH°₀ = 114.3 kJ mol⁻¹ at 0 K) and the use of RRKM/ME methods, fitting calculations to the reported data or the empirical equations was anything but straightforward. Using these molecular parameters resulted in a discrepancy between the calculations and the database of rate constants of a factor of ca. 4 at, or close to, the low-pressure limit. Agreement between calculation and experiment could be achieved in two ways, either by increasing ΔH°₀ to an unrealistically high value (149.3 kJ mol⁻¹) or by increasing
Abstract:
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology-preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools, together with its extensibility, have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.
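FLSOM itself is specific to the plugin, but the classical SOM update it builds on is simple: find the best-matching unit (BMU) for an input and pull it, together with its grid neighbours, toward that input. A minimal one-dimensional sketch (not the BioDICE implementation; learning rate and radius are illustrative):

```python
import math

def bmu(weights, x):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def som_step(weights, x, lr=0.5, radius=1.0):
    """One SOM update: pull the BMU and its 1-D grid neighbours toward x."""
    b = bmu(weights, x)
    for i in range(len(weights)):
        # Gaussian neighbourhood kernel over grid distance to the BMU
        h = math.exp(-((i - b) ** 2) / (2 * radius ** 2))
        weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return weights

grid = [[0.0, 0.0], [1.0, 1.0]]
som_step(grid, [0.9, 0.9])  # the second unit is the BMU and moves to 0.95
```

Repeating this step over many inputs, while shrinking the radius and learning rate, yields the topology-preserving 2D map used for visual exploration.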
Abstract:
A data insertion method, where a dispersion model is initialized from ash properties derived from a series of satellite observations, is used to model the 8 May 2010 Eyjafjallajökull volcanic ash cloud which extended from Iceland to northern Spain. We also briefly discuss the application of this method to the April 2010 phase of the Eyjafjallajökull eruption and the May 2011 Grímsvötn eruption. An advantage of this method is that very little knowledge about the eruption itself is required because some of the usual eruption source parameters are not used. The method may therefore be useful for remote volcanoes where good satellite observations of the erupted material are available, but little is known about the properties of the actual eruption. It does, however, have a number of limitations related to the quality and availability of the observations. We demonstrate that, using certain configurations, the data insertion method is able to capture the structure of a thin filament of ash extending over northern Spain that is not fully captured by other modeling methods. It also verifies well against the satellite observations according to the quantitative object-based quality metric, SAL—structure, amplitude, location, and the spatial coverage metric, Figure of Merit in Space.
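The Figure of Merit in Space used for verification is the ratio of the intersection to the union of the observed and modelled ash areas. A sketch over gridded masks, represented as sets of grid-cell indices:

```python
def figure_of_merit_in_space(observed, modelled):
    """FMS = |observed ∩ modelled| / |observed ∪ modelled|, in [0, 1]."""
    observed, modelled = set(observed), set(modelled)
    union = observed | modelled
    if not union:
        return 0.0          # no ash in either field: define the score as 0
    return len(observed & modelled) / len(union)

cells_obs = {(0, 0), (0, 1), (1, 1)}
cells_mod = {(0, 1), (1, 1), (1, 2)}
print(figure_of_merit_in_space(cells_obs, cells_mod))  # 2 shared / 4 total = 0.5
```

A score of 1 means the modelled ash cloud exactly overlaps the observed one; displaced or mis-sized clouds score lower.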
Abstract:
The increased availability of digital elevation models and satellite image data enables testing of morphometric relationships between sand dune variables (dune height, spacing and equivalent sand thickness) which were originally established using limited field survey data. These long-established geomorphological hypotheses can now be tested against much larger samples than were possible when available data were limited to what could be collected by field surveys alone. This project uses ASTER Global Digital Elevation Model (GDEM) data to compare morphometric relationships between sand dune variables in the southwest Kalahari dunefield to those of the Namib Sand Sea, to test whether the relationships found in an active sand sea (Namib) also hold for the fixed dune system of the nearby southwest Kalahari. The data show significant morphometric differences between the simple linear dunes of the Namib Sand Sea and the southwest Kalahari; the latter do not show the expected positive relationship between dune height and spacing. The southwest Kalahari dunes show a similar range of dune spacings, but they are less tall, on average, than the Namib Sand Sea dunes. There is a clear spatial pattern in these morphometric data: the tallest and most closely spaced dunes are towards the southeast of the Kalahari dunefield, and this is where the highest values of equivalent sand thickness occur. We consider the possible reasons for the observed differences and highlight the need for more studies comparing sand seas and dunefields from different environmental settings.
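Height-spacing relationships of this kind are commonly examined with an ordinary least-squares fit of dune height against dune spacing. A minimal sketch of the slope/intercept arithmetic (the numbers are illustrative, not GDEM measurements):

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# A positive slope is the 'expected' height-spacing relationship;
# spacing in metres on x, height in metres on y.
slope, intercept = ols_fit([200.0, 400.0, 600.0], [10.0, 20.0, 30.0])
print(slope, intercept)  # → 0.05 0.0
```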
Abstract:
The main objective of this degree project is to implement an Application Availability Monitoring (AAM) system named Softek EnView for Fujitsu Services. The aim of implementing the AAM system is to proactively identify end user performance problems, such as application and site performance, before the actual end users experience them. No matter how well applications and sites are designed and no matter how well they meet business requirements, they are useless to the end users if the performance is slow and/or unreliable. It is important for the customers to find out whether the end user problems are caused by the network or by an application malfunction. Softek EnView comprises the following components: Robot, Monitor, Reporter, Collector and Repository. The implemented system, however, is designed to use only some of these elements: Robot, Reporter and Repository. Robots can be placed at any key user location and are dedicated to customers, which means that as the number of customers increases, the number of Robots increases with it. To make the AAM system ideal for the company to use, it was integrated with Fujitsu Services' centralised monitoring system, BMC PATROL Enterprise Manager (PEM); this was the reason for deciding to drop the EnView Monitor element. After the system was fully implemented, the AAM system was ready for production. Transactions were (and are) written and deployed on Robots to simulate typical end user actions. These transactions are configured to run at certain intervals, which are defined together with the customers. While they are run against customers' applications automatically, transactions continuously collect availability data and response time data. In case of a failure in a transaction, the Robot immediately quits the transaction and writes detailed information to a log file about what went wrong and which element failed while going through the application.
An alert is then generated by a BMC PATROL Agent based on this data and sent to the BMC PEM. Fujitsu Services' monitoring room receives the alert and reacts to it according to the ITIL incident management process, alerting system specialists to critical incidents so that problems are resolved. As a result of the data gathered by the Robots, weekly reports containing detailed statistics and trend analyses of the ongoing quality of IT services are provided for the customers.
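At its core, a Robot's synthetic transaction is a timed probe: run a scripted sequence of user actions, record the response time, and report which step failed. A generic sketch of that pattern (the function shape is hypothetical, not EnView's API):

```python
import time

def run_transaction(steps):
    """Execute named steps in order; return (success, response_time_s, failed_step).

    steps -- list of (name, callable) pairs simulating end-user actions
    """
    start = time.perf_counter()
    for name, action in steps:
        try:
            action()
        except Exception:
            # Quit immediately and report which element failed, as the Robot does.
            return False, time.perf_counter() - start, name
    return True, time.perf_counter() - start, None

ok, elapsed, failed = run_transaction([("login", lambda: None),
                                       ("open_page", lambda: None)])

def _broken():
    raise RuntimeError("element not found")

ok2, _, failed2 = run_transaction([("login", lambda: None), ("search", _broken)])
```

Availability and response-time statistics are then aggregated from many such probe results, and a failure result is what triggers the alert path described above.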
Abstract:
Data mining is a relatively new field of research whose objective is to acquire knowledge from large amounts of data. In medical and health care areas, a large amount of data is becoming available due to regulations and to the availability of computers [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans quickly enough to make diagnoses, prognoses and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications in order to develop a tool that can help make reasonably accurate decisions. The goal of this thesis is to find a pattern among patients who contracted pneumonia by clustering laboratory values that were recorded every day. This pattern can then be generalized to patients who have not been diagnosed with the disease but whose laboratory values show the same trend as those of pneumonia patients. For this work, ten tables were extracted from a large database of a hospital in Jena. In the ICU (intensive care unit), COPRA, a patient management system, is used. All tables and data are stored in a German-language database.
Abstract:
This paper takes a first step toward a methodology to quantify the influence of regulation on short-run earnings dynamics. It also provides evidence on the patterns of wage adjustment adopted during the recent high-inflation experience in Brazil. The large variety of official wage indexation rules adopted in Brazil in recent years, combined with the availability of monthly labor market surveys, makes the Brazilian case a good laboratory for testing how regulation affects earnings dynamics. In particular, the combination of large sample sizes with the possibility of following the same worker through short periods of time makes it possible to estimate the cross-sectional distribution of longitudinal statistics based on observed earnings (e.g., monthly and annual rates of change). The empirical strategy adopted here is to compare the distributions of longitudinal statistics extracted from actual earnings data with simulations generated from the minimum adjustment requirements imposed by the Brazilian Wage Law. The analysis provides statistics on how binding the wage regulation schemes were. The visual analysis of the distribution of wage adjustments proves useful in highlighting stylized facts that may guide future empirical work.
Abstract:
The paper provides evidence on what affects at the margin the cost and availability of bank credit for firms in Argentina. We study in particular how banks use different pieces of private and public information to screen firms and overcome informational asymmetries in the credit market. Some private information is transferable, like balance sheet data. Private information generated in relationships is not. To capture the closeness of bank relationships, we resort to the concentration of bank credit and the number of credit lines in a bank. We also consider public information available in the Central de Deudores. The cost of credit is measured using overdrafts, the most expensive line of credit, at the bank that charges the highest rate for overdrafts. We find that the cost of credit is smaller for a firm with a close relationship to the marginal bank. Firms with large assets, a high sales/assets ratio, and a low debt/assets ratio pay a lower interest rate at the margin. A good credit history (no debt arrears and no bounced checks) and collateral also reduce the marginal interest rate. The availability of credit is measured by unused credit lines as a proportion of total liabilities with the main bank. The availability of credit depends positively on a close relationship with the main bank. Large assets, a high return over assets, a high sales/assets ratio, a low debt/assets ratio, a good credit history, and collateral lead to higher credit availability. Our measure of unused credit lines is less ambiguous than traditional measures like leverage, which may indicate financial distress rather than availability of credit.
Abstract:
This work presents software developed to process solar radiation data. The software can be used in meteorological and climatic stations, and also to support solar radiation measurements in research on solar energy availability, allowing data quality control, statistical calculations and validation of models, as well as easy interchange of data.
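A typical quality-control step in such software is range checking: flagging irradiance samples that fall outside physically plausible limits. A sketch using assumed bounds of 0 and roughly the solar constant, about 1367 W/m² (the paper's actual QC rules are not described here):

```python
def qc_range_flags(values, lower=0.0, upper=1367.0):
    """Return indices of irradiance samples outside [lower, upper] W/m^2."""
    return [i for i, v in enumerate(values) if not lower <= v <= upper]

# A negative reading and an impossibly large one are flagged.
print(qc_range_flags([120.0, -5.0, 980.0, 2000.0]))  # → [1, 3]
```

Real stations combine several such tests (range, step-change, persistence) before data are accepted for statistical calculations or model validation.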
Abstract:
Despite the abundant availability of protocols and applications for peer-to-peer file sharing, several drawbacks are still present in the field. Among the most notable is the lack of a simple and interoperable way to share information among independent peer-to-peer networks. Another drawback is that the shared content can be accessed only by a limited number of compatible applications, making it inaccessible to other applications and systems. In this work we present a new approach to peer-to-peer data indexing, focused on the organization and retrieval of the metadata that describes the shared content. This approach results in a common and interoperable infrastructure, which provides transparent access to data shared on multiple data-sharing networks via a simple API. The proposed approach is evaluated using a case study, implemented as a cross-platform extension to the Mozilla Firefox browser, and demonstrates the advantages of such interoperability over conventional distributed data access strategies.
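The core idea of metadata-based indexing can be illustrated with a toy keyword index over metadata records, exposed through a small API that hides which network the content actually lives on. All names below are illustrative, not the paper's API:

```python
class MetadataIndex:
    """Toy inverted index over metadata describing shared content."""

    def __init__(self):
        self._by_keyword = {}   # keyword -> set of content identifiers
        self._records = {}      # content identifier -> metadata dict

    def add(self, content_id, metadata):
        """Index a record by the words of its title."""
        self._records[content_id] = metadata
        for word in metadata.get("title", "").lower().split():
            self._by_keyword.setdefault(word, set()).add(content_id)

    def search(self, keyword):
        """Return metadata records whose title contains the keyword."""
        ids = self._by_keyword.get(keyword.lower(), set())
        return [self._records[i] for i in sorted(ids)]

# Records from two (hypothetical) independent sharing networks in one index.
index = MetadataIndex()
index.add("net-a:123", {"title": "Open Data Primer", "size": 1024})
index.add("net-b:456", {"title": "Data Sharing Networks", "size": 2048})
print([r["title"] for r in index.search("data")])
```

A caller queries one API and receives hits regardless of which network each record came from; that is the interoperability the infrastructure provides.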