43 results for Data-Intensive Science

at University of Queensland eSpace - Australia


Relevance: 100.00%

Abstract:

Effectively using heterogeneous, distributed information has attracted much research in recent years. Current web services technologies have been used successfully in some distributed prototype systems that are not data-intensive, but most of them cannot work well in a data-intensive environment. This paper provides an infrastructure layer for effectively providing spatial information services in a data-intensive environment using web services over the Internet. We extensively investigate and analyze the overhead of web services in a data-intensive environment, and propose new optimization techniques that can greatly increase the system's efficiency. Our experiments show that these techniques are well suited to data-intensive environments. Finally, we present the requirements these techniques place on the provision of web services information over the Internet.

Relevance: 80.00%

Abstract:

Systems biology is based on computational modelling and simulation of large networks of interacting components. Models may be intended to capture processes, mechanisms, components and interactions at different levels of fidelity. Input data are often large and geographically dispersed, and may require the computation to be moved to the data, not vice versa. In addition, complex system-level problems require collaboration across institutions and disciplines. Grid computing can offer robust, scalable solutions for distributed data, compute and expertise. We illustrate some of the range of computational and data requirements in systems biology with three case studies: one requiring large computation but small data (orthologue mapping in comparative genomics), a second involving complex terabyte data (the Visible Cell project) and a third that is both computationally and data-intensive (simulations at multiple temporal and spatial scales). Authentication, authorisation and audit systems are currently not well scalable and may present bottlenecks for distributed collaboration, particularly where outcomes may be commercialised. Challenges remain in providing lightweight standards to facilitate the penetration of robust, scalable grid-type computing into diverse user communities to meet the evolving demands of systems biology.

Relevance: 40.00%

Abstract:

The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
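The partitioning and filter-and-refine strategy the abstract builds on can be sketched in a few lines. This is an illustrative reconstruction, not the paper's algorithm: the fixed grid size, rectangle-only "geometries" and the sequential driver loop are simplifying assumptions. The point is the multi-assignment of objects to every overlapping cell and the de-duplication of candidate pairs, which the paper identifies as a key cost.

```python
from itertools import product

def mbr_overlap(a, b):
    """True if two MBRs (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def assign_to_cells(objects, cell_size):
    """Map each grid cell to the indices of objects whose MBR overlaps it
    (multi-assignment: one object may land in several cells)."""
    cells = {}
    for i, (xmin, ymin, xmax, ymax) in enumerate(objects):
        for cx, cy in product(range(int(xmin // cell_size), int(xmax // cell_size) + 1),
                              range(int(ymin // cell_size), int(ymax // cell_size) + 1)):
            cells.setdefault((cx, cy), []).append(i)
    return cells

def spatial_join(r_objs, s_objs, cell_size=10.0):
    r_cells = assign_to_cells(r_objs, cell_size)
    s_cells = assign_to_cells(s_objs, cell_size)
    candidates = set()  # the set removes duplicates caused by multi-assignment
    for cell, r_ids in r_cells.items():
        for i in r_ids:
            for j in s_cells.get(cell, []):
                if mbr_overlap(r_objs[i], s_objs[j]):  # filter step on MBRs
                    candidates.add((i, j))
    # a refine step would test exact geometries; with plain rectangles the
    # MBR test already is the exact test
    return sorted(candidates)

r = [(0, 0, 5, 5), (20, 20, 25, 25)]
s = [(4, 4, 12, 12), (100, 100, 101, 101)]
print(spatial_join(r, s))  # [(0, 0)]
```

In a parallel setting, each grid cell (or a locality-preserving group of cells) becomes an independent task; the de-duplication step is what multi-assignment forces the system to pay for.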

Relevance: 40.00%

Abstract:

Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between different employment states or occupations (for example). In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed-form solution, and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive, requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data, and are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects models in terms of accuracy, efficiency and computing time. The computational time will have significant implications for the approach preferred by researchers. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modelling the employment status of women using a GLMM, over three waves of the HILDA survey.
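The numerical integration the abstract refers to can be illustrated for the simplest member of the family, a random-intercept binary logit; the multinomial case replaces the Bernoulli term with a multinomial probability. The data and parameter values below are hypothetical, and the quadrature nodes are not adaptively recentred per cluster as in the adaptive Gaussian quadrature procedures the paper evaluates.

```python
import numpy as np

def cluster_marginal_loglik(y, eta, sigma, n_nodes=20):
    """Gauss-Hermite approximation of
    log ∫ Π_t Bernoulli(y_t | logit^-1(eta_t + b)) N(b | 0, sigma^2) db
    for one cluster's repeated binary responses y with fixed-effect
    linear predictor eta and random-intercept SD sigma."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes              # change of variables for N(0, sigma^2)
    lp = eta[None, :] + b[:, None]                # (node, observation) linear predictor
    p = 1.0 / (1.0 + np.exp(-lp))
    lik_per_node = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)
    return float(np.log(np.sum(weights * lik_per_node) / np.sqrt(np.pi)))

# One hypothetical cluster: three repeated binary responses
y = np.array([1, 0, 1])
eta = np.array([0.2, -0.1, 0.4])
print(cluster_marginal_loglik(y, eta, sigma=1.0))
```

The full-sample log-likelihood is the sum of this term over clusters, which is why computing time grows with the number of individuals; with sigma = 0 the integral collapses to the ordinary logit likelihood, a useful sanity check.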

Relevance: 30.00%

Abstract:

This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways. (C) 1997 Elsevier Science Ltd. All rights reserved.
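The variable decision thresholds mentioned above trade detection rate against false alarm rate: lowering the threshold on the network's output catches more incidents at the cost of more false alarms. A minimal sketch with synthetic detector scores (not the paper's data or model):

```python
def rates_at_threshold(scores_incident, scores_clear, threshold):
    """Detection rate over incident intervals and false alarm rate over
    incident-free intervals, for a given output threshold."""
    dr = sum(s >= threshold for s in scores_incident) / len(scores_incident)
    far = sum(s >= threshold for s in scores_clear) / len(scores_clear)
    return dr, far

# Hypothetical network outputs in [0, 1]
incident = [0.9, 0.8, 0.7, 0.4]
clear = [0.1, 0.2, 0.05, 0.6]

for t in (0.3, 0.5, 0.65):
    dr, far = rates_at_threshold(incident, clear, t)
    print(f"threshold={t:.2f}  DR={dr:.2f}  FAR={far:.2f}")
```

Sweeping the threshold in this way produces the operating curve from which an agency can pick the trade-off appropriate to its tolerance for false alarms.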

Relevance: 30.00%

Abstract:

Two major factors are likely to impact the utilisation of remotely sensed data in the near future: (1) an increase in the number and availability of commercial and non-commercial image data sets with a range of spatial, spectral and temporal dimensions, and (2) increased access to image display and analysis software through GIS. A framework was developed to provide an objective approach to selecting remotely sensed data sets for specific environmental monitoring problems. Preliminary applications of the framework have provided successful approaches for monitoring disturbed and restored wetlands in southern California.

Relevance: 30.00%

Abstract:

A major ongoing debate in population ecology has surrounded the causative factors underlying the abundance of phytophagous insects and whether or not these factors limit or regulate herbivore populations. However, it is often difficult to identify mortality agents in census data, and their distribution and relative importance across large spatial scales are rarely understood. Here, we present life tables for egg batches and larval cohorts of the processionary caterpillar Ochrogaster lunifer Herrich-Schaffer, using intensive local sampling combined with extensive regional monitoring to ascertain the relative importance of different mortality factors at different localities. Extinction of entire cohorts (representing the entire reproductive output of one female) at natural localities was high, with 82% of the initial 492 cohorts going extinct. Mortality was highest in the egg and early instar stages due to predation from dermestid beetles, and while different mortality factors (e.g. hatching failure, egg parasitism and failure to establish on the host) were present at many localities, dermestid predation, either directly observed or inferred from indirect evidence, was the dominant mortality factor at 89% of localities surveyed. Predation was significantly higher in plantations than in natural habitats. The second most important mortality factor was resource depletion, with 14 cohorts defoliating their hosts. Egg and larval parasitism were not major mortality agents. A combination of predation and resource depletion consistently accounted for the majority of mortality across localities, suggesting that both factors are important in limiting population abundance. This evidence shows that O. lunifer is not regulated by natural enemies alone, but that natural enemies and resource patches (Acacia trees) ultimately, and frequently, act together to limit population growth.

Relevance: 30.00%

Abstract:

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.

Relevance: 30.00%

Abstract:

Over recent years databases have become an extremely important resource for biomedical research. Immunology research is increasingly dependent on access to extensive biological databases to extract existing information, plan experiments, and analyse experimental results. This review describes 15 immunological databases that have appeared over the last 30 years. In addition, important issues regarding database design and the potential for misuse of information contained within these databases are discussed. Access pointers are provided for the major immunological databases and also for a number of other immunological resources accessible over the World Wide Web (WWW). (C) 2000 Elsevier Science B.V. All rights reserved.

Relevance: 30.00%

Abstract:

Dual-energy X-ray absorptiometry (DXA) is a widely used method for measuring bone mineral in the growing skeleton. Because scan analysis in children offers a number of challenges, we compared DXA results using six analysis methods at the total proximal femur (PF) and five methods at the femoral neck (FN). In total we assessed 50 scans (25 boys, 25 girls) from two separate studies for cross-sectional differences in bone area, bone mineral content (BMC), and areal bone mineral density (aBMD) and for percentage change over the short term (8 months) and long term (7 years). At the proximal femur for the short-term longitudinal analysis, there was an approximate 3.5% greater change in bone area and BMC when the global region of interest (ROI) was allowed to increase in size between years as compared with when the global ROI was held constant. Trend analysis showed a significant (p < 0.05) difference between scan analysis methods for bone area and BMC across 7 years. At the femoral neck, cross-sectional analysis using a narrower (from default) ROI, without change in location, resulted in a 12.9 and 12.6% smaller bone area and BMC, respectively (both p < 0.001). Changes in FN area and BMC over 8 months were significantly greater (2.3%, p < 0.05) using a narrower FN rather than the default ROI. Similarly, the 7-year longitudinal data revealed that differences between scan analysis methods were greatest when the narrower FN ROI was maintained across all years (p < 0.001). For aBMD there were no significant differences in group means between analysis methods at either the PF or FN. Our findings show the need to standardize the analysis of proximal femur DXA scans in growing children.

Relevance: 30.00%

Abstract:

This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.
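The kind of clustering model the abstract describes, combining attribute similarity with spatial proximity, can be sketched by folding both into one weighted dissimilarity and minimising it with a plain k-means loop. The weighting parameter alpha, the deterministic farthest-point initialisation and the toy data are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def combined_kmeans(xy, attr, k, alpha=0.5, n_iter=20):
    """Cluster on alpha-weighted spatial distance plus (1 - alpha)-weighted
    attribute distance. xy: (n, 2) coordinates; attr: (n, p) attributes."""
    # standardise both parts so the weighting between them is meaningful
    xy = (xy - xy.mean(0)) / xy.std(0)
    attr = (attr - attr.mean(0)) / attr.std(0)
    z = np.hstack([np.sqrt(alpha) * xy, np.sqrt(1.0 - alpha) * attr])
    # deterministic farthest-point initialisation
    idx = [0]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(z[:, None] - z[idx][None], axis=2), axis=1)
        idx.append(int(d.argmax()))
    centres = z[idx].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(z[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(1)
        centres = np.array([z[labels == j].mean(0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return labels

# Hypothetical data: two spatially separated groups with distinct attribute values
xy = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
attr = np.array([[1.], [1.], [1.], [5.], [5.], [5.]])
print(combined_kmeans(xy, attr, k=2))
```

Setting alpha near 1 recovers purely spatial regionalisation; alpha near 0 recovers ordinary attribute clustering, with spatially fragmented clusters, which is the trade-off an exploratory analyst would tune interactively.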

Relevance: 30.00%

Abstract:

Examples from the Murray-Darling basin in Australia are used to illustrate different methods of disaggregation of reconnaissance-scale maps. One approach for disaggregation revolves around the de-convolution of the soil-landscape paradigm elaborated during a soil survey. The descriptions of soil map units and block diagrams in a soil survey report detail soil-landscape relationships or soil toposequences that can be used to disaggregate map units into component landscape elements. Toposequences can be visualised on a computer by combining soil maps with digital elevation data. Expert knowledge or statistics can be used to implement the disaggregation. Use of a restructuring element and k-means clustering are illustrated. Another approach to disaggregation uses training areas to develop rules to extrapolate detailed mapping into other, larger areas where detailed mapping is unavailable. A two-level decision tree example is presented. At one level, the decision tree method is used to capture mapping rules from the training area; at another level, it is used to define the domain over which those rules can be extrapolated. (C) 2001 Elsevier Science B.V. All rights reserved.

Relevance: 30.00%

Abstract:

The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated. Whilst the general pattern of classes of large spatial extent, and those with close association with geology, were delineated, small classes and the detailed spatial pattern of the map were less well rendered. Here we examine several strategies to improve the quality of the soil map models generated by rule induction. Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type. A map sampling strategy is developed. Classification error is reduced by using boosting rather than cross-validation to improve the model. Further, the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined. The best model was achieved by sampling in proportion to the spatial extent of the mapped classes, boosting the decision trees, and using spatial contextual information extracted from the environmental variables.
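The boosting of decision trees that the abstract found most effective can be sketched with the classic AdaBoost scheme. This is a hedged illustration, not the paper's implementation: the weak learner here is a one-split decision stump rather than a full tree, the data are synthetic, and a real soil-mapping model would boost over terrain and geology predictors with many classes.

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold split under sample weights w (y in {-1, +1})."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, sign)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, f] <= t, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, f, t, sign)
    return best

def adaboost(X, y, n_rounds=10):
    """AdaBoost: reweight the training sample toward examples the current
    ensemble gets wrong, then fit the next weak learner to the new weights."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, f, t, sign = fit_stump(X, y, w)
        err = max(err, 1e-12)                    # guard against log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)    # weak learner's vote weight
        pred = sign * np.where(X[:, f] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, f, t, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, f] <= t, 1, -1) for a, f, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Tiny separable toy set (hypothetical)
X = np.array([[0.], [1.], [2.], [5.], [6.], [7.]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(X, y, n_rounds=5)
print(predict(model, X))  # recovers the training labels on this toy set
```

The contrast the abstract draws is that boosting reduces classification error by combining many reweighted trees, whereas cross-validation only selects among single models.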