340 results for datasets


Relevance:

10.00%

Publisher:

Abstract:

This paper presents an overview of the strengths and limitations of existing and emerging geophysical tools for landform studies. The objectives are to discuss recent technical developments and to provide a review of relevant recent literature, with a focus on propagating field methods with terrestrial applications. For various methods in this category, including ground-penetrating radar (GPR), electrical resistivity (ER), seismics, and electromagnetic (EM) induction, the technical backgrounds are introduced, followed by a section on novel developments relevant to landform characterization. For several decades, GPR has been popular for characterization of the shallow subsurface and in particular sedimentary systems. Novel developments in GPR include the use of multi-offset systems to improve signal-to-noise ratios and data collection efficiency, amongst others, and the increased use of 3D data. Multi-electrode ER systems have become popular in recent years as they allow for relatively fast and detailed mapping. Novel developments include time-lapse monitoring of dynamic processes as well as the use of capacitively-coupled systems for fast, non-invasive surveys. EM induction methods are especially popular for fast mapping of spatial variation, but can also be used to obtain information on the vertical variation in subsurface electrical conductivity. In recent years several examples of the use of plane-wave EM for characterization of landforms have been published. Seismic methods for landform characterization include seismic reflection and refraction techniques and the use of surface waves. A recent development is the use of passive sensing approaches. The use of multiple geophysical methods, which can benefit from the sensitivity to different subsurface parameters, is becoming more common. Strategies for coupled and joint inversion of complementary datasets will, once more widely available, benefit the geophysical study of landforms.

Three case studies are presented on the use of electrical and GPR methods for characterization of landforms in the range of meters to 100s of meters in dimension. In a study of polygonal patterned ground in the Saginaw Lowlands, Michigan, USA, electrical resistivity tomography was used to characterize differences in subsurface texture and water content associated with polygon-swale topography. Also, a sand-filled thermokarst feature was identified using electrical resistivity data. The second example is on the use of constant spread traversing (CST) for characterization of large-scale glaciotectonic deformation in the Ludington Ridge, Michigan. Multiple CST surveys parallel to an ~60 m high cliff, where broad (~100 m) synclines and narrow clay-rich anticlines are visible, illustrated that at least one of the narrow structures extended inland. A third case study discusses internal structures of an eolian dune on a coastal spit in New Zealand. Both 35 and 200 MHz GPR data, which clearly identified a paleosol and internal sedimentary structures of the dune, were used to improve understanding of the development of the dune, which may shed light on paleo-wind directions.

Relevance:

10.00%

Publisher:

Abstract:

Functional MRI studies commonly refer to activation patterns as being localized in specific Brodmann areas, referring to Brodmann’s divisions of the human cortex based on cytoarchitectonic boundaries [3]. Typically, Brodmann areas that match regions in the group-averaged functional maps are estimated by eye, leading to inaccurate parcellations and significant error. To avoid this limitation, we developed a method using high-dimensional nonlinear registration to project the Brodmann areas onto individual 3D co-registered structural and functional MRI datasets, using an elastic deformation vector field in the cortical parameter space. Based on a sulcal pattern matching approach [11], an N=27 scan single-subject atlas (the Colin Holmes atlas [15]) with associated Brodmann areas labeled on its surface was deformed to match 3D cortical surface models generated from individual subjects’ structural MRIs (sMRIs). The deformed Brodmann areas were used to quantify and localize functional MRI (fMRI) BOLD activation during the performance of the Tower of London task [7].

Relevance:

10.00%

Publisher:

Abstract:

This chapter analyses the copyright law framework needed to ensure open access to outputs of the Australian academic and research sector, such as journal articles and theses. It provides an overview of the new knowledge landscape, the principles of copyright law, the concept of open access to knowledge, the recently developed open content models of copyright licensing, and the challenges faced in providing greater access to knowledge and research outputs.

Relevance:

10.00%

Publisher:

Abstract:

This paper presents a new metric, which we call the lighting variance ratio, for quantifying descriptors in terms of their variance under illumination changes. In many applications it is desirable to have descriptors that are robust to changes in illumination, especially in outdoor environments. The lighting variance ratio is useful for comparing descriptors and determining whether a descriptor is sufficiently lighting invariant for a given environment. The metric is analysed across a number of datasets, cameras and descriptors. The results show that the upright SIFT descriptor is typically the most lighting invariant descriptor.
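
The abstract does not give the metric's exact formulation, so the following Python sketch is a hypothetical reading of the idea: it contrasts descriptor distances across illumination changes of the same scene point with distances between descriptors of distinct points, so that lower values indicate greater lighting invariance.

    import numpy as np

    def lighting_variance_ratio(same_point_pairs, distinct_point_pairs):
        """Hypothetical ratio: mean descriptor distance across illumination
        changes of the same scene point, over mean distance between
        descriptors of distinct points. Lower = more lighting invariant."""
        same = np.mean([np.linalg.norm(a - b) for a, b in same_point_pairs])
        distinct = np.mean([np.linalg.norm(a - b) for a, b in distinct_point_pairs])
        return same / distinct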

Relevance:

10.00%

Publisher:

Abstract:

Local spatio-temporal features with a Bag-of-visual-words model is a popular approach used in human action recognition. Bag-of-features methods face several challenges, such as extracting appropriate appearance and motion features from videos, converting the extracted features into a form suitable for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poor results compared to the baseline Bag-of-words framework. In contrast, supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within-class margins using max-margin techniques and yields a sparse, highly discriminative topic structure, while in css-LDA separate class-specific topics are learned instead of a common set of topics across the entire dataset. In our representation, topics are first learned and each video is then represented as a topic proportion vector, i.e. comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the efficiency of the above two representation techniques through experiments carried out on two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline Bag-of-features framework, which uses k-means to construct a histogram of words from the feature vectors.
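
As a rough illustration of the representation pipeline (learn topics, represent each video as a topic proportion vector, classify with an SVM), here is a minimal Python sketch. MedLDA and css-LDA are not available in standard libraries, so unsupervised LDA stands in for the topic model, and the data are toy placeholders.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.svm import SVC

    bow = np.random.randint(0, 5, size=(40, 500))   # toy per-video visual-word counts
    labels = np.random.randint(0, 4, size=40)       # toy action labels

    # Learn topics, represent each video as a topic proportion vector
    # (comparable to a histogram of topics), then classify with an SVM.
    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    theta = lda.fit_transform(bow)
    clf = SVC(kernel="linear").fit(theta, labels)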

Relevance:

10.00%

Publisher:

Abstract:

The commercialization of aerial image processing is highly dependent on the platforms such as UAVs (Unmanned Aerial Vehicles). However, the lack of an automated UAV forced landing site detection system has been identified as one of the main impediments to allow UAV flight over populated areas in civilian airspace. This article proposes a UAV forced landing site detection system that is based on machine learning approaches including the Gaussian Mixture Model and the Support Vector Machine. A range of learning parameters are analysed including the number of Guassian mixtures, support vector kernels including linear, radial basis function Kernel (RBF) and polynormial kernel (poly), and the order of RBF kernel and polynormial kernel. Moreover, a modified footprint operator is employed during feature extraction to better describe the geometric characteristics of the local area surrounding a pixel. The performance of the presented system is compared to a baseline UAV forced landing site detection system which uses edge features and an Artificial Neural Network (ANN) region type classifier. Experiments conducted on aerial image datasets captured over typical urban environments reveal improved landing site detection can be achieved with an SVM classifier with an RBF kernel using a combination of colour and texture features. Compared to the baseline system, the proposed system provides significant improvement in term of the chance to detect a safe landing area, and the performance is more stable than the baseline in the presence of changes to the UAV altitude.
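
A minimal scikit-learn sketch of the two classifier families compared above; the colour/texture feature extraction and the modified footprint operator are omitted, and the data are placeholders.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    X = np.random.rand(200, 6)                 # placeholder colour + texture features
    y = np.random.randint(0, 2, size=200)      # 1 = safe landing area, 0 = unsafe

    # RBF-kernel SVM, the best-performing configuration reported above.
    svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

    # One Gaussian mixture per class; classify by the higher class likelihood.
    gmms = {c: GaussianMixture(n_components=3).fit(X[y == c]) for c in (0, 1)}
    pred = np.argmax([gmms[c].score_samples(X) for c in (0, 1)], axis=0)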

Relevance:

10.00%

Publisher:

Abstract:

Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that, compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, and at greater computational speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
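
For illustration, a short Python sketch of the basic D2 statistic, the count-correlation of shared k-mers between two sequences; the normalised variants common in this literature (e.g. d2*, d2S) add centring and scaling of the counts.

    from collections import Counter

    def kmer_counts(seq, k):
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    def d2(seq_a, seq_b, k=6):
        """Basic D2: sum over shared k-mers of the product of their counts."""
        ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
        return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())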

Relevance:

10.00%

Publisher:

Abstract:

In recent years, considerable research efforts have been directed to microarray technologies and their role in providing simultaneous information on expression profiles for thousands of genes. These data, when subjected to clustering and classification procedures, can assist in identifying patterns and provide insight into biological processes. To understand the properties of complex gene expression datasets, graphical representations can be used. Intuitively, the data can be represented as a bipartite graph, with weighted edges corresponding to gene-sample node couples in the dataset. Biologically meaningful subgraphs can be sought, but performance can be influenced both by the search algorithm and by the graph-weighting scheme, and both merit rigorous investigation. In this paper, we focus on edge-weighting schemes for the bipartite graphical representation of gene expression. Two novel methods are presented: the first is based on empirical evidence; the second on a geometric distribution. The schemes are compared on several real datasets, assessing performance against four essential properties: robustness to noise and missing values, discrimination, parameter influence on scheme efficiency, and reusability. Recommendations and limitations are briefly discussed.

Keywords: edge-weighting; weighted graphs; gene expression; bi-clustering
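
As a sketch of the bipartite representation itself (not of the two proposed weighting schemes, which are not detailed in the abstract), the following Python fragment builds a weighted gene-sample graph with networkx, using the absolute expression level as a placeholder edge weight.

    import networkx as nx
    import numpy as np

    expr = np.random.randn(5, 3)          # toy genes x samples expression matrix
    G = nx.Graph()
    G.add_nodes_from((f"g{i}" for i in range(5)), bipartite=0)   # gene nodes
    G.add_nodes_from((f"s{j}" for j in range(3)), bipartite=1)   # sample nodes
    for i in range(5):
        for j in range(3):
            # Placeholder edge weight: absolute expression level.
            G.add_edge(f"g{i}", f"s{j}", weight=abs(expr[i, j]))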

Relevance:

10.00%

Publisher:

Abstract:

One of the main challenges in data analytics is that discovering structures and patterns in complex datasets is a computationally intensive task. Recent advances in high-performance computing provide part of the solution. Multicore systems are now more affordable and more accessible. In this paper, we investigate how this can be used to develop more advanced methods for data analytics. We focus on two specific areas: model-driven analysis and data mining using optimisation techniques.

Relevance:

10.00%

Publisher:

Abstract:

Most real-life data analysis problems are difficult to solve using exact methods, due to the size of the datasets and the nature of the underlying mechanisms of the system under investigation. As datasets grow even larger, finding the balance between the quality of the approximation and the computing time of the heuristic becomes non-trivial. One solution is to consider parallel methods, and to use the increased computational power to perform a deeper exploration of the solution space in a similar time. It is, however, difficult to estimate a priori whether parallelisation will provide the expected improvement. In this paper we consider a well-known method, genetic algorithms, and evaluate, on two distinct problem types, the behaviour of the classic and parallel implementations.
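
As a minimal illustration of the kind of parallelisation being evaluated, the following Python sketch runs a classic genetic algorithm loop with fitness evaluation spread over a process pool; the objective function and operators are toy placeholders, not the paper's problem instances.

    import random
    from multiprocessing import Pool

    def fitness(bits):                    # placeholder objective: count of 1-bits
        return sum(bits)

    def evolve(pop, pool, n_gen=50):
        for _ in range(n_gen):
            scores = pool.map(fitness, pop)          # parallel fitness evaluation
            ranked = [p for _, p in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[:len(pop) // 2]         # truncation selection
            children = []
            for _ in range(len(pop) - len(parents)):
                a, b = random.sample(parents, 2)     # one-point crossover
                cut = random.randrange(1, len(a))
                child = a[:cut] + b[cut:]
                child[random.randrange(len(child))] ^= 1   # point mutation
                children.append(child)
            pop = parents + children
        return pop

    if __name__ == "__main__":
        population = [[random.randint(0, 1) for _ in range(32)] for _ in range(20)]
        with Pool() as pool:
            population = evolve(population, pool)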

Relevance:

10.00%

Publisher:

Abstract:

In providing simultaneous information on expression profiles for thousands of genes, microarray technologies have, in recent years, been largely used to investigate mechanisms of gene expression. Clustering and classification of such data can, indeed, highlight patterns and provide insight into biological processes. A common approach is to consider the genes and samples of microarray datasets as nodes in a bipartite graph, where edges are weighted, e.g. based on the expression levels. In this paper, using a previously evaluated weighting scheme, we focus on search algorithms and evaluate, in the context of biclustering, several variations of Genetic Algorithms. We also introduce a new heuristic, “Propagate”, which consists of recursively evaluating neighbour solutions with one more or one fewer active condition. The results obtained on three well-known datasets show that, for a given weighting scheme, optimal or near-optimal solutions can be identified.
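
The abstract gives only a one-line description of “Propagate”, so the following Python sketch is a hypothetical reading of it: from the current solution, evaluate each neighbour obtained by toggling one condition, move to the best improving neighbour, and recurse until none improves; the scoring function is a placeholder.

    def propagate(active, all_conditions, score):
        """Hypothetical "Propagate" local search over sets of active conditions:
        evaluate all one-toggle neighbours (one more or one fewer active
        condition), keep the best improvement, recurse until no gain."""
        best, best_val = active, score(active)
        for c in all_conditions:
            neighbour = active ^ {c}      # add c if absent, drop it if present
            if score(neighbour) > best_val:
                best, best_val = neighbour, score(neighbour)
        return propagate(best, all_conditions, score) if best != active else best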

Relevance:

10.00%

Publisher:

Abstract:

This paper addresses the development of trust in the use of Open Data through the incorporation of appropriate authentication and integrity parameters, for use by end-user Open Data application developers, in an architecture for trustworthy Open Data Services. The advantage of this architecture is that it is far more scalable and is not another certificate-based hierarchy with the attendant problems of certificate revocation management. With the use of a Public File, if the key is compromised it is a simple matter for the single responsible entity to replace the key pair with a new one and re-perform the data file signing process. Under this proposed architecture, the Open Data environment does not interfere with the internal security schemes that might be employed by the entity. However, the architecture incorporates, when needed, parameters from the entity, e.g. the person who authorized publishing as Open Data, at the time that datasets are created or added.
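
A minimal sketch of the data file signing and verification step, assuming an Ed25519 key pair (via the Python cryptography package) whose public half is distributed through the Public File; the paper's actual key scheme is not specified in the abstract.

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()     # published via the Public File

    data = b"station,reading\nA,1.2\n"        # stand-in for a published dataset
    signature = private_key.sign(data)        # distributed alongside the dataset

    # Consumers verify integrity; raises InvalidSignature if the data changed.
    public_key.verify(signature, data)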

Relevance:

10.00%

Publisher:

Abstract:

This chapter discusses the methodological aspects and empirical findings of a large-scale, funded project investigating public communication through social media in Australia. The project concentrates on Twitter, but we approach it as representative of broader current trends toward the integration of large datasets and computational methods into media and communication studies in general, and social media scholarship in particular. The research discussed in this chapter aims to empirically describe networks of affiliation and interest in the Australian Twittersphere, while reflecting on the methodological implications and imperatives of ‘big data’ in the humanities.

Using custom network crawling technology, we have conducted a snowball crawl of Twitter accounts operated by Australian users to identify more than one million users and their follower/followee relationships, and have mapped their interconnections. In itself, the map provides an overview of the major clusters of densely interlinked users, largely centred on shared topics of interest (from politics through arts to sport) and/or sociodemographic factors (geographic origins, age groups). Our map of the Twittersphere is the first of its kind for the Australian part of the global Twitter network, and also provides a first independent and scholarly estimation of the size of the total Australian Twitter population.

In combination with our investigation of participation patterns in specific thematic hashtags, the map also enables us to examine which areas of the underlying follower/followee network are activated in the discussion of specific current topics – allowing new insights into the extent to which particular topics and issues are of interest to specialised niches or to the Australian public more broadly. Specifically, we examine the Twittersphere footprint of dedicated political discussion, under the #auspol hashtag, and compare it with the heightened, broader interest in Australian politics during election campaigns, using #ausvotes; we explore the different patterns of Twitter activity across the map for major television events (the popular competitive cooking show #masterchef, the British #royalwedding, and the annual #stateoforigin Rugby League sporting contest); and we investigate the circulation of links to the articles published by a number of major Australian news organisations across the network. Such analysis, which combines the ‘big data’-informed map and a close reading of individual communicative phenomena, makes it possible to trace the dynamic formation and dissolution of issue publics against the backdrop of longer-term network connections, and the circulation of information across these follower/followee links.

Such research sheds light on the communicative dynamics of Twitter as a space for mediated social interaction. Our work demonstrates the possibilities inherent in the current ‘computational turn’ (Berry, 2010) in the digital humanities, as well as adding to the development and critical examination of methodologies for dealing with ‘big data’ (boyd and Crawford, 2011). Our tools and methods for doing Twitter research, released under Creative Commons licences through our project website, provide the basis for replicable and verifiable digital humanities research on the processes of public communication which take place through this important new social network.

Relevance:

10.00%

Publisher:

Abstract:

Building on hashtag datasets gathered since January 2011, this paper will compare patterns of Twitter usage during the popular revolution in Egypt and the civil war in Libya. Using custom-made tools for processing ‘big data’ (boyd & Crawford, 2011), we will examine the volume of tweets sent by English-, Arabic-, and mixed-language Twitter users over time, and examine the networks of interaction (variously through @replying, retweeting, or both) between these groups as they developed and shifted over the course of these uprisings. Examining @reply and retweet traffic, we will identify general patterns of information flow between the English- and Arabic-speaking sides of the Twittersphere, and highlight the roles played by key boundary riders connecting both language spheres. Further, we will examine the URLs shared in these hashtags by Twitter participants, to identify the most prominent overall information sources, examine differences in the information diet experienced by English- and Arabic-language users, and investigate whether there are any online sources whose URLs are transcending language boundaries more frequently than others.

Relevance:

10.00%

Publisher:

Abstract:

Discounted Cumulative Gain (DCG) is a well-known ranking evaluation measure for models built from data with multiple relevance grades. By treating the tagging data used in recommendation systems as an ordinal relevance set of {negative, null, positive}, we propose to build a DCG-based recommendation model. We present an efficient and novel learning-to-rank method that optimizes DCG for a recommendation model using this tagging data interpretation scheme. Evaluating the proposed method on real-world datasets, we demonstrate that it is scalable and outperforms the benchmark methods by generating a quality top-N item recommendation list.
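
For reference, a short Python sketch of the standard DCG computation; mapping the {negative, null, positive} tagging interpretation onto graded gains {0, 1, 2} is an assumption made here for illustration.

    import math

    def dcg(relevances, n=None):
        """DCG@n = sum_i (2^rel_i - 1) / log2(i + 1), with ranks i from 1."""
        rels = relevances[:n] if n else relevances
        return sum((2 ** r - 1) / math.log2(i + 1)
                   for i, r in enumerate(rels, start=1))

    grades = {"negative": 0, "null": 1, "positive": 2}   # assumed gain mapping
    print(dcg([grades["positive"], grades["null"], grades["negative"]]))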