864 results for pacs: data handling techniques


Relevance: 30.00%

Abstract:

This paper analyses the probabilistic linear discriminant analysis (PLDA) approach to speaker verification with limited development data. It investigates the use of the median as the central tendency of a speaker's i-vector representation, and the effectiveness of weighted discriminative techniques, on the performance of state-of-the-art length-normalised Gaussian PLDA (GPLDA) speaker verification systems. The analysis shows that the median (via a median Fisher discriminator, MFD) provides a better representation of a speaker when the number of representative i-vectors available during development is reduced, and that the pair-wise weighting approach in weighted LDA and weighted MFD provides further improvement under limited development conditions. The best performance is obtained with the weighted MFD approach, which shows over 10% improvement in EER over the baseline GPLDA system in mismatched and interview-interview conditions.
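
As a rough illustration of the central-tendency choice discussed above (not the authors' implementation), the following sketch builds a speaker model from a handful of development i-vectors using either the coordinate-wise median or the mean, followed by length normalisation; the dimensions and data are invented.

```python
import numpy as np

def speaker_model(ivectors, central_tendency="median"):
    """Collapse a speaker's i-vectors (sessions x dim) into one model
    vector, then length-normalise it as in GPLDA pipelines."""
    ivectors = np.asarray(ivectors, dtype=float)
    if central_tendency == "median":
        # Coordinate-wise median: more robust than the mean when only a
        # few development i-vectors are available for the speaker.
        model = np.median(ivectors, axis=0)
    else:
        model = np.mean(ivectors, axis=0)
    return model / np.linalg.norm(model)  # length normalisation

# Hypothetical example: five 400-dimensional i-vectors for one speaker.
ivecs = np.random.default_rng(0).normal(size=(5, 400))
print(speaker_model(ivecs).shape)  # (400,)
```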

Relevance: 30.00%

Abstract:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle, and extends effective Information Extraction (IE) methods to formulate web search queries that are capable of retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve the accuracy and efficiency of WebPut. Several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple level and the database level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.
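
A minimal sketch of this greedy, confidence-based scheduling (a reconstruction under assumptions, not WebPut's actual code): at each step the missing value whose imputation query currently has the highest estimated confidence is processed first, since values filled early can sharpen the queries for the cells still missing. The `confidence` and `run_query` callbacks are hypothetical placeholders for WebPut's query scoring and web extraction.

```python
def greedy_impute(missing_cells, confidence, run_query):
    """missing_cells: iterable of (tuple_id, attribute) pairs.
    confidence(cell, filled) -> float: estimated confidence of the best
    imputation query for `cell`, given the values filled so far.
    run_query(cell, filled) -> value or None: issues that query."""
    filled = {}
    cells = set(missing_cells)
    while cells:
        # Re-rank the remaining cells each round: earlier imputations may
        # raise the confidence of queries for the values still missing.
        best = max(cells, key=lambda c: confidence(c, filled))
        value = run_query(best, filled)
        if value is not None:
            filled[best] = value
        cells.remove(best)
    return filled
```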

Relevance: 30.00%

Abstract:

Talk of Big Data seems to be everywhere. Indeed, the apparently value-free concept of ‘data’ has seen a spectacular broadening of popular interest, shifting from the dry terminology of labcoat-wearing scientists to the buzzword du jour of marketers. In the business world, data is increasingly framed as an economic asset of critical importance, a commodity on a par with scarce natural resources (Backaitis, 2012; Rotella, 2012). It is social media that has most visibly brought the Big Data moment to media and communication studies, and beyond it, to the social sciences and humanities. Social media data is one of the most important areas of the rapidly growing data market (Manovich, 2012; Steele, 2011). Massive valuations are attached to companies that directly collect and profit from social media data, such as Facebook and Twitter, as well as to resellers and analytics companies like Gnip and DataSift. The expectation attached to the business models of these companies is that their privileged access to data and the resulting valuable insights into the minds of consumers and voters will make them irreplaceable in the future. Analysts and consultants argue that advanced statistical techniques will allow the detection of ongoing communicative events (natural disasters, political uprisings) and the reliable prediction of future ones (electoral choices, consumption)...

Relevance: 30.00%

Abstract:

This paper uses innovative content analysis techniques to map how the death of Oscar Pistorius' girlfriend, Reeva Steenkamp, was framed in Twitter conversations. Around 1.5 million posts from a two-week timeframe are analyzed with a combination of syntactic and semantic methods. The analysis is grounded in the frame analysis perspective and differs from sentiment analysis: instead of looking for explicit evaluations, such as "he is guilty" or "he is innocent", we show through the results how opinions can be identified by complex articulations of more implicit symbolic devices, such as repeatedly mentioned examples and metaphors. Different frames are adopted by users as more information about the case is revealed: from a more episodic frame, dominant at the very beginning, to more systemic approaches highlighting the association of the event with urban violence, gun control issues, and violence against women. A detailed timeline of the discussions is provided.
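
Purely as a hedged illustration of how such a timeline of frames might be assembled (the authors' combination of syntactic and semantic methods is far richer), the sketch below tallies tweets that mention frame-indicative symbolic devices per day; the device lexicon is invented for the example.

```python
from collections import Counter
from datetime import date

FRAME_DEVICES = {  # invented lexicon of frame-indicative devices
    "episodic": {"shots", "bathroom"},
    "gun control": {"gun laws", "firearm"},
    "violence against women": {"femicide", "domestic violence"},
}

def frames_per_day(tweets):
    """tweets: iterable of (date, text) pairs; counts (day, frame) hits."""
    timeline = Counter()
    for day, text in tweets:
        text = text.lower()
        for frame, devices in FRAME_DEVICES.items():
            if any(d in text for d in devices):
                timeline[(day, frame)] += 1
    return timeline

sample = [(date(2013, 2, 14), "Shots heard in the bathroom..."),
          (date(2013, 2, 20), "This is about femicide, not one man.")]
print(frames_per_day(sample))
```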

Relevance: 30.00%

Abstract:

Visual localization in outdoor environments is often hampered by natural variation in appearance caused by weather phenomena, diurnal fluctuations in lighting, and seasonal changes. Such changes are global across an environment and, in the case of global light changes and seasonal variation, the change in appearance occurs in a regular, cyclic manner. Visual localization could be greatly improved if it were possible to predict the appearance of a particular location at a particular time, based on its appearance in the past and knowledge of how appearance changes over time. In this paper, we investigate whether global appearance changes in an environment can be learned sufficiently well to improve visual localization performance. We use time of day as a test case, and generate transformations between morning and afternoon using sample images from a training set. We demonstrate that the learned transformation generalizes from the training data, and show that the resulting visual localization on a test set is improved relative to raw image comparison. The improvement in localization persists when the area is revisited several weeks later.
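
A minimal sketch of the idea, under the assumption that images are reduced to fixed-length descriptors (the paper's actual representation and training procedure are not reproduced): fit a least-squares linear transformation from morning descriptors to afternoon descriptors of the same places, then apply it to unseen morning images before matching. All data below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
# Training pairs: descriptors of the same places in the morning and the
# afternoon (synthetic; 200 places, 64-dimensional descriptors).
morning = rng.normal(size=(200, 64))
true_T = 0.1 * rng.normal(size=(64, 64))
afternoon = morning @ true_T + 0.01 * rng.normal(size=(200, 64))

# Least-squares fit of a linear transform: afternoon ≈ morning @ T.
T, *_ = np.linalg.lstsq(morning, afternoon, rcond=None)

# Predict the afternoon appearance of a new morning image, then match the
# prediction against the map instead of the raw morning descriptor.
query = rng.normal(size=(1, 64))
predicted = query @ T
print(predicted.shape)  # (1, 64)
```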

Relevance: 30.00%

Abstract:

Environmental monitoring is becoming critical as human activity and climate change place greater pressures on biodiversity, leading to an increasing need for data to make informed decisions. Acoustic sensors can collect data across large areas for extended periods, making them attractive for environmental monitoring. However, managing and analysing large volumes of environmental acoustic data is a significant challenge, and this is hindering the effective utilization of the large datasets collected. This paper presents an overview of our current techniques for collecting, storing and analysing large volumes of acoustic data efficiently, accurately, and cost-effectively.
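
As one hedged example of an analysis technique for such data (not necessarily the authors' pipeline), the sketch below summarises a long recording with one acoustic index per minute, so that only promising chunks need to be queued for more costly analysis:

```python
import numpy as np

def spectral_entropy(chunk, eps=1e-12):
    """Spectral entropy of one audio chunk: low for tonal events
    (e.g., some bird calls), high for broadband noise like wind or rain."""
    power = np.abs(np.fft.rfft(chunk)) ** 2
    p = power / (power.sum() + eps)
    return float(-(p * np.log2(p + eps)).sum() / np.log2(len(p)))

def index_recording(samples, sr, chunk_seconds=60):
    """Summarise a long recording as one index value per chunk."""
    n = sr * chunk_seconds
    return [spectral_entropy(samples[i:i + n])
            for i in range(0, len(samples) - n + 1, n)]

# Example: 10 minutes of synthetic audio at 22.05 kHz.
sr = 22050
audio = np.random.default_rng(2).normal(size=sr * 600)
print(index_recording(audio, sr)[:3])
```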

Relevance: 30.00%

Abstract:

Introduction: This study investigated the sensitivity of calculated stereotactic radiotherapy and radiosurgery doses to the accuracy of the beam data used by the treatment planning system.

Methods: Two sets of field output factors were acquired using fields smaller than approximately 1 cm², for inclusion in beam data used by the iPlan treatment planning system (Brainlab, Feldkirchen, Germany). One set of output factors was measured using an Exradin A16 ion chamber (Standard Imaging, Middleton, USA). Although this chamber has a relatively small collecting volume (0.007 cm³), measurements made in small fields using this chamber are subject to the effects of volume averaging, electronic disequilibrium and chamber perturbations. The second, more accurate, set of measurements was obtained by applying perturbation correction factors, calculated using Monte Carlo simulations according to a method recommended by Cranmer-Sargison et al. [1], to measurements made using a 60017 unshielded electron diode (PTW, Freiburg, Germany). A series of 12 sample patient treatments was used to investigate the effects of beam data accuracy on the resulting planned dose. These treatments, which involved 135 fields, were planned for delivery via static conformal arcs and 3DCRT techniques, to targets ranging from prostates (up to 8 cm across) to meningiomas (usually more than 2 cm across) to arteriovenous malformations, acoustic neuromas and brain metastases (often less than 2 cm across). Isocentre doses were calculated for all of these fields using iPlan, and the results of using the two different sets of beam data were evaluated.

Results: While the isocentre doses for many fields are identical (difference = 0.0%), there is a general trend for the doses calculated using the data obtained from corrected diode measurements to exceed the doses calculated using the less accurate Exradin ion chamber measurements (difference > 0.0%). There are several alarming outliers (circled in Fig. 1) where doses differ by more than 3%, in beams from sample treatments planned for volumes up to 2 cm across.

Discussion and conclusions: These results demonstrate that treatment planning dose calculations for SRT/SRS treatments can be substantially affected when beam data for fields smaller than approximately 1 cm² are measured inaccurately, even when treatment volumes are up to 2 cm across.
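
As a hedged numerical sketch of how corrected small-field output factors are formed (all readings and correction factors below are hypothetical, not this study's measured data): each raw diode reading is multiplied by its Monte Carlo perturbation correction factor before taking the ratio to the reference field.

```python
# Raw diode readings M and Monte Carlo correction factors k per field,
# relative to a reference field. All values are hypothetical.
readings  = {"0.5x0.5": 0.612, "1x1": 0.781, "10x10": 1.000}
k_factors = {"0.5x0.5": 0.962, "1x1": 0.988, "10x10": 1.000}

def output_factor(field, ref="10x10"):
    """Corrected output factor: ratio of the corrected reading in the
    small field to the corrected reading in the reference field."""
    return (readings[field] * k_factors[field]) / (readings[ref] * k_factors[ref])

for f in ("0.5x0.5", "1x1"):
    print(f, round(output_factor(f), 3))
```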

Relevance: 30.00%

Abstract:

This paper first presents the benefits and critical challenges of using Bluetooth and Wi-Fi for crowd data collection and monitoring. The major challenges include antenna characteristics, the complexity of the environment, and scanning features. Wi-Fi and Bluetooth are compared in terms of architecture, discovery time, popularity of use and signal strength. The type of antenna used and the complexity of the environment, such as trees in outdoor spaces and partitions in indoor spaces, strongly affect the scanning range. The aforementioned challenges are empirically evaluated through real-world experiments using Bluetooth and Wi-Fi scanners. Issues related to antenna characteristics are also highlighted by experimenting with different antenna types. Novel scanning approaches, including Overlapped Zones and Single Point Multi-Range detection methods, are then presented and verified by real-world tests. These techniques are applied to location identification of the captured MAC IDs, extracting more information about people-movement dynamics.
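
As a toy illustration of the Overlapped Zones idea (the scanner layout and zone names are invented, and the real method also exploits signal strength and multiple ranges), the subset of scanners that detect a MAC address in a scan window narrows the device down to the intersection of their coverage areas:

```python
# Hypothetical coverage map: which zones each scanner can "hear" into.
SCANNER_ZONES = {
    "S1": {"entrance", "foyer"},
    "S2": {"foyer", "platform"},
    "S3": {"platform"},
}

def locate(detections):
    """detections: set of scanner ids that saw the MAC in one scan window.
    Returns the candidate zones consistent with all detections."""
    zones = None
    for scanner in detections:
        cover = SCANNER_ZONES[scanner]
        zones = cover if zones is None else zones & cover
    return zones or set()

print(locate({"S1", "S2"}))  # {'foyer'}: the only zone both scanners cover
```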

Relevance: 30.00%

Abstract:

Existing techniques for automated discovery of process models from event logs largely focus on extracting flat process models. In other words, they fail to exploit the notion of subprocess, as well as structured error handling and repetition constructs provided by contemporary process modeling notations, such as the Business Process Model and Notation (BPMN). This paper presents a technique for automated discovery of BPMN models containing subprocesses, interrupting and non-interrupting boundary events, and loop and multi-instance markers. The technique analyzes dependencies between data attributes associated with events, in order to identify subprocesses and to extract their associated logs. Parent process and subprocess models are then discovered separately using existing techniques for flat process model discovery. Finally, the resulting models and logs are heuristically analyzed in order to identify boundary events and markers. A validation with one synthetic and two real-life logs shows that process models derived using the proposed technique are more accurate and less complex than those derived with flat process model discovery techniques.
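
A hedged sketch of the log-projection step, assuming (as the paper's technique establishes via attribute-dependency analysis) that subprocess events can be keyed by an attribute identifying the subprocess instance; the attribute names here are invented:

```python
from collections import defaultdict

def split_log(events, parent_key="case_id", sub_key="subprocess_id"):
    """Project one event log into a parent log and a subprocess log, each
    of which can then be mined with an ordinary flat-discovery algorithm."""
    parent_log, sub_log = defaultdict(list), defaultdict(list)
    for e in events:
        if e.get(sub_key) is None:
            parent_log[e[parent_key]].append(e["activity"])
        else:
            sub_log[e[sub_key]].append(e["activity"])
    return parent_log, sub_log

events = [
    {"case_id": 1, "subprocess_id": None,  "activity": "Receive order"},
    {"case_id": 1, "subprocess_id": "1-a", "activity": "Check item"},
    {"case_id": 1, "subprocess_id": "1-a", "activity": "Pack item"},
    {"case_id": 1, "subprocess_id": None,  "activity": "Ship order"},
]
parent, sub = split_log(events)
print(dict(parent), dict(sub))
```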

Relevance: 30.00%

Abstract:

Data associated with germplasm collections are typically large and multivariate, with a considerable number of descriptors measured on each of many accessions. The pattern analysis methods of clustering and ordination have been identified as techniques for statistically evaluating the available diversity in germplasm data. While used in many studies, these approaches have not dealt explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions). To consider the application of these techniques to germplasm evaluation data, 11328 accessions of groundnut (Arachis hypogaea L.) from the International Crops Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the rainy and post-rainy growing seasons were used. The ordination technique of principal component analysis was used to reduce the dimensionality of the germplasm data. Identifying phenotypically similar groups of accessions within such large-scale data via computationally intensive hierarchical clustering techniques was not feasible, so non-hierarchical techniques had to be used. Finite mixture models that maximise the likelihood of an accession belonging to a cluster were used to cluster the accessions in this collection. The patterns of response for the different growing seasons were found to be highly correlated. However, when the results were related to passport and other characterisation and evaluation descriptors, the observed patterns did not appear to be related to taxonomy or any other well-known characteristics of groundnut.
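
A minimal sketch of the two analysis steps named above, using standard tools on synthetic stand-in data (the descriptor values are not the real germplasm data): PCA for dimensionality reduction, then a finite Gaussian mixture model fitted by maximum likelihood for non-hierarchical clustering.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = rng.normal(size=(11328, 9))   # accessions x quantitative descriptors

# Step 1: ordination by PCA to reduce dimensionality.
scores = PCA(n_components=4).fit_transform(X)

# Step 2: finite mixture model clustering by maximum likelihood;
# the number of clusters here is an arbitrary illustrative choice.
gmm = GaussianMixture(n_components=5, random_state=0).fit(scores)
labels = gmm.predict(scores)      # cluster membership per accession
print(np.bincount(labels))
```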

Relevance: 30.00%

Abstract:

As a sequel to a paper that dealt with the analysis of two-way quantitative data in large germplasm collections, this paper presents analytical methods appropriate for two-way data matrices consisting of mixed data types, namely ordered multicategory and quantitative data. While various pattern analysis techniques have been identified as suitable for analysing the mixed data types that occur in germplasm collections, the clustering and ordination methods used often cannot deal explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions) with incomplete information. However, it is shown that the ordination technique of principal component analysis and the mixture maximum likelihood method of clustering can be employed to achieve such analyses. Germplasm evaluation data for 11436 accessions of groundnut (Arachis hypogaea L.) from the International Crops Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the post-rainy season and five ordered multicategory descriptors were used. Pattern analysis results generally indicated that the accessions could be distinguished into four regions along the continuum of growth habit (or plant erectness). Interpretation of accession membership in these regions was found to be consistent with taxonomic information, such as subspecies. Each growth habit region contained accessions from three of the most common groundnut botanical varieties, which implies that within each of the habit types there is the full range of expression for the other descriptors used in the analysis. Using these types of insights, the patterns of variability in germplasm collections can provide scientists with valuable information for their plant improvement programs.

Relevance: 30.00%

Abstract:

Data in germplasm collections contain a mixture of data types; binary, multistate and quantitative. Given the multivariate nature of these data, the pattern analysis methods of classification and ordination have been identified as suitable techniques for statistically evaluating the available diversity. The proximity (or resemblance) measure, which is in part the basis of the complementary nature of classification and ordination techniques, is often specific to particular data types. The use of a combined resemblance matrix has an advantage over data type specific proximity measures. This measure accommodates the different data types without manipulating them to be of a specific type. Descriptors are partitioned into their data types and an appropriate proximity measure is used on each. The separate proximity matrices, after range standardisation, are added as a weighted average and the combined resemblance matrix is then used for classification and ordination. Germplasm evaluation data for 831 accessions of groundnut (Arachis hypogaea L.) from the Australian Tropical Field Crops Genetic Resource Centre, Biloela, Queensland were examined. Data for four binary, five ordered multistate and seven quantitative descriptors have been documented. The interpretative value of different weightings - equal and unequal weighting of data types to obtain a combined resemblance matrix - was investigated by using principal co-ordinate analysis (ordination) and hierarchical cluster analysis. Equal weighting of data types was found to be more valuable for these data as the results provided a greater insight into the patterns of variability available in the Australian groundnut germplasm collection. The complementary nature of pattern analysis techniques enables plant breeders to identify relevant accessions in relation to the descriptors which distinguish amongst them. This additional information may provide plant breeders with a more defined entry point into the germplasm collection for identifying sources of variability for their plant improvement program, thus improving the utilisation of germplasm resources.
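
A hedged sketch of building a combined resemblance matrix: descriptors are partitioned by data type, a type-appropriate proximity measure is computed for each partition, each matrix is range-standardised, and the matrices are combined as a weighted average. The metrics and data below are illustrative choices, not necessarily those documented for the Biloela collection.

```python
import numpy as np

def combined_resemblance(binary, multistate, quantitative, weights=(1.0, 1.0, 1.0)):
    """One distance matrix per data type, each range-standardised to
    [0, 1], combined as a weighted average."""
    def pairwise(X, metric):
        n = len(X)
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                D[i, j] = D[j, i] = metric(X[i], X[j])
        top = D.max()
        return D / top if top > 0 else D  # range standardisation

    mats = [
        pairwise(binary, lambda a, b: float(np.mean(a != b))),            # simple mismatch
        pairwise(multistate, lambda a, b: float(np.mean(np.abs(a - b)))), # ordered categories
        pairwise(quantitative, lambda a, b: float(np.linalg.norm(a - b))),# Euclidean
    ]
    w = np.asarray(weights, dtype=float)
    return sum(wi * D for wi, D in zip(w, mats)) / w.sum()

rng = np.random.default_rng(4)
n = 100  # 831 accessions in the collection; fewer here for brevity
D = combined_resemblance(rng.integers(0, 2, size=(n, 4)),
                         rng.integers(0, 5, size=(n, 5)),
                         rng.normal(size=(n, 7)))
print(D.shape)  # the input to both clustering and ordination
```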

Relevance: 30.00%

Abstract:

The use of Wireless Sensor Networks (WSNs) for vibration-based Structural Health Monitoring (SHM) has become a promising approach due to advantages such as low cost and fast, flexible deployment. However, inherent technical issues such as data asynchronicity and data loss have prevented these systems from being extensively used. Recently, several SHM-oriented WSNs have been proposed and are believed to overcome a large number of technical uncertainties. Nevertheless, there is limited research verifying the applicability of those WSNs with respect to demanding SHM applications such as modal analysis and damage identification. Based on a brief review, this paper first shows that Data Synchronization Error (DSE) is the most inherent factor amongst the uncertainties of SHM-oriented WSNs. The effects of this factor are then investigated on the outcomes and performance of the most robust Output-only Modal Analysis (OMA) techniques when merging data from multiple sensor setups. The two OMA families selected for this investigation are Frequency Domain Decomposition (FDD) and data-driven Stochastic Subspace Identification (SSI-data), as both have been widely applied in the past decade. Accelerations collected by a wired sensory system on a large-scale laboratory bridge model are used as benchmark data, after noise is added to account for the higher noise levels found in SHM-oriented WSNs. From this source, a large number of simulations were run to generate multiple DSE-corrupted datasets to facilitate statistical analyses. The results of this study show the robustness of FDD and the precautions needed for the SSI-data family when dealing with DSE at a relaxed level. Finally, the combination of preferred OMA techniques, and the use of channel projection for the time-domain OMA technique, are recommended to cope with DSE.
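
As a hedged illustration of how DSE-corrupted datasets can be generated from benchmark data (the paper's actual simulation setup is not reproduced here), the sketch below adds measurement noise and then delays one channel by a few samples:

```python
import numpy as np

def apply_dse(signals, node, delay_samples):
    """Return a copy of a channels x samples array in which one node's
    record lags the others by `delay_samples` samples."""
    corrupted = signals.copy()
    corrupted[node] = np.roll(signals[node], delay_samples)
    corrupted[node, :delay_samples] = 0.0  # nothing recorded before the lag
    return corrupted

rng = np.random.default_rng(5)
clean = rng.normal(size=(4, 10000))                   # benchmark accelerations
noisy = clean + 0.05 * rng.normal(size=clean.shape)   # added sensor noise
dse_data = apply_dse(noisy, node=2, delay_samples=3)  # 3-sample sync error
print(dse_data.shape)
```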

Relevance: 30.00%

Abstract:

Self-reported health status measures are generally used to analyse Social Security Disability Insurance (SSDI) application and award decisions, as well as the relationship between the programme's generosity and labour force participation. Due to endogeneity and measurement error, the use of self-reported health and disability indicators as explanatory variables in economic models is problematic. We employ county-level aggregate data, instrumental variables and spatial econometric techniques to analyse the determinants of variation in SSDI rates, and explicitly account for the endogeneity and measurement error of the self-reported disability measure. Two surprising results are found. First, it is shown that measurement error is the dominant source of the bias, and that the main source of measurement error is sampling error. Second, the results suggest that there may be synergies in applying for SSDI when the disabled population is larger.
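
A minimal two-stage least squares sketch of the instrumental-variables idea on synthetic data (the instruments, variables, and magnitudes are invented, and the paper additionally uses spatial econometric techniques): instrumenting the error-ridden regressor recovers a slope close to the true value, where naive OLS would be attenuated by the measurement error.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    # Stage 1: project the (error-ridden) regressors onto the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(6)
n = 3000                                    # hypothetical counties
z = rng.normal(size=(n, 2))                 # instruments
x_true = z @ np.array([0.8, -0.5]) + rng.normal(size=n)
x_obs = x_true + rng.normal(size=n)         # measurement (sampling) error
y = 1.5 * x_true + rng.normal(size=n)       # outcome, e.g. an SSDI rate
X = np.column_stack([np.ones(n), x_obs])    # intercept + noisy regressor
Z = np.column_stack([np.ones(n), z])        # intercept + instruments
print(two_stage_least_squares(y, X, Z))     # slope estimate close to 1.5
```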

Relevance: 30.00%

Abstract:

This research proposes the development of interfaces to support collaborative, community-driven inquiry into data, which we refer to as Participatory Data Analytics. Since the investigation is led by local communities, it is not possible to anticipate which data will be relevant and what questions are going to be asked. Therefore, users have to be able to construct and tailor visualisations to their own needs. The poster presents early work towards defining a suitable compositional model, which will allow users to mix, match, and manipulate data sets to obtain visual representations with little-to-no programming knowledge. Following a user-centred design process, we are subsequently planning to identify appropriate interaction techniques and metaphors for generating such visual specifications on wall-sized, multi-touch displays.