12 resultados para data pre-processing

em University of Queensland eSpace - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from customer information system (CIS) can be used for NTL analysis. However, in order to accurately and efficiently perform NTL analysis, the original data from CIS need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing in order to extract the most relevant information for further analysis such as clustering and classifications. By removing irrelevant and redundant features, feature selection is an essential step in data mining process in finding optimal subset of features to improve the quality of result by giving faster time processing, higher accuracy and simpler results with fewer features. Detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Frequent Itemsets mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with computational problems. This paper shows another approach to further performance enhancements of frequent items sets computation. We have made a series of observations that led us to inventing data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirmed our general and formally presented observations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in three-dimensional (313) electron microscopy (EM) and image processing are providing considerable improvements in the resolution of subcellular volumes, macromolecular assemblies and individual proteins. However, the recovery of high-frequency information from biological samples is hindered by specimen sensitivity to beam damage. Low dose electron cryo-microscopy conditions afford reduced beam damage but typically yield images with reduced contrast and low signal-to-noise ratios (SNRs). Here, we describe the properties of a new discriminative bilateral (DBL) filter that is based upon the bilateral filter implementation of Jiang et al. (Jiang, W., Baker, M.L., Wu, Q., Bajaj, C., Chin, W., 2003. Applications of a bilateral denoising filter in biological electron microscopy. J. Struc. Biol. 128, 82-97.). In contrast to the latter, the DBL filter can distinguish between object edges and high-frequency noise pixels through the use of an additional photometric exclusion function. As a result, high frequency noise pixels are smoothed, yet object edge detail is preserved. In the present study, we show that the DBL filter effectively reduces noise in low SNR single particle data as well as cellular tomograms of stained plastic sections. The properties of the DBL filter are discussed in terms of its usefulness for single particle analysis and for pre-processing cellular tomograms ahead of image segmentation. (c) 2006 Elsevier Inc. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We are developing a telemedicine application which offers automated diagnosis of facial (Bell's) palsy through a Web service. We used a test data set of 43 images of facial palsy patients and 44 normal people to develop the automatic recognition algorithm. Three different image pre-processing methods were used. Machine learning techniques (support vector machine, SVM) were used to examine the difference between the two halves of the face. If there was a sufficient difference, then the SVM recognized facial palsy. Otherwise, if the halves were roughly symmetrical, the SVM classified the image as normal. It was found that the facial palsy images had a greater Hamming Distance than the normal images, indicating greater asymmetry. The median distance in the normal group was 331 (interquartile range 277-435) and the median distance in the facial palsy group was 509 (interquartile range 334-703). This difference was significant (P

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A k-NN query finds the k nearest-neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidian distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN queries, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than that of the point data. Efficient processing of k-NN queries based on the Euclidian distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need of costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A specialised reconfigurable architecture is targeted at wireless base-band processing. It is built to cater for multiple wireless standards. It has lower power consumption than the processor-based solution. It can be scaled to run in parallel for processing multiple channels. Test resources are embedded on the architecture and testing strategies are included. This architecture is functionally partitioned according to the common operations found in wireless standards, such as CRC error correction, convolution and interleaving. These modules are linked via Virtual Wire Hardware modules and route-through switch matrices. Data can be processed in any order through this interconnect structure. Virtual Wire ensures the same flexibility as normal interconnects, but the area occupied and the number of switches needed is reduced. The testing algorithm scans all possible paths within the interconnection network exhaustively and searches for faults in the processing modules. The testing algorithm starts by scanning the externally addressable memory space and testing the master controller. The controller then tests every switch in the route-through switch matrix by making loops from the shared memory to each of the switches. The local switch matrix is also tested in the same way. Next the local memory is scanned. Finally, pre-defined test vectors are loaded into local memory to check the processing modules. This paper compares various base-band processing solutions. It describes the proposed platform and its implementation. It outlines the test resources and algorithm. It concludes with the mapping of Bluetooth and GSM base-band onto the platform.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background: The results from previous studies have indicated that a pre-attentive component of the event-related potential (ERP), the mismatch negativity (MMN), may be an objective measure of the automatic auditory processing of phonemes and words. Aims: This article reviews the relationship between the MMN data and psycholinguistic models of spoken word processing, in order to determine whether the MMN may be used to objectively pinpoint spoken word processing deficits in individuals with aphasia. Main Contribution: This article outlines the ways in which the MMN data support psycholinguistic models currently used in the clinical management of aphasic individuals. Furthermore, the cell assembly model of the neurophysiological mechanisms underlying spoken word processing is discussed in relation to the MMN and psycholinguistic models. Conclusions: The MMN data support current theoretical psycholinguistic and neurophysiological models of spoken word processing. Future MMN studies that include normal and aphasic populations will further elucidate the role that the MMN may play in the clinical management of aphasic individuals.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, models integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction's scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (0.83r(2)) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of. vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area with higher resolution). A direct query, on the other side, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. It is conceptually challenging to support both types of query at the same time, as direct queries favour location-based data clustering, whereas progressive queries require fragmented data clustered by resolutions. Two new scaleless data structures are proposed in this paper. Experimental results using both synthetic and real world datasets demonstrate that the query processing time based on the new multiresolution approaches is comparable and often better than multi-representation data structures for both types of queries.