902 results for Large Data Sets


Relevance: 90.00%

Abstract:

The Reeb graph of a scalar function tracks the evolution of the topology of its level sets. This paper describes a fast algorithm to compute the Reeb graph of a piecewise-linear (PL) function defined over manifolds and non-manifolds. The key idea in the proposed approach is to maximally leverage the efficient contour tree algorithm to compute the Reeb graph. The algorithm proceeds by dividing the input into a set of subvolumes that have loop-free Reeb graphs, using the join tree of the scalar function, and then computes the Reeb graph by combining the contour trees of all the subvolumes. Since the key ingredient of this method is a series of union-find operations, the algorithm is fast in practice. Experimental results demonstrate that it outperforms current generic algorithms by up to two orders of magnitude and performs on par with algorithms tailored to restricted classes of input. The algorithm also extends to handle large data sets that do not fit in memory.
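Since the core of the method is a series of union-find operations, here is a minimal sketch of a disjoint-set (union-find) structure with path compression and union by rank; the `merge_arcs` driver and its inputs are hypothetical illustrations, not the paper's actual data structures.

```python
class UnionFind:
    """Disjoint-set structure with path compression and union by rank."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


# Hypothetical usage: merge arcs shared between adjacent subvolume contour trees.
def merge_arcs(num_arcs, shared_arc_pairs):
    uf = UnionFind(num_arcs)
    for a, b in shared_arc_pairs:
        uf.union(a, b)
    return uf
```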

Relevance: 90.00%

Abstract:

In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem which can be solved in principle by a polynomial-time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient-type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditsky and Nemirovski (2011). It achieves an O(1/T^2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.
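To make the kernel uncertainty model concrete, the sketch below forms an affine combination of positive semidefinite kernels and projects the coefficients onto a norm-bounded ball around nominal weights; the function names, the Euclidean ball, and the clipping step are illustrative assumptions, not the paper's formulation or solver.

```python
import numpy as np

def combined_kernel(kernels, theta):
    """Affine combination K(theta) = sum_i theta_i * K_i of PSD kernel matrices."""
    return sum(t * K for t, K in zip(theta, kernels))

def project_to_uncertainty_set(theta, theta_nominal, radius):
    """Project coefficients onto a Euclidean ball of given radius around the
    nominal weights, then clip to keep every coefficient nonnegative so the
    combination stays positive semidefinite."""
    delta = theta - theta_nominal
    norm = np.linalg.norm(delta)
    if norm > radius:
        delta *= radius / norm
    return np.clip(theta_nominal + delta, 0.0, None)

# Hypothetical usage with three base kernels computed on the same data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
kernels = [
    X @ X.T,                                                        # linear kernel
    np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)),  # RBF kernel
    np.ones((50, 50)),                                              # constant kernel
]
theta_nominal = np.array([0.5, 0.4, 0.1])
theta = project_to_uncertainty_set(np.array([0.9, 0.2, 0.4]), theta_nominal, radius=0.3)
K = combined_kernel(kernels, theta)
```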

Relevance: 90.00%

Abstract:

Future space-based gravity wave (GW) experiments such as the Big Bang Observatory (BBO), with their excellent projected one-sigma angular resolution, will measure the luminosity distance to a large number of GW sources to high precision, and the redshifts of the single galaxies in the narrow solid angles towards the sources will provide the redshifts of the gravity wave sources. One-sigma BBO beams contain the actual source in only 68% of the cases; the beams that do not contain the source may contain a spurious single galaxy, leading to misidentification. To increase the probability of the source falling within the beam, larger beams have to be considered, decreasing the chances of finding single galaxies in the beams. Saini et al. [T.D. Saini, S.K. Sethi, and V. Sahni, Phys. Rev. D 81, 103009 (2010)] argued, largely analytically, that identifying even a small number of GW source galaxies furnishes a rough distance-redshift relation, which could be used to further resolve sources that have multiple objects in the angular beam. In this work we further develop this idea by introducing a self-calibrating iterative scheme which works in conjunction with Monte Carlo simulations to determine the luminosity distance to GW sources with progressively greater accuracy. This iterative scheme allows one to determine the equation of state of dark energy to within an accuracy of a few percent for a gravity wave experiment possessing a beam width an order of magnitude larger than BBO (and therefore having a far poorer angular resolution). This is achieved with no prior information about the nature of dark energy from other data sets such as type Ia supernovae, baryon acoustic oscillations, cosmic microwave background, etc. DOI: 10.1103/PhysRevD.87.083001
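As a rough illustration of such a self-calibrating loop, the toy sketch below starts from beams containing a single candidate galaxy, fits a crude distance-redshift relation, and uses it to pick the most consistent candidate in multi-galaxy beams before refitting; the data structures, the quadratic fit, and the selection rule are assumptions for illustration, not the paper's Monte Carlo pipeline.

```python
import numpy as np

def iterative_self_calibration(beams, n_iter=5):
    """Toy self-calibrating loop. Each beam is a dict with a measured luminosity
    distance 'd_L' and an array 'z_candidates' of candidate galaxy redshifts.
    Beams with exactly one candidate anchor an initial distance-redshift fit."""
    z = np.array([b['z_candidates'][0] for b in beams if len(b['z_candidates']) == 1])
    d = np.array([b['d_L'] for b in beams if len(b['z_candidates']) == 1])
    for _ in range(n_iter):
        # Fit a rough d_L(z) relation (quadratic in z, purely as a toy model).
        coeffs = np.polyfit(z, d, deg=2)
        z_list, d_list = [], []
        for b in beams:
            # Pick the candidate whose predicted distance best matches the
            # measured luminosity distance of the GW source.
            pred = np.polyval(coeffs, b['z_candidates'])
            k = int(np.argmin(np.abs(pred - b['d_L'])))
            z_list.append(b['z_candidates'][k])
            d_list.append(b['d_L'])
        z, d = np.array(z_list), np.array(d_list)
    return np.polyfit(z, d, deg=2)  # final distance-redshift relation
```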

Relevance: 90.00%

Abstract:

The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm produces a clustering of the given data; this is followed by a ranking phase that determines the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure of the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
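The sketch below illustrates the two ranking ingredients mentioned above, attribute value frequencies and the clustering structure; the scoring formulas and the simple averaged combination are illustrative assumptions, not the paper's definitions.

```python
from collections import Counter
import numpy as np

def attribute_frequency_scores(data):
    """Score each record by how rare its attribute values are (rarer means more
    outlying). `data` is a list of equal-length tuples of categorical values."""
    n_attrs = len(data[0])
    counts = [Counter(row[j] for row in data) for j in range(n_attrs)]
    n = len(data)
    return np.array([sum(1.0 - counts[j][row[j]] / n for j in range(n_attrs))
                     for row in data])

def cluster_size_scores(labels):
    """Score each record by the (inverse) size of its cluster: members of tiny
    clusters are more likely to be outliers."""
    sizes = Counter(labels)
    n = len(labels)
    return np.array([1.0 - sizes[lab] / n for lab in labels])

def rank_outliers(data, labels, top_k=10):
    """Combine the two independent rankings (toy combination: simple average)
    and return the indices of the most likely outliers."""
    score = 0.5 * attribute_frequency_scores(data) + 0.5 * cluster_size_scores(labels)
    return np.argsort(-score)[:top_k]
```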

Relevance: 90.00%

Abstract:

Outlier detection in high-dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is still being actively explored. As outlier detection is generally considered an unsupervised learning problem due to the lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and entropy computation. The redundancy among the features is characterized using the mutual information measure to identify a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
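A minimal sketch of entropy and mutual information computations on categorical columns, together with a greedy low-redundancy selection loop; the greedy criterion is an assumption for illustration, not the paper's selection algorithm.

```python
from collections import Counter
import numpy as np

def entropy(col):
    """Shannon entropy of one categorical column (list of hashable values)."""
    n = len(col)
    return -sum((c / n) * np.log2(c / n) for c in Counter(col).values())

def mutual_information(a, b):
    """MI(a; b) = H(a) + H(b) - H(a, b) for two categorical columns."""
    joint = list(zip(a, b))
    return entropy(a) + entropy(b) - entropy(joint)

def select_low_redundancy_features(X, k):
    """Greedy sketch: repeatedly pick the feature with the smallest average
    mutual information against the already selected features."""
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    # Seed with the highest-entropy feature.
    selected = [max(range(n_features), key=lambda j: entropy(cols[j]))]
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        scores = {j: np.mean([mutual_information(cols[j], cols[s]) for s in selected])
                  for j in remaining}
        selected.append(min(scores, key=scores.get))
    return selected
```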

Relevance: 90.00%

Abstract:

Evaporative fraction (EF) is a measure of the amount of available energy at the Earth's surface that is partitioned into latent heat flux. Currently operational thermal sensors such as the Moderate Resolution Imaging Spectroradiometer (MODIS) on satellite platforms provide data only at 1000 m, which constrains the spatial resolution of EF estimates. A simple model (disaggregation of evaporative fraction, DEFrac), based on the observed relationship between EF and the normalized difference vegetation index, is proposed to spatially disaggregate EF. The DEFrac model was tested with EF estimated from the triangle method using 113 clear-sky data sets from the MODIS sensors aboard the Terra and Aqua satellites. Validation was done using the Bowen ratio energy balance method with data from four micrometeorological tower sites across varied agro-climatic zones with different land cover conditions in India. The root-mean-square error (RMSE) of EF estimated at 1000 m resolution using the triangle method was 0.09 for all four sites put together. The RMSE of DEFrac-disaggregated EF was 0.09 at 250 m resolution. Two models of input disaggregation were also tried, with thermal data sharpened using the two thermal sharpening models DisTrad and TsHARP. The RMSE of disaggregated EF was 0.14 for both input disaggregation models at 250 m resolution. Moreover, spatial analysis of the disaggregation was performed using Landsat-7 Enhanced Thematic Mapper (ETM+) data over four grids in India for contrasting seasons. It was observed that the DEFrac model performed better than the input disaggregation models under cropped conditions, while they were marginally similar under non-cropped conditions.
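A toy sketch of the disaggregation idea, fitting an EF-NDVI relationship at the coarse resolution and applying it to fine-resolution NDVI; the linear form and the clipping step are illustrative assumptions, not the DEFrac model itself.

```python
import numpy as np

def disaggregate_ef(ef_coarse, ndvi_coarse, ndvi_fine):
    """Toy DEFrac-style disaggregation: fit a linear EF-NDVI relationship at the
    coarse (e.g. 1000 m) resolution and apply it to the fine (e.g. 250 m) NDVI.
    The linear form is an illustrative assumption."""
    valid = np.isfinite(ef_coarse) & np.isfinite(ndvi_coarse)
    slope, intercept = np.polyfit(ndvi_coarse[valid].ravel(),
                                  ef_coarse[valid].ravel(), deg=1)
    ef_fine = slope * ndvi_fine + intercept
    return np.clip(ef_fine, 0.0, 1.0)  # EF is a fraction of available energy
```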

Relevance: 90.00%

Abstract:

Purpose: To extend the previously developed temporally constrained reconstruction (TCR) algorithm to allow for real-time availability of three-dimensional (3D) temperature maps capable of monitoring MR-guided high intensity focused ultrasound applications. Methods: A real-time TCR (RT-TCR) algorithm is developed that only uses current and previously acquired undersampled k-space data from a 3D segmented EPI pulse sequence, with the image reconstruction done in a graphics processing unit implementation to overcome the computational burden. Simulated and experimental data sets of HIFU heating are used to evaluate the performance of the RT-TCR algorithm. Results: The simulation studies demonstrate that the RT-TCR algorithm has subsecond reconstruction time and can accurately measure HIFU-induced temperature rises of 20 degrees C in 15 s for 3D volumes of 16 slices (RMSE = 0.1 degrees C), 24 slices (RMSE = 0.2 degrees C), and 32 slices (RMSE = 0.3 degrees C). Experimental results in ex vivo porcine muscle demonstrate that the RT-TCR approach can reconstruct temperature maps with 192 x 162 x 66 mm 3D volume coverage, 1.5 x 1.5 x 3.0 mm resolution, and 1.2-s scan time with an accuracy of 0.5 degrees C. Conclusion: The RT-TCR algorithm offers an approach to obtaining large-coverage 3D temperature maps in real time for monitoring MR-guided high intensity focused ultrasound treatments. Magn Reson Med 71:1394-1404, 2014. (c) 2013 Wiley Periodicals, Inc.
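As a rough illustration of a temporally constrained reconstruction step, the sketch below reconstructs one frame from undersampled k-space by minimizing a data-fidelity term plus a temporal penalty against the previous frame; the operators, weights, and plain gradient descent are assumptions for illustration, not the published GPU implementation.

```python
import numpy as np

def rt_tcr_step(x_prev, kspace_data, mask, lam=0.1, step=0.5, n_grad=20):
    """Toy temporally constrained reconstruction for one time frame: minimize
    || mask * FFT(x) - d ||^2 + lam * || x - x_prev ||^2 by gradient descent,
    warm-started from the previous frame.

    x_prev:      previous reconstructed image (2D complex array)
    kspace_data: undersampled, mask-applied k-space data for the current frame
    mask:        binary sampling mask, same shape as the k-space data
    """
    x = x_prev.copy()
    for _ in range(n_grad):
        residual = mask * (np.fft.fft2(x) - kspace_data)   # data-fidelity residual
        grad = np.fft.ifft2(mask * residual) + lam * (x - x_prev)
        x = x - step * grad
    return x
```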

Relevance: 90.00%

Abstract:

Classification of pharmacologic activity of a chemical compound is an essential step in any drug discovery process. We develop two new atom-centered fragment descriptors (vertex indices): one based solely on topological considerations without discriminating atom or bond types, and another based on topological and electronic features. We also assess their usefulness by devising a method to rank and classify molecules with regard to their antibacterial activity. The classification performance of our method is found to be superior to that of two previous studies on large heterogeneous data sets for hit-finding and hit-to-lead studies, even though we use far fewer parameters. It is found that for hit-finding studies topological features (simple graph) alone provide significant discriminating power, and for the hit-to-lead process a small but consistent improvement can be made by additionally including electronic features (colored graph). Our approach is simple, interpretable, and suitable for the design of molecules, as we do not use any physicochemical properties. The singular use of the vertex index as descriptor, the novel range-based feature extraction, and rigorous statistical validation are the key elements of this study.
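To make the notion of an atom-centered topological vertex index concrete, the sketch below computes a simple degree-based neighborhood sum from a molecular adjacency matrix; the particular formula and distance weighting are illustrative assumptions, not the descriptors proposed in the paper.

```python
import numpy as np

def topological_vertex_index(adjacency, atom_idx, radius=2):
    """Toy atom-centered topological descriptor: sum of vertex degrees within a
    given bond radius of the atom, weighted by 1/2^distance. The exact formula
    is an illustrative assumption, not the paper's descriptor."""
    adjacency = np.asarray(adjacency)
    degrees = adjacency.sum(axis=1)
    # Breadth-first search out to `radius` bonds from the central atom.
    dist = {atom_idx: 0}
    frontier = [atom_idx]
    for d in range(1, radius + 1):
        nxt = []
        for u in frontier:
            for v in np.nonzero(adjacency[u])[0]:
                if v not in dist:
                    dist[v] = d
                    nxt.append(v)
        frontier = nxt
    return sum(degrees[v] / 2 ** dist[v] for v in dist)

# Hypothetical usage on a 4-atom chain A-B-C-D, centered on atom B.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(topological_vertex_index(chain, atom_idx=1))
```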

Relevance: 90.00%

Abstract:

The paper describes an algorithm for multi-label classification. Since a pattern can belong to more than one class, classifying a test pattern is a challenging task. We propose a new multi-label classification algorithm that works for discrete data. We have implemented the algorithm and present results for different multi-label data sets. The results have been compared with those of the multi-label KNN algorithm (ML-KNN) and found to be good.

Relevance: 90.00%

Abstract:

Simulated boundary potential data for Electrical Impedance Tomography (EIT) are generated by a MATLAB-based EIT data generator, and the resistivity reconstruction is evaluated with the Electrical Impedance Tomography and Diffuse Optical Tomography Reconstruction Software (EIDORS). Circular domains containing subdomains as inhomogeneities are defined in the MATLAB-based EIT data generator, and the boundary data are calculated by a constant current simulation with the opposite current injection (OCI) method. The resistivity images reconstructed from different boundary data sets are analyzed with image parameters to evaluate the reconstruction.
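A minimal sketch of how opposite current injection drive pairs might be enumerated for a circular electrode array; the function and the 16-electrode example are purely illustrative assumptions, not the MATLAB data generator's code.

```python
def opposite_current_injection_pairs(n_electrodes=16):
    """Enumerate source/sink electrode pairs for opposite current injection on a
    circular array: each electrode is driven against the one diametrically
    opposite it. Illustrative sketch only."""
    half = n_electrodes // 2
    return [(i, (i + half) % n_electrodes) for i in range(n_electrodes)]

# Hypothetical usage for a 16-electrode EIT ring.
for src, sink in opposite_current_injection_pairs(16):
    pass  # inject +I at `src`, -I at `sink`, record boundary potentials
```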

Relevance: 90.00%

Abstract:

NrichD (http://proline.biochem.iisc.ernet.in/NRICHD/) is a database of computationally designed protein-like sequences, augmented into natural sequence databases that can perform hops in protein sequence space to assist in the detection of remote relationships. Establishing protein relationships in the absence of structural evidence or natural `intermediately related sequences' is a challenging task. Recently, we have demonstrated that the computational design of artificial intermediary sequences/linkers is an effective approach to fill naturally occurring voids in protein sequence space. Through a large-scale assessment we have demonstrated that such sequences can be plugged into commonly employed search databases to improve the performance of routinely used sequence search methods in detecting remote relationships. Since it is anticipated that such data sets will be employed to establish protein relationships, two databases that have already captured these relationships at the structural and functional domain level, namely, the SCOP database and the Pfam database, have been `enriched' with these artificial intermediary sequences. NrichD database currently contains 3 611 010 artificial sequences that have been generated between 27 882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user.

Relevance: 90.00%

Abstract:

Large variations in human actions lead to major challenges in computer vision research. Several algorithms have been designed to address these challenges. Algorithms that stand apart help to solve the challenge while also performing in a faster and more efficient manner. In this paper, we propose a human-cognition-inspired projection based learning approach for person-independent human action recognition in the H.264/AVC compressed domain and demonstrate a PBL-McRBFN based approach that helps take machine learning algorithms to the next level. Here, we use a gradient-image-based feature extraction process in which the motion vectors and quantization parameters are extracted and studied temporally to form several Groups of Pictures (GoPs). Each GoP is then considered individually for two different benchmark data sets, and the results are classified using person-independent human action recognition. The functional relationship is studied using the Projection Based Learning algorithm of the Meta-cognitive Radial Basis Function Network (PBL-McRBFN), which has a cognitive and a meta-cognitive component. The cognitive component is a radial basis function network, while the Meta-Cognitive Component (MCC) employs self-regulation. The MCC emulates human-cognition-like learning to achieve better performance. The proposed approach can handle sparse information in the compressed video domain and provides higher accuracy than pixel-domain counterparts. The feature extraction process achieved more than 90% accuracy using the PBL-McRBFN, which contributes to the speed of the proposed high-speed action recognition algorithm. We have conducted twenty random trials to assess the performance per GoP. The results are also compared with other well-known classifiers in the machine learning literature.

Relevance: 90.00%

Abstract:

The ~700-km-long "central seismic gap" is the most prominent segment of the Himalayan front not to have ruptured in a major earthquake during the last 200-500 yr. This prolonged seismic quiescence has led to the proposition that this region, with a population >10 million, is overdue for a great earthquake. Despite the region's recognized seismic risk, the geometry of faults likely to host large earthquakes remains poorly understood. Here, we place new constraints on the spatial distribution of rock uplift within the western ~400 km of the central seismic gap using topographic and river profile analyses together with basinwide erosion rate estimates from cosmogenic Be-10. The data sets show a distinctive physiographic transition at the base of the high Himalaya in the state of Uttarakhand, India, characterized by abrupt strike-normal increases in channel steepness and a tenfold increase in erosion rates. When combined with previously published geophysical imaging and seismicity data sets, the observed spatial distribution of erosion rates and channel steepness is interpreted to reflect the landscape response to spatially variable rock uplift due to a structurally coherent ramp-flat system of the Main Himalayan Thrust. Although it remains unresolved whether the kinematics of the Main Himalayan Thrust ramp involve an emergent fault or duplex, the landscape and erosion rate patterns suggest that the decollement beneath the state of Uttarakhand provides a sufficiently large and coherent fault segment capable of hosting a great earthquake.

Relevance: 90.00%

Abstract:

In structured output learning, obtaining labeled data for real-world applications is usually costly, while unlabeled examples are available in abundance. Semisupervised structured classification deals with a small number of labeled examples and a large number of unlabeled structured data. In this work, we consider semisupervised structural support vector machines with domain constraints. The optimization problem, which in general is not convex, contains the loss terms associated with the labeled and unlabeled examples, along with the domain constraints. We propose a simple optimization approach that alternates between solving a supervised learning problem and a constraint matching problem. Solving the constraint matching problem is difficult for structured prediction, and we propose an efficient and effective label switching method to solve it. The alternating optimization is carried out within a deterministic annealing framework, which helps in effective constraint matching and in avoiding poor local minima. The algorithm is simple and easy to implement, and it is suitable for any structured output learning problem where exact inference is available. Experiments on benchmark sequence labeling data sets and a natural language parsing data set show that the proposed approach, though simple, achieves comparable generalization performance.
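The loop structure below is a schematic of the alternating scheme described above (a supervised training step, then a constraint-matching label re-assignment, inside a deterministic annealing loop); the callables, the annealing schedule, and the simple swap-based matcher are assumptions for illustration, not the paper's optimization procedure.

```python
def alternating_optimization(labeled, unlabeled, label_set,
                             train_supervised, infer_labels, constraint_score,
                             temperatures=(10.0, 3.0, 1.0, 0.3)):
    """Schematic alternating scheme: at each annealing temperature,
    (1) train a supervised structural model on the labeled data plus current
    guesses for the unlabeled data, then (2) re-assign each unlabeled labeling
    by a label-switching pass that improves agreement with domain constraints.
    All callables and the schedule are hypothetical placeholders."""
    guesses = [infer_labels(None, x) for x in unlabeled]      # initial labelings
    model = None
    for temperature in temperatures:                          # deterministic annealing
        model = train_supervised(labeled, list(zip(unlabeled, guesses)), temperature)
        for idx, x in enumerate(unlabeled):
            current = list(infer_labels(model, x))
            improved = True
            while improved:                                   # label switching
                improved = False
                for i in range(len(current)):
                    for alt in label_set:
                        candidate = current[:i] + [alt] + current[i + 1:]
                        if constraint_score(candidate, x) > constraint_score(current, x):
                            current, improved = candidate, True
            guesses[idx] = current
    return model
```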

Relevance: 90.00%

Abstract:

Large quantities of teleseismic short-period seismograms recorded at SCARLET provide travel time, apparent velocity, and waveform data for the study of upper mantle compressional velocity structure. Relative array analysis of arrival times from distant (30° < Δ < 95°) earthquakes at all azimuths constrains lateral velocity variations beneath southern California. We compare dT/dΔ, back azimuth, and averaged arrival time estimates from the entire network for 154 events to the same parameters derived from small subsets of SCARLET. Patterns of mislocation vectors for over 100 overlapping subarrays delimit the spatial extent of an east-west striking, high-velocity anomaly beneath the Transverse Ranges. Thin lens analysis of the averaged arrival time differences, called 'net delay' data, requires the mean depth of the corresponding lens to be more than 100 km. Our results are consistent with the PKP-delay times of Hadley and Kanamori (1977), who first proposed the high-velocity feature, but we place the anomalous material at substantially greater depths than their 40-100 km estimate.

Detailed analysis of travel time, ray parameter and waveform data from 29 events occurring in the distance range 9° to 40° reveals the upper mantle structure beneath an oceanic ridge to depths of over 900 km. More than 1400 digital seismograms from earthquakes in Mexico and Central America yield 1753 travel times and 58 dT/dΔ measurements as well as high-quality, stable waveforms for investigation of the deep structure of the Gulf of California. The result of a travel time inversion with the tau method (Bessonova et al., 1976) is adjusted to fit the p(Δ) data, then further refined by incorporation of relative amplitude information through synthetic seismogram modeling. The application of a modified wave field continuation method (Clayton and McMechan, 1981) to the data with the final model confirms that GCA is consistent with the entire data set and also provides an estimate of the data resolution in velocity-depth space. We discover that the upper mantle under this spreading center has anomalously slow velocities to depths of 350 km, and place new constraints on the shape of the 660 km discontinuity.

Seismograms from 22 earthquakes along the northeast Pacific rim recorded in southern California form the data set for a comparative investigation of the upper mantle beneath the Cascade Ranges-Juan de Fuca region, an ocean-continent transition. These data consist of 853 seismograms (6° < Δ < 42°), which produce 1068 travel times and 40 ray parameter estimates. We use the spreading center model initially in synthetic seismogram modeling, and perturb GCA until the Cascade Ranges data are matched. Wave field continuation of both data sets with a common reference model confirms that real differences exist between the two suites of seismograms, implying lateral variation in the upper mantle. The ocean-continent transition model, CJF, features velocities between 200 and 350 km that are intermediate between GCA and T7 (Burdick and Helmberger, 1978), a model for the inland western United States. Models of continental shield regions (e.g., King and Calcagnile, 1976) have higher velocities in this depth range, but all four model types are similar below 400 km. This variation in the rate of velocity increase with tectonic regime suggests an inverse relationship between velocity gradient and lithospheric age above 400 km depth.