881 resultados para Landmark-based spectral clustering


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied in two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on the greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we perform a thorough analysis of a spectral phase-encoded time spreading optical code division multiple access (SPECTS-OCDMA) system based on Walsh-Hadamard (W-H) codes aiming not only at finding optimal code-set selections but also at assessing its loss of security due to crosstalk. We prove that an inadequate choice of codes can make the crosstalk between active users to become large enough so as to cause the data from the user of interest to be detected by other user. The proposed algorithm for code optimization targets code sets that produce minimum bit error rate (BER) among all codes for a specific number of simultaneous users. This methodology allows us to find optimal code sets for any OCDMA system, regardless the code family used and the number of active users. This procedure is crucial for circumventing the unexpected lack of security due to crosstalk. We also show that a SPECTS-OCDMA system based on W-H 32(64) fundamentally limits the number of simultaneous users to 4(8) with no security violation due to crosstalk. More importantly, we prove that only a small fraction of the available code sets is actually immune to crosstalk with acceptable BER (<10(-9)) i.e., approximately 0.5% for W-H 32 with four simultaneous users, and about 1 x 10(-4)% for W-H 64 with eight simultaneous users.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Air conditioning and lighting costs can be reduced substantially by changing the optical properties of "intelligent windows." The electrochromic devices studied to date have used copper as an additive. Copper, used here as an electrochromic material, was dissolved in an aqueous animal protein-derived gel electrolyte. This combination constitutes the electrochromic system for reversible electrodeposition. Cyclic voltammetry, chronoamperometric and chromogenic analyses indicated that were obtained good conditions of transparency (initial transmittance of 70%), optical reversibility, small potential window (2.1 V), variation of transmittance in visible light (63.6%) and near infrared (20%) spectral regions. Permanence in the darkened state was achieved by maintaining a lower pulse potential (-0.16 V) than the deposition potential (-1.0 V). Increasing the number of deposition and dissolution cycles favored the transmittance and photoelectrochemical reversibility of the device. The conductivity of the electrolyte (10(-3) S/cm) at several concentrations of CuCl2 was determined by electrochemical impedance spectroscopy. A thermogravimetric analysis confirmed the good thermal stability of the electrolyte, since the mass loss detected up to 100 degrees C corresponded to water evaporation and decomposition of the gel started only at 200 degrees C. Micrographic and small angle X-ray scattering analyses indicated the formation of a persistent deposit of copper particles on the ITO. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Spectral decomposition has rarely been used to investigate complex networks. In this work we apply this concept in order to define two kinds of link-directed attacks while quantifying their respective effects on the topology. Several other kinds of more traditional attacks are also adopted and compared. These attacks had substantially diverse effects, depending on each specific network (models and real-world structures). It is also shown that the spectrally based attacks have special effects in affecting the transitivity of the networks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we e identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin (MUC) 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [ MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species by a non-aged cachaca used as an extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry to classify cachacas aged in barrels that are composed of different wood species.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

OBJECTIVE: The hippocampus has an important role in the acquisition and recall of aversive memories. The objective of this study was to investigate the relationship among hippocampal rhythms. METHODS: Microeletrodes arrays were implanted in the hippocampus of Wistar rats. The animals were trained and tested in a contextual fear conditioning task. The training consisted in applying shocks in the legs. The memory test was performed 1 day (recent memory) or 18 days (remote memory) after training. We proposed a measure based on the FFT power spectrum, denominated "delta-theta ratio", to characterize the different behaviors (active exploration and freezing) and the memories types. RESULTS: The delta-theta ratio was able to distinguish recent and remote memories. In this study, the ratio for the 18-day group was smaller than for the 1-day group. Moreover, this measure was useful to distinguish the different behavior states active exploration and freezing. CONCLUSIONS: The results suggest delta-theta oscillations could reflect the demands on information processing during recent and remote memory recalls.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Type Ia supernovae have been successfully used as standardized candles to study the expansion history of the Universe. In the past few years, these studies led to the exciting result of an accelerated expansion caused by the repelling action of some sort of dark energy. This result has been confirmed by measurements of cosmic microwave background radiation, the large-scale structure, and the dynamics of galaxy clusters. The combination of all these experiments points to a “concordance model” of the Universe with flat large-scale geometry and a dominant component of dark energy. However, there are several points related to supernova measurements which need careful analysis in order to doubtlessly establish the validity of the concordance model. As the amount and quality of data increases, the need of controlling possible systematic effects which may bias the results becomes crucial. Also important is the improvement of our knowledge of the physics of supernovae events to assure and possibly refine their calibration as standardized candle. This thesis addresses some of those issues through the quantitative analysis of supernova spectra. The stress is put on a careful treatment of the data and on the definition of spectral measurement methods. The comparison of measurements for a large set of spectra from nearby supernovae is used to study the homogeneity and to search for spectral parameters which may further refine the calibration of the standardized candle. One such parameter is found to reduce the dispersion in the distance estimation of a sample of supernovae to below 6%, a precision which is comparable with the current lightcurve-based calibration, and is obtained in an independent manner. Finally, the comparison of spectral measurements from nearby and distant objects is used to test the possibility of evolution with cosmic time of the intrinsic brightness of type Ia supernovae.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of datastructure is very common in consumer studies where a panel of consumers is asked to assess the global liking of a certain number of products and then, preference scores are arranged in a two-way table Y. External information on both products (physicalchemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z respectively. The aim of this method is to automatically provide a consumer segmentation where all the three matrices play an active role in the classification, getting homogeneous groups from all points of view: preference, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the product chemical compounds, sensory evaluation and consumer socio-demographic information, purchase behaviour and consumption habits.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In Performance-Based Earthquake Engineering (PBEE), evaluating the seismic performance (or seismic risk) of a structure at a designed site has gained major attention, especially in the past decade. One of the objectives in PBEE is to quantify the seismic reliability of a structure (due to the future random earthquakes) at a site. For that purpose, Probabilistic Seismic Demand Analysis (PSDA) is utilized as a tool to estimate the Mean Annual Frequency (MAF) of exceeding a specified value of a structural Engineering Demand Parameter (EDP). This dissertation focuses mainly on applying an average of a certain number of spectral acceleration ordinates in a certain interval of periods, Sa,avg (T1,…,Tn), as scalar ground motion Intensity Measure (IM) when assessing the seismic performance of inelastic structures. Since the interval of periods where computing Sa,avg is related to the more or less influence of higher vibration modes on the inelastic response, it is appropriate to speak about improved IMs. The results using these improved IMs are compared with a conventional elastic-based scalar IMs (e.g., pseudo spectral acceleration, Sa ( T(¹)), or peak ground acceleration, PGA) and the advanced inelastic-based scalar IM (i.e., inelastic spectral displacement, Sdi). The advantages of applying improved IMs are: (i ) "computability" of the seismic hazard according to traditional Probabilistic Seismic Hazard Analysis (PSHA), because ground motion prediction models are already available for Sa (Ti), and hence it is possibile to employ existing models to assess hazard in terms of Sa,avg, and (ii ) "efficiency" or smaller variability of structural response, which was minimized to assess the optimal range to compute Sa,avg. More work is needed to assess also "sufficiency" and "scaling robustness" desirable properties, which are disregarded in this dissertation. However, for ordinary records (i.e., with no pulse like effects), using the improved IMs is found to be more accurate than using the elastic- and inelastic-based IMs. For structural demands that are dominated by the first mode of vibration, using Sa,avg can be negligible relative to the conventionally-used Sa (T(¹)) and the advanced Sdi. For structural demands with sign.cant higher-mode contribution, an improved scalar IM that incorporates higher modes needs to be utilized. In order to fully understand the influence of the IM on the seismis risk, a simplified closed-form expression for the probability of exceeding a limit state capacity was chosen as a reliability measure under seismic excitations and implemented for Reinforced Concrete (RC) frame structures. This closed-form expression is partuclarly useful for seismic assessment and design of structures, taking into account the uncertainty in the generic variables, structural "demand" and "capacity" as well as the uncertainty in seismic excitations. The assumed framework employs nonlinear Incremental Dynamic Analysis (IDA) procedures in order to estimate variability in the response of the structure (demand) to seismic excitations, conditioned to IM. The estimation of the seismic risk using the simplified closed-form expression is affected by IM, because the final seismic risk is not constant, but with the same order of magnitude. Possible reasons concern the non-linear model assumed, or the insufficiency of the selected IM. Since it is impossibile to state what is the "real" probability of exceeding a limit state looking the total risk, the only way is represented by the optimization of the desirable properties of an IM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Precipitation retrieval over high latitudes, particularly snowfall retrieval over ice and snow, using satellite-based passive microwave spectrometers, is currently an unsolved problem. The challenge results from the large variability of microwave emissivity spectra for snow and ice surfaces, which can mimic, to some degree, the spectral characteristics of snowfall. This work focuses on the investigation of a new snowfall detection algorithm specific for high latitude regions, based on a combination of active and passive sensors able to discriminate between snowing and non snowing areas. The space-borne Cloud Profiling Radar (on CloudSat), the Advanced Microwave Sensor units A and B (on NOAA-16) and the infrared spectrometer MODIS (on AQUA) have been co-located for 365 days, from October 1st 2006 to September 30th, 2007. CloudSat products have been used as truth to calibrate and validate all the proposed algorithms. The methodological approach followed can be summarised into two different steps. In a first step, an empirical search for a threshold, aimed at discriminating the case of no snow, was performed, following Kongoli et al. [2003]. This single-channel approach has not produced appropriate results, a more statistically sound approach was attempted. Two different techniques, which allow to compute the probability above and below a Brightness Temperature (BT) threshold, have been used on the available data. The first technique is based upon a Logistic Distribution to represent the probability of Snow given the predictors. The second technique, defined Bayesian Multivariate Binary Predictor (BMBP), is a fully Bayesian technique not requiring any hypothesis on the shape of the probabilistic model (such as for instance the Logistic), which only requires the estimation of the BT thresholds. The results obtained show that both methods proposed are able to discriminate snowing and non snowing condition over the Polar regions with a probability of correct detection larger than 0.5, highlighting the importance of a multispectral approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The intensity of regional specialization in specific activities, and conversely, the level of industrial concentration in specific locations, has been used as a complementary evidence for the existence and significance of externalities. Additionally, economists have mainly focused the debate on disentangling the sources of specialization and concentration processes according to three vectors: natural advantages, internal, and external scale economies. The arbitrariness of partitions plays a key role in capturing these effects, while the selection of the partition would have to reflect the actual characteristics of the economy. Thus, the identification of spatial boundaries to measure specialization becomes critical, since most likely the model will be adapted to different scales of distance, and be influenced by different types of externalities or economies of agglomeration, which are based on the mechanisms of interaction with particular requirements of spatial proximity. This work is based on the analysis of the spatial aspect of economic specialization supported by the manufacturing industry case. The main objective is to propose, for discrete and continuous space: i) a measure of global specialization; ii) a local disaggregation of the global measure; and iii) a spatial clustering method for the identification of specialized agglomerations.