987 results for cosmologia, clustering, AP-test


Relevance: 100.00%

Abstract:

The accurate determination of the parameters that make up a cosmological model is fundamental for correctly characterizing the global evolution of the universe and for explaining its local features. In this thesis we studied the efficiency of a particular method for estimating the matter density parameter, known as the Alcock-Paczynski test. The method studies the geometric distortions that an incorrect assumption of the cosmological parameters introduces into the two-dimensional two-point correlation function. We applied the test to several catalogues produced by the Magneticum simulation. In particular, we studied how the efficiency of the method in recovering the correct value of the density parameter depends on the type of tracer considered, on the observation redshift, and on how the distortions due to the dynamics of the observed objects are modelled. We observed that the efficiency of the method depends on the density of objects in the catalogue (making galaxies the best tracer on which to apply the test) and on the amplitude of the geometric distortion effect, which is largest at redshifts close to 1. We verified that treating the measurements obtained from the catalogues at different redshifts as independent improves the identification of the correct value of the density parameter. Moreover, when the different measurements are combined, the least significant contributions come from the extreme redshifts of the interval considered, but these individual results, however uncertain, do not degrade the final result. Finally, we observed that reducing the number of free parameters through which the dynamical distortions enter the models, in particular the small-scale distortion effects, appreciably improves the effectiveness of the method.
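
For orientation, the distortion the test exploits can be made concrete: assuming a wrong density parameter rescales separations perpendicular to the line of sight by the ratio of comoving distances and separations along the line of sight by the inverse ratio of Hubble rates. A minimal sketch, assuming a flat LCDM cosmology with illustrative parameter values (not those used in the thesis):

    import numpy as np

    C_KM_S = 299792.458  # speed of light [km/s]

    def E(z, omega_m):
        """Dimensionless Hubble rate H(z)/H0 for flat LCDM (radiation neglected)."""
        return np.sqrt(omega_m * (1 + z)**3 + (1 - omega_m))

    def comoving_distance(z, omega_m, h0=70.0, n=4096):
        """Line-of-sight comoving distance [Mpc], midpoint-rule integration."""
        zs = (np.arange(n) + 0.5) * z / n
        return (C_KM_S / h0) * np.sum(1.0 / E(zs, omega_m)) * z / n

    def ap_distortion(z, om_true, om_test):
        """Rescaling of perpendicular and parallel separations when om_test
        is assumed instead of the true om_true (H0 held fixed)."""
        a_perp = comoving_distance(z, om_test) / comoving_distance(z, om_true)
        a_par = E(z, om_true) / E(z, om_test)
        return a_perp, a_par

    # Example: geometric distortion at z = 1 if Omega_m = 0.2 is wrongly assumed
    print(ap_distortion(1.0, om_true=0.27, om_test=0.2))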

Relevance: 100.00%

Abstract:

In view of the characteristics of gene expression data, a high-accuracy density-based clustering algorithm, DENGENE, is proposed. By defining a consistency check and introducing peak points to improve the search direction, DENGENE handles gene expression data better. To evaluate its performance, the algorithm was tested on two widely used test data sets, namely brewer's yeast gene expression data sets. Experimental results show that, compared with five model-based algorithms, the CAST algorithm and K-means clustering, DENGENE achieves significant improvements in noise filtering and clustering accuracy.
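
The abstract does not spell out DENGENE's internals; as a point of reference, the core of a generic density-based clustering algorithm of the family DENGENE builds on (a DBSCAN-style skeleton, with our own illustrative parameter names) looks as follows:

    import numpy as np

    def density_cluster(X, eps, min_pts):
        """Generic density-based clustering: grow a cluster from every core
        point (a point with >= min_pts neighbours within radius eps).
        Returns one label per row of X; -1 marks noise."""
        n = len(X)
        labels = np.full(n, -1)
        cluster = -1
        for i in range(n):
            if labels[i] != -1:
                continue
            neigh = np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]
            if len(neigh) < min_pts:
                continue                      # not a core point
            cluster += 1
            labels[i] = cluster
            seeds = list(neigh)
            while seeds:                      # expand the cluster outward
                j = seeds.pop()
                if labels[j] == -1:
                    labels[j] = cluster
                    jn = np.where(np.linalg.norm(X - X[j], axis=1) <= eps)[0]
                    if len(jn) >= min_pts:    # j is itself a core point
                        seeds.extend(jn)
        return labels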

Relevance: 100.00%

Abstract:

The British Standard constant head triaxial permeability (BS) test, commonly used for permeability testing of fine grained soils, is known to have a relatively long test duration. Consequently, a reduction in the time required for a permeability test offers potential cost savings to the construction industry, specifically for use during Construction Quality Assurance (CQA) of landfill mineral liners. The purpose of this article is to investigate and evaluate alternative short duration testing methods for the measurement of the permeability of fine grained soils.

As part of the investigation, the feasibility of an existing short duration permeability test, known as the Accelerated Permeability (AP) test, was assessed and compared with permeability measured using the British Standard (BS) method and the Ramp Accelerated Permeability (RAP) test. Four fine grained materials with a variety of physical properties were compacted at various moisture contents to produce analogous samples for testing using the three different methodologies. Fabric analysis was carried out on specimens derived from post-test samples using Mercury Intrusion Porosimetry (MIP) and Scanning Electron Microscopy (SEM) to assess the effects of the testing methodology on soil structure. Results showed that AP testing in general under-predicts the permeability values derived from the BS test, owing to large changes in soil structure caused by the AP test methodology, as confirmed by the MIP and SEM observations. RAP testing in general improves on the AP test but still under-predicts permeability values. The potential savings in test duration are shown to be relatively minimal for both the AP and RAP tests.
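
For background, the constant head test that the BS method is built on derives permeability from Darcy's law; a minimal illustration (variable names are ours, not the article's):

    def constant_head_permeability(flow_m3_per_s, length_m, area_m2, head_diff_m):
        """Darcy's law for a constant head test: k = Q * L / (A * dh), in m/s."""
        return flow_m3_per_s * length_m / (area_m2 * head_diff_m)

    # e.g. 1e-9 m3/s through a 0.1 m long, 0.008 m2 specimen under 1 m of head
    print(constant_head_permeability(1e-9, 0.1, 0.008, 1.0))  # 1.25e-08 m/s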

Relevance: 100.00%

Abstract:

Habitat fragmentation and the consequent loss of connectivity between populations can reduce the interchange of individuals and gene flow, increasing the chances of inbreeding and the risk of local extinction. Landscape genetics is providing more and better tools to identify genetic barriers. To our knowledge, no comparison of methods in terms of consistency has been made with observed data and species of low dispersal ability. The aim of this study is to examine the consistency of the results of five methods to detect barriers to gene flow in a Mediterranean pine vole (Microtus duodecimcostatus) population: F-statistics estimation, non-Bayesian clustering, Bayesian clustering, boundary detection and simple/partial Mantel tests. All methods were consistent in identifying the stream as a non-barrier to gene flow. However, the methods did not agree on the role of the highway as a genetic barrier. Fst, the Bayesian clustering assignment test and the partial Mantel test identified the highway as a filter to the interchange of individuals. The Mantel tests were the most sensitive method. The boundary detection method (Monmonier's algorithm) and the non-Bayesian approaches did not detect any genetic differentiation of the pine vole caused by the highway. Based on our findings, we recommend that genetic barrier detection in populations of low dispersal ability be carried out with multiple methods: Mantel tests and Bayesian clustering approaches, because they are more sensitive in these scenarios, together with boundary detection methods, whose aim is to detect drastic changes in a variable of interest between neighbouring individuals. Although simulation studies highlight the weaknesses and strengths of each method and the factors that drive particular results, tests with real data are needed to increase the effectiveness of genetic barrier detection.
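
Of the five methods, the simple Mantel test is the most compact to state: it correlates the entries of two distance matrices (here, genetic versus geographic distance) and assesses significance by permuting the rows and columns of one matrix. A minimal sketch; the matrix names are hypothetical:

    import numpy as np

    def mantel(dist_gen, dist_geo, n_perm=999, seed=None):
        """Simple Mantel test: Pearson correlation between the upper triangles
        of two distance matrices, with a permutation p-value."""
        rng = np.random.default_rng(seed)
        iu = np.triu_indices_from(dist_gen, k=1)
        r_obs = np.corrcoef(dist_gen[iu], dist_geo[iu])[0, 1]
        n, hits = dist_gen.shape[0], 0
        for _ in range(n_perm):
            p = rng.permutation(n)
            r = np.corrcoef(dist_gen[p][:, p][iu], dist_geo[iu])[0, 1]
            hits += abs(r) >= abs(r_obs)
        return r_obs, (hits + 1) / (n_perm + 1)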

Relevance: 40.00%

Abstract:

This paper investigates the clustering pattern in the Finnish stock market. Using trading volume and time as factors capturing the clustering pattern in the market, the Keim and Madhavan (1996) and Engle and Russell (1998) models provide the framework for the analysis. The descriptive and parametric analyses provide evidence that an important determinant of the famous U-shaped pattern in the market is the rate of information arrival, as measured by large trading volumes and durations at the market open and close. Specifically: 1) the larger the trading volume, the greater the impact on prices both in the short and the long run, so prices differ across quantities; 2) large trading volume is a non-linear function of price changes in the long run; 3) arrival times are positively autocorrelated, indicating a clustering pattern; and 4) information arrivals, as approximated by durations, are negatively related to trading flow.
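
The Engle and Russell (1998) framework referred to is the autoregressive conditional duration (ACD) model, in which the expected duration between trades evolves autoregressively. A minimal simulation sketch of an ACD(1,1) process with illustrative parameter values:

    import numpy as np

    def simulate_acd(n, omega=0.1, alpha=0.1, beta=0.8, seed=None):
        """ACD(1,1): x_i = psi_i * eps_i with unit-exponential eps_i and
        psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}."""
        rng = np.random.default_rng(seed)
        x = np.empty(n)
        psi = x_prev = omega / (1 - alpha - beta)  # unconditional mean duration
        for i in range(n):
            psi = omega + alpha * x_prev + beta * psi
            x[i] = x_prev = psi * rng.exponential()
        return x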

Relevance: 40.00%

Abstract:

This dissertation deals with aspects of sequential data assimilation (in particular, ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy filter (EnKBF) is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. We therefore present a suitable integration scheme that handles the stiffening of the differential equations involved without additional computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in ensemble space instead of state space. The advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation with deterministic ensemble square root filters (EnSRFs) in highly nonlinear forecast models: an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can also be reverted by nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble in different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part describes the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the widely used Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion of the mean value of the filtered function. Using statistical significance tests at both the local and the field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time-stepping scheme; hence, no retuning of the parameterizations is required. The accuracy of medium-term forecasts is found to increase when the RAW filter is used.
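
The RAW filter of the third part modifies the Robert-Asselin displacement so that the mean of the filtered field is preserved. A minimal leapfrog sketch, assuming the standard formulation (Williams, 2009) with typical parameter values; alpha = 1 recovers the plain Robert-Asselin filter:

    import numpy as np

    def leapfrog_raw(f, x0, dt, nsteps, nu=0.2, alpha=0.53):
        """Leapfrog integration of dx/dt = f(x) with the RAW filter."""
        xs = [x0, x0 + dt * f(x0)]               # Euler start-up step
        for n in range(1, nsteps):
            x_new = xs[n - 1] + 2 * dt * f(xs[n])
            d = 0.5 * nu * (xs[n - 1] - 2 * xs[n] + x_new)
            xs[n] = xs[n] + alpha * d            # filter the current level
            xs.append(x_new + (alpha - 1) * d)   # and correct the new level
        return np.array(xs)

    # Example: oscillation dx/dt = i*x, whose exact solution stays on |x| = 1
    traj = leapfrog_raw(lambda x: 1j * x, 1.0 + 0j, dt=0.1, nsteps=500)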

Relevance: 40.00%

Abstract:

This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum number of homogeneous clusters, making no parametric assumptions about the c.p.d.f.'s. The proposed distance function for clustering c.p.d.f.'s is based on the Kolmogorov-Smirnov two-sample statistic, which is able to detect differences in position, dispersion or shape of the c.p.d.f.'s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters. In this sense, the proposed algorithm is adaptive, as it automatically increases the number of clusters only as necessary; there is therefore no need to fix the number of clusters in advance. The outputs of the algorithm are, for each cluster, the common c.p.d.f. of all observed data in the cluster (the centroid) and the Kolmogorov-Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been applied to a large data set of distributions of solar global irradiation spectra. The results reduce the information of more than 270,000 c.p.d.f.'s to only 6 different clusters, corresponding to 6 different c.p.d.f.'s.
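
A greedy sketch of the adaptive idea (a simplification, not the authors' exact algorithm): measure the distance between data sets with the two-sample Kolmogorov-Smirnov statistic and open a new cluster whenever no existing cluster is close enough, with the pooled data of a cluster standing in for its centroid c.p.d.f.:

    import numpy as np

    def ks_distance(a, b):
        """Two-sample KS statistic: max gap between the empirical c.d.f.'s."""
        grid = np.sort(np.concatenate([a, b]))
        cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
        cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
        return np.max(np.abs(cdf_a - cdf_b))

    def adaptive_cluster(samples, threshold):
        """Assign each sample to the first KS-close cluster, else open one."""
        clusters = []
        for s in samples:
            for c in clusters:
                if ks_distance(s, np.concatenate(c)) <= threshold:
                    c.append(s)
                    break
            else:
                clusters.append([s])
        return clusters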

Relevance: 30.00%

Abstract:

The continuous growth of XML data poses a great challenge in the area of XML data management. The need to process large amounts of XML data complicates many applications, such as information retrieval and data integration. One way of simplifying the problem is to break the massive amount of data into smaller groups through clustering techniques. However, XML clustering is an intricate task that may involve processing both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods: two that utilize only the structure of XML documents and two that utilize both structure and content. The two structural clustering methods use different data models, one based on a path model and the other on a tree model. These methods employ rigid similarity measures aimed at identifying corresponding elements between documents with different or similar underlying structure. The two clustering methods that utilize both structural and content information differ in how the structure and content similarities are combined. One calculates document similarity using a linear weighted combination of structure and content similarities, with the content similarity based on a semantic kernel. The other calculates the distance between documents by a non-linear combination of the structure and content of the XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the one based on the path model, as the tree similarity measure does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of content information on most test document collections. To further the research, the structural clustering method based on the tree model is extended and employed in XML transformation. The experiments show that the proposed transformation process is faster than a traditional transformation system that translates and converts the source XML documents sequentially; the schema matching step of the XML transformation also produces a better matching result in a shorter time.
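
As a concrete, simplified stand-in for the path-model structural measure (not the measure developed in this research), each document can be represented by its set of root-to-node tag paths and two documents compared by the overlap of those sets; a linear weighted combination with a content score then follows directly:

    from itertools import chain
    import xml.etree.ElementTree as ET

    def tag_paths(elem, prefix=()):
        """All root-to-node tag paths of an XML element tree."""
        here = prefix + (elem.tag,)
        return [here] + list(chain.from_iterable(tag_paths(c, here) for c in elem))

    def path_similarity(xml_a, xml_b):
        """Jaccard similarity over the two sets of tag paths."""
        pa = set(tag_paths(ET.fromstring(xml_a)))
        pb = set(tag_paths(ET.fromstring(xml_b)))
        return len(pa & pb) / len(pa | pb)

    def combined_similarity(sim_structure, sim_content, w=0.5):
        """Linear weighting of structure and content similarity."""
        return w * sim_structure + (1 - w) * sim_content

    print(path_similarity("<a><b/><c/></a>", "<a><b/><d/></a>"))  # 0.5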

Relevance: 30.00%

Abstract:

Objective: To investigate the epidemic characteristics of human cutaneous anthrax (CA) in China, detect spatiotemporal clusters at the county level for preemptive public health interventions, and evaluate differences in the epidemiological characteristics within and outside the clusters. Methods: CA cases reported during 2005-2012 through the national surveillance system were evaluated at the county level using the space-time scan statistic. A comparative analysis of the epidemic characteristics within and outside the identified clusters was performed using the χ2 test or the Kruskal-Wallis test. Results: The 30-39 years age group had the highest incidence of CA, and the fatality rate increased with age, with persons ≥70 years showing a fatality rate of 4.04%. Seasonality analysis showed that most CA cases occurred between May/June and September/October of each year. The primary spatiotemporal cluster contained 19 counties from June 2006 to May 2010 and was mainly located along the borders of Sichuan, Gansu, and Qinghai provinces. In these high-risk areas, CA cases were found predominantly among younger, local, male shepherds living on agriculture and stockbreeding, and were characterized by high morbidity, low mortality and a shorter period from illness onset to diagnosis. Conclusion: CA was geographically and persistently clustered in Southwestern China during 2005-2012, with notable differences in the epidemic characteristics within and outside the spatiotemporal clusters. This demonstrates the need for CA interventions, such as enhanced surveillance, health education, and mandatory, standardized decontamination and disinfection procedures, to be geographically targeted to the areas identified in this study.
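
The space-time scan statistic evaluates candidate cylinders (a circle of counties crossed with a time interval) by a Poisson log-likelihood ratio and keeps the cylinder that maximizes it. A minimal sketch of that ratio, under the usual convention that expected counts are normalized to sum to the total case count:

    import math

    def poisson_llr(cases_in, expected_in, total_cases):
        """Kulldorff's Poisson log-likelihood ratio for one candidate
        cylinder; zero unless the cylinder shows excess risk."""
        if cases_in <= expected_in:
            return 0.0
        cases_out = total_cases - cases_in
        expected_out = total_cases - expected_in
        out_term = cases_out * math.log(cases_out / expected_out) if cases_out else 0.0
        return cases_in * math.log(cases_in / expected_in) + out_term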

Relevance: 30.00%

Abstract:

A new clustering technique, based on the concept of immediate neighbourhood and with a novel capability to self-learn the number of clusters expected in an unsupervised environment, has been developed. The method compares favourably with other clustering schemes based on distance measures, both in terms of conceptual innovation and computational economy. A test implementation of the scheme using the C-1 flight line training sample data in a simulated unsupervised mode has brought out the efficacy of the technique. The technique can easily be implemented as a front end to established pattern classification systems with supervised learning capabilities, to derive unified learning systems capable of operating in both supervised and unsupervised environments. This makes the technique an attractive proposition in the context of remotely sensed earth resources data analysis, where such a unified learning system capability is essential.

Relevance: 30.00%

Abstract:

In many IEEE 802.11 WLAN deployments, wireless clients have a choice of access points (APs) to connect to. In current systems, clients associate with the access point with the strongest signal-to-noise ratio. However, such an association mechanism can lead to unequal load sharing, resulting in diminished system performance. In this paper, we first provide a numerical approach, based on stochastic dynamic programming, to find the optimal client-AP association algorithm for a small topology consisting of two access points. Using the value iteration algorithm, we determine the optimal association rule for the two-AP topology. Next, utilizing the insights obtained from the optimal association rule for the two-AP case, we propose a near-optimal heuristic that we call RAT. We test the efficacy of RAT by considering more realistic arrival patterns and a larger topology. Our results show that RAT performs very well in these scenarios as well. Moreover, RAT lends itself to a fairly simple implementation.
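
The value iteration step used for the two-AP topology is the standard dynamic programming recursion V(s) <- max_a [R(s, a) + gamma * sum_s' P(s'|s, a) V(s')]. A generic sketch (the paper's actual state space, rewards and transition probabilities are specific to its model):

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-8):
        """P[a] is the state-transition matrix and R[a] the reward vector for
        action a.  Returns the optimal values and a greedy policy."""
        n_actions, n_states = len(P), P[0].shape[0]
        V = np.zeros(n_states)
        while True:
            Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
            V_new = Q.max(axis=0)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=0)
            V = V_new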

Relevance: 30.00%

Abstract:

Here we rederive the hierarchy of equations for the evolution of distribution functions of various orders using a convenient parameterization. We use this to obtain equations for two- and three-point correlation functions in powers of a small parameter, namely the initial density contrast. The correspondence of the lowest order solutions of these equations to the results of the linear theory of density perturbations is shown for an Ω = 1 universe. These equations are then used to calculate, to lowest order, the induced three-point correlation function that arises from Gaussian initial conditions in an Ω = 1 universe. We obtain an expression that explicitly exhibits the spatial structure of the induced three-point correlation function; this spatial structure is independent of the value of Ω. We also calculate the triplet momentum. We find that the induced three-point correlation function does not have the "hierarchical" form often assumed. We discuss possibilities of using the induced three-point correlation to interpret observational data. The formalism developed here can also be used to test the validity of different schemes to close the hierarchy.
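
For reference, the "hierarchical" form that the induced three-point function is found not to follow is the standard ansatz expressing the three-point correlation function ζ through products of two-point functions ξ with a constant amplitude Q:

    \zeta(r_{12}, r_{23}, r_{31}) = Q \left[ \xi(r_{12})\,\xi(r_{23}) + \xi(r_{23})\,\xi(r_{31}) + \xi(r_{31})\,\xi(r_{12}) \right]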

Relevance: 30.00%

Abstract:

Delineation of homogeneous precipitation regions (regionalization) is necessary for investigating the frequency and spatial distribution of meteorological droughts. Conventional regionalization methods use precipitation statistics as attributes to establish homogeneous regions. They therefore cannot be used to form regions in ungauged areas, and they may not yield meaningful regions in areas with sparse rain gauge density. Furthermore, validating the regions for homogeneity in precipitation is not possible, since using precipitation statistics both to form the regions and subsequently to test regional homogeneity is not appropriate. To alleviate this problem, an approach based on fuzzy cluster analysis is presented. It allows the delineation of homogeneous precipitation regions in data-sparse areas using, as attributes, large scale atmospheric variables (LSAV) that influence precipitation in the study area. The LSAV, location parameters (latitude, longitude and altitude) and the seasonality of precipitation are suggested as features for regionalization. The approach allows independent validation of the identified regions for homogeneity using statistics computed from the observed precipitation. It can also form regions even in ungauged areas, owing to the use of attributes that can be reliably estimated even when no at-site precipitation data are available. The approach was applied to delineate homogeneous annual rainfall regions in India, and its effectiveness is illustrated by comparing the results with those obtained using rainfall statistics, with regionalization based on hard cluster analysis, and with the meteorological subdivisions of India.
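
A generic fuzzy c-means loop, the usual concrete form of fuzzy cluster analysis (the paper's exact variant and feature weighting may differ); each row of X would hold the LSAV, location and seasonality features of one site:

    import numpy as np

    def fuzzy_cmeans(X, k, m=2.0, iters=100, seed=None):
        """Fuzzy c-means on an (n_sites, n_features) array; m > 1 is the
        fuzzifier.  Returns cluster centers and the membership matrix."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), k))
        U /= U.sum(axis=1, keepdims=True)
        for _ in range(iters):
            W = U ** m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            U = d ** (-2.0 / (m - 1))
            U /= U.sum(axis=1, keepdims=True)
        return centers, U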

Relevance: 30.00%

Abstract:

A geometric and nonparametric procedure for testing whether two finite sets of points are linearly separable is proposed. The linear separability test is equivalent to a test that determines whether a strictly positive point h > 0 exists in the range of a matrix A (related to the points in the two finite sets). The algorithm proposed in the paper iteratively checks whether a strictly positive point exists in a subspace by projecting onto that subspace a strictly positive vector with equal co-ordinates (p). At the end of each iteration, the subspace is reduced to a lower dimensional subspace. The test is completed within r ≤ min(n, d + 1) steps, for both linearly separable and non-separable problems (r is the rank of A, n is the number of points and d is the dimension of the space containing the points). The worst case time complexity of the algorithm is O(nr³) and its space complexity is O(nd). A short review of some of the prominent algorithms and their time complexities is included. The worst case computational complexity of our algorithm is lower than the worst case computational complexity of the Simplex, Perceptron, Support Vector Machine and Convex Hull algorithms, if d
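
The feasibility question at the heart of the test can also be posed as a linear program (a standard alternative formulation, not the projection algorithm of the paper): the two sets are strictly linearly separable iff some (w, b) satisfies y_i (w . x_i + b) >= 1 for every point. A sketch using scipy:

    import numpy as np
    from scipy.optimize import linprog

    def linearly_separable(X, y):
        """LP feasibility check: X is (n, d), y holds the two class labels.
        Variables are [w, b]; constraints are -y_i (x_i . w + b) <= -1."""
        n, d = X.shape
        signs = np.where(y > 0, 1.0, -1.0)
        A_ub = -signs[:, None] * np.hstack([X, np.ones((n, 1))])
        res = linprog(np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                      bounds=[(None, None)] * (d + 1), method="highs")
        return res.success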

Relevance: 30.00%

Abstract:

The ability to cluster unknown data, to better understand its relationship to known data, is needed throughout science. Besides providing a better understanding of the data itself or of a new unknown object, cluster analysis can help with data processing, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, as with the popular partition-based, hierarchical, density-based, grid-based, and model-based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering; nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric that needs no prior knowledge about the objects. Their similarity metric is based on the Kolmogorov complexity of objects, i.e. an object's minimal description. While the Kolmogorov complexity of an object is not computable, in "Clustering by Compression" Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.
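
The compression-based distance at the heart of "Clustering by Compression" is the normalized compression distance (NCD), which replaces the uncomputable Kolmogorov complexity with the output length of a real compressor:

    import bz2

    def ncd(x: bytes, y: bytes) -> float:
        """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
        with C(.) the compressed length under a standard compressor."""
        cx, cy = len(bz2.compress(x)), len(bz2.compress(y))
        cxy = len(bz2.compress(x + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    # A pairwise NCD matrix over the bytes of the letter images can then be
    # fed to any standard clustering routine; how the two-dimensional image
    # data is serialized for the compressor is the crux addressed here.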