993 results for cluster selection
Abstract:
We present a detailed description of the Voronoi Tessellation (VT) cluster finder algorithm in 2+1 dimensions, which improves on past implementations of this technique. The need for cluster finder algorithms able to produce reliable cluster catalogs up to redshift 1 or beyond and down to 10^13.5 solar masses is paramount, especially in light of upcoming surveys aiming at cosmological constraints from galaxy cluster number counts. We build the VT in photometric redshift shells and use the two-point correlation function of the galaxies in the field both to determine the density threshold for detection of cluster candidates and to establish their significance. This allows us to detect clusters in a self-consistent way without any assumptions about their astrophysical properties. We apply the VT to mock catalogs which extend to redshift 1.4, reproducing the ΛCDM cosmology and the clustering properties observed in the Sloan Digital Sky Survey data. An objective estimate of the cluster selection function in terms of the completeness and purity as a function of mass and redshift is as important as having a reliable cluster finder. We measure these quantities by matching the VT cluster catalog with the mock truth table. We show that the VT can produce a cluster catalog with completeness and purity > 80% for redshifts up to ~1 and masses down to ~10^13.5 solar masses.
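As a rough illustration of the core idea only (not the authors' pipeline), the sketch below tessellates a mock 2-D galaxy field and flags galaxies whose inverse Voronoi cell area exceeds a density threshold. The point set, the fixed threshold multiplier, and the use of scipy are all assumptions for this toy example; the actual algorithm derives its threshold from the two-point correlation function.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
# Mock photometric-redshift shell: uniform background plus one overdensity
background = rng.uniform(0, 10, size=(300, 2))
cluster = rng.normal(loc=5.0, scale=0.15, size=(40, 2))
points = np.vstack([background, cluster])

vor = Voronoi(points)

def cell_area(vor, i):
    """Area of the Voronoi cell of point i (inf for unbounded cells)."""
    region = vor.regions[vor.point_region[i]]
    if -1 in region or len(region) == 0:
        return np.inf
    poly = vor.vertices[region]
    # Voronoi cells are convex: order vertices by angle around the centroid
    c = poly.mean(axis=0)
    poly = poly[np.argsort(np.arctan2(poly[:, 1] - c[1], poly[:, 0] - c[0]))]
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

areas = np.array([cell_area(vor, i) for i in range(len(points))])
densities = 1.0 / areas  # inverse cell area as a local density estimate

# Flag galaxies well above the median field density; the real algorithm
# sets this threshold from the two-point correlation function instead
threshold = 5.0 * np.median(densities)
candidates = densities > threshold
```

With these toy parameters, the flagged galaxies are concentrated in the injected overdensity, which is the raw material a cluster finder would then group into candidates.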
Abstract:
We analyze the implications of a market imperfection related to the inability to establish intellectual property rights, which we label 'unverifiable communication'. Employees are able to collude with external parties by selling the 'knowledge capital' of the firm. The firm organizer engages in strategic interaction simultaneously with employees and competitors, as she introduces endogenous transaction costs in the market for information between those agents. Incentive schemes and communication costs are the key strategic variables used by the firm to induce frictions in collusive markets. Unverifiable communication introduces severe allocative distortions, both in internal product development and in the intended sale of information (technology transfer). We derive implications of the model for observable decisions such as characteristics of the employment relationship (full employment, incompatibility with other jobs), firms' preferences over cluster characteristics for location decisions, optimal size at entry, in-house development vs. sale strategies for innovations, and industry evolution.
Abstract:
This paper deals with the selection of centres for radial basis function (RBF) networks. A novel mean-tracking clustering algorithm is described as a way in which centres can be chosen based on a batch of collected data. A direct comparison is made between the mean-tracking algorithm and k-means clustering, and it is shown that mean-tracking clustering is significantly better in terms of achieving an RBF network which performs accurate function modelling.
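The mean-tracking algorithm itself is not detailed in this abstract, but the baseline it is compared against is standard: pick RBF centres with k-means, then fit the output weights by least squares. A minimal sketch of that baseline, with a toy 1-D target and hand-chosen k and kernel width:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Plain batch k-means; returns the k cluster centres."""
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def rbf_design(X, centres, width):
    """Gaussian RBF design matrix: one basis function per centre."""
    d2 = ((X[:, None] - centres[None]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width**2))

# Toy function-modelling task: approximate sin(x) on [-3, 3]
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

centres = kmeans(X, k=10)
Phi = rbf_design(X, centres, width=0.8)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear output weights
rmse = np.sqrt(np.mean((Phi @ w - y) ** 2))
```

The paper's claim is that replacing the k-means step with mean-tracking clustering yields centres that lower this kind of modelling error.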
Abstract:
This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.
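The general idea behind such filters, sketched below under assumed simplifications (a fixed correlation threshold and variance-based representatives, neither of which is claimed by the paper, which estimates the number of clusters from data): group redundant features by correlation, then keep one representative per group.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: features 0-2 are near-duplicates of one signal,
# features 3-5 of another, feature 6 is independent noise
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
X = np.column_stack([
    base_a, base_a + 0.05 * rng.normal(size=200), base_a + 0.05 * rng.normal(size=200),
    base_b, base_b + 0.05 * rng.normal(size=200), base_b + 0.05 * rng.normal(size=200),
    rng.normal(size=200),
])

corr = np.abs(np.corrcoef(X, rowvar=False))

def cluster_features(corr, threshold=0.9):
    """Greedy grouping: a feature joins a cluster when its absolute
    correlation with the cluster's seed exceeds the threshold."""
    unassigned = list(range(corr.shape[0]))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [f for f in unassigned if corr[seed, f] > threshold]
        unassigned = [f for f in unassigned if f not in members]
        clusters.append(members)
    return clusters

clusters = cluster_features(corr)
# One representative per cluster: here, the highest-variance member
selected = [max(c, key=lambda f: X[:, f].var()) for c in clusters]
```

On this toy data the filter recovers three clusters and hence selects three features, one per underlying signal, which is the dimensionality-reduction effect the paper exploits.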
Abstract:
One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. Thus, these candidates can be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods, by considering only the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.
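A minimal sketch of the underlying machinery (the documents, thresholds, and one-to-one rule form are assumptions for illustration, not SeCLAR's actual procedure): mine simple term-to-term association rules inside a cluster's documents and treat terms that participate in strong rules as label candidates.

```python
from itertools import combinations

# Toy document cluster: each document is represented as a set of terms
docs = [
    {"neural", "network", "training"},
    {"neural", "network", "layers"},
    {"neural", "network", "training"},
    {"sports", "neural"},
]

def term_rules(docs, min_support=0.5, min_confidence=0.8):
    """One-to-one association rules x -> y among terms, with the usual
    support (joint frequency) and confidence (joint / antecedent) measures."""
    n = len(docs)
    terms = set().union(*docs)
    support = {t: sum(t in d for d in docs) / n for t in terms}
    rules = []
    for a, b in combinations(terms, 2):
        both = sum(a in d and b in d for d in docs) / n
        if both >= min_support:
            for x, y in ((a, b), (b, a)):
                if both / support[x] >= min_confidence:
                    rules.append((x, y, both, both / support[x]))
    return rules

rules = term_rules(docs)
# Terms taking part in any strong rule become label candidates
candidates = {x for x, y, s, c in rules} | {y for x, y, s, c in rules}
```

Terms that co-occur systematically ("neural", "network", "training") survive as candidates, while incidental terms ("sports", "layers") are filtered out before any classical labelling method runs.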
Anthropometric characteristics and motor skills in talent selection and development in indoor soccer
Abstract:
Kick performance, anthropometric characteristics, slalom, and linear running were assessed in 49 (24 elite, 25 non-elite) post-pubertal indoor soccer players in order to (a) verify whether anthropometric characteristics and physical and technical capacities can distinguish players of different competitive levels, (b) compare the kicking kinematics of these groups, with and without a defined target, and (c) compare results on the assessments and coaches' subjective rankings of the players. Thigh circumference and specific technical capacities differentiated the players by level of play; cluster analysis correctly classified 77.5% of the players. The correlation between players' standardized measures and the coaches' rankings was 0.29. Anthropometric characteristics and physical capacities do not necessarily differentiate players at post-pubertal stages and should not be overvalued during early development. Considering the coaches' rankings, performance measures outside the specific game conditions may not be useful in identification of talented players.
Abstract:
The Fornax Cluster Spectroscopic Survey (FCSS) project utilizes the Two-degree Field (2dF) multi-object spectrograph on the Anglo-Australian Telescope (AAT). Its aim is to obtain spectra for a complete sample of all 14 000 objects with 16.5 ≤ b_j ≤ 19.7, irrespective of their morphology, in a 12 deg^2 area centred on the Fornax cluster. A sample of 24 Fornax cluster members has been identified from the first 2dF field (3.1 deg^2 in area) to be completed. This is the first complete sample of cluster objects of known distance with well-defined selection limits. Nineteen of the galaxies (with -15.8 < M_B < -12.7) appear to be conventional dwarf elliptical (dE) or dwarf S0 (dS0) galaxies. The other five objects (with -13.6 < M_B < -11.3) are those galaxies which were described recently by Drinkwater et al. and labelled 'ultracompact dwarfs' (UCDs). A major result is that the conventional dwarfs all have scale sizes α ≳ 3 arcsec (≃ 300 pc). This apparent minimum scale size implies an equivalent minimum luminosity for a dwarf of a given surface brightness. This produces a limit on their distribution in the magnitude-surface brightness plane, such that we do not observe dEs with high surface brightnesses but faint absolute magnitudes. Above this observed minimum scale size of 3 arcsec, the dEs and dS0s fill the whole area of the magnitude-surface brightness plane sampled by our selection limits. The observed correlation between magnitude and surface brightness noted by several recent studies of brighter galaxies is not seen with our fainter cluster sample. A comparison of our results with the Fornax Cluster Catalog (FCC) of Ferguson illustrates that attempts to determine cluster membership solely on the basis of observed morphology can produce significant errors. The FCC identified 17 of the 24 FCSS sample (i.e. 71 per cent) as being cluster members, in particular missing all five of the UCDs. The FCC also suffers from significant contamination: within the FCSS's field and selection limits, 23 per cent of those objects described as cluster members by the FCC are shown by the FCSS to be background objects.
Abstract:
We have measured nucleotide variation in the CLOCK/CYCLE heterodimer inhibition domain (CCID) of the X-linked clock gene period in seven species belonging to the Drosophila buzzatii cluster, namely D. buzzatii, Drosophila koepferae, Drosophila antonietae, Drosophila serido, Drosophila gouveai, Drosophila seriema and Drosophila borborema. We found that purifying selection is the main force driving sequence evolution in period, in agreement with the important role of the CCID in the clock machinery. Our survey revealed that period provides valuable phylogenetic information that allowed us to resolve the phylogenetic relationships among D. gouveai, D. borborema and D. seriema, which composed a polytomic clade in preliminary studies. The analysis of patterns of intraspecific variation revealed two different lineages of period in D. koepferae, probably reflecting introgressive hybridization from D. buzzatii, in concordance with previous molecular data.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Abstract:
In this manuscript we tackle the problem of semidistributed user selection with distributed linear precoding for sum rate maximization in multiuser multicell systems. A set of adjacent base stations (BSs) form a cluster in order to perform coordinated transmission to cell-edge users, and coordination is carried out through a central processing unit (CU). However, the message exchange between BSs and the CU is limited to scheduling control signaling, and no user data or channel state information (CSI) exchange is allowed. In the considered multicell coordinated approach, each BS has its own set of cell-edge users and transmits only to one intended user, while interference to non-intended users at other BSs is suppressed by signal steering (precoding). We use two distributed linear precoding schemes, Distributed Zero Forcing (DZF) and Distributed Virtual Signal-to-Interference-plus-Noise Ratio (DVSINR). Considering multiple users per cell and the backhaul limitations, the BSs rely on local CSI to solve the user selection problem. First we investigate how the signal-to-noise ratio (SNR) regime and the number of antennas at the BSs impact the effective channel gain (the magnitude of the channels after precoding) and its relationship with multiuser diversity. Considering that user selection must be based on the type of implemented precoding, we develop metrics of compatibility (estimations of the effective channel gains) that can be computed from local CSI at each BS and reported to the CU for scheduling decisions. Based on such metrics, we design user selection algorithms that can find a set of users that potentially maximizes the sum rate. Numerical results show the effectiveness of the proposed metrics and algorithms for different configurations of users and antennas at the base stations.
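The notion of an effective channel gain under zero-forcing can be sketched from local CSI alone. The toy below (antenna counts, channel model, and projection-based precoder are illustrative assumptions, not the paper's DZF/DVSINR metrics) computes the gain a user retains once the precoder is steered into the null space of other users' channels:

```python
import numpy as np

rng = np.random.default_rng(3)
n_tx = 4     # transmit antennas at the base station
n_users = 6  # candidate cell-edge users

# Local CSI: one Rayleigh-fading channel row vector per candidate user
H = (rng.normal(size=(n_users, n_tx))
     + 1j * rng.normal(size=(n_users, n_tx))) / np.sqrt(2)

def zf_effective_gain(h, H_others):
    """Effective channel gain |h w|^2 when the unit-norm precoder w is
    forced into the null space of the other users' channels."""
    if H_others.shape[0] == 0:
        w = h.conj() / np.linalg.norm(h)  # no constraint: matched filter
    else:
        # Orthogonal projector onto the null space of H_others
        P = np.eye(H_others.shape[1]) - np.linalg.pinv(H_others) @ H_others
        w = P @ h.conj()
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            return 0.0
        w /= norm
    return float(np.abs(h @ w) ** 2)

# Gain of user 0 when interference toward users 1 and 2 must be nulled,
# versus the unconstrained matched-filter gain
gain = zf_effective_gain(H[0], H[1:3])
full = zf_effective_gain(H[0], H[:0])
```

The gap between `gain` and `full` is the price of interference nulling; ranking candidates by such locally computable gains is the kind of compatibility metric each BS can report to the CU without exchanging CSI.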
Abstract:
A Work Project, presented as part of the requirements for the Award of a Masters Degree in Finance from the NOVA – School of Business and Economics
Abstract:
Botnets are groups of computers infected with a specific sub-set of a malware family and controlled by one individual, called the botmaster. These networks are used for, among other purposes, virtual extortion, spam campaigns and identity theft. They implement different types of evasion techniques that make it harder to group and detect botnet traffic. This thesis introduces a methodology, called CONDENSER, that outputs clusters through a self-organizing map and identifies domain names generated by an unknown pseudo-random seed that is known by the botnet herder(s). Additionally, DNS Crawler is proposed: this system saves historic DNS data for fast-flux and double fast-flux detection, and is used to identify live C&C IPs used by real botnets. A program, called CHEWER, was developed to automate the calculation of the SVM parameters and features that perform best against the available domain names associated with DGAs. CONDENSER and DNS Crawler were developed with scalability in mind, so that the detection of fast-flux and double fast-flux networks becomes faster. We used an SVM for the DGA classifier, selecting a total of 11 attributes and achieving a precision of 77.9% and an F-measure of 83.2%. The feature selection method identified the 3 most significant attributes of the total set of attributes. For clustering, a self-organizing map was used on a total of 81 attributes. The conclusions of this thesis were accepted at Botconf through a submitted article. Botconf is a well-known conference for research on the mitigation and discovery of botnets, tailored to the industry, where current work and research are presented. This conference is known for attracting security and anti-virus companies, law enforcement agencies and researchers.
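The thesis's 11 SVM attributes are not listed in this abstract, but DGA classifiers of this kind typically start from simple lexical features of the domain name. A hedged sketch of such features (the feature set and example domains below are illustrative, not the thesis's):

```python
import math
from collections import Counter

def domain_features(domain):
    """Lexical features of the kind commonly fed to DGA classifiers
    (illustrative; not the thesis's actual 11-attribute set)."""
    name = domain.split(".")[0].lower()
    counts = Counter(name)
    n = len(name)
    # Shannon entropy of the character distribution, in bits
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in name) / n
    vowel_ratio = sum(ch in "aeiou" for ch in name) / n
    return {"length": n, "entropy": entropy,
            "digit_ratio": digit_ratio, "vowel_ratio": vowel_ratio}

legit = domain_features("google.com")          # human-chosen name
dga = domain_features("xj4k9q2vb7h1.net")      # pseudo-random-looking name
```

Algorithmically generated names tend to show higher character entropy, more digits and fewer vowels than human-chosen names; an SVM trained on vectors like these is one standard way to separate the two populations.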
Abstract:
Twenty-three isolates of Beauveria bassiana and 13 isolates of Metarhizium anisopliae were tested on third-instar nymphs of Triatoma infestans, a serious vector of Chagas disease. Pathogenicity tests at saturated humidity showed that this insect is very susceptible to fungal infection. At lower relative humidity (50%), conditions expected in the vector microhabitat, virulence was significantly different among isolates. Cumulative mortality 15 days after treatment varied from 17.5 to 97.5%, and estimates of 50% survival time varied from 6 to 11 days. Maintaining lower relative humidity, four B. bassiana and two M. anisopliae isolates were selected for analysis of virulence at different conidial concentrations and temperatures. Lethal concentrations sufficient to kill 50% of insects (LC50) varied from 7.1×10^5 to 4.3×10^6 conidia/ml, for a B. bassiana isolate (CG 14) and a M. anisopliae isolate (CG 491) respectively. Most isolates, particularly B. bassiana isolates CG 24 and CG 306, proved to be more virulent at 25 and 30°C, compared to 15 and 20°C. The differential virulence at 50% humidity observed among some B. bassiana isolates was not correlated to phenetic groups in cluster analysis of RAPD markers. In fact, the B. bassiana isolates analyzed presented a high homogeneity (> 73% similarity).