243 resultados para regression algorithm


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses three different ways of applying the single-objective binary genetic algorithm into designing the wind farm. The introduction of different applications is through altering the binary encoding methods in GA codes. The first encoding method is the traditional one with fixed wind turbine positions. The second involves varying the initial positions from results of the first method, and it is achieved by using binary digits to represent the coordination of wind turbine on X or Y axis. The third is the mixing of the first encoding method with another one, which is by adding four more binary digits to represent one of the unavailable plots. The goal of this paper is to demonstrate how the single-objective binary algorithm can be applied and how the wind turbines are distributed under various conditions with best fitness. The main emphasis of discussion is focused on the scenario of wind direction varying from 0° to 45°. Results show that choosing the appropriate position of wind turbines is more significant than choosing the wind turbine numbers, considering that the former has a bigger influence on the whole farm fitness than the latter. And the farm has best performance of fitness values, farm efficiency, and total power with the direction between 20°to 30°.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Structural identification (St-Id) can be considered as the process of updating a finite element (FE) model of a structural system to match the measured response of the structure. This paper presents the St-Id of a laboratory-based steel through-truss cantilevered bridge with suspended span. There are a total of 600 degrees of freedom (DOFs) in the superstructure plus additional DOFs in the substructure. The St-Id of the bridge model used the modal parameters from a preliminary modal test in the objective function of a global optimisation technique using a layered genetic algorithm with patternsearch step (GAPS). Each layer of the St-Id process involved grouping of the structural parameters into a number of updating parameters and running parallel optimisations. The number of updating parameters was increased at each layer of the process. In order to accelerate the optimisation and ensure improved diversity within the population, a patternsearch step was applied to the fittest individuals at the end of each generation of the GA. The GAPS process was able to replicate the mode shapes for the first two lateral sway modes and the first vertical bending mode to a high degree of accuracy and, to a lesser degree, the mode shape of the first lateral bending mode. The mode shape and frequency of the torsional mode did not match very well. The frequencies of the first lateral bending mode, the first longitudinal mode and the first vertical mode matched very well. The frequency of the first sway mode was lower and that of the second sway mode was higher than the true values, indicating a possible problem with the FE model. Improvements to the model and the St-Id process will be presented at the upcoming conference and compared to the results presented in this paper. These improvements will include the use of multiple FE models in a multi-layered, multi-solution, GAPS St-Id approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we develop and validate a new Statistically Assisted Fluid Registration Algorithm (SAFIRA) for brain images. A non-statistical version of this algorithm was first implemented in [2] and re-formulated using Lagrangian mechanics in [3]. Here we extend this algorithm to 3D: given 3D brain images from a population, vector fields and their corresponding deformation matrices are computed in a first round of registrations using the non-statistical implementation. Covariance matrices for both the deformation matrices and the vector fields are then obtained and incorporated (separately or jointly) in the regularizing (i.e., the non-conservative Lagrangian) terms, creating four versions of the algorithm. We evaluated the accuracy of each algorithm variant using the manually labeled LPBA40 dataset, which provides us with ground truth anatomical segmentations. We also compared the power of the different algorithms using tensor-based morphometry -a technique to analyze local volumetric differences in brain structure- applied to 46 3D brain scans from healthy monozygotic twins.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Large multisite efforts (e.g., the ENIGMA Consortium), have shown that neuroimaging traits including tract integrity (from DTI fractional anisotropy, FA) and subcortical volumes (from T1-weighted scans) are highly heritable and promising phenotypes for discovering genetic variants associated with brain structure. However, genetic correlations (rg) among measures from these different modalities for mapping the human genome to the brain remain unknown. Discovering these correlations can help map genetic and neuroanatomical pathways implicated in development and inherited risk for disease. We use structural equation models and a twin design to find rg between pairs of phenotypes extracted from DTI and MRI scans. When controlling for intracranial volume, the caudate as well as related measures from the limbic system - hippocampal volume - showed high rg with the cingulum FA. Using an unrelated sample and a Seemingly Unrelated Regression model for bivariate analysis of this connection, we show that a multivariate GWAS approach may be more promising for genetic discovery than a univariate approach applied to each trait separately.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2.We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8±2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The increase in data center dependent services has made energy optimization of data centers one of the most exigent challenges in today's Information Age. The necessity of green and energy-efficient measures is very high for reducing carbon footprint and exorbitant energy costs. However, inefficient application management of data centers results in high energy consumption and low resource utilization efficiency. Unfortunately, in most cases, deploying an energy-efficient application management solution inevitably degrades the resource utilization efficiency of the data centers. To address this problem, a Penalty-based Genetic Algorithm (GA) is presented in this paper to solve a defined profile-based application assignment problem whilst maintaining a trade-off between the power consumption performance and resource utilization performance. Case studies show that the penalty-based GA is highly scalable and provides 16% to 32% better solutions than a greedy algorithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the past few years, the virtual machine (VM) placement problem has been studied intensively and many algorithms for the VM placement problem have been proposed. However, those proposed VM placement algorithms have not been widely used in today's cloud data centers as they do not consider the migration cost from current VM placement to the new optimal VM placement. As a result, the gain from optimizing VM placement may be less than the loss of the migration cost from current VM placement to the new VM placement. To address this issue, this paper presents a penalty-based genetic algorithm (GA) for the VM placement problem that considers the migration cost in addition to the energy-consumption of the new VM placement and the total inter-VM traffic flow in the new VM placement. The GA has been implemented and evaluated by experiments, and the experimental results show that the GA outperforms two well known algorithms for the VM placement problem.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although live VM migration has been intensively studied, the problem of live migration of multiple interdependent VMs has hardly been investigated. The most important problem in the live migration of multiple interdependent VMs is how to schedule VM migrations as the schedule will directly affect the total migration time and the total downtime of those VMs. Aiming at minimizing both the total migration time and the total downtime simultaneously, this paper presents a Strength Pareto Evolutionary Algorithm 2 (SPEA2) for the multi-VM migration scheduling problem. The SPEA2 has been evaluated by experiments, and the experimental results show that the SPEA2 can generate a set of VM migration schedules with a shorter total migration time and a shorter total downtime than an existing genetic algorithm, namely Random Key Genetic Algorithm (RKGA). This paper also studies the scalability of the SPEA2.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Projective Hjelmslev planes and affine Hjelmslev planes are generalisations of projective planes and affine planes. We present an algorithm for constructing projective Hjelmslev planes and affine Hjelmslev planes that uses projective planes, affine planes and orthogonal arrays. We show that all 2-uniform projective Hjelmslev planes, and all 2-uniform affine Hjelmslev planes can be constructed in this way. As a corollary it is shown that all $2$-uniform affine Hjelmslev planes are sub-geometries of $2$-uniform projective Hjelmslev planes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rolling-element bearing failures are the most frequent problems in rotating machinery, which can be catastrophic and cause major downtime. Hence, providing advance failure warning and precise fault detection in such components are pivotal and cost-effective. The vast majority of past research has focused on signal processing and spectral analysis for fault diagnostics in rotating components. In this study, a data mining approach using a machine learning technique called anomaly detection (AD) is presented. This method employs classification techniques to discriminate between defect examples. Two features, kurtosis and Non-Gaussianity Score (NGS), are extracted to develop anomaly detection algorithms. The performance of the developed algorithms was examined through real data from a test to failure bearing. Finally, the application of anomaly detection is compared with one of the popular methods called Support Vector Machine (SVM) to investigate the sensitivity and accuracy of this approach and its ability to detect the anomalies in early stages.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We investigate the terminating concept of BKZ reduction first introduced by Hanrot et al. [Crypto'11] and make extensive experiments to predict the number of tours necessary to obtain the best possible trade off between reduction time and quality. Then, we improve Buchmann and Lindner's result [Indocrypt'09] to find sub-lattice collision in SWIFFT. We illustrate that further improvement in time is possible through special setting of SWIFFT parameters and also through the combination of different reduction parameters adaptively. Our contribution also include a probabilistic simulation approach top-up deterministic simulation described by Chen and Nguyen [Asiacrypt'11] that can able to predict the Gram-Schmidt norms more accurately for large block sizes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ordinal qualitative data are often collected for phenotypical measurements in plant pathology and other biological sciences. Statistical methods, such as t tests or analysis of variance, are usually used to analyze ordinal data when comparing two groups or multiple groups. However, the underlying assumptions such as normality and homogeneous variances are often violated for qualitative data. To this end, we investigated an alternative methodology, rank regression, for analyzing the ordinal data. The rank-based methods are essentially based on pairwise comparisons and, therefore, can deal with qualitative data naturally. They require neither normality assumption nor data transformation. Apart from robustness against outliers and high efficiency, the rank regression can also incorporate covariate effects in the same way as the ordinary regression. By reanalyzing a data set from a wheat Fusarium crown rot study, we illustrated the use of the rank regression methodology and demonstrated that the rank regression models appear to be more appropriate and sensible for analyzing nonnormal data and data with outliers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.