904 resultados para Pruning algorithms
Resumo:
Choosing the right or the best option is often a demanding and challenging task for the user (e.g., a customer in an online retailer) when there are many available alternatives. In fact, the user rarely knows which offering will provide the highest value. To reduce the complexity of the choice process, automated recommender systems generate personalized recommendations. These recommendations take into account the preferences collected from the user in an explicit (e.g., letting users express their opinion about items) or implicit (e.g., studying some behavioral features) way. Such systems are widespread; research indicates that they increase the customers' satisfaction and lead to higher sales. Preference handling is one of the core issues in the design of every recommender system. This kind of system often aims at guiding users in a personalized way to interesting or useful options in a large space of possible options. Therefore, it is important for them to catch and model the user's preferences as accurately as possible. In this thesis, we develop a comparative preference-based user model to represent the user's preferences in conversational recommender systems. This type of user model allows the recommender system to capture several preference nuances from the user's feedback. We show that, when applied to conversational recommender systems, the comparative preference-based model is able to guide the user towards the best option while the system is interacting with her. We empirically test and validate the suitability and the practical computational aspects of the comparative preference-based user model and the related preference relations by comparing them to a sum of weights-based user model and the related preference relations. Product configuration, scheduling a meeting and the construction of autonomous agents are among several artificial intelligence tasks that involve a process of constrained optimization, that is, optimization of behavior or options subject to given constraints with regards to a set of preferences. When solving a constrained optimization problem, pruning techniques, such as the branch and bound technique, point at directing the search towards the best assignments, thus allowing the bounding functions to prune more branches in the search tree. Several constrained optimization problems may exhibit dominance relations. These dominance relations can be particularly useful in constrained optimization problems as they can instigate new ways (rules) of pruning non optimal solutions. Such pruning methods can achieve dramatic reductions in the search space while looking for optimal solutions. A number of constrained optimization problems can model the user's preferences using the comparative preferences. In this thesis, we develop a set of pruning rules used in the branch and bound technique to efficiently solve this kind of optimization problem. More specifically, we show how to generate newly defined pruning rules from a dominance algorithm that refers to a set of comparative preferences. These rules include pruning approaches (and combinations of them) which can drastically prune the search space. They mainly reduce the number of (expensive) pairwise comparisons performed during the search while guiding constrained optimization algorithms to find optimal solutions. Our experimental results show that the pruning rules that we have developed and their different combinations have varying impact on the performance of the branch and bound technique.
Resumo:
Case-Based Reasoning (CBR) uses past experiences to solve new problems. The quality of the past experiences, which are stored as cases in a case base, is a big factor in the performance of a CBR system. The system's competence may be improved by adding problems to the case base after they have been solved and their solutions verified to be correct. However, from time to time, the case base may have to be refined to reduce redundancy and to get rid of any noisy cases that may have been introduced. Many case base maintenance algorithms have been developed to delete noisy and redundant cases. However, different algorithms work well in different situations and it may be difficult for a knowledge engineer to know which one is the best to use for a particular case base. In this thesis, we investigate ways to combine algorithms to produce better deletion decisions than the decisions made by individual algorithms, and ways to choose which algorithm is best for a given case base at a given time. We analyse five of the most commonly-used maintenance algorithms in detail and show how the different algorithms perform better on different datasets. This motivates us to develop a new approach: maintenance by a committee of experts (MACE). MACE allows us to combine maintenance algorithms to produce a composite algorithm which exploits the merits of each of the algorithms that it contains. By combining different algorithms in different ways we can also define algorithms that have different trade-offs between accuracy and deletion. While MACE allows us to define an infinite number of new composite algorithms, we still face the problem of choosing which algorithm to use. To make this choice, we need to be able to identify properties of a case base that are predictive of which maintenance algorithm is best. We examine a number of measures of dataset complexity for this purpose. These provide a numerical way to describe a case base at a given time. We use the numerical description to develop a meta-case-based classification system. This system uses previous experience about which maintenance algorithm was best to use for other case bases to predict which algorithm to use for a new case base. Finally, we give the knowledge engineer more control over the deletion process by creating incremental versions of the maintenance algorithms. These incremental algorithms suggest one case at a time for deletion rather than a group of cases, which allows the knowledge engineer to decide whether or not each case in turn should be deleted or kept. We also develop incremental versions of the complexity measures, allowing us to create an incremental version of our meta-case-based classification system. Since the case base changes after each deletion, the best algorithm to use may also change. The incremental system allows us to choose which algorithm is the best to use at each point in the deletion process.
Resumo:
The electroencephalogram (EEG) is a medical technology that is used in the monitoring of the brain and in the diagnosis of many neurological illnesses. Although coarse in its precision, the EEG is a non-invasive tool that requires minimal set-up times, and is suitably unobtrusive and mobile to allow continuous monitoring of the patient, either in clinical or domestic environments. Consequently, the EEG is the current tool-of-choice with which to continuously monitor the brain where temporal resolution, ease-of- use and mobility are important. Traditionally, EEG data are examined by a trained clinician who identifies neurological events of interest. However, recent advances in signal processing and machine learning techniques have allowed the automated detection of neurological events for many medical applications. In doing so, the burden of work on the clinician has been significantly reduced, improving the response time to illness, and allowing the relevant medical treatment to be administered within minutes rather than hours. However, as typical EEG signals are of the order of microvolts (μV ), contamination by signals arising from sources other than the brain is frequent. These extra-cerebral sources, known as artefacts, can significantly distort the EEG signal, making its interpretation difficult, and can dramatically disimprove automatic neurological event detection classification performance. This thesis therefore, contributes to the further improvement of auto- mated neurological event detection systems, by identifying some of the major obstacles in deploying these EEG systems in ambulatory and clinical environments so that the EEG technologies can emerge from the laboratory towards real-world settings, where they can have a real-impact on the lives of patients. In this context, the thesis tackles three major problems in EEG monitoring, namely: (i) the problem of head-movement artefacts in ambulatory EEG, (ii) the high numbers of false detections in state-of-the-art, automated, epileptiform activity detection systems and (iii) false detections in state-of-the-art, automated neonatal seizure detection systems. To accomplish this, the thesis employs a wide range of statistical, signal processing and machine learning techniques drawn from mathematics, engineering and computer science. The first body of work outlined in this thesis proposes a system to automatically detect head-movement artefacts in ambulatory EEG and utilises supervised machine learning classifiers to do so. The resulting head-movement artefact detection system is the first of its kind and offers accurate detection of head-movement artefacts in ambulatory EEG. Subsequently, addtional physiological signals, in the form of gyroscopes, are used to detect head-movements and in doing so, bring additional information to the head- movement artefact detection task. A framework for combining EEG and gyroscope signals is then developed, offering improved head-movement arte- fact detection. The artefact detection methods developed for ambulatory EEG are subsequently adapted for use in an automated epileptiform activity detection system. Information from support vector machines classifiers used to detect epileptiform activity is fused with information from artefact-specific detection classifiers in order to significantly reduce the number of false detections in the epileptiform activity detection system. By this means, epileptiform activity detection which compares favourably with other state-of-the-art systems is achieved. Finally, the problem of false detections in automated neonatal seizure detection is approached in an alternative manner; blind source separation techniques, complimented with information from additional physiological signals are used to remove respiration artefact from the EEG. In utilising these methods, some encouraging advances have been made in detecting and removing respiration artefacts from the neonatal EEG, and in doing so, the performance of the underlying diagnostic technology is improved, bringing its deployment in the real-world, clinical domain one step closer.
Resumo:
Traditionally, attacks on cryptographic algorithms looked for mathematical weaknesses in the underlying structure of a cipher. Side-channel attacks, however, look to extract secret key information based on the leakage from the device on which the cipher is implemented, be it smart-card, microprocessor, dedicated hardware or personal computer. Attacks based on the power consumption, electromagnetic emanations and execution time have all been practically demonstrated on a range of devices to reveal partial secret-key information from which the full key can be reconstructed. The focus of this thesis is power analysis, more specifically a class of attacks known as profiling attacks. These attacks assume a potential attacker has access to, or can control, an identical device to that which is under attack, which allows him to profile the power consumption of operations or data flow during encryption. This assumes a stronger adversary than traditional non-profiling attacks such as differential or correlation power analysis, however the ability to model a device allows templates to be used post-profiling to extract key information from many different target devices using the power consumption of very few encryptions. This allows an adversary to overcome protocols intended to prevent secret key recovery by restricting the number of available traces. In this thesis a detailed investigation of template attacks is conducted, along with how the selection of various attack parameters practically affect the efficiency of the secret key recovery, as well as examining the underlying assumption of profiling attacks in that the power consumption of one device can be used to extract secret keys from another. Trace only attacks, where the corresponding plaintext or ciphertext data is unavailable, are then investigated against both symmetric and asymmetric algorithms with the goal of key recovery from a single trace. This allows an adversary to bypass many of the currently proposed countermeasures, particularly in the asymmetric domain. An investigation into machine-learning methods for side-channel analysis as an alternative to template or stochastic methods is also conducted, with support vector machines, logistic regression and neural networks investigated from a side-channel viewpoint. Both binary and multi-class classification attack scenarios are examined in order to explore the relative strengths of each algorithm. Finally these machine-learning based alternatives are empirically compared with template attacks, with their respective merits examined with regards to attack efficiency.
Resumo:
In this thesis, extensive experiments are firstly conducted to characterize the performance of using the emerging IEEE 802.15.4-2011 ultra wideband (UWB) for indoor localization, and the results demonstrate the accuracy and precision of using time of arrival measurements for ranging applications. A multipath propagation controlling technique is synthesized which considers the relationship between transmit power, transmission range and signal-to-noise ratio. The methodology includes a novel bilateral transmitter output power control algorithm which is demonstrated to be able to stabilize the multipath channel, and enable sub 5cm instant ranging accuracy in line of sight conditions. A fully-coupled architecture is proposed for the localization system using a combination of IEEE 802.15.4-2011 UWB and inertial sensors. This architecture not only implements the position estimation of the object by fusing the UWB and inertial measurements, but enables the nodes in the localization network to mutually share positional and other useful information via the UWB channel. The hybrid system has been demonstrated to be capable of simultaneous local-positioning and remote-tracking of the mobile object. Three fusion algorithms for relative position estimation are proposed, including internal navigation system (INS), INS with UWB ranging correction, and orientation plus ranging. Experimental results show that the INS with UWB correction algorithm achieves an average position accuracy of 0.1883m, and gets 83% and 62% improvements on the accuracy of the INS (1.0994m) and the existing extended Kalman filter tracking algorithm (0.5m), respectively.
Resumo:
Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.
Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.
Resumo:
Scheduling a set of jobs over a collection of machines to optimize a certain quality-of-service measure is one of the most important research topics in both computer science theory and practice. In this thesis, we design algorithms that optimize {\em flow-time} (or delay) of jobs for scheduling problems that arise in a wide range of applications. We consider the classical model of unrelated machine scheduling and resolve several long standing open problems; we introduce new models that capture the novel algorithmic challenges in scheduling jobs in data centers or large clusters; we study the effect of selfish behavior in distributed and decentralized environments; we design algorithms that strive to balance the energy consumption and performance.
The technically interesting aspect of our work is the surprising connections we establish between approximation and online algorithms, economics, game theory, and queuing theory. It is the interplay of ideas from these different areas that lies at the heart of most of the algorithms presented in this thesis.
The main contributions of the thesis can be placed in one of the following categories.
1. Classical Unrelated Machine Scheduling: We give the first polygorithmic approximation algorithms for minimizing the average flow-time and minimizing the maximum flow-time in the offline setting. In the online and non-clairvoyant setting, we design the first non-clairvoyant algorithm for minimizing the weighted flow-time in the resource augmentation model. Our work introduces iterated rounding technique for the offline flow-time optimization, and gives the first framework to analyze non-clairvoyant algorithms for unrelated machines.
2. Polytope Scheduling Problem: To capture the multidimensional nature of the scheduling problems that arise in practice, we introduce Polytope Scheduling Problem (\psp). The \psp problem generalizes almost all classical scheduling models, and also captures hitherto unstudied scheduling problems such as routing multi-commodity flows, routing multicast (video-on-demand) trees, and multi-dimensional resource allocation. We design several competitive algorithms for the \psp problem and its variants for the objectives of minimizing the flow-time and completion time. Our work establishes many interesting connections between scheduling and market equilibrium concepts, fairness and non-clairvoyant scheduling, and queuing theoretic notion of stability and resource augmentation analysis.
3. Energy Efficient Scheduling: We give the first non-clairvoyant algorithm for minimizing the total flow-time + energy in the online and resource augmentation model for the most general setting of unrelated machines.
4. Selfish Scheduling: We study the effect of selfish behavior in scheduling and routing problems. We define a fairness index for scheduling policies called {\em bounded stretch}, and show that for the objective of minimizing the average (weighted) completion time, policies with small stretch lead to equilibrium outcomes with small price of anarchy. Our work gives the first linear/ convex programming duality based framework to bound the price of anarchy for general equilibrium concepts such as coarse correlated equilibrium.
Resumo:
Determination of copy number variants (CNVs) inferred in genome wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumHap 550 data from the Singapore cohort study of the risk factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before enacting full analysis. We used 10 SCORM samples for optimizing parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. The statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects one of the fewest numbers of CNVs.
Resumo:
In many practical situations, batching of similar jobs to avoid setups is performed while constructing a schedule. This paper addresses the problem of non-preemptively scheduling independent jobs in a two-machine flow shop with the objective of minimizing the makespan. Jobs are grouped into batches. A sequence independent batch setup time on each machine is required before the first job is processed, and when a machine switches from processing a job in some batch to a job of another batch. Besides its practical interest, this problem is a direct generalization of the classical two-machine flow shop problem with no grouping of jobs, which can be solved optimally by Johnson's well-known algorithm. The problem under investigation is known to be NP-hard. We propose two O(n logn) time heuristic algorithms. The first heuristic, which creates a schedule with minimum total setup time by forcing all jobs in the same batch to be sequenced in adjacent positions, has a worst-case performance ratio of 3/2. By allowing each batch to be split into at most two sub-batches, a second heuristic is developed which has an improved worst-case performance ratio of 4/3. © 1998 The Mathematical Programming Society, Inc. Published by Elsevier Science B.V.
Resumo:
The performance of loadsharing algorithms for heterogeneous distributed systems is investigated by simulation. The systems considered are networks of workstations (nodes) which differ in processing power. Two parameters are proposed for characterising system heterogeneity, namely the variance and skew of the distribution of processing power among the network nodes. A variety of networks are investigated, with the same number of nodes and total processing power, but with the processing power distributed differently among the nodes. Two loadsharing algorithms are evaluated, at overall system loadings of 50% and 90%, using job response time as the performance metric. Comparison is made with the ideal situation of ‘perfect sharing’, where it is assumed that the communication delays are zero and that complete knowledge is available about job lengths and the loading at the different nodes, so that an arriving job can be sent to the node where it will be completed in the shortest time. The algorithms studied are based on those already in use for homogeneous networks, but were adapted to take account of system heterogeneity. Both algorithms take into account the differences in the processing powers of the nodes in their location policies, but differ in the extent to which they ‘discriminate’ against the slower nodes. It is seen that the relative performance of the two is strongly influenced by the system utilisation and the distribution of processing power among the nodes.
Resumo:
Three parallel optimisation algorithms, for use in the context of multilevel graph partitioning of unstructured meshes, are described. The first, interface optimisation, reduces the computation to a set of independent optimisation problems in interface regions. The next, alternating optimisation, is a restriction of this technique in which mesh entities are only allowed to migrate between subdomains in one direction. The third treats the gain as a potential field and uses the concept of relative gain for selecting appropriate vertices to migrate. The results are compared and seen to produce very high global quality partitions, very rapidly. The results are also compared with another partitioning tool and shown to be of higher quality although taking longer to compute.
Resumo:
We consider a knapsack problem to minimize a symmetric quadratic function. We demonstrate that this symmetric quadratic knapsack problem is relevant to two problems of single machine scheduling: the problem of minimizing the weighted sum of the completion times with a single machine non-availability interval under the non-resumable scenario; and the problem of minimizing the total weighted earliness and tardiness with respect to a common small due date. We develop a polynomial-time approximation algorithm that delivers a constant worst-case performance ratio for a special form of the symmetric quadratic knapsack problem. We adapt that algorithm to our scheduling problems and achieve a better performance. For the problems under consideration no fixed-ratio approximation algorithms have been previously known.
Resumo:
In this paper, we first demonstrate that the classical Purcell's vector method when combined with row pivoting yields a consistently small growth factor in comparison to the well-known Gauss elimination method, the Gauss–Jordan method and the Gauss–Huard method with partial pivoting. We then present six parallel algorithms of the Purcell method that may be used for direct solution of linear systems. The algorithms differ in ways of pivoting and load balancing. We recommend algorithms V and VI for their reliability and algorithms III and IV for good load balance if local pivoting is acceptable. Some numerical results are presented.