875 resultados para constructive heuristic algorithms
Resumo:
Advancements in cloud computing have enabled the proliferation of distributed applications, which require management and control of multiple services. However, without an efficient mechanism for scaling services in response to changing workload conditions, such as number of connected users, application performance might suffer, leading to violations of Service Level Agreements (SLA) and possible inefficient use of hardware resources. Combining dynamic application requirements with the increased use of virtualised computing resources creates a challenging resource Management context for application and cloud-infrastructure owners. In such complex environments, business entities use SLAs as a means for specifying quantitative and qualitative requirements of services. There are several challenges in running distributed enterprise applications in cloud environments, ranging from the instantiation of service VMs in the correct order using an adequate quantity of computing resources, to adapting the number of running services in response to varying external loads, such as number of users. The application owner is interested in finding the optimum amount of computing and network resources to use for ensuring that the performance requirements of all her/his applications are met. She/he is also interested in appropriately scaling the distributed services so that application performance guarantees are maintained even under dynamic workload conditions. Similarly, the infrastructure Providers are interested in optimally provisioning the virtual resources onto the available physical infrastructure so that her/his operational costs are minimized, while maximizing the performance of tenants’ applications. Motivated by the complexities associated with the management and scaling of distributed applications, while satisfying multiple objectives (related to both consumers and providers of cloud resources), this thesis proposes a cloud resource management platform able to dynamically provision and coordinate the various lifecycle actions on both virtual and physical cloud resources using semantically enriched SLAs. The system focuses on dynamic sizing (scaling) of virtual infrastructures composed of virtual machines (VM) bounded application services. We describe several algorithms for adapting the number of VMs allocated to the distributed application in response to changing workload conditions, based on SLA-defined performance guarantees. We also present a framework for dynamic composition of scaling rules for distributed service, which used benchmark-generated application Monitoring traces. We show how these scaling rules can be combined and included into semantic SLAs for controlling allocation of services. We also provide a detailed description of the multi-objective infrastructure resource allocation problem and various approaches to satisfying this problem. We present a resource management system based on a genetic algorithm, which performs allocation of virtual resources, while considering the optimization of multiple criteria. We prove that our approach significantly outperforms reactive VM-scaling algorithms as well as heuristic-based VM-allocation approaches.
Resumo:
We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than currently used fixed-point and related algorithms. The main idea is to utilize a second order Taylor expansion of the target functional and to devise a partial Newton-Raphson procedure. In connection with symmetrized M-estimators we work with incomplete U-statistics to accelerate our procedures initially.
Resumo:
We present a real-world staff-assignment problem that was reported to us by a provider of an online workforce scheduling software. The problem consists of assigning employees to work shifts subject to a large variety of requirements related to work laws, work shift compatibility, workload balancing, and personal preferences of employees. A target value is given for each requirement, and all possible deviations from these values are associated with acceptance levels. The objective is to minimize the total number of deviations in ascending order of the acceptance levels. We present an exact lexicographic goal programming MILP formulation and an MILP-based heuristic. The heuristic consists of two phases: in the first phase a feasible schedule is built and in the second phase parts of the schedule are iteratively re-optimized by applying an exact MILP model. A major advantage of such MILP-based approaches is the flexibility to account for additional constraints or modified planning objectives, which is important as the requirements may vary depending on the company or planning period. The applicability of the heuristic is demonstrated for a test set derived from real-world data. Our computational results indicate that the heuristic is able to devise optimal solutions to non-trivial problem instances, and outperforms the exact lexicographic goal programming formulation on medium- and large-sized problem instances.
Resumo:
Human resources managers often use assessment centers to evaluate candidates for a job position. During an assessment center, the candidates perform a series of exercises. The exercises require one or two assessors (e.g., managers or psychologists) that observe and evaluate the candidate. If an exercise is designed as a role-play, an actor is required as well which plays, e.g., an unhappy customer with whom the candidate has to deal with. Besides performing the exercises, the candidates have a lunch break within a prescribed time window. Each candidate should be observed by approximately half the number of the assessors. Moreover, an assessor cannot be assigned to a candidate if they personally know each other. The planning problem consists of determining (1) resource-feasible start times of all exercises and lunch breaks and (2) a feasible assignment of assessors to candidates, such that the assessment center duration is minimized. We propose a list-scheduling heuristic that generates feasible schedules for such assessment centers. We develop novel procedures for devising an appropriate scheduling list and for incorporating the problem-specific constraints. Our computational results indicate that our approach is capable of devising optimal or near-optimal solutions to real-world instances within short CPU time.
Resumo:
This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, or decision tree classifiers for discourse markers. A rule-based approach is proposed for solving cross-modal references to meeting documents. The methods are trained and evaluated thanks to a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.
Resumo:
BACKGROUND Lung clearance index (LCI), a marker of ventilation inhomogeneity, is elevated early in children with cystic fibrosis (CF). However, in infants with CF, LCI values are found to be normal, although structural lung abnormalities are often detectable. We hypothesized that this discrepancy is due to inadequate algorithms of the available software package. AIM Our aim was to challenge the validity of these software algorithms. METHODS We compared multiple breath washout (MBW) results of current software algorithms (automatic modus) to refined algorithms (manual modus) in 17 asymptomatic infants with CF, and 24 matched healthy term-born infants. The main difference between these two analysis methods lies in the calculation of the molar mass differences that the system uses to define the completion of the measurement. RESULTS In infants with CF the refined manual modus revealed clearly elevated LCI above 9 in 8 out of 35 measurements (23%), all showing LCI values below 8.3 using the automatic modus (paired t-test comparing the means, P < 0.001). Healthy infants showed normal LCI values using both analysis methods (n = 47, paired t-test, P = 0.79). The most relevant reason for false normal LCI values in infants with CF using the automatic modus was the incorrect recognition of the end-of-test too early during the washout. CONCLUSION We recommend the use of the manual modus for the analysis of MBW outcomes in infants in order to obtain more accurate results. This will allow appropriate use of infant lung function results for clinical and scientific purposes.
Resumo:
BACKGROUND: HIV surveillance requires monitoring of new HIV diagnoses and differentiation of incident and older infections. In 2008, Switzerland implemented a system for monitoring incident HIV infections based on the results of a line immunoassay (Inno-Lia) mandatorily conducted for HIV confirmation and type differentiation (HIV-1, HIV-2) of all newly diagnosed patients. Based on this system, we assessed the proportion of incident HIV infection among newly diagnosed cases in Switzerland during 2008-2013. METHODS AND RESULTS: Inno-Lia antibody reaction patterns recorded in anonymous HIV notifications to the federal health authority were classified by 10 published algorithms into incident (up to 12 months) or older infections. Utilizing these data, annual incident infection estimates were obtained in two ways, (i) based on the diagnostic performance of the algorithms and utilizing the relationship 'incident = true incident + false incident', (ii) based on the window-periods of the algorithms and utilizing the relationship 'Prevalence = Incidence x Duration'. From 2008-2013, 3'851 HIV notifications were received. Adult HIV-1 infections amounted to 3'809 cases, and 3'636 of them (95.5%) contained Inno-Lia data. Incident infection totals calculated were similar for the performance- and window-based methods, amounting on average to 1'755 (95% confidence interval, 1588-1923) and 1'790 cases (95% CI, 1679-1900), respectively. More than half of these were among men who had sex with men. Both methods showed a continuous decline of annual incident infections 2008-2013, totaling -59.5% and -50.2%, respectively. The decline of incident infections continued even in 2012, when a 15% increase in HIV notifications had been observed. This increase was entirely due to older infections. Overall declines 2008-2013 were of similar extent among the major transmission groups. CONCLUSIONS: Inno-Lia based incident HIV-1 infection surveillance proved useful and reliable. It represents a free, additional public health benefit of the use of this relatively costly test for HIV confirmation and type differentiation.
Resumo:
Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^
Resumo:
Background. Diabetes places a significant burden on the health care system. Reduction in blood glucose levels (HbA1c) reduces the risk of complications; however, little is known about the impact of disease management programs on medical costs for patients with diabetes. In 2001, economic costs associated with diabetes totaled $100 billion, and indirect costs totaled $54 billion. ^ Objective. To compare outcomes of nurse case management by treatment algorithms with conventional primary care for glycemic control and cardiovascular risk factors in type 2 diabetic patients in a low-income Mexican American community-based setting, and to compare the cost effectiveness of the two programs. Patient compliance was also assessed. ^ Research design and methods. An observational group-comparison to evaluate a treatment intervention for type 2 diabetes management was implemented at three out-patient health facilities in San Antonio, Texas. All eligible type 2 diabetic patients attending the clinics during 1994–1996 became part of the study. Data were obtained from the study database, medical records, hospital accounting, and pharmacy cost lists, and entered into a computerized database. Three groups were compared: a Community Clinic Nurse Case Manager (CC-TA) following treatment algorithms, a University Clinic Nurse Case Manager (UC-TA) following treatment algorithms, and Primary Care Physicians (PCP) following conventional care practices at a Family Practice Clinic. The algorithms provided a disease management model specifically for hyperglycemia, dyslipidemia, hypertension, and microalbuminuria that progressively moved the patient toward ideal goals through adjustments in medication, self-monitoring of blood glucose, meal planning, and reinforcement of diet and exercise. Cost effectiveness of hemoglobin AI, final endpoints was compared. ^ Results. There were 358 patients analyzed: 106 patients in CC-TA, 170 patients in UC-TA, and 82 patients in PCP groups. Change in hemoglobin A1c (HbA1c) was the primary outcome measured. HbA1c results were presented at baseline, 6 and 12 months for CC-TA (10.4%, 7.1%, 7.3%), UC-TA (10.5%, 7.1%, 7.2%), and PCP (10.0%, 8.5%, 8.7%). Mean patient compliance was 81%. Levels of cost effectiveness were significantly different between clinics. ^ Conclusion. Nurse case management with treatment algorithms significantly improved glycemic control in patients with type 2 diabetes, and was more cost effective. ^
Resumo:
Based on an order-theoretic approach, we derive sufficient conditions for the existence, characterization, and computation of Markovian equilibrium decision processes and stationary Markov equilibrium on minimal state spaces for a large class of stochastic overlapping generations models. In contrast to all previous work, we consider reduced-form stochastic production technologies that allow for a broad set of equilibrium distortions such as public policy distortions, social security, monetary equilibrium, and production nonconvexities. Our order-based methods are constructive, and we provide monotone iterative algorithms for computing extremal stationary Markov equilibrium decision processes and equilibrium invariant distributions, while avoiding many of the problems associated with the existence of indeterminacies that have been well-documented in previous work. We provide important results for existence of Markov equilibria for the case where capital income is not increasing in the aggregate stock. Finally, we conclude with examples common in macroeconomics such as models with fiat money and social security. We also show how some of our results extend to settings with unbounded state spaces.
Resumo:
Digital terrain models (DTM) typically contain large numbers of postings, from hundreds of thousands to billions. Many algorithms that run on DTMs require topological knowledge of the postings, such as finding nearest neighbors, finding the posting closest to a chosen location, etc. If the postings are arranged irregu- larly, topological information is costly to compute and to store. This paper offers a practical approach to organizing and searching irregularly-space data sets by presenting a collection of efficient algorithms (O(N),O(lgN)) that compute important topological relationships with only a simple supporting data structure. These relationships include finding the postings within a window, locating the posting nearest a point of interest, finding the neighborhood of postings nearest a point of interest, and ordering the neighborhood counter-clockwise. These algorithms depend only on two sorted arrays of two-element tuples, holding a planimetric coordinate and an integer identification number indicating which posting the coordinate belongs to. There is one array for each planimetric coordinate (eastings and northings). These two arrays cost minimal overhead to create and store but permit the data to remain arranged irregularly.
Resumo:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^