5 results for random search algorithms
at Duke University
Abstract:
Predicting from first-principles calculations whether mixed metallic elements phase-separate or form ordered structures is a major challenge of current materials research. It can be partially addressed in cases where experiments suggest the underlying lattice is conserved, using cluster expansion (CE) together with exhaustive evaluation or genetic search algorithms. Evolutionary algorithms have recently been introduced to search for stable off-lattice structures at fixed mixture compositions, but the general off-lattice problem is still unsolved. We present an integrated approach of CE and high-throughput ab initio calculations (HT) applicable to the full range of compositions in binary systems where the constituent elements or the intermediate ordered structures have different lattice types. The HT method replaces the search algorithms with direct calculation of a moderate number of naturally occurring prototypes representing all crystal systems, and it guides CE calculations of derivative structures. This synergy combines the precision of CE with the guiding strength of HT. Its application to poorly characterized binary Hf systems, believed to be phase-separating, defines three classes of alloys where CE and HT complement each other to uncover new ordered structures.
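The exhaustive-evaluation route that CE enables on a fixed lattice can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions: a one-dimensional periodic binary chain (spin +1 for element A, -1 for element B) and hypothetical effective cluster interactions in `ECI` that are invented here, not fitted to any Hf system. It writes the energy per site as a sum of interactions times cluster correlation functions, enumerates every periodic cell up to a small size, and keeps the lowest formation energy at each composition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical effective cluster interactions J_alpha (arbitrary units).
# In a real CE these would be fitted to ab initio energies.
ECI = {"empty": -1.0, "point": 0.1, "nn_pair": 0.3, "nnn_pair": -0.05}


def correlations(config):
    """Correlation functions <Pi_alpha> of a periodic 1D chain of spins +1 (A) / -1 (B)."""
    n = len(config)
    return {
        "empty": 1.0,
        "point": sum(config) / n,
        "nn_pair": sum(config[i] * config[(i + 1) % n] for i in range(n)) / n,
        "nnn_pair": sum(config[i] * config[(i + 2) % n] for i in range(n)) / n,
    }


def ce_energy(config, eci=ECI):
    """Cluster-expansion energy per site: E = sum_alpha J_alpha * <Pi_alpha>."""
    corr = correlations(config)
    return sum(j * corr[name] for name, j in eci.items())


def formation_energy(config, eci=ECI):
    """Energy relative to the composition-weighted pure-A and pure-B chains."""
    x = config.count(1) / len(config)  # fraction of A sites
    e_a = ce_energy((1,), eci)
    e_b = ce_energy((-1,), eci)
    return ce_energy(config, eci) - (x * e_a + (1 - x) * e_b)


def exhaustive_search(max_cell=6, eci=ECI):
    """Lowest formation energy per composition over all periodic cells up to max_cell sites."""
    best = {}
    for n in range(1, max_cell + 1):
        for config in product((1, -1), repeat=n):
            x = Fraction(config.count(1), n)  # exact composition key
            ef = formation_energy(config, eci)
            if x not in best or ef < best[x][0]:
                best[x] = (ef, config)
    return best


if __name__ == "__main__":
    for x, (ef, config) in sorted(exhaustive_search().items()):
        print(f"x = {x}: E_f = {ef:+.3f}  {config}")
```

With these made-up interactions the alternating chain `(1, -1)` at x = 1/2 has a formation energy of -0.6, so it is favored over separating into pure A and pure B. A real study would compare against the full convex hull across all compositions, not only the two endpoints, and the cost of enumeration grows exponentially with cell size, which is why guided searches matter.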
Abstract:
Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.
Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.
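The abstract does not name the provable algorithms it develops. As background only, one classic provable pruning criterion in structure-based design is Goldstein dead-end elimination (DEE) over discrete rotamers; the thesis's continuous-rotamer methods are not shown here. The sketch below assumes a pairwise energy model with hypothetical inputs `self_e[i][r]` (self energy of rotamer r at position i) and `pair_e[(i, j)][r][s]` (pair energy, stored once with i < j); all function names are illustrative. A rotamer r is removed only when a lower bound on the energy change from swapping in another rotamer t is positive, so the global minimum energy conformation (GMEC) is never pruned.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """Look up E(i_r, j_s); pair_e stores each pair once under the key (i, j) with i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def goldstein_dee(self_e, pair_e):
    """Iterative Goldstein singles DEE; returns the surviving rotamer indices per position."""
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Lower bound on E(conformation with r) - E(same conformation with t).
                    bound = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            bound += min(
                                pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)
                                for s in alive[j]
                            )
                    if bound > 0:
                        # r is worse than t in every conformation, so it cannot be in the GMEC.
                        alive[i].discard(r)
                        changed = True
                        break
    return [sorted(a) for a in alive]


def total_energy(conf, self_e, pair_e):
    """Energy of a full conformation (one rotamer index per position)."""
    e = sum(self_e[i][r] for i, r in enumerate(conf))
    for i in range(len(conf)):
        for j in range(i + 1, len(conf)):
            e += pair_e[(i, j)][conf[i]][conf[j]]
    return e


def brute_force_gmec(self_e, pair_e, allowed=None):
    """Exhaustively find the GMEC, optionally restricted to the allowed rotamers."""
    choices = allowed if allowed is not None else [range(len(s)) for s in self_e]
    return min(product(*choices), key=lambda c: total_energy(c, self_e, pair_e))
```

For example, with `self_e = [[0.0, 10.0], [0.0, 1.0]]` and `pair_e = {(0, 1): [[0.0, 0.5], [0.0, 0.5]]}`, DEE prunes rotamer 1 at both positions and leaves a single conformation. The brute-force helper is only there to check the guarantee on small cases; real designs rely on pruning precisely because enumeration is infeasible.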
Abstract:
Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, most gene selection approaches for classification are single-gene based. Second, many gene selection procedures are not embedded within the learning algorithm itself. Random forests have been found to perform well in high-dimensional settings with survival outcomes, and they include a built-in measure of variable importance. They are therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we show the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes, allowing us to better utilize the information available from microarray data with survival outcomes.
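The abstract says several node split criteria were compared but does not state which one the final method uses. One split rule commonly used in random survival forests is the two-sample log-rank statistic, which measures how differently two daughter nodes survive while accounting for censoring. The sketch below, with hypothetical helper names `logrank_statistic` and `best_split`, scores every threshold on a single gene's expression and keeps the one with the largest statistic; it illustrates one split, not the paper's full gene-selection procedure.

```python
import math


def logrank_statistic(times, events, in_left):
    """Absolute two-sample log-rank statistic for a candidate split.

    times[i]: follow-up time; events[i]: 1 if the event (death) was observed, 0 if censored;
    in_left[i]: True if sample i goes to the left daughter node.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    num = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        y = len(at_risk)                                   # samples at risk at time t
        y1 = sum(1 for i in at_risk if in_left[i])         # at risk in the left node
        d = sum(1 for i in at_risk if times[i] == t and events[i])           # events at t
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_left[i])
        num += d1 - y1 * d / y                             # observed minus expected
        if y > 1:
            var += (y1 / y) * (1 - y1 / y) * ((y - d) / (y - 1)) * d
    return abs(num) / math.sqrt(var) if var > 0 else 0.0


def best_split(gene_values, times, events, min_node=2):
    """Scan thresholds on one gene and return (threshold, statistic) of the best split."""
    best = (None, 0.0)
    for thr in sorted(set(gene_values))[:-1]:
        left = [v <= thr for v in gene_values]
        n_left = sum(left)
        if n_left < min_node or len(left) - n_left < min_node:
            continue  # skip splits that leave a daughter node too small
        stat = logrank_statistic(times, events, left)
        if stat > best[1]:
            best = (thr, stat)
    return best
```

In a forest, this search would be repeated over a random subset of genes at every node, and the resulting trees would feed a variable-importance measure; those outer layers are omitted here.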
Abstract:
This thesis focuses on the development of algorithms that will allow protein design calculations to incorporate more realistic modeling assumptions. Protein design algorithms search large sequence spaces for protein sequences that are biologically and medically useful. Better modeling could improve the chance of success in designs and expand the range of problems to which these algorithms are applied. I have developed algorithms to improve modeling of backbone flexibility (DEEPer) and of more extensive continuous flexibility in general (EPIC and LUTE). I’ve also developed algorithms to perform multistate designs, which account for effects like specificity, with provable guarantees of accuracy (COMETS), and to accommodate a wider range of energy functions in design (EPIC and LUTE).
Abstract:
The accurate description of ground and electronic excited states is an important and challenging topic in quantum chemistry. The pairing matrix fluctuation, as a counterpart of the density fluctuation, is applied to this topic. From the pairing matrix fluctuation, the exact electron correlation energy as well as two-electron addition/removal energies can be extracted. Therefore, both ground- and excited-state energies can be obtained, and they are in principle exact given complete knowledge of the pairing matrix fluctuation. In practice, because the exact pairing matrix fluctuation is unknown, we adopt a simple approximation to it, the particle-particle random phase approximation (pp-RPA), for ground- and excited-state calculations. Algorithms for accelerating pp-RPA calculations, including spin separation, spin adaptation, and an iterative Davidson method, are developed. For ground-state correlation, pp-RPA results are usually comparable to, and can be more accurate than, those from the traditional particle-hole random phase approximation (ph-RPA). For excited states, pp-RPA can describe double, Rydberg, and charge-transfer excitations, which are challenging for conventional time-dependent density functional theory (TDDFT). Although pp-RPA intrinsically cannot describe excitations out of orbitals below the highest occupied molecular orbital (HOMO), its performance on the single excitations it does capture is comparable to that of TDDFT. The pp-RPA for excitation calculations is further applied to challenging diradical problems and is used to unveil the nature of the ground and electronic excited states of higher acenes. The pp-RPA and the corresponding Tamm-Dancoff approximation (pp-TDA) are also applied to conical intersections, an important concept in nonadiabatic dynamics.
Their good description of the double-cone feature of conical intersections is in sharp contrast to the failure of TDDFT. All in all, the pairing matrix fluctuation opens up a new channel of thinking for quantum chemistry, and pp-RPA is a promising method for describing ground and electronic excited states.
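The abstract lists an iterative Davidson method among the acceleration techniques. As a rough illustration only, the sketch below implements a textbook Davidson iteration for the lowest eigenpair of a real symmetric, diagonally dominant matrix using NumPy. The actual pp-RPA problem is a non-Hermitian (generalized) eigenvalue problem, so this is not the thesis's solver, and the function name `davidson_lowest` and its parameters are hypothetical. The idea is to grow a small subspace, solve the projected eigenproblem exactly, and expand the subspace with a diagonally preconditioned residual until the residual norm falls below a tolerance.

```python
import numpy as np


def davidson_lowest(A, tol=1e-8, max_iter=100):
    """Lowest eigenpair of a real symmetric matrix A by Davidson iteration.

    Assumes A is diagonally dominant so that its diagonal is a useful preconditioner.
    """
    n = A.shape[0]
    diag = np.diag(A)
    # Start from the unit vector on the smallest diagonal element.
    V = np.zeros((n, 1))
    V[np.argmin(diag), 0] = 1.0
    for _ in range(max_iter):
        # Rayleigh-Ritz: solve the small projected eigenproblem in span(V).
        H = V.T @ A @ V
        evals, evecs = np.linalg.eigh(H)
        theta = evals[0]
        x = V @ evecs[:, 0]
        r = A @ x - theta * x                      # residual of the Ritz pair
        if np.linalg.norm(r) < tol:
            return theta, x
        # Davidson correction: precondition the residual with the diagonal of A.
        denom = theta - diag
        denom[np.abs(denom) < 1e-12] = 1e-12       # avoid division by zero
        t = r / denom
        # Orthogonalize against the current subspace (twice for numerical stability).
        for _ in range(2):
            t -= V @ (V.T @ t)
        norm = np.linalg.norm(t)
        if norm < 1e-14:
            return theta, x                        # no new direction left to add
        V = np.hstack([V, (t / norm)[:, None]])
    return theta, x
```

The payoff of this design is that only matrix-vector products with A are needed, so for large problems A never has to be diagonalized in full; the subspace typically stays far smaller than n when the matrix is diagonally dominant.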