22 results for STRUCTURE PREDICTION

in Deakin Research Online - Australia


Relevance:

100.00%

Abstract:

Many problems in chemistry depend on the ability to identify the global minimum or maximum of a function. Examples include applications in chemometrics, optimization of reaction or operating conditions, and non-linear least-squares analysis. This paper presents the results of applying a new method of deterministic global optimization, the cutting angle method (CAM), to the prediction of molecular geometries. CAM is shown to be competitive with other global optimization techniques on several benchmark molecular conformation problems. CAM is a general method that can also be applied to other computational problems involving global minima, global maxima or the roots of nonlinear equations.

Relevance:

100.00%

Abstract:

The molecular geometry, the three-dimensional arrangement of atoms in space, is a major factor determining the properties and reactivity of molecules, biomolecules and macromolecules. Stable molecular conformations can be computed by locating minima on the potential energy surface (PES). This is a very challenging global optimization problem because of the extremely large number of shallow local minima and the complicated landscape of the PES. This paper illustrates the mathematical and computational challenges on one important instance of the problem, the computation of the molecular geometry of oligopeptides, and proposes the use of the Extended Cutting Angle Method (ECAM) to solve it.

ECAM is a deterministic global optimization technique that computes tight lower bounds on the values of the objective function and fathoms (discards) those parts of the domain where the global minimum cannot reside. As with any domain partitioning scheme, its main challenge is the extremely large partition of the domain required for accurate lower bounds. We address this challenge by providing an efficient combinatorial algorithm for calculating the lower bounds, and by combining ECAM with a local optimization method while preserving the deterministic character of ECAM.
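Cutting-angle methods build a piecewise lower bound of the objective from the points evaluated so far and discard regions whose bound cannot beat the best value found. As a rough one-dimensional illustration only, the closely related Piyavskii-Shubert method for Lipschitz functions can be sketched as below. It assumes a known Lipschitz constant `lipschitz`; the function name is hypothetical, and this is not the authors' multivariate combinatorial algorithm.

```python
import math
from typing import Callable, List, Tuple


def piyavskii_minimize(f: Callable[[float], float], a: float, b: float,
                       lipschitz: float, tol: float = 1e-4,
                       max_iter: int = 10_000) -> Tuple[float, float]:
    """Deterministic global minimisation of an L-Lipschitz function on [a, b].

    The lower bound is the saw-tooth max_i (f(x_i) - L * |x - x_i|). Each step
    evaluates f where that bound is lowest; the search stops once the best
    value found is within `tol` of the bound.
    """
    xs: List[float] = [a, b]
    fs: List[float] = [f(a), f(b)]
    for _ in range(max_iter):
        # Keep samples ordered so that neighbouring pairs form the teeth.
        pairs = sorted(zip(xs, fs))
        xs = [p[0] for p in pairs]
        fs = [p[1] for p in pairs]
        best_f = min(fs)
        lb, x_new = math.inf, a
        for x0, f0, x1, f1 in zip(xs, fs, xs[1:], fs[1:]):
            # Lowest point of the tooth: where f0 - L(x - x0) meets f1 - L(x1 - x).
            bound = 0.5 * (f0 + f1) - 0.5 * lipschitz * (x1 - x0)
            if bound < lb:
                lb = bound
                x_new = 0.5 * (x0 + x1) + (f0 - f1) / (2.0 * lipschitz)
        if best_f - lb <= tol:
            break  # every region's bound is >= best_f - tol: region "fathomed"
        xs.append(x_new)
        fs.append(f(x_new))
    i_best = min(range(len(fs)), key=fs.__getitem__)
    return xs[i_best], fs[i_best]
```

For example, `piyavskii_minimize(lambda t: (t - 0.3) ** 2, 0.0, 1.0, lipschitz=2.0)` returns a point within about 0.01 of 0.3. In several dimensions the number of sub-regions grows very quickly, which is the partition-size challenge the abstract describes.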


Relevance:

100.00%

Abstract:

A new multi-output interval type-2 fuzzy logic system (MOIT2FLS) is introduced in this paper for protein secondary structure prediction. The three outputs of the MOIT2FLS correspond to the three structure classes: helix, strand (sheet) and coil. Quantitative properties of amino acids are employed to characterize the twenty amino acids, rather than the widely used but computationally expensive binary encoding scheme. Three clustering tasks are performed using the adaptive vector quantization method to construct an equal number of initial rules for each type of secondary structure. A genetic algorithm is applied to optimally adjust the parameters of the MOIT2FLS, with a fitness function designed based on the Q3 measure. Experimental results demonstrate that the proposed approach outperforms the traditional methods, namely the Chou-Fasman method, the Garnier-Osguthorpe-Robson method, and artificial neural network models.
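The Q3 measure used in the genetic fitness function is the fraction of residues assigned to the correct one of the three states. A minimal sketch follows; the one-letter labels H (helix), E (strand) and C (coil) are an assumed convention, and the function name is hypothetical.

```python
from typing import Sequence


def q3_accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty sequences have no Q3 score")
    allowed = {"H", "E", "C"}
    correct = 0
    for p, o in zip(predicted, observed):
        if p not in allowed or o not in allowed:
            raise ValueError(f"unexpected label: {p!r} / {o!r}")
        correct += p == o  # True counts as 1
    return correct / len(observed)
```

For instance, `q3_accuracy("HHHEECCC", "HHHECCCC")` gives 0.875 (7 of 8 residues correct); Q3 is often reported as a percentage.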

Relevance:

60.00%

Abstract:

As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell’s growth and function [1]. Previous studies in proteomics indicate that the functions of different proteins could be assigned based on protein structures [2,3]. Knowledge of protein structures gives us an overview of protein fold space and helps us understand the evolutionary principles behind structure. By observing the architectures and topologies of protein families, biological processes can be investigated more directly, at much higher resolution and in finer detail. For this reason, the analysis of proteins, their structures and their interactions with other substances is emerging as an important problem in bioinformatics. However, determining protein structures experimentally is expensive and time-consuming, which at present leaves scientists largely dependent on sequence, rather than the more general structure, to infer protein function. Data mining technology has therefore been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications, which lack available data, protein structure determination and protein interaction studies can draw on a vast amount of biologically relevant information about proteins and their interactions, such as the Protein Data Bank (PDB) [4], the Structural Classification of Proteins (SCOP) database [5], the CATH database [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins, as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is well suited to defining and discovering interesting structural patterns in graph data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). Current graph pattern mining methods will be described, and typical algorithms will be presented together with their applications in protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of proteins, the publicly accessible protein data resources and the current state of research in protein analysis; Section 6.3 focuses on one of the state-of-the-art data mining methods, graph mining; Section 6.4 surveys existing work on protein structure analysis using advanced graph mining methods over the past decade; finally, Section 6.5 concludes the chapter and outlines potential further work.

Relevance:

60.00%

Abstract:

The application of nucleic acid aptamers for the diagnosis and therapy of cancer stem cells (CSCs) is expanding. The current study truncated and probed various existing aptamers against the CSC markers CD44, ABCG2 and CD133 in retinoblastoma (RB) primary cells and cell lines, a breast cancer cell line and MCF7-spheres. The truncated CD44 aptamer retained its specific binding to cancer cells, ABCG2+ve MCF7-spheres and CD133+ve RB cells. Similarly, the ABCG2 and CD133 aptamers showed higher affinity for ABCG2+ve and CD133+ve cells than for the negative population and cell lines. All aptamers appreciably reduced the viability of up to 50% of the primary RB tumor cells and up to 32% of the cell lines. Colony formation of MCF7 and RB cell lines, as well as MCF7-sphere growth, was significantly inhibited. Structure prediction and simulation of the CD133 extracellular domain 2 (ExD2) and the A15 aptamer, followed by docking to understand their potential interaction, revealed hydrogen bonds and non-bonded interactions between them. This information could be used to improve the A15 aptamer so that it forms more interactions with CD133. Thus, the approaches undertaken here can be applied universally for cell-specific targeting, and the aptamers studied against CSC markers deserve further in vivo studies.

Relevance:

40.00%

Abstract:

Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually organized in a tree (e.g., the ICD-10 tree), and codes within a tree branch may be highly correlated. The codes can be used as features to build a prediction model, and appropriate feature selection can inform clinicians about important risk factors for a disease. Traditional feature selection methods (e.g., Information Gain, T-test) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty-based feature selection methods have become popular owing to their joint feature selection property. However, Lasso is known to select one feature at random from a group of correlated features. This prevents clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with that of other feature selection methods. Using one synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso-based feature selection is significantly more stable than Lasso and comparable to other methods, e.g., Information Gain, ReliefF and T-test. We further show that, across different classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to that of Lasso and better than that of the other methods.
Our results have implications for identifying stable risk factors for many healthcare problems and can therefore potentially assist clinical decision making for accurate medical prognosis.
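Feature-selection stability is usually quantified by running the selector on many resampled versions of the data and measuring how much the selected sets overlap. A minimal sketch using the mean pairwise Jaccard index follows; the choice of Jaccard, the function names and the ICD-10-style code labels are assumptions for illustration, and the paper may use other stability measures.

```python
from itertools import combinations
from typing import Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; two empty selections count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets: Iterable[Set[str]]) -> float:
    """Mean pairwise Jaccard similarity of feature sets selected on resampled data."""
    sets: List[Set[str]] = [set(s) for s in subsets]
    if len(sets) < 2:
        raise ValueError("need at least two selected feature sets")
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A selector that picks a different code from a correlated branch on each resample, as Lasso can, drives this score down; one that keeps the whole branch together scores higher.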

Relevance:

40.00%

Abstract:

The production of carbon fiber, particularly the oxidation/stabilization step, is a complex process. In the present study, a non-linear mathematical model has been developed to predict the density of polyacrylonitrile (PAN) and oxidized PAN fiber (OPF), a key physical property for various applications such as energy and material optimization, modeling, and design of the stabilization process. The model is based on the functional groups present in PAN and OPF. The expected functional groups, including [Formula presented], [Formula presented], –CH2, [Formula presented], and [Formula presented], were identified and quantified through full deconvolution of the Fourier transform infrared attenuated total reflectance (FT-IR ATR) spectra obtained from the fibers. These functional groups form the basis of three stabilization rendering parameters, representing the cyclization, dehydrogenation and oxidation reactions that occur during PAN stabilization, which are used as the independent variables of the non-linear predictive model. The k-fold cross-validation approach, with k = 10, was employed to find the coefficients of the model. The model estimates the density of PAN and OPF independently of operational parameters and can be extended to all operational parameters. Statistical analysis revealed good agreement between the model and the experiments, with a maximum relative error of less than 1%.
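The abstract reports that 10-fold cross-validation was used to fit the model coefficients and that the maximum relative error stayed below 1%. A minimal, generic sketch of those two pieces follows. The model's functional form and how fold-wise coefficients are combined are not given in the abstract, so the fitting step itself is deliberately left out, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits of range(n); each index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)        # reproducible shuffle
    folds = [order[i::k] for i in range(k)]    # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


def max_relative_error(predicted: Sequence[float],
                       measured: Sequence[float]) -> float:
    """Largest |predicted - measured| / |measured| over all samples."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return max(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
```

Each split would be used to fit the coefficients on `train` and evaluate `max_relative_error` on `test`.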

Relevance:

30.00%

Abstract:

Routing in ad hoc networks faces significant challenges due to node mobility and dynamic network topology. In this work we propose the use of mobility prediction to reduce the search space required for route discovery. A method of mobility prediction making use of a sectorized cluster structure is described, and the Prediction based Location Aided Routing (P-LAR) protocol is proposed. Simulation and analytical results show that P-LAR offers considerable savings in the amount of routing traffic generated during the route discovery phase.
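As its name suggests, P-LAR builds on Location-Aided Routing (LAR), where a route request is forwarded only by nodes inside a "request zone". The abstract does not describe the sectorized-cluster predictor, so the sketch below shows only the baseline LAR scheme 1 zone: the smallest axis-aligned rectangle containing the source and an expected zone, a circle of radius speed × elapsed time around the destination's last known position. A sharper location prediction shrinks that circle and hence the search space. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        return self.xmin <= p.x <= self.xmax and self.ymin <= p.y <= self.ymax

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def lar1_request_zone(source: Point, dest_last: Point, speed: float,
                      elapsed: float) -> Rect:
    """Smallest axis-aligned rectangle covering the source and the expected
    zone (circle of radius speed * elapsed around dest_last)."""
    r = speed * elapsed
    return Rect(min(source.x, dest_last.x - r), min(source.y, dest_last.y - r),
                max(source.x, dest_last.x + r), max(source.y, dest_last.y + r))
```

Nodes for which `zone.contains(...)` is false drop the route request instead of rebroadcasting it, which is where the savings in routing traffic come from.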

Relevance:

30.00%

Abstract:

Routing in ad hoc networks faces significant challenges due to node mobility and dynamic network topology. In this work we propose the use of mobility prediction to reduce the search space required for route discovery. A method of mobility prediction making use of a sectorized cluster structure is described, and the Prediction based Location Aided Routing (P-LAR) protocol is proposed. Simulation and analytical results show that P-LAR offers considerable savings in the amount of routing traffic generated during the route discovery phase.

Relevance:

30.00%

Abstract:

The nanoporous structure of a membrane varies in 3-dimensional (3-D) space and strongly influences the filtration or desalination achieved, the fouling potential and, therefore, the quality of the water produced. Knowledge of the 3-D nanoporous structure is thus vital to understanding and predicting membrane performance. A novel method incorporating transmission electron microtomography, image processing and 3-D reconstruction is introduced to characterize membranes with nanostructures. The reconstruction algorithm allows the 3-D nanoporous structure to be visualized non-destructively from any direction. This novel technique leads to an in-depth understanding and accurate prediction of filtration performance.

Relevance:

30.00%

Abstract:

The nanoporous structure of a membrane varies in 3-dimensional (3-D) space and strongly influences the filtration or desalination achieved, the fouling potential and, therefore, the quality of the water produced. Knowledge of the 3-D nanoporous structure is thus vital to understanding and predicting membrane performance. A novel method incorporating transmission electron microtomography, image processing and 3-D reconstruction is introduced to characterize membranes with nanostructures. The reconstruction algorithm allows the 3-D nanoporous structure to be visualized non-destructively from any direction. This novel technique leads to an in-depth understanding and accurate prediction of filtration performance.

Relevance:

30.00%

Abstract:

The delta technique has been proposed in the literature for constructing prediction intervals for targets estimated by neural networks. The quality of prediction intervals constructed with this technique depends highly on the characteristics of the neural network. Unfortunately, the literature lacks information on how these dependences can be managed in order to optimize prediction intervals. This study attempts to optimize the length and coverage probability of prediction intervals by modifying the structure and parameters of the underlying neural networks. In an evolutionary optimization, a genetic algorithm is applied to find the optimal values of the network size and training hyper-parameters. The applicability and efficiency of the proposed optimization technique are examined and demonstrated using a real case study. It is shown that the proposed optimization technique significantly improves the quality of the constructed prediction intervals in terms of length and coverage probability.
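The delta technique linearizes a model's output around its trained parameters. In the standard nonlinear-regression form assumed here, the interval at a new input is ŷ0 ± t·s·sqrt(1 + g0ᵀ(JᵀJ)⁻¹g0), where J is the Jacobian of the training outputs with respect to the parameters, g0 is the output gradient at the new input, and s² is the residual variance. A minimal NumPy sketch follows, assuming those derivatives have already been obtained from the trained network; the function name and the default critical value are assumptions.

```python
from typing import Tuple

import numpy as np


def delta_prediction_interval(jacobian: np.ndarray, residuals: np.ndarray,
                              grad_new: np.ndarray, y_hat_new: float,
                              t_crit: float = 1.96) -> Tuple[float, float]:
    """Delta-method prediction interval for a model with p parameters.

    jacobian:  (n, p) derivatives of fitted outputs w.r.t. parameters at the
               n training inputs.
    residuals: (n,) training residuals y - y_hat.
    grad_new:  (p,) derivatives of the output at the new input.
    t_crit:    critical value; 1.96 is the large-sample 95% normal value
               (assumption; a Student-t value with n - p dof is also common).
    """
    n, p = jacobian.shape
    if n <= p:
        raise ValueError("need more training samples than parameters")
    s2 = float(residuals @ residuals) / (n - p)   # residual variance estimate
    # Solve (J^T J) v = g0 rather than forming an explicit inverse.
    v = np.linalg.solve(jacobian.T @ jacobian, grad_new)
    half_width = t_crit * np.sqrt(s2 * (1.0 + float(grad_new @ v)))
    return float(y_hat_new - half_width), float(y_hat_new + half_width)
```

For a linear model J is simply the design matrix, so the function reproduces the textbook least-squares prediction interval, a convenient sanity check. For neural networks JᵀJ can be ill-conditioned (`np.linalg.solve` raises `LinAlgError` when it is singular), which is one reason interval quality can depend on network size and training settings.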

Relevance:

30.00%

Abstract:

Background: The past few years have seen rapid development of novel high-throughput technologies that have created large-scale data on protein-protein interactions (PPI) for humans and most model species. These data are commonly represented as networks, with nodes representing proteins and edges representing the PPIs. A fundamental challenge for bioinformatics is how to interpret this wealth of data to elucidate interaction patterns and the biological characteristics of the proteins. One significant purpose of this interpretation is to predict unknown protein functions. Although many approaches have been proposed in recent years, the challenge remains of how to measure the functional similarities between proteins reasonably and precisely in order to improve prediction effectiveness.

Results: We used a Semantic and Layered Protein Function Prediction (SLPFP) framework to predict unknown protein functions more effectively at different functional levels. The framework relies on a new protein similarity measurement and a clustering-based protein function prediction algorithm. The new similarity measurement incorporates the topological structure of the PPI network as well as the proteins' semantic information, in terms of known protein functions at different functional layers. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed framework in predicting unknown protein functions.

Conclusion: The proposed framework achieves higher prediction accuracy than other similar approaches. The prediction results are stable even for a large number of proteins. Furthermore, the framework is able to predict unknown functions at different functional layers within the hierarchical functional scheme of the Munich Information Center for Protein Sequences (MIPS). The experimental results demonstrate that the new protein similarity measurement reflects relationships between proteins more reasonably and precisely.
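The abstract does not give the SLPFP similarity formula. Purely as a hypothetical illustration of combining the two ingredients it names, network topology and known functions at a chosen layer of a hierarchical scheme, the sketch below mixes a neighbourhood-overlap score with a function-overlap score. The weighting `alpha`, the Jaccard scores, the function names and the dotted, FunCat-style function codes are all assumptions, not the published measure.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets; 0.0 when both are empty (no evidence)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def at_layer(codes: Set[str], layer: int) -> Set[str]:
    """Truncate dotted codes such as '01.03.16' to their first `layer` levels."""
    return {".".join(c.split(".")[:layer]) for c in codes}


def combined_similarity(u: str, v: str, neighbours: Dict[str, Set[str]],
                        functions: Dict[str, Set[str]], layer: int,
                        alpha: float = 0.5) -> float:
    """Hypothetical mix of topological and semantic similarity in [0, 1]."""
    # Topology: overlap of closed neighbourhoods N(u) + {u} and N(v) + {v}.
    topo = jaccard(neighbours[u] | {u}, neighbours[v] | {v})
    # Semantics: overlap of known functions, compared at the chosen layer.
    sem = jaccard(at_layer(functions.get(u, set()), layer),
                  at_layer(functions.get(v, set()), layer))
    return alpha * topo + (1.0 - alpha) * sem
```

Comparing codes at a coarser layer makes two proteins look more alike than at a finer one, which is the kind of layer-dependent behaviour the framework is described as exploiting.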

Relevance:

30.00%

Abstract:

The accurate prediction of travel times is desirable but frequently prone to error. This is mainly attributable to both the underlying traffic processes and the data used to infer travel time. A more meaningful and pragmatic approach is to view travel time prediction as probabilistic inference and to construct prediction intervals (PIs), which cover the range of probable travel times that travelers may encounter. This paper introduces the delta and Bayesian techniques for constructing PIs. Quantitative measures are developed and applied for a comprehensive assessment of the constructed PIs. These measures simultaneously address two important aspects of PIs: 1) coverage probability and 2) length. The Bayesian and delta methods are used to construct PIs for the neural network (NN) point forecasts of bus and freeway travel time data sets. The results indicate that the delta technique outperforms the Bayesian technique in terms of the narrowness of the PIs while maintaining satisfactory coverage probability. In contrast, PIs constructed using the Bayesian technique are more robust to the NN structure and exhibit excellent coverage probability.
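The abstract does not spell out its two quality measures. One common formulation, assumed here, is the PI coverage probability (the fraction of observed travel times falling inside their intervals) and the mean interval width normalized by the range of the observed targets. A minimal sketch with a hypothetical function name:

```python
from typing import Sequence, Tuple


def pi_quality(lower: Sequence[float], upper: Sequence[float],
               targets: Sequence[float]) -> Tuple[float, float]:
    """Return (coverage probability, normalized mean width) of a set of PIs."""
    n = len(targets)
    if n == 0 or not (len(lower) == len(upper) == n):
        raise ValueError("lower, upper and targets must be non-empty and equal length")
    # Coverage: fraction of targets lying inside their own interval.
    covered = sum(lo <= t <= hi for lo, hi, t in zip(lower, upper, targets))
    target_range = max(targets) - min(targets)
    if target_range == 0:
        raise ValueError("targets have zero range; width cannot be normalized")
    # Length: mean interval width, scaled by the spread of the targets.
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width / target_range
```

Narrower intervals lower the second number but risk lowering the first, so the two values must be read together when comparing the delta and Bayesian techniques.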

Relevance:

30.00%

Abstract:

The average number of fiber-to-fiber contacts in a fibrous structure is a prerequisite for investigating the mechanical, optical and transport properties of stochastic nano-microfibrous networks. In this research work, building on a theoretical analysis for estimating the number of contacts between fibers in electrospun random multilayer nanofibrous assemblies, experimental verification is provided for the theoretical dependence of the number of fiber-to-fiber contacts on fiber diameter and network porosity. The formulated analytical model is compared with existing theories for predicting the average number of fiber contacts in nanofiber structures. The effect of fiber diameter and network porosity on the average number of fiber contacts in nano-microfiber mats has been investigated, and the experimental and theoretical numbers of inter-fiber contacts in multilayer electrospun random nano-microfibrous networks are compared. Both the fiber diameter and the network porosity were found to have significant effects on the properties of fiber-to-fiber contacts.