940 results for identification method
Abstract:
Motivation: In molecular biology, molecular events describe observable alterations of biomolecules, such as the binding of proteins or the production of RNA. These events may be responsible for drug reactions or for the development of certain diseases. Biomedical event extraction, the process of automatically detecting descriptions of molecular interactions in research articles, has therefore attracted substantial research interest. Event trigger identification, i.e. detecting the words that describe the event types, is a crucial prerequisite step in the biomedical event extraction pipeline. Taking the event types as classes, event trigger identification can be viewed as a classification task: for each word in a sentence, a trained classifier uses context features to predict whether the word corresponds to an event type and, if so, to which one. A well-designed feature set with good discrimination and generalization is therefore crucial for the performance of event trigger identification. Results: In this article, we propose a novel framework for event trigger identification. In particular, we learn biomedical domain knowledge from a large text corpus built from Medline and embed it into word features using neural language modeling. The embedded features are then combined with the syntactic and semantic context features using multiple kernel learning. The combined feature set is used to train the event trigger classifier. Experimental results on the gold-standard corpus show that the proposed framework achieves an improvement of more than 2.5% in F-score over the state-of-the-art approach, demonstrating its effectiveness. © The Author 2014. The source code for the proposed framework is freely available and can be downloaded at http://cse.seu.edu.cn/people/zhoudeyu/ETI_Sourcecode.zip.
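To make the feature-combination step concrete, here is a minimal Python sketch of the kernel form that multiple kernel learning works with: a non-negative weighted sum of one kernel per feature view. It assumes an RBF kernel on the word embeddings and a linear kernel on binary syntactic/semantic context features; the weights are fixed by hand here, whereas an MKL solver would learn them. The feature values are invented toy data and the helper names are hypothetical, not taken from the authors' released code.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, used here on dense word-embedding vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, y):
    """Linear kernel, used here on binary context-feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def combined_kernel(word_a, word_b, weights=(0.5, 0.5)):
    """Non-negative weighted sum of per-view kernels (the form MKL optimises).

    The weights are fixed by hand in this sketch; an MKL solver would learn them.
    """
    w_emb, w_ctx = weights
    assert min(weights) >= 0 and abs(w_emb + w_ctx - 1.0) < 1e-9
    return (w_emb * rbf_kernel(word_a["embedding"], word_b["embedding"])
            + w_ctx * linear_kernel(word_a["context"], word_b["context"]))

def gram_matrix(words, weights=(0.5, 0.5)):
    """Kernel matrix over all candidate trigger words."""
    return [[combined_kernel(a, b, weights) for b in words] for a in words]

# Hypothetical candidates: 3-dim embeddings and 4 binary context features each.
words = [
    {"embedding": [0.1, 0.8, -0.3], "context": [1, 0, 1, 0]},
    {"embedding": [0.2, 0.7, -0.2], "context": [1, 0, 0, 0]},
    {"embedding": [-0.9, 0.1, 0.5], "context": [0, 1, 0, 1]},
]
K = gram_matrix(words)
```

Because each base kernel is positive semidefinite and the weights are non-negative, `K` is again a valid kernel matrix and could be handed to any kernel classifier that accepts a precomputed Gram matrix.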
Abstract:
DNA-binding proteins are crucial for various cellular processes and have therefore become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the post-genomic age, an automated method for rapidly and accurately identifying DNA-binding proteins from sequence information alone is highly desirable. Because all biological species evolved from a very limited number of ancestral species, it is important to take evolutionary information into account when developing such a high-throughput tool. In view of this, a new predictor was proposed that incorporates evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. Comparisons with existing methods using both the jackknife test and an independent data-set test showed that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that this novel approach of incorporating evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
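The top-n-gram idea can be illustrated with a short Python sketch, assuming a position-specific frequency profile (for example one derived from PSI-BLAST) is already available as one residue-to-frequency mapping per sequence position. For each position the n most frequent residues are concatenated, in order, into a top-n-gram, and the normalised counts of all possible top-n-grams give a fixed-length feature vector. The profile values below are made up, and how the resulting vector is merged into the general form of pseudo amino acid composition is not shown.

```python
from collections import Counter
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def top_n_grams(profile, n=1):
    """For every profile column, join the n most frequent residues (highest first)."""
    grams = []
    for column in profile:
        # column maps residue -> frequency; ties are broken alphabetically
        ranked = sorted(column, key=lambda aa: (-column[aa], aa))
        grams.append("".join(ranked[:n]))
    return grams

def top_n_gram_composition(profile, n=1):
    """Normalised occurrence of every possible top-n-gram (20!/(20-n)! entries)."""
    counts = Counter(top_n_grams(profile, n))
    total = len(profile)
    vocab = ["".join(p) for p in permutations(AMINO_ACIDS, n)]
    return [counts[g] / total for g in vocab]

# Hypothetical three-position profile (only the top residues are listed).
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"K": 0.5, "R": 0.4, "A": 0.1},
    {"A": 0.7, "S": 0.2, "G": 0.1},
]
vec1 = top_n_gram_composition(profile, 1)  # 20 features
vec2 = top_n_gram_composition(profile, 2)  # 380 features
```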
Abstract:
DNA-binding proteins are crucial for various cellular processes, such as the recognition of specific nucleotides, the regulation of transcription, and the regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. So far, many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot, which is freely accessible to the public. © 2014 Ruifeng Xu et al.
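ACC and MCC, the two metrics quoted above, follow directly from a binary confusion matrix. A short Python sketch; the confusion-matrix counts are invented for illustration and are not results from the study.

```python
import math

def acc_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from a binary confusion matrix."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

# Hypothetical counts: 100 binding and 100 non-binding proteins.
acc, mcc = acc_and_mcc(tp=80, tn=70, fp=30, fn=20)
```

MCC ranges from -1 to 1 and stays informative when negatives greatly outnumber positives, which matters once the benchmark is expanded with extra negative samples.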
Abstract:
Fermentation processes, as objects of modelling and high-quality control, are characterized by interdependent and time-varying process variables, which lead to nonlinear models with a very complex structure. For this reason, conventional optimization methods cannot provide a satisfactory solution. As an alternative, genetic algorithms, a stochastic global optimization method, can be applied to overcome these limitations. Genetic algorithms are robust and capable of approaching a global minimum, which makes them well suited to the parameter identification of fermentation models. Different types of genetic algorithms, namely simple, modified and multi-population ones, have been applied and compared for the estimation of nonlinear dynamic model parameters of a fed-batch cultivation of S. cerevisiae.
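A minimal Python sketch of a simple genetic algorithm used for parameter identification. Everything specific here is hypothetical: a two-parameter logistic growth curve stands in for the fed-batch S. cerevisiae model, the data are noise-free synthetic values, and the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism) and settings are common textbook choices rather than those used in the study. Because the best individual is always carried over, the best objective value never increases from one generation to the next.

```python
import math
import random

def logistic_biomass(t, mu_max, x_max, x0=0.5):
    """Hypothetical two-parameter growth model standing in for a fermentation model."""
    return x_max / (1.0 + (x_max / x0 - 1.0) * math.exp(-mu_max * t))

def sse(params, times, observed):
    """Objective: sum of squared errors between model output and measurements."""
    mu_max, x_max = params
    return sum((logistic_biomass(t, mu_max, x_max) - y) ** 2
               for t, y in zip(times, observed))

def simple_ga(times, observed, bounds, pop_size=40, generations=150,
              mutation_rate=0.2, seed=1):
    """Real-coded simple GA with elitism; returns the best individual and the
    best objective value recorded in each generation."""
    rng = random.Random(seed)

    def fitness(ind):
        return sse(ind, times, observed)

    def clip(ind):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(ind, bounds)]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        next_gen = [population[0]]  # elitism: carry the best individual over unchanged
        while len(next_gen) < pop_size:
            # tournament selection: the lowest-SSE of three random individuals
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]  # arithmetic crossover
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))
                lo, hi = bounds[i]
                child[i] += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            next_gen.append(clip(child))
        population = next_gen
    population.sort(key=fitness)
    return population[0], history

# Synthetic "measurements" from known parameters (mu_max = 0.4, x_max = 8.0).
times = [float(t) for t in range(11)]
observed = [logistic_biomass(t, 0.4, 8.0) for t in times]
BOUNDS = [(0.01, 1.0), (1.0, 20.0)]
best, history = simple_ga(times, observed, BOUNDS)
```

A modified GA would change these operators, and a multi-population GA would evolve several such populations with periodic migration between them.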
Abstract:
ACM Computing Classification System (1998): I.2.8, I.2.10, I.5.1, J.2.
Abstract:
This work is an initial study of a numerical method for identifying multiple leak zones in saturated unsteady flow. Using the conventional saturated groundwater flow equation, the leak identification problem is modelled as a Cauchy problem for the heat equation, and the aim is to find the regions on the boundary of the solution domain where the solution vanishes, since leak zones correspond to null pressure values. This problem is ill-posed; to reconstruct the solution in a stable way, we therefore modify and employ an iterative regularizing method proposed in [1] and [2]. In this method, mixed well-posed problems, obtained by changing the boundary conditions, are solved for the heat operator as well as for its adjoint to obtain a sequence of approximations to the original Cauchy problem. The mixed problems are solved using the finite element method (FEM), and the numerical results indicate that the leak zones can be identified with the proposed method.
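The Cauchy problem can be written out as follows. This is a sketch under stated assumptions: the flow equation is scaled so that all coefficients equal one, the boundary of the domain Ω is split into an accessible part Γ_0 carrying both Dirichlet and Neumann data and an inaccessible part Γ_1 where leaks may occur, the initial state u_0 is assumed known, and the symbols f, g and u_0 are introduced here for illustration only.

```latex
\begin{aligned}
\partial_t u - \Delta u &= 0 && \text{in } \Omega \times (0, T),\\
u &= f && \text{on } \Gamma_0 \times (0, T),\\
\partial_\nu u &= g && \text{on } \Gamma_0 \times (0, T),\\
u(\cdot, 0) &= u_0 && \text{in } \Omega,
\end{aligned}
\qquad
\text{leak zones at time } t = \{\, x \in \Gamma_1 : u(x, t) = 0 \,\}.
```

No condition is prescribed on Γ_1 while Γ_0 is over-specified, which is why the problem is ill-posed; the iterative method instead solves a sequence of well-posed mixed problems, for the heat operator and its adjoint, obtained by changing the boundary conditions.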
Abstract:
In this paper, the identification of the time-dependent blood perfusion coefficient is formulated as an inverse problem. The bio-heat conduction problem is first transformed into the classical heat conduction problem. The transformed inverse problem is then solved using the method of fundamental solutions together with Tikhonov regularization. Numerical results are presented to demonstrate the accuracy and stability of the proposed meshless numerical algorithm.
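The transformation step can be illustrated with a single substitution. This assumes, for illustration only, a Pennes-type bio-heat equation that after shifting by the arterial temperature and nondimensionalising takes the form u_t = Δu − P(t)u, with P(t) the unknown perfusion coefficient; the paper's exact model and scaling are not given in the abstract.

```latex
\begin{aligned}
& u_t = \Delta u - P(t)\,u, \qquad r(t) = \exp\Big(\int_0^t P(s)\,ds\Big), \qquad v = r(t)\,u,\\
& v_t = r'(t)\,u + r(t)\,u_t = P(t)\,r(t)\,u + r(t)\big(\Delta u - P(t)\,u\big) = r(t)\,\Delta u = \Delta v,\\
& P(t) = \frac{r'(t)}{r(t)} = \frac{d}{dt}\ln r(t).
\end{aligned}
```

So v solves the classical heat equation with the same initial data (since r(0) = 1); once r(t) has been recovered, for instance from an additional measurement that this sketch does not specify, the perfusion coefficient follows from the last line.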
Abstract:
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of the target residue for its prediction. In the present study, we used both the spatial context and the sequence context of each target residue to construct the feature space. Latent Semantic Analysis (LSA) was then applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between binding sites next to each other are important for protein-DNA recognition and binding ability. A comparison between PDNAsite and existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community.
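The difference between the two kinds of context can be sketched in Python, assuming each residue is represented by a single 3-D coordinate (for example its C-alpha atom). The sequence context is a fixed window of neighbouring residues along the chain, while the spatial context collects residues within a distance cutoff in the folded structure, which may be far apart in sequence. The window size, the 8 Å radius, the 'X' padding and the toy coordinates are illustrative assumptions, not PDNAsite's settings, and the LSA and SVM-ensemble stages are not shown.

```python
import math

def sequence_context(sequence, target, window=3):
    """Residues within +/- window positions of the target along the chain,
    padded with 'X' beyond the chain ends (window size is an assumption)."""
    padded = "X" * window + sequence + "X" * window
    centre = target + window
    return padded[centre - window:centre + window + 1]

def spatial_context(coords, target, radius=8.0):
    """Indices of residues whose representative atom lies within `radius`
    angstroms of the target's (cutoff is an assumption); needs Python 3.8+."""
    origin = coords[target]
    return [i for i, xyz in enumerate(coords)
            if i != target and math.dist(origin, xyz) <= radius]

# Made-up coordinates (angstroms) for a five-residue toy fragment.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (0.0, 5.0, 0.0), (20.0, 0.0, 0.0)]
```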
Abstract:
Historically, grapevine (Vitis vinifera L.) leaf characterisation has been a driving force in the identification of cultivars. In this study, ampelometric (foliometric) analysis was performed on leaf samples collected from hand-pruned, mechanically pruned and minimally pruned ‘Sauvignon blanc’ and ‘Syrah’ vines to estimate the impact of within-vineyard variability and of a change in bud load on the stability of leaf properties. The results showed that the within-vineyard variability of ampelometric characteristics was high within a cultivar, irrespective of bud load. In terms of the O.I.V. coding system, differences of zero to four classes were observed between the minimum and maximum values of each characteristic. The degree of variability of each characteristic differed between the three levels of bud load and between the two cultivars. With respect to bud load, the number of shoots per vine had a significant effect on the characteristics of the leaf laminae. Single leaf area and vein lengths changed significantly for both cultivars, irrespective of treatment, while the angle between veins proved to be a stable characteristic. A large number of biometric data can be recorded on a single leaf; however, the data measured on several leaves are not necessarily unique to a specific cultivar. The leaf characteristics analysed in this study can be divided into two groups according to their response to a change in bud load: stable (angles between the veins, depths of sinuses) and variable (length of the veins, length of the petiole, single leaf area). The variable characteristics are not recommended for cultivar identification unless the pruning method or bud load is known.
Abstract:
The purpose of this research study was to examine specific factors believed to be related to academic achievement in deaf children. More specifically, this research sought to determine whether there was a significant difference in achievement between students whose parents use oral communication only and those whose parents use some type of sign language. An additional purpose was to determine whether there was a significant difference in academic achievement for deaf students who used amplification devices early in life. The study also sought to determine whether an early intervention program that emphasizes and enables parents to develop a language-rich environment had a significant impact on the academic achievement of deaf children, and whether the age at which initial services are received influences deaf students' subsequent academic achievement. The study examined the relationship, if any, between intellectual ability and academic achievement among deaf children. Finally, it investigated the relationship between the degree of hearing loss and academic achievement.
Purposive sampling was used to select subjects for this study. All 228 eligible Deaf/Hard of Hearing (DHH) students enrolled in a Broward County Public School were included in the original sample; sixty-one students actually participated. A correlational method of statistical analysis as well as cross-classification (crosstabs) was used to analyze the data.
The results show that academic achievement in reading and mathematics was significantly related to the parental mode of communication and to the mode of communication used in school. Reading achievement was also significantly related to intellectual ability and to the degree of hearing loss. Written language was not significantly related to any of the factors investigated in this study.
Additional research should be conducted to further investigate the low academic achievement among deaf children. The diversity among signing systems at school, and between home and school, should also be analyzed. Finally, future studies should examine curriculum and instruction methods to increase the academic achievement of deaf children.
Abstract:
This dissertation develops a process improvement method for service operations based on the Theory of Constraints (TOC), a management philosophy that has been shown to be effective in manufacturing for decreasing WIP and improving throughput. While TOC has enjoyed much attention and success in manufacturing, its application to services in general has been limited. The contribution to industry and knowledge is a method for improving global performance measures based on TOC principles. The proposed method is tested using discrete event simulation of the service factory of airline turnaround operations. To evaluate the method, a simulation model of the aircraft turn operations of a U.S.-based carrier was built and validated using actual data from airline operations. The model was then adjusted to reflect an application of the Theory of Constraints to determine how to deploy the scarce resource of ramp workers. The results indicate that, given slight modifications to TOC terminology and the development of a method for constraint identification, the Theory of Constraints can be applied successfully to services. Bottlenecks in services must be defined as those processes whose process rate and amount of remaining work are such that the process cannot be completed without an increase in the process rate. The bottleneck ratio is used to determine to what degree a process is a constraint. Simulation results also suggest that redefining performance measures to reflect a global business perspective of reducing the costs related to specific flights, rather than the locally optimal operational approach of turning all aircraft quickly, results in significant savings to the company.
Simulated savings in the airline's annual operating costs equaled 30% of the possible current expenses for misconnecting passengers, achieved with a modest increase in worker utilization through a more efficient heuristic that deploys workers to the highest-priority tasks. This dissertation contributes to the literature on service operations by describing a dynamic, adaptive dispatch approach for managing service factory operations similar to airline turnaround operations, using the management philosophy of the Theory of Constraints.
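The bottleneck definition in this abstract can be expressed as a ratio of time needed to time available. The Python sketch below assumes that form, bottleneck ratio = (work remaining / process rate) / time remaining, so that a ratio above 1 flags a process that cannot finish without a faster rate, and dispatches the next free worker greedily to the highest ratio. The dissertation's exact formula and its cost-based priorities for specific flights are not reproduced, and the task names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TurnProcess:
    name: str
    work_remaining: float  # units of work still to do, e.g. bags to load
    process_rate: float    # units completed per minute by the current crew
    time_remaining: float  # minutes until the scheduled departure

    def bottleneck_ratio(self) -> float:
        """Assumed form: time needed at the current rate / time still available."""
        if self.process_rate <= 0 or self.time_remaining <= 0:
            return float("inf")
        return (self.work_remaining / self.process_rate) / self.time_remaining

def constraints(processes):
    """Names of processes that cannot finish without a higher rate (ratio > 1)."""
    return [p.name for p in processes if p.bottleneck_ratio() > 1.0]

def next_assignment(processes):
    """Greedy dispatch: the next free ramp worker goes to the highest-ratio process."""
    return max(processes, key=lambda p: p.bottleneck_ratio())

procs = [
    TurnProcess("bags flight A", 120, 6.0, 25.0),  # needs 20 of 25 min -> 0.8
    TurnProcess("bags flight B", 90, 3.0, 20.0),   # needs 30 of 20 min -> 1.5
    TurnProcess("fuel flight C", 1, 0.1, 15.0),    # needs 10 of 15 min -> ~0.67
]
```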
Abstract:
This study investigated the feasibility of using qualitative methods to provide empirical documentation of long-term qualitative change in the life course trajectories of “at risk” youth in a school-based positive youth development program (the Changing Lives Program, CLP). This work draws on life course theory for a developmental framework and on recent advances in the use of qualitative methods in general and a grounded theory approach in particular. Grounded theory provided a methodological framework for conceptualizing the use of qualitative methods for assessing qualitative life change. The study investigated the feasibility of using the Possible Selves Questionnaire-Qualitative Extension (PSQ-QE) for evaluating the impact of the program on qualitative change in participants' life trajectory relative to a non-intervention control group. Integrated Qualitative/Quantitative Data Analytic Strategies (IQ-DAS), which we have been developing as part of our program of research, provided the data analytic framework for the study.
Change was evaluated in 85 at-risk high school students in CLP high school counseling groups over three assessment periods (pre, post and follow-up), and in a non-intervention control group of 23 students over two assessment periods (pre and post). Intervention gains and maintenance, and the extent to which these patterns of change were moderated by gender and ethnicity, were evaluated using a mixed-design Repeated Measures Multivariate Analysis of Variance (RMANOVA) in which Time (pre, post) was the within-subjects (repeated) factor and Condition, Gender and Ethnicity were the between-group factors. The trends in the direction of qualitative change were positive from pre to post and were maintained at the year-end follow-up. More importantly, the three-way Time x Gender x Ethnicity interaction was significant, Roy's Θ = .205, F(2, 37) = 3.80, p < .032, indicating that the overall pattern of positive change was significantly moderated by gender and ethnicity.
Thus, the findings also provided preliminary evidence of a positive impact of the youth development program on long-term change in life course trajectory, and were suggestive with respect to amenability to treatment, i.e., the identification of subgroups of individuals in a target population who are likely to be the most amenable or responsive to a treatment.
Abstract:
Background: During alternative splicing, the inclusion of an exon in the final mRNA molecule is determined by nuclear proteins that bind cis-regulatory sequences in a target pre-mRNA molecule. A recent study suggested that the regulatory codes of individual RNA-binding proteins may be nearly immutable between very diverse species such as mammals and insects. The model system Drosophila melanogaster therefore presents an excellent opportunity for the study of alternative splicing, owing to the availability of high-quality EST annotations in FlyBase. Methods: In this paper, we describe an in silico analysis pipeline to extract putative exonic splicing regulatory sequences (ESRs) from a multiple alignment of 15 insect species. Our method, ESTs-to-ESRs (E2E), uses graph analysis of EST splicing graphs to identify mutually exclusive (ME) exons, and combines phylogenetic measures, a sliding window approach along the multiple alignment and Welch’s t statistic to extract conserved ESR motifs. Results: The most frequent 100% conserved 5-bp word in different insect exons was “ATGGA”. We identified 799 statistically significant “spike” hexamers, 218 motifs with either a left or right FDR-corrected spike magnitude p-value < 0.05, and 83 with both left and right uncorrected p < 0.01. Eleven genes were identified with highly significant motifs in one ME exon but not in the other, suggesting regulation of ME exon splicing through these highly conserved hexamers. Most of these genes have been shown to have regulated spatiotemporal expression. Ten elements were found to match three mammalian splicing regulator databases. A putative ESR motif, GATGCAG, was identified in ME-13b but not in ME-13a of Drosophila N-Cadherin, a gene that a recent study showed to have a distinct spatiotemporal expression pattern of spliced isoforms.
Conclusions: Analysis of phylogenetic relationships and variability of sequence conservation as implemented in the E2E spikes method may lead to improved identification of ESRs. We found that approximately half of the putative ESRs in common between insects and mammals have a high statistical support (p < 0.01). Several Drosophila genes with spatiotemporal expression patterns were identified to contain putative ESRs located in one exon of the ME exon pairs but not in the other.
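Welch's t statistic, which E2E uses when extracting conserved motifs, is easy to compute directly. The Python sketch below pairs it with a sliding window, assuming a per-column conservation score along the alignment and comparing each hexamer-sized window with equally sized left and right flanks. The score values, window and flank sizes are hypothetical, and this is not the E2E implementation or its exact spike definition.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def conservation_spikes(scores, window=6, flank=6):
    """Slide a window along per-column conservation scores and return, for each
    start position, Welch's t against the left flank and against the right flank."""
    results = []
    for start in range(flank, len(scores) - window - flank + 1):
        core = scores[start:start + window]
        left = scores[start - flank:start]
        right = scores[start + window:start + window + flank]
        results.append((start, welch_t(core, left)[0], welch_t(core, right)[0]))
    return results

# Hypothetical scores: a conserved hexamer-sized block between weakly conserved columns.
scores = [0.2, 0.3, 0.25, 0.2, 0.3, 0.25,
          0.9, 1.0, 0.95, 1.0, 0.9, 0.95,
          0.3, 0.2, 0.25, 0.3, 0.2, 0.25]
spikes = conservation_spikes(scores)
```

Welch's form is used rather than Student's pooled-variance t because the window and its flanks need not share the same variance.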
Abstract:
Microarray platforms have been in use for many years, and although new technologies are on the rise in laboratories, microarrays are still prevalent. Many methods have been proposed, and later modified, for analyzing microarray data to identify differentially expressed (DE) genes. However, the most popular methods, such as Significance Analysis of Microarrays (SAM), samroc, fold change and rank product, are far from perfect. Which method is most powerful depends on the characteristics of the sample and on the distribution of the gene expression values. The most commonly used methods are SAM and samroc, but their power decreases when the data are skewed. Because the median is a better measure of central tendency than the mean for skewed data, the test statistics of the SAM and fold change methods are modified accordingly in this thesis. This study shows that, when the data follow a lognormal distribution, the median-modified fold change method improves the power to identify DE genes in many cases.
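The intuition behind the median modification can be seen in a small Python sketch: a single outlying intensity drags the mean, and with it the fold change and the SAM statistic, while the median is barely affected. The SAM form used here is the standard relative difference d = (x̄_T − x̄_C)/(s + s0); exactly how the thesis modifies each statistic is not stated in the abstract, so swapping only the numerator's centre to the median is an assumption, as are the intensity values.

```python
import math
from statistics import mean, median

def fold_change(control, treated, centre=mean):
    """log2 fold change between group centres; centre=median gives the median-modified form."""
    return math.log2(centre(treated) / centre(control))

def sam_d(control, treated, s0=0.0, centre=mean):
    """SAM-style relative difference d = (centre_T - centre_C) / (s + s0).

    s is the usual pooled standard error around the group means; only the
    numerator switches to medians in the modified variant (an assumption).
    """
    n1, n2 = len(control), len(treated)
    m1, m2 = mean(control), mean(treated)
    pooled = (sum((x - m1) ** 2 for x in control)
              + sum((x - m2) ** 2 for x in treated))
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * pooled)
    return (centre(treated) - centre(control)) / (s + s0)

# Hypothetical intensities for one gene; the control group has one outlier.
control = [100, 110, 105, 95, 900]
treated = [200, 210, 190, 205, 220]
```

With these values the mean-based fold change points to down-regulation, while the median-based one reports roughly a doubling.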
Abstract:
The photoproduction of neutral kaons off a deuteron target has been investigated at the Tohoku University Laboratory of Nuclear Science. The particle identification (PID) methods investigated combine measurements of momentum, velocity (β = v/c) and energy deposition per unit length (dE/dx). The analysis demonstrates that energy deposition and time of flight are exceedingly useful. A higher signal-to-background ratio was achieved when hard cuts were applied in combination. A probabilistic likelihood estimation (LE) approach to PID was also explored. The probability of a particle being correctly identified by this LE method, together with the preliminary results, indicates the need for highly precise constraints on the distributions from which the parameters are extracted. These PID methods were confirmed to be applicable for properly identifying pions in the analysis of this experiment. However, the background evident in the mass spectra points to the need for better proton identification.
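The way momentum and velocity combine for PID can be sketched in Python: β is obtained from the time of flight over a known path, and the mass follows from the relativistic relation m = p·sqrt(1/β² − 1) (in units with c = 1). The pion mass window used as a hard cut and the example momentum are hypothetical; the dE/dx cut and the likelihood approach are not shown.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in metres per nanosecond

def beta_from_tof(path_length_m, tof_ns):
    """beta = v/c from the flight-path length and the measured time of flight."""
    return path_length_m / (C_M_PER_NS * tof_ns)

def mass_from_p_beta(p_gev, beta):
    """Mass in GeV/c^2 from momentum (GeV/c) and beta: m = p * sqrt(1/beta^2 - 1)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return p_gev * math.sqrt(1.0 / beta ** 2 - 1.0)

def passes_mass_cut(mass_gev, window=(0.08, 0.20)):
    """Hard cut on reconstructed mass; this pion window is a hypothetical choice."""
    low, high = window
    return low <= mass_gev <= high

PION_MASS = 0.13957    # GeV/c^2
PROTON_MASS = 0.938272  # GeV/c^2
p = 0.5  # GeV/c
beta_pion = p / math.sqrt(p ** 2 + PION_MASS ** 2)      # beta a pion of this momentum has
beta_proton = p / math.sqrt(p ** 2 + PROTON_MASS ** 2)  # beta a proton of this momentum has
m_pion = mass_from_p_beta(p, beta_pion)
m_proton = mass_from_p_beta(p, beta_proton)
```

At equal momentum the proton is much slower, so its reconstructed mass falls far outside the pion window; in real data, resolution in β and p smears these peaks, which is where background in the mass spectra comes from.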