819 results for Classification error rate
Abstract:
Background The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and a root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
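The two evaluation measures quoted above, the Pearson correlation coefficient (CC) and the root mean square error (RMSE), can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the sample values are assumptions for demonstration and are not taken from the paper.

```python
import math

def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))

# Hypothetical values, for illustration only.
predicted_rwco = [1.0, 2.0, 3.0]
observed_rwco = [1.0, 2.0, 5.0]
cc_value = pearson_cc(predicted_rwco, observed_rwco)
rmse_value = rmse(predicted_rwco, observed_rwco)
```

A higher CC and a lower RMSE both indicate better agreement, which is how the reported improvement from 0.55/0.82 to 0.60/0.78 should be read.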
Abstract:
Many existing schemes for malware detection are signature-based. Although they can effectively detect known malware, they cannot detect variants of known malware or new malware. Most network servers, such as those behind on-line shopping malls, Picasa, Youtube, Blogger, etc., do not expect executable code in their in-bound network traffic. Such network applications can therefore be protected from malware infection by monitoring their ports to see whether incoming packets contain any executable content. This paper proposes a content-classification scheme that identifies executable content in incoming packets. The proposed scheme analyzes the packet payload in two steps. It first analyzes the packet payload to see whether it contains multimedia-type data. If not, it classifies the payload as either text-type or executable. Although in our experiments the proposed scheme shows low rates of false negatives and false positives (4.69% and 2.53%, respectively), the remaining inaccuracies still require further inspection to detect malware efficiently. In this paper, we also propose simple statistical and combinatorial analysis to deal with false positives and negatives.
Abstract:
Background: Few studies have specifically investigated the functional effects of uncorrected astigmatism on measures of reading fluency. This information is important to provide evidence for the development of clinical guidelines for the correction of astigmatism. Methods: Participants included 30 visually normal, young adults (mean age 21.7 ± 3.4 years). Distance and near visual acuity and reading fluency were assessed with optimal spectacle correction (baseline) and for two levels of astigmatism, 1.00DC and 2.00DC, at two axes (90° and 180°) to induce both against-the-rule (ATR) and with-the-rule (WTR) astigmatism. Reading and eye movement fluency were assessed using standardized clinical measures including the test of Discrete Reading Rate (DRR), the Developmental Eye Movement (DEM) test and by recording eye movement patterns with the Visagraph (III) during reading for comprehension. Results: Both distance and near acuity were significantly decreased compared to baseline for all of the astigmatic lens conditions (p < 0.001). Reading speed with the DRR for N16 print size was significantly reduced for the 2.00DC ATR condition (a reduction of 10%), while for smaller text sizes reading speed was reduced by up to 24% for the 1.00DC ATR and 2.00DC condition in both axis directions (p<0.05). For the DEM, sub-test completion speeds were significantly impaired, with the 2.00DC condition affecting both vertical and horizontal times and the 1.00DC ATR condition affecting only horizontal times (p<0.05). Visagraph reading eye movements were not significantly affected by the induced astigmatism. Conclusions: Induced astigmatism impaired performance on selected tests of reading fluency, with ATR astigmatism having significantly greater effects on performance than did WTR, even for relatively small amounts of astigmatic blur of 1.00DC. These findings have implications for the minimal prescribing criteria for astigmatic refractive errors.
Abstract:
After attending this presentation, attendees will gain awareness of: (1) the error and uncertainty associated with the application of the Suchey-Brooks (S-B) method of age estimation of the pubic symphysis to a contemporary Australian population; (2) the implications of sexual dimorphism and bilateral asymmetry of the pubic symphysis through preliminary geometric morphometric assessment; and (3) the value of three-dimensional (3D) autopsy data acquisition for creating forensic anthropological standards. This presentation will impact the forensic science community by demonstrating that, in the absence of demographically sound skeletal collections, post-mortem autopsy data provides an exciting platform for the construction of large contemporary ‘virtual osteological libraries’ on which forensic anthropological research can be conducted on Australian individuals. More specifically, this study assesses the applicability and accuracy of the S-B method to a contemporary adult population in Queensland, Australia, and, using a geometric morphometric approach, provides insight into the age-related degeneration of the pubic symphysis. Despite the prominent use of the Suchey-Brooks (1990) method of age estimation in forensic anthropological practice, it is subject to intrinsic limitations, with reports of differential inter-population error rates between geographical locations [1-4]. Australian forensic anthropology is constrained by a paucity of population-specific standards due to a lack of repositories of documented skeletons. Consequently, in Australian casework proceedings, standards constructed from predominantly American reference samples are applied to establish a biological profile. In the global era of terrorism and natural disasters, more specific population standards are required to improve the efficiency of medico-legal death investigation in Queensland.
The sample comprises multi-slice computed tomography (MSCT) scans of the pubic symphysis (slice thickness: 0.5mm, overlap: 0.1mm) on 195 individuals of Caucasian ethnicity aged 15-70 years. Volume rendering reconstruction of the symphyseal surface was conducted in Amira® (v.4.1) and quantitative analyses in Rapidform® XOS. The sample was divided into ten-year age sub-sets (e.g. 15-24) with a final sub-set of 65-70 years. Error with respect to the method’s assigned means was analysed on the basis of bias (directionality of error), inaccuracy (magnitude of error) and percentage correct classification of left and right symphyseal surfaces. Morphometric variables including surface area, circumference, maximum height and width of the symphyseal surface and micro-architectural assessment of cortical and trabecular bone composition were quantified using novel automated engineering software capabilities. The results of this study demonstrated correct age classification utilizing the mean and standard deviations of each phase of the S-B method of 80.02% and 86.18% in Australian males and females, respectively. Application of the S-B method resulted in positive biases and mean inaccuracies of 7.24 (±6.56) years for individuals less than 55 years of age, compared to negative biases and mean inaccuracies of 5.89 (±3.90) years for individuals greater than 55 years of age. Statistically significant differences between chronological and S-B mean age were demonstrated in 83.33% and 50% of the six age subsets in males and females, respectively. Asymmetry of the pubic symphysis was a frequent phenomenon, with 53.33% of the Queensland population exhibiting statistically significant (χ², p<0.01) differential phase classification of left and right surfaces of the same individual. Directionality was found in bilateral asymmetry, with the right symphyseal faces being slightly older on average and providing more accurate estimates using the S-B method [5].
Morphometric analysis verified these findings, with the left surface exhibiting significantly greater circumference and surface area than the right (p<0.05). Morphometric analysis demonstrated an increase in maximum height and width of the surface with age, with the most significant changes (p<0.05) occurring between the 25-34 and 55-64 year age subsets. These differences may be attributed to hormonal components linked to menopause in females and a reduction in testosterone in males. Micro-architectural analysis demonstrated degradation of cortical composition with age, with differential bone resorption between the medial, ventral and dorsal surfaces of the pubic symphysis. This study recommends that the S-B method be applied with caution in medico-legal death investigations of unknown skeletal remains in Queensland. Age estimation will always be accompanied by error; therefore this study demonstrates the potential for quantitative morphometric modelling of age-related changes of the pubic symphysis as a tool for methodological refinement, providing a rigorous and robust assessment to remove the subjectivity associated with current pelvic aging methods.
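The abstract above reports error as bias (directionality) and inaccuracy (magnitude). A minimal sketch of how these two quantities are commonly computed is given below. It assumes the usual convention of mean signed error and mean absolute error of estimated minus chronological age; the abstract does not spell out its formulas, and the ages used here are hypothetical.

```python
def bias(estimated_ages, actual_ages):
    """Mean signed error: positive values indicate overestimation of age."""
    diffs = [e - a for e, a in zip(estimated_ages, actual_ages)]
    return sum(diffs) / len(diffs)

def inaccuracy(estimated_ages, actual_ages):
    """Mean absolute error: average magnitude of the error in years."""
    diffs = [abs(e - a) for e, a in zip(estimated_ages, actual_ages)]
    return sum(diffs) / len(diffs)

# Hypothetical ages in years, for illustration only.
estimated = [30.0, 40.0, 50.0]
actual = [25.0, 45.0, 50.0]
sample_bias = bias(estimated, actual)
sample_inaccuracy = inaccuracy(estimated, actual)
```

In this toy sample the over- and underestimates cancel, so the bias is zero while the inaccuracy is not, which is why the two measures are reported together.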
Abstract:
Many large-scale GNSS CORS networks have been deployed around the world to support various commercial and scientific applications. To make use of these networks for real-time kinematic positioning services, one of the major challenges is ambiguity resolution (AR) over long inter-station baselines in the presence of considerable atmospheric biases. Usually, the widelane ambiguities are fixed first, followed by determination of the narrowlane ambiguity integers based on the ionosphere-free model, in which the widelane integers are introduced as known quantities. This paper seeks to improve AR performance over long baselines through efficient procedures for improved float solutions and ambiguity fixing. The contribution is threefold: (1) instead of using the ionosphere-free measurements, absolute and/or relative ionospheric constraints are introduced in the ionosphere-constrained model to enhance the model strength, resulting in better float solutions; (2) a realistic widelane ambiguity precision is estimated by capturing the multipath effects due to the observation complexity, improving the reliability of widelane AR; (3) for narrowlane AR, partial AR is applied to a subset of ambiguities selected according to successively increased elevation. For fixing the scalar ambiguity, an error probability controllable rounding method is proposed. The established ionosphere-constrained model can be efficiently solved with the sequential Kalman filter. It can be either reduced to some special models simply by adjusting the variances of the ionospheric constraints, or extended with more parameters and constraints. The presented methodology is tested over seven baselines of around 100 km from the USA CORS network.
The results show that the new widelane AR scheme can obtain the 99.4 % successful fixing rate with 0.6 % failure rate; while the new rounding method of narrowlane AR can obtain the fix rate of 89 % with failure rate of 0.8 %. In summary, the AR reliability can be efficiently improved with rigorous controllable probability of incorrectly fixed ambiguities.
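To illustrate the idea of rounding a scalar ambiguity with a controlled error probability, the sketch below uses the standard success probability of integer rounding for an unbiased, normally distributed float ambiguity with standard deviation sigma (in cycles): P = 2Φ(1/(2σ)) − 1. The acceptance rule shown, fixing only when the failure probability is below a tolerance, is a simplified stand-in and is not claimed to be the paper's method; all function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rounding_success_probability(sigma):
    """Probability that rounding an unbiased float ambiguity gives the true integer."""
    return 2.0 * normal_cdf(1.0 / (2.0 * sigma)) - 1.0

def round_if_reliable(float_ambiguity, sigma, max_failure_probability):
    """Return the rounded integer, or None when the failure risk is too high."""
    failure_probability = 1.0 - rounding_success_probability(sigma)
    if failure_probability <= max_failure_probability:
        return round(float_ambiguity)
    return None  # keep the float solution instead of risking a wrong fix

# Hypothetical inputs: a precise and an imprecise float ambiguity.
fixed = round_if_reliable(3.2, sigma=0.1, max_failure_probability=0.008)
rejected = round_if_reliable(3.2, sigma=0.5, max_failure_probability=0.008)
```

With sigma = 0.5 cycles the success probability is only about 68%, so the sketch declines to fix, while sigma = 0.1 cycles passes easily.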
Abstract:
Background Cancer monitoring and prevention relies on the timely notification of cancer cases. However, the abstraction and classification of cancer from the free text of pathology reports and other relevant documents, such as death certificates, are complex and time-consuming activities. Aims In this paper, approaches for the automatic detection of notifiable cancer cases as the cause of death from free-text death certificates supplied to Cancer Registries are investigated. Method A number of machine learning classifiers were studied. Features were extracted using natural language techniques and the Medtex toolkit. The features encompassed stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline consisted of a keyword spotter using keywords extracted from the long description of ICD-10 cancer related codes. Results Death certificates with notifiable cancer listed as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved the best performance, with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounts for the first 18 of the top 40 evaluated runs, and is the most robust classifier, with a variance of 0.001141, half the variance of the other classifiers. Conclusion The selection of features had the greatest influence on the performance of the classifiers, although the type of classifier employed also affects performance. In contrast, the feature weighting schema had a negligible effect on performance.
Specifically, it is found that stemmed tokens, with or without SNOMED CT concepts, form the most effective feature set when combined with an SVM classifier.
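For readers unfamiliar with the metrics reported above, the following minimal sketch shows how the F-measure and the false negative rate follow from confusion-matrix counts. It assumes the balanced F1 form of the F-measure, which the abstract does not state explicitly, and the counts are hypothetical.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2.0 * precision * recall / (precision + recall)

def false_negative_rate(true_positives, false_negatives):
    """Fraction of actual positive cases that the classifier missed."""
    return false_negatives / (true_positives + false_negatives)

# Hypothetical counts, for illustration only.
f1 = f_measure(true_positives=90, false_positives=10, false_negatives=10)
fnr = false_negative_rate(true_positives=90, false_negatives=10)
```

In a notification setting the false negative rate matters in its own right, because a missed cancer death is never reported, which is presumably why it is listed alongside the F-measure.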
Abstract:
Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters representing different characteristics of an object, due to undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to resemble the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).
Abstract:
Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart, by the sympathetic and parasympathetic branches of the autonomic nervous system. HRV analysis is an important tool to observe the heart’s ability to respond to normal regulatory impulses that affect its rhythm. Like many bio-signals, HRV signals are non-linear in nature. Higher order spectral analysis (HOS) is known to be a good tool for the analysis of non-linear systems and provides good noise immunity. A computer-based arrhythmia detection system of cardiac states is very useful in diagnostics and disease management. In this work, we studied the identification of HRV signals using features derived from HOS. These features were fed to a support vector machine (SVM) for classification. Our proposed system can classify the normal class and four other classes of arrhythmia with an average accuracy of more than 85%.
Abstract:
The ambiguity acceptance test is an important quality control procedure in high precision GNSS data processing. Although ambiguity acceptance test methods have been extensively investigated, the threshold determination method is still not well understood. Currently, the threshold is determined with the empirical approach or the fixed failure rate (FF-) approach. The empirical approach is simple but lacks a theoretical basis, while the FF-approach is theoretically rigorous but computationally demanding. Hence, the key to the threshold determination problem is how to determine the threshold efficiently and in a reasonable way. In this study, a new threshold determination method, named the threshold function method, is proposed to reduce the complexity of the FF-approach. The threshold function method simplifies the FF-approach by a modeling procedure and an approximation procedure. The modeling procedure uses a rational function model to describe the relationship between the FF-difference test threshold and the integer least-squares (ILS) success rate. The approximation procedure replaces the ILS success rate with the easy-to-calculate integer bootstrapping (IB) success rate. The corresponding modeling error and approximation error are analysed with simulation data to avoid nuisance biases and unrealistic stochastic model impact. The results indicate that the proposed method can greatly simplify the FF-approach without introducing significant modeling error. The threshold function method makes the fixed failure rate threshold determination method feasible for real-time applications.
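The integer bootstrapping (IB) success rate mentioned above is easy to calculate because it has a closed form: the product over all ambiguities of 2Φ(1/(2σ_i)) − 1, where σ_i is the conditional standard deviation of ambiguity i given the previously fixed ones. The sketch below obtains those conditional standard deviations from the diagonal of a Cholesky factor of the ambiguity covariance matrix. It is a pure-Python illustration with a hypothetical covariance matrix; in practice the formula is usually applied after decorrelating the ambiguities, a step omitted here.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_std_devs(covariance):
    """Diagonal of the lower Cholesky factor of a symmetric positive definite matrix.

    Entry i is the standard deviation of ambiguity i conditioned on
    ambiguities 0..i-1, for the given ordering.
    """
    n = len(covariance)
    factor = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(factor[i][k] * factor[j][k] for k in range(j))
            if i == j:
                factor[i][i] = math.sqrt(covariance[i][i] - partial)
            else:
                factor[i][j] = (covariance[i][j] - partial) / factor[j][j]
    return [factor[i][i] for i in range(n)]

def bootstrapping_success_rate(covariance):
    """Closed-form IB success rate for the given ambiguity ordering."""
    probability = 1.0
    for sigma in conditional_std_devs(covariance):
        probability *= 2.0 * normal_cdf(1.0 / (2.0 * sigma)) - 1.0
    return probability

# Hypothetical 2x2 ambiguity covariance matrix (cycles squared).
q_hat = [[0.010, 0.004],
         [0.004, 0.020]]
ib_rate = bootstrapping_success_rate(q_hat)
```

The IB success rate is a lower bound on the ILS success rate, so replacing one with the other errs on the conservative side.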
Abstract:
Calls from 14 species of bat were classified to genus and species using discriminant function analysis (DFA), support vector machines (SVM) and ensembles of neural networks (ENN). Both SVMs and ENNs outperformed DFA for every species while ENNs (mean identification rate – 97%) consistently outperformed SVMs (mean identification rate – 87%). Correct classification rates produced by the ENNs varied from 91% to 100%; calls from six species were correctly identified with 100% accuracy. Calls from the five species of Myotis, a genus whose species are considered difficult to distinguish acoustically, had correct identification rates that varied from 91 – 100%. Five parameters were most important for classifying calls correctly while seven others contributed little to classification performance.
Abstract:
We recorded echolocation calls from 14 sympatric species of bat in Britain. Once digitised, one temporal and four spectral features were measured from each call. The frequency-time course of each call was approximated by fitting eight mathematical functions, and the goodness of fit, represented by the mean-squared error, was calculated. Measurements were taken using an automated process that extracted a single call from background noise and measured all variables without intervention. Two species of Rhinolophus were easily identified from call duration and spectral measurements. For the remaining 12 species, discriminant function analysis and multilayer back-propagation perceptrons were used to classify calls to species level. Analyses were carried out with and without the inclusion of curve-fitting data to evaluate its usefulness in distinguishing among species. Discriminant function analysis achieved an overall correct classification rate of 79% with curve-fitting data included, while an artificial neural network achieved 87%. The removal of curve-fitting data improved the performance of the discriminant function analysis by 2 %, while the performance of a perceptron decreased by 2 %. However, an increase in correct identification rates when curve-fitting information was included was not found for all species. The use of a hierarchical classification system, whereby calls were first classified to genus level and then to species level, had little effect on correct classification rates by discriminant function analysis but did improve rates achieved by perceptrons. This is the first published study to use artificial neural networks to classify the echolocation calls of bats to species level. Our findings are discussed in terms of recent advances in recording and analysis technologies, and are related to factors causing convergence and divergence of echolocation call design in bats.
Abstract:
Ambiguity validation, an important procedure in integer ambiguity resolution, tests the correctness of the fixed integer ambiguities of phase measurements before they are used for positioning computation. Most existing investigations on ambiguity validation focus on the test statistic. How to determine the threshold more reasonably is less understood, although it is one of the most important topics in ambiguity validation. Currently, there are two threshold determination methods in the ambiguity validation procedure: the empirical approach and the fixed failure rate (FF-) approach. The empirical approach is simple but lacks a theoretical basis. The fixed failure rate approach has a rigorous probability theory basis, but it employs a more complicated procedure. This paper focuses on how to determine the threshold easily and reasonably. Both the FF-ratio test and the FF-difference test are investigated in this research, and the extensive simulation results show that the FF-difference test can achieve comparable or even better performance than the well-known FF-ratio test. Another benefit of adopting the FF-difference test is that its threshold can be expressed as a function of the integer least-squares (ILS) success rate with a specified failure rate tolerance. Thus, a new threshold determination method, named the threshold function for the FF-difference test, is proposed. The threshold function method preserves the fixed failure rate characteristic and is also easy to apply. The performance of the threshold function is validated with simulated data. The validation results show that with the threshold function method, the impact of the modelling error on the failure rate is less than 0.08%. Overall, the threshold function for the FF-difference test is a very promising threshold validation method and it makes the FF-approach applicable to real-time GNSS positioning applications.
Abstract:
Lagrangian particle tracking provides an effective method for simulating the deposition of nano-particles as well as micro-particles, as it accounts for the particle inertia effect as well as the Brownian excitation. However, the use of the Lagrangian approach for simulating ultrafine particles has been limited by computational cost and numerical difficulties. The aim of this paper is to study the deposition of nano-particles in cylindrical tubes under laminar flow conditions using the Lagrangian particle tracking method. The commercial Fluent software is used to simulate the fluid flow in the pipes and to study the deposition and dispersion of nano-particles. Different particle diameters as well as different pipe lengths and flow rates are examined. The results show good agreement between the calculated deposition efficiency and different analytic correlations in the literature. Furthermore, for nano-particles with larger diameters, for which the effect of inertia is more important, the deposition efficiency calculated by the Lagrangian method is lower than that given by the analytic correlations based on the Eulerian method, due to statistical error or the inertia effect.
Abstract:
Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environmental studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate the desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experimental results show that syllable features can achieve an average classification accuracy of 90.5%, which outperforms Mel-frequency cepstral coefficient features (79.0%).
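The final classification step described above can be sketched with a small k-nearest neighbor classifier. This is a minimal standard-library illustration, assuming Euclidean distance and majority voting; the principal component analysis step is omitted, and the feature vectors and species labels are hypothetical rather than taken from the study.

```python
import math
from collections import Counter

def knn_classify(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Pair each training vector's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional syllable features (e.g. after dimensionality reduction).
features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["species_a", "species_a", "species_b", "species_b"]
predicted_label = knn_classify(features, labels, [0.2, 0.1], k=3)
```

Because the vote depends on raw distances, features on very different scales (for example duration in milliseconds and frequency in hertz) would normally be standardised first.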