71 results for Classification of sciences.
Abstract:
A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of the Cortex moutan (CM) produced much better classification and prediction results than those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and linear discriminant analysis (LDA); essentially, qualitative results suggested that discrimination of the CM samples from three different provinces was possible, with the combined matrix producing the best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM), were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced the best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations.
Abstract:
A novel combined near- and mid-infrared (NIR and MIR) spectroscopic method has been researched and developed for the analysis of complex substances such as the Traditional Chinese Medicine (TCM), Illicium verum Hook. F. (IVHF), and its noxious adulterant, Illicium lanceolatum A.C. Smith (ILACS). Three types of spectral matrix were submitted for classification with the use of the linear discriminant analysis (LDA) method. The data were pretreated with either the successive projections algorithm (SPA) or the discrete wavelet transform (DWT) method. The SPA method performed somewhat better, principally because it required fewer spectral features for its pretreatment model. Thus, the NIR and MIR matrices, as well as the combined NIR/MIR one, were pretreated by the SPA method and then analysed by LDA. This approach enabled the prediction and classification of the IVHF, ILACS and mixed samples. The MIR spectral data produced somewhat better classification rates than the NIR data. However, the best results were obtained from the combined NIR/MIR data matrix, with 95–100% correct classifications for calibration, validation and prediction. Principal component analysis (PCA) of the three types of spectral data supported the results obtained with the LDA classification method.
Abstract:
Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environment studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including: syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experiment results show that syllable features can achieve an average classification accuracy of 90.5% which outperforms Mel-frequency cepstral coefficients features (79.0%).
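The final classification step described above can be sketched as a plain k-nearest-neighbor vote. This is only a minimal illustration, not the authors' implementation: the feature values, the labels `species_a` and `species_b`, and the choice `k=3` are hypothetical, and the principal component analysis step is omitted for brevity.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of (feature_vector, label) pairs
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical syllable features: (duration in seconds, dominant frequency in kHz)
train = [
    ((0.10, 2.1), "species_a"),
    ((0.12, 2.0), "species_a"),
    ((0.11, 2.3), "species_a"),
    ((0.40, 4.8), "species_b"),
    ((0.42, 5.1), "species_b"),
    ((0.38, 4.9), "species_b"),
]

print(knn_predict(train, (0.13, 2.2)))
print(knn_predict(train, (0.41, 5.0)))
```

In practice the features would usually be scaled before computing distances, since duration and frequency live on different numeric ranges.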
Abstract:
Over the past few decades, frog species have been experiencing dramatic decline around the world. The reasons for this decline include habitat loss, invasive species, climate change and so on. To better know the status of frog species, classifying frogs has become increasingly important. In this study, acoustic features are investigated for multi-level classification of Australian frogs (family, genus and species), covering three families, eleven genera and eighty-five species collected from Queensland, Australia. For each frog species, six instances are selected, from which ten acoustic features are calculated. Then, the multicollinearity among the ten features is studied in order to select non-correlated features for subsequent analysis. A decision tree (DT) classifier is used to visually and explicitly determine which acoustic features are relatively important for classifying family, which for genus, and which for species. Finally, a weighted support vector machine (SVM) classifier is used for the multi-level classification, with the three most important acoustic features at each level. Our experimental results indicate that using different acoustic feature sets can successfully classify frogs at different levels, and the average classification accuracy can be up to 85.6%, 86.1% and 56.2% for family, genus and species respectively.
Abstract:
Objective: Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates – an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates. Methods: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/nocancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. Results: The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable. Conclusion: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. 
This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
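The cascaded architecture and the F-measure used for evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: the two keyword functions are hypothetical stand-ins for the trained Support Vector Machine classifiers, and the labels are placeholders rather than ICD-10 codes. The F-measure shown is the usual harmonic mean of precision and recall.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cascade(text, is_cancer, cancer_type):
    """Level 1 decides cancer vs. no cancer; level 2 runs only on positives."""
    if not is_cancer(text):
        return None
    return cancer_type(text)


# Hypothetical keyword rules standing in for the trained classifiers
def is_cancer(text):
    return any(word in text.lower() for word in ("carcinoma", "cancer", "melanoma"))


def cancer_type(text):
    return "melanoma" if "melanoma" in text.lower() else "other"


print(cascade("Metastatic melanoma", is_cancer, cancer_type))
print(cascade("Myocardial infarction", is_cancer, cancer_type))
print(f_measure(tp=8, fp=2, fn=2))
```

Running the second level only on first-level positives means its errors compound with those of the first level, which is one reason per-level evaluation is useful.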
Abstract:
Context: Pheochromocytomas and paragangliomas (PPGLs) are heritable neoplasms that can be classified into gene-expression subtypes corresponding to their underlying specific genetic drivers. Objective: This study aimed to develop a diagnostic and research tool (Pheo-type) capable of classifying PPGL tumors into gene-expression subtypes that could be used to guide and interpret genetic testing, determine surveillance programs, and aid in elucidation of PPGL biology. Design: A compendium of published microarray data representing 205 PPGL tumors was used for the selection of subtype-specific genes that were then translated to the Nanostring gene-expression platform. A support vector machine was trained on the microarray dataset and then tested on an independent Nanostring dataset representing 38 familial and sporadic cases of PPGL of known genotype (RET, NF1, TMEM127, MAX, HRAS, VHL, and SDHx). Different classifier models involving between three and six subtypes were compared for their discrimination potential. Results: A gene set of 46 genes and six endogenous controls was selected representing six known PPGL subtypes; RTK1–3 (RET, NF1, TMEM127, and HRAS), MAX-like, VHL, and SDHx. Of 38 test cases, 34 (90%) were correctly predicted to six subtypes based on the known genotype to gene-expression subtype association. Removal of the RTK2 subtype from training, characterized by an admixture of tumor and normal adrenal cortex, improved the classification accuracy (35/38). Consolidation of RTK and pseudohypoxic PPGL subtypes to four- and then three-class architectures improved the classification accuracy for clinical application. Conclusions: The Pheo-type gene-expression assay is a reliable method for predicting PPGL genotype using routine diagnostic tumor samples.
Abstract:
Quantitative behaviour analysis requires the classification of behaviour to produce the basic data. In practice, much of this work will be performed by multiple observers, and maximising inter-observer consistency is of particular importance. Another discipline where consistency in classification is vital is biological taxonomy. A classification tool of great utility, the binary key, is designed to simplify the classification decision process and ensure consistent identification of proper categories. We show how this same decision-making tool - the binary key - can be used to promote consistency in the classification of behaviour. The construction of a binary key also ensures that the categories in which behaviour is classified are complete and non-overlapping. We discuss the general principles of design of binary keys, and illustrate their construction and use with a practical example from education research.
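A binary key can be represented as a tree of yes/no questions whose leaves are categories. The sketch below is an assumption-laden illustration: the questions and category names are hypothetical and are not taken from the paper's education example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Question:
    text: str
    yes: Union["Question", str]  # next question, or a category name
    no: Union["Question", str]


def classify(node, answer):
    """Walk the key; answer(question_text) returns a truthy or falsy value."""
    while isinstance(node, Question):
        node = node.yes if answer(node.text) else node.no
    return node


# Hypothetical key for classroom behaviour
key = Question(
    "Is the student speaking?",
    yes=Question("Is the speech about the task?", yes="on-task talk", no="off-task talk"),
    no=Question("Is the student working on the task?", yes="silent work", no="disengaged"),
)

answers = {
    "Is the student speaking?": False,
    "Is the student working on the task?": True,
}
print(classify(key, answers.get))
```

Because every question has exactly two branches and every path ends in one leaf, each observation reaches exactly one category, which is how the structure keeps categories complete and non-overlapping.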
Abstract:
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
Abstract:
This paper suggests an approach for finding an appropriate combination of the parameters used in extracting texture features (e.g. the choice of spectral band, the size of the moving window, the quantization level of the image, and the choice of texture feature) for use in the classification process. The gray level co-occurrence matrix (GLCM) method has been used for extracting texture from a remotely sensed satellite image. Results of the classification of an Indian urban environment using a spatial property (texture) derived from spectral and multi-resolution wavelet decomposed images are also reported. A multivariate data analysis technique called ‘conjoint analysis’ has been used in the study to analyze the relative importance of these parameters. Results indicate that the choice of texture feature and the window size have higher relative importance in the classification process than the quantization level or the choice of image band for extracting the texture feature. For texture features derived from a wavelet decomposed image, the parameter ‘decomposition level’ has almost the same relative importance as the size of the moving window; decomposition of images up to level one is sufficient, and there is no need for further decomposition. It was also observed that classification incorporating texture features improves the overall classification accuracy in a statistically significant manner in comparison to pure spectral classification.
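To make the role of the quantization level and the pixel offset concrete, here is a minimal sketch of a gray level co-occurrence matrix and one texture feature (contrast) for a single window. The 4×4 window, the horizontal offset and the four gray levels are illustrative assumptions, not values from the paper.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Sum of (i - j)^2 weighted by the normalized co-occurrence counts."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical window already quantized to 4 gray levels (0..3)
window = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

m = glcm(window, levels=4)
print(m)
print(contrast(m))
```

A coarser quantization shrinks the matrix and a larger window supplies more pixel pairs, which is why both appear as tunable parameters in such studies.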
Abstract:
Genomic and proteomic analyses have attracted a great deal of interest in biological research in recent years. Many methods have been applied to discover useful information contained in the enormous databases of genomic sequences and amino acid sequences. The results of these investigations inspire further research in biological fields in return. These biological sequences, which may be considered as multiscale sequences, have some specific features which need further efforts to characterise using more refined methods. This project aims to study some of these biological challenges with multiscale analysis methods and a stochastic modelling approach. The first part of the thesis aims to cluster some unknown proteins, and classify their families as well as their structural classes. A development in proteomic analysis is concerned with the determination of protein functions. The first step in this development is to classify proteins and predict their families. This motivates us to study some unknown proteins from specific families, and to cluster them into families and structural classes. We select a large number of proteins from the same families or superfamilies, and link them to simulate some unknown large proteins from these families. We use multifractal analysis and the wavelet method to capture the characteristics of these linked proteins. The simulation results show that the method is valid for the classification of large proteins. The second part of the thesis aims to explore the relationship of proteins based on a layered comparison with their components. Many methods are based on homology of proteins because the resemblance at the protein sequence level normally indicates the similarity of functions and structures. However, some proteins may have similar functions with low sequential identity. We consider protein sequences at detail level to investigate the problem of comparison of proteins. 
The comparison is based on the empirical mode decomposition (EMD), and protein sequences are detected with the intrinsic mode functions. A measure of similarity is introduced with a new cross-correlation formula. The similarity results show that the EMD is useful for detection of functional relationships of proteins. The third part of the thesis aims to investigate the transcriptional regulatory network of yeast cell cycle via stochastic differential equations. As the investigation of genome-wide gene expressions has become a focus in genomic analysis, researchers have tried to understand the mechanisms of the yeast genome for many years. How cells control gene expressions still needs further investigation. We use a stochastic differential equation to model the expression profile of a target gene. We modify the model with a Gaussian membership function. For each target gene, a transcriptional rate is obtained, and the estimated transcriptional rate is also calculated with the information from five possible transcriptional regulators. Some regulators of these target genes are verified with the related references. With these results, we construct a transcriptional regulatory network for the genes from the yeast Saccharomyces cerevisiae. The construction of transcriptional regulatory network is useful for detecting more mechanisms of the yeast cell cycle.
Abstract:
Background: The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results: In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with a large difference in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions: The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size.
Abstract:
The International Classification of Diseases (ICD) is used to categorise diseases, injuries and external causes, and is a key epidemiological tool enabling the storage and retrieval of data from health and vital records to produce core international mortality and morbidity statistics. The ICD is updated periodically to ensure the classification remains current, and work is now underway to develop the next revision, ICD-11. It has been almost 20 years since the last ICD edition was published and over 60 years since the last substantial structural revision of the external causes chapter. Revision of such a critical tool requires transparency and documentation to ensure that changes made to the classification system are recorded comprehensively for future reference. In this paper, the authors provide a history of external causes classification development and outline the external cause structure. Approaches to manage ICD-10 deficiencies are discussed, and the ICD-11 revision approach regarding the development of, rationale for and implications of proposed changes to the chapter is outlined. Through improved capture of external cause concepts in ICD-11, a stronger evidence base will be available to inform injury prevention, treatment, rehabilitation and policy initiatives to ultimately contribute to a reduction in injury morbidity and mortality.
Abstract:
Despite the prominent use of the Suchey-Brooks (S-B) method of age estimation in forensic anthropological practice, it is subject to intrinsic limitations, with reports of differential inter-population error rates between geographical locations. This study assessed the accuracy of the S-B method to a contemporary adult population in Queensland, Australia and provides robust age parameters calibrated for our population. Three-dimensional surface reconstructions were generated from computed tomography scans of the pubic symphysis of male and female Caucasian individuals aged 15–70 years (n = 195) in Amira® and Rapidform®. Error was analyzed on the basis of bias, inaccuracy and percentage correct classification for left and right symphyseal surfaces. Application of transition analysis and Chi-square statistics demonstrated 63.9% and 69.7% correct age classification associated with the left symphyseal surface of Australian males and females, respectively, using the S-B method. Using Bayesian statistics, probability density distributions for each S-B phase were calculated, providing refined age parameters for our population. Mean inaccuracies of 6.77 (±2.76) and 8.28 (±4.41) years were reported for the left surfaces of males and females, respectively; with positive biases for younger individuals (<55 years) and negative biases in older individuals. Significant sexual dimorphism in the application of the S-B method was observed; and asymmetry in phase classification of the pubic symphysis was a frequent phenomenon. These results recommend that the S-B method should be applied with caution in medico-legal death investigations of Queensland skeletal remains and warrant further investigation of reliable age estimation techniques.
Abstract:
Background & aims: One aim of the Australasian Nutrition Care Day Survey was to determine the nutritional status and dietary intake of acute care hospital patients. Methods: Dietitians from 56 hospitals in Australia and New Zealand completed a 24-h survey of nutritional status and dietary intake of adult hospitalised patients. Nutritional risk was evaluated using the Malnutrition Screening Tool. Participants ‘at risk’ underwent nutritional assessment using Subjective Global Assessment. Based on the International Classification of Diseases (Australian modification), participants were also deemed malnourished if their body mass index was <18.5 kg/m². Dietitians recorded participants’ dietary intake at each main meal and snacks as 0%, 25%, 50%, 75%, or 100% of that offered. Results: 3122 patients (mean age: 64.6 ± 18 years) participated in the study. Forty-one percent of the participants were “at risk” of malnutrition. Overall malnutrition prevalence was 32%. Fifty-five percent of malnourished participants and 35% of well-nourished participants consumed ≤50% of the food during the 24-h audit. “Not hungry” was the most common reason for not consuming everything offered during the audit. Conclusion: Malnutrition and sub-optimal food intake are prevalent in acute care patients across hospitals in Australia and New Zealand and warrant appropriate interventions.
Abstract:
After attending this presentation, attendees will gain awareness of: (1) the error and uncertainty associated with the application of the Suchey-Brooks (S-B) method of age estimation of the pubic symphysis to a contemporary Australian population; (2) the implications of sexual dimorphism and bilateral asymmetry of the pubic symphysis through preliminary geometric morphometric assessment; and (3) the value of three-dimensional (3D) autopsy data acquisition for creating forensic anthropological standards. This presentation will impact the forensic science community by demonstrating that, in the absence of demographically sound skeletal collections, post-mortem autopsy data provides an exciting platform for the construction of large contemporary ‘virtual osteological libraries’ on which forensic anthropological research can be conducted for Australian individuals. More specifically, this study assesses the applicability and accuracy of the S-B method to a contemporary adult population in Queensland, Australia, and, using a geometric morphometric approach, provides insight into the age-related degeneration of the pubic symphysis. Despite the prominent use of the Suchey-Brooks (1990) method of age estimation in forensic anthropological practice, it is subject to intrinsic limitations, with reports of differential inter-population error rates between geographical locations [1-4]. Australian forensic anthropology is constrained by a paucity of population-specific standards due to a lack of repositories of documented skeletons. Consequently, in Australian casework proceedings, standards constructed from predominately American reference samples are applied to establish a biological profile. In the global era of terrorism and natural disasters, more specific population standards are required to improve the efficiency of medico-legal death investigation in Queensland. 
The sample comprises multi-slice computed tomography (MSCT) scans of the pubic symphysis (slice thickness: 0.5 mm, overlap: 0.1 mm) of 195 individuals of Caucasian ethnicity aged 15-70 years. Volume rendering reconstruction of the symphyseal surface was conducted in Amira® (v.4.1) and quantitative analyses in Rapidform® XOS. The sample was divided into ten-year age sub-sets (e.g. 15-24) with a final sub-set of 65-70 years. Error with respect to the method’s assigned means was analysed on the basis of bias (directionality of error), inaccuracy (magnitude of error) and percentage correct classification of left and right symphyseal surfaces. Morphometric variables including surface area, circumference, maximum height and width of the symphyseal surface and micro-architectural assessment of cortical and trabecular bone composition were quantified using novel automated engineering software capabilities. The results of this study demonstrated correct age classification utilizing the mean and standard deviations of each phase of the S-B method of 80.02% and 86.18% in Australian males and females, respectively. Application of the S-B method resulted in positive biases and mean inaccuracies of 7.24 (±6.56) years for individuals less than 55 years of age, compared to negative biases and mean inaccuracies of 5.89 (±3.90) years for individuals greater than 55 years of age. Statistically significant differences between chronological and S-B mean age were demonstrated in 83.33% and 50% of the six age subsets in males and females, respectively. Asymmetry of the pubic symphysis was a frequent phenomenon, with 53.33% of the Queensland population exhibiting statistically significant (χ², p<0.01) differential phase classification of left and right surfaces of the same individual. Directionality was found in bilateral asymmetry, with the right symphyseal faces being slightly older on average and providing more accurate estimates using the S-B method [5]. 
Morphometric analysis verified these findings, with the left surface exhibiting significantly greater circumference and surface area than the right (p<0.05). Morphometric analysis demonstrated an increase in maximum height and width of the surface with age, with the most significant changes (p<0.05) occurring between the 25-34 and 55-64 year age subsets. These differences may be attributed to hormonal components linked to menopause in females and a reduction in testosterone in males. Micro-architectural analysis demonstrated degradation of cortical composition with age, with differential bone resorption between the medial, ventral and dorsal surfaces of the pubic symphysis. This study recommends that the S-B method be applied with caution in medico-legal death investigations of unknown skeletal remains in Queensland. Age estimation will always be accompanied by error; therefore this study demonstrates the potential for quantitative morphometric modelling of age-related changes of the pubic symphysis as a tool for methodological refinement, providing a rigorous and robust assessment to remove the subjectivity associated with current pelvic aging methods.
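The three error measures named in this abstract can be sketched as follows, assuming the conventional definitions: bias as the mean signed difference between estimated and actual age, inaccuracy as the mean absolute difference, and percentage correct classification as the share of individuals whose actual age falls inside the age interval assigned to them. The ages and intervals below are hypothetical and are not data from the study; how the study derived its intervals from phase means and standard deviations is not reproduced here.

```python
def bias(estimated, actual):
    """Mean signed error: positive values mean ages are overestimated."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


def inaccuracy(estimated, actual):
    """Mean absolute error, regardless of direction."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def percent_correct(actual, intervals):
    """Share of individuals whose actual age lies inside their assigned interval."""
    hits = sum(low <= a <= high for a, (low, high) in zip(actual, intervals))
    return 100 * hits / len(actual)


# Hypothetical ages in years (not data from the study)
actual = [20, 30, 60, 68]
estimated = [24, 35, 52, 61]
intervals = [(18, 28), (25, 45), (40, 58), (50, 70)]

print(bias(estimated, actual))
print(inaccuracy(estimated, actual))
print(percent_correct(actual, intervals))
```

In this toy sample the younger individuals are overestimated and the older ones underestimated, so the signed errors partly cancel in the bias while the inaccuracy stays large, which is why both measures are reported together.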