977 results for Document classification
Abstract:
Over the past few decades, frog species have experienced a dramatic decline around the world. The reasons for this decline include habitat loss, invasive species and climate change, among others. To better understand the status of frog species, classifying frogs has become increasingly important. In this study, acoustic features are investigated for multi-level classification of Australian frogs by family, genus and species, covering three families, eleven genera and eighty-five species collected from Queensland, Australia. For each frog species, six instances are selected, from which ten acoustic features are calculated. The multicollinearity between the ten features is then studied to select non-correlated features for subsequent analysis. A decision tree (DT) classifier is used to visually and explicitly determine which acoustic features are relatively important for classifying family, which for genus, and which for species. Finally, a weighted support vector machine (SVM) classifier is used for the multi-level classification, with the three most important acoustic features at each level. Our experimental results indicate that using different acoustic feature sets can successfully classify frogs at different levels, and the average classification accuracy can be up to 85.6%, 86.1% and 56.2% for family, genus and species respectively.
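The multicollinearity step described in this abstract can be illustrated with a minimal sketch. It assumes a greedy filter on pairwise Pearson correlation; the 0.9 cutoff, the feature names and the toy values are all hypothetical and are not taken from the study, whose exact procedure is not given here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined for constant features


def select_uncorrelated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    `features` maps a feature name to its list of values.
    The threshold is an illustrative assumption.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values, not data from the study.
toy = {
    "dominant_frequency": [1.0, 2.0, 3.0, 4.0, 5.0],
    "peak_frequency": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly collinear with the first
    "syllable_duration": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = select_uncorrelated(toy)
print(selected)
```

The surviving features would then be passed to the classifiers; the order in which features are visited affects which member of a correlated pair is kept.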
Abstract:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.
Abstract:
We propose a robust method for mosaicing of document images using features derived from connected components. Each connected component is described using the Angular Radial Transform (ART). To ensure geometric consistency during feature matching, the ART coefficients of a connected component are augmented with those of its two nearest neighbors. The proposed method addresses two critical issues often encountered in correspondence matching: (i) the stability of features and (ii) robustness against false matches due to the multiple instances of characters in a document image. The use of connected components guarantees a stable localization across images. The augmented features ensure successful correspondence matching even in the presence of multiple similar regions within the page. We illustrate the effectiveness of the proposed method on camera-captured document images exhibiting large variations in viewpoint, illumination and scale.
Abstract:
This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. This research finds that document signatures can be effectively and efficiently used to both search and understand relationships between documents in large collections, scalable enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
Abstract:
Remote sensing provides a clear and effective means for crop coverage identification. Crop coverage identification is a very important technique, as it provides vital information on the type and extent of crop cultivated in a particular area. This information has immense potential in the planning of further cultivation activities and for optimal usage of the available fertile land. As the frontiers of space technology advance, the knowledge derived from satellite data has also grown in sophistication. Further, image classification forms the core of the solution to the crop coverage identification problem. No single classifier can satisfactorily solve all the basic crop cover mapping problems of a cultivated region. We present in this paper the experimental results of multiple classification techniques for the problem of crop cover mapping of a cultivated region. A detailed comparison of algorithms inspired by the social behaviour of insects and a conventional statistical method for crop classification is presented. These include the Maximum Likelihood Classifier (MLC), Particle Swarm Optimisation (PSO) and Ant Colony Optimisation (ACO) techniques. A high-resolution satellite image has been used for the experiments.
Abstract:
A complete list of homogeneous operators in the Cowen-Douglas class B_n(D) is given. This classification is obtained from an explicit realization of all the homogeneous Hermitian holomorphic vector bundles on the unit disc under the action of the universal covering group of the bi-holomorphic automorphism group of the unit disc.
Abstract:
This paper proposes new metrics and a performance-assessment framework for vision-based weed and fruit detection and classification algorithms. In order to compare algorithms, and decide which one to use for a particular application, it is necessary to take into account that the performance obtained in a series of tests is subject to uncertainty. Such characterisation of uncertainty seems not to be captured by the performance metrics currently reported in the literature. Therefore, we pose the problem as a general problem of scientific inference, which arises out of incomplete information, and propose as a metric of performance the (posterior) predictive probabilities that the algorithms will provide a correct outcome for target and background detection. We detail the framework through which these predicted probabilities can be obtained, which is Bayesian in nature. As an illustrative example, we apply the framework to the assessment of the performance of four algorithms that could potentially be used in the detection of capsicums (peppers).
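To make the idea of a posterior predictive probability concrete, here is a minimal sketch. It assumes the simplest possible model, a binomial likelihood with a Beta prior, which is an illustrative choice and not necessarily the model used in the paper; the function name and the test counts are hypothetical.

```python
from fractions import Fraction


def posterior_predictive(successes, trials, prior_a=1, prior_b=1):
    """Probability that the next detection is correct.

    Assumes a Beta(prior_a, prior_b) prior on the success rate and a
    binomial likelihood; with the default uniform prior this reduces to
    Laplace's rule of succession, (s + 1) / (n + 2).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return Fraction(successes + prior_a, trials + prior_a + prior_b)


# Two hypothetical test campaigns with the same observed accuracy (90%).
few = posterior_predictive(9, 10)
many = posterior_predictive(90, 100)
print(float(few), float(many))
```

Both campaigns report 90% raw accuracy, yet the predictive probability is lower for the smaller test series, which is the kind of uncertainty a plain accuracy figure does not show.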
Abstract:
Background Contemporary Finnish, spoken and written, reveals loanwords or foreignisms in the form of hybrids: a mixture of Finnish and foreign syllables (alumiinivalua). Sometimes loanwords are inserted into the Finnish sentence in their raw form just as they are found in the source language (pulp, after sales palvelu). Again, sometimes loanwords are calques, which appear Finnish but are spelled and pronounced in an altogether foreign manner (Protomanageri, Promenadi kampuksella). Research Questions What role does Finnish business translation play in the migration of foreignisms into Finnish if we consider translation "as a construct of solutions determined by the ideological constraints and conflicts characterizing the target culture" (Robyns 1992: 212)? What attitudes do the Finns display toward the presence of foreignisms in their language? What socio-economic or ideological conditions (Bassnett 1994: 321) are responsible for these attitudes? Are these conditions dynamic? What tools can be used to measure such attitudes? This dissertation set out to answer these and similar questions. Attitudes are imperialist (where otherness is both denied and transformed), defensive (where otherness is acknowledged, transformed, and vilified), transdiscursive (a neutral attitude to both otherness and transformation), or finally defective (where alien migration is acknowledged and "stimulated") (Robyns 1994: 60). Methodology The research method follows Rose's schema (1984: 8): (a) take an existing theory, (b) develop from it a proposition specific enough to be tested, (c) devise a scheme that tests this proposition, (d) carry through the scheme in practice, (e) draw up results and discuss conclusions in relation to the original theory. 
In other words, the method attempts an explanation of a Finnish social phenomenon based on systematic analyses of translated evidence (Lewins 1992: 4) whereby what really matters is the logical sequence that connects the empirical data to the initial research questions raised above and, ultimately, to its conclusion (Yin 1984: 29). Results This research found that Finnish translators of the Nokia annual reports used a foreignism whenever possible, such as komponentin instead of rakenneosa or investoida instead of sijoittaa, often without any apparent justification (Pryce 2003: 203-12) other than the translator's personal preference. In the old documents (minutes of meetings of the Board of Directors of Osakeyhtio H. Saastamoinen, Ltd. dated 5 July 1912-1917, a NOPSA booklet (1932), Enzo-Gutzeit-Tornator Oy document (1938), Imatra Steel Oy Annual Report 1964, and Nokia Oy Annual Report 1946), foreignisms under Haugen's (1950: 210-31) Classification #1 occurred an average of 0.6 times, while in the new documents (Nokia 1998 translated Annual Reports) they occurred an average of 6.5 times. That large difference suggests transdiscursive and defective attitudes in Finnish society toward the other. In the 1850s, Finnish attitudes toward alien persons and cultures were hardened, intolerant and prohibitive because language politics were both nascent and emerging, and Finns adopted a defensive stance (Paloposki 2002: 102 ff) to protect their cultural and national treasures such as language and folklore. Innovation The innovation here is that no prior doctoral level research measured Finnish attitudes toward foreignisms using a business translation approach. This is the first time that Haugen's classification has been modified and applied in target language analysis. It is hoped that this method will be replicated in similar research in the future. 
Applications For practical applications, researchers with an interest in languages, language development, language influences, language ideologies, and power structures that affect national language policies will find this thesis useful, especially the model for collecting, grouping, and analyzing foreignisms that has been demonstrated here. It is intended to document for posterity current attitudes of Finns toward the other as revealed in business translations from 1912-1964, and in 1998. This way, future language researchers will be able to explore a time-line of Finnish language development and attitudes toward the other. Communication firms may also find this research interesting. In future, could the model we adopted be used to analyze literary texts or religious texts, for example? Future Trends Though business documents show transdiscursive attitudes, other segments of Finnish society may show defensive or imperialist attitudes. When the ideology of industrialization changes in the future, will Finnish attitudes toward the other change as well? Will it then be possible to use the same kind of analytical tools to measure Finnish attitudes? More broadly, will linguistic change continue in the same direction of transdiscursive attitudes, or will the change slow down or even reverse into xenophobic attitudes? Is our model culture-specific or can it be used in the context of other cultures? Conclusion There is anger against foreignisms in Finland, as newspaper publications and television broadcasts show, but research shows that a majority of Finns consider foreignisms and the languages from which they come as sources of enrichment for Finnish culture (Laitinen 2000, Eurobarometer series 41 of July 1994, 44 of Spring 1996, 50 of Autumn 1998). Ideologies of industrialization and globalization in Finland have facilitated transdiscursive tendencies. 
When Finland's political ideology was intolerant toward foreign influences in the 1850s, because Finland was in the process of consolidating her nascent country and language, attitudes toward the importation of loanwords also became intolerant. Now that industrialization and globalization have become the dominant ideologies, we see a shift in attitudes toward transdiscursive tendencies. Ideology is usually unseen and too often ignored by translation researchers. However, ideology reveals itself as the most powerful factor affecting language attitudes in a target culture. Key words Finnish, Business Translation, Ideology, Foreignisms, Imperialist Attitudes, Defensive Attitudes, Transdiscursive Attitudes, Defective Attitudes, the Other, Old Documents, New Documents.
Abstract:
Ninety-two strong-motion earthquake records from the California region, U.S.A., have been statistically studied using principal component analysis in terms of twelve important standardized strong-motion characteristics. The first two principal components account for about 57 per cent of the total variance. Based on these two components the earthquake records are classified into nine groups in a two-dimensional principal component plane. Also a unidimensional engineering rating scale is proposed. The procedure can be used as an objective approach for classifying and rating future earthquakes.
Abstract:
A review was carried out of the radiographs of twenty-five infants with birth weights under 1000 g who survived for more than twenty-eight days; eighteen of these had enough suitable films for a survey of the progressive bone changes which occur in these infants, including estimation of humeral cortical cross-sectional area. The incidence of the changes has been assessed and a typical progression of radiographic appearances has been shown, with a suggested system of staging. All infants showed some loss of bone mineral, with frank changes of rickets occurring in forty-four percent. Aetiological factors are mainly concerned with the difficulty of supplying and ensuring absorption of sufficient bone mineral (calcium and phosphate) and vitamin D. Liver immaturity may be another factor. Disease states additional to prematurity accentuate the problem. Rib fractures occurring around 80–90 days post-natally commonly draw attention to the bone disorder and are probably the major clinical factor of importance; there is a high incidence of associated lung disease of uncertain pathology. Attention is drawn to possible confusion with other bone disorders in the post-natal period.
Abstract:
This paper presents the site classification of the Bangalore Mahanagar Palike (BMP) area using geophysical data and the evaluation of spectral acceleration at ground level using a probabilistic approach. Site classification has been carried out using experimental data from the shallow geophysical method of Multichannel Analysis of Surface Waves (MASW). One-dimensional (1-D) MASW surveys have been carried out at 58 locations and the respective velocity profiles obtained. The average shear wave velocity for 30 m depth (Vs(30)) has been calculated and is used for the site classification of the BMP area as per NEHRP (National Earthquake Hazards Reduction Program). Based on the Vs(30) values, the major part of the BMP area can be classified as "site class D" and "site class C". A smaller portion of the study area, in and around Lalbagh Park, is classified as "site class B". Further, probabilistic seismic hazard analysis has been carried out to map the seismic hazard in terms of spectral acceleration (S_a) at rock and ground level, considering the site classes and the six seismogenic sources identified. The mean annual rate of exceedance and cumulative probability hazard curves for S_a have been generated. The quantified hazard values in terms of spectral acceleration for short period and long period are mapped for rock and for site classes C and D with 10% probability of exceedance in 50 years on a grid size of 0.5 km. In addition, the Uniform Hazard Response Spectrum (UHRS) at surface level has been developed for 5% damping and 10% probability of exceedance in 50 years for rock and for site classes C and D. These spectral acceleration values and uniform hazard spectra can be used to assess the design force for important structures and also to develop the design spectrum.
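The Vs(30) calculation and NEHRP class lookup mentioned in this abstract can be sketched as follows. Vs(30) is the time-averaged shear wave velocity over the top 30 m, and the class boundaries used are the commonly quoted NEHRP values (1500, 760, 360 and 180 m/s). The layer profile is hypothetical and not taken from the survey; the return-period helper additionally assumes Poisson occurrence of exceedances, which is the usual convention but is not stated in the abstract.

```python
import math


def vs30(layers):
    """Time-averaged shear wave velocity (m/s) over the top 30 m.

    `layers` is a top-down list of (thickness_m, velocity_m_per_s).
    """
    depth = 0.0
    travel_time = 0.0
    for thickness, velocity in layers:
        used = min(thickness, 30.0 - depth)  # clip the profile at 30 m
        if used <= 0:
            break
        travel_time += used / velocity
        depth += used
    if depth < 30.0 - 1e-9:
        raise ValueError("profile is shallower than 30 m")
    return 30.0 / travel_time


def nehrp_site_class(v):
    """Map Vs(30) in m/s to NEHRP site class A to E (velocity criterion only)."""
    if v > 1500:
        return "A"
    if v > 760:
        return "B"
    if v > 360:
        return "C"
    if v >= 180:
        return "D"
    return "E"


def return_period(prob_exceedance, exposure_years):
    """Return period in years, assuming Poisson occurrence of exceedances."""
    return -exposure_years / math.log(1.0 - prob_exceedance)


# Hypothetical velocity profile, not a measured one.
profile = [(5.0, 200.0), (10.0, 300.0), (15.0, 500.0)]
v = vs30(profile)
print(round(v, 1), nehrp_site_class(v), round(return_period(0.10, 50)))
```

Under the Poisson assumption, a 10% probability of exceedance in 50 years corresponds to a return period of roughly 475 years.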
Abstract:
Being able to accurately predict the risk of falling is crucial in patients with Parkinson’s disease (PD). This is due to the unfavorable effect of falls, which can lower the quality of life as well as directly impact on survival. The three methods considered for predicting falls are decision trees (DT), Bayesian networks (BN), and support vector machines (SVM). Data from a 1-year prospective study conducted at IHBI, Australia, for 51 people with PD are used. Data processing is conducted using the rpart and e1071 packages in R for DT and SVM, respectively, and Bayes Server 5.5 for the BN. The results show that BN and SVM produce consistently higher accuracy over the 12 months of evaluation time points (average sensitivity and specificity > 92%) than DT (average sensitivity 88%, average specificity 72%). DT is sensitive to imbalanced data, so the misclassification cost needs to be adjusted. However, DT provides a straightforward, interpretable result and thus is appealing for helping to identify important items related to falls and to generate fallers’ profiles.
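As a reminder of how the two reported measures are defined, here is a minimal sketch that computes sensitivity and specificity from paired labels. The labels are hypothetical (1 = faller, 0 = non-faller) and do not come from the study.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # faller correctly flagged
            else:
                fn += 1  # faller missed
        else:
            if p == positive:
                fp += 1  # non-faller wrongly flagged
            else:
                tn += 1  # non-faller correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels for ten patients.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, round(spec, 3))
```

Reporting both numbers matters with imbalanced classes, because a classifier can reach high overall accuracy while missing most of the minority class.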
Abstract:
Objective Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates. Methods Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. Results The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable. Conclusion The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. 
This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
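The cascaded architecture and the F-measure evaluation described in this abstract can be sketched as below. This is only an outline of the control flow: simple keyword rules stand in for the trained SVM classifiers, and the certificate texts, gold labels and keyword lists are hypothetical. The ICD-10 codes C34 (lung) and C50 (breast) are used purely as example labels.

```python
def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F-measure (F1) for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def cascade(text, is_cancer, type_classifiers):
    """Level 1 decides cancer / no cancer; level 2 runs only on level-1 positives."""
    if not is_cancer(text):
        return "no cancer"
    for icd_code, classifier in type_classifiers.items():
        if classifier(text):
            return icd_code
    return "cancer, type unknown"


# Keyword stand-ins for the trained classifiers (an assumption for illustration).
def is_cancer(text):
    return "cancer" in text or "carcinoma" in text


type_classifiers = {
    "C34": lambda text: "lung" in text,
    "C50": lambda text: "breast" in text,
}

# Hypothetical certificate snippets and gold labels.
texts = [
    "lung cancer",
    "myocardial infarction",
    "carcinoma of breast",
    "pneumonia, lung abscess",
    "metastatic cancer",
]
gold = ["C34", "no cancer", "C50", "no cancer", "C34"]
predicted = [cascade(t, is_cancer, type_classifiers) for t in texts]
p, r, f1 = precision_recall_f1(gold, predicted, "C34")
print(predicted, p, r, round(f1, 3))
```

Because the second level only sees texts accepted by the first, a word such as "lung" in a non-cancer certificate does not trigger a cancer type label.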
Abstract:
The use of near infrared (NIR) hyperspectral imaging and hyperspectral image analysis for distinguishing between hard, intermediate and soft maize kernels from inbred lines was evaluated. NIR hyperspectral images of two sets (12 and 24 kernels) of whole maize kernels were acquired using a Spectral Dimensions MatrixNIR camera with a spectral range of 960-1662 nm and a sisuChema SWIR (short wave infrared) hyperspectral pushbroom imaging system with a spectral range of 1000-2498 nm. Exploratory principal component analysis (PCA) was used on absorbance images to remove background, bad pixels and shading. On the cleaned images, PCA could be used effectively to find histological classes including glassy (hard) and floury (soft) endosperm. PCA illustrated a distinct difference between glassy and floury endosperm along principal component (PC) three on the MatrixNIR and PC two on the sisuChema, with two distinguishable clusters. Subsequently, partial least squares discriminant analysis (PLS-DA) was applied to build a classification model. The PLS-DA model from the MatrixNIR image (12 kernels) resulted in a root mean square error of prediction (RMSEP) value of 0.18. This was repeated on the MatrixNIR image of the 24 kernels, which resulted in an RMSEP of 0.18. The sisuChema image yielded an RMSEP value of 0.29. The reproducible results obtained with the different data sets indicate that the method proposed in this paper has real potential for future classification uses.
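For reference, RMSEP is the square root of the mean squared difference between reference and predicted values on a prediction set. The sketch below assumes the usual PLS-DA convention of coding class membership as 0 or 1; the values are hypothetical and not taken from the paper.

```python
from math import sqrt


def rmsep(reference, predicted):
    """Root mean square error of prediction over paired values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(reference, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical dummy-coded classes (0 = floury, 1 = glassy) and model outputs.
reference = [0.0, 0.0, 1.0, 1.0]
predicted = [0.1, -0.2, 0.8, 1.1]
error = rmsep(reference, predicted)
print(round(error, 3))
```

A lower RMSEP means the continuous model outputs sit closer to the coded class values, so it complements rather than replaces a classification accuracy figure.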
Abstract:
In this presentation, I reflect upon the global landscape surrounding the governance and classification of media content, at a time of rapid change in media platforms and services for content production and distribution, and contested cultural and social norms. I discuss the tensions and contradictions arising in the relationship between national, regional and global dimensions of media content distribution, as well as the changing relationships between state and non-state actors. These issues will be explored through consideration of issues such as: recent debates over film censorship; the review of the National Classification Scheme conducted by the Australian Law Reform Commission; online controversies such as the future of the Reddit social media site; and videos posted online by the militant group ISIS.