149 results for Matrix factorization

in Queensland University of Technology - ePrints Archive


Relevance:

70.00%

Publisher:

Abstract:

Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning is a promising approach that can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches, along with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches, and the impact of parameter settings on the classification of a medical text dataset is discussed. With the right choice of dimension k, the Non-Negative Matrix Factorization based method achieves a 10-fold cross-validation accuracy of 0.93.
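
As a rough illustration only (not the authors' implementation), an NMF-based narrative classification pipeline might look like the sketch below, assuming scikit-learn; the toy narratives, injury codes, dimension k and classifier are placeholders for the paper's dataset and tuned settings.

```python
# A minimal sketch, assuming scikit-learn; the data and k = 3 are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

narratives = [
    "fell from ladder while cleaning gutters",
    "burned hand on hot stove",
    "slipped on wet floor and hit head",
    "cut finger with kitchen knife",
    "lacerated arm on broken glass",
    "scalded arm with boiling water",
]
codes = ["fall", "burn", "fall", "cut", "cut", "burn"]  # pre-defined codes (toy)

pipeline = make_pipeline(
    CountVectorizer(binary=True),                        # bag-of-words term features
    NMF(n_components=3, init="nndsvd", random_state=0),  # factorize to k = 3 topics
    LogisticRegression(max_iter=1000),                   # classify in the NMF space
)
# The paper reports 10-fold CV; the toy set here only supports 2 folds
print("CV accuracy:", cross_val_score(pipeline, narratives, codes, cv=2).mean())
```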

Relevance:

60.00%

Publisher:

Abstract:

The multi-criteria decision making methods Preference Ranking Organisation METHod for Enrichment Evaluation (PROMETHEE) and Graphical Analysis for Interactive Assistance (GAIA), and the two-way Positive Matrix Factorization (PMF) receptor model, were applied to airborne fine particle compositional data collected at three sites in Hong Kong during two monitoring campaigns, held from November 2000 to October 2001 and November 2004 to October 2005. PROMETHEE/GAIA indicated that air quality at the three sites was worse during the later monitoring campaign, and that the order of air quality at the sites during each campaign was: rural site > urban site > roadside site. The PMF analysis, on the other hand, identified six common sources at all of the sites (diesel vehicles, fresh sea salt, secondary sulphate, soil, aged sea salt and oil combustion), which together accounted for approximately 68.8 ± 8.7% of the fine particle mass. In addition, road dust, gasoline vehicles, biomass burning, secondary nitrate and metal processing were identified at some of the sites. Secondary sulphate was found to be the highest contributor to the fine particle mass at the rural and urban sites, with vehicle emissions a high contributor at the roadside site. The PMF results are broadly similar to those obtained in a previous analysis by PCA/APCS; however, the PMF analysis resolved more factors at each site. In addition, the study demonstrated that combining multi-criteria decision making analysis with receptor modelling can provide more detailed information to form the scientific basis for mitigating air pollution in the region.
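
PMF proper minimises an uncertainty-weighted residual (as in the EPA PMF tool); as a simplified stand-in for the factorization step only, the sketch below uses plain non-negative matrix factorization on synthetic species-by-sample data. The species count, source number and data are assumptions, not the Hong Kong measurements.

```python
# Simplified receptor-model factorization, assuming NumPy and scikit-learn.
# Plain NMF is used for brevity; real PMF weights residuals by uncertainty.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_samples, n_species, n_sources = 200, 12, 6   # six sources, as in the study

# Synthetic receptor data: X (samples x species) = G (contributions) @ F (profiles)
G_true = rng.gamma(2.0, 1.0, size=(n_samples, n_sources))
F_true = rng.dirichlet(np.ones(n_species), size=n_sources)
X = G_true @ F_true + rng.normal(0.0, 0.01, size=(n_samples, n_species)).clip(0)

model = NMF(n_components=n_sources, init="nndsvda", max_iter=500, random_state=0)
G = model.fit_transform(X)    # per-sample source contributions
F = model.components_         # source profiles (chemical fingerprints)

# Fraction of reconstructed mass attributed to each resolved source
per_source = np.array([np.outer(G[:, k], F[k]).sum() for k in range(n_sources)])
print(np.round(per_source / per_source.sum(), 3))
```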

Relevance:

60.00%

Publisher:

Abstract:

Purpose: To investigate the significance of sources around measurement sites, assist the development of control strategies for the important sources, and mitigate the adverse effects of air pollution due to particle size. Methods: Sampling was conducted at two roadside sites along the Brisbane Urban Corridor, one in an urban/industrial area and one in a residential area. Ultrafine and fine particle measurements obtained at the two sites in June-July 2002 were analysed by Positive Matrix Factorization (PMF). Results: Six sources were present, including local traffic, two other traffic-related sources, biomass burning, and two currently unidentified sources. Secondary particles had a significant impact at Site 1, while nitrates, peak traffic hours and main roads located close to the sources also affected the results at both sites. Conclusions: This significant traffic corridor exemplifies the types of sources present in heavily trafficked locations, and future attempts to control pollution in this type of environment could focus on the sources identified here.

Relevance:

60.00%

Publisher:

Abstract:

The aim of this paper is to provide a comparison of various algorithms and parameters for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation and the effect of word order are examined in the context of five algorithms for constructing semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence information alone. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
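
To illustrate two of the compared algorithms, the sketch below builds reduced semantic spaces with truncated SVD and random projection over a term-document matrix, assuming scikit-learn. The four-sentence corpus and three-dimensional spaces are toy assumptions; the paper uses the TASA corpus and evaluates on the TOEFL synonym task.

```python
# A minimal sketch of RP vs SVD semantic spaces; corpus and dimensions are toys.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the physician treated the patient",
    "the doctor treated the sick patient",
    "the pilot flew the plane",
    "the aviator flew the aircraft",
]
vec = CountVectorizer()
term_doc = vec.fit_transform(docs).T      # rows = terms, columns = documents

svd_vecs = TruncatedSVD(n_components=3, random_state=0).fit_transform(term_doc)
rp_vecs = GaussianRandomProjection(n_components=3, random_state=0).fit_transform(term_doc)

# Synonyms should end up close in the reduced space; RP varies more run-to-run
i, j = vec.vocabulary_["physician"], vec.vocabulary_["doctor"]
print("SVD:", cosine_similarity(svd_vecs[i:i+1], svd_vecs[j:j+1])[0, 0])
print("RP: ", cosine_similarity(rp_vecs[i:i+1], rp_vecs[j:j+1])[0, 0])
```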

Relevance:

60.00%

Publisher:

Abstract:

Topic recommendation can help users deal with information overload in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic and to profile users' time-drifting topic interests. Content-based, nearest-neighbourhood-based and Matrix Factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com.
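
A minimal sketch of the matrix factorization recommendation model follows, assuming NumPy and plain stochastic gradient descent. The user-topic matrix, latent dimension and hyperparameters are toy assumptions; the paper's temporal weighting and implicit-network features are not reproduced.

```python
# Matrix factorization recommender via SGD on observed user-topic entries.
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],        # rows: users, columns: topics (0 = unobserved)
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
k, lr, reg = 2, 0.01, 0.02         # latent dimension, learning rate, regularisation

P = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factor matrix
Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # topic factor matrix

for _ in range(2000):              # SGD sweeps over observed entries
    for u, t in zip(*mask.nonzero()):
        pu, qt = P[u].copy(), Q[t].copy()
        err = R[u, t] - pu @ qt
        P[u] += lr * (err * qt - reg * pu)
        Q[t] += lr * (err * pu - reg * qt)

# Predicted interest for every user-topic pair, including unobserved topics
print(np.round(P @ Q.T, 2))
```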

Relevance:

60.00%

Publisher:

Abstract:

Robust hashing is an emerging field that can be used to hash certain data types in applications unsuitable for traditional cryptographic hashing methods. Traditional hash functions have been used extensively for data/message integrity, data/message authentication, efficient file identification and password verification. These applications are possible because the hashing process is compressive, allowing for efficient comparisons in the hash domain, yet non-invertible, meaning hashes can be used without revealing the original data. These techniques were developed for deterministic (non-changing) inputs such as files and passwords, where a 1-bit or one-character change can be significant; as a result, the hashing process is sensitive to any change in the input. Unfortunately, in certain applications the input data are not perfectly deterministic and minor changes cannot be avoided. Digital images and biometric features are two types of data where such changes exist but do not alter the meaning or appearance of the input, and for these data types cryptographic hash functions cannot be usefully applied. In light of this, robust hashing has been developed as an alternative to cryptographic hashing and is designed to be robust to minor changes in the input. Although similar in name, robust hashing is fundamentally different from cryptographic hashing: current robust hashing techniques are based not on cryptographic methods but on pattern recognition techniques. Modern robust hashing algorithms consist of feature extraction, followed by a randomization stage that introduces non-invertibility and compression, followed by quantization and binary encoding to produce a binary hash output. In order to preserve the robustness of the extracted features, most randomization methods are linear, which is detrimental to the security properties required of hash functions. Furthermore, the quantization and encoding stages used to binarize real-valued features require the learning of appropriate quantization thresholds. How these thresholds are learnt has an important effect on hashing accuracy, and the mere presence of such thresholds is a source of information leakage that can reduce hashing security.

This dissertation outlines a systematic investigation of the quantization and encoding stages of robust hash functions. While the existing literature has focused on the importance of the quantization scheme, this research is the first to emphasise the importance of quantizer training for both hashing accuracy and hashing security. The quantizer training process is presented in a statistical framework that allows a theoretical analysis of its effects on hashing performance. This is experimentally verified using a number of baseline robust image hashing algorithms over a large database of real-world images.

The dissertation also proposes a new randomization method for robust image hashing based on Higher Order Spectra (HOS) and Radon projections. The method is non-linear, which is an essential requirement for non-invertibility, and is designed to produce features better suited to quantization and encoding. The system can operate without the need for quantizer training, is more easily encoded, and displays improved hashing performance compared to existing robust image hashing algorithms. Finally, the dissertation shows how the HOS method can be adapted to work with biometric features obtained from 2D and 3D face images.
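
To make the pipeline concrete, the sketch below walks through the generic stages the abstract describes: linear randomisation, quantizer training, then 1-bit quantisation and binary encoding, assuming NumPy. The random projection, median thresholds and Gaussian "features" are illustrative assumptions; the dissertation's HOS/Radon method is not shown.

```python
# Generic robust-hashing stages: randomise, train quantizer thresholds, binarise.
import numpy as np

rng = np.random.default_rng(0)

def robust_hash(features, proj, thresholds):
    """Randomise real-valued features, then binarise against learned thresholds."""
    randomized = proj @ features                       # linear randomisation stage
    return (randomized > thresholds).astype(np.uint8)  # quantisation + encoding

train = rng.normal(size=(500, 64))                 # corpus of extracted feature vectors
proj = rng.normal(size=(32, 64)) / np.sqrt(64)     # key-dependent random projection
thresholds = np.median(train @ proj.T, axis=0)     # quantizer training on the corpus

x = rng.normal(size=64)                            # features of one input image
x_noisy = x + rng.normal(scale=0.05, size=64)      # minor, meaning-preserving change
h1 = robust_hash(x, proj, thresholds)
h2 = robust_hash(x_noisy, proj, thresholds)
print("Hamming distance:", int((h1 != h2).sum()), "of", h1.size, "bits")
```

A robust hash should give a small Hamming distance for such perturbed inputs; the learned thresholds are exactly the information-leakage concern the dissertation analyses.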

Relevance:

60.00%

Publisher:

Abstract:

An Aerodyne Aerosol Mass Spectrometer was deployed at five urban schools to examine the spatial and temporal variability of organic aerosols (OA), and positive matrix factorization (PMF) was used, for the first time in the Southern Hemisphere, to apportion the sources of OA across an urban area. The sources identified included hydrocarbon-like OA (HOA), biomass burning OA (BBOA) and oxygenated OA (OOA). At all sites, the main source was OOA, which accounted for 62–73% of the total OA mass and was generally more oxidized than that reported in the Northern Hemisphere, suggesting differences in aging processes or regional sources between the two hemispheres. Unlike HOA and BBOA, OOA demonstrated instructive temporal variations but no spatial variation across the urban area. Applying cluster analysis to the PMF-derived sources offered a simple and effective method for qualitative comparison of PMF sources that can be used in other studies.
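
The cluster-analysis step could look something like the sketch below, assuming NumPy and SciPy: hierarchical clustering of factor profiles from several sites so that like sources group together. The profiles are synthetic stand-ins for the AMS mass spectra resolved in the study.

```python
# Clustering PMF-derived factor profiles across sites for qualitative comparison.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
base = {s: rng.dirichlet(np.ones(20)) for s in ("HOA", "BBOA", "OOA")}

profiles, names = [], []
for site in range(5):                       # five school sites, three factors each
    for src, p in base.items():
        noise = rng.normal(0.0, 0.005, 20).clip(-p)   # site-to-site variation
        profiles.append(p + noise)
        names.append(f"site{site}-{src}")

# Average-linkage clustering on cosine distance between factor profiles
Z = linkage(np.array(profiles), method="average", metric="cosine")
for name, cluster in zip(names, fcluster(Z, t=3, criterion="maxclust")):
    print(cluster, name)                    # like factors should share a cluster
```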

Relevance:

60.00%

Publisher:

Abstract:

Descriptions of a patient's injuries are recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for the mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families: decision tree, probabilistic, neural network, instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, of the classification outcome. Records with a null entry in the injury description are removed. Misspelling correction is carried out by finding misspelt words and replacing them with sound-alike words. Meaningful phrases are identified and kept, rather than having parts of phrases removed as stop words. Abbreviations appearing in many different forms are manually identified, and only one form of each abbreviation is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically, from about 28,000 to 5,000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse: few features are irrelevant, but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space, and classifiers have been built on the reduced feature space. In the experiments, a set of tests is conducted to determine which classification method is best for medical text classification. The Non-Negative Matrix Factorization with Support Vector Machine method achieves 93% precision, which is higher than all of the tested traditional classifiers. We also found that TF/IDF weighting, which works well for long-text classification, is inferior to binary weighting for short-document classification. Another finding is that the top-n terms should be removed in consultation with medical experts, as this affects the classification performance.
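
The binary-versus-TF/IDF finding is easy to probe with a reduced-feature pipeline like the sketch below, assuming scikit-learn; here truncated SVD stands in for the dimension-reduction step and toy documents stand in for the injury narratives, so the dimensions, folds and data are assumptions rather than the paper's setup.

```python
# Comparing binary and TF/IDF term weighting in an SVD-plus-SVM pipeline.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

docs = [
    "fell from ladder", "fell down stairs", "fell off bike",
    "burned by hot oil", "burned hand on iron", "scalded by hot water",
]
codes = ["fall", "fall", "fall", "burn", "burn", "burn"]

for name, vectorizer in [("binary", CountVectorizer(binary=True)),
                         ("tf/idf", TfidfVectorizer())]:
    pipe = make_pipeline(vectorizer,
                         TruncatedSVD(n_components=2, random_state=0),  # reduce space
                         LinearSVC())                                   # linear SVM
    print(name, round(cross_val_score(pipe, docs, codes, cv=3).mean(), 2))
```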

Relevance:

60.00%

Publisher:

Abstract:

Children are particularly susceptible to air pollution, and schools are examples of urban microenvironments that can account for a large portion of children's exposure to airborne particles. This paper therefore aimed to determine the sources of primary airborne particles that children are exposed to at school, by analyzing selected organic molecular markers at 11 urban schools in Brisbane, Australia. Positive matrix factorization analysis identified four sources at the schools: vehicle emissions, biomass burning, meat cooking and plant wax emissions, accounting for 45%, 29%, 16% and 7% of the organic carbon, respectively. Biomass burning peaked in winter due to prescribed burning of bushland around Brisbane. Overall, the results indicated that both local (traffic) and regional (biomass burning) sources of primary organic aerosols influence the levels of ambient particles that children are exposed to at school. These results have implications for potential control strategies for mitigating exposure at schools.

Relevance:

20.00%

Publisher:

Abstract:

Improved strategies were put forward to enhance model predictive control for reducing the wind-induced vibration of spatial latticed structures. The dynamic matrix control (DMC) predictive method was used, and a reference trajectory based on decaying functions was proposed for the analysis of spatial latticed structures (SLS) under wind loads. The wind-induced vibration control model of an SLS with the improved DMC predictive control was illustrated; different feedback strategies were then investigated, and a typical SLS was taken as an example to investigate the reduction of wind-induced vibration. In addition, the robustness and reliability of the DMC strategy were discussed by varying the model configurations.
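
For orientation, a generic DMC move computation is sketched below, assuming NumPy: a dynamic matrix built from step-response coefficients and a regularised least-squares solve for the control moves, with a decaying reference trajectory of the kind the paper proposes. The first-order step response, horizons and decay rate are illustrative placeholders, not the paper's structural model.

```python
# Generic dynamic matrix control (DMC) move computation.
import numpy as np

N, P, M, lam = 30, 10, 3, 0.1   # model length, prediction/control horizons, penalty
a = 1.0 - np.exp(-np.arange(1, N + 1) / 5.0)   # step-response coefficients (toy plant)

# Dynamic matrix: A[i, j] = a[i - j], the effect of move j on predicted output i
A = np.zeros((P, M))
for i in range(P):
    for j in range(min(i + 1, M)):
        A[i, j] = a[i - j]

y_now, setpoint, alpha = 0.0, 1.0, 0.8
# Decaying-function reference trajectory, easing from the current output to the setpoint
ref = setpoint - (setpoint - y_now) * alpha ** np.arange(1, P + 1)
free = np.full(P, y_now)        # free response (no past moves assumed, for brevity)

# Solve (A'A + lam*I) du = A'(ref - free); apply only the first move (receding horizon)
du = np.linalg.solve(A.T @ A + lam * np.eye(M), A.T @ (ref - free))
print("first control move:", du[0])
```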

Relevance:

20.00%

Publisher:

Abstract:

The ideas for this CRC research project are based directly on Sidwell, Kennedy and Chan (2002). That research examined a number of case studies to identify the characteristics of successful projects. The findings were used to construct a matrix of best practice project delivery strategies. The purpose of this literature review is to test the decision matrix against established theory and best practice in the subject of construction project management.

Relevance:

20.00%

Publisher:

Abstract:

The Co-operative Research Centre for Construction Innovation (CRC-CI) is funding a project known as the Value Alignment Process for Project Delivery. The project consists of a study of best-practice project delivery and the development of a suite of products, resources and services to guide project teams towards the best procurement approach for a specific project or group of projects. These resources will focus on promoting the principles that underlie best-practice project delivery, rather than simply identifying an off-the-shelf procurement system. The project builds on earlier work by Sidwell, Kennedy and Chan (2002) on re-engineering the construction delivery process, which developed a procurement framework in the form of a Decision Matrix.

Relevance:

20.00%

Publisher:

Abstract:

The effective management of bridge stock involves making decisions as to when to repair, remedy, or do nothing, taking into account the financial and service life implications. Such decisions require a reliable diagnosis of the cause of distress and an understanding of the likely future degradation. Such diagnoses are based on a combination of visual inspections, laboratory tests on samples and expert opinions. In addition, the choice of appropriate laboratory tests requires an understanding of the degradation mechanisms involved. Under these circumstances, the use of expert systems or evaluation tools developed from “real-time” case studies provides a promising solution in the absence of expert knowledge. This paper addresses the issues in bridge infrastructure management in Queensland, Australia. Bridges affected by alkali-silica reaction and chloride-induced corrosion have been investigated, and the results are presented using a mind mapping tool. The analysis highlights that several levels of rules are required to assess the mechanism causing distress. The systematic development of a rule-based approach is presented, and its application to a case study bridge demonstrates that the preliminary results are satisfactory.
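
As a toy illustration of the layered rule idea (not the rules developed from the Queensland case studies), a first level of rules might nominate a distress mechanism and a second level confirm it against laboratory evidence; the observations, rule conditions and the 0.06% chloride threshold below are hypothetical placeholders.

```python
# A minimal layered rule-based diagnosis sketch in plain Python.
def diagnose(obs):
    """First-level rules select a candidate mechanism; second-level rules confirm."""
    if obs.get("map_cracking") and obs.get("expansion"):
        if obs.get("petrography_shows_asr_gel"):
            return "alkali-silica reaction (confirmed by petrography)"
        return "alkali-silica reaction suspected; order a petrographic test"
    if obs.get("chloride_content_pct", 0.0) > 0.06 and obs.get("spalling"):
        return "chloride-induced corrosion suspected; order further testing"
    return "no mechanism identified; gather further evidence"

print(diagnose({"map_cracking": True, "expansion": True}))
print(diagnose({"chloride_content_pct": 0.12, "spalling": True}))
```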