971 results for Association mining
Abstract:
The management of main material prices in provincial highway project quotas suffers from lag and blindness. First, a framework for a web-based provincial highway project quota data MIS and a main material price data warehouse was established. Then, concrete processes for predicting provincial highway project main material prices were put forward based on the BP neural network algorithm. After that, the standard BP algorithm, the BP algorithm with an additional momentum term, and the improved BP algorithm with a self-adaptive learning rate were compared in predicting main material prices for highway projects. The results indicate that it is feasible to predict highway main material prices using a BP neural network, and that the improved BP algorithm with a self-adaptive learning rate performs best of the three.
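The three weight-update rules compared in this abstract can be illustrated with a minimal sketch. This is not the authors' network or data: it applies plain gradient descent, gradient descent with a momentum term, and gradient descent with a self-adaptive learning rate ("study speed") to a toy one-dimensional loss. The function name `minimise`, the growth and shrink factors 1.05 and 0.7, and the loss itself are assumptions chosen for illustration.

```python
def minimise(loss, grad, w, lr=0.1, momentum=0.0, adaptive=False, steps=100):
    """Gradient descent on a scalar weight, optionally with momentum
    and/or a self-adaptive learning rate (illustrative values only)."""
    velocity = 0.0
    current = loss(w)
    for _ in range(steps):
        # Momentum keeps a fraction of the previous update.
        velocity = momentum * velocity - lr * grad(w)
        candidate = w + velocity
        new = loss(candidate)
        if adaptive and new >= current:
            # The step did not reduce the error: reject it and slow down.
            lr *= 0.7
            velocity = 0.0
            continue
        if adaptive:
            # The step reduced the error: accept it and speed up slightly.
            lr *= 1.05
        w, current = candidate, new
    return w


# Toy loss with its minimum at w = 3 (hypothetical, stands in for network error).
def toy_loss(w):
    return (w - 3.0) ** 2


def toy_grad(w):
    return 2.0 * (w - 3.0)


w_plain = minimise(toy_loss, toy_grad, 0.0)
w_momentum = minimise(toy_loss, toy_grad, 0.0, momentum=0.5)
w_adaptive = minimise(toy_loss, toy_grad, 0.0, adaptive=True)
```

In a real BP network the same rules would be applied to every weight using back-propagated gradients; the scalar case only shows how the three updates differ.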
Abstract:
Cancer represents a major public health concern in Australia. Causes of cancer are multifactorial with lack of physical activity being considered one of the known risk factors, particularly for breast and colorectal cancers. Participating in exercise has also been associated with benefits during and following treatment for cancer, including improvements in psychosocial and physical outcomes, as well as better compliance with treatment regimens, reduced impact of disease symptoms and treatment-related side effects, and survival benefits for particular cancers. The general exercise prescription for people undertaking or having completed cancer treatment is of low to moderate intensity, regular frequency (3-5 times/week) for at least 20 minutes per session, involving aerobic, resistance or mixed exercise types. Future work needs to push the boundaries of this exercise prescription, so that we can better understand what constitutes optimal, desirable and necessary frequency, duration, intensity and type, and how specific characteristics of the individual (e.g., age, cancer type, treatment, presence of specific symptoms) influence this prescription. What follows is a summary of the cancer and exercise literature, in particular the purpose of exercise following diagnosis of cancer, the potential benefits derived by cancer patients and survivors from participating in exercise programs, and exercise prescription guidelines and contraindications or considerations for exercise prescription with this special population. This report represents the position stand of the Australian Association of Exercise and Sport Science on exercise and cancer recovery and has the purpose of guiding Accredited Exercise Physiologists in their work with cancer patients.
Abstract:
The neXus2 research project has sought to investigate the library and information services (LIS) workforce in Australia from the institutional or employer perspective. The study builds on the neXus1 study, which collected data from individuals in the LIS workforce in order to present a snapshot of the profession in 2006, highlighting the demographics, educational background and career details of library and information professionals in Australia. To counterbalance this individual perspective, library institutions were invited to participate in a survey to contribute further data as employers. This final report on the neXus2 project compares the findings from the different library sectors, i.e., academic libraries, TAFE libraries, the National and State libraries, public libraries, special libraries and school libraries.
Abstract:
Classical negotiation models are weak in supporting real-world business negotiations because these models often assume that the preference information of each negotiator is made public. Although parametric learning methods have been proposed for acquiring the preference information of negotiation opponents, these methods suffer from the strong assumptions about the specific utility function and negotiation mechanism employed by the opponents. Consequently, it is difficult to apply these learning methods to the heterogeneous negotiation agents participating in e‑marketplaces. This paper illustrates the design, development, and evaluation of a nonparametric negotiation knowledge discovery method which is underpinned by the well-known Bayesian learning paradigm. According to our empirical testing, the novel knowledge discovery method can speed up the negotiation processes while maintaining negotiation effectiveness. To the best of our knowledge, this is the first nonparametric negotiation knowledge discovery method developed and evaluated in the context of multi-issue bargaining over e‑marketplaces.
Abstract:
It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.
Abstract:
To deal with the ever-growing information overload on the Internet, recommender systems are widely used online to suggest to potential customers items they may like or find useful. Collaborative filtering is the most popular technique for recommender systems; it collects opinions from customers in the form of ratings on items, services or service providers. In addition to customer ratings of a service provider, a good deal of online customer feedback is also available over the Internet as customer reviews, comments, newsgroup posts, discussion forums or blogs, collectively called user-generated content. This information can be used to generate the public reputation of the service providers. To do this, data mining techniques, especially the recently emerged opinion mining, could be a useful tool. In this paper we present a state-of-the-art review of opinion mining from online customer feedback. We critically evaluate the existing work and expose cutting-edge areas of interest in opinion mining. We also classify the approaches taken by different researchers into several categories and sub-categories. Each of those steps is analyzed with its strengths and limitations in this paper.
Abstract:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. Learning to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches have to deal with low-frequency pattern issues. The measures used by data mining techniques (for example, “support” and “confidence”) to learn the profile have turned out to be unsuitable for filtering. They can lead to a mismatch problem. This thesis uses rough set-based reasoning (term-based) and the pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: topic filtering and pattern mining. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user-profile learning method and a theoretical model of threshold setting have been developed by using rough set decision theory.
The second stage (pattern mining) aims at solving the problem of information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents are assigned higher scores by the ranking function. Because relatively few documents are left after the first stage, the computational cost is markedly reduced; at the same time, pattern discovery yields more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR benchmarking processes, using the latest version of the Reuters dataset, namely the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms the other IF systems, such as the traditional Rocchio IF model, state-of-the-art term-based models including BM25 and Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
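The two-stage idea described above (discard likely irrelevant documents cheaply, then rank the survivors by matched patterns) can be sketched as follows. This is only an illustration of the pipeline shape, not the thesis's rough-set threshold model or its ranking function: the scoring functions, weights, threshold and sample documents are all hypothetical.

```python
def topic_score(doc_terms, term_weights):
    """Stage 1 score: sum of profile weights of the terms in the document."""
    return sum(term_weights.get(term, 0.0) for term in doc_terms)


def pattern_score(doc_terms, patterns):
    """Stage 2 score: total weight of the patterns fully contained in the document."""
    return sum(weight for pattern, weight in patterns.items() if pattern <= doc_terms)


def two_stage_filter(docs, term_weights, patterns, threshold):
    # Stage 1 (topic filtering): drop documents scoring below the threshold.
    survivors = {
        doc_id: terms
        for doc_id, terms in docs.items()
        if topic_score(terms, term_weights) >= threshold
    }
    # Stage 2 (pattern mining): rank only the survivors by matched patterns.
    return sorted(
        survivors,
        key=lambda doc_id: pattern_score(survivors[doc_id], patterns),
        reverse=True,
    )


# Hypothetical user profile and incoming documents (each a set of terms).
term_weights = {"oil": 1.0, "price": 1.0, "opec": 0.5, "market": 0.5}
patterns = {frozenset({"oil", "price"}): 2.0, frozenset({"opec"}): 1.0}
docs = {
    "d1": {"oil", "price", "opec"},
    "d2": {"oil", "painting"},
    "d3": {"football", "score"},
    "d4": {"oil", "price", "market"},
}

ranked = two_stage_filter(docs, term_weights, patterns, threshold=1.5)
```

Because the pattern scoring in stage 2 runs only on the documents that pass stage 1, its cost scales with the number of survivors and not with the full stream.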
Abstract:
The wide range of contributing factors and circumstances surrounding crashes on road curves suggests that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify contributing factors and the relationships between them. It identifies contributing factors that influence the risk of a crash. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data and to reason using the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers between 50 and 59 years old who were involved in a collision while driving during evening peak hours had the lowest crash risk. Drivers between 25 and 29 years old, driving a new car from around midnight to 6 am, had the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with 25 to 42 years of driving experience who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based approaches to analysing road crashes. Our data mining approach is supported by proven theory and will allow road safety practitioners to effectively understand the dependencies between contributing factors and the crash type, with a view to designing tailored countermeasures.
Abstract:
Background: The seasonality of suicide has long been recognised. However, little is known about the relative importance of socio-environmental factors in the occurrence of suicide in different geographical areas. This study examined the association of climate, socioeconomic and demographic factors with suicide in Queensland, Australia, using a spatiotemporal approach. Methods: Seasonal data on suicide, demographic variables and socioeconomic indexes for areas in each Local Government Area (LGA) between 1999 and 2003 were acquired from the Australian Bureau of Statistics. Climate data were supplied by the Australian Bureau of Meteorology. A multivariable generalized estimating equation model was used to examine the impact of socio-environmental factors on suicide. Results: The preliminary data analyses show that far north Queensland had the highest suicide incidence (e.g., Cook and Mornington Shires), while the south-western areas had the lowest incidence (e.g., Barcoo and Bauhinia Shires) in all seasons. Maximum temperature, unemployment rate, the proportion of Indigenous population and the proportion of population with low individual income were statistically significantly and positively associated with suicide. There were weaker but not significant associations for other variables. Conclusions: Maximum temperature, the proportion of Indigenous population and unemployment rate appeared to be major determinants of suicide at an LGA level in Queensland.
Abstract:
This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.
Abstract:
XML document clustering is essential for many document handling applications such as information storage, retrieval, integration and transformation. An XML clustering algorithm should process both the structural and the content information of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. This paper introduces a novel approach that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The proposed method reduces the high dimensionality of input data by using only the structure-constrained content. The empirical analysis reveals that the proposed method can effectively cluster even very large XML datasets and outperform other existing methods.
Abstract:
Despite all attempts to prevent fraud, it continues to be a major threat to industry and government. Traditionally, organizations have focused on fraud prevention rather than detection, to combat fraud. In this paper we present a role mining inspired approach to represent user behaviour in Enterprise Resource Planning (ERP) systems, primarily aimed at detecting opportunities to commit fraud or potentially suspicious activities. We have adapted an approach which uses set theory to create transaction profiles based on analysis of user activity records. Based on these transaction profiles, we propose a set of (1) anomaly types to detect potentially suspicious user behaviour and (2) scenarios to identify inadequate segregation of duties in an ERP environment. In addition, we present two algorithms to construct a directed acyclic graph to represent relationships between transaction profiles. Experiments were conducted using a real dataset obtained from a teaching environment and a demonstration dataset, both using SAP R/3, presently the most predominant ERP system. The results of this empirical research demonstrate the effectiveness of the proposed approach.
Abstract:
Recommender systems are widely used online to help users find other products, items, etc., that they may be interested in, based on what is known about the user in their profile. Often, however, user profiles are short on information, and when there is not sufficient knowledge about a user it is difficult for a recommender system to make quality recommendations. This problem is often referred to as the cold-start problem. Here we investigate whether association rules can be used as a source of information to expand a user profile and thus avoid this problem, leading to improved recommendations to users. Our pilot study shows that it is indeed possible to use association rules to improve the performance of a recommender system. We believe this can lead to further work on utilising appropriate association rules to lessen the impact of the cold-start problem.
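A minimal sketch of the idea, assuming single-antecedent rules mined from co-occurrence counts: rules of the form "a implies b" that meet support and confidence thresholds are used to add items to a sparse profile. The study's own rule-mining method and data are not given in the abstract, so the function names, thresholds and transactions below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine rules a -> b between single items from a list of item sets.

    support(a, b)   = fraction of transactions containing both a and b
    confidence(a→b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.setdefault(antecedent, {})[consequent] = confidence
    return rules


def expand_profile(profile, rules):
    """Add the consequents of every rule whose antecedent is in the profile."""
    expanded = set(profile)
    for item in profile:
        expanded.update(rules.get(item, {}))
    return expanded


# Hypothetical purchase histories of other users.
transactions = [
    {"camera", "tripod", "sd_card"},
    {"camera", "sd_card"},
    {"camera", "tripod"},
    {"laptop", "mouse"},
    {"laptop", "mouse", "sd_card"},
]

rules = mine_pair_rules(transactions)
sparse_profile = {"camera"}  # a cold-start user with a single known item
expanded = expand_profile(sparse_profile, rules)
```

The expanded profile can then be passed to whatever recommender is in use; a fuller implementation would mine multi-item antecedents with an algorithm such as Apriori.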
Abstract:
Two areas of particular importance in prostate cancer progression are primary tumour development and metastasis. These processes involve a number of physiological events, the mediators of which are still being discovered and characterised. Serine proteases have been shown to play a major role in cancer invasion and metastasis. The recently discovered phenomenon of their activation of a receptor family known as the protease activated receptors (PARs) has extended their physiological role to that of signaling molecule. Several serine proteases are expressed by malignant prostate cancer cells, including members of the kallikrein-related peptidase (KLK) serine protease family, and increasingly these are being shown to be associated with prostate cancer progression. KLK4 is highly expressed in the prostate and expression levels increase during prostate cancer progression. Critically, recent studies have implicated KLK4 in processes associated with cancer. For example, the ectopic over-expression of KLK4 in prostate cancer cell lines results in an increased ability of these cells to form colonies, proliferate and migrate. In addition, it has been demonstrated that KLK4 is a potential mediator of cellular interactions between prostate cancer cells and osteoblasts (bone forming cells). The ability of KLK4 to influence cellular behaviour is believed to be through the selective cleavage of specific substrates. Identification of relevant in vivo substrates of KLK4 is critical to understanding the pathophysiological roles of this enzyme. Significantly, recent reports have demonstrated that several members of the KLK family are able to activate PARs. The PARs are relatively new members of the seven-transmembrane-domain-containing G protein-coupled receptor (GPCR) family. PARs are activated through proteolytic cleavage of their N-terminus by serine proteases; the resulting nascent N-terminus binds intramolecularly to initiate receptor activation.
PARs are involved in a number of patho-physiological processes, including vascular repair and inflammation, and a growing body of evidence suggests roles in cancer. While expression of PAR family members has been documented in several types of cancers, including prostate, the role of these GPCRs in prostate cancer development and progression is yet to be examined. Interestingly, several studies have suggested potential roles in cellular invasion through the induction of cytoskeletal reorganisation and expression of basement membrane-degrading enzymes. Accordingly, this program of research focussed on the activation of the PARs by the prostate cancer associated enzyme KLK4, cellular processing of activated PARs and the expression pattern of receptor and agonist in prostate cancer. For these studies KLK4 was purified from the conditioned media of stably transfected Sf9 insect cells expressing a construct containing the complete human KLK4 coding sequence in frame with a V5 epitope and poly-histidine encoding sequences. The first aspect of this study was the further characterisation of this recombinant zymogen form of KLK4. The recombinant KLK4 zymogen was demonstrated to be activatable by the metalloendopeptidase thermolysin and amino terminal sequencing indicated that thermolysin activated KLK4 had the predicted N-terminus of mature active KLK4 (31IINED). Critically, removal of the pro-region successfully generated a catalytically active enzyme, with comparable activity to a previously published recombinant KLK4 produced from S2 insect cells. The second aspect of this study was the activation of the PARs by KLK4 and the initiation of signal transduction. This study demonstrated that KLK4 can activate PAR-1 and PAR-2 to mobilise intracellular Ca2+, but failed to activate PAR-4. Further, KLK4 activated PAR-1 and PAR-2 over distinct concentration ranges, with KLK4 activation and mobilisation of Ca2+ demonstrating higher efficacy through PAR-2. 
Thus, the remainder of this study focussed on PAR-2. KLK4 was demonstrated to directly cleave a synthetic peptide that mimicked the PAR-2 N-terminal activation sequence. Further, KLK4 mediated Ca2+ mobilisation through PAR-2 was accompanied by the initiation of the extracellular signal-regulated kinase (ERK) cascade. The specificity of intracellular signaling mediated through PAR-2 by KLK4 activation was demonstrated by siRNA mediated protein depletion, with a reduction in PAR-2 protein levels correlating with a reduction in KLK4 mediated Ca2+ mobilisation and ERK phosphorylation. The third aspect of this study examined cellular processing of KLK4 activated PAR-2 in a prostate cancer cell line. PAR-2 was demonstrated to be expressed by five prostate derived cell lines, including the prostate cancer cell line PC-3. It was also demonstrated by flow cytometry and confocal microscopy analyses that activation of PC-3 cell surface PAR-2 by KLK4 leads to internalisation of this receptor in a time dependent manner. Critically, in vivo relevance of the interaction between KLK4 and PAR-2 was established by the observation of the co-expression of receptor and agonist in primary prostate cancer and prostate cancer bone lesion samples by immunohistochemical analysis. Based on the results of this study, a number of future studies have been proposed, including delineating differences in KLK4 cellular signaling via PAR-1 and PAR-2, and examining the role of PAR-1 and PAR-2 activation by KLK4 in prostate cancer cells and bone cells in prostate cancer progression.
Abstract:
Traffic safety is a major concern worldwide. It is in both the sociological and economic interests of society that attempts be made to identify the major and multiple contributory factors to road crashes. This paper presents a text mining based method to better understand the contextual relationships inherent in road crashes. By examining and analyzing crash report data from Queensland for 2004 and 2005, this paper identifies and reports the major and multiple contributory factors to those crashes. The outcome of this study will support road asset management in reducing road crashes.