930 results for selection methods


Relevance:

20.00%

Publisher:

Abstract:

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining.

Relevance:

20.00%

Publisher:

Abstract:

This paper proposes a new research method, Participatory Action Design Research (PADR), for studies in the Urban Informatics domain. PADR supports Urban Informatics research in developing new technological means (e.g. using mobile and ubiquitous computing) to resolve contemporary issues or support everyday life in urban environments. The paper discusses the nature, aims and inherent methodological needs of Urban Informatics research, and proposes PADR as a method to address these needs. Situated in a socio-technical context, Urban Informatics requires a close dialogue between social and design-oriented fields of research as well as their methods. PADR combines Action Research and Design Science Research, both of which are used in Information Systems, another field with a strong socio-technical emphasis, and further adapts them to the cross-disciplinary needs and research context of Urban Informatics.

Relevance:

20.00%

Publisher:

Abstract:

Bioinformatics involves analyses of biological data such as DNA sequences, microarrays and protein-protein interaction (PPI) networks. Its two main objectives are the identification of genes or proteins and the prediction of their functions. Biological data often contain uncertain and imprecise information. Fuzzy theory provides useful tools to deal with this type of information, hence has played an important role in analyses of biological data. In this thesis, we aim to develop some new fuzzy techniques and apply them on DNA microarrays and PPI networks. We will focus on three problems: (1) clustering of microarrays; (2) identification of disease-associated genes in microarrays; and (3) identification of protein complexes in PPI networks. The first part of the thesis aims to detect, by the fuzzy C-means (FCM) method, clustering structures in DNA microarrays corrupted by noise. Because of the presence of noise, some clustering structures found in random data may not have any biological significance. In this part, we propose to combine the FCM with the empirical mode decomposition (EMD) for clustering microarray data. The purpose of EMD is to reduce, preferably to remove, the effect of noise, resulting in what is known as denoised data. We call this method the fuzzy C-means method with empirical mode decomposition (FCM-EMD). We applied this method on yeast and serum microarrays, and the silhouette values are used for assessment of the quality of clustering. The results indicate that the clustering structures of denoised data are more reasonable, implying that genes have tighter association with their clusters. Furthermore we found that the estimation of the fuzzy parameter m, which is a difficult step, can be avoided to some extent by analysing denoised microarray data. The second part aims to identify disease-associated genes from DNA microarray data which are generated under different conditions, e.g., patients and normal people. 
We developed a type-2 fuzzy membership (FM) function for identification of disease-associated genes. This approach is applied to diabetes and lung cancer data, and a comparison with the original FM test was carried out. Among the ten best-ranked genes of diabetes identified by the type-2 FM test, seven genes have been confirmed as diabetes-associated genes according to gene description information in Gene Bank and the published literature. An additional gene is further identified. Among the ten best-ranked genes identified in lung cancer data, seven are confirmed to be associated with lung cancer or its treatment. The type-2 FM-d values are significantly different, which makes the identifications more convincing than those of the original FM test. The third part of the thesis aims to identify protein complexes in large interaction networks. Identification of protein complexes is crucial to understand the principles of cellular organisation and to predict protein functions. In this part, we proposed a novel method which combines the fuzzy clustering method and interaction probability to identify the overlapping and non-overlapping community structures in PPI networks, then to detect protein complexes in these sub-networks. Our method is based on both the fuzzy relation model and the graph model. We applied the method to several PPI networks and compared it with a popular protein complex identification method, the clique percolation method. For the same data, we detected more protein complexes. We also applied our method to two social networks. The results showed that our method works well for detecting sub-networks and gives a reasonable understanding of these communities.
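
For readers unfamiliar with the fuzzy C-means (FCM) method named in this abstract, the following is a minimal sketch of the standard FCM iteration in plain Python. It is not the thesis's FCM-EMD method: the EMD denoising step is omitted, and the function name `fuzzy_c_means`, its parameters and its defaults are assumptions chosen for illustration. The parameter `m` is the fuzzy parameter mentioned in the abstract.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means (illustrative sketch, hypothetical helper).

    points: list of equal-length numeric lists.
    Returns (centers, U) where U[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            weights = [U[i][k] ** m for i in range(len(points))]
            wsum = sum(weights)
            centers.append([
                sum(w * p[j] for w, p in zip(weights, points)) / wsum
                for j in range(dim)
            ])
        # Membership update: distances raised to -2/(m-1), normalised per point.
        shift = 0.0
        for i, p in enumerate(points):
            inv = [max(math.dist(p, ck), 1e-12) ** (-2.0 / (m - 1.0))
                   for ck in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, U
```

Each row of `U` sums to 1, so a gene can belong partly to several clusters; taking the largest membership per row gives a hard assignment that could then be scored with silhouette values, as the abstract describes.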

Relevance:

20.00%

Publisher:

Abstract:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in adaptive environments. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.

Relevance:

20.00%

Publisher:

Abstract:

With the growing number of XML documents on the Web it becomes essential to effectively organise these XML documents in order to retrieve useful information from them. A possible solution is to apply clustering on the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these types of semi-structured documents due to their heterogeneity and structural irregularity. Most of the existing research on clustering techniques focuses only on one feature of the XML documents, this being either their structure or their content due to scalability and complexity problems. The knowledge gained in the form of clusters based on the structure or the content is not suitable for real-life datasets. It therefore becomes essential to include both the structure and content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both these kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods to utilise frequent pattern mining techniques to reduce the dimension; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information.
The explicit model uses a higher order model, namely a 3-order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose large-sized tensor models to utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in the information retrieval on the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform the related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures for constraining the content shows an improvement in accuracy over content-only and structure-only clustering results. The scalability evaluation experiments conducted on large-scale datasets clearly show the strengths of the proposed methods over state-of-the-art methods. In particular, this thesis work contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. In addition, it also contributes by addressing the research gaps in frequent pattern mining to generate efficient and concise frequent subtrees with various node relationships that could be used in clustering.

Relevance:

20.00%

Publisher:

Abstract:

Background: Whether suicide in China has significant seasonal variations is unclear. The aim of this study is to examine the seasonality of suicide in Shandong, China, and to assess the associations of suicide seasonality with gender, residence, age and methods of suicide. Methods: Three types of tests (Chi-square, Edwards' T and Roger's Log method) were used to detect the seasonality of the suicide data extracted from the official mortality data of the Shandong Disease Surveillance Point (DSP) system. Peak/low ratios (PLRs) and 95% confidence intervals (CIs) were calculated to indicate the magnitude of seasonality. Results: A statistically significant seasonality with a single peak in suicide rates in spring and early summer, and a dip in winter, was observed, which remained relatively consistent over the years. Regardless of gender, suicide seasonality was more pronounced in rural areas, younger age groups and for non-violent methods, in particular self-poisoning by pesticide. Conclusions: There are statistically significant seasonal variations of completed suicide for both men and women in Shandong, China. Differences exist between residence (urban/rural), age groups and suicide methods. Results appear to support a sociological explanation of suicide seasonality.
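
To make the chi-square test and the peak/low ratio mentioned in this abstract concrete, here is a minimal sketch on twelve monthly counts. The abstract does not give the exact definitions the authors used, so this assumes a goodness-of-fit test against month-length-proportional expected counts and a PLR taken as the highest daily rate divided by the lowest. The names `seasonality_summary` and `DAYS_IN_MONTH` are hypothetical, and confidence intervals are not computed.

```python
# Average month lengths, with February as 28.25 days to allow for leap years.
DAYS_IN_MONTH = [31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def seasonality_summary(monthly_counts):
    """Return (chi_square, peak_low_ratio) for 12 monthly event counts.

    Illustrative only: the definitions are assumptions, not the study's own.
    """
    if len(monthly_counts) != 12:
        raise ValueError("expected 12 monthly counts")
    total = sum(monthly_counts)
    total_days = sum(DAYS_IN_MONTH)
    # Expected counts under no seasonality, proportional to month length.
    expected = [total * d / total_days for d in DAYS_IN_MONTH]
    chi_square = sum((o - e) ** 2 / e
                     for o, e in zip(monthly_counts, expected))
    # Daily rates remove the effect of unequal month lengths.
    rates = [o / d for o, d in zip(monthly_counts, DAYS_IN_MONTH)]
    low = min(rates)
    peak_low_ratio = max(rates) / low if low > 0 else float("inf")
    return chi_square, peak_low_ratio
```

With twelve categories the statistic has 11 degrees of freedom, so under these assumptions a value above about 19.68 would indicate seasonality at the 5% level.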

Relevance:

20.00%

Publisher:

Abstract:

This study aimed to examine the effects on driving, usability and subjective workload of performing music selection tasks using a touch screen interface. It also explored whether the provision of visual and/or auditory feedback offers any performance and usability benefits. Thirty participants performed music selection tasks with a touch screen interface while driving. The interface provided four forms of feedback: no feedback, auditory feedback, visual feedback, and a combination of auditory and visual feedback. Performance on the music selection tasks significantly increased subjective workload and degraded performance on a range of driving measures including lane keeping variation and number of lane excursions. The provision of any form of feedback on the touch screen interface did not significantly affect driving performance, usability or subjective workload, but was preferred by users over no feedback. Overall, the results suggest that touch screens may not be a suitable input device for navigating scrollable lists.

Relevance:

20.00%

Publisher:

Abstract:

The widespread development of Decision Support Systems (DSS) in construction indicates that the evaluation of software has become more important than before. However, most research in the construction discipline has not attempted to assess DSS usability. Therefore, little is known about how to properly evaluate a DSS for a specific problem. In this paper, we present a practical framework that can serve as guidance for DSS evaluation. It focuses on how to evaluate software that is specifically designed for the consultant selection problem. The framework features two main components, i.e. Sub-system Validation and Face Validation. Two case studies of consultant selection at the Malaysian Department of Irrigation and Drainage were integrated in this framework. Several interdisciplinary areas, such as Software Engineering, Human Computer Interaction (HCI) and Construction Project Management, underpin the discussion of the paper. It is anticipated that this work can foster better DSS development and quality decision making that accurately meet the client’s expectations and needs.

Relevance:

20.00%

Publisher:

Abstract:

Selecting an appropriate design-builder is critical to the success of design-build (DB) projects. The objective of this study is to identify selection criteria for design-builders and compare their relative importance by means of a robust content analysis of 94 Requests For Proposals (RFPs) for public DB projects. These DB projects had an aggregate contract value of over US$3.5 billion and were advertised between 2000 and 2010. This study summarized twenty-six selection criteria and classified them into ten categories, in descending order of relative importance: price, experience, technical approach, management approach, qualification, schedule, past performance, financial capability, responsiveness to the RFP, and legal status. The results showed that even though price remains the most important selection category, its relative importance has declined significantly over the last decade. The categories of qualification, experience and past performance, by contrast, have become more important to DB owners for selecting design-builders. Finally, it is found that the importance weighting of price in large projects is significantly higher than that in small projects. This study provides a useful reference for owners in selecting their preferred design-builders.

Relevance:

20.00%

Publisher:

Abstract:

The design-build (DB) system has been demonstrated as an effective delivery method and has gained popularity worldwide. However, a number of operational variations of the DB system have emerged over the last decade to cater for different clients’ requirements. After a client decides to procure a project through the DB system, the client still has to choose an appropriate configuration to deliver the project optimally. However, there is little research on the selection of DB operational variations. One of the main reasons for this is the lack of evaluation criteria for determining the appropriateness of each operational variation. To obtain such criteria, a three-round Delphi survey has been conducted with 20 construction experts in the People’s Republic of China (PRC). Seven top selection criteria were identified. These are: (1) availability of competent design-builders; (2) client’s capabilities; (3) project complexity; (4) client’s control of project; (5) early commencement & short duration; (6) reduced responsibility or involvement; and (7) clearly defined end user’s requirements. These selection criteria were found to have a statistically significant agreement. These findings may furnish various stakeholders, DB clients in particular, with better insight to understand and compare the different operational variations of the DB system.
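
The abstract does not name the statistic behind the reported agreement among the Delphi experts. One common choice in Delphi studies is Kendall's coefficient of concordance (W), and the sketch below assumes that measure purely for illustration. The function name `kendalls_w` is hypothetical, and the formula used is the untied version, so each expert must give a strict ranking of the criteria.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters who each rank n items from 1 to n, without ties.

    rankings: list of m lists; rankings[r][i] is rater r's rank for item i.
    Returns a value between 0 (no agreement) and 1 (perfect agreement).
    """
    m = len(rankings)
    n = len(rankings[0])
    if n < 2 or any(sorted(r) != list(range(1, n + 1)) for r in rankings):
        raise ValueError("each rater must rank all items 1..n without ties")
    # Sum of the ranks each item received across all raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # Squared deviations of the rank sums from their mean.
    s = sum((x - mean_rank_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

For a survey like the one described, `rankings` would hold 20 lists (one per expert) of 7 ranks (one per criterion); testing the significance of W would need an additional chi-square step that is not shown here.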

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we describe the main processes and operations in mining industries and present a comprehensive survey of operations research methodologies that have been applied over the last several decades. The literature review is classified into four main categories: mine design; mine production; mine transportation; and mine evaluation. Mine design models are further separated according to two main mining methods: open-pit and underground. Moreover, mine production models are subcategorised into two groups: ore mining and coal mining. Mine transportation models are further partitioned in accordance with fleet management, truck haulage and train scheduling. Mine evaluation models are further subdivided into four clusters in terms of mining method selection, quality control, financial risks and environmental protection. The main characteristics of four Australian commercial mining software packages are addressed and compared. This paper bridges the gaps in the literature and motivates researchers to develop more applicable, realistic and comprehensive operations research models and solution techniques that are directly linked with mining industries.

Relevance:

20.00%

Publisher:

Abstract:

Research Interests: Are parents complying with the legislation? Is this the same for urban, regional and rural parents? Indigenous parents? What difficulties do parents experience in complying? Do parents understand why the legislation was put in place? Have there been negative consequences for other organisations or sectors of the community?

Relevance:

20.00%

Publisher:

Abstract:

Recent studies have started to explore context-awareness as a driver in the design of adaptable business processes. The emerging challenge of identifying and considering contextual drivers in the environment of a business process is well understood; however, typical methods used in business process modeling do not yet consider this additional contextual information in their process designs. In this chapter, we describe our research towards innovative and advanced process modeling methods that include mechanisms to incorporate relevant contextual drivers and their impacts on business processes in process design models. We report on our ongoing work with an Australian insurance provider and describe the design science we employed to develop these innovative and useful artifacts as part of a context-aware method framework. We discuss the utility of these artifacts in an application in the claims handling process at the case organization.

Relevance:

20.00%

Publisher:

Abstract:

Feature extraction and selection are critical processes in developing facial expression recognition (FER) systems. While many algorithms have been proposed for these processes, direct comparison between texture, geometry and their fusion, as well as between multiple selection algorithms has not been found for spontaneous FER. This paper addresses this issue by proposing a unified framework for a comparative study on the widely used texture (LBP, Gabor and SIFT) and geometric (FAP) features, using Adaboost, mRMR and SVM feature selection algorithms. Our experiments on the Feedtum and NVIE databases demonstrate the benefits of fusing geometric and texture features, where SIFT+FAP shows the best performance, while mRMR outperforms Adaboost and SVM. In terms of computational time, LBP and Gabor perform better than SIFT. The optimal combination of SIFT+FAP+mRMR also exhibits a state-of-the-art performance.

Relevance:

20.00%

Publisher:

Abstract:

This paper reviews the current state in the application of infrared methods, particularly mid-infrared (mid-IR) and near infrared (NIR), for the evaluation of the structural and functional integrity of articular cartilage. It is noted that while a considerable amount of research has been conducted with respect to tissue characterization using mid-IR, it is almost certain that full-thickness cartilage assessment is not feasible with this method. In contrast, the greater penetration capacity of NIR suggests that it is a suitable candidate for full-thickness cartilage evaluation. Nevertheless, significant research is still required to improve the specificity and clinical applicability of the method if it is to be used for distinguishing between functional and dysfunctional cartilage.