800 results for information bottleneck method
Abstract:
Grouping users in social networks is an important process that improves matching and recommendation activities. Data mining clustering methods can be used to group users in social networks. However, existing general-purpose clustering algorithms perform poorly on social network data because of the special nature of users' data. One main reason is the constraints that need to be considered when grouping users; another is the need to capture a large amount of information about users, which imposes computational complexity on an algorithm. In this paper, we propose a scalable and effective constraint-based clustering algorithm built on a global similarity measure that takes into consideration users' constraints and their importance in social networks. Each constraint's importance is calculated from the occurrence of that constraint in the dataset. The performance of the algorithm is demonstrated on a dataset obtained from an online dating website using internal and external evaluation measures. Results show that the proposed algorithm increases the accuracy of matching users in social networks by 10% in comparison with other algorithms.
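The abstract above does not give the algorithm's formulas, so the following is only a minimal sketch of the idea it describes, under assumed data structures: each constraint attribute is weighted by how often it occurs in the dataset, and those weights feed a global similarity between two user profiles. The names constraint_weights and global_similarity are illustrative, not the paper's API.

```python
from collections import Counter

def constraint_weights(profiles, constraint_keys):
    """Weight each constraint by its relative frequency of occurrence in the dataset."""
    counts = Counter(k for p in profiles for k in constraint_keys if p.get(k) is not None)
    total = sum(counts.values()) or 1
    return {k: counts[k] / total for k in constraint_keys}

def global_similarity(u, v, weights):
    """Weighted agreement over constraint attributes shared by users u and v."""
    score, norm = 0.0, 0.0
    for k, w in weights.items():
        if u.get(k) is not None and v.get(k) is not None:
            score += w * (1.0 if u[k] == v[k] else 0.0)
            norm += w
    return score / norm if norm else 0.0

# Toy usage with hypothetical dating-profile constraints
profiles = [
    {"age_range": "25-30", "smoker": "no", "religion": "none"},
    {"age_range": "25-30", "smoker": "no", "religion": None},
]
w = constraint_weights(profiles, ["age_range", "smoker", "religion"])
print(global_similarity(profiles[0], profiles[1], w))
```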
Abstract:
Knowledge has been widely recognised as a determinant of business performance. Business capabilities require the effective sharing of resources and knowledge. Specifically, knowledge sharing (KS) between different companies and departments can improve manufacturing processes, since intangible knowledge plays an essential role in achieving competitive advantage. This paper presents a mixed-method research study into the impact of KS on the effectiveness of new product development (NPD) in achieving desired business performance (BP). Firstly, an empirical study using moderated regression analysis was conducted to test whether, and to what extent, KS has leveraging power on the relationship between NPD and BP constructs and variables. Secondly, this empirically verified hypothesis was validated through explanatory case studies involving two Taiwanese manufacturing companies, using a qualitative interaction-term pattern-matching technique. The study provides evidence that knowledge sharing and management activities are essential for deriving competitive advantage in the manufacturing industry.
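As a hedged illustration of the moderated regression step (testing whether KS moderates the relationship between NPD and BP), the sketch below fits an OLS model with an interaction term using statsmodels; the variable names and synthetic data are placeholders, not the study's instrument or measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "NPD": rng.normal(size=n),   # new product development effectiveness (placeholder)
    "KS": rng.normal(size=n),    # knowledge sharing (placeholder moderator)
})
# Synthetic outcome with a positive interaction, for illustration only
df["BP"] = 0.5 * df["NPD"] + 0.3 * df["KS"] + 0.4 * df["NPD"] * df["KS"] \
           + rng.normal(scale=0.5, size=n)

# 'NPD * KS' expands to NPD + KS + NPD:KS; a significant NPD:KS coefficient
# is the usual evidence that KS moderates the NPD -> BP relationship.
model = smf.ols("BP ~ NPD * KS", data=df).fit()
print(model.summary().tables[1])
```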
Abstract:
Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages, and it is often considered a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject. This can pose serious difficulties for users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery across language domains. This study focuses specifically on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task that involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To assess the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. With this framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new, simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated to achieve high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments carried out as part of the study for better automatic generation of cross-lingual links. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. Such a framework is important in CLLD evaluation because it helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been used to quantify system performance in the NTCIR-9 Crosslink task, the first information retrieval track of this kind.
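Contribution 1 above (n-gram mutual information for Chinese segmentation) lends itself to a short sketch. The thesis's exact formulation is not reproduced here; this is a generic pointwise mutual information approach under the assumption that a boundary is placed between adjacent characters whose association, estimated from a reference corpus, falls below a threshold.

```python
import math
from collections import Counter

def pmi_boundaries(corpus, text, threshold=0.0):
    """Segment `text` by placing a boundary between adjacent characters whose
    pointwise mutual information, estimated from `corpus`, is below `threshold`."""
    uni = Counter(corpus)
    bi = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    n_uni = sum(uni.values()) or 1
    n_bi = sum(bi.values()) or 1

    def pmi(a, b):
        p_ab = bi[a + b] / n_bi
        p_a, p_b = uni[a] / n_uni, uni[b] / n_uni
        if p_ab == 0 or p_a == 0 or p_b == 0:
            return float("-inf")          # unseen pair: always a boundary
        return math.log(p_ab / (p_a * p_b))

    segments, start = [], 0
    for i in range(len(text) - 1):
        if pmi(text[i], text[i + 1]) < threshold:
            segments.append(text[start:i + 1])
            start = i + 1
    segments.append(text[start:])
    return segments
```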
Abstract:
This paper presents a formal methodology for attack modeling and detection in networks. Our approach has three phases. First, we extend the basic attack tree approach [1] to capture (i) the temporal dependencies between components, and (ii) the expiration of an attack. Second, using the enhanced attack trees (EAT), we build a tree automaton that accepts a sequence of actions from the input stream if there is a traversal of an attack tree from the leaves to the root node. Finally, we show how to construct an enhanced parallel automaton (EPA) that has each tree automaton as a subroutine and can process the input stream by considering multiple trees simultaneously. As a case study, we show how to represent the attacks in IEEE 802.11 and construct an EPA for it.
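The paper's tree automaton and enhanced parallel automaton constructions are not reproduced here. The toy sketch below only illustrates the two extensions the abstract names, temporal ordering between child attacks and expiration of an attack, as a recursive check over timestamped actions; the gate names and window semantics are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EATNode:
    name: str
    gate: str = "LEAF"               # "LEAF", "AND", "OAND" (ordered AND), "OR" (assumed gate set)
    children: List["EATNode"] = field(default_factory=list)
    window: Optional[float] = None   # expiration window in seconds (None = no expiry)

def satisfied(node, events):
    """Return the completion time of `node` given timestamped events
    [(action, t), ...], or None if the (sub)attack is not realised."""
    if node.gate == "LEAF":
        times = [t for a, t in events if a == node.name]
        return min(times) if times else None
    child_times = [satisfied(c, events) for c in node.children]
    if node.gate == "OR":
        hits = [t for t in child_times if t is not None]
        done = min(hits) if hits else None
    else:  # AND / OAND: every child sub-attack must be realised
        if any(t is None for t in child_times):
            return None
        if node.gate == "OAND" and child_times != sorted(child_times):
            return None                      # temporal order violated
        done = max(child_times)
    if done is not None and node.window is not None:
        start = min(t for t in child_times if t is not None)
        if done - start > node.window:
            return None                      # attack expired before completion
    return done
```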
Abstract:
Background: Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information, including Gene Ontology annotations and kernel fusion, have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented as a high-dimensional feature vector obtained by 11 feature extraction methods. Methodology/Principal Findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers feature extraction methods based on amino acid classifications and physicochemical properties. To speed up the system, an automatic search method for the kernel parameter is used. The prediction performance of the method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. The overall prediction accuracy in leave-one-out cross-validation is 75.2% for 6 localizations on the Lei dataset, 72.1% for 9 localizations on the SNL9 dataset, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of the classification model are further confirmed by permutation analysis. Conclusions: Our method is effective and valuable for predicting protein subnuclear localizations. A web server implementing the proposed method is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.
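The two-stage architecture and the 11 feature encodings are not detailed in the abstract, so the snippet below illustrates only the generic building block it mentions: a multiclass RBF support vector machine whose kernel parameter is chosen by an automatic, cross-validated grid search. The synthetic features stand in for the concatenated amino-acid and physicochemical feature vectors.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))        # placeholder feature vectors
y = rng.integers(0, 6, size=300)      # placeholder labels for 6 subnuclear localizations

# Automatic search over the RBF kernel parameter gamma (and C) by cross-validation
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1]}
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```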
Abstract:
BACKGROUND: There is increasing enrolment of international students in the Engineering and Information Technology disciplines, and anecdotal evidence of a need for additional understanding and support for these students and their supervisors due to differences in both academic and social cultures. While there is a growing literature on supervisory styles and guidelines on effective supervision, there is little on discipline-specific, cross-cultural supervision responding to the growing diversity. In this paper, we report findings from a study of Engineering and Information Technology Higher Degree Research (HDR) students and supervision in three Australian universities. PURPOSE: The aim was to assess perceptions of students and supervisors of the factors influencing success that are particular to international or culturally and linguistically diverse (CaLD) HDR students in Engineering and Information Technology. DESIGN/METHOD: Online survey and qualitative data were collected from international and CaLD HDR students and supervisors at the three universities. Bayesian network analysis, inferential statistics, and qualitative analysis provided the main findings. RESULTS: Survey results indicate that both students and supervisors are positive about their experiences and do not see language or culture as particularly problematic. The survey results also reveal strong consistency between the perceptions of students and supervisors on most factors influencing success. Qualitative analysis of critical supervision incidents has provided rich data that could help improve support services. CONCLUSIONS: In contrast with the anecdotal evidence, HDR completion data from the three universities reveal that international students, on average, complete in shorter time periods than domestic students. The analysis suggests that success is linked to a complex set of factors involving the student, supervision, the institution and the broader community.
Abstract:
Recent empirical studies of gender discrimination point to the importance of accurately controlling for accumulated labour market experience. Unfortunately, in Australia most data sets do not include information on actual experience. The current paper, using data from the National Social Science Survey 1984, examines the efficacy of imputing female labour market experience via the Zabalza and Arrufat (1985) method. The results suggest that the method provides a more accurate measure of experience than the traditional Mincer proxy. However, the imputation method is sensitive to the choice of identification restrictions. We suggest a novel alternative to a choice between arbitrary restrictions.
Abstract:
In this paper, we present an unsupervised graph-cut-based object segmentation method using 3D information provided by Structure from Motion (SFM), called GrabCutSFM. Rather than focusing on the segmentation problem using a trained model or human intervention, our approach aims to achieve meaningful segmentation autonomously, with direct application to vision-based robotics. Generally, object (foreground) and background have certain discriminative geometric information in 3D space. By exploring the 3D information from multiple views, the proposed method can segment potential objects correctly and automatically, compared to conventional unsupervised segmentation using only 2D visual cues. Experiments with real video data collected from indoor and outdoor environments verify the proposed approach.
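OpenCV's standard GrabCut can be seeded from the kind of 3D cues the abstract describes. The sketch below is an assumption-laden illustration, not the paper's GrabCutSFM: it presumes you already have 2D projections of sparse SfM points labelled as likely object or background (for example, by height above a fitted ground plane) and uses them to initialise the GrabCut mask instead of a user-drawn rectangle.

```python
import cv2
import numpy as np

def grabcut_from_sfm(image, fg_points, bg_points, iterations=5):
    """Segment `image` with GrabCut, seeding the mask from projected SfM points.

    fg_points / bg_points: iterables of (x, y) pixel coordinates believed to lie on
    the object / background (how to derive them from the 3D reconstruction is assumed).
    """
    mask = np.full(image.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)  # start as "probably background"
    for x, y in fg_points:
        mask[int(y), int(x)] = cv2.GC_FGD
    for x, y in bg_points:
        mask[int(y), int(x)] = cv2.GC_BGD

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, None, bgd_model, fgd_model, iterations, cv2.GC_INIT_WITH_MASK)

    # Pixels marked foreground or probably-foreground form the object segment
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```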
Abstract:
Social media tools are often the result of innovations in Information Technology and are developed by IT professionals and innovators. Nevertheless, IT professionals, many of whom are responsible for designing and building social media technologies, have not been investigated with respect to how they themselves use or experience social media for professional purposes. This study will use Information Grounds Theory (Pettigrew, 1998) as a framework to study IT professionals' experience in using social media for professional purposes. Information grounds facilitate the opportunistic discovery of information within social settings created temporarily at a place where people gather for a specific purpose (e.g., doctors' waiting rooms, office tea rooms); the social atmosphere stimulates spontaneous sharing of information (Pettigrew, 1999). This study proposes that social media has the qualities that make it a rich information ground: people participate from separate "places" in cyberspace in a synchronous manner in real time, making it almost as dynamic and unplanned as physical information grounds. There is limited research on how social media platforms are perceived as a "place" (a place to go to, a place to gather, or a place to be seen in) comparable to physical spaces. There is also no empirical study of how IT professionals use or "experience" social media. The data for this study are being collected through a study of IT professionals who currently use Twitter. A digital ethnography approach is being taken wherein the researcher uses online observations, "follows" the participants online, and observes their behaviours and interactions on social media. Next, a subset of participants will be interviewed on their experiences with and within social media and on how social media compares with traditional information grounds, information communication, and collaborative environments. An Evolved Grounded Theory (Glaser, 1992) approach will be used to analyse the tweet data and interviews and to map the findings against Information Grounds Theory. Findings from this study will provide a foundational understanding of IT professionals' experiences within social media and can help both professionals and researchers understand this fast-evolving method of communication.
Abstract:
This project explores yarning as a methodology for understanding health and wellness from an indigenous woman's perspective. Previous research exploring indigenous Australian women's perspectives has used traditional Western methodologies, which the women themselves have often felt to be inappropriate and ineffective in gathering information and promoting discussion. This research arose from the indigenous women themselves and resulted in the exploration of yarning as a methodology. Yarning is a conversational process that involves the sharing of stories and the development of knowledge. It prioritizes indigenous ways of communicating, in that it is culturally prescribed, cooperative, and respectful. The authors identify different types of yarning that are relevant throughout their research and explain two types of yarning, family yarning and cross-cultural yarning, which have not previously been identified in the research literature. This project found that yarning as a research method is appropriate for community-based health research with indigenous Australian women. This may be an important finding for health professionals and researchers to consider when working and researching with indigenous women from other countries.
Abstract:
This thesis takes a new data mining approach to analyzing road/crash data by developing models for the whole road network and generating a crash risk profile. Roads with an elevated crash risk due to a road surface friction deficit are identified. A regression tree model predicting road segment crash rate is applied in a novel deployment, coined regression tree extrapolation, that produces a skid resistance/crash rate curve. Using extrapolation allows the method to be applied across the network and to cope with the high proportion of missing road surface friction values. This risk profiling method can be applied in other domains.
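The thesis's exact regression tree extrapolation procedure is not spelled out in the abstract; the sketch below shows only the generic idea under placeholder data: fit a regression tree of segment crash rate on skid resistance and other covariates, then evaluate the fitted model over a grid of skid resistance values, holding the other covariates at typical values, to trace a skid resistance/crash rate curve.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Placeholder network data: one row per road segment
rng = np.random.default_rng(2)
n = 1000
segments = pd.DataFrame({
    "skid_resistance": rng.uniform(0.3, 0.7, n),
    "aadt": rng.lognormal(8, 0.5, n),          # traffic volume covariate (placeholder)
})
segments["crash_rate"] = (
    np.where(segments["skid_resistance"] < 0.45, 2.0, 1.0) * rng.gamma(2, 0.5, n)
)

features = ["skid_resistance", "aadt"]
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=50)
tree.fit(segments[features], segments["crash_rate"])

# Trace a skid resistance / predicted crash rate curve, holding traffic at its median
grid = pd.DataFrame({
    "skid_resistance": np.linspace(0.3, 0.7, 41),
    "aadt": segments["aadt"].median(),
})
curve = tree.predict(grid[features])
print(list(zip(grid["skid_resistance"].round(2), curve.round(2))))
```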
Abstract:
Tags, or personal metadata for annotating web resources, have been widely adopted in Web 2.0 sites. However, as tags are freely chosen by users, the vocabularies are diverse, ambiguous and sometimes meaningful only to individuals. Tag recommenders may assist users during the tagging process; their objective is to suggest relevant tags to use as well as to help consolidate the vocabulary in the system. In this paper we discuss our approach to providing personalized tag recommendation by making use of an existing domain ontology generated from folksonomy. Specifically, we evaluated the approach in a sparse situation. The evaluation shows that the proposed ontology-based method improves the accuracy of tag recommendation in this situation.
Abstract:
Tag recommendation is a specific recommendation task of recommending metadata (tags) for a web resource (item) during the user annotation process. In this context, the sparsity problem refers to the situation where tags need to be produced for items with few annotations or for users who tag few items. Most state-of-the-art approaches to tag recommendation are rarely evaluated in, or perform poorly under, this situation. This paper presents a combined method for mitigating the sparsity problem in tag recommendation, mainly by expanding and ranking candidate tags based on similar items' tags and an existing tag ontology. We evaluated the approach on two public social bookmarking datasets. The experimental results show better recommendation accuracy in the sparsity situation than several state-of-the-art methods.
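The abstract names the two ingredients (similar items' tags and a tag ontology) but not the scoring, so the following is a toy sketch under assumed data structures: candidate tags are gathered from similar items, expanded with related terms from a small ontology mapping, and ranked by a similarity-weighted frequency score.

```python
from collections import Counter

def recommend_tags(item_id, item_tags, item_similarity, ontology, top_k=5, related_weight=0.5):
    """Toy tag recommender for a sparsely annotated item.

    item_tags: {item_id: [tag, ...]} existing annotations
    item_similarity: {(item_a, item_b): score} content-based similarity (assumed given)
    ontology: {tag: [related_tag, ...]} related/broader terms from a tag ontology
    """
    scores = Counter()
    for other, tags in item_tags.items():
        if other == item_id:
            continue
        sim = item_similarity.get((item_id, other), item_similarity.get((other, item_id), 0.0))
        if sim <= 0:
            continue
        for tag in tags:
            scores[tag] += sim                              # candidate from a similar item
            for related in ontology.get(tag, []):
                scores[related] += related_weight * sim     # expansion via the ontology

    # Do not re-recommend tags the item already has
    for tag in item_tags.get(item_id, []):
        scores.pop(tag, None)
    return [tag for tag, _ in scores.most_common(top_k)]
```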
Abstract:
Effective management of chronic diseases is a global health priority. A healthcare information system offers opportunities to address the challenges of chronic disease management. However, the requirements of health information systems are often not well understood, and the accuracy of requirements has a direct impact on the successful design and implementation of a health information system. Our research describes methods used to understand the requirements of health information systems for advanced prostate cancer management. The research conducted a survey to identify heterogeneous sources of clinical records. It showed that the General Practitioner was the most common source of patients' clinical records (41%), followed by the Urologist (14%) and other clinicians (14%). Our research describes a method to identify diverse data sources and proposes a novel patient journey browser prototype that integrates the disparate data sources.
Abstract:
We propose a cluster ensemble method that maps corpus documents into the semantic space embedded in Wikipedia and groups them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and the hubness of the documents. Empirical analysis with various real data sets reveals that the proposed method outperforms state-of-the-art text clustering approaches.
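The hubness-based step is specific to the paper and is not reproduced here; the sketch below shows only a generic cluster-ensemble skeleton consistent with the abstract: cluster the document-term, document-concept and document-category representations separately, accumulate a pairwise co-association matrix, and derive a consensus partition from it.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def ensemble_cluster(feature_matrices, n_clusters, seed=0):
    """feature_matrices: list of (n_docs, d_i) arrays, e.g. document-term,
    document-concept and document-category representations of the same documents."""
    n_docs = feature_matrices[0].shape[0]
    coassoc = np.zeros((n_docs, n_docs))

    # One base clustering per representation; count how often each pair co-clusters
    for X in feature_matrices:
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= len(feature_matrices)

    # Consensus partition from the co-association (similarity) matrix
    final = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                               random_state=seed).fit_predict(coassoc)
    return final
```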