984 results for Web documents


Relevance: 60.00%

Abstract:

For many clustering algorithms, such as K-Means, EM, and CLOPE, the user is usually required to set some parameters. Often, these parameters directly or indirectly control the number of clusters, k, to return. Given different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in a data set. This is especially true for collections such as Web documents, images, or biological data. To improve the effectiveness of clustering, we seek the answer to a fundamental question: how can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectral analysis of the eigenvalues (not eigenvectors) of the data set. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method can suggest a range of k that is well suited to different analysis contexts. Finally, we present further empirical results showing how the answer to this fundamental question enhances the clustering process for large text collections.
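
The abstract does not say how the eigenvalue spectrum is mapped to a value of k. A widely used heuristic from the spectral clustering literature, the eigengap rule, illustrates the general idea. You build an affinity matrix between items and compute the eigenvalues of its normalized Laplacian. You then take k at the largest gap between consecutive sorted eigenvalues. The sketch below implements that generic heuristic and is not necessarily the authors' method. The affinity matrix, the function names and the pure-Python Jacobi eigenvalue solver (used only to avoid external dependencies) are all assumptions.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix (list of lists) via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def normalized_laplacian(w):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative affinity matrix W."""
    n = len(w)
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * w[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def estimate_k(w, max_k=None):
    """Eigengap heuristic: k sits where consecutive eigenvalues are furthest apart."""
    eigenvalues = jacobi_eigenvalues(normalized_laplacian(w))
    max_k = max_k or len(eigenvalues) - 1
    gaps = [eigenvalues[i + 1] - eigenvalues[i] for i in range(max_k)]
    return gaps.index(max(gaps)) + 1
```

Consider an affinity matrix made of three disconnected cliques. Its normalized Laplacian has exactly three zero eigenvalues, so `estimate_k` returns 3. With real document data, the affinity matrix might come from cosine similarities between term vectors. The gaps are then rarely this clean, which is one reason to consider a range of candidate k values rather than a single one.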

Relevance: 60.00%

Abstract:

For many clustering algorithms, such as k-means, EM, and CLOPE, the user usually has to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. Given different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in a data set. This is especially true for collections such as Web documents, images or biological data. The fundamental question this paper addresses is: “How can we effectively estimate the natural number of clusters in a given text collection?” We propose to answer it with spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.

Relevance: 60.00%

Abstract:

Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity: users do not have to learn a complex query language and can issue queries without any prior knowledge of the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality and efficient searching of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers. Each answer is a substructure of the graph that contains all query keywords and illustrates the relationships between those keywords in the graph. We applied keyword proximity search to the Web and to the page graph of Web documents to find top-k answers that satisfy users' information needs and increase user satisfaction. Another effective ranking mechanism applied to data graphs is authority-flow-based ranking. Given a top-k keyword search query on a graph, an authority-flow-based search finds the top-k answers. Each answer is a node in the graph, ranked according to its relevance and importance to the query. We developed techniques that improved authority-flow-based search on data graphs by creating a framework to explain and reformulate such searches, taking into account user preferences and feedback.
We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
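
The abstract names authority-flow ranking without giving its formula. One common formulation, in the spirit of ObjectRank-style systems, is a personalized PageRank. In it, the random surfer jumps only to nodes that contain the query keywords (the base set), so authority flows from those nodes along the graph edges. The sketch below is one illustration under stated assumptions. The adjacency-list graph encoding, the damping factor of 0.85 and the function names are all assumptions. It does not model per-edge-type transfer weights or the explanation and reformulation framework the dissertation describes.

```python
def authority_flow_scores(graph, base_set, damping=0.85, max_iter=200, tol=1e-12):
    """Personalized-PageRank-style authority flow; random jumps land only on base_set.

    graph maps each node to the list of nodes it links to; base_set must be a
    non-empty set of nodes that appear in the graph.
    """
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    jump = {v: (1.0 / len(base_set) if v in base_set else 0.0) for v in nodes}
    score = dict(jump)
    for _ in range(max_iter):
        new = {v: (1.0 - damping) * jump[v] for v in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                # Authority flows equally along each outgoing edge.
                share = damping * score[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node: send its authority back to the base set.
                for v in nodes:
                    new[v] += damping * score[u] * jump[v]
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


def top_k(scores, k):
    """The k highest-scoring nodes (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _ in ranked[:k]]
```

Consider three keyword-matching pages that all link to a hub page that lacks the keyword itself. The hub still ends up ranked first, because it collects the authority that flows out of the base set. A node with no incoming edges that is outside the base set receives no authority at all.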

Relevance: 40.00%

Abstract:

In this thesis, we present the problems of business document exchange and propose a method to address them. We propose a methodology for adapting XML-based business standards to Semantic Web technologies by transforming documents defined in DTD or XML Schema into an ontological representation in OWL 2. Next, we propose an approach based on formal concept analysis to group ontology classes that share certain semantics, with the aim of improving the quality, readability and representation of the ontology. Finally, we propose ontology alignment to determine the semantic links between the heterogeneous business ontologies generated by the transformation process, in order to help enterprises communicate successfully.
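
Formal concept analysis groups objects by the attributes they share. Each formal concept pairs a maximal set of objects (its extent) with exactly the attributes those objects all have in common (its intent). The sketch below shows how ontology classes could be grouped this way, by enumerating every concept of a small context through intersections of object intents. Treating ontology classes as objects and their properties as attributes is an assumption, as are the class and property names in the example. The naive enumeration is exponential in the worst case and is not the thesis's method.

```python
def formal_concepts(context):
    """All formal concepts (extent, intent) of a context mapping object -> set of attributes.

    Every concept intent is an intersection of object intents (the full
    attribute set being the intersection of no objects), so we close the
    set of intents under intersection with each object's attributes.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        object_intent = frozenset(attrs)
        intents |= {intent & object_intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first, ties broken by sorted intent.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts
```

Take hypothetical classes `Invoice`, `Order` and `Receipt` described by the properties `hasDate`, `hasAmount` and `hasParty`. The concept with intent {`hasDate`, `hasParty`} groups `Invoice` and `Order`, and such a grouping is a candidate for a shared superclass in the ontology.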

Relevance: 40.00%

Abstract:

Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.

Relevance: 30.00%

Abstract:

With the advent of Service-Oriented Architecture, Web services have gained tremendous popularity. Because a large number of Web services are available, finding an appropriate Web service for a user's requirements is a challenge. This calls for an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods that improve the accuracy of Web service discovery in matching the best service. Web service discovery typically suggests many individual services that each partially fulfil the user's interest. Considering the semantic relationships of the words used to describe services, as well as their input and output parameters, can lead to more accurate Web service discovery. Appropriately linking individual matched services should fully satisfy the user's requirements. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology is proposed. The first phase performs matchmaking to find Web services that are semantically similar to a user query. To perform semantic analysis on the content of the Web Services Description Language (WSDL) document, a support-based latent semantic kernel is constructed. It uses an innovative concept of binning and merging over a large quantity of text documents covering diverse knowledge domains. A generic latent semantic kernel constructed from a large number of terms helps to find hidden meanings of the query terms that could not otherwise be found. Sometimes a single Web service cannot fully satisfy the user's requirement. In such cases, a composition of multiple inter-related Web services is presented to the user. The second phase checks whether multiple Web services can be linked.
Once the feasibility of linking Web services has been checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, Web services are modelled as nodes of a graph, and an all-pairs shortest-path algorithm is applied to find the optimum path at the minimum traversal cost. The third phase, system integration, combines the results of the two preceding phases using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, an integral part of the system integration phase, makes the final recommendations to the user, including both individual and composite Web services. Extensive experiments were performed to evaluate the proposed method. Results of the proposed support-based semantic kernel method of Web service discovery are compared with those of a standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both the information-retrieval and the machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained when the 10 to 15 Web services found in phase I are considered for linking. Empirical results also confirm that the fusion engine boosts the accuracy of Web service discovery by systematically combining the inputs from the semantic analysis (phase I) and the link analysis (phase II). Overall, the proposed method significantly improves the accuracy of Web service discovery over traditional discovery methods.
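
The link analysis phase applies an all-pairs shortest-path algorithm to a graph of Web services. The abstract does not name the algorithm, so the sketch below uses Floyd-Warshall, the textbook choice for dense all-pairs problems. Edge costs stand in for whatever linking cost the method assigns, for example how poorly one service's outputs match the next service's inputs. That interpretation is an assumption, and the service names, costs and function names are hypothetical.

```python
import math


def floyd_warshall(nodes, edges):
    """All-pairs shortest paths; edges maps (u, v) -> non-negative linking cost."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        dist[u][v] = cost
        nxt[u][v] = v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                # Relax i -> j by routing through intermediate service k.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def reconstruct_path(nxt, u, v):
    """Services on the cheapest u -> v composition ([] if v is unreachable)."""
    if u == v:
        return [u]
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path
```

Consider a hypothetical graph where `Geocode` links directly to `Weather` at cost 2.0, or indirectly through `Translate` at cost 1.0 + 0.5. Here the cheapest `Geocode` to `Report` composition goes through `Translate` and `Weather`, with a total cost of 2.5. Floyd-Warshall runs in O(n³) time, which is modest for the 10 to 15 candidate services the abstract reports as the best range for linking.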

Relevance: 30.00%

Abstract:

We argue that Web service discovery technology should help users navigate a complex problem space by suggesting services that they may not be able to formulate themselves, because they lack the epistemic resources to do so. Free-text documents in service environments provide an untapped source of information for augmenting the user's epistemic state and hence their ability to search effectively for services. A quantitative approach to semantic knowledge representation is adopted in the form of semantic space models computed from these free-text documents. Knowledge of the user's agenda is promoted by associational inferences computed from the semantic space. The inferences are suggestive and aim to promote human abductive reasoning. They guide the user from fuzzy search goals to a better understanding of the problem space surrounding the given agenda. Experimental results are discussed based on a complex and realistic planning activity.
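
The abstract does not specify which semantic space model is used. As a rough illustration, the sketch below builds a simple symmetric co-occurrence space from free-text service documents, with distance-weighted counts in the spirit of HAL-style models. It then treats the terms whose co-occurrence profiles are most similar (by cosine similarity) as associational suggestions for a query term. The window size, weighting scheme, example sentences and function names are assumptions.

```python
import math
from collections import defaultdict


def build_semantic_space(docs, window=2):
    """Symmetric co-occurrence vectors; nearer neighbours within the window weigh more."""
    vectors = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += window + 1 - abs(i - j)
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associations(vectors, term, top_n=3):
    """Terms with the closest co-occurrence profile to `term` (ties broken alphabetically)."""
    target = vectors.get(term)
    if not target:
        return []
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]
```

Take the documents "book flight ticket", "book hotel room", "reserve hotel room" and "reserve flight ticket", with `window=1`. Here `associations(space, "book", top_n=1)` returns `["reserve"]`. The two verbs never co-occur, but they appear in the same contexts, and this is the kind of indirect association meant to prompt the user.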

Relevance: 30.00%

Abstract:

Given the size and state of the Internet today, a high-quality approach to organizing this mass of information is of great importance. Clustering Web pages into groups of similar documents is one approach, but it relies heavily on good feature extraction and document representation, as well as on a good clustering approach and algorithm. Because the changing nature of the Internet produces a dynamic dataset, an incremental approach is preferred. In this work we propose an enhanced incremental clustering algorithm that organizes the information available on the Internet incrementally. Experiments show that the enhanced algorithm outperforms the original histogram-based algorithm by up to 7.5%.
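
The baseline that the abstract improves on is a histogram-based incremental algorithm. Assume it follows the similarity-histogram idea. Then each cluster keeps the distribution of pairwise document similarities, and a new document joins a cluster only if that distribution stays skewed toward high similarity. The sketch below summarises the histogram by a single ratio, the share of similarities at or above a threshold, and uses Jaccard similarity over word sets. The ratio, the parameters (`threshold`, `hr_min`, `epsilon`) and the function names are illustrative assumptions. The sketch does not reproduce the enhanced algorithm or its reported gain.

```python
def histogram_ratio(similarities, threshold):
    """Fraction of pairwise similarities at or above the threshold (1.0 for an empty list)."""
    if not similarities:
        return 1.0
    return sum(1 for s in similarities if s >= threshold) / len(similarities)


class Cluster:
    def __init__(self, doc):
        self.docs = [doc]
        self.similarities = []  # all pairwise similarities inside the cluster


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def incremental_cluster(docs, similarity=jaccard, threshold=0.5, hr_min=0.6, epsilon=0.05):
    """Assign each arriving document to the first cluster whose cohesion it preserves."""
    clusters = []
    for doc in docs:
        for cluster in clusters:
            new_sims = [similarity(doc, other) for other in cluster.docs]
            old_hr = histogram_ratio(cluster.similarities, threshold)
            new_hr = histogram_ratio(cluster.similarities + new_sims, threshold)
            # Accept if cohesion improves, or drops only slightly while staying above hr_min.
            if new_hr >= old_hr or (new_hr > hr_min and old_hr - new_hr < epsilon):
                cluster.docs.append(doc)
                cluster.similarities.extend(new_sims)
                break
        else:
            clusters.append(Cluster(doc))  # no existing cluster accepted the document
    return clusters
```

Documents are processed in arrival order, which is what makes the approach suitable for a changing collection. Each new page is compared only against existing clusters, and the existing clusters are not rebuilt.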

Relevance: 30.00%

Abstract:

The inquiry documented in this thesis is located at the nexus of technological innovation and traditional schooling. As we enter the second decade of a new century, few would argue against the increasingly urgent need to integrate digital literacies with traditional academic knowledge. Yet, despite substantial investments from governments and businesses, the adoption and diffusion of contemporary digital tools in formal schooling remain sluggish. To date, research on technology adoption in schools tends to take a deficit perspective of schools and teachers, with the lack of resources and teacher ‘technophobia’ most commonly cited as barriers to digital uptake. Corresponding interventions that focus on increasing funding and upskilling teachers, however, have made little difference to adoption trends in the last decade. Empirical evidence that explicates the cultural and pedagogical complexities of innovation diffusion within long-established conventions of mainstream schooling, particularly from the standpoint of students, is wanting. To address this knowledge gap, this thesis inquires into how students evaluate and account for the constraints and affordances of contemporary digital tools when they engage with them as part of their conventional schooling. It documents the attempted integration of a student-led Web 2.0 learning initiative, known as the Student Media Centre (SMC), into the schooling practices of a long-established, high-performing independent senior boys’ school in urban Australia. The study employed an ‘explanatory’ two-phase research design (Creswell, 2003) that combined complementary quantitative and qualitative methods to achieve both breadth of measurement and richness of characterisation. In the initial quantitative phase, a self-reported questionnaire was administered to the senior school student population to determine adoption trends and predictors of SMC usage (N=481). 
Measurement constructs included individual learning dispositions (learning and performance goals, cognitive playfulness and personal innovativeness), as well as social and technological variables (peer support, perceived usefulness and ease of use). Incremental predictive models of SMC usage were built using Classification and Regression Tree (CART) modelling with (i) individual-level predictors, (ii) individual and social predictors, and (iii) individual, social and technological predictors. Peer support emerged as the best predictor of SMC usage. Other salient predictors included perceived ease of use and usefulness, cognitive playfulness and learning goals. On the whole, an overwhelming proportion of students reported low usage levels, low perceived usefulness and a lack of peer support for engaging with the digital learning initiative. The small minority of frequent users reported high levels of peer support and robust learning goal orientations, rather than being predominantly driven by performance goals. These findings indicate that tensions around social validation, digital learning and academic performance pressures influence students’ engagement with the Web 2.0 learning initiative. The qualitative phase that followed provided insights into these tensions. It shifted the analytical focus from individual attitudes and behaviours to the shared social and cultural reasoning practices that explain students’ engagement with the innovation. Six in-depth focus groups, comprising 60 students with different levels of SMC usage, were conducted, audio-recorded and transcribed. Textual data were analysed using Membership Categorisation Analysis. Students’ accounts converged around a key proposition: the Web 2.0 learning initiative was useful-in-principle but useless-in-practice.
While students endorsed the usefulness of the SMC for enhancing multimodal engagement, extending peer-to-peer networks and acquiring real-world skills, they also called attention to a number of constraints that hindered the realisation of these design affordances in practice. These constraints were cast in terms of three binary formulations of social and cultural imperatives at play within the school: (i) ‘cool/uncool’, (ii) ‘dominant staff/compliant student’, and (iii) ‘digital learning/academic performance’. The first formulation foregrounds the social stigma of the SMC among peers and its resultant lack of positive network benefits. The second relates to students’ perception of the school culture as authoritarian and punitive, with adverse effects on the very student agency required to drive the innovation. The third points to academic performance pressures in a crowded curriculum with tight timelines. Taken together, findings from both phases of the study provide the following key insights. First, students endorsed the learning affordances of contemporary digital tools such as the SMC for enhancing their current schooling practices. For the majority of students, however, these learning affordances were overshadowed by the performative demands of schooling, both social and academic. The student participants saw in-school engagement with the SMC as distinct from, even oppositional to, the conventional social and academic performance indicators of schooling. These were (i) being ‘cool’ (or at least ‘not uncool’), (ii) being sufficiently ‘compliant’, and (iii) achieving good academic grades. Their reasoned response, therefore, was simply to resist engagement with the digital learning innovation. Second, a small minority of students seemed dispositionally inclined to negotiate the learning affordances and performance constraints of digital learning and traditional schooling more effectively than others.
These students were able to engage more frequently and meaningfully with the SMC in school. Their ability to adapt to and traverse seemingly incommensurate social and institutional identities and norms is theorised as cultural agility, a dispositional construct that comprises personal innovativeness, cognitive playfulness and learning goals orientation. For these individuals, who have the capacity to accommodate both learning and performance in school, the logic is then ‘both/and’ rather than ‘either/or’. This holds whether in terms of digital engagement and academic excellence, or successful brokerage across multiple social identities and institutional affiliations within the school. In sum, this study takes us beyond the familiar terrain of deficit discourses that tend to blame institutional conservatism, lack of resourcing and teacher resistance for low uptake of digital technologies in schools. It does so by providing an empirical base for the development of a ‘third way’ of theorising technological and pedagogical innovation in schools. This third way is more informed by students as critical stakeholders, and thus more relevant to the lived culture within the school and its complex relationship to students’ lives outside of school. It is in this relationship that we find an explanation for how these individuals can, at the same time, be digital kids and analogue students.

Relevance: 30.00%

Abstract:

Interactive documents for use with the World Wide Web have been developed for viewing multi-dimensional radiographic and visual images of human anatomy derived from the Visible Human Project. Emphasis has been placed on user-controlled features and selections. The purpose was to develop an interface that was independent of the host operating system and browser software and that would allow multiple users to view the information. The interfaces were implemented using HyperText Markup Language (HTML) forms, the C programming language and the Perl scripting language. Images were pre-processed using ANALYZE and stored on a Web server in CompuServe GIF format. Viewing options, such as interactive thresholding and the choice of two-dimensional slice direction, were included in the document design. The interface is an example of what may be achieved using the World Wide Web. Key applications envisaged for such software include education, research, access to information through internal databases, and simultaneous sharing of images across remote computers by health personnel for diagnostic purposes.
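
The viewing options mentioned above reduce to two simple array operations: choosing a two-dimensional slice direction, and interactively thresholding intensities. The original interface used HTML forms with C and Perl on the server. The Python sketch below only illustrates the two operations. It assumes a volume stored as nested lists indexed `volume[z][y][x]` and the usual axial/coronal/sagittal naming, and the function names are hypothetical.

```python
def extract_slice(volume, direction, index):
    """2-D slice from a volume indexed as volume[z][y][x] (an assumed layout)."""
    if direction == "axial":      # fixed z
        return [row[:] for row in volume[index]]
    if direction == "coronal":    # fixed y
        return [plane[index][:] for plane in volume]
    if direction == "sagittal":   # fixed x
        return [[row[index] for row in plane] for plane in volume]
    raise ValueError(f"unknown slice direction: {direction}")


def threshold(slice_2d, low, high):
    """Binary display mask: 255 where low <= value <= high, else 0."""
    return [[255 if low <= v <= high else 0 for v in row] for row in slice_2d]
```

In a Web interface of the kind described, the slice direction, index and threshold bounds would arrive as form fields. The server would compute the mask and encode it as an image for the browser.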

Relevance: 30.00%

Abstract:

An often neglected but well-recognised aspect of successful engineering asset management is achieving co-operation and collaboration between the various occupational, functional and hierarchical levels present within complex technical environments. Engineering and technical contexts have been well documented for the presence of highly cohesive groups based around functional or role orientations. However, while highly cohesive groups are potentially advantageous, they are also often correlated with the emergence of knowledge and information silos based around those same functional or occupational clusters. Improved collaboration and co-operation between groups has been shown to produce a number of positive outcomes at the individual, group and organisational levels. Example outcomes include an increased capacity for problem solving, improved responsiveness and adaptation to organisational crises, higher morale and an increased ability to leverage workforce capability. However, an essential challenge for organisations wishing to overcome informational silos is to implement mechanisms that facilitate, encourage and sustain interactions between otherwise disconnected groups. This paper reviews the ability of Web 2.0 technologies and mobile computing devices to facilitate and encourage knowledge sharing between “silo’d” groups. Commonly available tools such as Facebook, Twitter, blogs, wikis and others are reviewed in relation to their applicability, functionality and ease of use for engineering and technical personnel. The paper also documents three case examples of engineering organisations that have successfully employed Web 2.0 to achieve superior knowledge management. With a number of clear recommendations, the paper is an essential starting point for any organisation looking at the use of new-generation technologies to achieve the significant outcomes associated with knowledge transfer.
