10 results for Open source information retrieval

in Digital Commons at Florida International University


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In recent years, a surprising new phenomenon has emerged in which globally distributed online communities collaborate to create useful and sophisticated computer software. These open source software groups are composed of generally unaffiliated individuals and organizations who work in a seemingly chaotic fashion and who participate on a voluntary basis without direct financial incentive. The purpose of this research is to investigate the relationship between the social network structure of these intriguing groups and their level of output and activity, where social network structure is defined as (1) closure or connectedness within the group, (2) bridging ties which extend outside of the group, and (3) leader centrality within the group. Based on well-tested theories of social capital and centrality in teams, propositions were formulated which suggest that social network structures associated with successful open source software project communities will exhibit high levels of bridging and moderate levels of closure and leader centrality. The research setting was the SourceForge hosting organization, and a study population of 143 project communities was identified. Independent variables included measures of closure and leader centrality defined over conversational ties, along with measures of bridging defined over membership ties. Dependent variables included source code commits and software releases for community output, and software downloads and project site page views for community activity. A cross-sectional study design was used, and archival data were extracted and aggregated for the two-year period following the first release of project software. The resulting compiled variables were analyzed using multiple linear and quadratic regressions, controlling for group size and conversational volume.
Contrary to theory-based expectations, the surprising results showed that successful project groups exhibited low levels of closure and that the levels of bridging and leader centrality were not important factors of success. These findings suggest that the creation and use of open source software may represent a fundamentally new socio-technical development process which disrupts the team paradigm and which triggers the need for building new theories of collaborative development. These new theories could point towards the broader application of open source methods for the creation of knowledge-based products other than software.
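
The abstract does not say how closure and leader centrality were computed. As a rough illustration only, the sketch below assumes closure is measured as network density over conversational ties and leader centrality as the leader's degree centrality; the member names and ties are hypothetical.

```python
def density(members, ties):
    """Share of all possible undirected member pairs that are joined by a tie."""
    n = len(members)
    possible = n * (n - 1) / 2
    return len(ties) / possible if possible else 0.0


def degree_centrality(member, members, ties):
    """Fraction of the other members that `member` is directly tied to."""
    if len(members) < 2:
        return 0.0
    degree = sum(1 for tie in ties if member in tie)
    return degree / (len(members) - 1)


# Hypothetical four-person project; each tie is an unordered pair who conversed.
members = {"ana", "ben", "chen", "dee"}
ties = {frozenset(pair) for pair in [("ana", "ben"), ("ana", "chen"), ("ana", "dee")]}

closure = density(members, ties)                   # 3 of 6 possible pairs
leader = degree_centrality("ana", members, ties)   # tied to all 3 others
```

In this star-shaped example the group has moderate density but a maximally central leader, which shows why the two measures are treated as separate variables.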

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The rapid growth of the Internet and advances in Web technologies have given users access to large amounts of online music data, including acoustic signals, lyrics, style/mood labels, and user-assigned tags. This progress has made music listening more fun, but it has raised the issue of how to organize this data and, more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval, i.e., the problem of efficiently helping users locate the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre->artist->album->track to facilitate navigation. However, users' intentions are often hard to capture in such a simple structure. Users may want to listen to music of a particular mood, style, or topic, or to songs similar to some given music samples. This motivated us to work on a user-centric music retrieval system to improve users' satisfaction. Traditional music information retrieval research was mainly concerned with classification, clustering, identification, and similarity search of acoustic data by way of feature extraction algorithms and machine learning techniques. More recently, music information retrieval research has focused on utilizing other types of data, such as lyrics, user-access patterns, and user-defined tags, and on targeting non-genre categories for classification, such as mood labels and styles. This dissertation focused on investigating and developing effective data mining techniques for (1) organizing and annotating music data with styles, moods, and user-assigned tags; (2) performing effective analysis of music data with features from diverse information sources; and (3) recommending songs to users utilizing both content features and user access patterns.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Florida International University has undergone a reform in the introductory physics classes by focusing on the laboratory component of these classes. We present results from the secondary implementation of two research-based instructional strategies: the implementation of the Learning Assistant model as developed by the University of Colorado at Boulder and the Open Source Tutorial curriculum developed at the University of Maryland, College Park. We examine the results of the Force Concept Inventory (FCI) for introductory students over five years (n=872) and find that the mean raw gain of students in transformed lab sections was 0.243, while the mean raw gain of the traditional labs was 0.159, with a Cohen’s d effect size of 0.59. Average raw gains on the FCI were 0.243 for Hispanic students and 0.213 for women in the transformed labs, indicating that these reforms are not widening the gaps between underrepresented student groups and majority groups. Our results illustrate how research-based instructional strategies can be successfully implemented in a physics department with minimal department engagement and in a sustainable manner.
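
The abstract reports a Cohen's d of 0.59 but not how it was computed. The sketch below assumes the common pooled-standard-deviation definition, d = (mean_a - mean_b) / s_pooled; the two sample lists are hypothetical and are not the study's data.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation of two groups."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Hypothetical raw gains for two small groups (illustration only).
transformed = [0.1, 0.2, 0.3, 0.4]
traditional = [0.0, 0.1, 0.2, 0.3]
d = cohens_d(transformed, traditional)

# Under the same pooled-SD assumption, the reported means and effect size
# would imply a pooled standard deviation of (0.243 - 0.159) / 0.59.
implied_pooled_sd = (0.243 - 0.159) / 0.59
```

If the authors used a different variant (for example, dividing by the control group's standard deviation), the implied value would differ.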

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In recent years, a surprising new phenomenon has emerged in which globally distributed online communities collaborate to create useful and sophisticated computer software. These open source software groups are composed of generally unaffiliated individuals and organizations who work in a seemingly chaotic fashion and who participate on a voluntary basis without direct financial incentive. The purpose of this research is to investigate the relationship between the social network structure of these intriguing groups and their level of output and activity, where social network structure is defined as (1) closure or connectedness within the group, (2) bridging ties which extend outside of the group, and (3) leader centrality within the group. Based on well-tested theories of social capital and centrality in teams, propositions were formulated which suggest that social network structures associated with successful open source software project communities will exhibit high levels of bridging and moderate levels of closure and leader centrality. The research setting was the SourceForge hosting organization, and a study population of 143 project communities was identified. Independent variables included measures of closure and leader centrality defined over conversational ties, along with measures of bridging defined over membership ties. Dependent variables included source code commits and software releases for community output, and software downloads and project site page views for community activity. A cross-sectional study design was used, and archival data were extracted and aggregated for the two-year period following the first release of project software. The resulting compiled variables were analyzed using multiple linear and quadratic regressions, controlling for group size and conversational volume.
Contrary to theory-based expectations, the surprising results showed that successful project groups exhibited low levels of closure and that the levels of bridging and leader centrality were not important factors of success. These findings suggest that the creation and use of open source software may represent a fundamentally new socio-technical development process which disrupts the team paradigm and which triggers the need for building new theories of collaborative development. These new theories could point towards the broader application of open source methods for the creation of knowledge-based products other than software.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold “custom wrapper” approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing, and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis, and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling complex cases.
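
The abstract does not show Data Extractor's scripting language or its Java library, so the following is only a generic sketch of the wrapper idea in Python: a small pattern-driven wrapper turns a page into rows that can then be queried like a table. The function name `pattern_wrapper`, the page content, and the field names are all hypothetical.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]


def pattern_wrapper(pattern: str) -> Callable[[str], List[Row]]:
    """Build a simple wrapper from a regex whose named groups become columns."""
    compiled = re.compile(pattern)

    def extract(page: str) -> List[Row]:
        # Each regex match on the page yields one row of extracted fields.
        return [match.groupdict() for match in compiled.finditer(page)]

    return extract


# Hypothetical page content; a real wrapper would first fetch it from a site.
page = "<li>Widget - $4.50</li><li>Gadget - $12.00</li>"
wrapper = pattern_wrapper(r"<li>(?P<name>[^<]+?) - \$(?P<price>[\d.]+)</li>")

rows = wrapper(page)
# Once the site is exposed as rows, a query is an ordinary filter.
cheap = [row["name"] for row in rows if float(row["price"]) < 10]
```

A site whose layout cannot be captured by a single pattern would need a hand-written extraction function with the same signature, which mirrors the simple-versus-complex split the abstract describes.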

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs.
Methods: We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to associations among the entities within them. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease.
Results: Our pilot evaluation showed that CO outperforms BM25 in sensitivity (65% vs. 38%, a relative improvement of 71% on average) while maintaining specificity (64% vs. 61%). The evaluation was done by two physicians.
Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
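
For readers unfamiliar with the BM25 baseline, the sketch below scores each document independently against a keyword query. It is not the authors' system: it assumes whitespace tokenization, the commonly used defaults k1 = 1.5 and b = 0.75, and the non-negative IDF variant ln((N - df + 0.5) / (df + 0.5) + 1). The sample records are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization dampens the advantage of long documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical record snippets (illustration only).
records = [
    "ventricular septal defect repair follow up",
    "routine checkup no cardiac findings",
    "septal defect echo shows septal defect closure",
]
scores = bm25_scores("septal defect", records)
```

Each record is scored in isolation, which is the property the abstract contrasts with the authority-flow approach that also uses links between entities.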

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge in leveraging semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e., domain-independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals. The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims. First, we proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. Second, we developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents.
This goal also had two aims. First, we presented a framework that exploits domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. Second, we proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing, and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis, and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling complex cases.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This panel presentation provided several use cases that detail the complexity of large-scale digital library system (DLS) migration from the perspective of three university libraries and a statewide academic library services consortium. Each described the methodologies developed at the beginning of their migration process, the unique challenges that arose along the way, how issues were managed, and the outcomes of their work. Florida Atlantic University, Florida International University, and the University of Central Florida are members of the state's academic library services consortium, the Florida Virtual Campus (FLVC). In 2011, the Digital Services Committee members began exploring alternatives to DigiTool, their shared FLVC-hosted DLS. After completing a review of functional requirements and existing systems, the universities and FLVC began implementing their chosen platforms. Migrations began in 2013 with limited sets of materials. As functionalities were enhanced to support additional categories of materials from the legacy system, migration paths were created for the remaining materials. Some of the challenges experienced with the institutional and statewide collaborative legacy collections were due to gradual changes in standards, technology, policies, and personnel. These changes were reflected in the quality of the original digital files and metadata, as well as in collection and record structures. Additionally, the complexities involved with multiple institutions collaborating and compromising throughout the migration process, as well as the move from a consortial support structure with a vendor solution to open source systems (both locally and consortially supported), presented their own sets of unique challenges. Following the presentation, the speakers discussed commonalities in their migration experience, including learning opportunities for future migrations.