885 results for Semantic Publishing, Linked Data, Bibliometrics, Informetrics, Data Retrieval, Citations
Abstract:
The virtual quadrilateral is the coalescence of novel data structures that reduces the storage requirements of spatial data without jeopardizing the quality and operability of the inherent information. The data representative of the observed area is parsed to ascertain the contiguous equal measures that implicitly define a quadrilateral. The virtual quadrilateral then represents a geolocated area of the observed space where all of the measures are the same. The area, contoured as a rectangle, is pseudo-delimited by the opposite coordinates of the bounding area. Once defined, the virtual quadrilateral represents an area in the observed space and is stored in a database by the attributes of its bounding coordinates and the measure of its contiguous space. Virtual quadrilaterals have been found to ensure a lossless reduction of physical storage, maintain the implied features of the data, facilitate the rapid retrieval of vast amounts of the represented spatial data and accommodate complex queries. The methods presented herein demonstrate that virtual quadrilaterals are created quite easily, are stable and versatile objects in a database and have proven to be beneficial to exigent spatial data applications such as geographic information systems.
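As an illustration of the idea, the following Python sketch builds such bounding-box records from a regular grid of measures; the greedy rectangle-growing strategy, class name and field names are assumptions for illustration, not the dissertation's actual algorithm.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VirtualQuadrilateral:
    """A contiguous block of identical measures, stored only by its
    bounding coordinates and the shared measure (illustrative fields)."""
    row_min: int
    col_min: int
    row_max: int
    col_max: int
    measure: float

def build_quadrilaterals(grid: List[List[float]]) -> List[VirtualQuadrilateral]:
    """Greedily cover a grid of measures with rectangles of equal values."""
    rows, cols = len(grid), len(grid[0])
    covered = [[False] * cols for _ in range(rows)]
    quads = []
    for r in range(rows):
        for c in range(cols):
            if covered[r][c]:
                continue
            value = grid[r][c]
            # Grow rightwards while the measure stays the same and cells are free.
            c2 = c
            while c2 + 1 < cols and not covered[r][c2 + 1] and grid[r][c2 + 1] == value:
                c2 += 1
            # Grow downwards while every cell in the candidate row matches.
            r2 = r
            while r2 + 1 < rows and all(
                not covered[r2 + 1][k] and grid[r2 + 1][k] == value
                for k in range(c, c2 + 1)
            ):
                r2 += 1
            for rr in range(r, r2 + 1):
                for cc in range(c, c2 + 1):
                    covered[rr][cc] = True
            quads.append(VirtualQuadrilateral(r, c, r2, c2, value))
    return quads

if __name__ == "__main__":
    demo = [[1, 1, 2],
            [1, 1, 2],
            [3, 3, 3]]
    for q in build_quadrilaterals(demo):
        print(q)  # three records instead of nine cells
```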
Abstract:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity: users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality and efficient searching of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers where each answer is a substructure of the graph containing all query keywords, which illustrates the relationship among the keywords present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy the user's information needs and increase user satisfaction. Another effective ranking mechanism applied on data graphs is authority-flow based ranking. Given a top-k keyword search query on a graph, an authority-flow based search finds the top-k answers where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improve authority-flow based search on data graphs by creating a framework to explain and reformulate such queries, taking into consideration user preferences and feedback. We also applied the proposed graph search techniques to information discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our methods was compared to current approaches by using user surveys.
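As one concrete illustration of the authority-flow idea (not the framework developed in the dissertation), the sketch below runs a plain personalized PageRank by power iteration over a small adjacency-list graph; the damping factor, the toy graph and the keyword-derived personalization set are assumptions.

```python
def personalized_pagerank(graph, preferred, damping=0.85, iterations=50):
    """Authority flows from nodes matching the query (the personalization
    set) along edges; high-scoring nodes are returned as top-k answers."""
    nodes = list(graph)
    base = {n: (1.0 / len(preferred) if n in preferred else 0.0) for n in nodes}
    rank = dict(base)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * base[n] for n in nodes}
        for n, neighbours in graph.items():
            if not neighbours:
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical data graph: tuples/pages connected by links or foreign keys.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(personalized_pagerank(graph, preferred={"a"})[:3])  # top-3 answers
```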
Abstract:
With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems cross data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One under-explored issue in document clustering is how to generate a meaningful interpretation for each document cluster resulting from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To assist people in capturing the semantics of documents effectively and efficiently, this dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
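A minimal sketch of the combined clustering-plus-interpretation idea, assuming scikit-learn is available: documents are clustered with TF-IDF and k-means, and each cluster is interpreted by the sentences closest to its centroid. The pipeline and parameter values are illustrative, not the algorithms developed in the dissertation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_and_summarize(documents, n_clusters=2, sentences_per_cluster=1):
    """Cluster documents, then pick the sentence(s) closest to each cluster
    centroid as a short summary-style interpretation of that cluster."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(documents)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(doc_matrix)

    summaries = {}
    for cluster in range(n_clusters):
        # Naive sentence splitting; real systems would use a proper tokenizer.
        sentences = [s.strip() for i, d in enumerate(documents) if labels[i] == cluster
                     for s in d.split(".") if s.strip()]
        if not sentences:
            continue
        sent_matrix = vectorizer.transform(sentences)
        centroid = np.asarray(sent_matrix.mean(axis=0))
        scores = (sent_matrix @ centroid.T).ravel()
        top = np.argsort(scores)[::-1][:sentences_per_cluster]
        summaries[cluster] = [sentences[i] for i in top]
    return labels, summaries
```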
Abstract:
This ex post facto study (N = 209) examined the relationships between employer job strategies and job retention among organizations participating in Florida welfare-to-work network programs and associated the strategies with job retention data to determine best practices. An internet-based self-report survey battery was administered to a heterogeneous sample of organizations participating in the Florida welfare-to-work network program. Hypotheses were tested through correlational and hierarchical regression analytic procedures. The partial correlation results linked each of the job retention strategies to job retention. Wages, benefits, training and supervision, communication, job growth, work/life balance, fairness and respect were all significantly related to job retention. Hierarchical regression results indicated that the training and supervision variable was the best predictor of job retention in the regression equation. The size of the organization was also a significant predictor of job retention. Large organizations reported higher job retention rates than small organizations. There was no statistically significant difference in job retention between the types of organizations (profit-making and non-profit). The standardized betas ranged from .26 to .41 in the regression equation. Twenty percent of the variance in job retention was explained by the combination of demographic and job retention strategy predictors, supporting the theoretical, empirical, and practical relevance of understanding the association between employer job strategies and job retention outcomes. Implications for adult education and human resource development theory, research, and practice are highlighted as possible strategic leverage points for creating conditions that facilitate the development of job strategies as a means for improving former welfare workers' job retention.
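For readers unfamiliar with the analytic procedure, a minimal blockwise (hierarchical) regression can be sketched with statsmodels as below; the variable names are hypothetical and the sketch does not reproduce the study's actual model.

```python
import pandas as pd
import statsmodels.api as sm

def hierarchical_regression(df: pd.DataFrame, outcome, block1, block2):
    """Blockwise OLS: fit the demographic block first, then add the job
    retention strategy block and report the change in explained variance."""
    y = df[outcome]
    model1 = sm.OLS(y, sm.add_constant(df[block1])).fit()
    model2 = sm.OLS(y, sm.add_constant(df[block1 + block2])).fit()
    return {
        "R2_block1": model1.rsquared,
        "R2_both_blocks": model2.rsquared,
        "delta_R2": model2.rsquared - model1.rsquared,
        # Standardize the columns beforehand if standardized betas are needed.
        "coefficients_full_model": model2.params.drop("const"),
    }

# Hypothetical column names, for illustration only:
# hierarchical_regression(survey_df, "retention_rate",
#                         block1=["org_size", "org_type"],
#                         block2=["wages", "benefits", "training_supervision"])
```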
Abstract:
A method to estimate speed of free-ranging fishes using a passive sampling device is described and illustrated with data from the Everglades, U.S.A. Catch per unit effort (CPUE) from minnow traps embedded in drift fences was treated as an encounter rate and used to estimate speed, when combined with an independent estimate of density obtained by use of throw traps that enclose 1 m² of marsh habitat. Underwater video was used to evaluate capture efficiency and species-specific bias of minnow traps, and two sampling studies were used to estimate trap saturation and diel-movement patterns; these results were used to optimize sampling and derive correction factors to adjust species-specific encounter rates for bias and capture efficiency. Sailfin mollies Poecilia latipinna displayed a high frequency of escape from traps, whereas eastern mosquitofish Gambusia holbrooki were most likely to avoid a trap once they encountered it; dollar sunfish Lepomis marginatus were least likely to avoid the trap once they encountered it or to escape once they were captured. Length of sampling and time of day affected CPUE; fishes generally had a very low retention rate over a 24 h sample time and only the Everglades pygmy sunfish Elassoma evergladei were commonly captured at night. Dispersal speed of fishes in the Florida Everglades, U.S.A., was shown to vary seasonally and among species, ranging from 0.05 to 0.15 m s⁻¹ for small poeciliids and fundulids to 0.1 to 1.8 m s⁻¹ for L. marginatus. Speed was generally highest late in the wet season and lowest in the dry season, possibly tied to dispersal behaviours linked to finding and remaining in dry-season refuges. These speed estimates can be used to estimate the diffusive movement rate, which is commonly employed in spatial ecological models.
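A minimal sketch of how an encounter rate can be converted to a speed, assuming a simple linear encounter model in which corrected CPUE is proportional to density, speed and the effective frontal width of the trap array; the functional form, correction factors and example numbers are illustrative assumptions rather than the study's exact estimator.

```python
def estimate_speed(cpue_per_hour, density_per_m2, effective_width_m,
                   capture_efficiency=1.0, retention_correction=1.0):
    """Solve an assumed linear encounter model,
    CPUE ≈ efficiency * retention * density * speed * width,
    for speed, returning metres per second."""
    corrected_cpue = cpue_per_hour / (capture_efficiency * retention_correction)
    speed_m_per_hour = corrected_cpue / (density_per_m2 * effective_width_m)
    return speed_m_per_hour / 3600.0

# Illustrative numbers only (not values from the study):
print(estimate_speed(cpue_per_hour=2.0, density_per_m2=5.0, effective_width_m=1.0,
                     capture_efficiency=0.4, retention_correction=0.8))
```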
Abstract:
Construction organizations typically deal with large volumes of project data containing valuable information. It is found that these organizations do not use these data effectively for planning and decision-making. There are two reasons. First, the information systems in construction organizations are designed to support day-to-day construction operations. The data stored in these systems are often non-validated and non-integrated, and are available in a format that makes it difficult for decision makers to use in order to make timely decisions. Second, the organizational structure and the IT infrastructure are often not compatible with the information systems, thereby resulting in higher operational costs and lower productivity. These two issues have been investigated in this research with the objective of developing systems that are structured for effective decision-making. A framework was developed to guide storage and retrieval of validated and integrated data for timely decision-making and to enable construction organizations to redesign their organizational structure and IT infrastructure to match information system capabilities. The research was focused on construction owner organizations that were continuously involved in multiple construction projects. Action research and data warehousing techniques were used to develop the framework. One hundred and sixty-three construction owner organizations were surveyed in order to assess their data needs, data management practices and extent of use of information systems in planning and decision-making. For in-depth analysis, Miami-Dade Transit (MDT) was selected, which is in charge of all transportation-related construction projects in Miami-Dade County. A functional model and a prototype system were developed to test the framework. The results revealed significant improvements in data management and decision-support operations that were examined through various qualitative (ease of data access, data quality, response time, productivity improvement, etc.) and quantitative (time savings and operational cost savings) measures. The research results were first validated by MDT and then by a representative group of twenty construction owner organizations involved in various types of construction projects.
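A minimal sketch of the storage-and-retrieval idea using Python's built-in sqlite3: validated project records are loaded into a small star-style schema and queried for decision support. The table and column names are hypothetical and do not reproduce the framework's actual design.

```python
import sqlite3

def build_demo_warehouse(records):
    """Load validated, integrated project records into a tiny star-style
    schema (one fact table, one project dimension) and return the connection."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_project (project_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
        CREATE TABLE fact_cost (project_id INTEGER, period TEXT, planned REAL, actual REAL);
    """)
    for rec in records:
        # Simple validation gate before anything enters the decision-support store.
        if rec["actual"] < 0 or not rec["name"]:
            continue
        con.execute("INSERT OR IGNORE INTO dim_project VALUES (?, ?, ?)",
                    (rec["project_id"], rec["name"], rec["type"]))
        con.execute("INSERT INTO fact_cost VALUES (?, ?, ?, ?)",
                    (rec["project_id"], rec["period"], rec["planned"], rec["actual"]))
    return con

# Example decision-support query: cost overrun by project type.
# con.execute("""SELECT p.type, SUM(f.actual - f.planned)
#                FROM fact_cost f JOIN dim_project p USING (project_id)
#                GROUP BY p.type""").fetchall()
```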
Abstract:
The deployment of wireless communications coupled with the popularity of portable devices has led to significant research in the area of mobile data caching. Prior research has focused on the development of solutions that allow applications to run in wireless environments using proxy-based techniques. Most of these approaches are semantics-based and do not provide adequate support for representing the context of a user (i.e., the interpreted human intention). Although the context may be treated implicitly, it is still crucial to data management. In order to address this challenge, this dissertation focuses on two characteristics: how to predict (i) the future location of the user and (ii) the locations where the fetched data items remain valid answers to the query. Using this approach, more complete information about the dynamics of an application environment is maintained. The contribution of this dissertation is a novel data caching mechanism for pervasive computing environments that can adapt dynamically to a mobile user's context. In this dissertation, we design and develop a conceptual model and context-aware protocols for wireless data caching management. Our replacement policy uses the validity of the data fetched from the server and the neighboring locations to decide which of the cache entries is least likely to be needed in the future, and is therefore a good candidate for eviction when cache space is needed. The context-aware prefetching algorithm exploits the query context to effectively guide the prefetching process. The query context is defined using a mobile user's movement pattern and requested information context. Numerical results and simulations show that the proposed prefetching and replacement policies significantly outperform conventional ones. Anticipated applications of these solutions include biomedical engineering, tele-health, medical information systems and business.
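The flavor of such a replacement policy can be sketched as follows: each cache entry carries the region in which its answer remains valid, and the eviction score combines that validity region with the user's predicted next location. The scoring function and class layout are illustrative assumptions, not the dissertation's protocol.

```python
import math
from dataclasses import dataclass

@dataclass
class CacheEntry:
    key: str
    value: object
    valid_center: tuple   # (x, y) around which the cached answer is valid
    valid_radius: float   # radius of the validity region

class ContextAwareCache:
    """Evict the entry least likely to be useful at the predicted location."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def _usefulness(self, entry, predicted_location):
        dist = math.dist(entry.valid_center, predicted_location)
        # Still valid at the predicted location -> keep; otherwise usefulness
        # decays with how far outside the validity region we expect to be.
        return 1.0 if dist <= entry.valid_radius else entry.valid_radius / dist

    def put(self, entry, predicted_location):
        if len(self.entries) >= self.capacity:
            victim = min(self.entries.values(),
                         key=lambda e: self._usefulness(e, predicted_location))
            del self.entries[victim.key]
        self.entries[entry.key] = entry
```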
Abstract:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share, and enjoy information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as social media) every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of OSN data, it is difficult to analyze it effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated contents. Specifically, it aims at addressing three problems related to social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in social media to benefit other applications, e.g., marketing campaigns? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from social network contents. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.
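As a compact illustration of co-clustering users and the terms of their posts (not the hierarchical algorithm designed in the dissertation), the sketch below applies scikit-learn's SpectralCoclustering to a hypothetical user-by-term matrix; all names and parameters are assumptions.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.feature_extraction.text import CountVectorizer

def cocluster_users_and_terms(user_posts, n_clusters=2):
    """Jointly group users (rows) and the terms they use (columns).
    `user_posts` maps a user id to the concatenated text of that user's posts."""
    users = list(user_posts)
    vectorizer = CountVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(user_posts[u] for u in users)
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=0).fit(matrix)
    terms = vectorizer.get_feature_names_out()
    user_groups = dict(zip(users, model.row_labels_))
    term_groups = dict(zip(terms, model.column_labels_))
    return user_groups, term_groups

# Hypothetical toy input:
# cocluster_users_and_terms({"u1": "football match goal", "u2": "goal score football",
#                            "u3": "stock market trading", "u4": "market stock prices"})
```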
Abstract:
The mediator software architecture design has been developed to provide data integration and retrieval in distributed, heterogeneous environments. Since the initial conceptualization of this architecture, many new technologies have emerged that can facilitate the implementation of this design. The purpose of this thesis was to show that a mediator framework supporting users of mobile devices could be implemented using common software technologies available today. In addition, the prototype was developed with a view to providing a better understanding of what a mediator is and to exposing issues that will have to be addressed in fuller, more robust designs. The prototype developed for this thesis was implemented using various technologies including Java, XML, and the Simple Object Access Protocol (SOAP), among others. SOAP was used to accomplish inter-process communication. In the end, it is expected that more data-intensive software applications will be possible in a world with ever-increasing demands for information.
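A minimal Python sketch of the mediator idea (not the thesis's Java/XML/SOAP prototype): the mediator fans a request out to registered source wrappers and integrates their answers. Class names and the toy sources are assumptions.

```python
class SourceWrapper:
    """Hides a heterogeneous source behind a uniform query interface."""
    def __init__(self, name, data):
        self.name, self.data = name, data

    def query(self, keyword):
        return [item for item in self.data if keyword.lower() in item.lower()]

class Mediator:
    """Decomposes a request over all registered sources and integrates results."""
    def __init__(self):
        self.sources = []

    def register(self, wrapper):
        self.sources.append(wrapper)

    def query(self, keyword):
        return {w.name: w.query(keyword) for w in self.sources}

# Usage with hypothetical sources:
mediator = Mediator()
mediator.register(SourceWrapper("catalogue", ["Mobile data caching", "SOAP services"]))
mediator.register(SourceWrapper("archive", ["Mediator architectures", "XML messaging"]))
print(mediator.query("soap"))
```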
Abstract:
The uptake of anthropogenic CO2 by the oceans has led to a rise in the oceanic partial pressure of CO2 and to a decrease in pH and carbonate ion concentration. This modification of the marine carbonate system is referred to as ocean acidification. Numerous papers report the effects of ocean acidification on marine organisms and communities, but few have provided details concerning the full carbonate chemistry and complementary observations. Additionally, carbonate system variables are often reported in different units, calculated using different sets of dissociation constants and on different pH scales. Hence the direct comparison of experimental results has been problematic and often misleading. The need was identified to (1) gather data on carbonate chemistry, biological and biogeochemical properties, and other ancillary data from published experimental studies, (2) transform the information into a common framework, and (3) make the data freely available. The present paper is the outcome of an effort to integrate ocean carbonate chemistry data from the literature, supported by the European Network of Excellence for Ocean Ecosystems Analysis (EUR-OCEANS) and the European Project on Ocean Acidification (EPOCA). A total of 185 papers were identified, of which 100 contained enough information to readily compute carbonate chemistry variables, and 81 data sets were archived at PANGAEA - The Publishing Network for Geoscientific & Environmental Data. This data compilation is regularly updated as an ongoing mission of EPOCA.
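The "common framework" step can be sketched in outline as below: heterogeneous records are renamed to shared variable names, and rows reporting at least two carbonate-system variables are retained, since two measured variables plus dissociation constants are needed to compute the remaining ones. The column names and mapping are assumptions, and real harmonization of pH scales, units and dissociation constants is not attempted here.

```python
import pandas as pd

# Hypothetical mapping from heterogeneous source columns to a common schema.
COLUMN_MAP = {"TA_umolkg": "alkalinity", "Alk": "alkalinity",
              "DIC_umolkg": "dic", "TCO2": "dic", "pCO2_uatm": "pco2"}

def harmonize(frames):
    """Rename columns to a common schema and keep rows that report at least
    two carbonate-system variables (the minimum needed to compute the rest)."""
    common = []
    for df in frames:
        renamed = df.rename(columns=COLUMN_MAP)
        carbonate_cols = [c for c in ("alkalinity", "dic", "pco2") if c in renamed]
        renamed = renamed.dropna(subset=carbonate_cols, thresh=2)
        common.append(renamed)
    return pd.concat(common, ignore_index=True)
```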
Abstract:
Underwater georeferenced photo-transect surveys were conducted on December 10-15, 2011 at various sections of the reef at Lizard Island, Great Barrier Reef. For this survey a snorkeler or diver swam over the bottom while taking photos of the benthos at a set height using a standard digital camera and towing a GPS in a surface float which logged the track every five seconds. A standard digital compact camera was placed in an underwater housing and fitted with a 16 mm lens, which provided a 1.0 m x 1.0 m footprint at 0.5 m height above the benthos. Horizontal distance between photos was estimated by three fin kicks of the survey diver/snorkeler, which corresponded to a surface distance of approximately 2.0 - 4.0 m. The GPS was placed in a dry bag and logged the position as it floated at the surface while being towed by the photographer. A total of 5,735 benthic photos were taken. A floating GPS setup connected to the swimmer/diver by a line enabled recording of the coordinates of each benthic photo (Roelfsema, 2009). Coordinates of each benthic photo were approximated based on the photo timestamp and the GPS coordinate timestamp, using GPS Photo Link Software (www.geospatialexperts.com). Coordinates of each photo were interpolated by finding the GPS coordinates that were logged at a set time before and after the photo was captured. Benthic or substrate cover data were derived from each photo by randomly placing 24 points over each image using the Coral Point Count with Excel extensions (CPCe) program (Kohler and Gill, 2006). Each point was then assigned to 1 of 78 cover types, which represented the benthic feature beneath it. A benthic cover composition summary for each photo was generated automatically by the CPCe program. The resulting benthic cover data for each photo were linked to GPS coordinates, saved as an ArcMap point shapefile, and projected to Universal Transverse Mercator WGS84 Zone 55 South.
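The timestamp-matching step can be sketched as a linear interpolation between the GPS fixes logged immediately before and after each photo; the function and example coordinates below are illustrative and do not reproduce the GPS Photo Link software.

```python
from bisect import bisect_left

def interpolate_photo_position(photo_time, gps_fixes):
    """Linearly interpolate a photo's (lat, lon) from the GPS fixes logged
    immediately before and after the photo timestamp.
    `gps_fixes` is a time-sorted list of (timestamp_seconds, lat, lon)."""
    times = [t for t, _, _ in gps_fixes]
    i = bisect_left(times, photo_time)
    if i == 0:
        return gps_fixes[0][1:]
    if i == len(gps_fixes):
        return gps_fixes[-1][1:]
    (t0, lat0, lon0), (t1, lat1, lon1) = gps_fixes[i - 1], gps_fixes[i]
    w = (photo_time - t0) / (t1 - t0) if t1 != t0 else 0.0
    return (lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0))

# Fixes logged every five seconds (illustrative values only):
fixes = [(0, -14.6870, 145.4470), (5, -14.6871, 145.4472), (10, -14.6873, 145.4475)]
print(interpolate_photo_position(7, fixes))
```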
Abstract:
Underwater georeferenced photo-transect surveys were conducted on October 3-7, 2012 at various sections of the reef and lagoon at Lizard Island, Great Barrier Reef. For this survey a snorkeler swam while taking photos of the benthos at a set distance from the benthos using a standard digital camera and towing a GPS in a surface float which logged the track every five seconds. A Canon G12 digital camera was placed in a Canon underwater housing and photos were taken at 1 m height above the benthos. Horizontal distance between photos was estimated by three fin kicks of the survey snorkeler, which corresponded to a surface distance of approximately 2.0 - 4.0 m. The GPS was placed in a dry bag and logged the position at the surface while being towed by the photographer (Roelfsema, 2009). A total of 1,265 benthic photos were taken. Coordinates of each benthic photo were approximated based on the photo timestamp and the GPS coordinate timestamp, using GPS Photo Link Software (www.geospatialexperts.com). Coordinates of each photo were interpolated by finding the GPS coordinates that were logged at a set time before and after the photo was captured. Benthic or substrate cover data were derived from each photo by randomly placing 24 points over each image using the Coral Point Count with Excel extensions (CPCe) program (Kohler and Gill, 2006). Each point was then assigned to 1 of 79 cover types, which represented the benthic feature beneath it. A benthic cover composition summary for each photo was generated automatically by the CPCe program. The resulting benthic cover data for each photo were linked to GPS coordinates, saved as an ArcMap point shapefile, and projected to Universal Transverse Mercator WGS84 Zone 55 South.
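The point-count summary step can be sketched as follows: 24 random point positions are generated per image, the analyst-assigned label at each point is tallied, and percent cover per photo is reported. This mirrors the CPCe workflow only in outline; the labels and counts below are hypothetical.

```python
import random
from collections import Counter

def random_points(width, height, n_points=24, seed=0):
    """Generate the random pixel positions at which cover is scored."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(n_points)]

def percent_cover(assigned_labels):
    """Summarise analyst-assigned labels (one per point) as percent cover."""
    counts = Counter(assigned_labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical labels for one photo's 24 points:
labels = ["coral"] * 6 + ["algae"] * 4 + ["sand"] * 14
print(percent_cover(labels))  # {'coral': 25.0, 'algae': 16.7, 'sand': 58.3} (approx.)
```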
Abstract:
An object-based image analysis (OBIA) approach was used to create a habitat map of the Lizard Reef. Briefly, georeferenced dive and snorkel photo-transect surveys were conducted at different locations surrounding Lizard Island, Australia. For the surveys, a snorkeler or diver swam over the bottom at a depth of 1-2 m in the lagoon, One Tree Beach and Research Station areas, and at 7 m depth in Watson's Bay, while taking photos of the benthos at a set height using a standard digital camera and towing a surface-float GPS which logged its track every five seconds. The camera lens provided a 1.0 m x 1.0 m footprint at 0.5 m height above the benthos. Horizontal distance between photos was estimated by fin kicks, and corresponded to a surface distance of approximately 2.0 - 4.0 m. Coordinates of each benthic photo were approximated based on the photo timestamp and the GPS coordinate timestamp, using GPS Photo Link Software (www.geospatialexperts.com). Coordinates of each photo were interpolated by finding the GPS coordinates that were logged at a set time before and after the photo was captured. A dominant benthic or substrate cover type was assigned to each photo by placing 24 random points over each image using the Coral Point Count with Excel extensions (CPCe) program (Kohler and Gill, 2006). Each point was then assigned a dominant cover type using a benthic cover type classification scheme containing nine first-level categories: seagrass high (>=70%), seagrass moderate (40-70%), seagrass low (<=30%), coral, reef matrix, algae, rubble, rock and sand. Benthic cover composition summaries of each photo were generated automatically in CPCe. The resulting benthic cover data for each photo were linked to GPS coordinates, saved as an ArcMap point shapefile, and projected to Universal Transverse Mercator WGS84 Zone 56 South. The OBIA class assignment followed a hierarchical, membership-rule based scheme with levels for "reef", "geomorphic zone" and "benthic community".
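The first-level class assignment can be sketched as a simple threshold rule over the per-photo cover summary, using the seagrass density breaks quoted above; the tie-breaking by dominant non-seagrass category and the default value are assumptions for illustration.

```python
def first_level_class(cover):
    """Assign a dominant first-level class from a percent-cover dict,
    using the seagrass density breaks quoted in the text (>=70, 40-70, <=30)."""
    seagrass = cover.get("seagrass", 0.0)
    if seagrass >= 70:
        return "seagrass high"
    if 40 <= seagrass < 70:
        return "seagrass moderate"
    if 0 < seagrass <= 30:
        return "seagrass low"
    # Otherwise take whichever non-seagrass category dominates the photo.
    others = {k: v for k, v in cover.items() if k != "seagrass"}
    return max(others, key=others.get) if others else "sand"

print(first_level_class({"seagrass": 55, "sand": 45}))   # seagrass moderate
print(first_level_class({"coral": 60, "rubble": 40}))    # coral
```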
Abstract:
TEX86 (TetraEther indeX of tetraethers consisting of 86 carbon atoms) is a sea surface temperature (SST) proxy based on the distribution of archaeal isoprenoid glycerol dialkyl glycerol tetraethers (GDGTs). In this study, we appraise the applicability of TEX86 and TEX86L in subpolar and polar regions using surface sediments. We present TEX86 and TEX86L data from 160 surface sediment samples collected in the Arctic, the Southern Ocean and the North Pacific. Most of the SST estimates derived from both TEX86 and TEX86L are anomalously high in the Arctic, especially in the vicinity of Siberian river mouths and the sea-ice margin, plausibly due to additional archaeal contributions linked to terrigenous input. We found unusual GDGT distributions at five sites in the North Pacific. High GDGT-0/crenarchaeol and GDGT-2/crenarchaeol ratios at these sites suggest a substantial contribution of methanogenic and/or methanotrophic archaea to the sedimentary GDGT pool there. Apart from these anomalous findings, TEX86 and TEX86L values in the surface sediments from the Southern Ocean and the North Pacific generally covary with overlying SSTs. In these regions, the sedimentary TEX86-SST relationship is similar to the global calibration, and the derived temperature estimates agree well with overlying annual mean SSTs at the sites. However, there is a systematic offset between the regional TEX86L-SST relationships and the global calibration. At these sites, temperature estimates based on the global TEX86L calibration are closer to summer SSTs than to annual mean SSTs. This finding suggests that in these subpolar settings a regional TEX86L calibration may be a more suitable equation for temperature reconstruction than the global calibration.
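For reference, a sketch of the two indices and their conversion to SST is given below, using the index definitions and the global calibrations widely attributed to Kim et al. (2010); the coefficients shown are quoted as commonly used values and should be verified against the original publication before use. GDGT abundances are fractional peak areas (hypothetical inputs).

```python
import math

def tex86(gdgt1, gdgt2, gdgt3, cren_prime):
    """TEX86 = ([GDGT-2]+[GDGT-3]+[Cren']) / ([GDGT-1]+[GDGT-2]+[GDGT-3]+[Cren'])."""
    return (gdgt2 + gdgt3 + cren_prime) / (gdgt1 + gdgt2 + gdgt3 + cren_prime)

def tex86_l(gdgt1, gdgt2, gdgt3):
    """TEX86L = log10([GDGT-2] / ([GDGT-1]+[GDGT-2]+[GDGT-3]))."""
    return math.log10(gdgt2 / (gdgt1 + gdgt2 + gdgt3))

def sst_from_tex86h(t86):
    """Global TEX86H calibration (Kim et al., 2010): SST = 68.4*log10(TEX86) + 38.6."""
    return 68.4 * math.log10(t86) + 38.6

def sst_from_tex86l(t86l):
    """Global TEX86L calibration (Kim et al., 2010): SST = 67.5*TEX86L + 46.9."""
    return 67.5 * t86l + 46.9

# Illustrative abundances only:
print(sst_from_tex86h(tex86(0.30, 0.25, 0.15, 0.05)))
print(sst_from_tex86l(tex86_l(0.30, 0.25, 0.15)))
```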