917 results for Web Log Data
Abstract:
Invited talk, given on 31 May 2014 at the Workshop on Language Technology Service Platforms: Synergies, Standards, Sharing at LREC2014.
Abstract:
Robotics is one of the most active research areas, and building robots requires bringing together a large number of disciplines. Under these premises, one central problem is the management of information from multiple heterogeneous sources. Each component, hardware or software, produces data of a different nature: temporal frequency, processing needs, size, type, etc. Technologies and software engineering paradigms such as service-oriented architectures are already applied to this problem in other areas. This paper proposes using these technologies to implement a service-based robotic control system. Such a system allows the integration and collaborative operation of the different elements that make up a robotic system.
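A minimal Python sketch of the service-oriented idea described above; it is not the paper's implementation, and all component and method names are illustrative assumptions. It only shows heterogeneous robot components wrapped behind a uniform service interface and composed through a registry.

```python
# Sketch: heterogeneous robot components exposed as services with a uniform
# contract, composed through a simple registry ("service bus").
from abc import ABC, abstractmethod
from typing import Any, Dict


class RobotService(ABC):
    """Uniform contract that every component (sensor, planner, actuator) exposes."""

    @abstractmethod
    def invoke(self, request: Dict[str, Any]) -> Dict[str, Any]:
        ...


class LaserScannerService(RobotService):
    def invoke(self, request):
        # A real service would read the device driver; here the scan is stubbed.
        return {"type": "scan", "ranges": [1.2, 1.1, 0.9], "rate_hz": 10}


class MotionPlannerService(RobotService):
    def invoke(self, request):
        # Consumes a heterogeneous sensor payload and produces a velocity command.
        nearest = min(request["scan"]["ranges"])
        return {"cmd_vel": 0.0 if nearest < 1.0 else 0.5}


class ServiceBus:
    """Registry that lets services collaborate without hard-wired dependencies."""

    def __init__(self):
        self._services: Dict[str, RobotService] = {}

    def register(self, name: str, service: RobotService) -> None:
        self._services[name] = service

    def call(self, name: str, request: Dict[str, Any]) -> Dict[str, Any]:
        return self._services[name].invoke(request)


if __name__ == "__main__":
    bus = ServiceBus()
    bus.register("laser", LaserScannerService())
    bus.register("planner", MotionPlannerService())

    scan = bus.call("laser", {})
    command = bus.call("planner", {"scan": scan})
    print(command)  # {'cmd_vel': 0.0} because the nearest obstacle is 0.9 m away
```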
Abstract:
The revelation of the top-secret US intelligence-led PRISM Programme has triggered wide-ranging debates across Europe. Press reports have shed new light on the electronic surveillance ‘fishing expeditions’ of the US National Security Agency and the FBI into the world’s largest electronic communications companies. This Policy Brief by a team of legal specialists and political scientists addresses the main controversies raised by the PRISM affair and the policy challenges that it poses for the EU. Two main arguments are presented: First, the leaks over the PRISM programme have undermined the trust that EU citizens have in their governments and the European institutions to safeguard and protect their privacy; and second, the PRISM affair raises questions regarding the capacity of EU institutions to draw lessons from the past and to protect the data of its citizens and residents in the context of transatlantic relations. The Policy Brief puts forward a set of policy recommendations for the EU to follow and implement a robust data protection strategy in response to the affair.
Abstract:
This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website, while satisfaction refers to the usefulness of the site. We illustrate that the popularity of a website is a fuzzy logic problem; it is an important characteristic for a website's survival in Internet commerce. The satisfaction of a website is also a fuzzy logic problem, representing the degree of success in applying information technology to the business. We propose a fuzzy logic framework for representing these two problems, based on web data mining techniques that fuzzify the attributes of a website.
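A minimal sketch, in Python, of how a website attribute might be fuzzified along the lines of this abstract; the membership functions, thresholds, and the choice of min as the combination operator are illustrative assumptions, not the paper's formulation.

```python
# Sketch: fuzzify two website attributes (daily visits, returning-visitor ratio)
# into membership degrees for "popular" and "satisfied", then combine them.
def popularity_membership(daily_visits: float,
                          low: float = 100.0,
                          high: float = 10_000.0) -> float:
    """Piecewise-linear membership in the fuzzy set 'popular' (0..1)."""
    if daily_visits <= low:
        return 0.0
    if daily_visits >= high:
        return 1.0
    return (daily_visits - low) / (high - low)


def satisfaction_membership(return_visitor_ratio: float) -> float:
    """Treat the ratio of returning visitors as a direct membership degree."""
    return max(0.0, min(1.0, return_visitor_ratio))


def site_success(daily_visits: float, return_visitor_ratio: float) -> float:
    """Combine the two fuzzy degrees with a t-norm (here: min)."""
    return min(popularity_membership(daily_visits),
               satisfaction_membership(return_visitor_ratio))


print(site_success(daily_visits=2_500, return_visitor_ratio=0.4))  # ~0.24
```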
Abstract:
Effectively using heterogeneous, distributed information has attracted much research in recent years. Current web services technologies have been used successfully in some prototype distributed systems that are not data intensive, but most of them cannot work well in data-intensive environments. This paper provides an infrastructure layer for effectively providing spatial information services in data-intensive environments using web services over the Internet. We extensively investigate and analyze the overhead of web services in data-intensive environments and propose new optimization techniques that can greatly increase the system's efficiency. Our experiments show that these techniques are well suited to data-intensive environments. Finally, we present the requirements these techniques place on web services delivering information over the Internet.
Abstract:
In current organizations, valuable enterprise knowledge is often buried under a rapidly expanding amount of unstructured information in the form of web pages, blogs, and other forms of human text communication. We present a novel unsupervised machine learning method called CORDER (COmmunity Relation Discovery by named Entity Recognition) to turn these unstructured data into structured information for knowledge management in these organizations. CORDER exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments: an expert evaluation, a quantitative benchmark, and an application of CORDER in a social networking tool called BuddyFinder.
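A minimal sketch of the co-occurrence idea that CORDER builds on, assuming named entities have already been extracted per document; the scoring formula here is a simple illustration, not the actual CORDER relation-strength measure, and the example entities are invented.

```python
# Sketch: count co-occurrences of named entities within documents and rank
# candidate relations by a normalised co-occurrence score.
from collections import Counter
from itertools import combinations

# Each document is represented by the set of named entities recognised in it.
documents = [
    {"Alice Smith", "semantic web", "BuddyFinder"},
    {"Alice Smith", "semantic web", "ontology alignment"},
    {"Bob Jones", "BuddyFinder", "semantic web"},
]

pair_counts = Counter()
entity_counts = Counter()
for entities in documents:
    entity_counts.update(entities)
    # Count every unordered pair of entities that co-occur in the same document.
    for a, b in combinations(sorted(entities), 2):
        pair_counts[(a, b)] += 1

# Rank candidate relations by joint count normalised by the rarer entity.
for (a, b), joint in pair_counts.most_common(5):
    score = joint / min(entity_counts[a], entity_counts[b])
    print(f"{a} -- {b}: {score:.2f}")
```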
Abstract:
We present CORDER (COmmunity Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.
Abstract:
In many of the Statnotes described in this series, the statistical tests assume the data are a random sample from a normal distribution. These Statnotes include most of the familiar statistical tests such as the ‘t’ test, analysis of variance (ANOVA), and Pearson’s correlation coefficient (‘r’). Nevertheless, many variables exhibit a more or less ‘skewed’ distribution. A skewed distribution is asymmetrical, with the bulk of the observations displaced either to the left (positive skew) or to the right (negative skew). If the mean of the distribution is low, the degree of variation large, and values can only be positive, a positively skewed distribution is usually the result. Many distributions potentially have a low mean and high variance, including the abundance of bacterial species on plants, the latent period of an infectious disease, and the sensitivity of certain fungi to fungicides. These positively skewed distributions are often fitted successfully by a variant of the normal distribution called the log-normal distribution. This Statnote describes fitting the log-normal distribution with reference to two scenarios: (1) the frequency distribution of bacterial numbers isolated from cloths in a domestic environment, and (2) the sizes of lichenised ‘areolae’ growing on the hypothallus of Rhizocarpon geographicum (L.) DC.
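A minimal sketch of fitting a log-normal distribution to positively skewed data with SciPy; the data below are simulated, not the bacterial counts or areolae sizes analysed in the Statnote.

```python
# Sketch: fit a log-normal distribution to simulated positively skewed counts
# and check the fit with a Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
counts = rng.lognormal(mean=2.0, sigma=0.8, size=200)  # simulated bacterial counts

# Fit with the location parameter fixed at 0, since counts are strictly positive.
shape, loc, scale = stats.lognorm.fit(counts, floc=0)
print(f"sigma (log-sd) = {shape:.2f}, median = {scale:.2f}")

# Goodness of fit against the fitted distribution.
ks_stat, p_value = stats.kstest(counts, "lognorm", args=(shape, loc, scale))
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```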
Abstract:
This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, unlike collecting dialect data using traditional methods, which is a relatively slow process, the use of site-restricted web searches allows for dialect data to be collected from across a region as large as the United States in a matter of days.
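A minimal sketch of the proportion estimate behind this method: for each city's newspaper site, count the pages containing each lexical variant via a site-restricted query and normalise over the variants. The `hit_count` function, its fake index, and the site names are hypothetical stand-ins; a real implementation would issue the site-restricted queries through a search engine.

```python
# Sketch: estimate the regional proportions of lexical variants from
# per-site page counts obtained via site-restricted searches.
from typing import Dict, List


def hit_count(term: str, site: str) -> int:
    """Placeholder for the number of pages on `site` containing `term`."""
    fake_index = {
        ("soda", "example-chicago-paper.com"): 120,
        ("pop", "example-chicago-paper.com"): 340,
        ("soda", "example-boston-paper.com"): 410,
        ("pop", "example-boston-paper.com"): 35,
    }
    return fake_index.get((term, site), 0)


def variant_proportions(variants: List[str], site: str) -> Dict[str, float]:
    counts = {v: hit_count(v, site) for v in variants}
    total = sum(counts.values()) or 1
    return {v: c / total for v, c in counts.items()}


for city_site in ("example-chicago-paper.com", "example-boston-paper.com"):
    print(city_site, variant_proportions(["soda", "pop"], city_site))
```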
Abstract:
In this paper we present LEAPS, a Semantic Web and Linked Data framework for searching and visualising datasets from the domain of algal biomass. LEAPS provides tailored interfaces for stakeholders in the domain of algal biomass to explore the datasets via REST services and a SPARQL endpoint. The rich suite of datasets includes data about potential algal biomass cultivation sites, sources of CO2, the pipelines connecting the cultivation sites to the CO2 sources, and a subset of the biological taxonomy of algae derived from the world's largest online information source on algae.
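A minimal sketch of querying a SPARQL endpoint such as the one LEAPS exposes; the endpoint URL, prefix, and the class and property names in the query are illustrative assumptions, not the actual LEAPS vocabulary.

```python
# Sketch: issue a SPARQL SELECT query over HTTP (standard SPARQL protocol)
# and print the bindings returned as JSON.
import requests

ENDPOINT = "http://example.org/leaps/sparql"  # hypothetical endpoint URL

QUERY = """
PREFIX ex: <http://example.org/algal-biomass#>
SELECT ?site ?co2Source WHERE {
  ?site a ex:CultivationSite ;
        ex:connectedTo ?co2Source .
} LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()
for binding in response.json()["results"]["bindings"]:
    print(binding["site"]["value"], "->", binding["co2Source"]["value"])
```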
Abstract:
Our modular approach to data hiding is an innovative concept in the data hiding research field. It enables the creation of modular digital watermarking methods that have extendable features and are designed for use in web applications. The methods consist of two types of modules: a basic module and an application-specific module. The basic module mainly provides features tied to the specific image format. As JPEG is a preferred image format on the Internet, we have focused on achieving robust and error-free embedding and retrieval of the embedded data in JPEG images. The application-specific modules are adaptable to user requirements in the concrete web application. The experimental results of the modular data watermarking are very promising: they indicate excellent image quality, a satisfactory size of the embedded data, and perfect robustness against JPEG transformations with prespecified compression ratios. ACM Computing Classification System (1998): C.2.0.
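A minimal sketch of the two-module structure described above: a basic module handling format-specific embedding and retrieval, and an application-specific module that prepares the payload. The embedding itself is stubbed, and all class and method names are illustrative assumptions rather than the authors' design.

```python
# Sketch: separate the format-specific embedding (basic module) from the
# application-specific payload handling, so the two can be swapped independently.
from abc import ABC, abstractmethod


class BasicModule(ABC):
    """Format-specific embedding/retrieval (e.g. robust JPEG-domain hiding)."""

    @abstractmethod
    def embed(self, image_bytes: bytes, payload: bytes) -> bytes: ...

    @abstractmethod
    def extract(self, image_bytes: bytes) -> bytes: ...


class StubJpegModule(BasicModule):
    """Stand-in that appends the payload; a real module would modify DCT coefficients."""

    MARKER = b"--WM--"

    def embed(self, image_bytes, payload):
        return image_bytes + self.MARKER + payload

    def extract(self, image_bytes):
        return image_bytes.split(self.MARKER, 1)[1]


class CopyrightModule:
    """Application-specific module: formats the data the application wants to hide."""

    def __init__(self, basic: BasicModule):
        self.basic = basic

    def tag(self, image_bytes: bytes, owner: str) -> bytes:
        return self.basic.embed(image_bytes, f"owner={owner}".encode())

    def owner_of(self, image_bytes: bytes) -> str:
        return self.basic.extract(image_bytes).decode().removeprefix("owner=")


watermarker = CopyrightModule(StubJpegModule())
tagged = watermarker.tag(b"\xff\xd8...jpeg data...", owner="Example Lab")
print(watermarker.owner_of(tagged))  # Example Lab
```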