998 results for Page name matching


Relevance:

100.00%

Abstract:

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach, which mines the existing link structure for anchor probabilities and relies on "translation" through cross-lingual document name triangulation, performs very well. The evaluation shows encouraging results for our system.
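The link-mining idea can be illustrated with a minimal sketch, assuming a toy corpus in which each document is given as its text plus the set of phrases that are linked in it. A phrase's anchor probability is estimated as the fraction of documents containing the phrase in which it is actually linked; the "translation" step then looks up the equivalent title of the phrase's English target article through existing cross-language links. The function names, input formats, the 0.5 threshold and the sample titles are all hypothetical, and this is not the authors' system.

```python
from collections import Counter


def anchor_probabilities(docs):
    """Estimate P(linked | phrase occurs) for every phrase linked somewhere.

    `docs` is a list of (text, linked_phrases) pairs, a hypothetical
    simplification of a parsed Wikipedia dump.
    """
    phrases = set()
    for _, linked in docs:
        phrases.update(linked)

    occurs, linked_count = Counter(), Counter()
    for text, linked in docs:
        lowered = text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:  # naive substring match
                occurs[phrase] += 1
                if phrase in linked:
                    linked_count[phrase] += 1
    return {p: linked_count[p] / occurs[p] for p in phrases if occurs[p]}


def suggest_links(text, probs, targets, lang_links, threshold=0.5):
    """Return (anchor, foreign title) pairs for likely anchors found in `text`.

    `targets` maps an anchor phrase to the English article it usually links to;
    `lang_links` maps an English title to its equivalent title in the target
    language, i.e. the triangulation through existing cross-language links.
    """
    lowered = text.lower()
    suggestions = []
    for phrase, p in sorted(probs.items(), key=lambda kv: (-kv[1], kv[0])):
        if p < threshold or phrase.lower() not in lowered:
            continue
        foreign_title = lang_links.get(targets.get(phrase))
        if foreign_title is not None:
            suggestions.append((phrase, foreign_title))
    return suggestions


# Hypothetical toy data.
docs = [
    ("Python is a programming language.", {"programming language"}),
    ("A programming language has a syntax.", {"programming language"}),
    ("Syntax matters in every language.", set()),
    ("The syntax of Python is readable.", {"Python"}),
]
targets = {"programming language": "Programming language",
           "Python": "Python (programming language)"}
lang_links = {"Programming language": "编程语言",
              "Python (programming language)": "Python"}

probs = anchor_probabilities(docs)
print(suggest_links("Rust is a programming language.", probs, targets, lang_links))
# [('programming language', '编程语言')]
```

Thresholding on anchor probability keeps phrases that authors habitually link and drops common words that merely happen to match a title.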

Relevance:

100.00%

Abstract:

Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles in most written languages. It is often considered a treasury of human knowledge. It includes extensive hypertext links between documents in the same language for easy navigation. However, pages in different languages are rarely cross-linked, except for directly equivalent pages on the same subject. This can pose serious difficulties for users seeking information or knowledge from sources in other languages, or where no equivalent page exists in a given language. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links from them to documents in another language. In other words, cross-lingual link discovery automatically finds hypertext links between documents in different languages, which is particularly helpful for knowledge discovery across language domains. This study focuses specifically on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task that involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To evaluate the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With this evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new, simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining word boundaries in Chinese text; 2) a voting mechanism for named entity translation is demonstrated to achieve high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in experiments, carried out as part of the study, on improving the automatic generation of cross-lingual links. The major overall contribution of this thesis is a standard evaluation framework for cross-lingual link discovery research. Such a framework is important for CLLD evaluation because it helps benchmark the performance of various CLLD systems and identify good approaches to realising CLLD. The evaluation methods and framework described in this thesis were used to quantify system performance in the NTCIR-9 Crosslink task, the first information retrieval track of its kind.
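The abstract does not spell out how the n-gram mutual information segmenter works, so the following is only a generic sketch of mutual-information-based segmentation, not the thesis's method: it counts characters and adjacent character pairs in an unsegmented corpus, computes the pointwise mutual information (PMI) of each neighbouring pair, and places a word boundary wherever the PMI falls below a threshold. The toy corpus and the threshold of 3.0 bits are hypothetical; in practice the threshold would be tuned on a much larger collection.

```python
import math
from collections import Counter


def train(corpus):
    """Count characters and adjacent character pairs in unsegmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams):
    """Pointwise mutual information (in bits) of two adjacent characters."""
    if bigrams[pair] == 0:
        return float("-inf")  # never seen together: always a boundary
    p_ab = bigrams[pair] / sum(bigrams.values())
    n = sum(unigrams.values())
    p_a, p_b = unigrams[pair[0]] / n, unigrams[pair[1]] / n
    return math.log2(p_ab / (p_a * p_b))


def segment(sentence, unigrams, bigrams, threshold):
    """Insert a word boundary wherever adjacent characters have PMI < threshold."""
    if not sentence:
        return []
    words, current = [], sentence[0]
    for i in range(1, len(sentence)):
        if pmi(sentence[i - 1:i + 1], unigrams, bigrams) < threshold:
            words.append(current)
            current = sentence[i]
        else:
            current += sentence[i]
    words.append(current)
    return words


# Hypothetical toy corpus; a real system would train on a large collection.
unigrams, bigrams = train(["北京大学", "北京天气", "上海大学", "上海天气"])
print(segment("北京大学天气", unigrams, bigrams, threshold=3.0))
# ['北京', '大学', '天气']
```

Pairs that co-occur often relative to their individual frequencies (北京, 大学, 天气) stay joined, while rarer transitions (京大) and unseen ones (学天) become boundaries.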

Relevance:

100.00%

Abstract:

At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the cross-lingual linking method achieved promising results.

Relevance:

80.00%

Abstract:

The course Competitividad Internacional Urbana (International Urban Competitiveness, ciu) in the Gestión y Desarrollo Urbanos (Urban Management and Development, gdu) program at the Universidad del Rosario has been, since 2009, when I took on its direction and orientation, a constant challenge, offering lessons as stimulating and varied as the cities and attributes still waiting to be discovered in the vast urban-rural-regional world. Although competitiveness is an urban-regional matter before it is a national one, most approaches, and consequently most bibliographic references, address competitiveness at the national level, while publications on urban competitiveness are relatively scarce. Accordingly, the documents give a general description of the cities, the causes of their crises and the consequences for each city and its economic structure, analysed through the impacts on the labour market, housing prices and tourism development, among others, as well as the various strategies the cities adopted to confront the crisis and turn it into an opportunity for development.

Relevance:

40.00%

Abstract:

Index of names for the Welland Canal Company's Survey of Land book, 1826. It includes each person's name, land cultivated, land uncultivated, total land and remarks. The remarks include who surveyed the lands, the dates of the survey, former property names, additional property features, etc. The page is titled: Statement of Lands Surveyed and appropriated to the use of the Welland Canal Company.

Relevance:

40.00%

Abstract:

This is an index of all the names contained within the Survey of Lands. The names are listed in alphabetical order and paired with the page numbers where more information on each individual can be found. This page lists names beginning with "A" through to and including "N".

Relevance:

40.00%

Abstract:

This is an index of all the names contained within the Survey of Lands. The names are listed in alphabetical order and paired with the page numbers where more information on each individual can be found. This page lists names beginning with "O" through to and including "Z".

Relevance:

40.00%

Abstract:

We report a general mass spectrometric approach for the rapid identification and characterization of proteins isolated by preparative two-dimensional polyacrylamide gel electrophoresis. This method possesses the inherent power to detect and structurally characterize covalent modifications. The absolute sensitivities of matrix-assisted laser desorption ionization (MALDI) and high-energy collision-induced dissociation (CID) tandem mass spectrometry are exploited to determine the mass and sequence of subpicomole quantities of tryptic peptides. These data permit mass matching and sequence homology searching of computerized peptide mass and protein sequence databases for known proteins, and the design of oligonucleotide probes for cloning unknown proteins. We have identified 11 proteins in lysates of human A375 melanoma cells, including alpha-enolase, cytokeratin, stathmin, protein disulfide isomerase, tropomyosin, Cu/Zn superoxide dismutase, nucleoside diphosphate kinase A, galaptin, and triosephosphate isomerase. We have characterized several posttranslational modifications, as well as chemical modifications that may result from electrophoresis or subsequent sample processing steps. The detection of comigrating and covalently modified proteins illustrates the necessity of peptide sequencing and the advantages of tandem mass spectrometry for reliably and unambiguously establishing the identity of each protein. This technology paves the way for studies of cell-type-dependent gene expression and of large suites of cellular proteins with unprecedented speed and rigor, providing information complementary to the ongoing Human Genome Project.
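The "mass matching" step can be illustrated with a minimal peptide mass fingerprinting sketch: digest each database sequence in silico with trypsin (cleaving after K or R unless the next residue is P), compute singly protonated monoisotopic [M+H]+ masses, and score each entry by how many observed peaks it explains within a tolerance. The residue masses are standard monoisotopic values; the 0.5 Da tolerance, the two-entry database and the observed peak are hypothetical. The sketch ignores missed cleavages and modifications and does not cover the tandem MS sequencing or homology searching described above.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence):
    """In-silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of a singly protonated peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def match_score(observed, sequence, tolerance=0.5):
    """Number of observed peaks that match some theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - t) <= tolerance for t in theoretical) for m in observed)


def identify(observed, database, tolerance=0.5):
    """Return the database entry whose digest explains the most observed peaks."""
    return max(database, key=lambda name: match_score(observed, database[name], tolerance))


# Hypothetical two-entry "database" and a single observed peak.
database = {"protein_a": "AKPRGK", "protein_b": "MWHEFK"}
print(tryptic_peptides("AKPRGK"))     # ['AKPR', 'GK']
print(round(mh_mass("GK"), 3))        # 204.134
print(identify([204.134], database))  # protein_a
```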

Relevance:

30.00%

Abstract:

The rapid growth in the number of social network users, and the amount of information that a social network requires about its users, make traditional matching systems ill-suited to matching users within social networks. This paper introduces the use of clustering to form communities of users and then uses these communities to generate matches. Forming communities within a social network reduces the number of users that the matching system needs to consider, and helps to overcome other problems that social networks suffer from, such as the absence of activity information for a new user. The proposed system has been evaluated on a dataset obtained from an online dating website. Empirical analysis shows that the accuracy of the matching process increases when community information is used.
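The community-then-match idea can be sketched as follows, assuming plain k-means as the clustering step (the abstract does not name the algorithm) and hypothetical two-dimensional profile vectors. Users are first grouped into communities, and candidate matches for a user are then ranked only among members of the same community, which shrinks the search space. The features, the choice of k = 2 and the Euclidean ranking are all illustrative assumptions; a real dating system would also scale features and model reciprocal preferences.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest community centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def candidate_matches(user, points, labels, top_n=3):
    """Rank only the users in the same community, nearest profile first."""
    same = [i for i in range(len(points)) if i != user and labels[i] == labels[user]]
    return sorted(same, key=lambda i: math.dist(points[user], points[i]))[:top_n]


# Hypothetical profile vectors: (age, activity score).
users = [(25, 1.0), (27, 1.2), (45, 5.0), (26, 0.9), (47, 4.8)]
labels = kmeans(users, k=2)
print(labels)                               # [0, 0, 1, 0, 1]
print(candidate_matches(0, users, labels))  # [3, 1]
```

Because a new user can be placed in a community from profile features alone, the community can stand in for the activity history that such a user does not yet have.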

Relevance:

30.00%

Abstract:

Automated feature extraction and correspondence determination is an extremely important problem in the face recognition community as it often forms the foundation of the normalisation and database construction phases of many recognition and verification systems. This paper presents a completely automatic feature extraction system based upon a modified volume descriptor. These features form a stable descriptor for faces and are utilised in a reversible jump Markov chain Monte Carlo correspondence algorithm to automatically determine correspondences which exist between faces. The developed system is invariant to changes in pose and occlusion and results indicate that it is also robust to minor face deformations which may be present with variations in expression.

Relevance:

30.00%

Abstract:

Traditional area-based matching techniques use similarity metrics such as the Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD) and Normalised Cross Correlation (NCC). Non-parametric matching algorithms such as the rank and census transforms rely on the relative ordering of pixel values, rather than the pixel values themselves, as a similarity measure. Both traditional area-based and non-parametric stereo matching techniques have an algorithmic structure that is amenable to fast hardware realisation. This investigation assesses the performance of these two families of algorithms in terms of robustness to radiometric distortion and random noise. A generic implementation framework is presented for the stereo matching problem, and the relative hardware requirements of the various metrics are investigated.
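The difference between the two families can be shown with a small sketch on a flattened 3x3 window. SAD, SSD and NCC compare raw intensities, whereas the rank and census transforms keep only how each pixel orders against the centre pixel; census windows are then compared by Hamming distance. The sample window and the gain-and-offset change (2p + 5) standing in for radiometric distortion are hypothetical. NCC is written here without mean subtraction, so it tolerates a gain change but not an offset.

```python
import math


def sad(a, b):
    """Sum of Absolute Differences between two equally sized windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of Squared Differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalised Cross Correlation (without mean subtraction)."""
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den else 0.0


def rank_transform(window):
    """Number of pixels in the window with a lower intensity than the centre pixel."""
    centre = window[len(window) // 2]
    return sum(1 for p in window if p < centre)


def census_transform(window):
    """Bit string with one bit per neighbour: 1 if it is lower than the centre."""
    mid = len(window) // 2
    bits = 0
    for i, p in enumerate(window):
        if i != mid:
            bits = (bits << 1) | int(p < window[mid])
    return bits


def hamming(a, b):
    """Census windows are compared by the Hamming distance of their bit strings."""
    return bin(a ^ b).count("1")


# A flattened 3x3 window and the same window after a hypothetical gain/offset change.
left = [10, 20, 30, 40, 50, 60, 70, 80, 90]
right = [2 * p + 5 for p in left]

print(sad(left, right), ssd(left, right))                        # 495 33225
print(rank_transform(left), rank_transform(right))               # 4 4
print(hamming(census_transform(left), census_transform(right)))  # 0
```

Any monotonic intensity change preserves pixel ordering, which is why the non-parametric transforms are unaffected here while SAD and SSD report a large mismatch.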

Relevance:

30.00%

Abstract:

Automatically determining and assigning shared, meaningful text labels to data extracted from an e-Commerce web page is a challenging problem. An e-Commerce web page can display a list of data records, each of which can contain a combination of data items (e.g. product name and price) and explicit labels that describe some of these data items. Recent advances in extraction techniques have made it much easier to precisely extract individual data items and labels from a web page; however, two problems remain open: 1. assigning an explicit label to a data item, and 2. determining labels for the remaining data items. Furthermore, improvements in the availability and coverage of vocabularies, especially in the context of e-Commerce web sites, mean that we now have access to a bank of relevant, meaningful and shared labels that can be assigned to extracted data items. However, a technique is needed that takes a set of extracted data items as input and automatically assigns to them the most relevant and meaningful labels from a shared vocabulary. We observe that the Information Extraction (IE) community has developed a great number of techniques that solve problems similar to ours. In this work-in-progress paper we propose to evaluate different IE techniques, theoretically and experimentally, to ascertain which is most suitable for solving this problem.
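To make the two open problems concrete, here is a deliberately naive baseline sketch; it is not one of the IE techniques the paper proposes to evaluate. It attaches each explicit label to the data item that immediately follows it (problem 1) and then maps the label to the most similar term of a shared vocabulary using `difflib.SequenceMatcher` string similarity. The record layout, the vocabulary and the 0.6 similarity cut-off are hypothetical, and items with no explicit label (problem 2) are simply left unlabelled.

```python
from difflib import SequenceMatcher


def pair_labels(tokens):
    """Attach each explicit label to the data item that immediately follows it.

    `tokens` is a record in reading order as (kind, text) pairs, where kind is
    "label" or "item"; items with no preceding label get None.
    """
    pairs, pending = [], None
    for kind, text in tokens:
        if kind == "label":
            pending = text
        else:
            pairs.append((text, pending))
            pending = None
    return pairs


def to_vocabulary(label, vocabulary, min_ratio=0.6):
    """Map a page label to the most similar shared-vocabulary term, if close enough."""
    if label is None:
        return None
    norm = label.lower().strip(" :")

    def similarity(term):
        return SequenceMatcher(None, norm, term.lower()).ratio()

    best = max(vocabulary, key=similarity)
    return best if similarity(best) >= min_ratio else None


# Hypothetical record from a product listing and a hypothetical shared vocabulary.
record = [("item", "Acme Kettle"), ("label", "Price:"), ("item", "$25"),
          ("label", "Brand:"), ("item", "Acme")]
vocabulary = ["name", "price", "brand"]

for item, label in pair_labels(record):
    print(item, "->", to_vocabulary(label, vocabulary))
# Acme Kettle -> None
# $25 -> price
# Acme -> brand
```

The unlabelled "Acme Kettle" shows why the second problem needs more than string matching, for example features of the item's content or position.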

Relevance:

30.00%

Abstract:

Index of names for the Survey of Lands for the Welland Canal Company. The index includes the names of people whose lands were surveyed on the line of the canal. It also gives a very basic geographic location of the lands in relation to the canal (for example, east, west, reservoir). No page numbers are listed.

Relevance:

30.00%

Abstract:

Many web servers contain some dangerous pages (we call them eigenpages) that can indicate their vulnerabilities. Therefore, some worms, such as Santy, locate their targets by searching for these eigenpages in search engines with well-crafted queries. In this paper, we focus on the modeling and containment of these special worms that target web applications. We propose a containment system based on honey pots. We make search engines randomly insert, among the search results for incoming queries, a few honey pages that lead visitors to pre-established honey pots. Infected hosts can then be detected and reported to the search engines when their malicious scans hit the honey pots. We find that the Santy worm can be effectively stopped by inserting no more than two honey pages in every one hundred search results. We also solve the challenging problem of dynamically generating matching honey pages for dynamically arriving queries. Finally, a prototype is implemented to demonstrate the technical feasibility of this system. © 2013 by CESER Publications.
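The insertion and detection steps can be sketched as follows, assuming the honey pages matching a query have already been generated (the dynamic generation the abstract mentions is not shown). At most `per_hundred` honey pages per 100 results are placed at random positions, the order of the genuine results is preserved, and any host whose requests reach a honey pot is reported. The function names, URLs, log format and seed are hypothetical; the default of two per hundred follows the rate reported above.

```python
import random


def insert_honey_pages(results, honey_pages, per_hundred=2, rng=None):
    """Randomly interleave at most `per_hundred` honey pages per 100 results."""
    rng = rng if rng is not None else random.Random()
    k = min(len(honey_pages), len(results) * per_hundred // 100)
    mixed = list(results)
    for page in rng.sample(honey_pages, k):
        mixed.insert(rng.randint(0, len(mixed)), page)  # any position, including the end
    return mixed


def detect_infected(access_log, honeypot_urls):
    """Hosts whose requests hit a honey pot are reported as likely infected."""
    return sorted({host for host, url in access_log if url in honeypot_urls})


# Hypothetical URLs and log entries.
results = [f"https://example.org/page/{i}" for i in range(100)]
honey = ["https://honeypot.example/a", "https://honeypot.example/b",
         "https://honeypot.example/c"]
mixed = insert_honey_pages(results, honey, rng=random.Random(42))
print(len(mixed))  # 102

log = [("198.51.100.7", results[0]), ("203.0.113.9", honey[1])]
print(detect_infected(log, set(honey)))  # ['203.0.113.9']
```

Random placement matters because a worm that always skipped a fixed result position could otherwise avoid the honey pots.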