992 results for K-NN query
Summary:
Abstract. The paper presents a list of 437 verbs which have not been recorded in lexicography. As the source of reference, the author consulted a spelling dictionary of the Polish language, Wielki słownik ortograficzny PWN, 2nd edition, 2006. The idea for this paper arose from curiosity after reading Indeks neologizmów [Index of Neologisms] prepared by Krystyna Waszakowa in her work Przejawy internacjonalizacji w słowotwórstwie współczesnej polszczyzny [Word-formative internationalisation processes in modern Polish]. The index contains a list of nouns. Given that K. Waszakowa did not take verbs into account (there are far (?) fewer neo-verbs than neo-nouns), the author decided to find out whether it is true that the number of verb neologisms is so small that their philological analysis is pointless from the point of view of research, vocabulary registration, etc. If nouns, such as podczłowiek, miniokupacja, redefinicja, are of interest, why not record the prefixal constructions of the do-, z-, od-, na-, w-, wy-, za-, od-, nad- etc. -robić type? The analysis included randomly selected texts from the „Rzeczpospolita” daily (without any thorough preparation with respect to the content; the available texts were analysed sequentially until a satisfactory result was obtained). The texts under review came from an incomplete (it is virtually impossible to determine completeness in this case) electronic archive covering the years 1993–2006.
Summary:
This article is a continuation of a four-part work describing the condition of soil in what has been broadly defined as central Poznań. This article presents the content of active forms of six chemical elements which tend to be absorbed by plants in the largest quantities. Relations between these chemical elements are discussed, and recommendations are given on how to counteract the negative effects of deficits as well as excesses of specific chemical elements in the substrate.
Summary:
http://www.archive.org/details/atseandinportorl00hineiala
Summary:
http://www.archive.org/details/africanmissionar00kummuoft
Summary:
ImageRover is a search by image content navigation tool for the world wide web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.
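The k-d tree search mentioned in this abstract can be illustrated with a minimal sketch. It is not ImageRover's implementation: the function names, the dictionary-based node layout, and the `eps` pruning rule are assumptions made for illustration. With `eps=0` the search is exact; a larger `eps` skips more branches and returns an approximate nearest neighbour.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of equal-length tuples (hypothetical layout)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, eps=0.0, best=None):
    """Return (squared distance, point) of the (approximate) nearest neighbour."""
    if node is None:
        return best
    d = sq_dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, eps, best)
    # Visit the far side only if the splitting plane is close enough to matter.
    # Assumed rule: eps > 0 tightens the bound, trading accuracy for speed.
    if diff ** 2 * (1.0 + eps) ** 2 < best[0]:
        best = nearest(far, query, eps, best)
    return best
```

For example, `nearest(build_kdtree(feature_vectors), query_vector)` returns the closest stored vector together with its squared distance, where `feature_vectors` stands for whatever image descriptors have been indexed.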
Summary:
Large probabilistic graphs arise in various domains spanning from social networks to biological and communication networks. An important query in these graphs is the k nearest-neighbor query, which involves finding and reporting the k closest nodes to a specific node. This query assumes the existence of a measure of the "proximity" or the "distance" between any two nodes in the graph. To that end, we propose various novel distance functions that extend well known notions of classical graph theory, such as shortest paths and random walks. We argue that many meaningful distance functions are computationally intractable to compute exactly. Thus, in order to process nearest-neighbor queries, we resort to Monte Carlo sampling and exploit novel graph-transformation ideas and pruning opportunities. In our extensive experimental analysis, we explore the trade-offs of our approximation algorithms and demonstrate that they scale well on real-world probabilistic graphs with tens of millions of edges.
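The Monte Carlo approach described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each edge is a hypothetical tuple `(u, v, weight, probability)`, edges exist independently, and nodes are ranked by the median of their sampled shortest-path distances. The abstract does not specify which distance functions or pruning rules the authors use, so none of them are reproduced here.

```python
import heapq
import random
from statistics import median


def sample_world(edges, rng):
    """Draw one possible world: keep each edge independently with its probability."""
    adj = {}
    for u, v, w, p in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra(adj, source):
    """Shortest-path distances from source in one sampled (deterministic) graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def knn_median_distance(edges, source, k, samples=200, seed=0):
    """Rank nodes by the median sampled distance to source; return the k closest."""
    rng = random.Random(seed)
    nodes = {n for u, v, _, _ in edges for n in (u, v)} - {source}
    observed = {n: [] for n in nodes}
    for _ in range(samples):
        dist = dijkstra(sample_world(edges, rng), source)
        for n in nodes:
            # Unreachable nodes count as infinitely far in this world.
            observed[n].append(dist.get(n, float("inf")))
    scores = {n: median(ds) for n, ds in observed.items()}
    return sorted(nodes, key=lambda n: (scores[n], n))[:k]
```

The number of samples controls the trade-off between running time and the stability of the ranking; the default of 200 is an arbitrary choice for the sketch.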
Summary:
Research on the construction of logical overlay networks has gained significance in recent times. This is partly due to work on peer-to-peer (P2P) systems for locating and retrieving distributed data objects, and also scalable content distribution using end-system multicast techniques. However, there are emerging applications that require the real-time transport of data from various sources to potentially many thousands of subscribers, each having their own quality-of-service (QoS) constraints. This paper primarily focuses on the properties of two popular topologies found in interconnection networks, namely k-ary n-cubes and de Bruijn graphs. The regular structure of these graph topologies makes them easier to analyze and determine possible routes for real-time data than complete or irregular graphs. We show how these overlay topologies compare in their ability to deliver data according to the QoS constraints of many subscribers, each receiving data from specific publishing hosts. Comparisons are drawn on the ability of each topology to route data in the presence of dynamic system effects, due to end-hosts joining and departing the system. Finally, experimental results show the service guarantees and physical link stress resulting from efficient multicast trees constructed over both kinds of overlay networks.
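The regular structure of the two topologies named in this abstract can be made concrete with a short sketch based on their standard textbook definitions. Nodes are represented as tuples of digits, and the function names are hypothetical; the paper's routing and multicast-tree construction are not shown.

```python
def kary_ncube_neighbors(node, k):
    """Neighbours of a node (tuple of n base-k digits) in a k-ary n-cube.

    Two nodes are adjacent when they differ by +1 or -1 (mod k) in exactly
    one digit.
    """
    neighbors = set()
    for i, digit in enumerate(node):
        for step in (-1, 1):
            neighbors.add(node[:i] + ((digit + step) % k,) + node[i + 1:])
    neighbors.discard(node)  # only relevant for the degenerate k = 1 case
    return sorted(neighbors)


def de_bruijn_successors(node, k):
    """Successors of a node in a de Bruijn graph over an alphabet of k symbols.

    Each edge shifts the label left by one position and appends a new symbol.
    """
    return [node[1:] + (symbol,) for symbol in range(k)]
```

Because every node's neighbours follow from its own label, a host can compute candidate next hops locally without a global routing table, which is the property that makes such topologies easier to analyse.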
Summary:
A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. This paper proposes a novel method for approximate nearest neighbor retrieval in such spaces. Our method is embedding-based, meaning that it constructs a function that maps objects into a real vector space. The mapping preserves a large amount of the proximity structure of the original space, and it can be used to rapidly obtain a short list of likely matches to the query. The main novelty of our method is that it constructs, together with the embedding, a query-sensitive distance measure that should be used when measuring distances in the vector space. The term "query-sensitive" means that the distance measure changes depending on the current query object. We report experiments with an image database of handwritten digits, and a time-series database. In both cases, the proposed method outperforms existing state-of-the-art embedding methods, meaning that it provides significantly better trade-offs between efficiency and retrieval accuracy.
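The general filter-and-refine pattern behind embedding-based retrieval can be sketched as below. The sketch assumes a simple reference-object embedding and a weighted L1 distance; `weights_for` is a hypothetical hook standing in for a query-sensitive measure, and the way the paper actually constructs its embedding and its distance measure is not reproduced.

```python
def embed(x, references, dist):
    """Map an object to a vector of its distances to a few reference objects."""
    return [dist(x, r) for r in references]


def filter_and_refine(query, database, references, dist, p, k=1, weights_for=None):
    """Return the k best matches among the p candidates kept by the filter step."""
    q_vec = embed(query, references, dist)
    # Assumed hook: weights may depend on the query ("query-sensitive").
    weights = weights_for(q_vec) if weights_for else [1.0] * len(references)

    def embedded_distance(vec):
        return sum(w * abs(a - b) for w, a, b in zip(weights, q_vec, vec))

    # In practice the database embeddings would be precomputed offline.
    db_vecs = [embed(x, references, dist) for x in database]
    order = sorted(range(len(database)), key=lambda i: embedded_distance(db_vecs[i]))
    shortlist = [database[i] for i in order[:p]]      # cheap filter step
    # Refine step: the expensive exact distance is evaluated only p times.
    return sorted(shortlist, key=lambda x: dist(query, x))[:k]
```

The parameter `p` sets the trade-off mentioned in the abstract: a larger shortlist costs more exact distance computations but is less likely to miss the true nearest neighbour.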
Summary:
Personal communication devices are increasingly equipped with sensors that are able to collect and locally store information from their environs. The mobility of users carrying such devices, and hence the mobility of sensor readings in space and time, opens new horizons for interesting applications. In particular, we envision a system in which the collective sensing, storage and communication resources, and mobility of these devices could be leveraged to query the state of (possibly remote) neighborhoods. Such queries would have spatio-temporal constraints which must be met for the query answers to be useful. Using a simplified mobility model, we analytically quantify the benefits from cooperation (in terms of the system's ability to satisfy spatio-temporal constraints), which we show to go beyond simple space-time tradeoffs. In managing the limited storage resources of such cooperative systems, the goal should be to minimize the number of unsatisfiable spatio-temporal constraints. We show that Data Centric Storage (DCS), or "directed placement", is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose "amorphous placement", in which sensory samples are cached locally, and shuffling of cached samples is used to diffuse the sensory data throughout the whole network. We evaluate conditions under which directed versus amorphous placement strategies would be more efficient. These results lead us to propose a hybrid placement strategy, in which the spatio-temporal constraints associated with a sensory data type determine the most appropriate placement strategy for that data type. We perform an extensive simulation study to evaluate the performance of directed, amorphous, and hybrid placement protocols when applied to queries that are subject to timing constraints. Our results show that directed placement is better for queries with moderately tight deadlines, whereas amorphous placement is better for queries with looser deadlines, and that under most operational conditions, the hybrid technique gives the best compromise.
Summary:
PURPOSE: To demonstrate the feasibility of using a knowledge base of prior treatment plans to generate new prostate intensity modulated radiation therapy (IMRT) plans. Each new case would be matched against others in the knowledge base. Once the best match is identified, that clinically approved plan is used to generate the new plan. METHODS: A database of 100 prostate IMRT treatment plans was assembled into an information-theoretic system. An algorithm based on mutual information was implemented to identify similar patient cases by matching 2D beam's eye view projections of contours. Ten randomly selected query cases were each matched with the most similar case from the database of prior clinically approved plans. Treatment parameters from the matched case were used to develop new treatment plans. The differences in the dose-volume histograms between the new and the original treatment plans were analyzed. RESULTS: On average, the new knowledge-based plan achieves planning target volume coverage comparable to that of the original plan, to within 2% as evaluated for D98, D95, and D1. Similarly, the dose to the rectum and the dose to the bladder are comparable to those of the original plan. For the rectum, the mean and standard deviation of the dose percentage differences for D20, D30, and D50 are 1.8% +/- 8.5%, -2.5% +/- 13.9%, and -13.9% +/- 23.6%, respectively. For the bladder, the mean and standard deviation of the dose percentage differences for D20, D30, and D50 are -5.9% +/- 10.8%, -12.2% +/- 14.6%, and -24.9% +/- 21.2%, respectively. A negative percentage difference indicates that the new plan has greater dose sparing as compared to the original plan. CONCLUSIONS: The authors demonstrate a knowledge-based approach of using prior clinically approved treatment plans to generate clinically acceptable treatment plans of high quality. This semiautomated approach has the potential to improve the efficiency of the treatment planning process while ensuring that high quality plans are developed.
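Matching cases by mutual information, as described in the METHODS above, can be illustrated with a minimal sketch. It assumes each projection has already been flattened into a sequence of discrete pixel labels of equal length; the function names and the dictionary of cases are hypothetical, and the authors' actual preprocessing and matching pipeline is not described in enough detail to reproduce.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram of label pairs
    pa = Counter(a)              # marginal histograms
    pb = Counter(b)
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())


def best_match(query, cases):
    """Return the name of the stored case sharing the most information with query."""
    return max(cases, key=lambda name: mutual_information(query, cases[name]))
```

Mutual information is high whenever one image predicts the other, so in this sketch a mask and its inverse score as a perfect match; that property is why the measure is often used for registration-style comparisons rather than pixelwise equality.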
Summary:
BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image. 
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
Summary:
Our media is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate and tackle practical fact-checking tasks (reverse-engineering vague claims and countering questionable claims) as computational problems. Within the QRS based framework, we take one step further and propose a problem, along with efficient algorithms, for finding high-quality claims of a given form from data, i.e. raising good questions in the first place. This is achieved by using a limited number of high-valued claims to represent high-valued regions of the QRS. Besides the general purpose high-quality claim finding problem, lead-finding can be tailored towards specific claim quality measures, also defined within the QRS framework. An example of uniqueness-based lead-finding is presented for "one-of-the-few" claims, resulting in interpretable high-quality claims and an adjustable mechanism for ranking objects, e.g. NBA players, based on what claims can be made for them. Finally, we study the use of visualization as a powerful way of conveying results of a large number of claims. An efficient two-stage sampling algorithm is proposed for generating the input of a 2D scatter plot with a heatmap, evaluating a limited amount of data while preserving the two essential visual features, namely outliers and clusters. For all the problems, we present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
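The idea of perturbing a claim's parameters can be sketched with a toy claim template. Everything here is an assumption made for illustration: the template ("growth over the `width` periods ending at `end`"), the function names, and the crude "fraction of weaker perturbations" indicator, which is not the quality measure defined in the paper.

```python
def window_growth(series, end, width):
    """Claim template: growth of the series over the `width` periods ending at `end`."""
    return series[end] - series[end - width]


def response_surface(series, widths):
    """Evaluate the claim template for every valid (end, width) parameter setting."""
    return {(end, w): window_growth(series, end, w)
            for w in widths
            for end in range(w, len(series))}


def fraction_weaker(surface, claim):
    """Share of perturbed parameter settings that give a weaker result than claim."""
    original = surface[claim]
    others = [value for key, value in surface.items() if key != claim]
    return sum(value < original for value in others) / len(others)
```

A claim whose result beats most of its neighbours on the surface, while nearby settings tell a very different story, is a candidate for closer scrutiny; a claim that holds across most perturbations is more robust.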
Summary:
info:eu-repo/semantics/published
Summary:
Frogeye leaf spot (MOR, from the Spanish "mancha ojo de rana"), caused by Cercospora sojina K. Hara, belongs to the complex of late-season fungal diseases (EFC) that affect soybean (Glycine max (L.) Merr.), mainly during the reproductive stages. It is a common soybean disease in most soybean-growing regions of the world, including Argentina. With the aims of (i) studying the effect of MOR on yield and on the industrial quality of soybean grain and (ii) analysing the response to different application timings of strobilurin + triazole during the crop's reproductive cycle, in order to reduce the damage caused by this disease and avoid losses in yield and grain quality, a trial was carried out during the 2010-2011 growing season in Oncativo (Córdoba) with 2 genotypes of contrasting susceptibility to MOR. Four treatments with a mixture of triazole and strobilurin were applied to each cultivar: T1 = application at R3; T2 = application at R3+R5; T3 = diseased control (no fungicide application) and T4 = healthy control (application every 20 days from R1 onwards). Disease incidence and severity, yield, number of grains per m2 (NGm2), thousand-seed weight (PMA), and grain protein (Pr) and oil (Ac) concentration were determined. Different levels of MOR severity were observed among the treatments of cultivar DM3700. All treatments involving DM3810 showed leaves without MOR spots. A negative correlation was found between MOR severity and the yield achieved by the crop. The highest severity values (37.3 percent leaf spotting) corresponded to the lowest yields (2117 kg ha-1) in cultivar DM3700. All treatments applied to DM3810 produced the highest yields and did not differ statistically from one another (3478 kg ha-1). DM3700 reached yields similar to those of DM3810 only when the fungicide mixture was applied in T1 and T4; these two treatments did not differ from each other and showed a yield increase of approximately 35.5 percent compared with the diseased control. Both yield components decreased in the diseased DM3700 control; however, the reductions in NGm2 were more pronounced than those in PMA, so the decrease in NGm2 appeared to be the main component of the yield loss. No increases in yield, NGm2 and/or PMA attributable to the strobilurin + triazole mixture were observed in the treatments involving DM3810 in the absence of MOR. The maximum severity reached by MOR in the most susceptible genotype (37.3 percent of leaf area damaged in the diseased DM3700 treatment) was not sufficient to cause a significant decrease in oil accumulation. To our knowledge, this work is the first to show that genotypes susceptible to MOR lose yield, but not grain chemical quality, when severity does not exceed 37.3 percent, and that they can be controlled with a single application of strobilurin + triazole at R3. It is concluded that choosing genotypes with low susceptibility to MOR (such as DM3810) is effective for controlling C. sojina, with no additional yield benefit from fungicide application.
Summary:
p.317-323