882 results for content discovery and ranking
Abstract:
Collections of biological specimens are fundamental to scientific understanding and characterization of natural diversity - past, present and future. This paper presents a system for liberating useful information from physical collections by bringing specimens into the digital domain so they can be more readily shared, analyzed, annotated and compared. It focuses on insects and is strongly motivated by the desire to accelerate and augment current practices in insect taxonomy which predominantly use text, 2D diagrams and images to describe and characterize species. While these traditional kinds of descriptions are informative and useful, they cannot cover insect specimens "from all angles" and precious specimens are still exchanged between researchers and collections for this reason. Furthermore, insects can be complex in structure and pose many challenges to computer vision systems. We present a new prototype for a practical, cost-effective system of off-the-shelf components to acquire natural-colour 3D models of insects from around 3 mm to 30 mm in length. ("Natural-colour" is used to contrast with "false-colour", i.e., colour generated from, or applied to, gray-scale data post-acquisition.) Colour images are captured from different angles and focal depths using a digital single lens reflex (DSLR) camera rig and two-axis turntable. These 2D images are processed into 3D reconstructions using software based on a visual hull algorithm. The resulting models are compact (around 10 megabytes), afford excellent optical resolution, and can be readily embedded into documents and web pages, as well as viewed on mobile devices. The system is portable, safe, relatively affordable, and complements the sort of volumetric data that can be acquired by computed tomography. This system provides a new way to augment the description and documentation of insect species holotypes, reducing the need to handle or ship specimens. 
It opens up new opportunities to collect data for research, education, art, entertainment, biodiversity assessment and biosecurity control. © 2014 Nguyen et al.
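The visual hull idea mentioned in this abstract can be illustrated with a toy voxel-carving sketch. Everything here is an assumption made for illustration (a 9×9×9 grid, a sphere as the object, three axis-aligned orthographic views); it is not the authors' software, which works from calibrated camera images taken at many angles.

```python
from itertools import product

N = 9                      # voxels per side (assumed toy resolution)
CENTER, RADIUS = 4, 3.2    # assumed toy object: a sphere


def inside_object(x, y, z):
    """Ground-truth toy object used only to generate silhouettes."""
    return (x - CENTER) ** 2 + (y - CENTER) ** 2 + (z - CENTER) ** 2 <= RADIUS ** 2


def project(voxel, axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    return tuple(c for i, c in enumerate(voxel) if i != axis)


def silhouette(axis):
    """Set of 2D pixels covered by the object when viewed along `axis`."""
    return {project(v, axis) for v in product(range(N), repeat=3) if inside_object(*v)}


def visual_hull(silhouettes):
    """Keep a voxel only if it projects inside every silhouette."""
    return {
        v
        for v in product(range(N), repeat=3)
        if all(project(v, axis) in sil for axis, sil in silhouettes.items())
    }


silhouettes = {axis: silhouette(axis) for axis in range(3)}
hull = visual_hull(silhouettes)
sphere = {v for v in product(range(N), repeat=3) if inside_object(*v)}
print(len(sphere), len(hull))
```

The hull always contains the true object but can be larger, because silhouettes alone cannot reveal material that no view rules out; adding more viewing angles tightens the result.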
Abstract:
We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume, substantially more than were found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders, such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was MACROD2, an autism susceptibility gene. We replicated the effect of the MACROD2 gene in an independent cohort of 564 healthy young adult Australian twins and siblings scanned with MRI (mean age: 23.8±2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.
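As a rough illustration of how LASSO selects a sparse set of variants, the sketch below minimises (1/2n)·||y − Xb||² + λ·||b||₁ by cyclic coordinate descent on synthetic data. The genotype matrix, effect sizes, penalty value, and function names are all assumptions made for this example; it is not the authors' pipeline and uses no real ADNI data.

```python
import random


def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty` (proximal step for the L1 norm)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic example: 6 SNPs coded 0/1/2; only SNPs 0 and 3 affect the trait.
random.seed(0)
n_subjects, n_snps = 200, 6
genotypes = [[random.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_subjects)]
means = [sum(row[j] for row in genotypes) / n_subjects for j in range(n_snps)]
X = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]
trait = [1.5 * row[0] - 1.0 * row[3] + random.gauss(0, 0.5) for row in X]
trait_mean = sum(trait) / n_subjects
y = [t - trait_mean for t in trait]

beta = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected, [round(b, 3) for b in beta])
```

The L1 penalty drives coefficients of uninformative SNPs toward exactly zero, so the SNPs that survive within a gene can then be tested jointly, which is the step the abstract describes.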
Abstract:
The caudate is a subcortical brain structure implicated in many common neurological and psychiatric disorders. To identify specific genes associated with variations in caudate volume, structural magnetic resonance imaging and genome-wide genotypes were acquired from two large cohorts, the Alzheimer's Disease NeuroImaging Initiative (ADNI; N=734) and the Brisbane Adolescent/Young Adult Longitudinal Twin Study (BLTS; N=464). In a preliminary analysis of heritability, around 90% of the variation in caudate volume was due to genetic factors. We then conducted genome-wide association to find common variants that contribute to this relatively high heritability. Replicated genetic association was found for the right caudate volume at single-nucleotide polymorphism rs163030 in the ADNI discovery sample (P = 2.36 × 10⁻⁶) and in the BLTS replication sample (P = 0.012). This genetic variation accounted for 2.79% and 1.61% of the trait variance, respectively. The peak of association was found in and around two genes, WDR41 and PDE8B, involved in dopamine signaling and development. In addition, a previously identified mutation in PDE8B causes a rare autosomal-dominant type of striatal degeneration. Searching across both samples offers a rigorous way to screen for genes consistently influencing brain structure at different stages of life. Variants identified here may be relevant to common disorders affecting the caudate.
Abstract:
The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publication 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications. 
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
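The hypothesis that similar contexts imply similar meanings can be sketched with co-occurrence vectors and cosine similarity. The toy corpus, the window size, and the one-word stop list below are assumptions chosen for illustration; the thesis publications use their own context representations and learning methods.

```python
import math
from collections import Counter

STOPWORDS = {"the"}  # assumed minimal stop list for this toy corpus


def context_vectors(sentences, window=2):
    """Map each word to a bag of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the doctor examined the patient",
    "the physician examined the patient",
    "the bank approved the loan",
    "the bank raised the rate",
]
vectors = context_vectors(corpus)
sim_synonyms = cosine(vectors["doctor"], vectors["physician"])
sim_unrelated = cosine(vectors["doctor"], vectors["bank"])
print(sim_synonyms, sim_unrelated)
```

Words that occur in the same contexts ("doctor", "physician") score high, while words with disjoint contexts score zero; ranking candidates by such a score is one simple way to build synonym lists.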
Abstract:
Market microstructure is “the study of the trading mechanisms used for financial securities” (Hasbrouck (2007)). It seeks to understand the sources of value and reasons for trade, in a setting with different types of traders, and different private and public information sets. The actual mechanisms of trade are a continually changing object of study. These include continuous markets, auctions, limit order books, dealer markets, or combinations of these operating as a hybrid market. Microstructure also has to allow for the possibility of multiple prices. At any given time an investor may be faced with a multitude of different prices, depending on whether he or she is buying or selling, the quantity he or she wishes to trade, and the required speed for the trade. The price may also depend on the relationship that the trader has with potential counterparties. In this research, I touch upon all of the above issues. I do this by studying three specific areas, all of which have both practical and policy implications. First, I study the role of information in trading and pricing securities in markets with a heterogeneous population of traders, some of whom are informed and some not, and who trade for different private or public reasons. Second, I study the price discovery of stocks in a setting where they are simultaneously traded in more than one market. Third, I make a contribution to the ongoing discussion about market design, i.e. the question of which trading systems and ways of organizing trading are most efficient. A common characteristic throughout my thesis is the use of high frequency datasets, i.e. tick data. These datasets include all trades and quotes in a given security, rather than just the daily closing prices, as in traditional asset pricing literature. This thesis consists of four separate essays. In the first essay I study price discovery for European companies cross-listed in the United States. 
I also study explanatory variables for differences in price discovery. In my second essay I contribute to earlier research on two issues of broad interest in market microstructure: market transparency and informed trading. I examine the effects of a change to an anonymous market at the OMX Helsinki Stock Exchange. I broaden my focus slightly in the third essay, to include releases of macroeconomic data in the United States. I analyze the effect of these releases on European cross-listed stocks. The fourth and last essay examines the uses of standard methodologies of price discovery analysis in a novel way. Specifically, I study price discovery within one market, between local and foreign traders.
Abstract:
Mobile P2P technology provides a scalable approach for content delivery to a large number of users on their mobile devices. In this work, we study the dissemination of a single item of content (e.g., an item of news, a song or a video clip) among a population of mobile nodes. Each node in the population is either a destination (interested in the content) or a potential relay (not yet interested in the content). There is an interest evolution process by which nodes not yet interested in the content (i.e., relays) can become interested (i.e., become destinations) on learning about the popularity of the content (i.e., the number of already interested nodes). In our work, the interest in the content evolves under the linear threshold model. The content is copied between nodes when they make random contact. For this we employ a controlled epidemic spread model. We model the joint evolution of the copying process and the interest evolution process, and derive joint fluid limit ordinary differential equations. We then study the selection of parameters under the content provider's control, for the optimization of various objective functions that aim at maximizing content popularity and efficient content delivery.
Abstract:
We consider a setting in which a single item of content is disseminated in a population of mobile nodes by opportunistic copying when pairs of nodes come in radio contact. The nodes in the population may either be interested in receiving the content (referred to as destinations) or not yet interested in receiving the content (referred to as relays). We consider a model for the evolution of popularity, the process by which relays get converted into destinations. A key contribution of our work is to model and study the joint evolution of content popularity and its spread in the population. Copying the content to relay nodes is beneficial since they can help spread the content to destinations, and could themselves be converted into destinations. We derive a fluid limit for the joint evolution model and obtain optimal policies for copying to relay nodes in order to deliver content to a desired fraction of destinations, while limiting the fraction of relay nodes that get the content but never turn into destinations. We prove that a time-threshold policy is optimal for controlling the copying to relays, i.e., there is an optimal time-threshold up to which all opportunities for copying to relays are exploited, and after which relays are not copied to. We then utilize simulations and numerical evaluations to provide insights into the effects of various system parameters on the optimally controlled co-evolution model.
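A time-threshold policy of the kind described here can be explored numerically with a small fluid-model sketch. The differential equations, the rate constants, and the function name `simulate` below are assumptions invented for this illustration, not the equations derived in the paper: carriers copy to destinations at every contact, copy to relays only before time `tau`, and relays become destinations at a rate proportional to the current fraction of destinations.

```python
def simulate(tau, beta=1.0, gamma=0.1, d0=0.3, x0=0.01, horizon=10.0, dt=0.01):
    """Euler-integrate an assumed toy fluid model; return (x, y, d) at the horizon.

    d: fraction of nodes that are destinations (interested)
    x: fraction of nodes that are destinations holding the content
    y: fraction of nodes that are relays holding the content
    """
    d, x, y = d0, x0, 0.0
    for step in range(int(round(horizon / dt))):
        t = step * dt
        copy_to_relays = 1.0 if t < tau else 0.0  # the time-threshold control
        carriers = x + y                           # every node holding the content
        convert = gamma * d                        # assumed per-relay conversion rate
        dx = beta * carriers * (d - x) + convert * y
        dy = copy_to_relays * beta * carriers * ((1.0 - d) - y) - convert * y
        dd = convert * (1.0 - d)
        x, y, d = x + dt * dx, y + dt * dy, d + dt * dd
    return x, y, d


for tau in (0.0, 3.0, 10.0):
    x, y, d = simulate(tau)
    print(f"tau={tau:4.1f}  delivered={x / d:.3f}  relay_copies={y:.3f}")
```

Sweeping `tau` exposes the trade-off the abstract describes: a later threshold speeds delivery to destinations but leaves more copies on relays that may never become interested.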
Abstract:
Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase, as well as during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code comprised of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. 
Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.
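The core idea of matching an executable math expression against recorded unit tests can be sketched as follows. This is a deliberately simplified illustration restricted to scalar functions; the method names (`Lib.hypot`, `Lib.add`, `Lib.mul`), the test-record format, and the function `mine_unit_tests` are hypothetical and are not MATHFINDER's actual interface or algorithm.

```python
import math


def mine_unit_tests(expression, unit_tests, tol=1e-9):
    """Return methods whose every recorded test output agrees with `expression`.

    Each unit test is a (method_name, inputs, expected_output) record.
    """
    agrees = {}
    for method_name, inputs, expected in unit_tests:
        ok = abs(expression(*inputs) - expected) <= tol
        agrees[method_name] = agrees.get(method_name, True) and ok
    return sorted(name for name, ok in agrees.items() if ok)


# Hypothetical unit-test records harvested from an imaginary math library.
unit_tests = [
    ("Lib.hypot", (3.0, 4.0), 5.0),
    ("Lib.hypot", (5.0, 12.0), 13.0),
    ("Lib.add", (3.0, 4.0), 7.0),
    ("Lib.mul", (3.0, 4.0), 12.0),
]


def target(x, y):
    """Executable specification of the expression sqrt(x^2 + y^2)."""
    return math.sqrt(x * x + y * y)


candidates = mine_unit_tests(target, unit_tests)
print(candidates)
```

Because each test record is checked independently of the others, the loop parallelises naturally over tests, which is consistent with the data-parallel variant the abstract mentions.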
Abstract:
Service provisioning in assisted living environments faces distinct challenges due to the heterogeneity of networks, access technologies, and sensing/actuation devices in such environments. Existing solutions, such as SOAP-based web services, can interconnect heterogeneous devices and services, and can be published, discovered and invoked dynamically. However, SOAP is considered heavier than a smart-environment context requires and hence suffers from performance degradation. Alternatively, REpresentational State Transfer (REST) has gained much attention from the community and is considered a lighter and cleaner technology than SOAP-based web services. Since it is simple to publish and use a RESTful web service, more and more service providers are moving toward REST-based solutions, which promote a resource-centric conceptualization as opposed to a service-centric one. Despite these benefits, the dynamic discovery and eventing of RESTful services are still considered a major hurdle to realizing the full potential of REST-based approaches. In this paper, we address this issue by providing a RESTful discovery and eventing specification and demonstrating it in an assisted living healthcare scenario. We envisage that through this approach, service provisioning in ambient assisted living or other smart environment settings will be more efficient, timely, and less resource-intensive.