988 results for Search procedures


Relevance:

20.00%

Publisher:

Abstract:

Segmentation is a data mining technique that yields simplified representations of sequences of ordered points. A sequence is divided into a number of homogeneous segments, and all points within a segment are described by a single value. The focus of this thesis is on piecewise-constant segments, for which the most likely description of each segment and the most likely segmentation into a given number of segments can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can serve as a tool for learning about the structure of a given sequence. The discussion begins with basic questions of segmentation analysis, such as choosing the number of segments and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure: applying segmentation to certain features of a sequence is shown to yield segmentations significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former concerns segmentations where the segment descriptions first increase and then decrease; the latter concerns the interplay between different dimensions and segments in the sequence. Both problems are formally defined, and algorithms for solving them are provided and analyzed. Practical applications of segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis, segmentation applications are demonstrated in the analysis of genomic sequences.
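The "most likely segmentation" of a piecewise-constant model can be computed with classical dynamic programming. Below is a minimal sketch under a squared-error segment cost (one standard choice for piecewise-constant descriptions); the function names are illustrative, not the thesis's code.

```python
import numpy as np

def segment(seq, k):
    """Optimal piecewise-constant segmentation of seq into k segments,
    minimizing total squared error, via dynamic programming (O(k * n^2))."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    # Prefix sums let us evaluate the squared-error cost of any segment in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(seq)])
    s2 = np.concatenate([[0.0], np.cumsum(seq ** 2)])

    def cost(i, j):  # cost of one segment covering seq[i:j], described by its mean
        m = s1[j] - s1[i]
        return (s2[j] - s2[i]) - m * m / (j - i)

    # err[p][j] = best cost of splitting seq[:j] into p segments
    err = np.full((k + 1, n + 1), np.inf)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    err[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                c = err[p - 1][i] + cost(i, j)
                if c < err[p][j]:
                    err[p][j], cut[p][j] = c, i
    # Recover segment boundaries by backtracking through the cut table.
    bounds, j = [], n
    for p in range(k, 0, -1):
        bounds.append((cut[p][j], j))
        j = cut[p][j]
    return bounds[::-1], err[k][n]

print(segment([1, 1, 1, 5, 5, 5, 2, 2], 3))  # three flat segments, zero error
```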

Relevance:

20.00%

Publisher:

Abstract:

XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. We therefore focus on content stored in XML format as we develop such indexing methods. Because XML is used for content ranging all the way from records of data fields to narrative full texts, methods for information retrieval face a new challenge: identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate full-text from data. As a result, by selecting the XML fragments to be indexed, we are able both to reduce the size of the index by 5-6% and to improve retrieval precision. Besides being challenging, XML comes with many unexplored opportunities that have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. Such tagged content constitutes phrases that are descriptive of the content and useful for full-text search. These phrases are simple to detect in XML documents, but easy to confuse with other inline-level text. Nonetheless, search results improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content, including titles, captions, and references, is associated with the indexed full-text. Experimental results show that, at least for certain types of document collections, the proposed methods help us find the relevant answers. Even when we know nothing about the document structure beyond the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
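The thesis's criterion for separating full-text from data rests on analysing tags against character content; as a loose illustration only, the sketch below uses a naive length-based heuristic (an assumption for demonstration, not the thesis's method) to decide which XML fragments to route to a full-text index.

```python
import xml.etree.ElementTree as ET

def classify(elem, min_words=15):
    """Illustrative heuristic: treat an element as full-text if its
    concatenated character content forms a reasonably long word sequence,
    otherwise as a data field. (The thesis's actual criterion is a more
    careful analysis of the relation between tags and character content.)"""
    words = " ".join(elem.itertext()).split()
    return "full-text" if len(words) >= min_words else "data"

doc = ET.fromstring(
    "<article><id>4711</id><title>On XML Retrieval</title>"
    "<body>A longer narrative passage that a full-text index should cover, "
    "with enough running words to look like prose rather than a record field."
    "</body></article>")
for child in doc:
    print(child.tag, "->", classify(child))  # id: data, title: data, body: full-text
```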

Relevance:

20.00%

Publisher:

Abstract:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem: searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine, i.e. they should also hold in future data. This is an important distinction from traditional association rules, which, in spite of their name and a similar appearance to dependency rules, do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for rules using statistical significance measures. Another important objective is to search only for non-redundant rules, which express the real causes of dependence without any incidental extra factors. The extra factors add no new information on the dependence; they can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measure. The mathematical theory is complemented by a new algorithmic invention which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test, and can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better, but undiscovered, dependencies.
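As an example of scoring a single dependency rule X -> A with one of the measures mentioned, the sketch below computes a one-sided Fisher's exact test p-value from the rule's 2x2 contingency table, assuming SciPy is available; the thesis's actual contribution, the pruning theory and the efficient search over all rules, is not reproduced here.

```python
from scipy.stats import fisher_exact

def rule_p_value(data, X, A):
    """p-value (one-sided Fisher's exact test) for the dependency rule X -> A
    in binary data: each row is a transaction, given as its set of
    positive-valued attributes."""
    n11 = sum(1 for row in data if X <= row and A in row)      # X and A
    n10 = sum(1 for row in data if X <= row and A not in row)  # X without A
    n01 = sum(1 for row in data if not X <= row and A in row)
    n00 = sum(1 for row in data if not X <= row and A not in row)
    # A small p-value means the co-occurrence of X and A is unlikely
    # under the independence assumption.
    _, p = fisher_exact([[n11, n10], [n01, n00]], alternative="greater")
    return p

data = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"},
        {"b"}, {"c"}, set(), {"a", "b"}]
print(rule_p_value(data, X={"a"}, A="b"))
```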

Relevance:

20.00%

Publisher:

Abstract:

Progress in crop improvement is limited by the ability to identify favourable combinations of genotypes (G) and management practices (M) in relevant target environments (E), given the resources available to search among the myriad of possible combinations. To underpin yield advance, we require prediction of phenotype based on genotype. In plant breeding, traditional phenotypic selection methods have involved measuring the phenotypic performance of large segregating populations in multi-environment trials and applying rigorous statistical procedures, based on quantitative genetic theory, to identify superior individuals. Recent developments in the ability to inexpensively and densely map and sequence genomes have facilitated a shift from the level of the individual (genotype) to the level of the genomic region. Molecular breeding strategies using genome-wide prediction and genomic selection approaches have developed rapidly. However, their applicability to complex traits remains constrained by gene-gene and gene-environment interactions, which restrict the predictive power of associations of genomic regions with phenotypic responses. Here it is argued that crop ecophysiology and functional whole-plant modelling can provide an effective link between the molecular and organism scales and enhance molecular breeding by adding value to genetic prediction approaches. A physiological framework that facilitates the dissection and modelling of complex traits can inform phenotyping methods for marker/gene detection and underpin the prediction of likely phenotypic consequences of trait and genetic variation in target environments. This approach holds considerable promise for more effectively linking genotype to phenotype for complex adaptive traits. Specific examples focused on drought adaptation are presented to highlight the concepts.

Relevance:

20.00%

Publisher:

Abstract:

A simple yet efficient method for the minimization of incompletely specified sequential machines (ISSMs) is proposed. Precise theorems are developed, as a consequence of which several compatibles can be deleted from consideration at the very first stage in the search for a minimal closed cover; the computational work is thus significantly reduced. The initial cardinality of the minimal closed cover is further reduced by considering the maximal compatibles (MCs) only; as a result, the method converges to the solution faster than existing procedures. The "rank" of a compatible is defined, and it is shown that ordering the compatibles in accordance with their rank reduces the number of comparisons to be made in the search for exclusion of compatibles. The new method is simple, systematic, and programmable. It does not involve any heuristics or intuitive procedures, and for small- and medium-sized machines it can be used for hand computation as well. For one of the illustrative examples used in this paper, 30 out of 40 compatibles can be ignored in accordance with the proposed rules, and only the remaining 10 compatibles need be considered for obtaining a minimal solution.
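The paper's theorems and rank ordering are not reproduced here, but the notion of compatibles they operate on can be illustrated. The sketch below, using a small hypothetical machine of my own construction, computes all compatible state pairs in the standard pair-chart fashion: outputs must agree wherever both are specified, and every implied next-state pair must itself remain compatible.

```python
from itertools import combinations

# Hypothetical ISSM: state -> {input: (next_state, output)}; None = unspecified.
machine = {
    "A": {0: ("B", 0),     1: ("C", None)},
    "B": {0: ("A", 0),     1: (None, 1)},
    "C": {0: (None, None), 1: ("A", 1)},
}

def compatible_pairs(m, inputs=(0, 1)):
    """Pair-chart computation: keep a pair of states if their outputs agree
    wherever both are specified, then iteratively discard pairs whose
    implied next-state pairs are incompatible, until a fixed point."""
    pairs = {frozenset((s, t)) for s, t in combinations(m, 2)
             if all(m[s][i][1] is None or m[t][i][1] is None
                    or m[s][i][1] == m[t][i][1] for i in inputs)}
    changed = True
    while changed:
        changed = False
        for p in list(pairs):
            s, t = tuple(p)
            for i in inputs:
                ns, nt = m[s][i][0], m[t][i][0]
                if ns and nt and ns != nt and frozenset((ns, nt)) not in pairs:
                    pairs.discard(p)
                    changed = True
                    break
    return pairs

print(compatible_pairs(machine))  # every compatible state pair
```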

Relevance:

20.00%

Publisher:

Abstract:

A method is described that yields optical Barker codes of the smallest known lengths for a given discrimination.

Relevance:

20.00%

Publisher:

Abstract:

The following problem is considered: given the locations of the Central Processing Unit (CPU) and the terminals which have to communicate with it, determine the number and locations of the concentrators, and assign the terminals to the concentrators in such a way that the total cost is minimized. There is also a fixed cost associated with each concentrator, and there is an upper limit to the number of terminals which can be connected to a concentrator. The terminals can also be connected directly to the CPU. In this paper it is assumed that the concentrators can be located anywhere in the area A containing the CPU and the terminals; the task then becomes a multimodal optimization problem. In the proposed algorithm a stochastic automaton is used as a search device to locate the minimum of the multimodal cost function. The algorithm proceeds as follows. The area A containing the CPU and the terminals is divided into an arbitrary number of regions, say K. An approximate value for the number of concentrators, say m, is assumed; the optimum number is determined later by iteration. The m concentrators can be assigned to the K regions in m^K ways (m > K) or K^m ways (K > m); all possible assignments are feasible, i.e. a region can contain 0, 1, ..., m concentrators. Each possible assignment is taken to represent a state of the variable-structure stochastic automaton. To start with, all states are assigned equal probabilities. At each stage of the search the automaton visits a state according to the current probability distribution, and at each visit it selects a 'point' inside that state with uniform probability. The cost associated with that point is calculated and the average cost of that state is updated; the probabilities of all states are then updated, being taken to be inversely proportional to the average costs of the states. After a certain number of searches the search probabilities become stationary and the automaton visits one particular state again and again; the automaton is then said to have converged to that state. A local gradient search within that state finally determines the exact locations of the concentrators. The algorithm was applied to a set of test problems and the results were compared with those given by Cooper's (1964, 1967) EAC algorithm; on average, the proposed algorithm was found to perform better.
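A minimal sketch of the automaton's search loop described above, under stated assumptions: the state count, the cost function, and all names are hypothetical stand-ins, the 'point inside a state' is abstracted into a single random cost sample, and the concluding local gradient search is omitted.

```python
import random

def automaton_search(n_states, sample_cost, n_iter=2000, eps=1e-9):
    """Variable-structure stochastic automaton: repeatedly visit a state
    drawn from the current probability distribution, sample a random point
    in it, and re-weight the states inversely to their running average cost."""
    total = [0.0] * n_states            # accumulated cost per state
    visits = [0] * n_states
    probs = [1.0 / n_states] * n_states  # start from the uniform distribution
    for _ in range(n_iter):
        s = random.choices(range(n_states), weights=probs)[0]
        total[s] += sample_cost(s)       # cost of a uniformly sampled point
        visits[s] += 1
        # Unvisited states fall back to the global average cost.
        avg = [total[i] / visits[i] if visits[i]
               else sum(total) / max(1, sum(visits)) for i in range(n_states)]
        # State probabilities inversely proportional to average cost.
        inv = [1.0 / (a + eps) for a in avg]
        probs = [v / sum(inv) for v in inv]
    return max(range(n_states), key=lambda i: probs[i])

# Illustrative multimodal cost: state 3 hides the cheapest points.
best = automaton_search(8, lambda s: abs(s - 3) + random.random())
print("converged to state", best)
```

In the paper, once the automaton has converged to a state (an assignment of concentrators to regions), a local gradient search within that state determines the exact concentrator locations.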

Relevance:

20.00%

Publisher:

Abstract:

Current smartphones have a storage capacity of several gigabytes, and more and more information is stored on mobile devices. To meet the challenge of organizing this information, we turn to desktop search. Users often possess multiple devices and synchronize (subsets of) information between them, which makes file synchronization all the more important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text, and the EXIF data of JPEG files may be used in finding them. User-defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query; the Dessy prototype uses the BM25 ranking function, widely used in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization while optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees, and it allows finding and synchronizing files that reside on remote computers or on the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization while also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine over a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption, with the symmetric encryption keys exchanged via RSA key exchange. Dessy emphasizes extensibility: even the cryptography can be extended. Users may tag their files with context tags and control custom file metadata, and adding new indexed file types, metadata fields, ranking methods, and index types is easy. Finding files is done with virtual directories, which are views into the user's files, browseable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind: it requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
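The abstract names BM25 as Dessy's ranking function. The following is a minimal, self-contained sketch of BM25 scoring (one common variant, with the usual defaults k1 = 1.2 and b = 0.75); it is illustrative only, not Dessy's actual implementation.

```python
import math
from collections import Counter

def bm25(query, raw_docs, k1=1.2, b=0.75):
    """Score each document against a query with the BM25 ranking function."""
    docs = [doc.lower().split() for doc in raw_docs]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N   # average document length
    scores = []
    for d in docs:
        tf = Counter(d)                     # term frequencies in this document
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for doc in docs if term in doc)  # document frequency
            if df == 0:
                continue
            idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
            # Term-frequency saturation and document-length normalization.
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = ["mobile desktop search", "file synchronization on mobile devices",
        "desktop search and ranking"]
print(bm25("desktop search", docs))
```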

Relevance:

20.00%

Publisher:

Abstract:

We explore how a standardization effort (i.e., when a firm pursues standards to further innovation) involves different search processes for knowledge and innovation outcomes. Using an inductive case study of Vanke, a leading Chinese property developer, we show how varying degrees of knowledge complexity and codification combine to produce a typology of four types of search process: active, integrative, decentralized and passive, resulting in four types of innovation outcome: modular, radical, incremental and architectural. We argue that when the standardization effort in a firm involves highly codified knowledge, incremental and architectural innovation outcomes are fostered, while modular and radical innovations are hindered. We discuss how standardization efforts can result in a second-order innovation capability, and conclude by calling for comparative research in other settings to understand how standardization efforts can be suited to different types of search process in different industry contexts.

Relevance:

20.00%

Publisher:

Abstract:

Austria and Finland are persistently referred to as the “success stories” of post-1945 European history. Notwithstanding their different points of departure, in the course of the Cold War both countries portrayed themselves as small, neutral border-states in a world dictated by superpower politics. By the 1970s, both countries frequently ranked at the top end of various international classifications of economic development and societal well-being, a trend that continues today. The study scrutinizes the concept of consensus, which figures centrally in the two national narratives of post-1945 success. Given that the two domestic contexts as such share only a few direct links with one another, and are more obviously different than similar in terms of geographical location, historical experiences, and politico-cultural traditions, the analogies and variations in the anatomies of the post-1945 “cultures of consensus” provide an interesting topic for a comparative, cross-national historical examination. The main research question concerns the identification and analysis of the conceptual and procedural convergence points of the concepts of the state and consensus. The thesis is divided into six main chapters. After the introduction, the second chapter presents the theoretical framework in more detail by focusing on the key concepts of the study, the state and consensus; it also introduces the comparative historical and cross-national research angles. Chapter three grounds the key concepts of the state and consensus in the historical contexts of Austria and Finland by discussing the state, the nation, and democracy in a longer-term comparative perspective. The fourth and fifth chapters present case studies on the two policy fields, the “pillars”, upon which the post-1945 Austrian and Finnish cultures of consensus are argued to have rested: chapter four deals with neo-corporatist features in economic policy making, and chapter five discusses the building up of domestic consensus regarding the key concepts of neutrality policies in the 1950s and 1960s. The study concludes that it was not consensus as such but the strikingly intense preoccupation with the theme of domestic consensus that cross-cut, in a curiously analogous manner, the policy-making processes studied. The main challenge for the post-1945 architects of the Austrian and Finnish cultures of consensus was to find strategies and concepts for consensus-building that would be compatible with the principles of democracy. At the level of procedures, the most important finding of the study concerns the triangular mechanism of coordination, consultation, and cooperation that set in motion and facilitated a new type of search for consensus in both post-war societies. In this triangle the agency of the state was central, though in varying ways. New conceptions of a small state’s position in the Cold War world also prompted a cross-nationally perceivable willingness to reconsider inherited concepts and procedures of the state and the nation. At the same time, the ways of understanding the role of the state and its relation to society remained profoundly different in Austria and Finland, and this basic difference was in many ways reflected in the concepts and procedures deployed in the search for consensus and in the management of domestic conflicts. For more detailed information, please consult the author.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we first describe a framework to model the sponsored search auction on the web as a mechanism design problem. Using this framework, we describe two well-known mechanisms for sponsored search auctions: Generalized Second Price (GSP) and Vickrey-Clarke-Groves (VCG). We then derive a new mechanism for the sponsored search auction which we call the optimal (OPT) mechanism. The OPT mechanism maximizes the search engine's expected revenue while achieving Bayesian incentive compatibility and individual rationality for the advertisers. We then undertake a detailed comparative study of the mechanisms GSP, VCG, and OPT. We compute and compare the expected revenue earned by the search engine under the three mechanisms when the advertisers are symmetric and certain special conditions are satisfied. We also compare the three mechanisms in terms of incentive compatibility, individual rationality, and computational complexity. Note to Practitioners: The advertiser-supported web site is one of the successful business models in the emerging web landscape. When an Internet user enters a keyword (i.e., a search phrase) into a search engine, the user gets back a page with results containing the links most relevant to the query and also sponsored links (also called paid advertisement links). When a sponsored link is clicked, the user is directed to the corresponding advertiser's web page, and the advertiser pays the search engine in some appropriate manner for sending the user to its web page. Against every search performed by any user on any keyword, the search engine faces the problem of matching a set of advertisers to the sponsored slots; in addition, it needs to decide on a price to be charged to each advertiser. Due to increasing demand for Internet advertising space, most search engines currently use auction mechanisms for this purpose, called sponsored search auctions. A significant percentage of the revenue of Internet giants such as Google, Yahoo!, and MSN comes from sponsored search auctions. In this paper, we study two auction mechanisms, GSP and VCG, which are quite popular in the sponsored search auction context, and pursue the objective of designing a mechanism superior to both. In particular, we propose a new mechanism which we call the OPT mechanism. This mechanism maximizes the search engine's expected revenue subject to achieving Bayesian incentive compatibility and individual rationality. Bayesian incentive compatibility guarantees that it is optimal for each advertiser to bid his or her true value provided that all other agents also bid their respective true values. Individual rationality ensures that the agents participate voluntarily in the auction, since they are assured of a non-negative payoff by doing so.
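As a minimal illustration of GSP, the sketch below allocates slots to the highest bidders and charges each winner, per click, the bid of the advertiser ranked immediately below. Quality scores and click-through rates, which real search engines factor into the ranking, are deliberately omitted, and the OPT mechanism's revenue-optimal pricing is not reproduced here.

```python
def gsp(bids, n_slots):
    """Generalized Second Price auction (simplified: no quality scores).
    Slots go to the highest bidders; each winner pays the next-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    outcome = []
    for slot in range(min(n_slots, len(ranked))):
        advertiser, _ = ranked[slot]
        # Price per click = bid of the advertiser ranked one position lower
        # (0 if there is none), which is what makes this "second price".
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        outcome.append((slot, advertiser, price))
    return outcome

bids = {"alice": 3.0, "bob": 2.5, "carol": 1.0}
print(gsp(bids, n_slots=2))  # [(0, 'alice', 2.5), (1, 'bob', 1.0)]
```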

Relevance:

20.00%

Publisher:

Abstract:

For many, particularly in the Anglophone world and Western Europe, it may be obvious that Google has a monopoly over online search and advertising, and that this is an undesirable state of affairs due to Google's ability to mediate information flows online. The baffling question may be why governments and regulators are doing little to nothing about this situation, given the increasingly pivotal importance of the internet and of free-flowing communications in our lives. However, the law concerning monopolies, namely antitrust or competition law, works in what the general public may see as a less intuitive way. Monopolies themselves are not illegal. Conduct that is unlawful, i.e. abuse of that market power, is defined by a complex set of rules and revolves principally around economic harm suffered due to anticompetitive behavior. The effect of information monopolies over search, such as Google's, is more than just economic, yet competition law does not address this. Furthermore, Google's collection and analysis of user data and its portfolio of related services make it difficult for others to compete. Such a situation may also explain why Google's established search rivals, Bing and Yahoo, have not managed to provide services that are as effective or popular as Google's own (on this issue see also the texts by Dirk Lewandowski and Astrid Mager in this reader). Users, however, are not entirely powerless. Google's business model rests, at least partially, on them, and especially on the data collected about them. If they stop using Google, then Google is nothing.

Relevance:

20.00%

Publisher:

Abstract:

Because of limited sensor and communication ranges, designing efficient mechanisms for cooperative tasks is difficult. In this article, several negotiation schemes for multiple agents performing a cooperative task are presented. The negotiation schemes provide suboptimal solutions, but have the attractive features of fast decision-making and scalability to a large number of agents without increasing the complexity of the algorithm. A software agent architecture for the decision-making process is also presented. The effect of the magnitude of information flow during the negotiation process is studied by using different models of the negotiation scheme. The performance of the various negotiation schemes, using different information structures, is studied in terms of the uncertainty reduction achieved for a specified number of search steps. The negotiation schemes perform comparably to the optimal strategy in terms of uncertainty reduction, while requiring very low computational time, about 7 per cent of that of the optimal strategy. Finally, an analysis of the computational and communication requirements of the negotiation schemes is carried out.

Relevance:

20.00%

Publisher:

Abstract:

Site-specific geotechnical data are always random and variable in space. In the present study, a procedure for quantifying the variability in geotechnical characterization and design parameters is discussed using site-specific cone tip resistance data (qc) obtained from the static cone penetration test (SCPT). The parameters for the spatial variability modeling of geotechnical parameters, i.e. (i) the existing trend function in the in situ qc data; (ii) second-moment statistics, i.e. the mean, variance, and autocorrelation structure of the soil strength and stiffness parameters; and (iii) inputs from the spatial correlation analysis, are utilized in numerical modeling procedures using the finite difference numerical code FLAC 5.0. The influence of considering spatially variable soil parameters on reliability-based geotechnical design is studied for two cases: (a) the bearing capacity analysis of a shallow foundation resting on a clayey soil, and (b) the analysis of the stability and deformation pattern of a cohesive-frictional soil slope. The study highlights the procedure for conducting a site-specific study using field test data such as SCPT in geotechnical analysis and demonstrates that a few additional computations involving soil variability provide better insight into the role of variability in designs.
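As an illustration of the second-moment statistics mentioned above, the following sketch detrends a synthetic, hypothetical qc profile with a linear depth trend and reports the mean, residual variance, and sample autocorrelation against depth lag; the study's actual estimation procedure and data are not reproduced here.

```python
import numpy as np

def qc_statistics(depth, qc, max_lag=10):
    """Second-moment statistics of cone tip resistance qc(z):
    remove a linear depth trend, then report the mean, the variance of the
    residuals, and their sample autocorrelation at increasing depth lags."""
    trend = np.polyval(np.polyfit(depth, qc, 1), depth)
    resid = qc - trend
    acf = [np.corrcoef(resid[:-lag], resid[lag:])[0, 1]
           for lag in range(1, max_lag + 1)]
    return qc.mean(), resid.var(), acf

depth = np.arange(0.0, 20.0, 0.5)   # hypothetical SCPT sounding, 0.5 m spacing
qc = 2.0 + 0.3 * depth + np.random.normal(0, 0.4, depth.size)  # synthetic qc (MPa)
mean, var, acf = qc_statistics(depth, qc)
print(f"mean qc = {mean:.2f} MPa, residual variance = {var:.3f}, acf(1) = {acf[0]:.2f}")
```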