356 results for Similarity queries
Abstract:
Recommender systems (RS) are now widely applied on commercial e-commerce sites to help users deal with the information overload problem. Recommender systems provide personalized recommendations to users and thus help them make good decisions about which product to buy from the vast number of product choices. Many current recommender systems are developed for simple and frequently purchased products such as books and videos, using collaborative-filtering and content-based approaches. These approaches are not directly applicable to recommending infrequently purchased products such as cars and houses, because it is difficult to collect a large number of ratings from users for such products. Many e-commerce sites for infrequently purchased products still use basic search-based techniques, whereby the products that match the attributes given in the target user’s query are retrieved and recommended. However, search-based recommenders cannot provide personalized recommendations: different users who submit the same query receive the same recommendations, regardless of any difference in their interests. In this article, a simple user profiling approach is proposed to generate users’ preferences for product attributes (i.e., user profiles) based on user product click-stream data. The user profiles can be used to find similar-minded users (i.e., neighbours) accurately. Two recommendation approaches are proposed, namely the Round-Robin fusion algorithm (CFRRobin) and the Collaborative Filtering-based Aggregated Query algorithm (CFAgQuery), to generate personalized recommendations based on the user profiles.
Instead of using the target user’s query to search for products as normal search based systems do, the CFRRobin technique uses the attributes of the products in which the target user’s neighbours have shown interest as queries to retrieve relevant products, and then recommends to the target user a list of products by merging and ranking the returned products using the Round Robin method. The CFAgQuery technique uses the attributes of the products that the user’s neighbours have shown interest in to derive an aggregated query, which is then used to retrieve products to recommend to the target user. Experiments conducted on a real e-commerce dataset show that both the proposed techniques CFRRobin and CFAgQuery perform better than the standard Collaborative Filtering and the Basic Search approaches, which are widely applied by the current e-commerce applications.
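The Round Robin merging step described above can be illustrated with a minimal sketch. It assumes each neighbour-derived query has already returned a ranked list of product identifiers; the function name, the `top_n` parameter and the sample identifiers are hypothetical, and the abstract does not say how CFRRobin handles duplicates or ties, so dropping repeated products is an assumption.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking positions in lists that have run out


def round_robin_merge(ranked_lists, top_n=None):
    """Interleave ranked lists one rank at a time, dropping duplicates.

    Assumption: a product returned by several queries is kept only at
    its first (best) position in the merged list.
    """
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists, fillvalue=_MISSING):
        for product in tier:
            if product is _MISSING or product in seen:
                continue
            seen.add(product)
            merged.append(product)
    return merged if top_n is None else merged[:top_n]


# Hypothetical result lists, one per query derived from a neighbour's interests.
results = [["car_a", "car_b", "car_c"], ["car_b", "car_d"], ["car_e"]]
recommendations = round_robin_merge(results, top_n=4)
```

Taking one item from each list in turn gives every neighbour-derived query an equal say in the top positions, which is the usual motivation for round-robin fusion over score-based merging when the lists carry no comparable scores.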
Abstract:
Anthropomorphism is a cognitive bias, which occurs when individuals see human characteristics in a non-human agent, object or animal. Anthropomorphism is especially interesting to marketers, because once anthropomorphic bias has been triggered, it can lead to a greater feeling of connectedness to a non-human agent (Tam, Lee and Chao, 2013), the emulation of behaviours (Aggarwal and McGill, 2012) or greater attribution of brand personality and brand liking (Delbaere, McQuarrie and Phillips, 2011). Importantly, research now shows that levels of this tendency vary between individuals (Waytz, Cacioppo and Epley, 2010), but research to date has failed to focus on how anthropomorphic tendency influences individual responses to marketing communications messages. Spokes-characters present an ideal context through which to examine this gap, given that they function as personified brands, designed to trigger consumer anthropomorphic tendency. Further, little is understood about how spokes-characters operate and which consumers will prefer them to their human counterparts. Like anthropomorphic research, much empirical work to date has focused on design and outcomes, examining the sender’s encoding process and the feedback generated, but ignoring the individual decoding process that is so important to understanding individual differences and message effectiveness. The current research employs three experiments using an online survey with stimulus exposure to show that anthropomorphic tendency, personality similarity and spokes-character type all have relevance to the understanding of this complex relationship. Study one and two indicate that while a human spokesperson is still preferred by many, higher levels of anthropomorphic tendency increase likeability of cartoon spokes characters. Study three highlights the importance of personality similarity, which further increases likability. 
Additional analyses provide key findings concerning the nature of anthropomorphic tendency as an individual difference and trait. This research contributes to a greater understanding of anthropomorphism theory and fills existing gaps in the consumer psychology and marketing communications literature.
Abstract:
A novel method was developed for studying the genetic relatedness of Pseudomonas aeruginosa isolates from clinical and environmental sources. This bacterium is ubiquitous in the natural environment and is an important pathogen known to infect Cystic Fibrosis (CF) patients. The transmission route of strains has not yet been defined; current theories include acquisition from an environmental source or through patient-to-patient spread. A highly discriminatory, bioinformatics based, DNA typing method was developed to investigate the relatedness of clinical and environmental isolates. This study found a similarity between the environmental and several CF clonal strains and also highlighted occurrence of environmental P. aeruginosa strains in CF infections.
Abstract:
Entity-oriented retrieval aims to return a list of relevant entities rather than documents to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity-related information. To improve the understanding of the semantic intent of user queries, we propose a concept-based retrieval method that not only automatically identifies the semantic intent of user queries (i.e., Intent Type and Intent Modifier) but also introduces concepts represented by Wikipedia articles into user queries. We evaluate our proposed method on entity profile documents annotated by concepts from the Wikipedia category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.
Abstract:
Reliability of the performance of biometric identity verification systems remains a significant challenge. Individual biometric samples of the same person (identity class) are not identical at each presentation and performance degradation arises from intra-class variability and inter-class similarity. These limitations lead to false accepts and false rejects that are dependent. It is therefore difficult to reduce the rate of one type of error without increasing the other. The focus of this dissertation is to investigate a method based on classifier fusion techniques to better control the trade-off between the verification errors using text-dependent speaker verification as the test platform. A sequential classifier fusion architecture that integrates multi-instance and multisample fusion schemes is proposed. This fusion method enables a controlled trade-off between false alarms and false rejects. For statistically independent classifier decisions, analytical expressions for each type of verification error are derived using base classifier performances. As this assumption may not be always valid, these expressions are modified to incorporate the correlation between statistically dependent decisions from clients and impostors. The architecture is empirically evaluated by applying the proposed architecture for text dependent speaker verification using the Hidden Markov Model based digit dependent speaker models in each stage with multiple attempts for each digit utterance. The trade-off between the verification errors is controlled using the parameters, number of decision stages (instances) and the number of attempts at each decision stage (samples), fine-tuned on evaluation/tune set. The statistical validation of the derived expressions for error estimates is evaluated on test data. The performance of the sequential method is further demonstrated to depend on the order of the combination of digits (instances) and the nature of repetitive attempts (samples). 
The false rejection and false acceptance rates for the proposed fusion are estimated using the base classifier performances, the variance in correlation between classifier decisions and the sequence of classifiers with favourable dependence selected using the 'Sequential Error Ratio' criterion. The error rates are better estimated by incorporating user-dependent (such as speaker-dependent thresholds and speaker-specific digit combinations) and class-dependent (such as client-impostor dependent favourable combinations and class-error based threshold estimation) information. The proposed architecture is desirable in most speaker verification applications, such as remote authentication and telephone and internet shopping applications. The tuning of parameters - the number of instances and samples - serves both the security and user convenience requirements of speaker-specific verification. The architecture investigated here is applicable to verification using other biometric modalities such as handwriting, fingerprints and keystrokes.
Abstract:
Cone-beam computed tomography (CBCT) has enormous potential to improve the accuracy of treatment delivery in image-guided radiotherapy (IGRT). To assist radiotherapists in interpreting these images, we use a Bayesian statistical model to label each voxel according to its tissue type. The rich sources of prior information in IGRT are incorporated into a hidden Markov random field model of the 3D image lattice. Tissue densities in the reference CT scan are estimated using inverse regression and then rescaled to approximate the corresponding CBCT intensity values. The treatment planning contours are combined with published studies of physiological variability to produce a spatial prior distribution for changes in the size, shape and position of the tumour volume and organs at risk. The voxel labels are estimated using iterated conditional modes. The accuracy of the method has been evaluated using 27 CBCT scans of an electron density phantom. The mean voxel-wise misclassification rate was 6.2%, with a Dice similarity coefficient of 0.73 for liver, muscle, breast and adipose tissue. By incorporating prior information, we are able to successfully segment CBCT images. This could be a viable approach for automated, online image analysis in radiotherapy.
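The two accuracy measures quoted in this abstract can be computed as in the following minimal sketch. It assumes the predicted and reference voxel labels are available as flat, equal-length sequences; the function names and the toy labels are hypothetical, and the abstract does not state how the Dice coefficient was aggregated over tissue types.

```python
def dice_coefficient(predicted, reference, label):
    """Dice similarity 2|A∩B| / (|A| + |B|) for the voxels carrying `label`."""
    pred = {i for i, v in enumerate(predicted) if v == label}
    ref = {i for i, v in enumerate(reference) if v == label}
    if not pred and not ref:
        return 1.0  # label absent from both volumes: treat as perfect agreement
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def misclassification_rate(predicted, reference):
    """Fraction of voxels whose predicted label differs from the reference."""
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)


# Toy four-voxel example (illustrative labels, not data from the study).
predicted = ["liver", "liver", "muscle", "adipose"]
reference = ["liver", "muscle", "muscle", "adipose"]
liver_dice = dice_coefficient(predicted, reference, "liver")
error_rate = misclassification_rate(predicted, reference)
```

In the toy example one of four voxels is mislabelled, so the misclassification rate is 0.25, and the liver Dice coefficient is 2·1 / (2 + 1) ≈ 0.67.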
Abstract:
The development and application of inorganic adsorbent materials have been continuously investigated due to their variability and versatility. This Master's thesis has expanded the knowledge in the field of adsorption targeting radioactive iodine waste and proteins using modified inorganic materials. Industrial treatment of radioactive waste and safe disposal of nuclear waste are a constant concern around the world with the development of radioactive materials applications. To address the current problems, laminar titanate with a large surface area (143 m² g⁻¹) was synthesized from inorganic titanium compounds by hydrothermal reactions at 433 K. Ag2O nanocrystals with particle sizes ranging from 5–30 nm were anchored on the titanate lamina surface, which has crystallographic similarity to that of Ag2O nanocrystals. Therefore, the deposited Ag2O nanocrystals and the titanate substrate could join at these surfaces, forming a coherent interface between them. Such coherence between the two phases reduces the overall energy by minimizing surface energy and holds the Ag2O nanocrystals firmly on the outer surface of the titanate structure. The combined adsorbent was then applied as an efficient adsorbent to remove radioactive iodine from water (one gram of adsorbent can capture up to 3.4 mmol of I⁻ anions), and the composite adsorbent can be recovered easily for safe disposal. The structural changes of the titanate lamina and the composite adsorbent were characterized via various techniques. The isotherm and kinetics of iodine adsorption, competitive adsorption and column adsorption using the adsorbent were studied to determine the iodine removal abilities of the adsorbent. It is shown that the adsorbent exhibited excellent trapping ability towards iodine in the fixed-bed column despite the presence of competitive ions. Hence, Ag2O-deposited titanate lamina could serve as an effective adsorbent for removing iodine from radioactive waste.
Surface hydroxyl groups of inorganic materials are widely used for modification purposes, and inorganic materials can also be modified for biomolecule adsorption. Specifically, γ-Al2O3 nanofibre material is obtained by calcination of a boehmite precursor, which is synthesised by hydrothermal chemical reactions directed by a surfactant. These γ-Al2O3 nanofibres possess a large surface area (243 m² g⁻¹), good stability under extreme chemical conditions, good mechanical strength and rich surface hydroxyl groups, making them an ideal candidate for industrial separation columns. The fibrous morphology of the adsorbent also guarantees facile recovery from aqueous solution by both centrifugation and sedimentation. By chemically bonding dye molecules, the charge property of γ-Al2O3 is changed with the aim of selectively capturing lysozyme from chicken egg white solution. The highest lysozyme adsorption amount obtained was around 600 mg/g, and its proportion was raised from around 5% to 69% in chicken egg white solution. Adsorption tests at different solution pH values showed that electrostatic force played the key role in the good selectivity and high adsorption rate of surface-modified γ-Al2O3 nanofibre adsorbents. Overall, surface-modified fibrous γ-Al2O3 could potentially be applied as an efficient adsorbent for capturing various biomolecules.
Abstract:
Bioacoustic data can provide an important base for environmental monitoring. To explore a large amount of field recordings collected, an automated similarity search algorithm is presented in this paper. A region of an audio defined by frequency and time bounds is provided by a user; the content of the region is used to construct a query. In the retrieving process, our algorithm will automatically scan through recordings to search for similar regions. In detail, we present a feature extraction approach based on the visual content of vocalisations – in this case ridges, and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.
Abstract:
In most intent recognition studies, annotations of query intent are created post hoc by external assessors who are not the searchers themselves. It is important for the field to get a better understanding of the quality of this process as an approximation for determining the searcher's actual intent. Some studies have investigated the reliability of the query intent annotation process by measuring the interassessor agreement. However, these studies did not measure the validity of the judgments, that is, to what extent the annotations match the searcher's actual intent. In this study, we asked both the searchers themselves and external assessors to classify queries using the same intent classification scheme. We show that of the seven dimensions in our intent classification scheme, four can reliably be used for query annotation. Of these four, only the annotations on the topic and spatial sensitivity dimension are valid when compared with the searcher's annotations. The difference between the interassessor agreement and the assessor-searcher agreement was significant on all dimensions, showing that the agreement between external assessors is not a good estimator of the validity of the intent classifications. Therefore, we encourage the research community to consider using query intent classifications by the searchers themselves as test data.
Abstract:
Over the last decade, the majority of existing search techniques have been either keyword-based or category-based, resulting in unsatisfactory effectiveness. Meanwhile, studies have shown that more than 80% of users prefer personalized search results. As a result, many studies have devoted a great deal of effort (referred to as collaborative filtering) to investigating personalized notions for enhancing retrieval performance. One of the fundamental yet most challenging steps is to capture precise user information needs. Most Web users are inexperienced or lack the capability to express their needs properly, whereas existing retrieval systems are highly sensitive to vocabulary. Researchers have increasingly proposed the use of ontology-based techniques to improve current mining approaches. The related techniques are not only able to refine search intentions among specific generic domains, but also to access new knowledge by tracking semantic relations. In recent years, some researchers have attempted to build ontological user profiles according to discovered user background knowledge. The knowledge is considered in both global and local analyses, which aim to produce tailored ontologies from a group of concepts. However, a key problem that has not been addressed is how to accurately match diverse local information to universal global knowledge. This research conducts a theoretical study on the use of personalized ontologies to enhance text mining performance. The objective is to understand user information needs by a "bag-of-concepts" rather than "words". The concepts are gathered from a general world knowledge base named the Library of Congress Subject Headings. To return desirable search results, a novel ontology-based mining approach is introduced to discover accurate search intentions and learn personalized ontologies as user profiles.
The approach can not only pinpoint users' individual intentions in a rough hierarchical structure, but can also interpret their needs by a set of acknowledged concepts. Along with global and local analyses, a solid concept matching approach is carried out to address the mismatch between local information and world knowledge. Relevance features produced by the Relevance Feature Discovery model are taken as representatives of local information. These features have been shown to be the best alternative to user queries for avoiding ambiguity, and they consistently outperform the features extracted by other filtering models. The two proposed approaches are both evaluated scientifically with the standard Reuters Corpus Volume 1 testing set. A comprehensive comparison is made with a number of state-of-the-art baseline models, including TF-IDF, Rocchio, Okapi BM25, the deploying Pattern Taxonomy Model, and an ontology-based model. The gathered results indicate that the top precision can be improved remarkably with the proposed ontology mining approach, where the matching approach is successful and achieves significant improvements in most information filtering measurements. This research contributes to the fields of ontological filtering, user profiling, and knowledge representation. The related outputs are critical when systems are expected to return proper mining results and provide personalized services. The scientific findings have the potential to facilitate the design of advanced preference mining models, with an impact on people's daily lives.
Abstract:
Increasing global competition, rapid technological changes, advances in manufacturing and information technology and discerning customers are forcing supply chains to adopt improvement practices that enable them to deliver high quality products at a lower cost and in a shorter period of time. A lean initiative is one of the most effective approaches toward achieving this goal. In the lean improvement process, it is critical to measure current and desired performance levels in order to clearly evaluate the lean implementation efforts. Many attempts have been made to measure supply chain performance incorporating both quantitative and qualitative measures, but they have failed to provide an effective method of measuring improvements in performance for dynamic lean supply chain situations. Therefore, appropriate measurement of lean supply chain performance has become imperative. There are many lean tools available for supply chains; however, the effectiveness of a lean tool depends on the type of product and supply chain. One tool may be highly effective for a supply chain involved in high volume products but may not be effective for low volume products. There is currently no systematic methodology available for selecting appropriate lean strategies based on the type of supply chain and market strategy. This thesis develops an effective method to measure the performance of a supply chain, consisting of both quantitative and qualitative metrics, and investigates the effects of product types and lean tool selection on supply chain performance. Supply chain performance matrices and the effects of various lean tools on the performance metrics mentioned in the SCOR framework have been investigated. A lean supply chain model based on the SCOR metric framework is then developed, where non-lean and lean as well as quantitative and qualitative metrics are incorporated in appropriate metrics.
The values of appropriate metrics are converted into triangular fuzzy numbers using similarity rules and heuristic methods. Data have been collected from an apparel manufacturing company for multiple supply chain products, and then a fuzzy-based method is applied to measure the performance improvements in supply chains. Using the fuzzy TOPSIS method, which chooses an optimum alternative to maximise similarities with positive ideal solutions and to minimise similarities with negative ideal solutions, the performances of lean and non-lean supply chain situations for three different apparel products have been evaluated. To address the research questions related to an effective performance evaluation method and the effects of lean tools on different types of supply chains, a conceptual framework and two hypotheses are investigated. Empirical results show that the implementation of lean tools has significant effects on performance improvements in terms of time, quality and flexibility. The fuzzy TOPSIS-based method developed is able to integrate multiple supply chain matrices into a single performance measure, while the lean supply chain model incorporates qualitative and quantitative metrics. It can therefore effectively measure the improvements in a supply chain after implementing lean tools. It is demonstrated that the product types involved in the supply chain and the ability to select the right lean tools have a significant effect on lean supply chain performance. Future studies can conduct multiple case studies in different contexts.
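As a rough illustration of how a fuzzy TOPSIS score collapses several metrics into one performance measure, the sketch below follows a common textbook formulation rather than the thesis's own procedure: ratings are normalised triangular fuzzy numbers (low, mid, high) in [0, 1], the positive and negative ideal solutions are assumed to be (1, 1, 1) and (0, 0, 0), and distances use the vertex method. All names and sample ratings are hypothetical.

```python
from math import sqrt

POSITIVE_IDEAL = (1.0, 1.0, 1.0)  # assumed fuzzy positive ideal solution
NEGATIVE_IDEAL = (0.0, 0.0, 0.0)  # assumed fuzzy negative ideal solution


def tfn_distance(x, y):
    """Vertex-method distance between two triangular fuzzy numbers."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / 3)


def closeness_coefficient(ratings, weights):
    """Score one alternative; values nearer 1 lie closer to the positive ideal.

    `ratings` and `weights` hold one triangular fuzzy number per metric.
    """
    weighted = [tuple(r * w for r, w in zip(rating, weight))
                for rating, weight in zip(ratings, weights)]
    d_pos = sum(tfn_distance(v, POSITIVE_IDEAL) for v in weighted)
    d_neg = sum(tfn_distance(v, NEGATIVE_IDEAL) for v in weighted)
    return d_neg / (d_pos + d_neg)


# Hypothetical ratings on two metrics for a lean and a non-lean scenario.
weights = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
lean = [(0.7, 0.8, 0.9), (0.6, 0.7, 0.8)]
non_lean = [(0.3, 0.4, 0.5), (0.2, 0.3, 0.4)]
lean_score = closeness_coefficient(lean, weights)
non_lean_score = closeness_coefficient(non_lean, weights)
```

Alternatives are then ranked by their closeness coefficient, so in this toy case the lean scenario ranks above the non-lean one.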
Abstract:
An Application Specific Instruction-set Processor (ASIP) is a specialized processor tailored to run a particular application or set of applications efficiently. However, when there are multiple candidate applications in the application domain, it is difficult and time-consuming to find the optimum set of applications to be implemented. Existing ASIP design approaches perform this selection manually based on a designer’s knowledge. We help cut down the number of candidate applications by devising a classification method that clusters similar applications based on the special-purpose operations they share. This provides a significant reduction in the comparison overhead while resulting in customized ASIP instruction sets which can benefit a whole family of related applications. Our method gives users the ability to quantify the degree of similarity between the sets of shared operations to control the size of clusters. A case study involving twelve algorithms confirms that our approach can successfully cluster similar algorithms together based on the similarity of their component operations.
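One way to picture clustering applications by the operations they share is the sketch below. It is not the authors' method: it swaps in the Jaccard coefficient as the similarity measure and simple threshold-based grouping (connected components) as the clustering rule, and the application names and operation sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A∩B| / |A∪B| between two sets of operations."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_threshold(op_sets, threshold):
    """Group applications whose operation sets are at least `threshold` similar.

    Applications are linked when their similarity reaches the threshold;
    each connected group of linked applications forms one cluster.
    """
    names = list(op_sets)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(op_sets[x], op_sets[y]) >= threshold:
                parent[find(y)] = find(x)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical applications and the special-purpose operations they use.
apps = {
    "fft": {"mul", "add", "butterfly"},
    "dct": {"mul", "add", "butterfly", "shift"},
    "aes": {"xor", "sbox"},
}
loose = cluster_by_threshold(apps, threshold=0.5)
strict = cluster_by_threshold(apps, threshold=0.8)
```

Raising the threshold splits clusters apart and lowering it merges them, which mirrors the idea of using the degree of similarity to control cluster size.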
Abstract:
In situ atomic force microscopy (AFM) allows images from the upper face and sides of TCNQ crystals to be monitored during the course of the electrochemical solid–solid state conversion of 50 × 50 μm2 three-dimensional drop cast crystals of TCNQ to CuTCNQ or M[TCNQ]2(H2O)2 (M = Co, Ni). Ex situ images obtained by scanning electron microscopy (SEM) also allow the bottom face of the TCNQ crystals, in contact with the indium tin oxide or gold electrode surface and aqueous metal electrolyte solution, to be examined. Results show that by carefully controlling the reaction conditions, nearly mono-dispersed, rod-like phase I CuTCNQ or M[TCNQ]2(H2O)2 can be achieved on all faces. However, CuTCNQ has two different phases, and the transformation of rod-like phase 1 to rhombic-like phase 2 achieved under conditions of cyclic voltammetry was monitored in situ by AFM. The similarity of in situ AFM results with ex situ SEM studies accomplished previously implies that the morphology of the samples remains unchanged when the solvent environment is removed. In the process of crystal transformation, the triple phase solid∣electrode∣electrolyte junction is confirmed to be the initial nucleation site. Raman spectra and AFM images suggest that 100% interconversion is not always achieved, even after extended electrolysis of large 50 × 50 μm2 TCNQ crystals.
Abstract:
The continuous growth of XML data poses a great concern in the area of XML data management. The need to process large amounts of XML data brings complications to many applications, such as information retrieval, data integration and many others. One way of simplifying this problem is to break the massive amount of data into smaller groups by applying clustering techniques. However, XML clustering is an intricate task that may involve processing both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods: two utilizing the structure of XML documents and the other two utilizing both the structure and the content. The two structural clustering methods have different data models. One is based on a path model and the other is based on a tree model. These methods employ rigid similarity measures which aim to identify corresponding elements between documents with different or similar underlying structure. The two clustering methods that utilize both the structural and content information vary in terms of how the structure and content similarity are combined. One clustering method calculates the document similarity by using a linear weighting combination strategy of structure and content similarities. The content similarity in this clustering method is based on a semantic kernel. The other method calculates the distance between documents by a non-linear combination of the structure and content of XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the structure-only clustering method based on the path model, as the tree similarity measure for the tree model does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of the content information on most test document collections.
To further the research, the structural clustering method based on tree model is extended and employed in XML transformation. The results from the experiments show that the proposed transformation process is faster than the traditional transformation system that translates and converts the source XML documents sequentially. Also, the schema matching process of XML transformation produces a better matching result in a shorter time.
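The linear weighting combination of structure and content similarities mentioned in this abstract can be sketched as follows. The component measures are stand-ins, not the thesis's own: structure similarity is approximated here by Jaccard overlap of root-to-leaf paths, and content similarity by cosine similarity of term counts in place of the semantic kernel. The weight value and sample documents are hypothetical.

```python
from collections import Counter
from math import sqrt


def structure_similarity(paths_a, paths_b):
    """Stand-in structural measure: Jaccard overlap of element paths."""
    a, b = set(paths_a), set(paths_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def content_similarity(text_a, text_b):
    """Stand-in content measure: cosine similarity of term counts."""
    ca, cb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def combined_similarity(structure_sim, content_sim, weight=0.5):
    """Linear combination: weight * structure + (1 - weight) * content."""
    return weight * structure_sim + (1 - weight) * content_sim


# Hypothetical pair of XML documents reduced to paths and text.
s = structure_similarity({"/book/title", "/book/author"},
                         {"/book/title", "/book/price"})
c = content_similarity("xml clustering methods", "xml clustering")
score = combined_similarity(s, c, weight=0.5)
```

Setting `weight` to 1 reduces the measure to a structure-only comparison, and setting it to 0 to a content-only one, so the single parameter spans the range between the two kinds of method the abstract compares.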
Abstract:
Trees are capable of portraying the semi-structured data that is common in the web domain. Finding similarities between trees is essential for several applications that deal with semi-structured data. Existing similarity methods examine a pair of trees by comparing the nodes and paths of the two trees to find the similarity between them. However, these methods give unfavorable results for unordered tree data and incur NP-hard or MAX-SNP-hard complexity. In this paper, we present a novel method that first encodes a tree with an optimal traversing approach and then uses the encoding to model the tree with its equivalent matrix representation, so that the similarity between unordered trees can be found efficiently. Empirical analysis shows that the proposed method is able to achieve high accuracy even on large data sets.