937 results for cross-domain distinguishing features


Relevance: 100.00%

Abstract:

The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required for JST model learning is a set of domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors directly into the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that, by augmenting the original feature space with polarity-bearing topics, in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criterion for cross-domain sentiment classification, our proposed approach performs better than, or comparably to, previous approaches. Moreover, our approach is much simpler and does not require difficult parameter tuning.
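
To make the feature-augmentation step concrete, here is a minimal Python sketch, assuming a dense bag-of-words matrix and per-document polarity-topic proportions already produced by a JST-style model; the names, the value of k, and the use of mutual information as the information-gain criterion are illustrative assumptions, not the paper's exact pipeline.

    # Sketch: augment bag-of-words features with polarity-bearing topic
    # proportions, then select features by an information-gain-style score.
    # JST inference itself is not shown; `topic_props` stands in for the
    # per-document polarity-topic distributions a JST-style model would emit.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression

    def augment_and_select(bow, topic_props, labels, k=2000):
        """bow: (n_docs, n_words); topic_props: (n_docs, n_topics)."""
        augmented = np.hstack([bow, topic_props])          # feature augmentation
        selector = SelectKBest(mutual_info_classif, k=k)   # IG-style ranking
        selected = selector.fit_transform(augmented, labels)
        clf = LogisticRegression(max_iter=1000).fit(selected, labels)
        return clf, selector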

Relevance: 100.00%

Abstract:

The InterPARES 2 Terminology Cross-Domain has created three terminological instruments in service to the project and, by extension, Archival Science. Over the course of the five-year project, this Cross-Domain collected words, definitions, and phrases from extant documents, research tools, models, and direct researcher submission and discussion. From these raw materials, the Cross-Domain identified a systematic and pragmatic way of establishing a coherent view of the concepts involved in dynamic, experiential, and interactive records and systems in the arts, sciences, and e-government.

The three terminological instruments are the Glossary, the Dictionary, and the Ontologies. The first of these is an authoritative list of terms and definitions that are core to our understanding of the evolving records creation, keeping, and preservation environments. The Dictionary is a tool used to facilitate interdisciplinary communication: it contains multiple definitions for each term, drawn from multiple disciplines. By using this tool, researchers can see how Archival Science deploys terminology compared with Computer Science, Library and Information Science, or the Arts, among other disciplines. The third terminological instrument, the Ontologies, identifies explicit relationships between concepts of records, which is useful for communicating the nuances of Diplomatics in the dynamic, experiential, and interactive environment.

All three of these instruments were drawn from a Register of terms gathered over the course of the project. This Register served as a holding place for terms, definitions, and phrases, and allowed researchers to discuss, comment on, and modify submissions. The Register and the terminological instruments were housed in the Terminology Database, which provides searching, display, and file downloads, making it easy to navigate through the terminological instruments.

Terminology used in InterPARES 1 and the UBC Project was carried forward to this Database. In this sense, we are building on our past knowledge and making it relevant to the contemporary environment.
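
As a rough illustration of how a Register of this kind can hold several discipline-specific definitions per term while still yielding an authoritative Glossary view, consider the following sketch; the classes, fields, and sample entries are hypothetical, not the actual InterPARES 2 database schema.

    # Hypothetical structure for a term Register: one term, many
    # discipline-specific definitions, with a flag marking the entry
    # promoted to the Glossary.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class Definition:
        discipline: str              # e.g. "Archival Science"
        text: str
        authoritative: bool = False  # marks the Glossary entry, if any

    register: dict[str, list[Definition]] = defaultdict(list)
    register["record"].append(Definition(
        "Archival Science",
        "A document made or received in the course of activity.", True))
    register["record"].append(Definition(
        "Computer Science",
        "A data structure grouping related fields."))

    # Glossary view: only the authoritative definitions.
    glossary = {term: [d for d in defs if d.authoritative]
                for term, defs in register.items()}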

Relevance: 100.00%

Abstract:

Metadata that is associated with either an information system or an information object for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation plays a critical role in ensuring the creation, management, preservation, use and re-use of trustworthy materials, including records. Recordkeeping metadata, of which one key type is archival description, plays a particularly important role in documenting the reliability and authenticity of records and recordkeeping systems, as well as the various contexts (legal-administrative, provenancial, procedural, documentary, and technical) within which records are created and kept as they move across space and time. In the digital environment, metadata is also the means by which it is possible to identify how record components – those constituent aspects of a digital record that may be managed, stored and used separately by the creator or the preserver – can be reassembled to generate an authentic copy of a record, or reformulated per a user's request as a customized output package.

Issues relating to the creation, capture, management and preservation of adequate metadata are, therefore, integral to any research study addressing the reliability and authenticity of digital entities, regardless of the community, sector or institution within which they are created. The InterPARES 2 Description Cross-Domain Group (DCD) examined the conceptualization, definitions, roles, and current functionality of metadata and archival description in terms of the requirements generated by InterPARES 1. Because of the need to communicate the work of InterPARES in a meaningful way not only across other disciplines but also across different archival traditions; to interface with, evaluate and inform existing standards, practices and other research projects; and to ensure interoperability across the three focus areas of InterPARES 2, the Description Cross-Domain also addressed its research goals with reference to wider thinking about, and developments in, recordkeeping and metadata.

InterPARES 2 addressed not only records, however, but a range of digital information objects (referred to as "entities" by InterPARES 2, but not to be confused with the term "entities" as used in metadata and database applications) that are the products and by-products of government, scientific and artistic activities carried out using dynamic, interactive or experiential digital systems. The nature of these entities was determined through a diplomatic analysis undertaken as part of extensive case studies of digital systems conducted by the InterPARES 2 Focus Groups. This diplomatic analysis established whether the entities identified during the case studies were records, non-records that nevertheless raised important concerns relating to reliability and authenticity, or "potential records." To be determined to be records, the entities had to meet the criteria outlined by archival theory: they had to have a fixed documentary form and stable content. It was not sufficient that they be considered or treated as records by the creator. "Potential records" is a new construct indicating that a digital system has the potential to create records upon demand, but does not actually fix and set aside records in the normal course of business. The work of the Description Cross-Domain Group, therefore, addresses the metadata needs of all three categories of entities.

Finally, since "metadata" as a term is used today so ubiquitously, and in so many different ways by different communities, that it is in peril of losing any specificity, part of the work of the DCD sought to name and type categories of metadata. It also addressed incentives for creators to generate appropriate metadata, as well as issues associated with the retention, maintenance and eventual disposition of the metadata that aggregates around digital entities over time.

Relevance: 100.00%

Abstract:

The CopA copper ATPase of Enterococcus hirae belongs to the family of heavy-metal-pumping CPx-type ATPases and shares 43% sequence similarity with the human Menkes and Wilson copper ATPases. Due to a lack of suitable protein crystals, only partial three-dimensional structures have so far been obtained for this family of ion pumps. We present a structural model of CopA derived by combining topological information obtained by intramolecular cross-linking with molecular modeling. Purified CopA was cross-linked with different bivalent reagents, followed by tryptic digestion and identification of cross-linked peptides by mass spectrometry. The structural proximity of tryptic fragments provided information about the arrangement of the hydrophilic protein domains, which was integrated into a three-dimensional model of CopA. Comparative modeling of CopA was guided by its sequence similarity to the calcium ATPase of the sarcoplasmic reticulum, Serca1, for which detailed structures are available. In addition, known partial structures of CPx-ATPases homologous to CopA were used as modeling templates. A docking approach was used to predict the orientation of the heavy metal binding domain of CopA relative to the core structure, which was verified by distance constraints derived from the cross-links. The overall structural model of CopA resembles the Serca1 structure but reveals distinctive features of CPx-type ATPases, most prominently the positioning of the heavy metal binding domain, whose Cu binding ligands are oriented appropriately for interaction with Cu-loaded metallochaperones in solution. Moreover, a novel model of the architecture of the intramembranous Cu binding sites could be derived.
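
The distance-constraint verification step can be pictured with a short sketch: residue pairs joined by a cross-linker must sit within the reagent's reach in any acceptable model. The coordinates, residue identifiers, and the 24 Å cutoff below are illustrative assumptions, not values from the study.

    # Sketch of a cross-link distance-constraint check on a structural model:
    # each MS-identified cross-linked residue pair must lie within the span
    # of the bivalent reagent.
    import numpy as np

    def constraints_satisfied(ca_coords, crosslinks, max_dist=24.0):
        """ca_coords: dict residue_id -> np.array(3) of C-alpha coordinates.
        crosslinks: list of (res_i, res_j) pairs identified by MS.
        max_dist: assumed reach of the cross-linker in angstroms."""
        results = {}
        for i, j in crosslinks:
            d = np.linalg.norm(ca_coords[i] - ca_coords[j])
            results[(i, j)] = d <= max_dist  # True if the model is consistent
        return results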

Relevance: 100.00%

Abstract:

Generic sentiment lexicons are widely used for sentiment analysis. However, manually constructing sentiment lexicons is very time-consuming and may not be feasible for application domains where annotation expertise is unavailable. One contribution of this paper is the development of a statistical-learning-based computational method for the automatic construction of domain-specific sentiment lexicons to enhance cross-domain sentiment analysis. Our initial experiments show that the proposed methodology can automatically generate domain-specific sentiment lexicons that improve the effectiveness of opinion retrieval at the document level. Another contribution of our work is to show the feasibility of applying a sentiment metric, derived from the automatically constructed sentiment lexicons, to predict product sales for certain product categories. Our research contributes to the development of more effective sentiment analysis systems for extracting business intelligence from the numerous opinionated expressions posted to the Web.
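
One simple statistical route to a domain-specific lexicon is to score each word by its log-odds of appearing in positive versus negative documents of the target domain; the sketch below is an illustrative stand-in for the paper's method, not a reconstruction of it.

    # Sketch: induce a domain-specific sentiment lexicon from labeled
    # documents by log-odds of word occurrence in positive vs. negative docs.
    import math
    from collections import Counter

    def build_lexicon(docs, labels, smoothing=1.0):
        """docs: list of token lists; labels: 1 (positive) or 0 (negative).
        Returns word -> polarity score (positive = positive sentiment)."""
        pos, neg = Counter(), Counter()
        for tokens, y in zip(docs, labels):
            (pos if y == 1 else neg).update(set(tokens))  # document frequency
        n_pos = sum(1 for y in labels if y == 1)
        n_neg = len(labels) - n_pos
        vocab = set(pos) | set(neg)
        return {w: math.log((pos[w] + smoothing) / (n_pos + 2 * smoothing))
                  - math.log((neg[w] + smoothing) / (n_neg + 2 * smoothing))
                for w in vocab}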

Relevance: 100.00%

Abstract:

We employed an action research approach to develop a context-specific peer mentoring program (Postgrad Assist) that aids first-year postgraduate coursework (PGCW) students in their transition to postgraduate study. We explored the transition literature and best-practice approaches, undertook comprehensive surveys of both our students and staff, conducted student focus groups, formed a diverse working party with strong representation from students, and adopted an ongoing evaluation program to develop and refine Postgrad Assist. The program substantially alleviated transitioning students' cultural and academic shock and their social isolation. The program contributes to the transition and mentoring research domain in that it challenges the common misconception that PGCW students are similar to undergraduate students, with no particular distinguishing features that would suggest a need for a different approach to their mentoring.

Relevance: 100.00%

Abstract:

Lack of supervision in clustering algorithms often leads to clusters that are not useful or interesting to human reviewers. We investigate whether supervision can be automatically transferred for clustering a target task by providing a relevant supervised partitioning of a dataset from a different source task. The target clustering is made more meaningful for the human user by trading off intrinsic clustering goodness on the target task against alignment with relevant supervised partitions in the source task, wherever possible. We propose a cross-guided clustering algorithm that builds on traditional k-means by aligning the target clusters with source partitions. The alignment process makes use of a cross-task similarity measure that discovers hidden relationships across tasks. When the source and target tasks correspond to different domains with potentially different vocabularies, we propose a projection approach using pivot vocabularies for the cross-domain similarity measure. Using multiple real-world and synthetic datasets, we show that our approach improves clustering accuracy significantly over traditional k-means and state-of-the-art semi-supervised clustering baselines, over a wide range of data characteristics and parameter settings.
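
An illustrative sketch of the core idea follows: run k-means on the target data while pulling each target centroid toward its best-matching source-partition centroid. The fixed weight alpha and the nearest-centroid matching are simplifying assumptions, not the paper's exact algorithm or its cross-task similarity measure.

    # Sketch: k-means biased toward source partitions (cross-guided flavor).
    import numpy as np

    def cross_guided_kmeans(X, source_centroids, k, alpha=0.3, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest target centroid.
            assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1),
                               axis=1)
            for c in range(k):
                members = X[assign == c]
                if len(members) == 0:
                    continue
                mean = members.mean(axis=0)
                # Pull the centroid toward the closest source-partition
                # centroid (the cross-task guidance term).
                src = source_centroids[
                    np.argmin(((source_centroids - mean) ** 2).sum(-1))]
                centroids[c] = (1 - alpha) * mean + alpha * src
        return assign, centroids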

Relevance: 100.00%

Abstract:

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.
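
As background, a minimal sketch of a tandem-style feature pipeline is shown below: out-of-domain DNN posteriors are log-compressed, decorrelated, and concatenated with the in-domain acoustic features (e.g., PLP) before HMM training. The dimensions, the PCA decorrelation, and the absence of the multi-level adaptation that defines MLAN are all simplifying assumptions.

    # Sketch: tandem feature construction from DNN posteriors + PLP features.
    import numpy as np
    from sklearn.decomposition import PCA

    def tandem_features(plp, dnn_posteriors, out_dim=30):
        """plp: (n_frames, d); dnn_posteriors: (n_frames, n_phones)."""
        log_post = np.log(dnn_posteriors + 1e-10)            # log compression
        reduced = PCA(n_components=out_dim).fit_transform(log_post)
        return np.hstack([plp, reduced])                     # tandem features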

Relevance: 100.00%

Abstract:

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy.

The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations.

The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space.

The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems.

The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.
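
The filter-and-refine retrieval pattern that embeddings such as BoostMap enable can be sketched briefly: map objects into a vector space via distances to a few reference objects, rank candidates cheaply there, then re-rank a short list with the exact, expensive distance. A plain reference-object embedding stands in here for BoostMap's learned combination of one-dimensional embeddings, so this is an assumption-laden illustration rather than the thesis algorithm.

    # Sketch: embedding-based filter-and-refine nearest neighbor retrieval.
    import numpy as np

    def embed(x, references, exact_dist):
        return np.array([exact_dist(x, r) for r in references])

    def build_index(database, references, exact_dist):
        # Database embeddings are computed once, offline.
        return np.array([embed(x, references, exact_dist) for x in database])

    def nearest_neighbor(query, database, index, references, exact_dist, k=20):
        q = embed(query, references, exact_dist)
        # Filter: cheap L1 ranking in the embedded space.
        shortlist = np.argsort(np.abs(index - q).sum(axis=1))[:k]
        # Refine: exact distance only on the shortlist.
        return min(shortlist, key=lambda i: exact_dist(query, database[i]))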

Relevance: 100.00%

Abstract:

— Consideration of how people respond to the question What is this? has suggested new problem frontiers for pattern recognition and information fusion, as well as neural systems that embody the cognitive transformation of declarative information into relational knowledge. In contrast to traditional classification methods, which aim to find the single correct label for each exemplar (This is a car), the new approach discovers rules that embody coherent relationships among labels which would otherwise appear contradictory to a learning system (This is a car, that is a vehicle, over there is a sedan). This talk will describe how an individual who experiences exemplars in real time, with each exemplar trained on at most one category label, can autonomously discover a hierarchy of cognitive rules, thereby converting local information into global knowledge. Computational examples are based on the observation that sensors working at different times, locations, and spatial scales, and experts with different goals, languages, and situations, may produce apparently inconsistent image labels, which are reconciled by implicit underlying relationships that the network’s learning process discovers. The ARTMAP information fusion system can, moreover, integrate multiple separate knowledge hierarchies, by fusing independent domains into a unified structure. In the process, the system discovers cross-domain rules, inferring multilevel relationships among groups of output classes, without any supervised labeling of these relationships. In order to self-organize its expert system, the ARTMAP information fusion network features distributed code representations which exploit the model’s intrinsic capacity for one-to-many learning (This is a car and a vehicle and a sedan) as well as many-to-one learning (Each of those vehicles is a car). Fusion system software, testbed datasets, and articles are available from http://cns.bu.edu/techlab.
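
One way to picture the rule discovery described above is a co-occurrence heuristic: if label A (sedan) almost never appears without label B (vehicle), infer the rule A implies B. The sketch below is an illustrative toy in that spirit, not the ARTMAP learning mechanism; the threshold and counting scheme are assumptions.

    # Sketch: infer multilevel label relationships from label co-occurrence.
    from collections import Counter
    from itertools import permutations

    def infer_hierarchy(labelings, threshold=0.95):
        """labelings: list of label sets observed for individual exemplars.
        Returns pairs (a, b) meaning 'a implies b'."""
        count, joint = Counter(), Counter()
        for labels in labelings:
            for a in labels:
                count[a] += 1
            for a, b in permutations(labels, 2):
                joint[(a, b)] += 1
        return {(a, b) for (a, b) in joint
                if joint[(a, b)] / count[a] >= threshold}

    rules = infer_hierarchy([{"car", "vehicle", "sedan"},
                             {"car", "vehicle"}, {"truck", "vehicle"}])
    # e.g. ("sedan", "vehicle"), ("car", "vehicle"), ("truck", "vehicle")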

Relevance: 100.00%

Abstract:

The Parry Sound domain is a granulite nappe-stack transported cratonward during reactivation of the ductile lower and middle crust in the late convergence of the Mesoproterozoic Grenville orogeny. Field observations suggest the following with respect to the ductile sheath: (1) formation of a carapace of transposed amphibolite facies gneiss, derived from and enveloping the western extremity of the Parry Sound domain and separating it from high-strain gneiss of adjacent allochthons; this ductile sheath formed dynamically around the moving granulite nappe through the development of systems of progressively linked shear zones; (2) transposition initiated by hydration (amphibolization) of granulite facies gneiss through the introduction of fluid along cracks accompanying pegmatite emplacement; shear zones nucleated along pegmatite margins and subsequently linked and rotated. The source of the pegmatites was most likely subjacent migmatitic and pegmatite-rich units, or units over which the Parry Sound domain was transported. Comparison of gneisses of the ductile sheath with high-strain layered gneiss of adjacent allochthons shows that the mode of transposition of penetratively layered gneiss depended on whether the gneiss protoliths were amphibolite or granulite facies tectonites before transposition began, resulting, respectively, in folding before shearing or in shearing without prior folding. Meter-scale truncations along high-strain gradients at the margins of both types of transposition-related shear zones, observed within and marginal to the Parry Sound domain, mimic features at kilometer scales, implying that apparent truncation by transposition originating in a manner similar to the ductile sheath may be a common feature of deep crustal ductile reworking. Citation: Culshaw, N., C. Gerbi, and J. Marsh (2010), Softening the lower crust: Modes of syn-transport transposition around and adjacent to a deep crustal granulite nappe, Parry Sound domain, Grenville Province, Ontario, Canada, Tectonics, 29, TC5013, doi:10.1029/2009TC002537.

Relevance: 100.00%

Abstract:

The solutions needed to cope with the new challenges that societies face today involve providing smarter everyday systems. To achieve this, technology has to evolve and enable automatic interactions between physical systems, with less human intervention. Technological paradigms such as the Internet of Things (IoT) and Cyber-Physical Systems (CPS) provide reference models, architectures, approaches and tools to support cross-domain solutions. Thus, CPS-based solutions will be applied in different application domains, such as e-Health, Smart Grid and Smart Transportation, to assure the expected response from a complex system that relies on the smooth interaction and cooperation of diverse networked physical systems. Wireless Sensor Networks (WSNs) are a well-known wireless technology that often forms part of a larger CPS. A WSN monitors a physical system or object (e.g., the environmental condition of a cargo container) and relays data to the targeted processing element. Reliable communication and restrained energy consumption are both expected features of a WSN. This paper shows the results obtained in a real WSN deployment, based on SunSPOT nodes, which carries out a fuzzy-based control strategy to improve energy consumption while keeping communication reliability and computational resource usage within bounds.
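
A minimal sketch of a fuzzy-style trade-off controller of the kind described is given below: the radio duty cycle is derived from fuzzy memberships over battery level and link quality. The membership shapes, rule base, and defuzzification are assumptions for illustration, not the deployed SunSPOT controller.

    # Sketch: fuzzy-style control of radio duty cycle (energy vs. reliability).
    def tri(x, a, b, c):
        """Triangular membership function on [a, c] peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def duty_cycle(battery, link_quality):
        """battery, link_quality in [0, 1]; returns a duty cycle in [0, 1]."""
        low_batt = tri(battery, -0.2, 0.0, 0.5)       # membership: battery low
        bad_link = tri(link_quality, -0.2, 0.0, 0.6)  # membership: link bad
        # Rules: bad link -> raise duty cycle (reliability); low battery ->
        # lower it (energy); default rule keeps a moderate setting.
        rules = [(bad_link, 0.9), (low_batt, 0.2), (1.0, 0.5)]  # (weight, out)
        return sum(w * out for w, out in rules) / sum(w for w, _ in rules)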

Relevance: 100.00%

Abstract:

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to the electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome.

Current visual analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once).

I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences; (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT); (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results; and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security.

My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, in opening traditional machine learning and data mining techniques to user interaction, and in extending the principles found in this dissertation to data types beyond temporal event sequences.
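
The statistical backbone of a high-volume hypothesis testing workflow can be sketched as follows: run one test per candidate feature (here, per event type) across two cohorts, then control the false discovery rate with the Benjamini-Hochberg procedure. The per-event-presence feature and the choice of Fisher's exact test are assumptions for illustration, not the dissertation's full taxonomy of metrics.

    # Sketch: many cohort-comparison tests with false-discovery-rate control.
    from scipy.stats import fisher_exact

    def compare_cohorts(event_types, cohort_a, cohort_b, fdr=0.05):
        """cohort_a/b: lists of records, each a set of event types it contains.
        Returns the event types whose prevalence differs significantly."""
        pvals = []
        for ev in event_types:
            a_with = sum(ev in rec for rec in cohort_a)
            b_with = sum(ev in rec for rec in cohort_b)
            table = [[a_with, len(cohort_a) - a_with],
                     [b_with, len(cohort_b) - b_with]]
            pvals.append((ev, fisher_exact(table)[1]))
        # Benjamini-Hochberg step-up procedure.
        pvals.sort(key=lambda t: t[1])
        m, cutoff = len(pvals), 0
        for rank, (ev, p) in enumerate(pvals, start=1):
            if p <= fdr * rank / m:
                cutoff = rank
        return [ev for ev, _ in pvals[:cutoff]]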

Relevance: 100.00%

Abstract:

Poly(vinylidene fluoride) and copolymers of vinylidene fluoride with hexafluoropropylene, trifluoroethylene and chlorotrifluoroethylene have been exposed to gamma irradiation in vacuum, up to doses of 1 MGy under identical conditions, to obtain a ranking of radiation sensitivities. Changes in tensile properties, crystalline melting points, heats of fusion, gel contents and solvent uptake factors were used as the defining parameters. The initial degree of crystallinity and the film processing had the greatest influence on relative radiation damage, although the cross-linked networks were almost identical in their solvent-swelling characteristics, regardless of comonomer composition or content.