945 results for File processing (Computer science)


Abstract:

Most research in the area of emotion detection in written text has focused on detecting explicit expressions of emotions. In this paper, we present a rule-based pipeline approach, based on the OCC Model, for detecting implicit emotions in written text that contains no emotion-bearing words. We evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach consistently outperforms the lexicon matching method across all three datasets by a large margin of 17–30% in F-measure and gives competitive performance compared to a supervised classifier. In particular, when dealing with formal text that follows grammatical rules strictly, our approach gives an average F-measure of 82.7% on "Happy", "Angry-Disgust" and "Sad", outperforming even the supervised baseline by nearly 17% in F-measure. Our preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.
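
The paper itself includes no code; as a rough illustration of the rule-pipeline idea, the sketch below maps OCC-style appraisal features to emotion labels. The feature extractor, the rules, and the trigger words are invented placeholders, not the authors' actual rules.

```python
# Hypothetical sketch of a rule-based implicit-emotion pipeline (not the
# authors' implementation). Rules map OCC-style appraisal patterns to emotions.

OCC_RULES = [
    # (condition over extracted event features) -> emotion label; all illustrative
    (lambda ev: ev["desirable"] and ev["about_self"], "Happy"),
    (lambda ev: not ev["desirable"] and ev["blame_other"], "Angry-Disgust"),
    (lambda ev: not ev["desirable"] and not ev["blame_other"], "Sad"),
]

def extract_event_features(sentence: str) -> dict:
    """Placeholder for the parsing stage: in the paper's setting this would
    come from grammatical analysis of the sentence."""
    return {"desirable": "passed" in sentence,
            "about_self": sentence.strip().startswith("I"),
            "blame_other": "they" in sentence.lower()}

def classify(sentence: str) -> str:
    ev = extract_event_features(sentence)
    for condition, emotion in OCC_RULES:
        if condition(ev):
            return emotion
    return "None"

print(classify("I passed the exam."))  # -> Happy
```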

Abstract:

The authors discuss the issue of computer waste. By information technology devices they mean the components of computer configurations, that is, computers (desktop, portable, terminal, etc.), their peripherals (monitor, printer, CD writer, etc.), and the parts and accessories of these (chips, mechanical parts, toner cartridges, etc.). The environmental impact of regular use was examined from only one aspect: during regular use, certain components and consumables (especially printer toner cartridges) are replaced more frequently than the machine itself and so become waste. The main focus is the end of the life cycle of computing devices, and from this point of view the category of used personal computers is a key concept.

Abstract:

The concept of cloud computing entered public awareness a few years ago and now occupies a growing place in both the professional literature and IT practice. This new IT technology is leading to the standardization and spread of ERP systems connected to cloud computing services. The authors give an overview of the current state of cloud computing and of user expectations for data processing systems operating in the cloud (ERP systems in particular), along with initial application experiences, drawn mostly from Germany. They separately discuss the new selection objectives and criteria for ERP systems that have arisen from the special possibilities of the cloud environment.

Abstract:

The objectives of this research are to analyze and develop a modified Principal Component Analysis (PCA) and to develop a two-dimensional PCA with applications in image processing. PCA is a classical multivariate technique whose mathematical treatment is based purely on the eigensystem of positive-definite symmetric matrices. Its main function is to statistically transform a set of correlated variables into a new set of uncorrelated variables over $\mathbb{R}^n$ while retaining most of the variation present in the original variables.

The variances of the Principal Components (PCs) obtained from the modified PCA form a correlation matrix of the original variables. The decomposition of this correlation matrix into a diagonal matrix produces a set of orthonormal basis vectors that can be used to linearly transform the given PCs; it is this linear transformation that reproduces the original variables. The two-dimensional PCA can be devised as two successive applications of one-dimensional PCA, and it can be shown that, for an $m \times n$ matrix, the PCs obtained from the two-dimensional PCA are the singular values of that matrix.

In this research, several PCA-based applications for image analysis are developed: edge detection, feature extraction, and multi-resolution PCA decomposition and reconstruction.
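
The link between classical PCA and singular values mentioned above can be checked numerically. A minimal numpy sketch of ordinary PCA via the covariance eigensystem (the modified and two-dimensional variants are specific to the dissertation and are not reproduced here):

```python
# Sketch of classical PCA via the eigensystem of a covariance matrix
# (illustrative; not the dissertation's modified PCA).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # 200 samples, 5 variables
X = X - X.mean(axis=0)                  # center each variable

C = (X.T @ X) / (X.shape[0] - 1)        # sample covariance matrix (5 x 5)
eigvals, eigvecs = np.linalg.eigh(C)    # symmetric => orthonormal eigenvectors
order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

PCs = X @ eigvecs                       # uncorrelated principal components

# The SVD relationship: the singular values of the centered data matrix
# are sqrt((n - 1) * eigvals).
s = np.linalg.svd(X, compute_uv=False)
assert np.allclose(s, np.sqrt((X.shape[0] - 1) * eigvals))
```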

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system; with its help, database queries can be distributed over both local and Web data sources within the MSemODB framework.

Data Extractor treats Web sites as data sources, controlling query execution and data retrieval, and works as an intermediary between applications and sites. It uses a twofold "custom wrapper" approach to information retrieval: wrappers for the majority of sites are easily built in a powerful and expressive scripting language, while complex cases are handled by Java-based wrappers that use a specially designed library of data retrieval, parsing, and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis, and processing.

Data Extractor is designed to act as a data retrieval server as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well.

This study confirms the feasibility of building custom wrappers for Web sites. The approach provides accurate data retrieval along with power and flexibility in handling complex cases.
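
A toy illustration of the "custom wrapper" idea: a site-specific wrapper turns a page's HTML into uniform records. The page layout, field names, and pattern below are invented for illustration; the actual wrappers were written in the system's scripting language or in Java.

```python
# Minimal site-specific wrapper: a pattern plus a mapping from groups to fields.
import re

SAMPLE_HTML = """
<table>
  <tr><td class="title">Intro to Databases</td><td class="price">$30</td></tr>
  <tr><td class="title">Web Data Extraction</td><td class="price">$45</td></tr>
</table>
"""

ROW_PATTERN = re.compile(
    r'<td class="title">(?P<title>[^<]+)</td><td class="price">\$(?P<price>\d+)</td>')

def wrap(html: str) -> list[dict]:
    """Apply the site-specific wrapper and return uniform records."""
    return [{"title": m["title"], "price": int(m["price"])}
            for m in ROW_PATTERN.finditer(html)]

print(wrap(SAMPLE_HTML))
# [{'title': 'Intro to Databases', 'price': 30}, {'title': 'Web Data Extraction', 'price': 45}]
```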

Abstract:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous and multidatabase research has focused on this issue, resulting in many different approaches; however, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources.

This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of this thesis include: (i) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for resolving semantic heterogeneity by investigating the extents of the meta-data constructs of component schemas, shown to be correct, complete, and unambiguous; (iii) a semi-automated technique for identifying semantic relations, the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, which acts as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of this work.

Abstract:

Query processing is a commonly performed procedure and a vital and integral part of information processing. It is therefore important for information processing applications to continuously improve the accessibility of data sources as well as the ability to perform queries on those data sources.

The relational database model and the Structured Query Language (SQL) are currently the most popular tools for implementing and querying databases; however, a certain level of expertise is needed to use SQL and to access relational databases. This study presents a semantic modeling approach that enables the average user to access and query existing relational databases without concern for the database's structure or technicalities. The method includes an algorithm that represents relational database schemas in a semantically richer way, producing a semantic view of the relational database. The user performs queries using an adapted version of SQL, namely Semantic SQL. This method substantially reduces the size and complexity of queries; it also shortens the database application development cycle and improves maintenance and reliability by reducing the size of application programs. Furthermore, a Semantic Wrapper tool illustrating the semantic wrapping method is presented.

I further extend the semantic wrapping method to heterogeneous database management. Relational databases, object-oriented databases, and Internet data sources are considered part of the heterogeneous database environment. Semantic schemas produced by the algorithm are employed to describe the structure of these data sources in a uniform way, and Semantic SQL is used to query the various sources. As a result, the method gives users the ability to access and query heterogeneous database systems in a more natural way.

Abstract:

A methodology for formally modeling and analyzing the software architecture of mobile agent systems provides a solid basis for developing high-quality mobile agent systems, and such a methodology is helpful for studying other distributed and concurrent systems as well. Providing it is a challenge, however, because of the agent mobility in mobile agent systems.

The methodology was defined from the two essential parts of software architecture: a formalism to define architectural models and an analysis method to formally verify system properties. The formalism is two-layer Predicate/Transition (PrT) nets extended with dynamic channels, and the analysis method is a hierarchical approach that verifies models at different levels. The two-layer modeling formalism smoothly transforms physical models of mobile agent systems into architectural models. Dynamic channels facilitate synchronous communication between nets, and they naturally capture the dynamic architectural configuration and agent mobility of mobile agent systems. Component properties are verified on the transformed individual components, system properties are checked in a simplified system model, and interaction properties are analyzed on models composed from the nets involved. Based on the formalism and the analysis method, the researcher formally modeled and analyzed a software architecture for mobile agent systems and designed an architectural model of a mobile-agent-based medical information processing system. The model checking tool SPIN was used to verify system properties of the medical information processing system such as reachability, concurrency, and safety.

From the successful modeling and analysis of this software architecture, the conclusion is that PrT nets extended with channels are a powerful tool for modeling mobile agent systems, and the hierarchical analysis method provides a rigorous foundation for the modeling tool. The hierarchical analysis method not only reduces the complexity of the analysis but also expands the application scope of model checking techniques. The results of formally modeling and analyzing the software architecture of the medical information processing system show that model checking is an effective and efficient way to verify software architecture. Moreover, the system demonstrates the high flexibility, efficiency, and low cost of mobile agent technologies.
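
For readers unfamiliar with Petri nets, a minimal sketch of the ordinary firing rule they build on; the places and the migration transition here are invented, and PrT nets additionally carry predicates, typed tokens, and the dynamic channels described above.

```python
# A minimal ordinary Petri net sketch (places, transitions, token firing).
marking = {"agent_at_host_A": 1, "channel_free": 1, "agent_at_host_B": 0}

# transition: (input places consumed, output places produced)
MIGRATE = ({"agent_at_host_A": 1, "channel_free": 1},
           {"agent_at_host_B": 1, "channel_free": 1})

def enabled(marking, transition):
    inputs, _ = transition
    return all(marking[p] >= n for p, n in inputs.items())

def fire(marking, transition):
    inputs, outputs = transition
    assert enabled(marking, transition), "transition not enabled"
    m = dict(marking)
    for p, n in inputs.items():       # consume input tokens
        m[p] -= n
    for p, n in outputs.items():      # produce output tokens
        m[p] = m.get(p, 0) + n
    return m

print(fire(marking, MIGRATE))
# {'agent_at_host_A': 0, 'channel_free': 1, 'agent_at_host_B': 1}
```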

Abstract:

3D geographic information systems (GIS) are data- and computation-intensive in nature, while Internet users are usually equipped with low-end personal computers and network connections of limited bandwidth. Data reduction and performance optimization techniques are therefore of critical importance to quality of service (QoS) management for online 3D GIS. In this research, QoS management issues in distributed 3D GIS presentation were studied to develop 3D TerraFly, an interactive 3D GIS that supports high-quality online terrain visualization and navigation.

To tackle the QoS management challenges, a multi-resolution rendering model, adaptive level-of-detail (LOD) control, and mesh simplification algorithms were proposed to effectively reduce terrain model complexity. The rendering model is adaptively decomposed into sub-regions of up to three detail levels according to viewing distance and other dynamic quality measurements. The mesh simplification algorithm is a hybrid that combines edge straightening with quad-tree compression, reducing mesh complexity by removing geometrically redundant vertices; its main advantage is that the grid mesh can be processed directly in parallel, without triangulation overhead. Algorithms facilitating remote access and distributed processing of volumetric GIS data, such as data replication, directory service, request scheduling, and predictive data retrieval and caching, were also proposed.

A prototype of 3D TerraFly implemented in this research demonstrates the effectiveness of the proposed QoS management framework in handling interactive online 3D GIS. The system implementation details and future directions of this research are also addressed in this thesis.
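
A minimal sketch of distance-based LOD selection of the kind described above; the three levels and the thresholds are invented, and the actual system also folds other dynamic quality measurements into the decision.

```python
# Choose a detail level for a terrain sub-region from viewing distance.
from dataclasses import dataclass

@dataclass
class SubRegion:
    center: tuple[float, float]

LOD_THRESHOLDS = [(500.0, 0), (2000.0, 1)]  # (max distance, detail level)

def choose_lod(viewer: tuple[float, float], region: SubRegion) -> int:
    """Return 0 (finest) .. 2 (coarsest) based on viewing distance."""
    dx = viewer[0] - region.center[0]
    dy = viewer[1] - region.center[1]
    dist = (dx * dx + dy * dy) ** 0.5
    for max_dist, level in LOD_THRESHOLDS:
        if dist <= max_dist:
            return level
    return 2  # beyond all thresholds: coarsest level

print(choose_lod((0.0, 0.0), SubRegion(center=(300.0, 400.0))))  # distance 500 -> 0
```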

Abstract:

Mediation techniques provide interoperability and support integrated query processing among heterogeneous databases. While such techniques help data sharing among different sources, they increase security risks, such as violations of access control rules. Successful protection of information by an effective access control mechanism is a basic requirement for interoperation among heterogeneous data sources.

This dissertation first identified the challenges a mediation system must meet to achieve both interoperability and security in an interconnected, collaborative computing environment: (1) context-awareness, (2) semantic heterogeneity, and (3) multiple security policy specification. Few existing approaches address all three. This dissertation provides a modeling and architectural solution to the problem of mediation security that addresses these challenges: a context-aware, flexible authorization framework with two major tasks, specifying security policies and enforcing them. First, the security policy specification provides a generic and extensible method for modeling security policies with respect to the challenges posed by the mediation system. The security policies in this study are specified as 5-tuples followed by a series of authorization constraints, which are identified based on the relationships among the different security components in the mediation system. Two essential features of mediation systems, the relationships among authorization components and interoperability among heterogeneous data sources, are the focus of this investigation. Second, the dissertation supports effective access control on mediation systems while providing uniform access to heterogeneous data sources. Dynamic security constraints are handled in the authorization phase instead of the authentication phase, so the maintenance cost of the security specification is reduced compared with related solutions.
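
A sketch of how 5-tuple authorization rules might look. The abstract does not name the tuple's components, so the (subject, object, action, context, sign) reading and the example constraint below are assumptions.

```python
# Hypothetical 5-tuple authorization rules with a context predicate.
from typing import NamedTuple, Callable

class Rule(NamedTuple):
    subject: str                      # role or user
    obj: str                          # protected data source / mediated view
    action: str                       # e.g. "read"
    context: Callable[[dict], bool]   # context-awareness: predicate on request context
    sign: str                         # "grant" or "deny"

RULES = [
    Rule("doctor", "patient_records", "read",
         lambda ctx: ctx.get("location") == "hospital", "grant"),
]

def authorized(subject: str, obj: str, action: str, ctx: dict) -> bool:
    for r in RULES:
        if (r.subject, r.obj, r.action) == (subject, obj, action) and r.context(ctx):
            return r.sign == "grant"
    return False  # default deny

print(authorized("doctor", "patient_records", "read", {"location": "hospital"}))  # True
```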

Abstract:

This research presents several components encompassing the objective of data partitioning and replication management in a distributed GIS database. Modern geographic information system (GIS) databases are often large and complicated, so data partitioning and replication management problems need to be addressed in the development of an efficient and scalable solution.

Part of the research studies the patterns of geographic raster data processing and proposes algorithms to improve the availability of such data. These algorithms and approaches target the granularity of geographic data objects as well as data partitioning in geographic databases, to achieve high data availability and quality of service (QoS) under distributed data delivery and processing. To this end, a dynamic, real-time approach is proposed for mosaicking digital images of different temporal and spatial characteristics into tiles. This approach reuses digital images on demand and generates mosaicked tiles only for the required region, according to user requirements such as resolution, temporal range, and target bands, reducing storage redundancy and utilizing available computing and storage resources more efficiently.

Another part of the research pursued methods for efficiently acquiring GIS data from external heterogeneous databases and Web services, as well as enhancements to end-user GIS data delivery, automation, and 3D virtual reality presentation.

Vast numbers of computing, network, and storage resources on the Internet sit idle or underutilized. The proposed "Crawling Distributed Operating System" (CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in a GIS database context.

The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed here resulted in the TerraFly GIS database, which is used by the US government, researchers, and the general public to facilitate Web access to remotely sensed imagery and GIS vector information.
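
A sketch of the on-demand mosaicking idea described above; the cache key layout and the mosaic placeholder are illustrative assumptions.

```python
# Tiles are generated only for the requested region/resolution/band/time
# combination and cached for reuse.
_tile_cache: dict[tuple, bytes] = {}

def get_tile(region: tuple[float, float, float, float],
             resolution: float, bands: tuple[str, ...],
             time_range: tuple[str, str]) -> bytes:
    key = (region, resolution, bands, time_range)
    if key not in _tile_cache:                 # generate only on demand
        _tile_cache[key] = mosaic(region, resolution, bands, time_range)
    return _tile_cache[key]

def mosaic(region, resolution, bands, time_range) -> bytes:
    """Placeholder for selecting source images overlapping `region` within
    `time_range` and resampling them to `resolution` for `bands`."""
    return b"tile-bytes"

tile = get_tile((-80.3, 25.7, -80.1, 25.9), 1.0,
                ("red", "green", "blue"), ("2004-01", "2004-12"))
```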

Abstract:

Moving objects database systems are the most challenging sub-category of spatio-temporal database systems, and a database system that updates the location of GPS-equipped moving vehicles in real time must meet even stricter requirements. Currently existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research aimed at real-time tracking and history retrieval for massive numbers of vehicles moving on road networks. A total solution is provided for real-time updates of vehicles' location and motion information, range queries on current and historical data, and prediction of vehicles' movement in the near future.

To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) was first proposed for building and manipulating the indexing structures of moving objects databases.

Applying the STAPS approach, an indexing structure that associates a time interval tree with each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing, and the update and query performance it provides is consistent, without restrictions such as a time window or an assumption of linear moving trajectories.

An application system design based on a distributed architecture with centralized organization was developed to maximally support the proposed data and indexing structures; the suggested architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving region-wide, the main issues in implementing such a system were addressed.
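
A sketch of the per-segment time-interval indexing idea behind STAPS; a plain list with a linear scan stands in for the dissertation's interval tree, and the identifiers are invented.

```python
# Associate each road segment with the time intervals during which
# vehicles occupied it, then answer time-window queries per segment.
from collections import defaultdict

# segment_id -> list of (t_enter, t_leave, vehicle_id)
segment_index: dict[str, list[tuple[float, float, str]]] = defaultdict(list)

def record(segment_id: str, t_enter: float, t_leave: float, vehicle_id: str):
    segment_index[segment_id].append((t_enter, t_leave, vehicle_id))

def vehicles_on(segment_id: str, t0: float, t1: float) -> set[str]:
    """Vehicles whose stay on the segment overlaps the query window [t0, t1]."""
    return {v for (a, b, v) in segment_index[segment_id] if a <= t1 and b >= t0}

record("I-95:seg42", 10.0, 15.0, "car-1")
record("I-95:seg42", 14.0, 20.0, "car-2")
print(vehicles_on("I-95:seg42", 12.0, 14.5))   # {'car-1', 'car-2'}
```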

Abstract:

Microarray technology provides a high-throughput technique for studying gene expression. Microarrays can help us diagnose different types of cancer, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data, so effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify sets of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. Classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples; in such situations, it is important to combine the results of these efforts. The second part of the dissertation addresses the problem of pooling results from independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated and applied to the problem of identifying cell cycle-regulated genes in two different yeast species; as a result, improved sets of cell cycle-regulated genes were identified.

The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that are biologically more meaningful.
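
Two of the pooling methods named above follow directly from their standard definitions; a short scipy sketch:

```python
# Fisher's inverse chi-square and Stouffer's Z, applied to one gene's
# p-values from several independent experiments.
import numpy as np
from scipy import stats

p = np.array([0.01, 0.20, 0.03])   # p-values from 3 experiments

# Fisher: X = -2 * sum(ln p_i) ~ chi-square with 2k degrees of freedom
X = -2.0 * np.sum(np.log(p))
p_fisher = stats.chi2.sf(X, df=2 * len(p))

# Stouffer: Z = sum(z_i) / sqrt(k), where z_i = Phi^{-1}(1 - p_i)
z = stats.norm.isf(p)              # inverse survival function = Phi^{-1}(1 - p)
p_stouffer = stats.norm.sf(np.sum(z) / np.sqrt(len(p)))

print(p_fisher, p_stouffer)
# scipy also ships both: stats.combine_pvalues(p, method="fisher" / "stouffer")
```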

Abstract:

Recent advances in airborne Light Detection and Ranging (LIDAR) technology allow rapid and inexpensive measurement of topography over large areas. Airborne LIDAR systems usually return a 3-dimensional cloud of point measurements from reflective objects scanned by the laser beneath the flight path. This technology is becoming a primary method for extracting information about various kinds of geometric objects, such as high-resolution digital terrain models (DTMs), buildings, and trees. In the past decade, LIDAR has attracted growing interest from researchers in remote sensing and GIS. Compared to traditional data sources such as aerial photography and satellite images, LIDAR measurements are not influenced by sun shadow or relief displacement. However, the voluminous data pose a new challenge for automatically extracting geometric information, because many raster image processing techniques cannot be applied directly to irregularly spaced LIDAR points.

In this dissertation, a framework is proposed to automatically extract information about different kinds of geometric objects, such as terrain and buildings, from LIDAR data; such information is essential to numerous applications including flood modeling, landslide prediction, and hurricane animation. The framework consists of several intuitive algorithms. First, a progressive morphological filter was developed to detect non-ground LIDAR measurements: by gradually increasing the window size and elevation difference threshold of the filter, measurements of vehicles, vegetation, and buildings are removed while ground data are preserved. Then, building measurements are identified from the non-ground measurements using a region growing algorithm based on a plane-fitting technique. Raw footprints for segmented building measurements are derived by connecting boundary points and are further simplified and adjusted by several proposed operations to remove the noise caused by irregularly spaced LIDAR measurements. To reconstruct 3D building models, the raw 2D topology of each building is first extracted and then adjusted. Since the adjusting operations for simple building models do not work well on 2D topology, a 2D snake algorithm is proposed for topology adjustment; it consists of newly defined energy functions for topology adjusting and a linear algorithm to find the minimal energy value of 2D snake problems. Data sets from urbanized areas including large institutional, commercial, and small residential buildings were employed to test the proposed framework, and the results demonstrate that it achieves very good performance.
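
A compact sketch of a progressive morphological filter of the kind described, assuming the point cloud has been gridded into an elevation raster; the window sizes and thresholds are illustrative, and the dissertation's filter works on the irregularly spaced points themselves.

```python
# Flag non-ground cells by comparing the surface against progressively
# larger morphological openings.
import numpy as np
from scipy.ndimage import grey_opening

def progressive_morphological_filter(dem: np.ndarray,
                                     windows=(3, 5, 9),
                                     dh_thresholds=(0.3, 0.8, 1.5)) -> np.ndarray:
    """Return a boolean mask of non-ground cells."""
    surface = dem.copy()
    non_ground = np.zeros(dem.shape, dtype=bool)
    for w, dh_max in zip(windows, dh_thresholds):
        opened = grey_opening(surface, size=(w, w))   # morphological opening
        # Cells far above the opened surface are vehicles/vegetation/buildings.
        non_ground |= (surface - opened) > dh_max
        surface = opened                              # progressively coarser surface
    return non_ground

dem = np.zeros((20, 20)); dem[8:12, 8:12] = 10.0      # a 4x4 "building"
print(progressive_morphological_filter(dem).sum())    # 16 building cells flagged
```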

Abstract:

Fast-spreading unknown viruses have caused major damage to computer systems upon their initial release, and current detection methods have lacked the capability to detect unknown viruses quickly enough to avoid mass spreading and damage. This dissertation presents a behavior-based approach to detecting known and unknown viruses based on their attempts to replicate. Replication is the qualifying, fundamental characteristic of a virus and is consistently present in all viruses, making this approach applicable to viruses belonging to many classes and executing under several conditions. A form of replication called self-reference replication (SR-replication) was formalized as one main type of replication, in which a virus replicates by modifying or creating other files on a system to include the virus itself. This replication type was used to detect viruses attempting to replicate by referencing themselves, which is a necessary step to successfully replicate into files. The approach does not require a priori knowledge about known viruses; detection is accomplished at runtime by monitoring currently executing processes that attempt to replicate. Two implementation prototypes of the detection approach, called SRRAT, were created and tested on Microsoft Windows operating systems, focusing on the tracking of user-mode Win32 API calls and kernel-mode system services. The research results showed SR-replication capable of distinguishing between file-infecting viruses and benign processes with few or no false positives and false negatives.
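
A sketch of the self-reference criterion at the core of SR-replication detection: flag a process when a file it writes comes to contain the process's own executable image. The actual prototypes hooked Win32 API calls and kernel services at runtime; this offline file check only illustrates the test.

```python
# Illustrative self-reference test, not the SRRAT implementation.
from pathlib import Path

def is_sr_replication(process_image: Path, written_file: Path) -> bool:
    """True if the written file embeds the writer's own image bytes."""
    image_bytes = process_image.read_bytes()
    target_bytes = written_file.read_bytes()
    # The file was modified or created to include the writer itself.
    return image_bytes in target_bytes

# Usage (hypothetical): after observing process P write file F, call
#   is_sr_replication(Path(P.image_path), Path(F))
# and flag P as replicating if it returns True.
```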