933 resultados para File organization (Computer science)
Resumo:
A good object representation or object descriptor is one of the key issues in object based image analysis. To effectively fuse color and texture as a unified descriptor at object level, this paper presents a novel method for feature fusion. Color histogram and the uniform local binary patterns are extracted from arbitrary-shaped image-objects, and kernel principal component analysis (kernel PCA) is employed to find nonlinear relationships of the extracted color and texture features. The maximum likelihood approach is used to estimate the intrinsic dimensionality, which is then used as a criterion for automatic selection of optimal feature set from the fused feature. The proposed method is evaluated using SVM as the benchmark classifier and is applied to object-based vegetation species classification using high spatial resolution aerial imagery. Experimental results demonstrate that great improvement can be achieved by using proposed feature fusion method.
Resumo:
Information Overload and Mismatch are two fundamental problems affecting the effectiveness of information filtering systems. Even though both term-based and patternbased approaches have been proposed to address the problems of overload and mismatch, neither of these approaches alone can provide a satisfactory solution to address these problems. This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern-based approaches to effectively filter sheer volume of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. The experimental results based on the RCV1 corpus show that the proposed twostage filtering model significantly outperforms the both termbased and pattern-based information filtering models.
Resumo:
The Guardian reportage of the United Kingdom Member of Parliament (MP) expenses scandal of 2009 used crowdsourcing and computational journalism techniques. Computational journalism can be broadly defined as the application of computer science techniques to the activities of journalism. Its foundation lies in computer assisted reporting techniques and its importance is increasing due to the: (a) increasing availability of large scale government datasets for scrutiny; (b) declining cost, increasing power and ease of use of data mining and filtering software; and Web 2.0; and (c) explosion of online public engagement and opinion.. This paper provides a case study of the Guardian MP expenses scandal reportage and reveals some key challenges and opportunities for digital journalism. It finds journalists may increasingly take an active role in understanding, interpreting, verifying and reporting clues or conclusions that arise from the interrogations of datasets (computational journalism). Secondly a distinction should be made between information reportage and computational journalism in the digital realm, just as a distinction might be made between citizen reporting and citizen journalism. Thirdly, an opportunity exists for online news providers to take a ‘curatorial’ role, selecting and making easily available the best data sources for readers to use (information reportage). These activities have always been fundamental to journalism, however the way in which they are undertaken may change. Findings from this paper may suggest opportunities and challenges for the implementation of computational journalism techniques in practice by digital Australian media providers, and further areas of research.
Resumo:
We describe research into the identification of anomalous events and event patterns as manifested in computer system logs. Prototype software has been developed with a capability that identifies anomalous events based on usage patterns or user profiles, and alerts administrators when such events are identified. To reduce the number of false positive alerts we have investigated the use of different user profile training techniques and introduce the use of abstractions to group together applications which are related. Our results suggest that the number of false alerts that are generated is significantly reduced when a growing time window is used for user profile training and when abstraction into groups of applications is used.
Resumo:
We report on a longitudinal research study of the development of novice programmers in their first semester of programming. In the third week, almost half of our sample of students could not answer an explain-in-plain-English question, for code consisting of just three assignment statements, which swapped the values in two variables. We regard code that swaps the values of two variables as the simplest case of where a programming student can manifest a SOLO relational response. Our results demonstrate that the problems many students face with understanding code can begin very early, on relatively trivial code. However, using traditional programming exercises, these problems often go undetected until late in the semester. New approaches are required to detect and fix these problems earlier.
Resumo:
There are at least four key challenges in the online news environment that computational journalism may address. Firstly, news providers operate in a rapidly evolving environment and larger businesses are typically slower to adapt to market innovations. News consumption patterns have changed and news providers need to find new ways to capture and retain digital users. Meanwhile, declining financial performance has led to cost cuts in mass market newspapers. Finally investigative reporting is typically slow, high cost and may be tedious, and yet is valuable to the reputation of a news provider. Computational journalism involves the application of software and technologies to the activities of journalism, and it draws from the fields of computer science, social science and communications. New technologies may enhance the traditional aims of journalism, or may require “a new breed of people who are midway between technologists and journalists” (Irfan Essa in Mecklin 2009: 3). Historically referred to as ‘computer assisted reporting’, the use of software in online reportage is increasingly valuable due to three factors: larger datasets are becoming publicly available; software is becoming sophisticated and ubiquitous; and the developing Australian digital economy. This paper introduces key elements of computational journalism – it describes why it is needed; what it involves; benefits and challenges; and provides a case study and examples. Computational techniques can quickly provide a solid factual basis for original investigative journalism and may increase interaction with readers, when correctly used. It is a major opportunity to enhance the delivery of original investigative journalism, which ultimately may attract and retain readers online.
Resumo:
Following the completion of the draft Human Genome in 2001, genomic sequence data is becoming available at an accelerating rate, fueled by advances in sequencing and computational technology. Meanwhile, large collections of astronomical and geospatial data have allowed the creation of virtual observatories, accessible throughout the world and requiring only commodity hardware. Through a combination of advances in data management, data mining and visualization, this infrastructure enables the development of new scientific and educational applications as diverse as galaxy classification and real-time tracking of earthquakes and volcanic plumes. In the present paper, we describe steps taken along a similar path towards a virtual observatory for genomes – an immersive three-dimensional visual navigation and query system for comparative genomic data.
Resumo:
The concept of non-destructive testing (NDT) of materials and structures is of immense importance in engineering and medicine. Several NDT methods including electromagnetic (EM)-based e.g. X-ray and Infrared; ultrasound; and S-waves have been proposed for medical applications. This paper evaluates the viability of near infrared (NIR) spectroscopy, an EM method for rapid non-destructive evaluation of articular cartilage. Specifically, we tested the hypothesis that there is a correlation between the NIR spectrum and the physical and mechanical characteristics of articular cartilage such as thickness, stress and stiffness. Intact, visually normal cartilage-on-bone plugs from 2-3yr old bovine patellae were exposed to NIR light from a diffuse reflectance fibre-optic probe and tested mechanically to obtain their thickness, stress, and stiffness. Multivariate statistical analysis-based predictive models relating articular cartilage NIR spectra to these characterising parameters were developed. Our results show that there is a varying degree of correlation between the different parameters and the NIR spectra of the samples with R2 varying between 65 and 93%. We therefore conclude that NIR can be used to determine, nondestructively, the physical and functional characteristics of articular cartilage.
Resumo:
In cloud computing resource allocation and scheduling of multiple composite web services is an important challenge. This is especially so in a hybrid cloud where there may be some free resources available from private clouds but some fee-paying resources from public clouds. Meeting this challenge involves two classical computational problems. One is assigning resources to each of the tasks in the composite web service. The other is scheduling the allocated resources when each resource may be used by more than one task and may be needed at different points of time. In addition, we must consider Quality-of-Service issues, such as execution time and running costs. Existing approaches to resource allocation and scheduling in public clouds and grid computing are not applicable to this new problem. This paper presents a random-key genetic algorithm that solves new resource allocation and scheduling problem. Experimental results demonstrate the effectiveness and scalability of the algorithm.
Resumo:
Client puzzles are meant to act as a defense against denial of service (DoS) attacks by requiring a client to solve some moderately hard problem before being granted access to a resource. However, recent client puzzle difficulty definitions (Stebila and Ustaoglu, 2009; Chen et al., 2009) do not ensure that solving n puzzles is n times harder than solving one puzzle. Motivated by examples of puzzles where this is the case, we present stronger definitions of difficulty for client puzzles that are meaningful in the context of adversaries with more computational power than required to solve a single puzzle. A protocol using strong client puzzles may still not be secure against DoS attacks if the puzzles are not used in a secure manner. We describe a security model for analyzing the DoS resistance of any protocol in the context of client puzzles and give a generic technique for combining any protocol with a strong client puzzle to obtain a DoS-resistant protocol.
Resumo:
A hierarchical structure is used to represent the content of the semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence in this paper, we introduce a novel method of representing the XML documents in Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable for a real-life dataset as well as the factorized matrices produced from the proposed method helps to improve the quality of clusters due to the enriched document representation with both the structure and the content information.
Resumo:
Online dating networks, a type of social network, are gaining popularity. With many people joining and being available in the network, users are overwhelmed with choices when choosing their ideal partners. This problem can be overcome by utilizing recommendation methods. However, traditional recommendation methods are ineffective and inefficient for online dating networks where the dataset is sparse and/or large and two-way matching is required. We propose a methodology by using clustering, SimRank to recommend matching candidates to users in an online dating network. Data from a live online dating network is used in evaluation. The success rate of recommendation obtained using the proposed method is compared with baseline success rate of the network and the performance is improved by double.
Resumo:
Recommender systems are widely used online to help users find other products, items etc that they may be interested in based on what is known about that user in their profile. Often however user profiles may be short on information and thus it is difficult for a recommender system to make quality recommendations. This problem is known as the cold-start problem. Here we investigate using association rules as a source of information to expand a user profile and thus avoid this problem. Our experiments show that it is possible to use association rules to noticeably improve the performance of a recommender system under the cold-start situation. Furthermore, we also show that the improvement in performance obtained can be achieved while using non-redundant rule sets. This shows that non-redundant rules do not cause a loss of information and are just as informative as a set of association rules that contain redundancy.
Resumo:
Due to the change in attitudes and lifestyles, people expect to find new partners and friends via various ways now-a-days. Online dating networks create a network for people to meet each other and allow making contact with different objectives of developing a personal, romantic or sexual relationship. Due to the higher expectation of users, online matching companies are trying to adopt recommender systems. However, the existing recommendation techniques such as content-based, collaborative filtering or hybrid techniques focus on users explicit contact behaviors but ignore the implicit relationship among users in the network. This paper proposes a social matching system that uses past relations and user similarities in finding potential matches. The proposed system is evaluated on the dataset collected from an online dating network. Empirical analysis shows that the recommendation success rate has increased to 31% as compared to the baseline success rate of 19%.
Resumo:
The paper provides an assessment of the performance of commercial Real Time Kinematic (RTK) systems over longer than recommended inter-station distances. The experiments were set up to test and analyse solutions from the i-MAX, MAX and VRS systems being operated with three triangle shaped network cells, each having an average inter-station distance of 69km, 118km and 166km. The performance characteristics appraised included initialization success rate, initialization time, RTK position accuracy and availability, ambiguity resolution risk and RTK integrity risk in order to provide a wider perspective of the performance of the testing systems. ----- ----- The results showed that the performances of all network RTK solutions assessed were affected by the increase in the inter-station distances to similar degrees. The MAX solution achieved the highest initialization success rate of 96.6% on average, albeit with a longer initialisation time. Two VRS approaches achieved lower initialization success rate of 80% over the large triangle. In terms of RTK positioning accuracy after successful initialisation, the results indicated a good agreement between the actual error growth in both horizontal and vertical components and the accuracy specified in the RMS and part per million (ppm) values by the manufacturers. ----- ----- Additionally, the VRS approaches performed better than the MAX and i-MAX when being tested under the standard triangle network with a mean inter-station distance of 69km. However as the inter-station distance increases, the network RTK software may fail to generate VRS correction and then may turn to operate in the nearest single-base RTK (or RAW) mode. The position uncertainty reached beyond 2 meters occasionally, showing that the RTK rover software was using an incorrect ambiguity fixed solution to estimate the rover position rather than automatically dropping back to using an ambiguity float solution. Results identified that the risk of incorrectly resolving ambiguities reached 18%, 20%, 13% and 25% for i-MAX, MAX, Leica VRS and Trimble VRS respectively when operating over the large triangle network. Additionally, the Coordinate Quality indicator values given by the Leica GX1230 GG rover receiver tended to be over-optimistic and not functioning well with the identification of incorrectly fixed integer ambiguity solutions. In summary, this independent assessment has identified some problems and failures that can occur in all of the systems tested, especially when being pushed beyond the recommended limits. While such failures are expected, they can offer useful insights into where users should be wary and how manufacturers might improve their products. The results also demonstrate that integrity monitoring of RTK solutions is indeed necessary for precision applications, thus deserving serious attention from researchers and system providers.