943 results for "data transformation"
Abstract:
It is a big challenge to acquire correct user profiles for personalized text classification, since users may be unsure when describing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback describing personal interests. However, in many cases the accuracy of ML-based methods cannot be significantly improved because of the term independence assumption and the uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It applies data mining to discover knowledge from relevant and non-relevant text, and constrains the specific knowledge with reasoning rules to eliminate conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. Experimental results on Reuters Corpus Volume 1 and TREC topics indicate that the proposed technique achieves encouraging performance compared with state-of-the-art relevance feedback models.
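Dempster's rule of combination is the standard way DS theory fuses two bodies of evidence and discounts the conflict between them. The following is a minimal Python sketch of that rule, not the authors' model: the abstract does not say how mined patterns become mass functions, so the frame {rel, non} and the masses below are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    A mass function maps focal elements (frozensets of hypotheses) to masses
    summing to 1. Products of masses whose focal elements do not intersect
    count as conflict and are removed by renormalisation.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Bel(A): total mass committed to subsets of A."""
    return sum(v for s, v in m.items() if s <= hypothesis)


# Frame of discernment for one document: relevant or non-relevant.
REL, NON = frozenset({"rel"}), frozenset({"non"})
THETA = REL | NON

# Hypothetical evidence: a pattern mined from relevant feedback supports REL,
# a pattern mined from non-relevant feedback supports NON; the rest is uncommitted.
m_pos = {REL: 0.6, THETA: 0.4}
m_neg = {NON: 0.3, THETA: 0.7}

m = combine(m_pos, m_neg)
print(round(belief(m, REL), 3), round(belief(m, NON), 3))  # 0.512 0.146
```

Here 18% of the joint mass falls on the empty set (REL against NON), and renormalising by the remaining 0.82 yields the combined beliefs.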
Abstract:
This paper studies the missing covariate problem, which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each is evaluated within the hazard prediction framework. Data from a typical engineering asset are used in the case study. Covariate values at some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than the others, its results can differ substantially from the real values of the missing covariates. This study also shows that, in general, the regression method gives more accurate results than mean imputation, but at a higher computational cost. The Gaussian Mixture Model (GMM) method is found to be the most effective of the three in terms of both computational efficiency and prediction accuracy.
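A minimal sketch of the two simpler imputation baselines, assuming a single covariate that drifts linearly with time. The helper names and the synthetic data are hypothetical, the paper's regression model may use other predictors, and the GMM method is not reproduced here. The example shows how mean imputation can miss a trending covariate by a wide margin.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing (None) entries with the mean of the observed entries."""
    mu = mean(v for v in values if v is not None)
    return [mu if v is None else v for v in values]


def regression_impute(x, y):
    """Fill missing y values from a least-squares line fitted to the complete (x, y) pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in pairs)
    my = mean(yi for _, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Synthetic covariate that drifts upward as the asset degrades (y = 2 + 0.5 t);
# the last two observations are deliberately discarded.
t = list(range(10))
y = [2 + 0.5 * ti for ti in t]
y_missing = y[:8] + [None, None]

print(mean_impute(y_missing)[8:])           # [3.75, 3.75]  (true values: 6.0, 6.5)
print(regression_impute(t, y_missing)[8:])  # approximately [6.0, 6.5]
```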
Abstract:
Cities accumulate and distribute vast sets of digital information. Many decision-making and planning processes in councils, local governments and organisations are based on both real-time and historical data. Until recently, only a small, carefully selected subset of this information had been released to the public, usually for specific purposes (e.g. train timetables, or the release of planning applications through websites). This situation is, however, changing rapidly. Regulatory frameworks, such as the Freedom of Information Legislation in the US, the UK, the European Union and many other countries, guarantee public access to data held by the state. One of the results of this legislation and of changing attitudes towards open data has been the widespread release of public information as part of recent Government 2.0 initiatives. This includes the creation of public data catalogues such as data.gov (U.S.), data.gov.uk (U.K.) and data.gov.au (Australia) at federal government level, and datasf.org (San Francisco) and data.london.gov.uk (London) at municipal level. The release of this data has opened up the possibility of a wide range of future applications and services, which are now the subject of intensified research efforts. Previous research has explored the creation of specialised tools to aid decision-making by urban citizens, councils and other stakeholders (Calabrese, Kloeckl & Ratti, 2008; Paulos, Honicky & Hooker, 2009). While these initiatives represent an important step towards open data, they too often result in mere collections of data repositories. Proprietary database formats and the lack of an open application programming interface (API) prevent these data sets from being cross-queried, which limits their potential. Our research, presented in this paper, looks beyond the pure release of data.
It is concerned with three essential questions. First, how can data from different sources be integrated into a consistent framework and made accessible? Second, how can ordinary citizens be supported in easily composing data from different sources to address their specific problems? Third, what interfaces make it easy for citizens to interact with data in an urban environment, and how can data be accessed and collected?
Abstract:
Typical reference year (TRY) weather data are often used to represent the long-term weather pattern for building simulation and design. Through the analysis of ten years of historical hourly weather data for seven major Australian capital cities, using the frequencies procedure of descriptive statistics analysis (in SPSS), this paper investigates:
• how closely the typical reference year (TRY) weather data represent the long-term weather pattern;
• the variations and common features that may exist between relatively hot and cold years.
It is found that, for the given set of input data, the discrepancy between the TRY and the multi-year data is much smaller for dry bulb temperature, relative humidity and global solar irradiance than for the other weather elements. The overall distribution patterns of key weather elements are also generally similar between hot and cold years, but with some shift and/or small distortion. There is little common tendency of change between hot and cold years across the different weather variables and study locations.
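One way to quantify how closely TRY data track the multi-year record is to compare binned relative frequencies, in the spirit of a frequencies procedure. The sketch below assumes 2 °C bins and uses total variation distance as the discrepancy measure. Both choices, the function names and the toy data are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def relative_frequencies(values, bin_width):
    """Share of hours falling in each bin [k*w, (k+1)*w), keyed by the bin index k."""
    counts = Counter(int(v // bin_width) for v in values)
    return {k: c / len(values) for k, c in counts.items()}


def discrepancy(sample_a, sample_b, bin_width=2.0):
    """Total variation distance between two binned distributions (0 = identical, 1 = disjoint)."""
    fa = relative_frequencies(sample_a, bin_width)
    fb = relative_frequencies(sample_b, bin_width)
    return 0.5 * sum(abs(fa.get(k, 0.0) - fb.get(k, 0.0)) for k in set(fa) | set(fb))


# Toy hourly dry bulb temperatures (deg C); real inputs would be 8760 hours per year.
try_temps = [18.0, 19.0, 21.0, 22.0, 25.0, 26.0]
multi_year = [18.0, 19.0, 20.0, 23.0, 25.0, 30.0]
print(round(discrepancy(try_temps, multi_year), 3))  # 0.167
```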
Abstract:
Individual science teachers who have inspired colleagues to transform their classroom praxis have been labelled transformational leaders. As the notion of distributed leadership became more accepted in the educational literature, the focus on the individual teacher-leader shifted to the study of leadership praxis both by individuals (whoever they might be) and by collectives within schools and science classrooms. This review traces the trajectory of leadership research, in the context of learning and teaching science, from an individual focus to a dialectical relationship between individual and collective praxis. The implications of applying an individual-collective perspective to praxis for teachers, students and their designated leaders are discussed.
Abstract:
Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three-dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model, which consists of eight litho-stratigraphic units, has subsequently been used to synthesise hydrogeological and hydrogeochemical data for different aquifers, in an approach that aims to demonstrate how integrating water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques (e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies that are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis of the hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls on groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and of the distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables more efficient communication of the results of scientific studies to the wider community.
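The hierarchical cluster analysis step can be sketched in plain Python as standardisation followed by agglomerative clustering. The linkage criterion (average linkage with Euclidean distance), the choice of three ions and the well data are assumptions for illustration. Hydrochemical studies often use Ward's method instead, and the PCA step is not shown.

```python
import math
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each variable so ions measured on different scales weigh equally."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]


def average_linkage(rows, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    Returns clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        return mean(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # merge the closest pair of clusters
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Hypothetical samples: nitrate, calcium, iron (mg/L) for five wells.
samples = [
    [12.0, 30.0, 0.05],
    [11.0, 28.0, 0.04],
    [0.2, 15.0, 2.5],
    [0.1, 14.0, 2.8],
    [13.0, 31.0, 0.06],
]
print(average_linkage(standardise(samples), 2))  # [[0, 1, 4], [2, 3]]
```

In this toy data the two groups separate into nitrate-rich, iron-poor wells and nitrate-poor, iron-rich wells, the kind of oxic versus reducing split a redox control would produce.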
Abstract:
During several natural disasters in recent years, Twitter has been found to play an important role as an additional medium for many-to-many crisis communication. Emergency services are successfully using Twitter to inform the public about current developments, and are increasingly also attempting to source first-hand situational information from Twitter feeds (such as relevant hashtags). However, further study of the uses of Twitter during natural disasters relies on the development of flexible and reliable research infrastructure for tracking and analysing Twitter feeds at scale and in close to real time. This article outlines two approaches to developing such infrastructure: one that builds on the readily available open source platform yourTwapperkeeper to provide a low-cost, simple and basic solution, and one that establishes a more powerful and flexible framework by drawing on highly scalable, state-of-the-art technology.
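As a rough illustration of near-real-time tracking, the class below keeps hashtag counts over a sliding time window. It assumes tweets arrive in timestamp order as (seconds, text) pairs. This is a hypothetical input format, not the Twitter API or yourTwapperkeeper's schema.

```python
import re
from collections import Counter, deque

HASHTAG = re.compile(r"#(\w+)")


class HashtagWindow:
    """Counts hashtags in tweets whose timestamps fall within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, tag) pairs in arrival order
        self.counts = Counter()

    def add(self, timestamp, text):
        for tag in HASHTAG.findall(text.lower()):
            self.events.append((timestamp, tag))
            self.counts[tag] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # drop events at or before now - window, keeping the interval (now - window, now]
        while self.events and self.events[0][0] <= now - self.window:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, n=3):
        return self.counts.most_common(n)


w = HashtagWindow(window=60)
w.add(0, "Flooding on Main St #qldfloods")
w.add(30, "Evacuation centre open #QLDfloods #bigwet")
print(w.top())  # [('qldfloods', 2), ('bigwet', 1)]
w.add(90, "Road closed #bigwet")
print(w.top())  # [('bigwet', 1)]
```

Each tweet is processed once and each expired event is removed once, so the cost stays proportional to stream volume. This is the property a streaming pipeline needs, whichever platform hosts it.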
Abstract:
The demand for high-speed data services on portable devices has become a driving force for the development of advanced broadband access technologies. Despite recent advances in broadband wireless technologies, a number of critical issues remain to be resolved. One of the major concerns is the implementation of compact antennas that can operate over a wide frequency band. Spiral antennas have been used extensively in broadband applications because of their planar structure, wide bandwidth and circular polarisation. However, the practical implementation of spiral antennas is challenged by their high input impedance, relatively low gain and the need for balanced feeding structures. Wideband balanced feeding structures for spiral antennas with impedance-matching capability still need further development. This thesis proposes three wideband feeding systems for spiral antennas that are compatible with wideband array antenna geometries. First, a novel tapered geometry is proposed for a symmetric coplanar waveguide (CPW) to coplanar strip line (CPS) wideband balun. This balun achieves the unbalanced-to-balanced transformation while matching the high input impedance of the antenna to a reference impedance of 50 Ω. The discontinuity between CPW and CPS is accommodated by using a radial stub and bond wires. The bandwidth of the balun is improved by appropriately tapering the CPW line instead of using a stepped impedance transformer. Next, the tapered design is applied to an asymmetric CPW to produce a novel asymmetric CPW to CPS wideband balun. The use of asymmetric CPW removes the discontinuity between CPW and CPS without the need for a radial stub or bond wires. Finally, a tapered microstrip line to parallel striplines balun is proposed. This balun consists of two sections. One section is the parallel striplines, which are connected to the antenna, with the impedance of the balanced line equal to the antenna input impedance.
The other section consists of a microstrip line in which the width of the ground plane is gradually reduced until it resembles a parallel stripline. The taper accomplishes both the mode and the impedance transformation. This balun has significantly improved bandwidth characteristics. The proposed feeding structures are measured in a back-to-back configuration and compared with simulated results. The simulated and measured results show that the tapered microstrip to parallel striplines balun has more than three octaves of bandwidth. This balun is integrated with a single Archimedean spiral antenna and with an array of spiral antennas. The performance of the integrated structures is simulated with electromagnetic simulation software, and the results are compared with measurements. The back-to-back microstrip to parallel strip balun has a return loss better than 10 dB over a wide bandwidth from 1.75 to 15 GHz. The performance of the microstrip to parallel strip balun was validated with the spiral antennas. The results show the balun to be an effective, low-profile, wideband (2.5 to 15 GHz) feeding network for balanced spiral antennas.
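The return-loss and bandwidth figures can be checked with standard transmission-line formulas: Gamma = (Z_L - Z_0)/(Z_L + Z_0), RL = -20 log10|Gamma|, and bandwidth in octaves = log2(f_high/f_low). The sketch assumes an idealised self-complementary spiral impedance of about 188 Ω (the abstract gives no value) and uses an exponential impedance profile as one example of a taper. The thesis does not say which taper profile it uses.

```python
import math


def reflection_coefficient(z_load, z_ref=50.0):
    """|Gamma| for a purely resistive load on a line of reference impedance z_ref."""
    return abs((z_load - z_ref) / (z_load + z_ref))


def return_loss_db(gamma):
    """Return loss in dB; 10 dB corresponds to |Gamma| of about 0.316."""
    return -20.0 * math.log10(gamma)


def exponential_taper(z_start, z_end, length, position):
    """Characteristic impedance at `position` along an exponential taper of given length."""
    return z_start * math.exp((position / length) * math.log(z_end / z_start))


def octaves(f_low, f_high):
    return math.log2(f_high / f_low)


z_spiral = 188.4  # assumed: ideal self-complementary value, about 377 ohm / 2
gamma = reflection_coefficient(z_spiral)
print(round(gamma, 3), round(return_loss_db(gamma), 1))      # 0.581 4.7  (unmatched)
print(round(exponential_taper(50.0, z_spiral, 1.0, 0.5), 1))  # 97.1  (geometric mean)
print(round(octaves(1.75, 15.0), 2))                          # 3.1
```

Connected directly to 50 Ω, such a spiral would have a return loss of under 5 dB, which is why an impedance-transforming balun is needed. The 1.75 to 15 GHz range works out to about 3.1 octaves, consistent with the "more than three octaves" figure.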
Abstract:
This thesis provides a query model suitable for context-sensitive access to a wide range of distributed linked datasets that are available to scientists using the Internet. The model is based on scientific research standards, which require scientists to provide replicable methods in their publications. Although existing query models provide limited replicability, they do not contextualise the process whereby different scientists select dataset locations based on their trust and physical location. In different contexts, scientists need to perform different data cleaning actions, independent of the overall query, and the model was designed to accommodate this. The query model was implemented as a prototype web application, and its features were verified through its use as the engine behind a major scientific data access site, Bio2RDF.org. The prototype showed that it was possible to have context-sensitive behaviour for each of the three mirrors of Bio2RDF.org using a single set of configuration settings. The prototype provided executable query provenance that could be attached to scientific publications to fulfil replicability requirements. The model was designed to make it simple to independently interpret and execute the query provenance documents using context-specific profiles, without modifying the original provenance documents. Experiments using the prototype as the data access tool in workflow management systems confirmed that the design of the model made it possible to replicate results in different contexts with minimal additions, and no deletions, to query provenance documents.
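Below is a minimal sketch of interpreting a fixed provenance document through a context-specific profile. The data structures, field names and example.org endpoints are hypothetical and do not reflect the prototype's or Bio2RDF.org's actual format. Freezing the provenance record mirrors the requirement that profiles never modify the original document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProvenance:
    """Published record of a query; frozen so that profiles cannot alter it."""
    query: str
    endpoints: tuple  # candidate dataset locations, in the author's order of preference


@dataclass(frozen=True)
class Profile:
    """Local context: endpoints this scientist trusts, plus local data-cleaning steps."""
    trusted: frozenset
    cleaning: tuple = ()


def resolve(provenance, profile):
    """Build an executable plan from provenance and profile without modifying either."""
    usable = [e for e in provenance.endpoints if e in profile.trusted]
    if not usable:
        raise LookupError("no trusted endpoint available for this query")
    return {"query": provenance.query, "endpoint": usable[0], "cleaning": list(profile.cleaning)}


prov = QueryProvenance(
    query="SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
    endpoints=("https://mirror-a.example.org/sparql", "https://mirror-b.example.org/sparql"),
)
local = Profile(trusted=frozenset({"https://mirror-b.example.org/sparql"}),
                cleaning=("normalise-identifiers",))
print(resolve(prov, local)["endpoint"])  # https://mirror-b.example.org/sparql
```

A second scientist can replicate the same query against a different mirror simply by supplying a different `Profile`. The published `prov` object never changes.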
Abstract:
The rapid growth in the number of social network users, and the amount of information a social network requires about its users, make traditional matching systems insufficiently adept at matching users within social networks. This paper introduces the use of clustering to form communities of users and then uses these communities to generate matches. Forming communities within a social network reduces the number of users that the matching system needs to consider, and helps to overcome other problems that social networks suffer from, such as the absence of activity information for new users. The proposed system has been evaluated on a dataset obtained from an online dating website. Empirical analysis shows that the accuracy of the matching process increases when community information is used.
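Below is a minimal sketch of community-based matching, assuming k-means over numeric profile features as the clustering step. The abstract does not name the algorithm or the features. A new user with no activity history is placed in the nearest community using profile features alone, and candidates are then ranked only within that community.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic for this demo)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a community empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def recommend(user, points, labels, centroids, n=2):
    """Rank candidates from the user's nearest community only, closest first."""
    community = min(range(len(centroids)), key=lambda c: math.dist(user, centroids[c]))
    pool = [i for i, lab in enumerate(labels) if lab == community]
    return sorted(pool, key=lambda i: math.dist(user, points[i]))[:n]


# Hypothetical profile features per user: (age, interest score).
users = [(25, 1), (40, 8), (26, 2), (41, 9), (27, 1), (42, 8)]
labels, centroids = kmeans(users, k=2)
print(labels)                                        # [0, 1, 0, 1, 0, 1]
print(recommend((28, 2), users, labels, centroids))  # [4, 2]
```

Restricting the ranking to one community is what shrinks the candidate pool. Here only three of the six users are scored for the new user.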
Abstract:
Recent increases in cycling have led to many media articles highlighting concerns about interactions between cyclists and pedestrians on footpaths and off-road paths. Under the Australian Road Rules, adults are not allowed to ride on footpaths unless accompanying a child 12 years of age or younger. However, this rule does not apply in Queensland. This paper reviews international studies that examine the safety of footpath cycling for both cyclists and pedestrians, and relevant Australian crash and injury data. The results of a survey of more than 2,500 Queensland adult cyclists are presented in terms of the frequency of footpath cycling, the characteristics of those cyclists and the characteristics of self-reported footpath crashes. A third of the respondents reported riding on the footpath and, of those, about two-thirds did so reluctantly. Riding on the footpath was more common for utilitarian trips and for new riders, although the average distance ridden on footpaths was greater for experienced riders. About 5% of distance ridden and a similar percentage of self-reported crashes occurred on footpaths. These data are discussed in terms of the Safe Systems principle of separating road users with vastly different levels of kinetic energy. The paper concludes that footpaths are important facilities for both inexperienced and experienced riders and for utilitarian riding, especially in locations that riders perceive as not providing a safe system for cycling.
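The Safe Systems argument rests on kinetic energy, E = (1/2) m v^2. The masses and speeds below are illustrative assumptions, not survey data. They show that the energy gap between a cyclist and a pedestrian is far smaller than the gap between a car and a cyclist.

```python
def kinetic_energy(mass_kg, speed_kmh):
    """E = 1/2 * m * v^2 in joules, with speed converted from km/h to m/s."""
    v = speed_kmh / 3.6
    return 0.5 * mass_kg * v ** 2


# Illustrative (assumed) masses and speeds, not values from the survey.
pedestrian = kinetic_energy(70, 5)   # person walking
cyclist = kinetic_energy(85, 20)     # rider plus bicycle
car = kinetic_energy(1500, 50)       # passenger car at an urban speed

print(round(pedestrian), round(cyclist), round(car))             # 68 1312 144676
print(round(cyclist / pedestrian, 1), round(car / cyclist, 1))   # 19.4 110.3
```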
Abstract:
Wavelet transforms (WTs) are a powerful tool for extracting localized variations in non-stationary signals, and they have been applied in traffic engineering; however, these applications lack some important theoretical fundamentals. In particular, little guidance is available on selecting an appropriate WT across potential transport applications. The research described in this paper contributes uniquely to the literature by first describing a numerical experiment that demonstrates the shortcomings of data processing techniques commonly used in traffic engineering (i.e., averaging, moving averaging, second-order difference, oblique cumulative curve, and short-time Fourier transform). It then mathematically describes the WT's ability to detect singularities in traffic data. Next, the selection of a suitable WT for a particular research topic in traffic engineering is discussed in detail by objectively and quantitatively comparing the performance of candidate wavelets in a numerical experiment. Finally, several case studies using both loop detector data and vehicle trajectories show that selecting a suitable wavelet largely depends on the specific research topic, and that the Mexican hat wavelet generally gives satisfactory performance in detecting singularities in traffic and vehicular data.
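The singularity-detection idea can be sketched with a single-scale Mexican hat transform. Because this wavelet has zero mean and a zero first moment, it gives almost no response to linear stretches of a signal and peaks where the slope changes abruptly. The synthetic cumulative count, the scale of 4 samples and the kernel truncation at |u| = 5 are assumptions for illustration.

```python
import math


def mexican_hat(u):
    """Mexican hat (Ricker) mother wavelet, unnormalised: (1 - u^2) * exp(-u^2 / 2)."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)


def cwt_at_scale(signal, scale):
    """Wavelet coefficients W(b) at a single scale.

    The kernel is truncated at |u| = 5, and coefficients are computed only where the
    whole kernel fits inside the signal, which avoids edge artefacts.
    """
    half = int(5 * scale)
    kernel = [mexican_hat(k / scale) for k in range(-half, half + 1)]
    return {
        b: sum(signal[b + k] * w for k, w in zip(range(-half, half + 1), kernel))
        for b in range(half, len(signal) - half)
    }


def detect_singularity(signal, scale=4.0):
    """Index where the wavelet response has the largest magnitude."""
    coeffs = cwt_at_scale(signal, scale)
    return max(coeffs, key=lambda b: abs(coeffs[b]))


# Synthetic cumulative vehicle count: 10 vehicles per interval until t = 60,
# then 4 vehicles per interval.
counts = [10 * t if t <= 60 else 600 + 4 * (t - 60) for t in range(120)]
print(detect_singularity(counts))  # 60
```

A moving average of the same counts would smear the change in flow rate over the averaging window. The wavelet response, by contrast, is centred exactly on the breakpoint.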
Abstract:
Encryption is a well-established technology for protecting sensitive data. However, once encrypted, the data can no longer be easily queried, and database performance depends on how the sensitive data are encrypted. In this paper we review conventional encryption methods that allow partial querying, and propose an encryption method for numerical data that can be queried effectively. The proposed system includes the design of the service scenario and the metadata.
Abstract:
The National Road Safety Strategy 2011-2020 outlines plans to reduce the burden of road trauma through improvements and interventions relating to safe roads, safe speeds, safe vehicles and safe people. It also highlights that a key aspect of achieving these goals is the availability of comprehensive data on the issue. Data are essential both for conducting more in-depth epidemiologic studies of risk and for effectively evaluating road safety interventions and programs. Before data are used to evaluate the efficacy of prevention programs, it is important to undertake a systematic evaluation of the quality of the underlying data sources, to ensure that any trends identified reflect true estimates rather than spurious data effects. However, there has been little scientific work specifically focused on establishing core data quality characteristics pertinent to the road safety field, and limited work to develop methods for evaluating data sources according to these core characteristics. Traffic-related incidents and the resulting injuries are recorded in a variety of data sources, each collected for its own defined purpose. These include police reports, transport safety databases, emergency department data, hospital morbidity data and mortality data, to name a few. However, because these data are collected for specific purposes, each source has limitations when seeking a complete picture of the problem. Limitations of current data sources include delays in data becoming available, a lack of accurate and/or specific location information, and underreporting of crashes involving particular road user groups such as cyclists. This paper proposes core data quality characteristics that could be used to systematically assess road crash data sources, providing a standardised approach for evaluating data quality in the road safety field.
The potential for data linkage to qualitatively and quantitatively improve the quality and comprehensiveness of road crash data is also discussed.
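One common way to use data linkage to gauge underreporting is to link two sources and apply a two-source capture-recapture (Lincoln-Petersen) estimate. The sketch below is an illustration under stated assumptions, not the paper's proposal. It assumes deterministic matching on date, sex, postcode and age (±1 year) and uses hypothetical records. The estimator itself assumes that the two sources capture cases independently from a closed population.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    crash_date: date
    sex: str
    age: int
    postcode: str


def link(police, hospital, age_tolerance=1):
    """Deterministic linkage on date, sex and postcode, allowing a small age mismatch.

    Each hospital record is linked to at most one police record.
    """
    used = set()
    pairs = []
    for i, p in enumerate(police):
        for j, h in enumerate(hospital):
            if j not in used and (
                p.crash_date == h.crash_date
                and p.sex == h.sex
                and p.postcode == h.postcode
                and abs(p.age - h.age) <= age_tolerance
            ):
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


def lincoln_petersen(n_a, n_b, n_linked):
    """Two-source capture-recapture estimate of the total number of cases."""
    if n_linked == 0:
        raise ValueError("no linked records: estimate undefined")
    return n_a * n_b / n_linked


# Hypothetical records
police = [
    Record(date(2012, 3, 1), "M", 34, "4000"),
    Record(date(2012, 3, 2), "F", 22, "4101"),
    Record(date(2012, 3, 5), "M", 51, "4064"),
]
hospital = [
    Record(date(2012, 3, 1), "M", 35, "4000"),  # age recorded one year higher
    Record(date(2012, 3, 2), "F", 22, "4101"),
    Record(date(2012, 3, 3), "M", 19, "4006"),  # e.g. a cyclist never reported to police
    Record(date(2012, 3, 4), "F", 45, "4169"),
]

pairs = link(police, hospital)
total = lincoln_petersen(len(police), len(hospital), len(pairs))
print(pairs, total, f"police completeness ~{len(police) / total:.0%}")
# [(0, 0), (1, 1)] 6.0 police completeness ~50%
```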
Abstract:
- International recognition of the need for a public health response to child maltreatment
- Need for early intervention at the health system level
- Important role of health professionals in identifying, reporting and documenting suspicion of maltreatment
- Up to 10% of all children presenting at EDs are victims; without identification, 35% are reinjured and 5% die
- In Qld, mandatory reporting requirement for doctors and nurses for suspected abuse or neglect